Assembly With Supernova

Author: Brant C. Faircloth
Copyright: This documentation is available under a Creative Commons (CC-BY) license.

Modification History

See Assembly With Supernova

Purpose

Supernova is a program for assembling 10X Genomics Linked Read Data.

Preliminary Steps

  1. To install Supernova, see Compiling Supernova

Steps

  1. Before running Supernova, check how many reads you have for each sample. Per the 10X instructions, you want to input roughly 56-60X coverage. You can count the reads for each sample using:

    for i in clean-reads/*; do echo "$i"; gunzip -c "$i"/split-adapter-quality-trimmed/*-READ1.fastq.gz | wc -l | awk '{print $1/4}'; done
    
  2. This will output a count of R1 reads for each sample to the console. To get the total number of reads, multiply by 2. For a rough estimate of coverage, multiply the total number of reads by the read length, then divide that number by the size of your genome. We can dial down the number of reads when we run Supernova if we need to. Guidance regarding the number of reads to use with Supernova can be found at this page.
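
The arithmetic in the step above can be sketched as a pair of small shell functions. This is a minimal sketch, not part of the original protocol: the function names `estimate_coverage` and `reads_for_coverage` are hypothetical, and the R1 count (300 million), read length (150 bp), and genome size (1.2 Gbp) are made-up illustration values. It also assumes both reads of a pair have the same length and that `--maxreads` counts individual reads rather than read pairs, which you should confirm against the 10X documentation for your Supernova version.

```shell
#!/bin/bash

# estimate_coverage R1_COUNT READ_LENGTH GENOME_SIZE
# Prints approximate fold-coverage. The R1 count is doubled to get total
# reads (assumes R1 and R2 have the same length).
estimate_coverage() {
    awk -v r1="$1" -v len="$2" -v genome="$3" \
        'BEGIN { printf "%.1f\n", (r1 * 2 * len) / genome }'
}

# reads_for_coverage TARGET_COVERAGE READ_LENGTH GENOME_SIZE
# Prints the number of individual reads needed to reach the target coverage
# (assumption: this is the unit that --maxreads expects).
reads_for_coverage() {
    awk -v cov="$1" -v len="$2" -v genome="$3" \
        'BEGIN { printf "%d\n", (cov * genome) / len }'
}

# Hypothetical sample: 300 million R1 reads, 150 bp reads, 1.2 Gbp genome.
estimate_coverage 300000000 150 1200000000
reads_for_coverage 56 150 1200000000
```

With these illustration values the sample has more coverage than the 56X target, so the second number is the sort of value you might pass to `--maxreads` to downsample.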

  3. Set up a submission script for QB2 (in our case). Generally speaking, avian-sized genome assemblies need something like 256 GB of RAM, whereas mammal-sized genomes may need up to 512 GB. Supernova should also be run on AT LEAST 16 CPU cores, and we want it to finish in a reasonable amount of time (< 72 hours). On QB2, that means we’ll run a job with 24 of the 48 cores available on a bigmem node, which will net us ~750 GB RAM. Because of the way the program runs, we need to explicitly limit the number of cores and the amount of RAM used by the Supernova process. We’ll slightly undershoot the total RAM allocated to the job (limiting it to 745 GB of the 750 GB).

    #!/bin/bash
    #PBS -q bigmem
    #PBS -A <allocation>
    #PBS -l walltime=72:00:00
    #PBS -l nodes=1:ppn=24
    #PBS -V
    #PBS -N supernova_assembly
    #PBS -o supernova_assembly.out
    #PBS -e supernova_assembly.err
    
    
    export PATH=$HOME/bin/supernova-2.1.1:$PATH
    
    cd $PBS_O_WORKDIR
    supernova run \
        --id=<my_assembly_name> \
        --fastqs=/path/to/my/demuxed/fastq/files \
        --maxreads=<maxreads determined based on above> \
        --localcores=24 \
        --localmem=745
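
The `--localmem` value in the script above follows from the share of the node that the job requests. As a rough sketch of that reasoning (the function name `job_memory_gb` is hypothetical, and the 1500 GB node total is an assumption inferred from "24 of the 48 cores" netting ~750 GB; it also assumes the scheduler allocates memory in proportion to the cores requested):

```shell
#!/bin/bash

# job_memory_gb NODE_RAM_GB NODE_CORES REQUESTED_CORES HEADROOM_GB
# Prints the memory limit (in GB) to give Supernova: the job's proportional
# share of node RAM, minus a small headroom so the process stays under the
# allocation. Uses integer arithmetic.
job_memory_gb() {
    local node_ram=$1 node_cores=$2 requested_cores=$3 headroom=$4
    echo $(( node_ram * requested_cores / node_cores - headroom ))
}

# Assumed bigmem node: 1500 GB RAM, 48 cores; job requests 24 cores and
# leaves 5 GB of headroom.
job_memory_gb 1500 48 24 5
```

If you request a different number of cores in the `#PBS -l nodes=1:ppn=` line, recompute the value and keep `--localcores` and `--localmem` in step with it.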