Running RADcap Analysis
Author: Jessie Salter and Brant C. Faircloth
Copyright: This documentation is available under a Creative Commons (CC-BY) license.
The following assumes you are demultiplexing RADcap data prepared with enzymes and the i5-8N tag. If you instead used standard libraries for RADcap locus enrichment, you can demultiplex those data as usual.
To compile Stacks when using LSU HPC, be sure to
module load gcc/6.4.0 or enable this module in your
Get Stacks, configure it (with a home-directory install), and install it. The commands below need to be modified for your setup because they are set to install everything into
/project/brant/home/, which you don’t have access to.
    wget https://catchenlab.life.illinois.edu/stacks/source/stacks-2.54.tar.gz
    tar -xzvf stacks-2.54.tar.gz
    cd stacks-2.54
    export CC=`which gcc`
    export CXX=`which g++`
    ./configure --prefix=/project/brant/home/
    make  # if using an entire node, you can `make -j 20`
    make install
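Because the install prefix above is specific to one setup, it can help to keep the prefix in a variable and pass it to `configure`. The following is a minimal sketch: the `STACKS_PREFIX` variable, its value, and the `build_stacks` function name are hypothetical choices for illustration and are not part of Stacks.

```shell
# Hypothetical install location; any directory you can write to should work.
STACKS_PREFIX="$HOME/stacks"

# Run this from inside the unpacked stacks-2.54 source directory.
build_stacks() {
    # use whichever gcc/g++ the loaded module puts first on the PATH
    export CC="$(command -v gcc)"
    export CXX="$(command -v g++)"
    ./configure --prefix="$STACKS_PREFIX" && make && make install
}

# After installation the Stacks programs are in $STACKS_PREFIX/bin, so that
# directory needs to be on your PATH:
# export PATH="$STACKS_PREFIX/bin:$PATH"
```

Defining the build as a function means nothing is compiled until you call `build_stacks` yourself.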
Get BBMap and install it somewhere. Basically, download and place the files somewhere in your
    mkdir $HOME/bin
    wget https://downloads.sourceforge.net/project/bbmap/BBMap_38.87.tar.gz
    tar -xzvf BBMap_38.87.tar.gz
    # this will create a folder bbmap which you need to add to your $PATH
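Adding the unpacked `bbmap` folder to your `$PATH` might look like the sketch below. It assumes the archive was unpacked inside `$HOME/bin`. The `BBMAP_DIR` variable is a hypothetical name, so adjust its value to wherever the folder actually is.

```shell
# Assumed location of the unpacked BBMap folder (adjust as needed).
BBMAP_DIR="$HOME/bin/bbmap"

# Prepend the folder so the BBMap shell scripts can be found by name.
export PATH="$BBMAP_DIR:$PATH"

# To make the change permanent, put the export line in your shell startup
# file (for example ~/.bashrc). You can then check that a BBMap script
# resolves with:
# command -v demuxbyname.sh
```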
If you are in my lab group, these are installed in
- Upload the relevant data to some location on @smic. These should not have been demultiplexed in any way.
- You may want to check that the MD5 checksums of your uploaded files match the MD5 checksums that you expect. Usually, you receive these from the sequencing center.
- If you have multiple files (for some reason), you can combine all of the READ1 files into one file and all of the READ2 files into another.
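The checksum and file-combining steps above can be sketched as two small shell functions. The function names and the file names in the comments are hypothetical. The checksum list is assumed to be in the `<md5>  <filename>` format that `md5sum` writes, so check what your sequencing center actually provides.

```shell
# Verify files against a checksum list (lines of "<md5>  <filename>").
check_md5() {
    local checksum_file="$1"
    # run inside the directory holding the list so relative names resolve
    ( cd "$(dirname "$checksum_file")" && md5sum -c "$(basename "$checksum_file")" )
}

# Combine several gzipped FASTQ files into one. Gzip files can be joined
# with cat, and the result is still a valid gzip file.
combine_reads() {
    local output="$1"
    shift
    cat "$@" > "$output"
}

# Example calls (hypothetical file names):
# check_md5 MD5.txt
# combine_reads combined_1.fq.gz lane1_1.fq.gz lane2_1.fq.gz
# combine_reads combined_2.fq.gz lane1_2.fq.gz lane2_2.fq.gz
```

List the input files in the same order for READ1 and READ2 so that read pairs stay in sync between the two combined files.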
Your Data Contain Randomly Sheared DNA (“standard” libraries)
If your data contain RADcap performed on randomly sheared or “standard” sequencing libraries that are mixed with “regular” RADcap libraries, we’ll demultiplex the randomly sheared RADcap data first. Once that’s done, we will demultiplex the remaining reads containing i5-8N tags.
I am starting with a directory structure that looks like this:
    .
    ├── AEM1_CKDL200166465-1a_HF5GHCCX2_L3_1.fq.gz
    └── AEM1_CKDL200166465-1a_HF5GHCCX2_L3_2.fq.gz
The procedure for these standard libraries is identical to the one described in Demultiplexing a Sequencing Run, so refer to that document, demultiplex, rename your files, and return here.
When I’m done with this first step, my directory structure looks something like this, although you may have renamed the individual read files following the instructions in Demultiplexing a Sequencing Run:
    .
    ├── AEM1_CKDL200166465-1a_HF5GHCCX2_L3_1.fq.gz
    ├── AEM1_CKDL200166465-1a_HF5GHCCX2_L3_2.fq.gz
    └── random-libraries
        ├── ACCATCCA+ACATTGCG_R1_001.fastq.gz
        ├── ACCATCCA+ACATTGCG_R2_001.fastq.gz
        ├── ...
        ├── demuxbyname.e631878
        ├── demuxbyname.o631878
        ├── demux.qsub
        ├── my_barcodes.txt
        ├── Undetermined_R1_001.fastq.gz
        └── Undetermined_R2_001.fastq.gz
You now want to jump to Running GATK in Parallel, and follow the procedure for trimming reads, generating a BAM file, removing duplicates, haplotype calling, etc. If you have data containing i7+i5-8N tags, you will merge the samples back together after haplotype calling.
- Once you have generated VCF files for the SNPs you have called, it is a very good idea to filter those VCF files to include only the loci/sites that you enriched with RADcap. You will do this using VCFtools and a BED-formatted file giving the positions of your RADcap loci in the genome with which you are working.
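As a rough sketch of that filtering step, the function below wraps a VCFtools call that keeps only sites falling inside the intervals of a BED file. The function name and the file names in the comments are hypothetical. The input is assumed to be a gzipped VCF; for an uncompressed file, use `--vcf` instead of `--gzvcf`.

```shell
# Keep only variants located inside the enriched RADcap loci.
# Arguments: gzipped input VCF, BED file of RADcap loci, output prefix.
filter_vcf_to_loci() {
    local vcf="$1" bed="$2" out_prefix="$3"
    # --recode writes a new VCF; --recode-INFO-all keeps the INFO fields
    vcftools --gzvcf "$vcf" \
        --bed "$bed" \
        --recode --recode-INFO-all \
        --out "$out_prefix"
}

# Example call (hypothetical file names):
# filter_vcf_to_loci samples.vcf.gz radcap-loci.bed samples.radcap
# VCFtools names the result <prefix>.recode.vcf, here samples.radcap.recode.vcf
```

The VCFtools manual notes that the BED file is expected to have a header line, so it is worth confirming that your first locus is not being skipped.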