Random Computer Snippets

Author: Brant C. Faircloth
Copyright: This documentation is available under a Creative Commons (CC-BY) license.

All of the following snippets assume that you are using the Z shell (zsh). They may or may not work in Bash.

Modification History

See Random Computer Snippets

Subsample reads for R1 and R2 using seqtk

READS=2000000
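for dir in /path/to/your/clean/data/dir/from/illumiprocessor/*;
do
        # draw one seed per sample so READ1 and READ2 are subsampled identically
        RAND=$RANDOM;
        echo $RAND;
        for file in $dir/split-adapter-quality-trimmed/*-READ[1-2]*;
        do
                echo $file;
                # zsh modifiers: :t drops the directory, :r:r drops the last two extensions (e.g. .fastq.gz)
                seqtk sample -s $RAND $file $READS | gzip > $file:t:r:r.SUBSAMPLE.fastq.gz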
        done;
done
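
Because the same seed is used for both files of a pair, the subsampled R1 and R2 files should end up with the same number of reads. The following is a minimal sketch of a sanity check, assuming standard 4-line FASTQ records; count_reads and check_pair are hypothetical helper names, not part of seqtk:

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes every record occupies exactly 4 lines.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Hypothetical helper: compare the read counts of an R1/R2 pair.
# Prints "OK <count>" when they match, otherwise reports both counts
# and returns a non-zero status.
check_pair() {
    n_r1=$(count_reads "$1")
    n_r2=$(count_reads "$2")
    if [ "$n_r1" -eq "$n_r2" ]; then
        echo "OK $n_r1"
    else
        echo "MISMATCH $n_r1 $n_r2"
        return 1
    fi
}
```

Call check_pair with the two SUBSAMPLE files for a sample as its arguments. A count below $READS most likely means the input file held fewer reads than were requested.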

Download data for multiple files from NCBI SRA

First, create a list of SRRs in a file, sra-records.txt, that looks something like:

SRR453553
SRR453556
SRR453559
SRR453277
SRR453409
SRR453550
SRR452995
SRR453269
SRR453270
SRR453274
SRR453263

Be sure to use fasterq-dump rather than fastq-dump; it is considerably faster and uses 6 threads by default:

for record in `cat sra-records.txt`;
do
        echo $record;
        fasterq-dump $record;
done
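
For long accession lists, it can help to read the file line by line, skip blank or commented lines, and keep going when a single download fails. The following is a minimal sketch of that idea; for_each_record is a hypothetical helper name, and the command to run is passed in as arguments rather than hard-coded:

```shell
# Hypothetical helper: run a command once per accession listed in a file.
# Usage: for_each_record LIST COMMAND [ARGS...]
# Blank lines and lines starting with '#' are skipped.
for_each_record() {
    list=$1
    shift
    # "|| [ -n ... ]" also handles a final line with no trailing newline
    while IFS= read -r record || [ -n "$record" ]; do
        case $record in
            ''|'#'*) continue ;;
        esac
        echo "$record"
        # stdin is redirected so the command cannot consume the list itself;
        # a failure is reported on stderr and the loop moves on
        "$@" "$record" < /dev/null || echo "FAILED: $record" >&2
    done < "$list"
}
```

With the list above, this would be called as for_each_record sra-records.txt fasterq-dump.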

Zip or unzip many files in parallel

Make sure you have GNU Parallel installed. Then:

# to GZIP files
# navigate to the directory containing the files
cd /my/dir/with/files
parallel gzip ::: *

# to GUNZIP files
# navigate to the directory containing the files
cd /my/dir/with/files
parallel gunzip ::: *

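The same approach works for many tar.gz files in a directory: replace gzip or gunzip with tar -xzf (for .tar.gz archives) or tar -xjf (for .tar.bz2 archives). To create one archive per file or directory instead, name the archive explicitly, for example parallel tar -czf {}.tar.gz {} ::: *.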
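
If GNU Parallel is not installed, xargs with the -P option (available in both the GNU and BSD versions) can give a similar effect. The following is a minimal sketch, not a replacement for the commands above; gzip_parallel is a hypothetical helper name and the default of 4 simultaneous jobs is an arbitrary assumption:

```shell
# Hypothetical helper: gzip every regular file directly inside a directory,
# running several gzip processes at once.
# Usage: gzip_parallel DIR [NJOBS]
# Assumes DIR contains at least one file that is not already gzipped.
gzip_parallel() {
    target_dir=$1
    njobs=${2:-4}
    # -print0 / -0 keep file names with spaces intact;
    # -n 1 hands one file to each gzip process
    find "$target_dir" -maxdepth 1 -type f ! -name '*.gz' -print0 |
        xargs -0 -P "$njobs" -n 1 gzip
}
```

For example, gzip_parallel /my/dir/with/files 8 would compress the files in that directory using up to 8 processes.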