Archiving Sequencing Data¶
Author: | Brant C. Faircloth |
---|---|
Copyright: | This documentation is available under a Creative Commons (CC-BY) license. |
Modification History¶
Purpose¶
We archive the raw read data from ALL of our old sequencing runs (>5 years) in AWS Glacier Deep Archive.
Steps¶
Navigate to the
illumina-runs
section of the NFSIdentify sequencing directories needed to be archived. Usually, in
illumina-runs
, this is everything but theclean
sequence directory that is ~5 years old. Once you’ve decided what needs to be excluded for a particular sequencing run, generate a directory tree of what you’re uploading:tree -I 'clean' -a > ${PWD##*/}-dirtree.txt
Check to make sure all files in dirs to package up are already zipped (these are sequence files, so they should be)
Now, package up everything, except for any excluded directories (e.g. like
clean
, which just duplicates the data):tar --exclude ./clean -cvf ${PWD##*/}.tar ./ | tee ${PWD##*/}-tar.out
This will make an output file named
<directory-name>-tar.out
that you can check to ensure everything has been packaged up that you wantedCompute md5 checksums of everything:
md5sum ${PWD##*/}.tar ${PWD##*/}-tar.out ${PWD##*/}-dirtree.txt > ${PWD##*/}.md5
Now, go ahead and upload those to AWS Glacier Deep Archive (be sure to use the correct
--profile
:aws s3 cp ${PWD##*/}.tar s3://2013-faircloth-lab-sequence-data --storage-class DEEP_ARCHIVE --profile lab-data && aws s3 cp ${PWD##*/}-tar.out s3://2013-faircloth-lab-sequence-data --storage-class DEEP_ARCHIVE --profile lab-data && aws s3 cp ${PWD##*/}-dirtree.txt s3://2013-faircloth-lab-sequence-data --storage-class DEEP_ARCHIVE --profile lab-data && aws s3 cp ${PWD##*/}.md5 s3://2013-faircloth-lab-sequence-data --storage-class DEEP_ARCHIVE --profile lab-data
Check the tar log file to make sure everything got packaged up (basically avoiding the
clean
data).Remove the directories that you’ve archived. For now, leave the
clean
data directoryDenote in the Google Sheet that the data have been archived
Move the directory to
/nfs/data1/illumina-runs/ARCHIVED