White Paper: Super-Fast Genome BWA-Bam-Sort on GLAD

Aligning sequenced reads in FASTQ files and converting the resulting SAM file into a sorted BAM file is common practice and often the first stage of a genome analysis pipeline. The bioinformatics community has developed software such as BWA and Samtools to do this on a single computer. Although these tools can usually use multi-threading to exploit the multiple CPU cores of modern computers, the step still takes hours, which accounts for a large part of the whole pipeline and limits its overall execution speed. As genome analysis is applied in more and more areas, the analysis needs to finish in minutes rather than hours.
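For orientation, the single-computer baseline described above can be sketched as two command lines. The Python helper below only builds the command strings and does not run them. It is a minimal sketch under assumptions: the input is paired-end, the reference has already been indexed with `bwa index`, and the function name, file names, and thread count are hypothetical placeholders that do not come from the white paper.

```python
import shlex


def build_bbs_commands(reference, fastq_1, fastq_2, out_bam, threads=8):
    """Return shell command strings for a single-node FASTQ -> sorted BAM run.

    All arguments are hypothetical placeholders; paired-end input is assumed.
    """
    q = shlex.quote
    # bwa mem aligns the reads against the indexed reference and writes SAM to stdout.
    align = f"bwa mem -t {threads} {q(reference)} {q(fastq_1)} {q(fastq_2)}"
    # samtools sort reads from stdin ("-") and writes a coordinate-sorted BAM;
    # -@ sets the number of additional sorting threads.
    sort = f"samtools sort -@ {threads} -o {q(out_bam)} -"
    # Piping the two tools avoids writing the intermediate SAM file to disk.
    pipeline = f"{align} | {sort}"
    # Indexing the sorted BAM is a common follow-up step.
    index = f"samtools index {q(out_bam)}"
    return [pipeline, index]


if __name__ == "__main__":
    for cmd in build_bbs_commands(
        "ref.fa", "sample_1.fastq.gz", "sample_2.fastq.gz", "sample.sorted.bam"
    ):
        print(cmd)
```

On a single machine both tools are limited to the cores of that machine, which is the bottleneck the paragraph above describes.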

GLAD is a genome analysis system that uses the resources of many compute nodes in a cluster, in a private or public cloud, to significantly accelerate genome analysis steps such as generating a sorted BAM file from FASTQ files, which this white paper calls BWA-BAM-Sort, or BBS. Because GLAD has a flexible programming interface, software such as BWA and Samtools can run on it without modification.

This white paper describes performance results for BBS computation on GLAD in clusters of compute nodes in private or public clouds. Even with a cluster of moderate size, the BBS step can be shortened from hours to minutes.

Read the white paper: Super-Fast Genome BWA-Bam-Sort on GLAD (draft).