TY - JOUR
T1 - SRmapper: A Fast and Sensitive Genome-Hashing Alignment Tool
AU - Gontarz, Paul Michael
AU - Berger, Jennifer
AU - Wong, Chung F.
N1 - With the advent of next-generation sequencing (NGS) instruments, the amount of raw genetic sequence information has increased exponentially during the past few years, and it is expected to continue to grow at a high rate as sequencing costs continue to decrease.
PY - 2013/2/1
Y1 - 2013/2/1
N2 - Modern sequencing instruments can produce millions of short reads every day. The large number of reads produced, in conjunction with variations between reads and reference genomic sequences caused both by legitimate differences, such as single-nucleotide polymorphisms and insertions/deletions (indels), and by sequencer errors, makes alignment a difficult and computationally expensive task, and many reads cannot be aligned. Here, we introduce a new alignment tool, SRmapper, which in tests using real data can align tens of billions of base pairs from short reads to the human genome per computer processor day. SRmapper tolerates a higher number of mismatches than current programs based on the Burrows–Wheeler transform and finds about the same number of alignments 2–8× faster, depending on read length (with larger gains for longer reads). The current version of SRmapper aligns both single and paired-end reads in base-space FASTQ format and outputs alignments in Sequence Alignment/Map format. SRmapper uses a probabilistic approach to set the default number of mismatches allowed and to determine alignment quality. SRmapper’s memory footprint (∼2.5 GB) is small enough that it can be run on a computer with 4 GB of random access memory for a genome the size of the human genome. Finally, SRmapper is designed so that its function can be extended to finding small indels as well as long deletions and chromosomal translocations in future versions.
UR - https://academic.oup.com/bioinformatics/article/29/3/316/257957
U2 - 10.1093/bioinformatics/bts712
DO - 10.1093/bioinformatics/bts712
M3 - Article
VL - 29
JO - Bioinformatics
JF - Bioinformatics
ER -