SRmapper: A Fast and Sensitive Genome-Hashing Alignment Tool

Paul Michael Gontarz, Jennifer Berger, Chung F. Wong

Research output: Contribution to journal › Article › peer-review

Abstract

Modern sequencing instruments can produce millions of short reads every day. The large number of reads, together with differences between reads and reference genomic sequences caused both by legitimate variation, such as single-nucleotide polymorphisms and insertions/deletions (indels), and by sequencer errors, makes alignment a difficult and computationally expensive task, and many reads cannot be aligned. Here, we introduce a new alignment tool, SRmapper, which in tests using real data can align tens of billions of base pairs from short reads to the human genome per computer processor day. SRmapper tolerates a higher number of mismatches than current programs based on the Burrows–Wheeler transform and finds about the same number of alignments in 2–8× less time depending on read length (with a higher performance gain for longer reads). The current version of SRmapper aligns both single and paired-end reads in base-space FASTQ format and outputs alignments in Sequence Alignment/Map (SAM) format. SRmapper uses a probabilistic approach to set the default number of mismatches allowed and to determine alignment quality. SRmapper’s memory footprint (∼2.5 GB) is small enough that it can be run on a computer with 4 GB of random access memory for a genome the size of the human genome. Finally, SRmapper is designed so that its function can be extended in future versions to finding small indels as well as long deletions and chromosomal translocations.
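As a rough illustration of what "genome hashing" generally means, the sketch below builds a k-mer hash index of a reference and uses it to seed and verify ungapped read placements under a mismatch budget. It is not SRmapper's actual algorithm, which the abstract does not describe; the function names, the seed length k, and the fixed max_mismatches parameter are all assumptions made for illustration (SRmapper is said to choose its default mismatch limit probabilistically).

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict:
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def count_mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, genome: str, index: dict, k: int, max_mismatches: int):
    """Return sorted (start, mismatches) pairs for ungapped placements of the read.

    Hypothetical seed-and-verify scheme: non-overlapping k-mers of the read
    are looked up in the index, and each candidate start is verified by
    counting mismatches over the full read length.
    """
    seen = set()
    hits = []
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(genome) or start in seen:
                continue
            seen.add(start)
            mismatches = count_mismatches(read, genome[start:start + len(read)])
            if mismatches <= max_mismatches:
                hits.append((start, mismatches))
    return sorted(hits)


# Toy usage: a 12-base read copied from position 8 with one substituted base.
genome = "ACGTACGTTAGCCGATAGGCTTAACG"
index = build_index(genome, k=4)
read = "TAGCCTATAGGC"
placements = align_read(read, genome, index, k=4, max_mismatches=2)
```

With non-overlapping seeds, a read split into n seeds is guaranteed to contain at least one exact seed whenever it has fewer than n mismatches, which is why a hash lookup followed by verification can tolerate mismatches at all. A real aligner would also handle reverse complements, paired-end constraints, and a far more compact index than a Python dictionary.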
Original language: American English
Journal: Bioinformatics
Volume: 29
State: Published - Feb 1 2013

Disciplines

  • Computer Sciences
  • Bioinformatics
