Advancements in Oligonucleotide Mapping to Genome: SeqMap and GNUMAP Techniques

SeqMap: mapping massive amount of oligonucleotides to the genome
Hui Jiang et al. Bioinformatics (2008) 24: 2395-2396

The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing
Nathan Clement et al. Bioinformatics (2010) 26: 38-45

Presented by: Xia Li

Short-read mapping software

SeqMap
• Motivation
  • Hashing the genome usually requires a large amount of memory (e.g. SOAP needs 14 GB of memory when mapping to the human genome)
  • Allow more substitutions and insertions/deletions

SeqMap
• Pigeonhole principle
• Spaced seed alignment
• ELAND, SOAP, RMAP
• Hash reads
• Insertion/deletion: for each combination of 2 of the 4 parts, 1 of the 2 parts is shifted one nucleotide to its left or right
[Figure: a short read is split into 4 parts; all combinations of 2 of the 4 parts index a short-read lookup table (indexed by 2 parts), which is matched against the reference genome. Image credit: J. Ruan]
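The pigeonhole idea on this slide can be illustrated with a minimal Python sketch. It is not SeqMap's implementation: the function names (`part_bounds`, `build_read_index`, `map_reads`) are hypothetical, only substitutions are handled (the shifted-part trick for insertions/deletions is left out), and the genome is scanned naively. The point is only that a read with at most 2 mismatches, split into 4 parts, must have at least 2 parts that match exactly, so indexing the reads by every 2-of-4 combination cannot miss such a hit.

```python
from collections import defaultdict
from itertools import combinations


def part_bounds(length, n_parts=4):
    """Return (start, end) bounds splitting `length` into n_parts near-equal pieces."""
    size, extra = divmod(length, n_parts)
    bounds, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def build_read_index(reads, n_parts=4, n_exact=2):
    """Hash every read under each combination of n_exact of its n_parts parts."""
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        bounds = part_bounds(len(read), n_parts)
        for combo in combinations(range(n_parts), n_exact):
            key = (len(read), combo,
                   tuple(read[bounds[i][0]:bounds[i][1]] for i in combo))
            index[key].append(rid)
    return index


def map_reads(genome, reads, max_mismatches=2, n_parts=4):
    """Return sorted (read_id, genome_position, mismatches) hits, substitutions only."""
    n_exact = n_parts - max_mismatches  # pigeonhole: this many parts must be error-free
    if n_exact < 1:
        raise ValueError("max_mismatches must be smaller than n_parts")
    index = build_read_index(reads, n_parts, n_exact)
    combos = list(combinations(range(n_parts), n_exact))
    hits = set()
    for length in {len(r) for r in reads}:
        bounds = part_bounds(length, n_parts)
        for pos in range(len(genome) - length + 1):
            window = genome[pos:pos + length]
            for combo in combos:
                key = (length, combo,
                       tuple(window[bounds[i][0]:bounds[i][1]] for i in combo))
                for rid in index.get(key, ()):
                    # Verify the candidate by counting mismatches over the full read.
                    mism = sum(a != b for a, b in zip(reads[rid], window))
                    if mism <= max_mismatches:
                        hits.add((rid, pos, mism))
    return sorted(hits)
```

Hashing the reads rather than the genome, as in the sketch, keeps the lookup table proportional to the number of reads, which is the memory motivation stated earlier.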

Experiment & Result

Experiment & Result
• Handles more substitutions and insertions/deletions
Test data: a randomly generated DNA sequence of length 1 Mb, with 100 Kb of random substitutions, N's, and insertions/deletions added
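A test set of this kind could be generated roughly as follows. This is a sketch under stated assumptions, not the procedure used in the SeqMap paper: the helper names `random_dna` and `mutate` are hypothetical, and the equal mix of the four edit types is an assumption, since the slide does not give the proportions.

```python
import random


def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq, n_events, rng):
    """Apply n_events random edits: substitution, N, insertion, or deletion.

    Each edit type is picked with equal probability (an assumption).
    """
    bases = list(seq)
    for _ in range(n_events):
        pos = rng.randrange(len(bases))
        kind = rng.choice(["sub", "N", "ins", "del"])
        if kind == "sub":
            # Replace with a base different from the current one.
            bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
        elif kind == "N":
            bases[pos] = "N"
        elif kind == "ins":
            bases.insert(pos, rng.choice("ACGT"))
        elif len(bases) > 1:
            del bases[pos]
    return "".join(bases)


rng = random.Random(0)                 # fixed seed so the data set is reproducible
reference = random_dna(10_000, rng)    # scaled down from the 1 Mb used on the slide
mutated = mutate(reference, 1_000, rng)
```

The sizes are scaled down because list insertions and deletions make this naive version slow at the full 1 Mb scale.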

GNUMAP
• Motivation
  • Base uncertainty
    • Such as nearly equal or low probabilities for A, C, G or T
    • Filter low-quality reads [RMAP] -> discards up to half of the reads (Harismendy et al., 2009)
  • Repeated regions in the genome
    • Discard them -> loss of up to half of the data (Harismendy et al., 2009)
    • Record one -> unequal mapping to some of the repeat regions
    • Record all -> each location having 3 times the correct score

GNUMAP • Flow-chart

Probabilistic Needleman-Wunsch

Alignment Score
Read from sequencer: GGGTACAACCATTAC
Reference genome: ACTGAACCATACGGGTACTGAACCATGAA (matching segments shown on the slide: AACCAT, GGGTAC, AACCAT)
The read is added to both repeat regions in proportion to their match quality, weighted by its number of occurrences in the genome.
Slide credit: N. Clement
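The proportional assignment described on this slide can be sketched as a simple normalization. This is an illustration only, not GNUMAP's actual scoring: the function name `allocate_read` is hypothetical, and the alignment scores are assumed to be non-negative, likelihood-like values with one entry per candidate genome location (so a repeat that occurs twice contributes two entries).

```python
def allocate_read(scores):
    """Split one read across candidate locations in proportion to their scores.

    `scores` maps genome position -> non-negative alignment score.
    Returns a dict of position -> fractional weight; the weights sum to 1.
    """
    total = sum(scores.values())
    if total <= 0:
        return {}  # no usable alignment, so the read contributes nothing
    return {pos: score / total for pos, score in scores.items()}


# Two equally good repeat copies each receive half of the read.
print(allocate_read({4: 2.0, 20: 2.0}))
# A better match receives a proportionally larger share.
print(allocate_read({4: 3.0, 20: 1.0}))
```

Because the weights for a read always sum to 1, a read that falls in a repeat is neither discarded nor counted in full at every copy, which addresses the "discard", "record one" and "record all" problems listed in the motivation.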

Experiment & Result

Comments
• SeqMap
  • Pros: handles more substitutions and insertions/deletions
  • Cons: memory-consuming, not fast
• GNUMAP
  • Pros: considers base quality and repeated regions -> generates more useful information and achieves the best performance (~15% increase)
  • Cons: memory-consuming, slow, more noise
