Alignment Problem
Optimal pairwise alignment conceptually considers all possible alignments of two sequences and chooses the one with the best score.
Sub-optimal (heuristic) alignment algorithms, e.g. BLAST, are also very important.
Key issues: types of alignments (local vs. global).
Global versus Local Alignments
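To make the global/local distinction concrete, here is a minimal sketch that fills both dynamic-programming tables at once: a Needleman-Wunsch style table for the global score and a Smith-Waterman style table for the local score. The function name `align_scores` and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not values given in these notes.

```python
def align_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b.

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    # G holds global (end-to-end) scores, L holds local scores.
    G = [[0] * (m + 1) for _ in range(n + 1)]
    L = [[0] * (m + 1) for _ in range(n + 1)]

    # Global alignment must pay for leading gaps; local alignment does not.
    for i in range(1, n + 1):
        G[i][0] = i * gap
    for j in range(1, m + 1):
        G[0][j] = j * gap

    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            G[i][j] = max(G[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          G[i - 1][j] + gap,     # gap in b
                          G[i][j - 1] + gap)     # gap in a
            # Local alignment may restart anywhere, hence the floor at 0.
            L[i][j] = max(0,
                          L[i - 1][j - 1] + s,
                          L[i - 1][j] + gap,
                          L[i][j - 1] + gap)
            best_local = max(best_local, L[i][j])

    return G[n][m], best_local


print(align_scores("TTTACGTAAA", "ACGT"))  # (-8, 4)
```

The short sequence matches a region of the long one perfectly, so the local score is high, while the global score is dragged down by the six gaps needed to span the full length.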
Sanger (1977) introduced chain-termination sequencing.
Main idea: Obtain fragments of all possible lengths, ending in A, C, T, G.
Gel electrophoresis separates the fragments by length; reading off the terminating base of each fragment in order of length gives the sequence.
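The idea can be sketched as a toy simulation: record, for each base, the lengths of the fragments that terminate in that base (one "lane" per base), then sort all fragments by length to recover the sequence. The function names are hypothetical, and the sketch simplifies by treating the input string as the synthesized strand and ignoring the chemistry entirely.

```python
def terminated_fragment_lengths(strand):
    """For each base, list the lengths of fragments ending in that base.

    Simplifying assumption: `strand` is the newly synthesized strand, and
    termination happens once at every position.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by ordering fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = terminated_fragment_lengths("GATTACA")
print(lanes["A"])       # [2, 5, 7]
print(read_gel(lanes))  # GATTACA
```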
A single read covers ~500 bp with about 98.5% accuracy.
Sequencing machines are limited to reads of about 500-750 bp, so the DNA must be broken into short and long fragments, each of which is read from both ends.
Reads are then assembled into contigs, then scaffolds.
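As a rough illustration of how overlapping reads can be merged into a contig, the following sketch repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a simple greedy heuristic chosen for illustration, not the method used by any particular assembler; `overlap`, `greedy_assemble`, and the `min_len` threshold are hypothetical names and values, and sequencing errors, reverse complements, and repeats are ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; return remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # no overlaps left: these are the contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))
# ['ATTAGACCTGCCGGAA']
```

Reads that share no sufficiently long overlap stay separate, which is how gaps between contigs arise in this toy model.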
Celera used 300 sequencing machines in parallel to obtain 175,000 reads per day.
Efforts were combined, resulting in 8x coverage of the human genome; consensus sequence is 2.91 billion base pairs.
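Coverage is commonly defined as (number of reads × read length) / genome length. The sketch below inverts that formula to give a back-of-the-envelope read count. The function name `reads_needed` is hypothetical, and the read length of 500 bp is an assumption taken from the ~500 bp figure quoted above; the result is an order-of-magnitude estimate, not a number reported in these notes.

```python
import math


def reads_needed(genome_len, coverage, read_len):
    """Number of reads so that reads * read_len / genome_len >= coverage."""
    return math.ceil(coverage * genome_len / read_len)


# 2.91 billion bp at 8x coverage with an assumed 500 bp per read.
print(reads_needed(2_910_000_000, 8, 500))  # 46560000
```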
The Big Picture assembly.
Suppose a hybridization assay could tell us which fragments of length k (k-mers) are present in our sequence.
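In an idealized, error-free setting, the result of such an assay is the set of all length-k substrings of the sequence. A minimal sketch, with `spectrum` as a hypothetical helper name:

```python
def spectrum(seq, k):
    """Sorted set of all length-k substrings of seq (presence only, no counts)."""
    return sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})


print(spectrum("ATGGCGT", 3))  # ['ATG', 'CGT', 'GCG', 'GGC', 'TGG']
```

Because the assay reports only presence, a k-mer that occurs several times in the sequence appears once in the set, which is one reason repeats make reconstruction ambiguous.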
Commercial products: Affymetrix GeneChip, Agilent, Amersham, etc.
Theorem (Euler 1736): A graph has a path visiting every edge exactly once if and only if it is connected and has 2 or fewer vertices of odd degree.
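One way to connect the theorem to sequence reconstruction is to treat each k-mer as a directed edge from its (k-1)-prefix to its (k-1)-suffix and then look for a path that uses every edge exactly once. The sketch below uses Hierholzer's algorithm for that search. Note the assumptions: the graph here is directed (so the existence condition is stated in terms of in- and out-degrees rather than odd degree), an Eulerian path is assumed to exist, and the names `eulerian_path` and `reconstruct` are hypothetical.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Hierholzer's algorithm on a directed multigraph of (u, v) pairs.

    Assumes an Eulerian path exists; no validation is performed.
    """
    out = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for u, v in edges:
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1

    # Start at the vertex with one surplus outgoing edge, if there is one.
    start = next((v for v, b in balance.items() if b == 1), edges[0][0])

    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: vertex is finished
    return path[::-1]


def reconstruct(kmers):
    """Spell the sequence along an Eulerian path through the k-mer graph."""
    edges = [(kmer[:-1], kmer[1:]) for kmer in kmers]
    path = eulerian_path(edges)
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct(["ATG", "CGT", "GCG", "GGC", "TGG"]))  # ATGGCGT
```

When the graph admits more than one Eulerian path, this returns just one of them, so the reconstructed sequence is not guaranteed to be unique.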