Select One • Sequencing Program • Re-organize read data • Incorporate oligomer statistics • Epigenetic study • Obtain epigenetic data • Protein Structure • Finding internally symmetric structure in proteins
Sequencing • Newer Illumina reads are getting longer • 150 to 300 bp per read • The De Bruijn approach may not be suitable for longer reads • It segments each read into multiple k-mers, sliding by one base along the read • K=27 • Information linking k-mers back to their reads may be lost • Brute-force OLC (overlap-layout-consensus): the layout step is a Hamiltonian path problem, which is NP-complete
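The k-mer segmentation on the slide above can be sketched in a few lines of Python. This is only an illustration, not the course's reference code: the function name `kmers` is hypothetical, and K=27 is taken from the slide. A read of length L yields L - K + 1 k-mers, and once they are pooled nothing records which read each one came from.

```python
def kmers(read, k=27):
    """Return every k-mer of a read, sliding the window by one base."""
    # A read shorter than k produces an empty list.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


if __name__ == "__main__":
    # Toy example with a small k so the result is easy to inspect.
    print(kmers("ACGTACGTAC", k=4))
    # Number of 27-mers in a 150 bp read and in a 300 bp read.
    print(len(kmers("A" * 150)), len(kmers("A" * 300)))
```

Longer reads produce more k-mers per read, so more read-level context is discarded when only the k-mers are kept.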
Sequencing 2 • OLC with data partition • Front (F) bins: reads with identical first k-mers • Rear (R) bins: reads with identical last k-mers • K=15, 4^15 = 2^30 ≈ 1 billion bins × 2 (front and rear) • Join F and R bins with the same k-mers • Produce consensus contigs where bp matches exceed a threshold • Repeat, generating new F & R bins • Each time dealing with longer contigs • Per-bucket sequencing
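One possible reading of the binning and joining steps above, as a minimal Python sketch. The names `build_bins`, `candidate_joins`, and `merge` are hypothetical, reads are assumed to be plain strings, and the consensus and match-threshold step is deliberately left out. Dictionaries are used so that only non-empty bins among the 4^15 possible keys take up memory.

```python
from collections import defaultdict

K = 15  # k-mer length from the slide; 4**15 == 2**30 possible keys per bin type


def build_bins(reads, k=K):
    """Group read indices by their first k-mer (front) and last k-mer (rear)."""
    front = defaultdict(list)  # first k-mer -> indices of reads starting with it
    rear = defaultdict(list)   # last k-mer  -> indices of reads ending with it
    for idx, read in enumerate(reads):
        if len(read) >= k:
            front[read[:k]].append(idx)
            rear[read[-k:]].append(idx)
    return front, rear


def candidate_joins(front, rear):
    """Yield (left, right) pairs where left's last k-mer equals right's first k-mer."""
    for kmer, lefts in rear.items():
        for left in lefts:
            for right in front.get(kmer, ()):
                if left != right:
                    yield left, right


def merge(left_read, right_read, k=K):
    """Join two reads that overlap by exactly k bases (no mismatch handling)."""
    return left_read + right_read[k:]


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGGA", "GGATTT"]  # toy reads, k=3 for readability
    f_bins, r_bins = build_bins(reads, k=3)
    for left, right in candidate_joins(f_bins, r_bins):
        print(left, right, merge(reads[left], reads[right], k=3))
```

Repeating the procedure would mean feeding the merged contigs back into `build_bins`, so that each round works with longer sequences.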
Sequencing 3 • Is this a valid approach? • Analyze time and space: any advantage vs. conventional OLC? • Implementation in any language • 2 yeast chromosomes • When reads from both are mixed, can they be separated? • 16 yeast chromosomes • Each contig > 100 Kbp should have the same oligomer % distribution as the genome • Delete contigs not matching the oligomer distribution
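The oligomer filter above could look roughly like the following Python sketch. Several choices here are assumptions rather than anything stated on the slide: the oligomer length `n=3`, the L1 distance between frequency vectors, the cutoff `max_dist`, and the function names. Sequences are assumed to contain only uppercase A, C, G, T. Contigs at or below the 100 Kbp limit are kept untested, since the slide only constrains longer ones.

```python
from collections import Counter
from itertools import product


def oligomer_freqs(seq, n=3):
    """Return the fraction of each possible n-mer (over ACGT) in seq, in fixed order."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    keys = ["".join(p) for p in product("ACGT", repeat=n)]
    return [counts[key] / total if total else 0.0 for key in keys]


def l1_distance(p, q):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def filter_contigs(contigs, genome, n=3, min_len=100_000, max_dist=0.1):
    """Drop contigs longer than min_len whose oligomer profile is far from the genome's."""
    ref = oligomer_freqs(genome, n)
    kept = []
    for contig in contigs:
        if len(contig) <= min_len:
            kept.append(contig)  # too short to test; keep as-is
        elif l1_distance(oligomer_freqs(contig, n), ref) <= max_dist:
            kept.append(contig)
    return kept


if __name__ == "__main__":
    genome = "ACGT" * 100
    contigs = ["ACGT" * 50, "A" * 400, "AAAA"]
    # min_len is lowered only so the toy data exercises the filter.
    print(filter_contigs(contigs, genome, min_len=100))
```

A suitable `max_dist` would have to be chosen empirically, for example by measuring how much genuine 100 Kbp windows of the genome vary from the genome-wide profile.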
Epigenetic Study • www.cs.uml.edu/~kim/epi_disease.pdf • Epigenetic evidence in cancer, Alzheimer's disease, neurological disorders, obesity, etc. • Select one chapter on human diseases • Obtain epigenetic data • Reproduce the results
Protein Structure • www.cs.uml.edu/~kim/symD_article.pdf • The symD.pdf paper describes the algorithm for finding symmetric structure in a protein and its results. • The program that performed the analysis is available at www.cs.uml.edu/~kim/580/symd1.0.tar • Use the symmD.zip program (in C) and reproduce the results in the paper • If you want to extend the problem to finding ‘similar’ (not identical) structures, describe which part of the symd10.tar program needs to be modified and how.
Schedule • Next class (4/7) • Hand in • Which topic you will work on • 4/28 (Mon) • 10-min presentation • No fancy PowerPoint slides • Due on 5/5 (Mon)