Applying AI to Human Genome

Applying AI to Human Genome. Part 1: Collecting data. Prof. M. Embrechts, Robert Bress, Bram Heyns. Overview: Basics of DNA; Collecting the data; Collection: my application; Perl; Goal. Basics of DNA: DNA = polymer of 4 molecules (bases or nucleotides).

Presentation Transcript


  1. Applying AI to Human Genome Part 1 : Collecting data Prof. M. Embrechts Robert Bress Bram Heyns

  2. Overview • Basics of DNA • Collecting the data • Collection : my application • Perl • Goal

  3. Basics of DNA • DNA = polymer of 4 molecules: bases or nucleotides • A = Adenine, C = Cytosine, G = Guanine, T = Thymine • Replication (copying) and translation (reading) => double helix: A pairs with T, G pairs with C (copying) • 3-letter combination = codon • RNA: U = Uracil in place of T => transcription • Protein = polymer composed of 20 amino acids (reading) => more complex structure than DNA
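
The pairing, transcription, and codon rules on this slide can be sketched in a few lines. Python is used here only for illustration (the slides themselves use Perl), and the function names are hypothetical. The sketch ignores strand direction, so `complement` pairs bases position by position rather than producing a reverse complement.

```python
# Base-pairing rule from the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(dna):
    """Return the base-by-base complementary strand (no reversal)."""
    return "".join(PAIRS[base] for base in dna)

def transcribe(dna):
    """Write a DNA sequence in the RNA alphabet: U replaces T."""
    return dna.replace("T", "U")

def codons(seq):
    """Split a sequence into consecutive 3-letter codons, dropping an incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

strand = "ATGGCCTTAA"          # made-up example sequence
print(complement(strand))      # TACCGGAATT
print(transcribe(strand))      # AUGGCCUUAA
print(codons(strand))          # ['ATG', 'GCC', 'TTA']
```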

  4. Transition DNA -> RNA -> Protein

  5. Intron – Exon – Splice junction • Exon: about 200 characters; intron: thousands • 30,000 genes identified out of a possible 100,000 • Identification gene patent
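
Splicing can be pictured as cutting the introns out and joining the exons, with the splice junctions being the boundaries between the two. A minimal sketch, assuming the exon positions are already known as (start, end) pairs. The function names, sequence, and coordinates are made up for illustration, and real exons and introns are far longer, as the slide notes.

```python
def splice(sequence, exons):
    """Join the exon segments of a sequence, dropping everything in between.

    exons is a list of (start, end) pairs, 0-based with end exclusive,
    assumed sorted and non-overlapping.
    """
    return "".join(sequence[start:end] for start, end in exons)

def splice_junctions(exons):
    """Return the positions where an exon borders an intron."""
    junctions = []
    for i, (start, end) in enumerate(exons):
        if i > 0:
            junctions.append(start)   # intron -> exon boundary
        if i < len(exons) - 1:
            junctions.append(end)     # exon -> intron boundary
    return junctions

# Toy gene: exons in upper case, introns in lower case (made-up data).
gene = "ATGGCC" + "gtaagtttag" + "TTCGAA" + "gtccag" + "TAA"
exons = [(0, 6), (16, 22), (28, 31)]
print(splice(gene, exons))        # ATGGCCTTCGAATAA
print(splice_junctions(exons))    # [6, 16, 22, 28]
```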

  6. Summary • Human: 23 pairs of chromosomes • Chromosome: thousands of genes • Gene: info = exons, comments = introns • Exons and introns: codons • Codon: 3 bases

  7. Data collection • Human Genome Project • NCBI website: http://www.ncbi.nlm.nih.gov • Entrez-Nucleotide.htm • NCBI Sequence Viewer.htm

  8. Data collection • Human Genome Project • NCBI website: http://www.ncbi.nlm.nih.gov • Entrez-Nucleotide.htm • NCBI Sequence Viewer.htm
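
Besides the web pages listed above, NCBI also offers the E-utilities web service for scripted downloads; this is not something the slides mention, and it is shown only as one possible route. The sketch below builds an `efetch` request URL and parses FASTA text, without contacting the server. The helper names are hypothetical, the accession string is a placeholder rather than a real record, and the FASTA text is made up.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession):
    """Build an efetch URL asking for one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:])
    return header, sequence

# Placeholder accession: substitute a real identifier before use.
print(efetch_url("ACCESSION_PLACEHOLDER"))

# Made-up FASTA record standing in for a downloaded file.
sample = """>demo_record made-up example
ATGGCCTTCG
AATAA
"""
print(parse_fasta(sample))   # ('demo_record made-up example', 'ATGGCCTTCGAATAA')
```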

  9. Datacollection : my application

  10. Datacollection : my application

  11. Perl • Practical Extraction and Report Language • POD – files -> web • Portability • Free – CPAN modules • String manipulation • Extremely powerful regex engine • Glue language designed for short and simple tasks, which does not mean a lack of power or “serious” features • Tutorial: http://www.netcat.co.uk/rob/perl/win32perltut.html

  12. Regular Expression – Pattern Matching • Practical Extraction and Report Language • Scan through data and extract useful information • m/PATTERN/ s/PATTERN/REPLACEMENT/ • 1 line Perl = 100 lines C or Java • Complex, but easy
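
The two operators on this slide, match (`m/PATTERN/`) and substitute (`s/PATTERN/REPLACEMENT/`), have close counterparts in other languages. A minimal sketch using Python's `re` module as a stand-in for Perl; the input line is made up and only loosely shaped like a sequence record header.

```python
import re

record = "LOCUS  demo_seq  24 bp  DNA"   # made-up input line

# Counterpart of m/(\d+) bp/ : test for a match and capture part of it.
match = re.search(r"(\d+) bp", record)
length = int(match.group(1)) if match else None
print(length)       # 24

# Counterpart of s/\s+/ /g : re.sub replaces every occurrence by default.
cleaned = re.sub(r"\s+", " ", record)
print(cleaned)      # LOCUS demo_seq 24 bp DNA

# Counterpart of s/\s+/ / (no g modifier): limit to the first occurrence.
first_only = re.sub(r"\s+", " ", record, count=1)
print(first_only)   # LOCUS demo_seq  24 bp  DNA
```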

  13. Regex examples • /[KCZ]arl^sa/ • /<I>(.*?)<\/I>/i • $1, $2, … • i , g , c , … • . , * , + , ? • /([0-9a-zA-Z])+/ or /([\w])+/ • s/us[^a-z]/them/g or s/us\W/them/g • /((?:acc|act)(?:ttt|ttc|att))/ • TIMTOWTDI
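
Some of these patterns can be tried out directly. The sketch below again uses Python's `re` module as a stand-in for Perl and runs on made-up strings. The codon pattern is written with grouped alternation, because square brackets denote a character class that matches a single character, not one of several three-letter words.

```python
import re

# Non-greedy capture between tags, case-insensitive (the /i modifier).
html = "<i>Homo sapiens</i> and <I>Mus musculus</I>"
names = re.findall(r"<I>(.*?)</I>", html, flags=re.IGNORECASE)
print(names)        # ['Homo sapiens', 'Mus musculus']

# Global substitution; note that the non-word character after "us" is consumed too.
print(re.sub(r"us\W", "them", "give us! call us."))   # give them call them

# \w is slightly wider than [0-9a-zA-Z]: it also matches the underscore.
print(re.fullmatch(r"[0-9a-zA-Z]+", "exon_1") is None)   # True
print(re.fullmatch(r"\w+", "exon_1") is not None)        # True

# Alternation between whole codons needs parentheses.
dna = "ggacctttaaactattcc"
pairs = re.findall(r"(acc|act)(ttt|ttc|att)", dna)
print(pairs)        # [('acc', 'ttt'), ('act', 'att')]

# The bracketed form is a pair of character classes and accepts just two characters.
print(re.fullmatch(r"[acc|act][ttt|ttc|att]", "ct") is not None)   # True
```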

  14. Part 2: Applying AI • Our choice: evolutionary computing • First part: identify the exon parts • Second part: identify splice junctions • Third part: combine previous parts • Hope to reach over 90% accuracy

  15. Questions ?