Gene finding

Gene Finding - PowerPoint PPT Presentation




Presentation Transcript

Gene Finding

Changhui (Charles) Yan

Gene Finding

  • Genomes of many organisms have been sequenced

Genome

Completely Sequenced Genomes

Gene Finding

  • More than 60 eukaryotic genome sequencing projects are underway

Human Genome Project (HGP)

  • To determine the sequences of the 3 billion bases that make up human DNA

    • 99% of the human DNA sequence finished to 99.99% accuracy (April 2003)

  • To identify the approximately 100,000 genes in human DNA (the estimate was revised to 20,000-25,000 by October 2004)

    • 15,000 full-length human genes identified (March 2003)

  • To store this information in databases

  • To develop tools for data analysis

Gene Finding

  • Genomes of many organisms have been sequenced

  • We need to decipher the raw sequences

    • Where are the genes?

    • What do they encode?

    • How are the genes regulated?

Gene Finding

  • Homology-based methods, also called 'extrinsic methods'

    • It seems that only approximately half of the genes can be found by homology to other known genes (although this percentage is of course increasing as more genomes get sequenced).

  • Gene prediction methods, or 'intrinsic methods'

    • (http://www.nslij-genetics.org/gene/)

Machine Learning Approach

  • Split data into a training set and a test set

  • Use the training set to train a classifier

  • Test the classifier on the test set

  • The classifier can then be applied to novel data

Figure: training data -> Machine Learning algorithm -> classifier; test data -> evaluation of classifier; novel data -> prediction
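
The train / test / apply workflow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular gene finder: the toy sequences, the single GC-content feature, and the helper names `train_threshold`, `predict` and `accuracy` are all invented for the example.

```python
import random

def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)

def train_threshold(examples):
    """Train a one-feature classifier (hypothetical): the threshold is the
    midpoint between the mean GC content of class 1 and class 0."""
    pos = [gc_content(s) for s, y in examples if y == 1]
    neg = [gc_content(s) for s, y in examples if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Also remember on which side of the threshold class 1 lies.
    return (mean_pos + mean_neg) / 2, mean_pos >= mean_neg

def predict(model, seq):
    """Return 1 or 0 for a sequence, using the trained threshold."""
    threshold, pos_is_high = model
    is_high = gc_content(seq) >= threshold
    return 1 if is_high == pos_is_high else 0

def accuracy(model, examples):
    """Fraction of labelled examples the model classifies correctly."""
    correct = sum(predict(model, s) == y for s, y in examples)
    return correct / len(examples)

# Made-up labelled data: (sequence, class). Not taken from the slides.
data = [
    ("gcgccgcggcgc", 1), ("ggccgcgcgcca", 1),
    ("cgcggcgccgga", 1), ("gccgcgggccgc", 1),
    ("atatattaatta", 0), ("ttaatatataat", 0),
    ("aatattttaata", 0), ("tatataaattat", 0),
]

# 1. Split the data into a training set and a test set.
random.seed(0)
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# 2. Train on the training set; 3. evaluate on the held-out test set.
model = train_threshold(train)
print("test accuracy:", accuracy(model, test))

# 4. Apply the classifier to novel, unlabelled data.
print("novel prediction:", predict(model, "gcgcgcatgcgc"))
```

The test set is kept out of training so that the evaluation reflects how the classifier behaves on sequences it has not seen.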

Data, examples, classes, classifier

ccgctttttgccagcataacggtgtcga, 1

accacgttttttgccagcatttgccagca, 0

atcatcacgatcacgaacatcaccacg, 0
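
Each line above is one example: a sequence followed by its class label. A minimal sketch of how such lines might be read and turned into numeric features is given below. The slide does not say what classes 1 and 0 stand for or which features were used; the 3-mer counts and the names `parse_examples` and `kmer_counts` are assumptions made for illustration.

```python
from collections import Counter

# The three labelled examples shown on the slide.
raw = """\
ccgctttttgccagcataacggtgtcga, 1
accacgttttttgccagcatttgccagca, 0
atcatcacgatcacgaacatcaccacg, 0
"""

def parse_examples(text):
    """Turn 'sequence, label' lines into (sequence, int label) pairs."""
    examples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        seq, label = line.split(",")
        examples.append((seq.strip(), int(label)))
    return examples

def kmer_counts(seq, k=3):
    """Count overlapping k-mers; one possible feature representation."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

examples = parse_examples(raw)
for seq, label in examples:
    # Print the class, the sequence length and the three most common 3-mers.
    print(label, len(seq), kmer_counts(seq).most_common(3))
```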

N-fold cross-validation

Figure: 3-fold cross-validation; E. coli K12 genome, 4,639,675 bases
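
The idea of N-fold cross-validation can be sketched as follows: the examples are divided into N folds, and each fold serves once as the test set while the remaining folds form the training set. The slide's figure is not preserved, so how the genome was actually partitioned is unknown; the helper `n_fold_splits` and the placeholder integer "examples" are assumptions.

```python
def n_fold_splits(examples, n):
    """Yield n (train, test) pairs; every example is tested exactly once."""
    # Deal the examples into n folds in round-robin order.
    folds = [examples[i::n] for i in range(n)]
    for i in range(n):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

# Placeholder examples; in practice these would be labelled sequences.
examples = list(range(9))
for train, test in n_fold_splits(examples, 3):
    print("train:", train, "test:", test)
```

Averaging the evaluation over the N rounds uses every example for testing without ever testing on data the classifier was trained on in that round.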

Machine Learning Approach

Figure: training data -> Machine Learning algorithm -> classifier; test data -> evaluation of classifier; novel data -> prediction

Gene-finders

Prokaryotes vs. Eukaryotes

  • Prokaryotes are organisms without a cell nucleus.

    • Most prokaryotes are bacteria.

    • Prokaryotes can be divided into Bacteria and Archaea (Archaebacteria).

  • Eukaryotes are organisms with a membrane-bound nucleus.

Prokaryotes vs. Eukaryotes

  • Prokaryotes’ genomes are relatively simple: coding regions (genes) vs. non-coding regions.

  • Eukaryotes’ genomes are complicated.
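
To make the coding vs. non-coding distinction concrete, one simple way to propose candidate coding regions in a prokaryotic sequence is to scan for open reading frames (ORFs). The slides do not describe a specific algorithm, so this is a sketch under assumptions: only the forward strand is scanned, ORFs start at `atg` and end at the first in-frame stop codon, and `find_orfs` and its `min_codons` parameter are hypothetical names.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    An ORF starts at 'atg' and runs to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop, including the start codon.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Made-up sequence with one long ORF and one very short one.
seq = "ccatgaaaccctttgggtaaccatgtagcc"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

A scan like this only proposes candidates. It would not be sufficient for eukaryotic genes, where coding sequence is interrupted by introns.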

Eukaryotic genes
