Presentation Transcript
gene prediction

roderic guigó i serra

IMIM/UPF/CRG

number of genes in chromosome 22

  • initial annotation 545 Dunham et al., 1999
  • genscan+RT-PCR 590 Das et al., 2001
  • genscan+microarrays 730 Shoemaker et al., 2001
  • reviewed annotation 726 chr22 team, Sanger, 2001
  • mouse shotgun data +20 (our data)
  • geneid predictions 794
  • genscan predictions 1128
number of genes in human genome

  • Consortium 30,000-40,000 2001
  • Celera 27,000-38,000 2001
  • Consortium+Celera 50,000 Hogenesch et al., 2001
  • DB searches 65,000-75,000 Wright et al., 2001
  • Human Genome Sciences 90,000-120,000 Haseltine, 2001
decoding the genome

the human genome sequence

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAA
AACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

the amino acid sequence of the proteins

QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP
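Going from genomic sequence to protein sequence ends with reading the coding sequence codon by codon. A minimal sketch of that last step, assuming the input is an already spliced coding sequence on the forward strand and using the standard genetic code (NCBI translation table 1); ambiguous codons become X and stops are written as *. It is not meant to reproduce how the protein on this slide was obtained.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0; trailing partial codons are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


print(translate("ATGGCCTGGTAA"))  # MAW*
```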

gene structure

[Figure: gene structure diagram; labels: introns, promoter, exons, downstream regulatory element, upstream regulatory element]
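The gene model in the diagram (exons separated by introns) can be captured with a very small data structure. A minimal sketch, assuming forward-strand, 1-based inclusive exon coordinates; the helper names are hypothetical. Introns are the gaps between consecutive exons, the mature transcript is the concatenation of the exons, and the splice-site check uses the GT...AG dinucleotides found at the boundaries of most introns. Promoter and regulatory elements are not modeled.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) on the forward strand


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive (sorted) exons."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def splice(seq: str, exons: List[Interval]) -> str:
    """Concatenate the exon sequences in order, removing the introns."""
    return "".join(seq[start - 1:end] for start, end in sorted(exons))


def has_canonical_splice_sites(seq: str, intron: Interval) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    start, end = intron
    return seq[start - 1:start + 1] == "GT" and seq[end - 2:end] == "AG"


seq = "ATGAAA" + "GTAAGTTTAG" + "CCCTAA"  # exon 1, intron, exon 2
exons = [(1, 6), (17, 22)]
print(introns_from_exons(exons))                    # [(7, 16)]
print(splice(seq, exons))                           # ATGAAACCCTAA
print(has_canonical_splice_sites(seq, (7, 16)))     # True
```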
comparative gene prediction
  • rosetta (Batzoglou et al., 2000)
  • cem (Bafna and Huson, 2000)
  • sgp1 (Wiehe et al., 2000)
  • twinscan (Korf et al., 2001)
  • slam (Pachter et al., 2001)
  • sgp2 (Guigó et al., in preparation)
syntenic gene prediction (sgp2)

[Figure: sgp2 schematic; tracks: query sequence, tblastx HSPs, HSP projections, geneid exons, SGP exons]
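The schematic shows tblastx HSPs projected onto the query sequence and geneid exons turned into SGP exons. A minimal sketch of that rescoring idea under simplifying assumptions that do not come from the slide: each HSP carries a uniform per-base score, overlapping HSPs are projected by keeping the highest score at each query position, and every candidate exon gains the projected score it covers, scaled by a hypothetical `weight`. The actual sgp2 scoring and geneid's gene-assembly step are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exon:
    start: int    # 1-based, inclusive, on the query sequence
    end: int
    score: float  # ab initio (geneid-style) exon score


@dataclass
class HSP:
    start: int             # projected start on the query sequence
    end: int               # projected end on the query sequence
    score_per_base: float  # assumption: HSP score spread evenly over its length


def project_hsps(hsps: List[HSP], length: int) -> List[float]:
    """Collapse overlapping HSPs into one per-base profile, keeping the best score per base."""
    profile = [0.0] * (length + 1)  # index 0 unused so positions stay 1-based
    for h in hsps:
        for pos in range(h.start, h.end + 1):
            profile[pos] = max(profile[pos], h.score_per_base)
    return profile


def rescore_exons(exons: List[Exon], hsps: List[HSP], length: int,
                  weight: float = 1.0) -> List[Exon]:
    """Add the weighted projected HSP score covered by each exon to its original score."""
    profile = project_hsps(hsps, length)
    return [Exon(e.start, e.end, e.score + weight * sum(profile[e.start:e.end + 1]))
            for e in exons]


exons = [Exon(10, 20, 1.0), Exon(50, 60, -2.0)]
hsps = [HSP(15, 30, 0.5), HSP(18, 22, 1.0)]
print([e.score for e in rescore_exons(exons, hsps, length=100)])  # [5.5, -2.0]
```

Keeping the maximum per base, rather than summing, avoids counting the same conserved region twice when several HSPs overlap it.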
predicting “novel” genes in the human genome

[Figure: prediction pipeline; labels: golden path annotations, additional blastn matches to ENSEMBL + REFSEQ, tblastx, geneid exons, sgp genes]

Golden Path, Oct 7, 2000 freeze (RepeatMasked)

TraceDB, as of February 2001

“novel” genes?
  • 48,890 genic regions (known genes or similar)
  • 15,489 genes longer than 100 aa predicted by sgp
  • 13,302 non redundant predictions
  • 8,416 supported by tblastx hits to mouse 1.5
  • 3,331 predicted genes with at least two exons supported by tblastx hits
  • + 719 predicted genes supported by tblastx hits covering at least 75% of the prediction

4,050 supported sgp predictions

25% of them not overlapping genscan predictions
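The final support rule combines two criteria: at least two exons with tblastx hits, or tblastx hits covering at least 75% of the prediction, applied to predictions longer than 100 aa. The counts are consistent with an "either criterion" rule: 3,331 + 719 = 4,050. A minimal sketch of that filter; `Prediction` and its fields are hypothetical, and the redundancy-removal step and the tblastx search itself are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    name: str
    length_aa: int               # length of the predicted protein
    exon_supported: List[bool]   # per exon: overlapped by a tblastx hit to mouse?
    tblastx_coverage: float      # fraction of the prediction covered by tblastx hits


def is_supported(p: Prediction, min_exons: int = 2, min_coverage: float = 0.75) -> bool:
    """Supported if >= min_exons exons have hits, or hits cover >= min_coverage of it."""
    return sum(p.exon_supported) >= min_exons or p.tblastx_coverage >= min_coverage


def supported_predictions(preds: List[Prediction], min_len_aa: int = 100) -> List[Prediction]:
    """Keep predictions longer than min_len_aa that pass the support test."""
    return [p for p in preds if p.length_aa > min_len_aa and is_supported(p)]


preds = [
    Prediction("a", 150, [True, True, False], 0.40),  # two supported exons
    Prediction("b", 200, [True], 0.80),               # single exon, 80% covered
    Prediction("c", 80, [True, True], 1.00),          # too short
    Prediction("d", 300, [False, True], 0.10),        # weak support
]
print([p.name for p in supported_predictions(preds)])  # ['a', 'b']
```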

human genome vs. mouse traceDB

[Figure: panels for chr22 and chr21]

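human genome vs. mouse assemblies

              SN    SP    CC    SNe   SPe   SNSP  ME    WE
chr22.assem.  0.87  0.65  0.75  0.69  0.54  0.62  0.14  0.33
chr22.shot.   0.82  0.66  0.72  0.63  0.54  0.58  0.20  0.31

SN, SP: nucleotide-level sensitivity and specificity; CC: correlation coefficient; SNe, SPe: exon-level sensitivity and specificity; SNSP: average of SNe and SPe; ME: missing exons; WE: wrong exons.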
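The columns are the usual gene-prediction accuracy measures. A minimal sketch of how they can be computed for one sequence, assuming real and predicted exons are given as non-empty lists of unique, 1-based inclusive (start, end) intervals on the forward strand; CC is computed here as the standard correlation coefficient between real and predicted coding nucleotides, and all helper names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive (start, end)


def coding_bases(exons: List[Exon]) -> Set[int]:
    """All nucleotide positions covered by a list of exons."""
    bases: Set[int] = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def overlaps(a: Exon, b: Exon) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def accuracy(real: List[Exon], pred: List[Exon], seq_len: int) -> Dict[str, float]:
    r, p = coding_bases(real), coding_bases(pred)
    # Nucleotide level: true/false positives and negatives over the whole sequence.
    tp, fp, fn = len(r & p), len(p - r), len(r - p)
    tn = seq_len - tp - fp - fn
    sn = tp / (tp + fn)
    sp = tp / (tp + fp)  # gene-finding convention: fraction of predicted bases that are real
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    # Exon level: an exon is correct only if both of its boundaries match exactly.
    exact = set(real) & set(pred)
    sne, spe = len(exact) / len(real), len(exact) / len(pred)
    # Missing exons: real exons overlapped by no prediction; wrong exons: the reverse.
    me = sum(not any(overlaps(x, y) for y in pred) for x in real) / len(real)
    we = sum(not any(overlaps(y, x) for x in real) for y in pred) / len(pred)
    return {"SN": sn, "SP": sp, "CC": cc, "SNe": sne, "SPe": spe,
            "SNSP": (sne + spe) / 2, "ME": me, "WE": we}


scores = accuracy(real=[(1, 10), (21, 30)], pred=[(1, 10), (25, 34), (51, 60)], seq_len=100)
print({k: round(v, 2) for k, v in scores.items()})
```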

testing novel predictions experimentally

81 predictions in total.

For 40 of them, pairs of adjacent exons were selected for RT-PCR.
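A common reason for placing RT-PCR primers in two adjacent predicted exons is that the product then spans the intron between them: spliced cDNA gives a shorter product than contaminating genomic DNA, so a band of the cDNA size would support the predicted exon-exon junction. The slide does not give the primer design; this minimal sketch only computes the two expected product sizes from hypothetical coordinates, assuming forward-strand, 1-based inclusive positions.

```python
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def expected_amplicons(exon_a: Interval, exon_b: Interval,
                       fwd_start: int, rev_end: int) -> Tuple[int, int]:
    """Return (cDNA, genomic) product sizes for primers in two adjacent exons.

    exon_a lies upstream of exon_b; fwd_start is the first base of the forward
    primer (inside exon_a) and rev_end the last base matched by the reverse
    primer (inside exon_b).
    """
    (a_start, a_end), (b_start, b_end) = exon_a, exon_b
    if a_end >= b_start:
        raise ValueError("exon_a must end before exon_b starts")
    if not (a_start <= fwd_start <= a_end and b_start <= rev_end <= b_end):
        raise ValueError("primers must fall inside their exons")
    genomic = rev_end - fwd_start + 1
    intron = b_start - a_end - 1  # removed by splicing in the cDNA template
    return genomic - intron, genomic


print(expected_amplicons((100, 200), (1001, 1100), fwd_start=150, rev_end=1050))  # (101, 901)
```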

acknowledgments

IMIM-UPF-CRG, Barcelona

  • Josep F. Abril
  • Genís Parra
  • Roderic Guigó

GlaxoSmithKline, King of Prussia

  • Pankaj Agarwal

Max Planck Institute for Chemical Ecology, Jena

  • Thomas Wiehe

Whitehead Institute/MIT Center for Genome Research, Cambridge

  • Gwen Acton
  • Dan Brown
  • Kerstin

Mouse Sequence Consortium