
Discovery and Characterization of protein-coding genes in D. melanogaster





Discovery and Characterization of protein-coding genes in D. melanogaster

Mark Yandell
HHMI
Berkeley Drosophila Genome Project



We have just completed a large-scale genome-wide search for additional protein-coding genes.

What we found:

Most fly protein-coding genes have been at least provisionally identified.

Looking for new protein-coding genes means searching very large collections of predictions for only a few additional genes.

Doing so in a coordinated and cost-effective manner is essential.

Validation & Coordination



Where would missing genes lie?

[Figure: genome schematic with alternating genic and intergenic regions]

~50% of the genome is intergenic (61,971,014 bp)



Distribution of intergenic lengths

[Plot: distribution expected if genes were distributed randomly within the genome]



Distribution of intergenic lengths

[Plot: distribution expected if genes were distributed randomly within the genome, compared with the actual distribution]

26,346,479 bp
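The null model named on this slide, genes placed at random along the genome, can be sketched as a small simulation: drop cut points uniformly at random into the intergenic DNA and look at the resulting gap lengths. In the sketch below the 61,971,014 bp total comes from the earlier slide, while the gap count of 14,000 is an illustrative assumption and not a figure from the talk.

```python
import random
import statistics

INTERGENIC_BP = 61_971_014   # total intergenic DNA, as stated in the talk
N_GAPS = 14_000              # ASSUMED number of intergenic gaps (illustrative only)

def random_intergenic_lengths(total_bp, n_gaps, seed=0):
    """Split total_bp into n_gaps pieces at uniformly random cut points."""
    rng = random.Random(seed)
    cuts = sorted(rng.randrange(total_bp + 1) for _ in range(n_gaps - 1))
    bounds = [0] + cuts + [total_bp]
    return [right - left for left, right in zip(bounds, bounds[1:])]

lengths = random_intergenic_lengths(INTERGENIC_BP, N_GAPS)
mean_len = statistics.mean(lengths)
median_len = statistics.median(lengths)
print(f"mean gap:   {mean_len:,.0f} bp")
print(f"median gap: {median_len:,.0f} bp")
```

Under this model the gap lengths are approximately exponentially distributed, so the median falls well below the mean. An observed distribution with an excess of very short or very long intergenic regions would depart from that shape.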



~62 megabases of DNA

Genscan run on every intergenic region: 11,671 predictions

1,167 non-overlapping FgenesH predictions (V. Solovyev)

1,266 ‘new’ genes from Hild et al.

159 control annotations

14,263 new gene predictions in total

How many are real?
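As a quick check, the four sources listed on this slide do add up to the stated total. The dictionary keys below are shortened labels chosen for this sketch; the counts are the ones given on the slide.

```python
# Counts as given on the slide; the labels are shortened for this sketch.
predictions = {
    "Genscan, intergenic regions": 11_671,
    "FgenesH, non-overlapping": 1_167,
    "Hild et al. 'new' genes": 1_266,
    "control annotations": 159,
}

total = sum(predictions.values())

# Print each source with its share of the combined prediction set.
for source, count in predictions.items():
    print(f"{source:<30}{count:>7,}  ({count / total:.1%})")
print(f"{'total':<30}{total:>7,}")
```

Genscan alone contributes roughly four fifths of the combined set, so its behaviour on intergenic sequence dominates the question of how many predictions are real.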



~62 megabases of DNA

14,263 new gene predictions

Standardized validation procedure



Validation procedure

1. Pool mRNA from 6 different stages

2. Reverse transcription (RVT) with a T15 tagged primer

3. PCR with exon-specific primers

4. Sequence the PCR product

5. Realign to the genome

6. Examine in the genome browser

[Figure: genome browser view showing the PCR product aligned against the gene prediction]



~62 megabases of DNA

14,263 new gene predictions

Sub-categorization seemed advisable.

Homology seemed a logical criterion.



Split the gene models into 5 different sets

1. ‘One or none set’ (D. p. genome): 293 (9,276), 2%

2. ‘Two or more set’ (D. p. genome): 339 (339), 9%

3. ‘Splice junction conserved set’ (AG/GT, D. p. genome): 207 (207), 7%

4. ‘Heidelberg set’ (‘new’ genes from Hild et al.): 196 (1,266), 34%

5. ‘Control set’ (Platinum annotations): 159, 96%

Why are there so many predictions & so few genes?

About 800 protein-coding genes remain to be identified*.

~95%* of all fly protein-coding genes have at least provisional annotations.



A negative control

1. Random sequence generator, A=T=G=C=0.25

2. AATGCGGATTTGCGGGATTAGGCGTTGAAAAAAAAAGATTCG~

3. Genscan, CpG island finder

4. Examine results
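A minimal sketch of this negative control, under stated assumptions: it generates sequence with A=T=G=C=0.25 and applies the commonly cited Gardiner-Garden and Frommer thresholds (GC content above 50%, observed/expected CpG ratio above 0.6, 200 bp windows). That window test is a stand-in; it is not necessarily the CpG island finder used in the talk, and Genscan is not reproduced here. All function names are made up for the example.

```python
import random

def random_dna(length, seed=0):
    """Uniform random sequence: A=T=G=C=0.25."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def looks_like_cpg_island(window, min_gc=0.5, min_obs_exp=0.6):
    """Apply GC-content and observed/expected CpG thresholds to one window."""
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return False
    gc_fraction = (c + g) / len(window)
    # Observed CpG count relative to the count expected from C and G frequencies.
    obs_exp = window.count("CG") * len(window) / (c * g)
    return gc_fraction > min_gc and obs_exp > min_obs_exp

def fraction_passing(seq, window=200):
    """Fraction of non-overlapping windows that meet both thresholds."""
    windows = [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    return sum(looks_like_cpg_island(w) for w in windows) / len(windows)

seq = random_dna(200_000)
frac = fraction_passing(seq)
print(f"{frac:.0%} of 200 bp windows pass the CpG-island thresholds")
```

Because uniform random sequence sits right at 50% GC with an observed/expected CpG ratio near 1, a large share of windows clears both thresholds even though the sequence has no biological content.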



Random DNA contains genes and CpG islands…

[Figure: CpG island and Genscan prediction tracks on random sequence]

Thus we argue that an abundance of predictions is not in itself evidence for missed genes.



As far as Genscan is concerned, D. melanogaster intergenic regions look like random DNA.

It now appears that much of the genome is transcribed.

We believe that in many cases spurious predictions overlap transcribed regions simply by chance.

This fact means that validation methodology is a real issue.



  • Confirmation of expression is not confirmation of existence.

  • At the very least, show that it's spliced or, failing that, discrete.

  • Determining the true structure of the transcriptome is the next logical step for annotation.
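The "show that it's spliced" criterion can be sketched as a check on how a sequenced PCR product realigns to the genome: if consecutive aligned blocks are separated by an intron-sized genomic gap, the transcript is spliced. The function names, the coordinate convention, and the 40 bp minimum intron length are assumptions made for this illustration, not details from the talk.

```python
MIN_INTRON_BP = 40  # ASSUMED minimum intron length, not a value from the talk

def find_introns(blocks, min_intron=MIN_INTRON_BP):
    """Return genomic gaps between consecutive aligned blocks.

    blocks: (start, end) genome coordinates, 0-based half-open, one pair per
    aligned segment of a single sequenced PCR product.
    Only gaps of at least min_intron bp are reported.
    """
    ordered = sorted(blocks)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end >= min_intron:
            introns.append((prev_end, next_start))
    return introns

def is_spliced(blocks, min_intron=MIN_INTRON_BP):
    """True if the alignment implies at least one intron."""
    return len(find_introns(blocks, min_intron)) > 0

print(is_spliced([(1000, 1180), (1850, 2020)]))  # two blocks, 670 bp gap -> True
print(is_spliced([(1000, 1400)]))                # single block -> False
```

A fuller check would also look for GT/AG consensus at the gap boundaries; the sketch only tests for the gap itself.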

For protein-coding genes:

  • accurate annotation of each protein-coding gene’s intron-exon structure

  • accurate annotation of every alternate transcript.

  • extend in-situ information to individual alternative transcripts.



Conclusions

We have just completed a large-scale genome-wide search for additional protein-coding genes.

What we conclude:

Most fly protein-coding genes have been at least provisionally identified.

Looking for new protein-coding genes means searching very large collections of predictions for only a few additional genes.

-- Finding more will require new or retrained gene-finders, casting a wider net.

-- This will make for even larger collections of predictions.

Regarding protein-coding genes, the real issue is ‘finalizing’ provisional annotations. This will be a computationally and experimentally complex task!

Doing so in a coordinated and cost-effective manner is essential.

Ditto for annotation ‘finalization’ and non-coding RNA genes.



Why doing this responsibly will require a common software infrastructure

[Diagram labels: group A, gene-finder 1; group B, gene-finder 2; group C, gene-finder 3; wet-lab; primers; validation results; gff3]

Coordination and standardization will be key!

  • of validation procedures

  • of data exchange formats

  • Some centralized coordination
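To illustrate what exchanging predictions and validation results as gff3 might look like, here is a minimal sketch that formats one nine-column GFF3 feature line. The example values and the lowercase `validation` attribute are hypothetical; the talk does not specify how validation status was encoded.

```python
# Characters that GFF3 requires to be percent-encoded inside attribute values.
GFF3_ESCAPES = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
    ",": "%2C", "\t": "%09", "\n": "%0A",
}

def escape(value):
    """Percent-encode reserved characters in an attribute value."""
    return "".join(GFF3_ESCAPES.get(ch, ch) for ch in str(value))

def gff3_line(seqid, source, feature_type, start, end, strand,
              attributes, score=".", phase="."):
    """Format one GFF3 feature line (coordinates are 1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("GFF3 requires 1 <= start <= end")
    attr_text = ";".join(f"{key}={escape(val)}" for key, val in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               str(score), strand, str(phase), attr_text]
    return "\t".join(columns)

# Hypothetical prediction on chromosome arm 2L with a made-up validation tag.
record = gff3_line(
    "2L", "genscan", "gene", 10500, 12250, "+",
    {"ID": "pred0001", "validation": "spliced; sequenced"},
)
print("##gff-version 3")
print(record)
```

Keeping the validation outcome in the attributes column means that any group's gene-finder output and any wet-lab result can travel in the same file format, which is the kind of standardization the slide argues for.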



Acknowledgements

Sima Misra

Adina Bailey

Colin Wiel

ShengQiang Shu

Joe Carlson

Martha Evans-Holm

Pavel Tomancak

Sue Celniker

Suzi Lewis

Gerald M. Rubin

