Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence



PowerPoint Slideshow about 'Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence' - kellsie


Presentation Transcript
slide3
For Bioinformatics, start with:

Genomics:

READING genome sequences

ASSEMBLY of the sequence

ANNOTATION of the sequence

carry out dideoxy sequencing

connect sequences to make whole chromosomes

find the genes!

slide4
The Human Genome

E. coli Genome

Reading:

DNA target sample

SHEAR

Reads

LIGATE & CLONE

Primer

SEQUENCE

Vector

Shotgun DNA Sequencing of whole genome (WGS)
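
The shear-and-sequence idea can be mimicked in a few lines. This is only a minimal sketch: it samples random fixed-length substrings of a made-up genome and ignores cloning, vectors, strand orientation and sequencing errors. The function name `shotgun_reads` and all parameter values are illustrative assumptions, not part of the slides.

```python
import random

def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Sample n_reads random substrings of length read_len (forward strand only)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Each "fragment end" starts at a random position, like a random shear point.
        start = rng.randrange(0, len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads

# Toy target: a random 1,000 bp "genome" (hypothetical data).
_rng = random.Random(1)
toy_genome = "".join(_rng.choice("ACGT") for _ in range(1000))
toy_reads = shotgun_reads(toy_genome, n_reads=50, read_len=100)
```

Because the start positions are random, some regions are read many times and others not at all, which is why real projects sequence to several-fold coverage.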
slide7
Assembly:

The challenge of eukaryotic genomes

E. coli Genome

4 million bp

The Human Genome

3 billion bp

50% of genome is repeat sequences!
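
To see why repeats make assembly hard, it helps to look at a toy assembler. The sketch below is a naive greedy overlap-merge, assuming error-free reads from one strand; it is not the method used by any real genome project, and the names `overlap` and `greedy_assemble` are made up for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps: what is left are separate contigs
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"])
```

When a repeat is longer than a read, reads from different copies of the repeat are identical, so the "longest overlap" no longer identifies the true neighbour. With about half of the human genome in repeats, that ambiguity is the core difficulty.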

slide8
Assembly of sequence of

each chromosome from end to end

END, Jan 14 begin

slide9
Annotation:

Genomics:

READING genome sequences

ASSEMBLY of the sequence

ANNOTATION of the sequence

Robotically do dideoxy-dye data collection

Whole genome shotgun OR Ordered clones

find the genes!

slide11
Annotation:

10/1/5

Genomics:

READING genome sequences

ASSEMBLY of the sequence

ANNOTATION of the sequence

find the genes!

  • ab initio
  • by evidence
slide12
Annotation:

For Bacterial genomes, ab initio is adequate

ab initio: “from the beginning”

"something from nothing" (Hebrew: יש מאין)

from first principles…

ORFs make up MOST of a prokaryotic genome

slide13
Annotation:

ab initio – finding ORFs

  • 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced.
    • Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp.

So first, to find genes in prokaryotic DNA, search for ORFs!!
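
A search for ORFs can be sketched as a scan of all six reading frames for an ATG followed in-frame by a stop codon. This is a minimal illustration, assuming the standard stop codons and ATG as the only start; real gene finders also consider alternative starts and other signals. The function names and the `min_codons` parameter are assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons):
    """Yield (start, end) of ATG...stop ORFs in the three frames of one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens the ORF
            elif start is not None and codon in STOPS:
                # Count codons from the ATG up to, but not including, the stop.
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None

def find_orfs(seq, min_codons=60):
    """Return (strand, start, end, sequence); '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = [("+", s, e, seq[s:e]) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    found += [("-", s, e, rc[s:e]) for s, e in orfs_in_strand(rc, min_codons)]
    return found
```

For example, `find_orfs(genome_sequence, min_codons=60)` would list candidate genes at least 60 codons long; shorter thresholds give many more chance ORFs.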

slide18
Annotation:

ab initio – beyond ORFs

beyond ORFs:

  • Prokaryotes have short, simple promoters that are easy to recognize.
  • Transcriptional terminators often consist of short inverted repeats followed by a run of Ts.
  • Therefore, programs that find prokaryotic genes search for:
    • ORFs 60 or more codons long, and codon usage
    • Promoters at the 5' end
    • Terminators at the 3' end
    • Homology to known genes from other prokaryotes
    • Shine-Dalgarno sequences
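
One of the signals in the list, the Shine-Dalgarno sequence, can be checked with a short scan upstream of a candidate start codon. The sketch below assumes the commonly cited consensus AGGAGG; the spacing window (4 to 14 nt between motif and ATG) and the one-mismatch tolerance are illustrative assumptions, not values from the slides.

```python
def shine_dalgarno_hits(seq, atg_pos, motif="AGGAGG",
                        min_gap=4, max_gap=14, max_mismatches=1):
    """Return (position, mismatches) for motif-like sites upstream of atg_pos."""
    hits = []
    m = len(motif)
    for gap in range(min_gap, max_gap + 1):
        pos = atg_pos - gap - m  # motif start if `gap` bases separate it from the ATG
        if pos < 0:
            break
        site = seq[pos:pos + m]
        mismatches = sum(1 for x, y in zip(site, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

An ORF whose start codon has such a site just upstream is a more convincing gene candidate than an ORF without one; promoters and terminators could be scored in a similar way.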
slide19
Annotation:

ab initio – automated

Prokaryotic gene finder examples

Glimmer: Interpolated Markov Model method

GRAIL II: Neural Network method

(See BioInfo text – Fig 8.8)
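
To give a feel for the Markov-model approach, here is a deliberately simplified sketch: a fixed-order Markov chain trained on coding and non-coding examples, used to score a sequence by log-odds. It is not Glimmer's actual algorithm (an interpolated model combines several orders and works per reading frame); the function names, the order of 2 and the pseudocount are assumptions for illustration.

```python
import math
from collections import defaultdict

def train_markov(seqs, order=2, pseudocount=1.0):
    """Return a function prob(context, base) estimated from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        c = counts.get(context, {})
        total = sum(c.values()) + 4 * pseudocount  # 4 possible bases
        return (c.get(base, 0.0) + pseudocount) / total

    return prob

def log_odds(seq, coding, noncoding, order=2):
    """Positive score: seq looks more like the coding training set."""
    return sum(
        math.log(coding(seq[i - order:i], seq[i]) / noncoding(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )
```

In practice the coding model would be trained on long, confident ORFs from the genome being annotated, and each candidate ORF would then be scored against it.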

slide22
Annotation:

Multicellular eukaryotes

Done too 10/1/5

slide25
Annotation:

2 ways to annotate eukaryotic genomes:

-ab initio gene finders:

Work on basic biological principles:

Open reading frames

Codon usage

Consensus splice sites

Met start codons

…..

-Genes based on previous knowledge: EVIDENCE

-cDNA sequence of the gene's message

-cDNA sequence of a closely related gene's message

-Protein sequence of the known gene

The same gene's protein

The same gene's protein from another species

A related gene's protein…
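
As one concrete example of an ab initio signal, consensus splice sites can be scanned for directly: most introns begin with GT and end with AG. The sketch below only enumerates GT...AG spans within a length range; it is a toy, since real gene finders weigh the surrounding consensus bases, reading frame and codon usage. The function name and the length limits are assumptions.

```python
def candidate_introns(seq, min_len=20, max_len=10000):
    """List (start, end) spans that begin with GT and end with AG."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # Store the position just past the AG so that seq[start:end] is the intron.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors
            if min_len <= a - d <= max_len]
```

Because GT and AG occur by chance very often, this produces far more candidates than real introns, which is one reason evidence such as cDNA sequence is so valuable for eukaryotic annotation.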

slide26
start and stop site predictions

Unique identifiers

Splice site predictions

Homology based exon predictions

computational exon predictions

Tracking information

Consensus gene

structure (both strands)

slide27
Automatically

generated

annotation

slide28
A zebrafish hit shows a gene model protein encoded by a 6 exon gene.

This gene structure (intron/exon) is seen in other species, as is the protein size.

The proteins, if corresponding to MSP in S. gal., must be heavily glycosylated (likely).

At least some have a signal peptide.

slide31
Genomics:

READING genome sequences

ASSEMBLY of the sequence

ANNOTATION of the sequence

carry out dideoxy sequencing (700 bp per read, maximum)

connect seqs. to make whole chromosomes
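
The read-length limit translates directly into how many reads a project needs. A rough sketch, assuming reads land uniformly at random (the Lander-Waterman idealization): the number of reads for coverage c is c × G / L, and the expected fraction of the genome left unsequenced is about e^(-c). The 8-fold coverage used below is an illustrative assumption; the genome sizes and the 700 bp read length come from the slides.

```python
import math

def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)

def fraction_unsequenced(coverage):
    """Expected fraction of bases hit by no read, for uniformly random reads."""
    return math.exp(-coverage)

ecoli_reads = reads_needed(4_000_000, 700, coverage=8)       # E. coli, 4 million bp
human_reads = reads_needed(3_000_000_000, 700, coverage=8)   # human, 3 billion bp
```

Since 700 bp is a maximum, real reads are shorter on average and the true read counts would be higher than these estimates.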

slide34
Annotation:

End Reads (Mates)

Primer

SEQUENCE

cDNAs &

ESTs:

Expressed Sequence Tags

RNA target sample

cDNA Library

Each cDNA provides sequence from the two ends – two ESTs
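
The "two ESTs per cDNA" idea can be written out as follows. This is a minimal sketch assuming error-free single reads from each end of the clone; the 3' read is taken from the opposite strand, so it is the reverse complement of the end of the cDNA. The function name `est_pair` and the default read length of 500 bp are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def est_pair(cdna, read_len=500):
    """Return (5' EST, 3' EST) read inward from the two ends of a cDNA clone."""
    five_prime = cdna[:read_len]
    # The 3' read is primed from the other end, so it reads the opposite strand.
    three_prime = reverse_complement(cdna[-read_len:])
    return five_prime, three_prime
```

Aligning both ESTs of a pair back to the genome marks the two ends of a transcribed region, which is direct evidence for a gene even when the middle of the cDNA is never sequenced.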

slide37
Who Gets Sequenced?

Models

Pathogens

Agriculturals

slide47
Protein Structure Database

See Swiss-pdb viewer

slide54
RNAi for every C. elegans gene too!

-results on the web

Projects to systematically knock out (or pseudo-knockout) every gene, in order to establish the phenotype of each gene

-> the function of each gene
