
Slide 1 - PowerPoint PPT Presentation



PowerPoint Slideshow about 'Slide 1' - kellsie


Presentation Transcript

For Bioinformatics, start with:

Genomics:

READING genome sequences: carry out dideoxy sequencing

ASSEMBLY of the sequence: connect sequences to make whole chromosomes

ANNOTATION of the sequence: find the genes!


The Human Genome

E. coli Genome



Reading:

[Figure: DNA target sample -> SHEAR -> LIGATE & CLONE into vector -> SEQUENCE from primer -> reads]

Shotgun DNA Sequencing of whole genome (WGS)
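
The shearing step can be pictured with a small simulation. This is only a sketch: the function name `shear`, the random toy genome, and the read length are illustrative assumptions, and real reads also carry sequencing errors and come from both strands.

```python
import random

def shear(genome: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Sample n_reads random fragments of length read_len from genome."""
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    # Each read starts at a uniformly random position (hypothetical, error-free model).
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

# Toy 200 bp "genome" made of random bases (assumption for illustration only).
genome = "".join(random.Random(1).choice("ACGT") for _ in range(200))
reads = shear(genome, n_reads=30, read_len=25)
print(len(reads), "reads of", len(reads[0]), "bp")
```

Because the start positions are random, the reads overlap one another, and those overlaps are what assembly uses later.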



Assembly:

The challenge of eukaryotic genomes

E. coli Genome: 4 million bp

The Human Genome: 3 billion bp

50% of the human genome is repeat sequences!


Assembly of sequence of each chromosome from end to end
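
A toy version of assembly is to merge the pair of reads with the longest suffix-prefix overlap, over and over. The sketch below is a simple greedy method, not the algorithm used by any production assembler; the names `overlap` and `greedy_assemble` and the `min_len` cutoff are assumptions for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the best-overlapping pair until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)

reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

When a repeat is longer than a read, several reads share the same overlap and the greedy choice becomes ambiguous, which is why repeat-rich eukaryotic genomes are much harder to assemble than E. coli.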

END, Jan 14 begin


Annotation:

Genomics:

READING genome sequences: robotically do dideoxy-dye data collection

ASSEMBLY of the sequence: whole genome shotgun OR ordered clones

ANNOTATION of the sequence: find the genes!


Annotation:

10/1/5

Genomics:

READING genome sequences

ASSEMBLY of the sequence

ANNOTATION of the sequence: find the genes!

  • ab initio

  • by evidence


Annotation:

For Bacterial genomes, ab initio is adequate

ab initio: “from the beginning”

“something from nothing” (Hebrew: יש מאין)

from first principles…

ORFs (open reading frames) make up MOST of a prokaryotic genome


Annotation:

ab initio – finding ORFs

  • 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced.

    • Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp.
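
The E. coli figures can be checked with quick arithmetic. The genome length used here (4,639,221 bp, the published E. coli K-12 MG1655 sequence) is an assumption brought in for the check; the slides only give the rounded "4 million bp".

```python
GENES = 4288            # from the slide
MEAN_CODING_BP = 950    # from the slide
MEAN_GAP_BP = 118       # from the slide
GENOME_BP = 4_639_221   # assumption: E. coli K-12 MG1655, not stated in the slides

coding_bp = GENES * MEAN_CODING_BP
coding_fraction = coding_bp / GENOME_BP
# Genome size implied by the averages alone: each gene plus one intergenic gap.
implied_genome_bp = GENES * (MEAN_CODING_BP + MEAN_GAP_BP)

print(coding_bp)                  # 4073600
print(round(coding_fraction, 3))  # 0.878
print(implied_genome_bp)          # 4579584
```

Both numbers agree with the slide: about 88% of the genome is coding, and the averages imply a genome of roughly 4.6 million bp.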

So first, to find genes in prokaryotic DNA, search for ORFs!!
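
A minimal ORF search over all six reading frames might look like the sketch below. It assumes ATG as the only start codon and the three standard stop codons; the function name `find_orfs` and the returned tuple layout are illustrative choices, and real gene finders also handle alternative starts such as GTG.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 60) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, orf) for each ATG...stop ORF.

    min_codons counts codons before the stop. Coordinates on the "-" strand
    refer to the reverse-complemented sequence, not the original.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return results

toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))
```

The default of 60 codons follows the cutoff quoted later in this lecture for prokaryotic gene finders; the toy call lowers it only so that a short example sequence produces a hit.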




Annotation:

ab initio – beyond ORFs

  • Prokaryotes have short, simple promoters that are easy to recognize.

  • Transcriptional terminators often consist of short inverted repeats followed by a run of Ts.

  • Therefore, programs that find prokaryotic genes search for:

    • ORFs 60 or more codons long, and codon usage

    • Promoters at the 5' end

    • Terminators at the 3' end

    • Homology to known genes from other prokaryotes

    • Shine-Dalgarno sequences
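
The terminator signal described above (an inverted repeat followed by a run of Ts) can be searched for with a simple scan. This is a rough sketch: the stem length, loop range, and T-run length are arbitrary illustrative parameters, and the function name `find_terminators` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_terminators(seq: str, stem: int = 5, max_loop: int = 6, min_ts: int = 4):
    """Return (stem_start, t_run_start) pairs for candidate terminators.

    A candidate is a stem-length sequence, a loop of 3..max_loop bases, the
    reverse complement of the stem, and then at least min_ts T's.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - 3 - min_ts + 1):
        left = seq[i:i + stem]
        for loop in range(3, max_loop + 1):
            j = i + stem + loop          # where the second half of the stem starts
            right = seq[j:j + stem]
            tail = j + stem              # where the run of T's must start
            if right == reverse_complement(left) and seq[tail:tail + min_ts] == "T" * min_ts:
                hits.append((i, tail))
    return hits

toy = "CC" + "GCCGC" + "TTAA" + "GCGGC" + "TTTTTT" + "GA"
print(find_terminators(toy))  # [(2, 16)]
```

Exact matching is used only to keep the example short; real stems tolerate mismatches and vary in length.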


Annotation:

ab initio – automated

Prokaryotic gene finder examples

Glimmer: Interpolated Markov Model method

GRAIL II: Neural Network method

(See BioInfo text – Fig 8.8)
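
To give a feel for the Markov-model idea, the sketch below trains two fixed-order Markov chains, one on "coding" and one on "background" sequence, and scores a candidate by log-odds. This is a deliberate simplification and not Glimmer's interpolated model, which blends several orders; the function names, the order of 2, the pseudocount, and the training strings are all made up for illustration.

```python
import math
from collections import defaultdict

def train_markov(seqs, order=2, pseudocount=1.0):
    """Return a function log_prob(context, base) estimated from seqs."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def log_prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + 4 * pseudocount  # 4 possible bases
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob

def log_odds(seq, coding_lp, background_lp, order=2):
    """Positive scores mean seq looks more like the coding training data."""
    return sum(
        coding_lp(seq[i - order:i], seq[i]) - background_lp(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )

# Made-up training data, purely to make the example run.
coding_lp = train_markov(["ATGGCGGCGGCGGCG" * 3])
background_lp = train_markov(["ATATATATATATATAT" * 3])

print(log_odds("GCGGCGGCG", coding_lp, background_lp) > 0)  # True
print(log_odds("ATATATATA", coding_lp, background_lp) < 0)  # True
```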


Annotation:

results


Annotation:

Multicellular eukaryotes

Done too 10/1/5




Annotation:

2 ways to annotate eukaryotic genomes:

  • ab initio gene finders, which work on basic biological principles:

    • Open reading frames

    • Codon usage

    • Consensus splice sites

    • Met start codons

    • …

  • Genes based on previous knowledge: EVIDENCE

    • cDNA sequence of the gene’s message

    • cDNA sequence of a closely related gene’s message

    • Protein sequence of the known gene: the same gene’s protein, the same gene’s protein from another species, or a related gene’s protein…
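
As one example of an ab initio signal, candidate introns can be listed by pairing the canonical GT (donor) and AG (acceptor) dinucleotides. This is only a sketch of the consensus splice site idea: the length limits and the name `candidate_introns` are assumptions, and real gene finders score the surrounding bases instead of matching two letters.

```python
def candidate_introns(seq: str, min_len: int = 20, max_len: int = 200):
    """Return half-open (start, end) spans that begin with GT and end with AG."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # Store the position just past each AG so that seq[start:end] ends with AG.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if min_len <= a - d <= max_len]

toy = "ATGCC" + "GT" + "C" * 20 + "AG" + "CCTAA"
print(candidate_introns(toy))  # [(5, 29)]
```

Most GT...AG pairs in a real genome are not introns, which is why this signal is combined with ORFs, codon usage, and evidence such as cDNA sequence.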



[Figure: automatically generated annotation. Tracks: unique identifiers; tracking information; start and stop site predictions; splice site predictions; homology-based exon predictions; computational exon predictions; consensus gene structure (both strands).]


A zebrafish hit shows a gene model protein encoded by a 6 exon gene.

This gene structure (intron/exon) is seen in other species, as is the protein size.

The proteins, if corresponding to MSP in S. gal., must be heavily glycosylated (likely).

At least some have a signal peptide.




Genomics:

READING genome sequences: carry out dideoxy sequencing, 700 bp each read, MAX

ASSEMBLY of the sequence: connect sequences to make whole chromosomes

ANNOTATION of the sequence
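
With reads capped at about 700 bp, the number of reads needed grows directly with genome size. The estimate below uses the genome sizes quoted earlier in this lecture; the 8-fold coverage target is an assumption chosen for illustration, not a figure from the slides.

```python
def reads_needed(genome_bp: int, read_bp: int = 700, coverage: int = 8) -> int:
    """Smallest number of reads whose total length is at least coverage x genome."""
    total_bp = genome_bp * coverage
    return -(-total_bp // read_bp)  # ceiling division on integers

print(reads_needed(4_000_000))      # E. coli: 45715
print(reads_needed(3_000_000_000))  # human: 34285715
```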




Annotation:

cDNAs & ESTs (Expressed Sequence Tags)

[Figure: RNA target sample -> cDNA library -> SEQUENCE from primers -> end reads (mates)]

Each cDNA provides sequence from the two ends: two ESTs
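
The two ESTs from one cDNA clone can be sketched as one read from each end, with the second read taken from the opposite strand. The function name `end_reads` is hypothetical, and the default read length reuses the 700 bp maximum quoted in this lecture; actual EST reads are often shorter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def end_reads(cdna: str, read_len: int = 700) -> tuple[str, str]:
    """Return the (5' EST, 3' EST) pair read inward from the two ends of a cDNA."""
    five_prime = cdna[:read_len]
    # The 3' read is sequenced from the other end, so it is on the opposite strand.
    three_prime = reverse_complement(cdna)[:read_len]
    return five_prime, three_prime

print(end_reads("ATGGCCAAATTTGGG", read_len=5))  # ('ATGGC', 'CCCAA')
```

If the cDNA is longer than two read lengths, the two ESTs do not overlap and the middle of the message stays unsequenced.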


Who Gets Sequenced?

Models

Pathogens

Agricultural species



Protein Structure Database

See Swiss-pdb viewer



RNAi for every C. elegans gene too!

  • Results on the web

  • Projects to systematically knock out (or pseudo-knockout) every gene, in order to establish the phenotype of each gene -> function of each gene

