Bioinformatics and gene discovery
UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL

Bioinformatics Tutorials

BIOINFORMATICS AND GENE DISCOVERY

Iosif Vaisman

1998



From genes to proteins

DNA -- TRANSCRIPTION (signal: PROMOTER ELEMENTS) --> RNA

RNA -- SPLICING (signal: SPLICE SITES) --> mRNA

mRNA -- TRANSLATION (signals: START CODON, STOP CODON) --> PROTEIN
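The role of the START and STOP codons in the last step can be sketched in Python. This is only an illustration under simplifying assumptions: the function name `find_orf` and the toy transcript are hypothetical, only the first AUG is tried, and no protein translation table is included.

```python
# Standard start and stop codons for an mRNA written in the RNA alphabet.
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orf(mrna):
    """Return the codons from the first START codon up to (not including)
    the first in-frame STOP codon, or None if no complete frame exists."""
    start = mrna.find(START_CODON)
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return codons
        codons.append(codon)
    return None  # ran off the end without meeting a STOP codon

# Hypothetical toy transcript: short leader, AUG, two codons, UAA, trailer.
print(find_orf("GGAUGGCUUUCUAAGG"))
```

A real gene finder would also have to consider all reading frames and both strands, which this sketch leaves out.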



Comparative Sequence Sizes (in bases)

  • Yeast chromosome 3 350,000

  • Escherichia coli (bacterium) genome 4,600,000

  • Largest yeast chromosome now mapped 5,800,000

  • Entire yeast genome 15,000,000

  • Smallest human chromosome (Y) 50,000,000

  • Largest human chromosome (1) 250,000,000

  • Entire human genome 3,000,000,000




Computational Gene Prediction

  • Where are the genes unlikely to be located?

  • How do transcription factors know where to bind a region of DNA?

  • Where are the transcription, splicing, and translation start and stop signals?

  • What do coding regions do (and non-coding regions do not)?

  • Can we learn from examples?

  • Does this sequence look familiar?


Artificial Intelligence in Biosciences

Neural Networks (NN)

Genetic Algorithms (GA)

Hidden Markov Models (HMM)

Stochastic context-free grammars (SCFG)



Information Theory

[Figure: four items labeled 00, 01, 11, 10; each of the two binary choices is marked "1 bit"]
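The "1 bit" labels can be made concrete with a small calculation: one binary choice carries 1 bit, so singling out one of four equally likely items (such as the four nucleotides) takes 2 bits. The helper names below are hypothetical, and the sketch assumes the standard Shannon definition of information.

```python
import math

def bits_needed(n_outcomes):
    """Bits required to single out one of n equally likely outcomes."""
    return math.log2(n_outcomes)

def shannon_entropy(probabilities):
    """Average information per symbol, in bits, for a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(bits_needed(2))   # one yes/no choice
print(bits_needed(4))   # four labels such as 00, 01, 11, 10
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))
```

When the four outcomes are not equally likely, `shannon_entropy` returns less than 2 bits, which is what makes biased sequence positions informative.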


Scientific Models

Physical models -- Mathematical models

[Diagram labels: Stochastic models, Mechanistic models, Mechanism, Black box, Predictive power, Elegance, Consistency, Hidden Markov models, Stochastic mechanism]


Neural Networks

  • interconnected assembly of simple processing elements (units or nodes)

  • node functionality is similar to that of the animal neuron

  • processing ability is stored in the inter-unit connection strengths (weights)

  • weights are obtained by a process of adaptation to, or learning from, a set of training patterns
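As a rough illustration of weights being learned from training patterns, here is a single-unit perceptron in Python. The slides do not specify a learning rule; the perceptron rule, the function names, and the AND training patterns are all assumptions chosen for brevity.

```python
def predict(weights, bias, inputs):
    """A single unit: weighted sum of inputs followed by a threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(patterns, epochs=20, rate=1.0):
    """Perceptron learning rule: adjust the weights after each mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in patterns:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training patterns: the logical AND of two inputs.
AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(AND_PATTERNS)
print([predict(weights, bias, x) for x, _ in AND_PATTERNS])
```

Everything the unit "knows" after training sits in `weights` and `bias`. Networks used for gene finding combine many such units in layers.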


Genetic Algorithms

Search or optimization methods using simulated evolution.

A population of potential solutions is subjected to natural selection, crossover, and mutation.

choose initial population
evaluate each individual's fitness
repeat
    select individuals to reproduce
    mate pairs at random
    apply crossover operator
    apply mutation operator
    evaluate each individual's fitness
until terminating condition
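The loop above can be sketched in Python as follows. The fitness function (count of 1s in a bit string), the roulette-wheel selection, the population size, and the fixed generation budget are all hypothetical choices made for illustration. The slides do not prescribe any of them.

```python
import random

def fitness(individual):
    """Hypothetical fitness: the number of 1s in the bit string."""
    return sum(individual)

def select(population):
    """Fitness-proportional (roulette wheel) selection of one parent."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 avoids all-zero weights
    return random.choices(population, weights=weights, k=1)[0]

def crossover(parent_a, parent_b):
    """Single-point crossover: swap the tails after a random cut."""
    point = random.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(individual, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in individual]

def evolve(length=20, size=30, generations=40):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(size)]
    for _ in range(generations):  # terminating condition: fixed budget
        offspring = []
        while len(offspring) < size:
            child_ab, child_ba = crossover(select(population), select(population))
            offspring += [mutate(child_ab), mutate(child_ba)]
        population = offspring[:size]
    return max(population, key=fitness)

random.seed(0)  # fixed seed only so the run is repeatable
best = evolve()
print(fitness(best), best)
```

Fitness is evaluated inside `select` and in the final `max`, which corresponds to the "evaluate each individual's fitness" steps of the loop.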


Crossover

[Diagram: Parent A and Parent B, cut at a crossover point, yield Child AB and Child BA]

Mutation


Markov Model (or Markov Chain)

Example sequence: A T C T A G

The probability of each character depends only on several preceding characters in the sequence.

Number of preceding characters = order of the Markov model

Probability of a sequence (second-order model, where P[x,y,z] is the probability of z given the preceding characters x,y):

P(s) = P[A] P[A,T] P[A,T,C] P[T,C,T] P[C,T,A] P[T,A,G]
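A minimal Python sketch of this product, assuming the conditional probabilities are estimated from n-gram counts in a training sequence without smoothing. The training string and all function names are hypothetical. Shorter contexts are used for the first characters, mirroring the P[A] and P[A,T] factors in the formula.

```python
from collections import Counter

def count_ngrams(training, order):
    """Count every substring of length 1 .. order + 1 in the training text."""
    counts = Counter()
    for n in range(1, order + 2):
        for i in range(len(training) - n + 1):
            counts[training[i:i + n]] += 1
    return counts

def conditional(counts, context, char, alphabet="ACGT"):
    """Estimate P(char | context) from n-gram counts (no smoothing)."""
    total = sum(counts[context + c] for c in alphabet)
    return counts[context + char] / total if total else 0.0

def sequence_probability(sequence, order, counts):
    """Multiply the conditional probabilities along the sequence."""
    probability = 1.0
    for i, char in enumerate(sequence):
        context = sequence[max(0, i - order):i]  # shorter at the start
        probability *= conditional(counts, context, char)
    return probability

TRAINING = "ATCTAGATCTAGGATCCTAG"  # hypothetical training sequence
counts = count_ngrams(TRAINING, order=2)
print(sequence_probability("ATCTAG", 2, counts))
```

For long sequences the product underflows quickly, so practical implementations usually sum log probabilities instead.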


Hidden Markov Models

States -- well defined conditions

Edges -- transitions between the states

Example model: A -> (T or C) -> (G or T) -> A -> C

Sequences it can generate: ATGAC, ATTAC, ACGAC, ACTAC

Each transition is assigned a probability.

Probability of the sequence:

single path with the highest probability -- Viterbi path

sum of the probabilities over all paths -- forward algorithm (used within the Baum-Welch method)


Hidden Markov Model of Biased Coin Tosses

  • States (Si): Two Biased Coins {C1, C2}

  • Outputs (Oj): Two Possible Outputs {H, T}

  • p(Output Oij): p(C1, H), p(C1, T), p(C2, H), p(C2, T)

  • Transitions: From State X to Y {A11, A22, A12, A21}

  • p(Initial Si): p(I, C1), p(I, C2)

  • p(End Si): p(C1, E), p(C2, E)
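The coin model above is small enough to write out in Python. All probabilities below are hypothetical placeholders, since the slides give none. The end probabilities p(Si, E) are left out for simplicity, so a sequence may stop in either state. `forward` sums over all state paths and `viterbi` finds the single most probable path.

```python
# Hypothetical parameters for two biased coins; none come from the slides.
STATES = ("C1", "C2")
INITIAL = {"C1": 0.5, "C2": 0.5}                 # p(I, C1), p(I, C2)
TRANSITION = {"C1": {"C1": 0.9, "C2": 0.1},      # A11, A12
              "C2": {"C1": 0.2, "C2": 0.8}}      # A21, A22
EMISSION = {"C1": {"H": 0.5, "T": 0.5},          # p(C1, H), p(C1, T)
            "C2": {"H": 0.9, "T": 0.1}}          # p(C2, H), p(C2, T)

def forward(observations):
    """Total probability of the observations, summed over all state paths."""
    alpha = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in STATES}
    for symbol in observations[1:]:
        alpha = {s: EMISSION[s][symbol] *
                    sum(alpha[r] * TRANSITION[r][s] for r in STATES)
                 for s in STATES}
    return sum(alpha.values())

def viterbi(observations):
    """Probability and states of the single most probable path."""
    best = {s: (INITIAL[s] * EMISSION[s][observations[0]], [s]) for s in STATES}
    for symbol in observations[1:]:
        step = {}
        for s in STATES:
            prob, path = max((best[r][0] * TRANSITION[r][s], best[r][1])
                             for r in STATES)
            step[s] = (prob * EMISSION[s][symbol], path + [s])
        best = step
    return max(best.values())

tosses = "HHHHTHTT"
print(forward(tosses))
print(viterbi(tosses))
```

The Viterbi probability can never exceed the forward probability, because the best path is only one term of the sum over all paths.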



GRAIL gene identification program

POSSIBLE EXONS --> REFINED EXON POSITIONS --> FINAL EXON CANDIDATES



Measures of Prediction Accuracy

Nucleotide Level

(GeneParser)

                    REALITY
                    c       nc
PREDICTION    c     TP      FP
              nc    FN      TN

(c = coding, nc = non-coding)

Sensitivity: Sn = TP / (TP + FN)

Specificity: Sp = TP / (TP + FP)
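The two nucleotide-level formulas can be computed directly from a per-base comparison, as in the Python sketch below. The encoding of each position as 'c' (coding) or 'n' (non-coding) and the function name are hypothetical. Note that Sp as defined here, TP / (TP + FP), is the quantity called precision in other fields.

```python
def nucleotide_accuracy(reality, prediction):
    """Compare two equal-length strings of 'c' (coding) / 'n' (non-coding)
    and return (Sn, Sp) as defined on the slide."""
    pairs = list(zip(reality, prediction))
    tp = sum(r == "c" and p == "c" for r, p in pairs)  # coding, predicted coding
    fp = sum(r == "n" and p == "c" for r, p in pairs)  # non-coding, predicted coding
    fn = sum(r == "c" and p == "n" for r, p in pairs)  # coding, predicted non-coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

# Hypothetical annotation and prediction for a 5-base stretch.
print(nucleotide_accuracy("cccnn", "ccccn"))
```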


Measures of Prediction Accuracy

Exon Level

(GeneParser)

Sensitivity: Sn = number of correct exons / number of actual exons

Specificity: Sp = number of correct exons / number of predicted exons

[Diagram: REALITY vs. PREDICTION, with exons labeled MISSING EXON, WRONG EXON, CORRECT EXON]
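A Python sketch of the exon-level measures, under stated assumptions: exons are (start, end) coordinate pairs, a predicted exon counts as correct only if both boundaries match exactly, a missing exon is an actual exon that no prediction overlaps, and a wrong exon is a predicted exon that overlaps no actual exon. These definitions and all names are illustrative and are not taken from the slides.

```python
def overlaps(a, b):
    """True if two (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def exon_accuracy(actual, predicted):
    """Exon-level Sn and Sp plus the lists of missing and wrong exons."""
    correct = set(actual) & set(predicted)  # exact boundary matches only
    missing = [a for a in actual if not any(overlaps(a, p) for p in predicted)]
    wrong = [p for p in predicted if not any(overlaps(p, a) for a in actual)]
    return {
        "Sn": len(correct) / len(actual) if actual else 0.0,
        "Sp": len(correct) / len(predicted) if predicted else 0.0,
        "missing": missing,
        "wrong": wrong,
    }

# Hypothetical coordinates: one exact hit, one boundary error, one miss, one wrong.
actual = [(100, 200), (300, 400), (500, 600)]
predicted = [(100, 200), (310, 400), (700, 750)]
print(exon_accuracy(actual, predicted))
```

The second predicted exon overlaps a real one but misses its start, so it is neither correct nor wrong under these definitions. This is why exon-level scores are usually lower than nucleotide-level ones.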



Bibliography (GeneParser)

http://linkage.rockefeller.edu/wli/gene/list.html

and

http://www-hto.usc.edu/software/procrustes/fans_ref/

Gene Discovery Exercise

http://metalab.unc.edu/pharmacy/Bioinfo/Gene

