10/24/05 Promoter Prediction RNA Structure & Function Prediction


D Dobbs ISU - BCB 444/544X: Promoter Prediction

Announcements

Seminar (Mon Oct 24)

(several additional seminars listed in email sent to class)

12:10 PM IG Faculty Seminar in 101 Ind Ed II

"Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis" Coralie Lashbrook, EEOB

http://www.bb.iastate.edu/%7Emarit/GEN691.html

Mark your calendars:

1:10 PM Nov 14 Baker Seminar in Howe Hall Auditorium

"Discovering transcription factor binding sites"

Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine

Announcements
  • 544 Semester Projects
  • Thanks to all who sent already!
  • Others: Information needed today!
  • [email protected]
  • Briefly describe:
    • Your background & current grad research
    • Is there a problem related to your research you would like to learn more about & develop as project for this course?
    • or
    • What would your ‘dream’ project be?

Announcements

Exam 2 - this Friday

Posted Online: Exam 2 Study Guide

544 Reading Assignment (2 papers)

Office Hours: David Mon 1-2 PM in 209 Atanasoff

Drena Tues 10-11 AM in 106 MBB

Michael - none this week

Thurs No Lab - Extra Office Hrs instead:

David 1-3 PM in 209 Atanasoff

Drena 1-3 PM in 106 MBB

Announcements
  • Updated PPTs & PDFs for Gene Prediction lectures (covered on Exam 2) will be posted today (changes are minor)
  • Is everyone on BCB 444/544 mailing list? Auditors?


Promoter Prediction & RNA Structure/Function Prediction

Mon Quite a few more words re:

Gene prediction

Promoter prediction

Wed RNA structure & function

RNA structure prediction

2' & 3' structure prediction

miRNA & target prediction

Thurs No Lab

Fri Exam 2

Reading Assignment - previous
  • Mount Bioinformatics
    • Chp 9 Gene Prediction & Regulation
      • pp 361-401
      • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • * Brown Genomes 2 (NCBI textbooks online)
    • Sect 9 Overview: Assembly of Transcription Initiation Complex
    • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002
    • Sect 9.1-9.3 DNA binding proteins, Transcription initiation
    • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016
  • *NOTEs: Don’t worry about the details!!
    • See Study Guide for Exam 2 re: Sections covered

Optional - but very helpful reading:

(that's a hint!)

  • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709

http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html

  • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287

http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

Check this out: http://www.phylofoot.org/NRG_testcases/

Reading Assignment (for Wed)
  • Mount Bioinformatics
    • Chp 8 Prediction of RNA Secondary Structure
    • pp. 327-355
    • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction Module
    • http://cnx.rice.edu/content/m11065/latest/

Review last lecture: Gene Prediction (formerly Gene Prediction - 3)
  • Overview of steps & strategies
  • Algorithms
  • Gene prediction software

Predicting Genes - Basic steps:
  • Obtain genomic DNA sequence
  • Translate in all 6 reading frames
    • Compare with protein sequence database
    • Also perform database similarity search with EST & cDNA databases, if available
  • Use gene prediction programs to locate genes
  • Analyze gene regulatory sequences
  • Note: Several important details missing above:
    • 1. Mask to "remove" repetitive elements (ALUs, etc.)
    • 2. Perform database search on translated DNA (BlastX, TFasta)
    • 3. Use several programs to predict genes (GENSCAN, GeneMark.hmm)
    • 4. Translate putative ORFs and search for functional motifs (Blocks, Motifs, etc.) & regulatory sequences
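
The "translate in all 6 reading frames" step above can be sketched in a few lines. This is only a minimal illustration, assuming the standard genetic code and an input that contains only A, C, G and T; the function names are made up for this example and are not taken from any of the tools named on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; incomplete trailing codons are dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings could then be compared against a protein database, which is what a BlastX-style search does in one step.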

Gene prediction flowchart

Fig 5.15

Baxevanis & Ouellette 2005

Overview of gene prediction strategies
  • What sequence signals can be used?
    • Transcription: TF binding sites, promoter, initiation site, terminator
    • Processing signals: splice donor/acceptors, polyA signal
    • Translation: start (AUG = Met) & stop (UGA, UAA, UAG)
    • ORFs, codon usage
  • What other types of information can be used?
    • cDNAs & ESTs (pairwise alignment)
    • homology (sequence comparison, BLAST)
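
As a rough illustration of how the translation signals listed above (start and stop codons) define candidate ORFs, here is a minimal sketch that scans the three forward frames of a DNA string. The function name, the `min_codons` cutoff and the toy sequence are assumptions for this example; real gene finders combine this with codon usage and the other signals.

```python
# DNA-level stop codons corresponding to UAA, UAG and UGA in the mRNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts codons before the stop (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATGGTAGCC"
    for start, end, frame in find_orfs(seq):
        print(frame, seq[start:end])
```

Running the same scan on the reverse complement would cover the other three frames.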

Examples of gene prediction software
  • Similarity-based or Comparative
      • BLAST
      • SGP2 (extension of GeneID)
  • Ab initio = “from the beginning”
      • GeneID - (used in lab last week)
      • GENSCAN - (used in lab last week)
      • GeneMark.hmm - (should try this!)
  • Combined “evidence-based”
      • GeneSeqer (Brendel et al., ISU)

BEST? GENSCAN, GeneMark.hmm, GeneSeqer

but depends on organism & specific task

Annotated lists of gene prediction software
  • URLs from Mount Chp 9, available online

Table 9.1 http://www.bioinformaticsonline.org/links/ch_09_t_1.html

  • from Pevsner Chps 14 & 16

http://www.bioinfbook.org/chapt14.htm - prokaryotic

http://www.bioinfbook.org/chapt16.htm - eukaryotic

  • Table in Zhang Nat Rev Genet article: http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
  • Another list: Kozar, Stanford

http://cmgm.stanford.edu/classes/genefind/

  • Performance Evaluation? Guigó, Barcelona (& sites above) http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

Gene prediction: Eukaryotes vs prokaryotes

Gene prediction is easier in microbial genomes

Methods? Previously, mostly HMM-based (e.g., GeneMark.hmm)

Now: similarity-based methods, because so many genomes are available

see Mount Fig 9.7 (E. coli gene)

Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available, e.g.:

TIGR Comprehensive Microbial Resource (CMR)

NCBI Microbial Genomes

UCSC Browser view of 1000 kb region (Human URO-D gene)

Fig 5.10

Baxevanis & Ouellette 2005


Spliced Alignment Algorithm

GeneSeqer - Brendel et al.

http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi

[Figure: intron bounded by splice sites, with GT at the donor site and AG at the acceptor site]

  • Perform pairwise alignment with large gaps in one sequence (due to introns)
    • Align genomic DNA with cDNA, ESTs, protein sequences
  • Score semi-conserved sequences at splice junctions
    • Using a Bayesian model
  • Score coding constraints in translated exons
    • Using a Bayesian model

Brendel et al (2004) Bioinformatics 20: 1157

Brendel 2005


Brendel - Spliced Alignment I:

Compare with cDNA or EST probes

[Figure: genomic DNA with start and stop codons marked, aligned to an mRNA (Cap, 5’-UTR, start codon, stop codon, 3’-UTR, Poly(A))]

Brendel 2005


Brendel - Spliced Alignment II:

Compare with protein probes

[Figure: genomic DNA with start and stop codons marked, aligned to a protein]

Brendel 2005


Splice Site Detection

Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal?

YES

  • Information Content Ii:
  • Extent of Splice Signal Window:

i: ith position in sequence

Ī: avg information content over all positions >20 nt from splice site

σĪ: avg sample standard deviation of Ī
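
A minimal sketch of how a per-position information content profile could be computed from a set of aligned splice-site windows. The slide's own formula did not survive, so this assumes the common Shannon form for DNA, I_i = 2 + sum over bases b of f_ib * log2(f_ib); the function name and the toy sites are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return I_i (in bits) for every column i of equal-length DNA sites.

    Assumes the Shannon form I_i = 2 - H_i, where H_i is the entropy of the
    base frequencies in column i (no small-sample correction is applied).
    """
    length = len(aligned_sites[0])
    profile = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    toy_sites = ["AGGT", "CGGT", "GGGT", "TGGT"]  # made-up windows
    print(information_content(toy_sites))
```

A column where all four bases are equally frequent carries 0 bits, while a fully conserved column (such as the G and T of a GT donor) carries 2 bits; positions whose I_i stands clearly above the background average Ī would be counted as part of the splice signal window.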

Brendel 2005


Information content vs position

[Figure: information content plotted against position for human GT sites (T2_GT) and human AG sites (T2_AG)]

Which sequences are exons & which are introns?

How can you tell?

Brendel et al (2004) Bioinformatics 20: 1157

Brendel 2005


Bayesian Splice Site Prediction

Let $S = s_{-l}\, s_{-l+1}\, s_{-l+2} \ldots s_{-1}\, \mathrm{GT}\, s_1\, s_2\, s_3 \ldots s_r$

where H indexes the hypotheses of GT or AG at

- True site in reading phase 1, 2, or 0

- False within-exon site in reading phase 1, 2, or 0

- False within-intron site

Brendel et al (2004) Bioinformatics 20: 1157

Brendel 2005


Bayes Factor as Decision Criterion

H0: H=T

2-class model:

7-class model:

Brendel et al (2004) Bioinformatics 20: 1157

Brendel 2005
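
To make the idea of a Bayes factor as a decision criterion concrete, here is a toy sketch for the 2-class case (true site vs false site). It is not the published GeneSeqer model: it assumes each flanking position is independent, and the probability tables and function names below are invented for illustration only.

```python
import math


def log_likelihood(window, position_probs):
    """Sum of log P(base at position i | model), assuming independent positions."""
    return sum(math.log(position_probs[i][base]) for i, base in enumerate(window))


def log_bayes_factor(window, true_model, false_model):
    """log [ P(window | true site) / P(window | false site) ]."""
    return log_likelihood(window, true_model) - log_likelihood(window, false_model)


# Hypothetical per-position base probabilities for a 3-nt flank (toy numbers).
TRUE_SITE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.1, "C": 0.05, "G": 0.8, "T": 0.05},
]
FALSE_SITE = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}] * 3

if __name__ == "__main__":
    for flank in ("AAG", "CCC"):
        print(flank, round(log_bayes_factor(flank, TRUE_SITE, FALSE_SITE), 3))
```

A positive log Bayes factor favours the "true site" hypothesis and a negative one favours "false site"; a 7-class version would compare the true-site classes against the several false-site classes listed on the previous slide.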


PG

PG

(1-PG)(1-PD(n+1))

en

en+1

(1-PG)PD(n+1)

PA(n)PG

(1-PG)PD(n+1)

in

in+1

1-PA(n)

Markov Model for Spliced Alignment

Brendel 2005


Evaluation of Splice Site Prediction

                   Actual True    Actual False
Predicted True     TP             FP              PP=TP+FP
Predicted False    FN             TN              PN=FN+TN
                   AP=TP+FN       AN=FP+TN

  • Sensitivity (= Coverage):
  • Specificity:
  • Misclassification rates:
  • Normalized specificity:
Brendel 2005


Performance?

[Figure: four panels (human GT site, human AG site, A. thaliana GT site, A. thaliana AG site), each with an axis labelled Sn]

  • Note: these are not ROC curves (plots of (1-Sn) vs Sp)
    • But plots such as these (& ROCs) much better than using "single number" to compare different methods
    • Both types of plots illustrate trade-off: Sn vs Sp

Brendel 2005
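
The trade-off mentioned in the note above can be made visible by sweeping a score threshold and recording the error rates at each setting. The sketch below uses the usual ROC convention, true-positive rate (Sn) against false-positive rate FP/(FP+TN); other axis choices carry the same information. The function name, scores and labels are made up for this example.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, Sn, false-positive rate) for each threshold.

    A site is predicted "true" when its score is >= the threshold.
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and not y)
        sn = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((t, sn, fpr))
    return points


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.4, 0.2]          # hypothetical site scores
    toy_labels = [True, False, True, False]    # hypothetical true/false sites
    for point in sweep_thresholds(toy_scores, toy_labels, [0.0, 0.5, 1.0]):
        print(point)
```

Lowering the threshold raises Sn but admits more false positives, which is why a whole curve says more than any single number.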


Evaluation of Splice Site Prediction

What do measures really mean?

Sp =

Fig 5.11

Baxevanis & Ouellette 2005


                   Actual True    Actual False
Predicted True     TP             FP              PP=TP+FP
Predicted False    FN             TN              PN=FN+TN
                   AP=TP+FN       AN=FP+TN

Careful: different definitions for "Specificity"

Brendel definitions

  • Sensitivity (= Coverage):
  • Specificity:

cf. Guigó definitions

Sn: Sensitivity = TP/(TP+FN)

Sp: Specificity = TN/(TN+FP) = Sp-

AC: Approximate Correlation = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1

Other measures? Predictive Values, Correlation Coefficient
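
The Sn, Sp and AC formulas written out above translate directly into code. This is a small sketch using exactly those three expressions; the function name and the example counts are hypothetical, and no guard is included for empty rows or columns of the table.

```python
def guigo_measures(tp, fp, fn, tn):
    """Compute Sn, Sp and AC from confusion-matrix counts (formulas as on the slide)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    ac = 0.5 * (
        tp / (tp + fn) + tp / (tp + fp) + tn / (tn + fp) + tn / (tn + fn)
    ) - 1
    return {"Sn": sn, "Sp": sp, "AC": ac}


if __name__ == "__main__":
    # Made-up counts: 8 true sites found, 2 missed, 2 false calls, 8 correct rejections.
    print(guigo_measures(tp=8, fp=2, fn=2, tn=8))
```

AC ranges from -1 to 1, like a correlation coefficient, because the average of the four conditional probabilities (between 0 and 1) is rescaled by the factor 2 and the offset -1.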


Best measures for comparing different methods?

  • ROC curves (Receiver Operating Characteristic?!!)
    • http://www.anaesthetist.com/mnm/stats/roc/
    • "The Magnificent ROC" - has fun applets & quotes:
      • "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers"
  • Correlation Coefficient
    • Matthews correlation coefficient (MCC)
    • MCC = 1 for a perfect prediction
    • 0 for a completely random assignment
    • -1 for a "perfectly incorrect" prediction

Do not memorize this!
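
For reference, the standard MCC definition is (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the function name is hypothetical, and returning 0.0 when the denominator is zero is a convention chosen for this example.

```python
import math


def matthews_cc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention used here when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(matthews_cc(5, 0, 0, 5))   # perfect prediction
    print(matthews_cc(5, 5, 5, 5))   # no better than random
    print(matthews_cc(0, 5, 5, 0))   # "perfectly incorrect"
```

The three calls reproduce the three reference values listed above (1, 0 and -1).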

Performance of GeneSeqer vs other methods?
  • Comparison with ab initio gene prediction

(e.g., GENSCAN)

  • Depends on:
      • Availability of ESTs
      • Availability of protein homologs

Other Performance Evaluations? Guigó

http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

Brendel 2005


GeneSeqer vs GENSCAN (Exon prediction)

[Figure: Exon (Sn + Sp) / 2 (y-axis, 0.00 to 1.00) vs target protein alignment score (x-axis, 0 to 100), with curves for GeneSeqer, NAP, and GENSCAN]

GENSCAN - Burge, MIT

Brendel 2005


GeneSeqer vs GENSCAN (Intron prediction)

[Figure: Intron (Sn + Sp) / 2 (y-axis, 0.00 to 1.00) vs target protein alignment score (x-axis, 0 to 100), with curves for GeneSeqer, NAP, and GENSCAN]

GENSCAN - Burge, MIT

Brendel 2005

Other Resources
  • Current Protocols in Bioinformatics
  • http://www.4ulr.com/products/currentprotocols/bioinformatics.html
  • Finding Genes
  • 4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
  • 4.2 Using MZEF To Find Internal Coding Exons
  • 4.3 Using GENEID to Identify Genes
  • 4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
  • 4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
  • 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
  • 4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
  • 4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
  • 4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
  • 4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

New Today: Promoter Prediction
  • A few more words about Gene prediction
  • Predicting regulatory regions (focus on promoters)
    • Brief review promoters & enhancers
    • Predicting in eukaryotes vs prokaryotes
  • Introduction to RNA
    • Structure & function

Predicting Promoters

What signals are there?

Algorithms

Promoter prediction software

What signals are there? Simple ones in prokaryotes

Brown Fig 9.17


BIOS Scientific Publishers Ltd, 1999

Prokaryotic promoters
  • RNA polymerase complex recognizes promoter sequences located very close to & on 5’ side (“upstream”) of initiation site
  • RNA polymerase complex binds directly to these, with no requirement for “transcription factors”
  • Prokaryotic promoter sequences are highly conserved
      • -10 region
      • -35 region
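
Because the -35 and -10 regions are so well conserved, a very simple scan already finds candidate prokaryotic promoters. The sketch below assumes the textbook E. coli sigma-70 consensus sequences (TTGACA at -35, TATAAT at -10) separated by a 16-18 nt spacer; these values, the mismatch cutoff and the function names are assumptions for this example, not something stated on the slide.

```python
MINUS_35 = "TTGACA"  # assumed E. coli sigma-70 consensus for the -35 region
MINUS_10 = "TATAAT"  # assumed E. coli sigma-70 consensus for the -10 region


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(dna, max_mismatches=2, spacers=(16, 17, 18)):
    """Return (pos_35, pos_10, total_mismatches) for candidate promoters."""
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - len(MINUS_35) + 1):
        m35 = mismatches(dna[i:i + 6], MINUS_35)
        if m35 > max_mismatches:
            continue
        for gap in spacers:
            j = i + 6 + gap
            box10 = dna[j:j + 6]
            if len(box10) < 6:
                continue  # ran off the end of the sequence
            m10 = mismatches(box10, MINUS_10)
            if m10 <= max_mismatches:
                hits.append((i, j, m35 + m10))
    return hits


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"  # made-up sequence
    print(scan_promoters(toy, max_mismatches=0))
```

Nothing this simple works for eukaryotic promoters, where recognition depends on transcription factors rather than on direct binding by the polymerase.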


What signals are there?

Complex ones in eukaryotes!

Fig 9.13

Mount 2004


Simpler view of complex promoters in eukaryotes:

Fig 5.12

Baxevanis & Ouellette 2005

Eukaryotic genes are transcribed by 3 different RNA polymerases

Recognize different types of promoters & enhancers:

Brown Fig 9.18


BIOS Scientific Publishers Ltd, 1999

Eukaryotic promoters & enhancers
  • Promoters located “relatively” close to initiation site

(but can be located within gene, rather than upstream!)

  • Enhancers also required for regulated transcription

(these control expression in specific cell types, developmental stages, in response to environment)

  • RNA polymerase complexes do not specifically recognize promoter sequences directly
  • Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes

Eukaryotic transcription factors
  • Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription
  • TFs contain characteristic “DNA binding motifs”

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039

  • TFs recognize specific short DNA sequence motifs: “transcription factor binding sites”
    • Several databases for these, e.g. TRANSFAC

http://www.generegulation.com/cgibin/pub/databases/transfac
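
Binding-site collections like these are commonly turned into position weight matrices (PWMs) for scanning. The sketch below builds a log-odds PWM from a handful of aligned sites and finds the best-scoring window in a sequence. The sites, pseudocount, uniform background and function names are all made up for illustration; this does not reproduce any TRANSFAC matrix or tool, and the input is assumed to contain only A, C, G and T.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (list of dicts, one per column) from aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_site(dna, pwm):
    """Return (score, position) of the highest-scoring window in dna."""
    width = len(pwm)
    scored = [
        (sum(pwm[k][dna[i + k]] for k in range(width)), i)
        for i in range(len(dna) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    toy_sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]  # hypothetical binding sites
    pwm = build_pwm(toy_sites)
    score, position = best_site("GGCGTATAAAGGC", pwm)
    print(position, round(score, 2))
```

Short motifs like this match by chance very often in a genome, which is one reason binding-site prediction usually needs extra evidence such as cross-species conservation.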

Zinc finger-containing transcription factors
  • Common in eukaryotic proteins
  • Estimated 1% of mammalian genes encode zinc-finger proteins
  • In C. elegans, there are 500!
  • Can be used as highly specific DNA binding modules
  • Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy

Brown Fig 9.12

BIOS Scientific Publishers Ltd, 1999

Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS)

Fig 5.14

Baxevanis & Ouellette 2005

Reading Assignment (for Wed)
  • Mount Bioinformatics
    • Chp 8 Prediction of RNA Secondary Structure
    • pp. 327-355
    • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html
  • Cates (Online) RNA Secondary Structure Prediction Module
    • http://cnx.rice.edu/content/m11065/latest/
