10/24/05 Promoter Prediction; RNA Structure & Function Prediction

D Dobbs ISU - BCB 444/544X: Promoter Prediction


Announcements

Seminar (Mon Oct 24)

(several additional seminars listed in email sent to class)

12:10 PM IG Faculty Seminar in 101 Ind Ed II

"Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis" Coralie Lashbrook, EEOB

http://www.bb.iastate.edu/%7Emarit/GEN691.html

Mark your calendars:

1:10 PM Nov 14 Baker Seminar in Howe Hall Auditorium

"Discovering transcription factor binding sites"

Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine

Announcements

  • 544 Semester Projects

  • Thanks to all who sent already!

  • Others: Information needed today!

  • ddobbs@iastate.edu

  • Briefly describe:

    • Your background & current grad research

    • Is there a problem related to your research you would like to learn more about & develop as project for this course?

    • or

    • What would your ‘dream’ project be?

Announcements

Exam 2 - this Friday

Posted Online: Exam 2 Study Guide

544 Reading Assignment (2 papers)

Office Hours: David Mon 1-2 PM in 209 Atanasoff

Drena Tues 10-11AM in 106 MBB

Michael - none this week

Thurs No Lab - Extra Office Hrs instead:

David 1-3 PM in 209 Atanasoff

Drena 1-3 PM in 106 MBB

Announcements

  • Updated PPTs & PDFs for Gene Prediction lectures (covered on Exam 2) will be posted today (changes are minor)

  • Is everyone on the BCB 444/544 mailing list? Auditors?


Promoter Prediction & RNA Structure/Function Prediction

Mon: Quite a few more words re:

Gene prediction

Promoter prediction

Wed: RNA structure & function

RNA structure prediction

2° & 3° (secondary & tertiary) structure prediction

miRNA & target prediction

Thurs: No Lab

Fri: Exam 2

Reading Assignment - previous

  • Mount Bioinformatics

    • Chp 9 Gene Prediction & Regulation

      • pp 361-401

      • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

  • * Brown Genomes 2 (NCBI textbooks online)

    • Sect 9 Overview: Assembly of Transcription Initiation Complex

    • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002

    • Sect 9.1-9.3 DNA binding proteins, Transcription initiation

    • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016

  • *NOTE: Don’t worry about the details!!

    • See Study Guide for Exam 2 re: sections covered

Optional - but very helpful reading:

(that's a hint!)

  • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709

    http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html

  • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287

    http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

Check this out: http://www.phylofoot.org/NRG_testcases/

Reading Assignment (for Wed)

  • Mount Bioinformatics

    • Chp 8 Prediction of RNA Secondary Structure

    • pp. 327-355

    • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

  • Cates (Online) RNA Secondary Structure Prediction Module

    • http://cnx.rice.edu/content/m11065/latest/

Review last lecture: Gene Prediction (formerly Gene Prediction - 3)

  • Overview of steps & strategies

  • Algorithms

  • Gene prediction software

Predicting Genes - Basic steps:

  • Obtain genomic DNA sequence

  • Translate in all 6 reading frames

    • Compare with protein sequence database

    • Also perform database similarity search

    • with EST & cDNA databases, if available

  • Use gene prediction programs to locate genes

  • Analyze gene regulatory sequences

  • Note: Several important details are missing above:

    • 1. Mask to "remove" repetitive elements (ALUs, etc.)

    • 2. Perform database search on translated DNA (BlastX, TFasta)

    • 3. Use several programs to predict genes (GenScan, GeneMark.hmm)

    • 4. Translate putative ORFs and search for functional motifs (Blocks, Motifs, etc.) & regulatory sequences
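The "translate in all 6 reading frames" step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and one common convention of numbering reverse-strand frames -1, -2, -3 from the 5' end of the reverse complement; real tools differ in frame labeling and in how they handle ambiguous bases (here any codon containing a non-ACGT base becomes X).

```python
# Minimal six-frame translation sketch (standard genetic code assumed).
BASES = "TCAG"
# Standard code (NCBI table 1) listed in TCAG x TCAG x TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse (non-ACGT characters pass through)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna: str) -> str:
    """Translate codon by codon from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Return {'+1': ..., '-1': ..., '+2': ..., ...} protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna[offset:])
        frames[f"-{offset + 1}"] = translate_frame(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

The translated frames would then be compared with a protein database, as in the list above; a lookup table keeps the sketch short, while libraries such as Biopython offer fuller implementations.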

Gene prediction flowchart

Fig 5.15

Baxevanis & Ouellette 2005

Overview of gene prediction strategies

  • What sequence signals can be used?

    • Transcription: TF binding sites, promoter, initiation site, terminator

    • Processing signals: splice donor/acceptors, polyA signal

    • Translation: start (AUG = Met) & stop (UAA, UAG, UGA)

    • ORFs, codon usage

  • What other types of information can be used?

    • cDNAs & ESTs (pairwise alignment)

    • homology (sequence comparison, BLAST)
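The translation signals listed above (AUG start; UAA, UAG, UGA stops) are enough for a naive ORF scan. The sketch below works on the forward DNA strand only (so it looks for ATG and TAA/TAG/TGA), reports an ORF from the first ATG after the previous in-frame stop, and uses a hypothetical `min_codons` cutoff. It ignores codon usage, splicing, and the reverse strand, so it is a toy illustration rather than a gene predictor.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Forward-strand ORFs as 0-based, end-exclusive (start, end) pairs.

    An ORF runs from an ATG to the next in-frame stop codon (stop included);
    min_codons counts codons including the stop (hypothetical cutoff).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the in-frame stop
    return sorted(orfs)


seq = "CCATGAAATAGCC"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```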

Examples of gene prediction software

  • Similarity-based or Comparative

    • BLAST

    • SGP2 (extension of GeneID)

  • Ab initio = “from the beginning”

    • GeneID - (used in lab last week)

    • GENSCAN - (used in lab last week)

    • GeneMark.hmm - (should try this!)

  • Combined "evidence-based”

    • GeneSeqer (Brendel et al., ISU)

  • BEST?GENSCAN, GeneMark.hmm, GeneSeqer

    but depends on organism & specific task

    Annotated lists of gene prediction software

    • URLs from Mount Chp 9, available online

      Table 9.1: http://www.bioinformaticsonline.org/links/ch_09_t_1.html

    • from Pevsner Chps 14 & 16

      http://www.bioinfbook.org/chapt14.htm - prokaryotic

      http://www.bioinfbook.org/chapt16.htm - eukaryotic

    • Table in Zhang Nat Rev Genet article: http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html

    • Another list: Kozar, Stanford

      http://cmgm.stanford.edu/classes/genefind/

    • Performance Evaluation? Guigó, Barcelona (& sites above): http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

    Gene prediction: Eukaryotes vs prokaryotes

    Gene prediction is easier in microbial genomes

    Methods? Previously, mostly HMM-based

    Now: similarity-based methods

    because so many genomes available

    see Mount Fig 9.7 (E.coli gene)

    Many microbial genomes have been fully sequenced &

    whole-genome "gene structure" and "gene function"

    annotations are available.

    e.g., GeneMark.hmm

    TIGR Comprehensive Microbial Resource (CMR)

    NCBI Microbial Genomes

    UCSC Browser view of 1000 kb region (Human URO-D gene)

    Fig 5.10

    Baxevanis & Ouellette 2005


    Spliced Alignment Algorithm

    GeneSeqer - Brendel et al.: http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi

    [Figure: intron bounded by splice sites, GT at the donor site and AG at the acceptor site]

    • Perform pairwise alignment with large gaps in one sequence (due to introns)

      • Align genomic DNA with cDNA, ESTs, protein sequences

    • Score semi-conserved sequences at splice junctions

      • Using a Bayesian model

    • Score coding constraints in translated exons

      • Using a Bayesian model

    Brendel et al. (2004) Bioinformatics 20: 1157

    Brendel 2005
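The first bullet above (pairwise alignment with large gaps in one sequence) can be illustrated with a small dynamic-programming sketch. This is not GeneSeqer's algorithm: instead of Bayesian splice-site and coding scores, it simply lets a stretch of genomic DNA that begins with GT and ends with AG be skipped at a flat `intron_penalty`, while ordinary gaps pay a per-base `gap` cost. All scoring parameters are arbitrary assumptions, and the O(n²m) loop is only meant for short toy sequences.

```python
def spliced_alignment_score(genomic: str, cdna: str, match: int = 2, mismatch: int = -1,
                            gap: int = -2, intron_penalty: int = -5,
                            min_intron: int = 4) -> float:
    """Best score aligning cDNA to genomic DNA, allowing GT...AG introns.

    S[i][j] = best score for genomic[:i] vs cdna[:j]. Besides the usual
    match/mismatch and single-base gap moves, genomic[k:i] may be skipped
    as one intron (flat penalty) if it starts with GT and ends with AG.
    """
    neg = float("-inf")
    n, m = len(genomic), len(cdna)
    S = [[neg] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if i > 0 and j > 0:
                pair = match if genomic[i - 1] == cdna[j - 1] else mismatch
                best = max(best, S[i - 1][j - 1] + pair)
            if i > 0:
                best = max(best, S[i - 1][j] + gap)      # genomic base left unmatched
            if j > 0:
                best = max(best, S[i][j - 1] + gap)      # cDNA base left unmatched
            for k in range(i - min_intron + 1):          # candidate intron = genomic[k:i]
                segment = genomic[k:i]
                if segment.startswith("GT") and segment.endswith("AG"):
                    best = max(best, S[k][j] + intron_penalty)
            S[i][j] = best
    return S[n][m]


exon1, intron, exon2 = "CCCC", "GTAAAAAG", "TTTT"
print(spliced_alignment_score(exon1 + intron + exon2, exon1 + exon2))
```

With the intron move, the toy cDNA aligns as two exact exons plus one intron; making `intron_penalty` very negative forces an ordinary gapped alignment instead, which scores lower here.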


    Brendel - Spliced Alignment I:

    Compare with cDNA or EST probes

    [Figure: genomic DNA (start codon, stop codon) aligned with mRNA (Cap-, 5’-UTR, start codon, stop codon, 3’-UTR, -Poly(A))]

    Brendel 2005


    Brendel - Spliced Alignment II:

    Compare with protein probes

    [Figure: genomic DNA (start codon, stop codon) aligned with protein]

    Brendel 2005


    Splice Site Detection

    • Extent of Splice Signal Window:

    Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal?

    YES

    i: ith position in sequence

    Ī: avg information content over all positions >20 nt from splice site

    σ̄: avg sample standard deviation of Ī

    Brendel 2005
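Per-position information content, as plotted against position on these slides, can be computed from a set of aligned sequences centred on splice sites. The sketch below assumes the simple definition R_i = 2 - H_i bits (the maximum entropy of DNA minus the observed Shannon entropy at position i), with no small-sample correction, and computes the background average Ī over positions more than `min_dist` nt from the site. Brendel et al. may define and normalize these quantities differently.

```python
import math
from collections import Counter


def information_content(aligned: list) -> list:
    """R_i = 2 - H_i (bits) at each column of equal-length DNA sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned (equal length)")
    ic = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


def background_information(ic: list, site_pos: int, min_dist: int = 20) -> float:
    """Mean information content over positions more than min_dist from the site."""
    far = [v for i, v in enumerate(ic) if abs(i - site_pos) > min_dist]
    return sum(far) / len(far) if far else float("nan")


ic = information_content(["AG", "AG", "AT", "AC"])
print(ic)
```

A fully conserved column scores 2 bits and a uniform one 0 bits; positions whose R_i stays well above Ī (relative to its spread) are the ones that plausibly contribute to the splicing signal.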


    Information content vs position

    [Figure panels: Human T2_GT; Human T2_AG]

    Which sequences are exons & which are introns?

    How can you tell?

    Brendel et al. (2004) Bioinformatics 20: 1157

    Brendel 2005


    Bayesian Splice Site Prediction

    Let $S = s_{-l}\, s_{-l+1}\, s_{-l+2} \ldots s_{-1}\,\mathrm{GT}\,s_{1}\, s_{2}\, s_{3} \ldots s_{r}$

    where H indexes the hypotheses of GT or AG at:

    - True site in reading phase 1, 2, or 0

    - False within-exon site in reading phase 1, 2, or 0

    - False within-intron site

    Brendel et al. (2004) Bioinformatics 20: 1157

    Brendel 2005


    Bayes Factor as Decision Criterion

    H0: H = T

    2-class model:

    7-class model (3 true-site phases + 3 false within-exon phases + 1 false within-intron class):

    Brendel et al. (2004) Bioinformatics 20: 1157

    Brendel 2005
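As a rough illustration of using a Bayes factor as the decision criterion, the sketch below takes, for one candidate GT or AG, hypothetical likelihoods P(S | H) and priors P(H) for each class, pools the true-site classes against the false-site classes, and returns posterior odds divided by prior odds. The class names (T1, T2, T0, E1, E2, E0, I), the numbers, and the decision threshold of 1 are illustrative assumptions; the slide's own 2-class and 7-class formulas are not reproduced here.

```python
def bayes_factor(likelihoods: dict, priors: dict, true_classes: set) -> float:
    """Posterior odds / prior odds of 'true site' vs 'false site' (pooled classes)."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}  # unnormalized P(H | S)
    post_true = sum(v for h, v in joint.items() if h in true_classes)
    post_false = sum(v for h, v in joint.items() if h not in true_classes)
    prior_true = sum(v for h, v in priors.items() if h in true_classes)
    prior_false = sum(v for h, v in priors.items() if h not in true_classes)
    return (post_true / post_false) / (prior_true / prior_false)


# 2-class case: true site (T) vs false site (F); hypothetical numbers.
bf2 = bayes_factor({"T": 0.02, "F": 0.005}, {"T": 0.1, "F": 0.9}, {"T"})

# 7-class case: true site in phase 1, 2, 0; false within-exon site in
# phase 1, 2, 0; false within-intron site (hypothetical likelihoods).
classes = ["T1", "T2", "T0", "E1", "E2", "E0", "I"]
lik = {h: (0.01 if h.startswith("T") else 0.002) for h in classes}
prior = {h: 1 / 7 for h in classes}
bf7 = bayes_factor(lik, prior, {"T1", "T2", "T0"})

print(bf2, bf7, "predict true site" if bf7 > 1 else "predict false site")
```

In the 2-class case the priors cancel and the Bayes factor reduces to the likelihood ratio P(S | T) / P(S | F); with several classes per side, the priors still matter for how each side's likelihoods are weighted.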


    PG

    PG

    (1-PG)(1-PD(n+1))

    en

    en+1

    (1-PG)PD(n+1)

    PA(n)PG

    (1-PG)PD(n+1)

    in

    in+1

    1-PA(n)

    Markov Model for Spliced Alignment

    Brendel 2005


    Evaluation of Splice Site Prediction

                          Actual True     Actual False
        Predicted True    TP              FP              PP = TP + FP
        Predicted False   FN              TN              PN = FN + TN
                          AP = TP + FN    AN = FP + TN

    • Sensitivity: Sn = TP/AP (= Coverage)

    • Specificity:

    • Misclassification rates:

    • Normalized specificity:

    Brendel 2005


    Performance?

    [Figure: Sn plots for splice site prediction; panels: Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site]

    • Note: these are not ROC curves (plots of Sn vs (1 - Sp))

      • But plots such as these (& ROCs) are much better than using a "single number" to compare different methods

      • Both types of plots illustrate the trade-off: Sn vs Sp

    Brendel 2005
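To contrast with the Sn-only plots above, here is a minimal sketch of how ROC points (1 - Sp, Sn) and the area under the curve could be computed from prediction scores for candidate sites with known labels. It assumes Sp here means TN/(TN + FP) (the Guigó-style definition, not Brendel's), treats tied scores as a single threshold, and uses hypothetical scores.

```python
def roc_points(scores: list, labels: list) -> list:
    """(1 - Sp, Sn) pairs, sweeping the decision threshold from high to low score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted true
    for idx, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last of a run of tied scores.
        if idx == len(ranked) - 1 or ranked[idx + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: list) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```

Each point is one choice of threshold, which is why a curve shows the Sn vs Sp trade-off that a single summary number hides.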


    Evaluation of Splice Site Prediction

    What do measures really mean?

    Fig 5.11

    Baxevanis & Ouellette 2005


    Careful: different definitions for "Specificity" (TP, FP, FN, TN, PP, PN, AP, AN as in the confusion table above)

    • Brendel definitions:

      • Sensitivity: Sn = TP/AP (= Coverage)

      • Specificity:

    • cf. Guigó definitions:

      • Sn: Sensitivity = TP/(TP+FN)

      • Sp: Specificity = TN/(TN+FP) = Sp-

      • AC: Approximate Correlation = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1

    Other measures? Predictive Values, Correlation Coefficient
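The Guigó-style measures above are straightforward to compute from the four confusion-matrix counts. A minimal sketch, assuming all denominators are non-zero and using made-up counts; PPV = TP/(TP+FP) and NPV = TN/(TN+FN) are the predictive values mentioned as other measures.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sn, Sp (Guigo-style), predictive values, and approximate correlation AC."""
    sn = tp / (tp + fn)                  # sensitivity
    sp = tn / (tn + fp)                  # specificity, TN/(TN+FP)
    ppv = tp / (tp + fp)                 # positive predictive value
    npv = tn / (tn + fn)                 # negative predictive value
    ac = 0.5 * (sn + ppv + sp + npv) - 1  # approximate correlation
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "AC": ac}


print(accuracy_measures(tp=90, fp=10, fn=10, tn=90))
```

Note that PPV, TP/(TP+FP), is the quantity some authors call "specificity", which is exactly the definitional clash the slide warns about.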


    Best measures for comparing different methods?

    • ROC curves (Receiver Operating Characteristic?!!)

    • http://www.anaesthetist.com/mnm/stats/roc/

    • "The Magnificent ROC" - has fun applets & quotes:

      • "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers"

    • Correlation Coefficient

      • Matthews correlation coefficient (MCC)

      • MCC = 1 for a perfect prediction

      • 0 for a completely random assignment

      • -1 for a "perfectly incorrect" prediction

    • Do not memorize this!
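The Matthews correlation coefficient has a standard closed form in terms of the confusion counts; the sketch below (not something to memorize) reproduces the three reference values listed above. Returning 0 when a row or column total is zero is a common convention, assumed here.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention (assumed) when any marginal total is zero
    return (tp * tn - fp * fn) / denom


print(mcc(5, 0, 0, 5), mcc(25, 25, 25, 25), mcc(0, 5, 5, 0))
```

Unlike Sn or Sp alone, MCC uses all four counts, so it stays informative when true sites are much rarer than false candidates.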

    Performance of GeneSeqer vs other methods?

    • Comparison with ab initio gene prediction

      (e.g., GENSCAN)

    • Depends on:

      • Availability of ESTs

      • Availability of protein homologs

    Other Performance Evaluations? Guigó

    http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

    Brendel 2005


    GeneSeqer vs GENSCAN

    (Exon prediction)

    [Figure: Exon (Sn + Sp) / 2 (y-axis, 0.00 to 1.00) vs target protein alignment score (x-axis, 0 to 100) for GeneSeqer, NAP, and GENSCAN]

    GENSCAN - Burge, MIT

    Brendel 2005


    GeneSeqer vs GENSCAN

    (Intron prediction)

    [Figure: Intron (Sn + Sp) / 2 (y-axis, 0.00 to 1.00) vs target protein alignment score (x-axis, 0 to 100) for GeneSeqer, NAP, and GENSCAN]

    GENSCAN - Burge, MIT

    Brendel 2005

    Other Resources

    • Current Protocols in Bioinformatics

    • http://www.4ulr.com/products/currentprotocols/bioinformatics.html

    • Finding Genes

    • 4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations

    • 4.2 Using MZEF To Find Internal Coding Exons

    • 4.3 Using GENEID to Identify Genes

    • 4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes

    • 4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm

    • 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm

    • 4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome

    • 4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences

    • 4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation

    • 4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

    New Today: Promoter Prediction

    • A few more words about Gene prediction

    • Predicting regulatory regions (focus on promoters)

      • Brief review of promoters & enhancers

      • Predicting in eukaryotes vs prokaryotes

    • Introduction to RNA

      • Structure & function

    Predicting Promoters

    What signals are there?

    Algorithms

    Promoter prediction software

    What signals are there? Simple ones in prokaryotes

    Brown Fig 9.17


    BIOS Scientific Publishers Ltd, 1999


    Prokaryotic promoters

    • RNA polymerase complex recognizes promoter sequences located very close to & on 5’ side (“upstream”) of initiation site

    • RNA polymerase complex binds directly to these, with no requirement for “transcription factors”

    • Prokaryotic promoter sequences are highly conserved

      • -10 region

      • -35 region
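Because prokaryotic promoter elements are so well conserved, even a naive consensus scan can locate candidates. The sketch below assumes the textbook E. coli sigma-70 consensus sequences (TTGACA at -35, TATAAT at -10) and a 15 to 19 nt spacer between them; these values, the mismatch cutoff, and the example sequence are assumptions not given on the slide. Real promoter predictors use weight matrices and many more features.

```python
MINUS35 = "TTGACA"  # assumed E. coli sigma-70 consensus, -35 box
MINUS10 = "TATAAT"  # assumed E. coli sigma-70 consensus, -10 box


def mismatches(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq: str, spacers: range = range(15, 20),
                 max_mismatches: int = 3) -> list:
    """Return (total mismatches, -35 start, -10 start) tuples, best hits first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):                  # every 6-nt window for -35
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        for spacer in spacers:
            j = i + 6 + spacer                     # start of the -10 box
            if j + 6 > len(seq):
                break
            total = mm35 + mismatches(seq[j:j + 6], MINUS10)
            if total <= max_mismatches:
                hits.append((total, i, j))
    return sorted(hits)


promoter = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GC"
print(scan_sigma70(promoter)[:3])
```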


    What signals are there?

    Complex ones in eukaryotes!

    Fig 9.13

    Mount 2004


    Simpler view of complex promoters in eukaryotes:

    Fig 5.12

    Baxevanis & Ouellette 2005

    Eukaryotic genes are transcribed by 3 different RNA polymerases

    Recognize different types of promoters & enhancers:

    Brown Fig 9.18


    BIOS Scientific Publishers Ltd, 1999


    Eukaryotic promoters & enhancers

    • Promoters located “relatively” close to initiation site

      (but can be located within gene, rather than upstream!)

    • Enhancers also required for regulated transcription

      (these control expression in specific cell types, developmental stages, in response to environment)

    • RNA polymerase complexes do not specifically recognize promoter sequences directly

    • Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes

    Eukaryotic transcription factors

    • Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription

    • TFs contain characteristic “DNA binding motifs”

      http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039

    • TFs recognize specific short DNA sequence motifs called “transcription factor binding sites”

      • Several databases for these, e.g., TRANSFAC

        http://www.generegulation.com/cgibin/pub/databases/transfac
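Binding-site databases such as TRANSFAC describe TF sites as position frequency matrices; a common way to use one is to convert it to a log-odds position weight matrix (PWM) and slide it along a sequence. In this minimal sketch the "sites" are made-up TATA-like strings rather than real TRANSFAC data, and the pseudocount and uniform background are assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log2-odds weight matrix: one {base: score} dict per site position."""
    background = background or {b: 0.25 for b in BASES}  # uniform background assumed
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(seq, pwm):
    """Score every window of len(pwm) along seq (ACGT only); returns (position, score)."""
    seq = seq.upper()
    width = len(pwm)
    return [(i, sum(pwm[k][seq[i + k]] for k in range(width)))
            for i in range(len(seq) - width + 1)]


sites = ["TATA", "TATA", "TATT"]  # hypothetical aligned binding sites
pwm = build_pwm(sites)
best_pos, best_score = max(scan("GGTATAGG", pwm), key=lambda hit: hit[1])
print(best_pos, round(best_score, 2))
```

Positive window scores mean the window looks more like the sites than like background; in practice a score threshold is needed, because short motifs match by chance very often in long sequences.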

    Zinc finger-containing transcription factors

    • Common in eukaryotic proteins

    • Estimated 1% of mammalian genes encode zinc-finger proteins

    • In C. elegans, there are 500!

    • Can be used as highly specific DNA binding modules

    • Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy

    Brown Fig 9.12

    BIOS Scientific Publishers Ltd, 1999

    Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS)

    Fig 5.14

    Baxevanis & Ouellette 2005
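A global alignment like the one in this figure can be computed with the Needleman-Wunsch dynamic-programming algorithm. The sketch below uses a simple linear gap penalty with arbitrary scores and toy sequences (not the actual obese promoter sequences), and reports percent identity over the aligned columns; the parameters used for the figure are not given here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Traceback from the bottom-right cell recovers one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of all aligned columns."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score, top, bottom, round(percent_identity(top, bottom), 1))
```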
