Solanaceae 2006 BAC Annotation

Solanaceae 2006 BAC Annotation. 2006. 07. 26 Plant Genome Research Center KRIBB, KOREA. Developmental Environments. OS : SGI IRIX 6.5 CPU : MIPS 500MHz 12 CPUs MEM : 12288 MB OS : SUSE Linux 9.0 version 2.6.11.4-21.11-bigsmp CPU : Intel(R) Xeon(TM) CPU 2.80GHz

radwan

Presentation Transcript


  1. Solanaceae 2006 BAC Annotation. 2006. 07. 26. Plant Genome Research Center, KRIBB, Korea

  2. Development Environments
     First system
     • OS: SGI IRIX 6.5
     • CPU: MIPS 500 MHz, 12 CPUs
     • MEM: 12288 MB
     Second system
     • OS: SUSE Linux 9.0, version 2.6.11.4-21.11-bigsmp
     • CPU: Intel(R) Xeon(TM) CPU 2.80 GHz
     • MEM: 6231 MB
     Software
     • DBMS: MySQL 4.0.25
     • Languages and web server: PHP 5.0.4, Perl 5.8.7, Apache 2.0.54

  3. Data Sets
     • BACs (SGN test BACs)
       • Annotated: 10
     • ESTs: 200,015 (cf. 202,043 current)
     • Full-length mRNAs (GenBank): 596
     • Protein DB (UniProt Release 7.7)
       • Swiss-Prot / TrEMBL: 228,917 / 2,914,826
       • Swiss-Prot / TrEMBL (plant): 15,203 / 219,361
     • Arabidopsis proteins
       • Proteins, genomes (TAIR): 30,693
       • GO associated (TAIR): 28,812
       • Pathway / EC associated (KEGG): 1,521
     • Tomato chip data: Tomato Expression Database (Cornell)

  4. Structural Annotation

  5. Functional Annotation

  6. Define gene structure from multiple lines of evidence
     [Figure: alignment tracks labeled Predict, mRNA, EST, Protein]
     • Full-length evidenced genes (mRNAs / proteins)
     • Full-length clue evidenced genes (full-length clue ESTs from the Kazusa full-length cDNA library)
     • Partially evidenced genes (other partial ESTs)
     • No-evidenced genes (prediction only)
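
The four evidence classes above can be read as a priority order. A minimal sketch of that reading follows; the `LocusEvidence` structure, its field names, and the rule that a single supporting record is enough are assumptions made for illustration (the deck gives stricter per-class requirements on later slides).

```python
from dataclasses import dataclass


@dataclass
class LocusEvidence:
    """Counts of evidence aligned to one predicted locus (hypothetical structure)."""
    full_length_mrnas: int = 0
    full_length_proteins: int = 0
    full_length_clue_ests: int = 0   # e.g. ESTs from a full-length cDNA library
    partial_ests: int = 0


def evidence_class(ev: LocusEvidence) -> str:
    """Return the strongest evidence class supported by the counts.

    Assumption: one supporting record is enough to enter a class.
    """
    if ev.full_length_mrnas + ev.full_length_proteins > 0:
        return "full-length evidenced"
    if ev.full_length_clue_ests > 0:
        return "full-length clue evidenced"
    if ev.partial_ests > 0:
        return "partially evidenced"
    return "no-evidenced"


if __name__ == "__main__":
    print(evidence_class(LocusEvidence(partial_ests=3)))
```

Checking the classes from strongest to weakest means a locus with both an mRNA and partial ESTs is reported once, under the stronger class.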

  7. 1) Full-length Evidenced Genes
     [Figure: sample locus with tracks labeled Predicted Genes, ESTs, mRNAs, Predict, mRNA, TIGR TC, Protein, stackPACK]
     • Gene locus with full-length mRNA / protein (GMAP, GeneWise)
     • Almost complete gene structure: gene boundary (mRNA: TSS / poly-A; protein: CDS), exon / intron, some alternative splicing structure
     • Requirement: more than 1 mRNA or protein
     • Processing:
       • Merge the same AS forms
       • mRNA evidence: predict CDS (ESTscan etc.)
       • Protein evidence: mend gene boundary (TSS, poly-A)

  8. 2) Full-length Clue Evidenced Genes
     [Figure: sample locus with tracks labeled Predicted Genes, Full-length Clue ESTs (Kazusa), ESTs, Predict, EST]
     • Gene locus with full-length clue ESTs from the Kazusa full-length cDNA library (GMAP)
     • Gene boundary (TSS, poly-A), some exon / intron
     • Requirement: more than 1 full-length clue EST
     • Processing:
       • Merge the same AS forms
       • Link the same-cloned ESTs
       • Mend incomplete portions with the predicted model
       • CDS to be predicted (ESTscan / orfPredictor etc.)

  9. 3) Partially Evidenced Genes
     [Figure: sample locus with tracks labeled Predicted Genes, ESTs, Predict, EST1, EST2]
     • Gene locus with general ESTs (GMAP)
     • Some exon / intron, poly-A
     • More ESTs, more information expected
     • Requirement: more than 2 ESTs with more than 2 pairs of overlapping hard edges
     • Processing:
       • Merge the same AS forms
       • Link the same-cloned ESTs
       • Mend incomplete portions with the predicted model
       • CDS to be predicted (ESTscan / orfPredictor etc.)

  10. 4) No-evidenced Genes
      [Figure: sample locus with a Predict track only, labeled "No Evidence !!"]
      • Predicted model only (hypothetical gene)
      • Predicted CDS

  11. Gene Structure Annotation: Problems
      • False positive intergenic region: 2 annotated genes actually correspond to a single gene
      • False negative intergenic region: one annotated gene structure actually contains 2 genes
      • False negative gene prediction: missing gene (no annotation)
      • Other:
        • Partially incorrect gene annotation
        • Missing annotation of alternative transcripts (alternative splicing)
        • Pseudogenes
        • Promoter / regulatory elements

  12. Estimated Gene Prediction
      1) Hexamer signal A(A/U)UAAA: predict polyadenylation signals (PASes) from hexamers
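
As an illustration of hexamer-based PAS detection, the sketch below scans a sequence for polyadenylation-signal hexamers. It assumes the two canonical hexamers AAUAAA and AUUAAA (searched as DNA, AATAAA and ATTAAA); the function name `find_pas` is hypothetical, and the deck does not say which tool or hexamer set was actually used.

```python
import re
from typing import List, Tuple

# Assumption: canonical PAS hexamers AAUAAA / AUUAAA, written as DNA.
# The lookahead lets overlapping candidates be reported as well.
PAS_PATTERN = re.compile(r"(?=(A[AT]TAAA))")


def find_pas(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, hexamer) for each PAS-like hexamer in seq."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA input
    return [(m.start(), m.group(1)) for m in PAS_PATTERN.finditer(dna)]


if __name__ == "__main__":
    for start, hexamer in find_pas("GGAATAAACCATTAAAGG"):
        print(start, hexamer)
```

A hit on its own is weak evidence; in practice it would be combined with the position of the 3' end supported by ESTs or mRNAs.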

  13. Gene Structure Browser
      [Figure: browser tracks labeled FGENESH, GENSCAN, Protein, Repeats / Domain, mRNA, dbESTs, TIGR TC, Kazusa Full ESTs, Unigene]
      • Tested aligners: BLAT / SIM4 / GMAP / GeneSeqer
        • BLAT: fast but inaccurate
        • SIM4 / GMAP / GeneSeqer: approximately the same results
      • KRIBB: prefilter ESTs with BLAT, then GMAP
      • Cutoff: coverage > 80%, identity > 90%
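
The coverage and identity cutoff can be sketched as a simple filter over EST alignments. The `Alignment` record and the two definitions used here (coverage as aligned EST bases over EST length, identity as matching bases over aligned bases) are assumptions; the deck states only the thresholds.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One EST-to-BAC alignment (hypothetical record, not an aligner's format)."""
    est_id: str
    est_length: int      # total length of the EST
    aligned_bases: int   # EST bases inside aligned blocks
    matches: int         # aligned bases identical to the genome

    @property
    def coverage(self) -> float:
        return 100.0 * self.aligned_bases / self.est_length

    @property
    def identity(self) -> float:
        if self.aligned_bases == 0:
            return 0.0
        return 100.0 * self.matches / self.aligned_bases


def passes_cutoff(aln: Alignment,
                  min_coverage: float = 80.0,
                  min_identity: float = 90.0) -> bool:
    """Keep alignments with coverage > 80% and identity > 90% (strict, as on the slide)."""
    return aln.coverage > min_coverage and aln.identity > min_identity


if __name__ == "__main__":
    hits = [Alignment("est1", 500, 450, 440), Alignment("est2", 500, 400, 390)]
    print([a.est_id for a in hits if passes_cutoff(a)])
```

Both comparisons are strict, so an alignment at exactly 80% coverage is rejected.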

  14. Click !!

  15. Click !!

  16. Functional Annotation: Protein DB / EC / GO

  17. Functional Annotation: Protein DB / GO, TFBS / Promoter

  18. Functional Annotation: TargetP / TMHMM, Enzyme / Pathway, Domain / Motif

  19. Expression Annotation (Digital Expression)
      Principle: identify differentially expressed genes with the hypergeometric test
      • N: ESTs for all genes in all tissues
      • n: ESTs for the selected gene in all tissues
      • K: ESTs for all genes in the selected tissue
      • k: ESTs for the selected gene in the selected tissue
      • P: significance of over- or under-expression in the selected tissue
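
With the symbols N, n, K, and k as defined on this slide, the hypergeometric tail probability can be sketched as below. The function name `hypergeom_tail_p` and the choice of one-sided tails (P(X >= k) for over-expression, P(X <= k) for under-expression) are assumptions; the deck does not show its exact formula.

```python
from math import comb


def hypergeom_tail_p(k: int, N: int, K: int, n: int, over: bool = True) -> float:
    """One-sided hypergeometric p-value.

    N: ESTs for all genes in all tissues
    n: ESTs for the selected gene in all tissues
    K: ESTs for all genes in the selected tissue
    k: ESTs for the selected gene in the selected tissue
    over=True  -> P(X >= k), evidence of over-expression
    over=False -> P(X <= k), evidence of under-expression
    """
    lo = max(0, n - (N - K))   # smallest possible count in the selected tissue
    hi = min(n, K)             # largest possible count in the selected tissue
    if over:
        support = range(max(k, lo), hi + 1)
    else:
        support = range(lo, min(k, hi) + 1)
    # Sum exact integer counts first, then divide once to limit rounding error.
    ways = sum(comb(K, i) * comb(N - K, n - i) for i in support)
    return ways / comb(N, n)


if __name__ == "__main__":
    print(hypergeom_tail_p(k=2, N=10, K=4, n=3))
```

Working with exact integers keeps the sketch dependency-free; for large EST collections a library routine working in log space would be faster.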

  20. Expression Annotation (Array Chip)

  21. Expression Annotation (Tissue-Specific Genes)
      Principle: identify differentially expressed genes with Audic's test
      • x: number of cognate ESTs of a given gene in the selected library
      • N1: size of the selected library
      • y: number of cognate ESTs of a given gene in the other library
      • N2: size of the other library
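
Assuming "Audic's test" refers to the Audic and Claverie statistic, the probability of observing y ESTs in the other library given x in the selected library is p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)), with N1 and N2 taken as the total EST counts of the two libraries. The sketch below evaluates it in log space; the function names are hypothetical.

```python
from math import exp, lgamma, log


def audic_claverie_pmf(y: int, x: int, n1: int, n2: int) -> float:
    """p(y | x) for library sizes n1 (selected) and n2 (other)."""
    r = n2 / n1
    log_p = (y * log(r)
             + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
             - (x + y + 1) * log(1.0 + r))
    return exp(log_p)


def audic_claverie_upper_tail(y: int, x: int, n1: int, n2: int) -> float:
    """P(Y >= y | x): chance of seeing at least y ESTs in the other library."""
    below = sum(audic_claverie_pmf(i, x, n1, n2) for i in range(y))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    print(audic_claverie_upper_tail(y=2, x=0, n1=1000, n2=1000))
```

A small upper-tail value suggests the gene is more abundant in the other library than the selected-library count would predict; the lower tail is handled symmetrically.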

  22. Pepper tissue-specific gene analysis
      [Figure: expression panel; 25 cycles, annealing temperature 55 °C; numbers in parentheses are numbers of ESTs]
      • Lane and group labels: Fruit, Floral bud, Breaker, Flower, stem, Bark, Leaf, root, M.G, Xag, M.R, Buf, IM, Flower, Pathogen
      • Genes: CaActin, CacnA (16), CacnB (18), CacnC (13), CacnD (10), CacnE (25), CacnF (31), CacnG (20)

  23. Annotation Results

  24. Thanks!
      Solanaceae 2006 BAC Annotation test pages:
      • http://crop.kribb.re.kr/SOL-Test/
      • http://sol.kribb.re.kr/
