Functional and structural genomics using PEDANT


Yang-Ming Institute of Biotechnology, Bioinformatics Program. 林千涵

Introduction

  • As biological sequence data grow, a system is needed that can store and retrieve tens of gigabytes of data, backed by a mature database management system and good visualization tools

  • Shift from case-oriented sequence analysis to automated large-scale genome annotation

Introduction: PEDANT

  • Differences among existing genome analysis programs

    • protein-oriented vs. DNA-oriented analysis

    • interactive work vs. command-line operation

    • bioinformatics methods applied

    • user interface

    • convenience features: project management and data editors

    • fidelity of the results produced

  • Benchmarks may vary depending on the chosen balance between sensitivity and selectivity of the analyses

  • PEDANT (Protein Extraction, Description, and ANalysis Tool) became available in mid-1997 (using FASTA for similarity searches)

    • a workhorse for general bioinformatics research

    • a common framework for a number of genome analysis projects

    • a complete database of automatically annotated genomes

    • a tool for routine analysis of large amounts of genomic contigs and ESTs

System Architecture

  • Overview

    • database module: storing, modifying and accessing data

    • processing module: bioinformatics computations

    • user interface: web based communication

System Architecture-Cont.

  • Data access

    • primary tables: store raw data (e.g. DNA and protein sequences) and raw program results (e.g. BLAST output)

    • secondary table: parsed program results

    • simplified schema

  • Operation in command-line mode

    • applying bioinformatics methods to sequences

    • parsing data tables

    • querying the resulting databases

  • Web interface

    • No static HTML pages required

    • DNA and protein viewers access the SQL tables directly

  • Implementation and system requirements

    • Perl 5, with C++ for the graphical viewers

  • Performance

    • parallel capabilities
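The split between primary and secondary tables can be sketched as follows. This is a minimal illustration, not PEDANT's actual schema: the table names, column names and toy tab-separated report format are all assumptions, and Python's built-in sqlite3 stands in for the SQL backend.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not PEDANT's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blast_raw  (orf_id TEXT PRIMARY KEY, output TEXT);   -- primary: raw program output
CREATE TABLE blast_hits (orf_id TEXT, subject TEXT, evalue REAL); -- secondary: parsed results
""")


def parse_hits(raw):
    """Parse a toy report with one 'subject<TAB>evalue' pair per line."""
    hits = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        subject, evalue = line.split("\t")
        hits.append((subject, float(evalue)))
    return hits


def store_result(conn, orf_id, raw):
    """Keep the raw output and fill the parsed table from it."""
    conn.execute("INSERT INTO blast_raw VALUES (?, ?)", (orf_id, raw))
    conn.executemany(
        "INSERT INTO blast_hits VALUES (?, ?, ?)",
        [(orf_id, subject, evalue) for subject, evalue in parse_hits(raw)],
    )


store_result(conn, "orf0001", "P12345\t1e-50\nQ67890\t0.5\n")

# Queries run against the parsed table; the raw output stays available for re-parsing.
significant = conn.execute(
    "SELECT subject FROM blast_hits WHERE orf_id = ? AND evalue < ? ORDER BY evalue",
    ("orf0001", 1e-5),
).fetchall()
```

Keeping the raw output means the secondary table can be rebuilt whenever the parser changes, without rerunning the search.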

Bioinformatics Method

  • Overview of the PEDANT processing pipeline

    • identification of coding regions and analysis of various genetic elements

    • homology search

    • detection of protein motifs, prediction of secondary structure and other protein features, and sensitive fold recognition

    • automatic attribution of gene products to pre-defined functional categories

  • Prediction of genes and other genetic elements

    • Table 1

    • choose one of 15 genetic codes


  • Functional and structural categories

    • similarity search: PSI-BLAST (Position-Specific Iterated BLAST)

    • special datasets: MIPS, COG, PROSITE, PFAM and BLOCKS

    • for significant PIR matches: annotations, keywords, enzyme classification and superfamily information

    • for proteins with a significant relationship to PDB entries, secondary structure information: STRIDE (upper case), PREDATOR (lower case)

    • low-complexity regions, membrane regions, coiled coils and signal peptides

    • comparison against SCOP using IMPALA
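To illustrate why the choice of genetic code matters for gene prediction, here is a minimal sketch of a forward-strand ORF scan with a selectable translation table. It is not the gene finder PEDANT uses. Only two of the NCBI translation tables are included (table 1, the standard code, and table 4, where TGA encodes tryptophan), and the function names and the `min_codons` parameter are illustrative.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code), amino acids in TCAG codon order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_code(overrides=None):
    """Build a codon -> amino acid map from the standard code plus overrides."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    table.update(overrides or {})
    return table


GENETIC_CODES = {
    1: make_code(),              # standard code
    4: make_code({"TGA": "W"}),  # e.g. Mycoplasma/Spiroplasma: TGA is Trp, not stop
}


def find_orfs(dna, code_id=1, min_codons=3):
    """Return (start, end, protein) for ATG-to-stop ORFs in the three forward frames."""
    table = GENETIC_CODES[code_id]
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and table.get(codon, "X") == "*":
                if (i - start) // 3 >= min_codons:
                    protein = "".join(
                        table.get(dna[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    orfs.append((start, i + 3, protein))
                start = None
    return orfs


seq = "ATGAAATGAGGGTAA"
orfs_standard = find_orfs(seq, code_id=1, min_codons=2)  # stops at the in-frame TGA
orfs_table4 = find_orfs(seq, code_id=4, min_codons=2)    # reads through TGA as Trp
```

The same sequence gives a shorter ORF under the standard code than under table 4, so picking the wrong table truncates or merges predicted genes.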



Bioinformatics Method-Cont.

  • Yeast biological role categories

    • first system of biological role categories: E. coli

    • MIPS: advanced hierarchical functional catalogue (yeast)

    • Multidimensionality: the protein:gene relationship is many-to-many (M:M)

    • automated assignment to MIPS categories is a first approximation, to be refined by manual annotation

    • Distribution of ORFs

  • Visualization

    • an integrated, hypertext-linked protein report with calculated parameters and sequences, serving as a reference for further manual annotation

    • Protein report page

Bioinformatics Method-Cont.2

  • Automatic versus manual annotation

    • Problem of error propagation

      • erroneous annotation caused by human error and spurious similarity hits

      • can filtering algorithms and domain structure analysis contain it?

      • manual review by human experts improves quality

    • Manual annotation

      • catalogue-independent

      • Flexibility: a protein is first placed in a higher-level category and moved to finer categories in a later step

    • 528 categories: 20 main categories and 6 levels

    • confidence levels: “reject”, “low”, “medium”, “high”; the default is “auto”

  • Data release management

    • newly released data can be intelligently merged with the existing data pool

    • manual annotation is transferred between subsequent data releases

    • “manual” field: “yes” or “no”; initially “no” by default

    • example: a PFAM domain identified in an ORF of a new release gets “manual: no” and “conf: auto”
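The “manual” and “conf” fields suggest a simple merge rule, sketched below under stated assumptions: curated entries (manual = “yes”) survive a new release, and everything else is replaced by the freshly computed entries. The record layout, the function name and the decision to keep curated entries even when the new release no longer reports them are illustrative choices, not PEDANT's documented behaviour.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("reject", "low", "medium", "high", "auto")


@dataclass(frozen=True)
class Annotation:
    orf_id: str
    feature: str        # e.g. a PFAM domain identifier
    manual: str = "no"  # "yes" once a curator has reviewed the entry
    conf: str = "auto"  # one of CONFIDENCE_LEVELS


def merge_release(existing, new_release):
    """Keep curated entries; take all other entries from the new release."""
    merged = {(a.orf_id, a.feature): a for a in existing if a.manual == "yes"}
    for a in new_release:
        # A fresh automatic entry never overwrites a curated one.
        merged.setdefault((a.orf_id, a.feature), a)
    return sorted(merged.values(), key=lambda a: (a.orf_id, a.feature))


old = [
    Annotation("orf1", "PF00001", manual="yes", conf="high"),
    Annotation("orf1", "PF00002"),  # automatic, not confirmed by the new release
]
new = [
    Annotation("orf1", "PF00001"),  # recomputed: manual "no", conf "auto"
    Annotation("orf2", "PF00003"),
]
merged = merge_release(old, new)
```

Here the curated “high” call on orf1 is preserved, the unconfirmed automatic hit is dropped, and the new ORF enters with the defaults.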

Manual annotation transfer

  • Two genes fuse to one contig

  • Two contigs fuse into one

  • Gene boundaries change

  • A new gene appears
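One way to tell some of these cases apart is to compare gene coordinates between releases. The sketch below assumes both releases share one coordinate system on the same contig, so it does not cover fused contigs, which would first need a coordinate remapping. The function names and case labels are illustrative, and reading a gene fusion as "one new gene overlapping several old ones" is an interpretation of the slide.

```python
def overlaps(a, b):
    """True if two half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(new_gene, old_genes):
    """Label a new-release gene by how it relates to the old-release genes."""
    hits = [name for name, span in old_genes.items() if overlaps(new_gene, span)]
    if not hits:
        return "new gene", hits
    if len(hits) > 1:
        return "fusion", hits
    if old_genes[hits[0]] == new_gene:
        return "unchanged", hits
    return "boundary change", hits


old_genes = {"g1": (0, 300), "g2": (350, 900), "g3": (1000, 1500)}

fused = classify((0, 900), old_genes)        # spans g1 and g2
shifted = classify((1000, 1450), old_genes)  # same locus as g3, new end
novel = classify((2000, 2600), old_genes)    # no old counterpart
```

The returned list of old gene names indicates where manual annotation could be carried over from; for a fusion, a curator would still have to decide how to combine the two sources.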

The PEDANT Genome Database

  • Annotation of publicly available completely sequenced and unfinished genomes

    • Genomes annotated by MIPS

    • Completely sequenced and published genomic sequences

    • Unfinished and/or unpublished genomic sequences

    • gene prediction by ORPHEUS, allowing large overlaps between ORFs

  • PEDANT as a structural genomics resource (0.3M proteins)

    • class-based approach, cost-saving

    • (i) non-redundant protein sequence database

    • (ii) PSI-BLAST searches with SCOP domain sequences against (i), saving the resulting profiles

    • (iii) construct a SCOP profile library using IMPALA

    • (iv) IMPALA search with each genomic sequence against the SCOP library

    • same procedure for the non-redundant PDB sequence database

    • performance of IMPALA

  • Cross-genome comparison

    • treat each genome as an individual contig: create cross-genome datasets without any modification

    • 44 genomes

Applications

  • Arabidopsis thaliana chromosome IV

    • 3744 predicted protein coding genes

    • roughly 30% are known proteins or strongly similar to known proteins

    • multicellular organisms have a higher proportion of all-alpha and a lower proportion of mixed alpha/beta structural domains than unicellular species

  • Assembled human transcripts

    • the human UniGene set was subjected to PEDANT analysis, comprising over 75000 contigs

    • the resulting MySQL database is close to 8 GB

    • acceptable query times show the suitability of PEDANT for large-scale EST sequencing projects

  • Analysis of the GroEL substrates

    • GroEL: a common E. coli chaperonin

    • structural motif common to 52 substrates that rely on GroEL for folding in vivo: two or more alpha/beta domains involving buried beta-sheets with large hydrophobic surfaces, which makes them prone to aggregation

Classification of predicted genes

  • Classification by the degree of homology to functionally characterized proteins based on BLAST scores
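A score-based classification of this kind can be sketched as a lookup over descending cut-offs. The slide gives neither the thresholds nor the exact class names, so every number and label below is a hypothetical placeholder chosen only to show the mechanism.

```python
# Hypothetical score cut-offs, highest first; the slide does not give the real ones.
CLASSES = [
    (500.0, "known or nearly identical protein"),
    (200.0, "strong similarity to a known protein"),
    (80.0, "similarity to a known protein"),
    (0.0, "no significant similarity"),
]


def classify_by_score(best_score):
    """Map the best BLAST score against characterized proteins to a class label.

    best_score is None when the search returned no hit at all.
    """
    if best_score is None:
        return CLASSES[-1][1]
    for cutoff, label in CLASSES:
        if best_score >= cutoff:
            return label
    return CLASSES[-1][1]


labels = [classify_by_score(s) for s in (650.0, 100.0, None)]
```

Because the cut-offs live in one table, the balance between sensitivity and selectivity can be changed without touching the classification logic.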

Summary and Outlook

  • PEDANT is a useful tool for genome annotation and bioinformatics research

  • It supports automated and manual assignment of gene products to functional and structural categories

  • It provides extensive hyperlinked protein reports and advanced viewers

  • Outlook

    • better decision rules need to be employed

    • manual annotation of predicted genetic elements (e.g. LTRs)

    • support for the Oracle RDBMS

    • automatic gene prediction pipeline for higher eukaryotes

    • interactive capabilities