
A data model for Comparative Genomics

PhD Student: Luciano Antonio Digiampietri. Advisor: João Carlos Setubal. Co-advisor: Cláudia Maria Bauzer Medeiros. Laboratory for Bioinformatics (LBI), Institute of Computing (IC) - UNICAMP.


Presentation Transcript


  1. A data model for Comparative Genomics • PhD Student: Luciano Antonio Digiampietri • Advisor: João Carlos Setubal • Co-advisor: Cláudia Maria Bauzer Medeiros • Laboratory for Bioinformatics (LBI), Institute of Computing (IC) - UNICAMP

  2. History: plant-associated bacteria • In 2002 the following genomes were compared: Agrobacterium tumefaciens; Mesorhizobium loti; Ralstonia solanacearum; Sinorhizobium meliloti; Xanthomonas axonopodis pv. citri; Xanthomonas campestris pv. campestris; Xylella fastidiosa cvc; Xylella fastidiosa Temecula1 • The comparison was carried out by M. A. Van Sluys, C. B. Monteiro-Vitorello, L. E. A. Camargo, C. F. M. Menck, A. C. R. da Silva, J. A. Ferro, M. C. Oliveira, J. C. Setubal, J. P. Kitajima, and A. J. G. Simpson.

  3. To support the comparison, a database was created: the PAB database • Main author: J. P. Kitajima • Publication: M. A. Van Sluys, C. B. Monteiro-Vitorello, L. E. A. Camargo, C. F. M. Menck, A. C. R. da Silva, J. A. Ferro, M. C. Oliveira, J. C. Setubal, J. P. Kitajima, and A. J. G. Simpson. Comparative genomic analysis of plant-associated bacteria. Annual Review of Phytopathology, 40, 169-189, 2002. • This publication presents analysis results, not a description of the database

  4. This work • PAB database overhaul: redesign; repopulation (data reload); inclusion of new query and visualization tools • PAB database description (there was none) • Results: the database is now much more flexible; it can be used as a building block of larger information systems; it is scalable; it is much easier to include more genomes

  5. Motivation for the work • Growing number of complete bacterial genomes: today there are about 130 complete genomes; in a few years there will be more than 1000 • The genomes of several species of a genus, and even of several strains of the same species, have been sequenced. • This data growth has made it necessary to develop new systems and tools for comparative genomics. • The new systems must be flexible and scalable

  6. Scope (diagram, from small sets of genomes to large sets of genomes) • Strains: Xylella fastidiosa (citrus, grape, almond, oleander) • Species: Xanthomonas (axonopodis pv. citri, campestris pv. campestris, oryzae, vesicatoria) • Plant-associated bacteria: Agrobacterium tumefaciens, Sinorhizobium meliloti, Xanthomonas axonopodis pv. citri, Xylella fastidiosa cvc • All microbial

  7. Basic concepts: Replicon • Any kind of cell unit that contains genetic information (e.g. chromosomes, plasmids and mitochondria) • Example (figure): Synechocystis sp. PCC 6803, with a chromosome and the plasmids pSYSM, pSYSA and pSYSX

  8. Basic concepts: Homology • Homology: two genes are homologous if they share a common ancestor. • Figure: homologous genes

  9. Basic concepts: Homology (II) • Paralogous genes are two (or more) homologous genes in the same organism. • Orthologous genes are homologous genes that belong to different organisms. • Figure: paralogous genes within organism1; orthologous genes between organism1 and organism2
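The two definitions above can be illustrated with a minimal Python sketch. It assumes that homology has already been decided elsewhere and is encoded as a shared family identifier; the `Gene` class, its fields, and the `relationship` function are hypothetical names for illustration, not part of PABdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    organism: str
    family_id: int  # assumption: genes with the same family_id are homologous


def relationship(a: Gene, b: Gene) -> str:
    """Classify a pair of genes using the slide's definitions."""
    if a.family_id != b.family_id:
        return "not homologous"
    # Homologous genes: same organism -> paralogous, different -> orthologous.
    return "paralogous" if a.organism == b.organism else "orthologous"


g1 = Gene("g1", "organism1", family_id=1)
g2 = Gene("g2", "organism1", family_id=1)
g3 = Gene("g3", "organism2", family_id=1)
g4 = Gene("g4", "organism2", family_id=2)
```

With these four example genes, `g1` and `g2` are paralogous, `g1` and `g3` are orthologous, and `g1` and `g4` are not homologous.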

  10. Basic concepts: gene family

  11. Basic concepts: functional category • I - Intermediary metabolism: Degradation (of polysaccharides and oligosaccharides; of small molecules; of lipids); Central intermediary metabolism; Energy metabolism, carbon; Regulatory functions • II - Biosynthesis of small molecules • III - Macromolecule metabolism • IV - Cell structure • V - Cellular processes • VI - Mobile genetic elements • VII - Pathogenicity, virulence, and adaptation • VIII - Hypothetical

  12. Motivating queries • Given two or more genomes, which genes do they share, and to which families do those genes belong? • Given two or more genomes, which genes are specific to one genome in relation to the others, and to which families do they belong? • Given a gene x from an organism not in the system, does it have homologs in the system? If so, how many?

  13. Diagram: genomes (G1, G2, ..., Gk), their replicons (R1, R2, ..., Rp) and the genes in each replicon; genes from different replicons are grouped into families (Family1, Family2), with a Category above the families

  14. Attributes • Attributes are based on GenBank data • Genome: id, strain, source, taxid, description • Replicon: id, genome_id, description, sequence • Gene: id, replicon_id, start_pos, end_pos, gene_synonym, orientation, product, name, gi, category

  15. Conceptual model (diagram) • Entities: Genome, Replicon, Gene, Gene Family, Category, BLAST Hits • Cardinality labels shown in the diagram: 1..N, 1..N, N..N, 2:N, 1:N

  16. Tables and relationships

      | Table | Columns |
      |---|---|
      | genome_tbl | genome_id, genome_strain, genome_source, genome_taxid, genome_description, genome_pab |
      | replicon_tbl | replicon_id, genome_id, replicon_description, replicon_sequence |
      | gene_tbl | gene_id, gene_start_pos, gene_end_pos, replicon_id, gene_synonym, gene_orientation, gene_product, gene_name, gene_category, gene_category_sec, gene_gi |
      | gene_blast_tbl | gene_id, blast_type, blast_db, blast_order, blast_gene_id, blast_tax_id, blast_qu_cover, blast_sj_cover, blast_idty, blast_description |
      | family_tbl | family_id, description |
      | gene_family_tbl | family_id, gene_id, genome_id |
      | category_tbl | categ_id, categ_description |
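As a rough sketch, the tables on this slide could be declared as follows with Python's built-in `sqlite3` module. Table and column names are taken from the slide; the column types, primary keys and foreign-key references are assumptions made for illustration, since the slide does not state them, and the DBMS actually used by PABdb is not specified.

```python
import sqlite3

# Column names come from the slide; types and key constraints are assumptions.
SCHEMA = """
CREATE TABLE genome_tbl (
    genome_id TEXT PRIMARY KEY,
    genome_strain TEXT, genome_source TEXT, genome_taxid INTEGER,
    genome_description TEXT, genome_pab INTEGER
);
CREATE TABLE replicon_tbl (
    replicon_id TEXT PRIMARY KEY,
    genome_id TEXT REFERENCES genome_tbl(genome_id),
    replicon_description TEXT, replicon_sequence TEXT
);
CREATE TABLE category_tbl (
    categ_id TEXT PRIMARY KEY,
    categ_description TEXT
);
CREATE TABLE gene_tbl (
    gene_id TEXT PRIMARY KEY,
    gene_start_pos INTEGER, gene_end_pos INTEGER,
    replicon_id TEXT REFERENCES replicon_tbl(replicon_id),
    gene_synonym TEXT, gene_orientation TEXT, gene_product TEXT,
    gene_name TEXT, gene_category TEXT, gene_category_sec TEXT,
    gene_gi INTEGER
);
CREATE TABLE gene_blast_tbl (
    gene_id TEXT REFERENCES gene_tbl(gene_id),
    blast_type TEXT, blast_db TEXT, blast_order INTEGER,
    blast_gene_id TEXT, blast_tax_id INTEGER,
    blast_qu_cover REAL, blast_sj_cover REAL, blast_idty REAL,
    blast_description TEXT
);
CREATE TABLE family_tbl (
    family_id INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE gene_family_tbl (
    family_id INTEGER REFERENCES family_tbl(family_id),
    gene_id TEXT REFERENCES gene_tbl(gene_id),
    genome_id TEXT REFERENCES genome_tbl(genome_id),
    PRIMARY KEY (family_id, gene_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all seven tables in the given database connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Keeping `genome_id` in `gene_family_tbl`, as the slide does, lets genome-level queries over families avoid joining through `gene_tbl` and `replicon_tbl`.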

  17. PABdb information system • Plant Associated Bacteria Database • Main objectives: management of genome data; comparison among genomes; clustering of genes into gene families and categories • Allows easy inclusion of new comparison tools

  18. System overview (diagram) • Components: structured files; DBMSs; converters of data; local DBMS; BLAST, category and family operations; user tools

  19. Gene Families and Categories • Gene families were created from BLAST results using an undirected graph model G: the connected components of G are the families • Gene categories were assigned by automatic methods and by a human curator
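The connected-components idea can be sketched in a few lines of Python. This sketch assumes that each vertex of G is a gene and that an edge joins two genes whenever BLAST reports an accepted hit between them; the slide does not give the hit-acceptance criteria, so the `hits` list below simply stands in for already-filtered BLAST results. The function name `gene_families` and the union-find approach are illustrative choices, not necessarily what PABdb implements.

```python
from collections import defaultdict


def gene_families(genes, hits):
    """Return the connected components of the undirected graph whose
    vertices are genes and whose edges are (gene_a, gene_b) hit pairs."""
    parent = {g: g for g in genes}
    for a, b in hits:  # make sure genes that appear only in hits are included
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for g in parent:
        groups[find(g)].append(g)
    return sorted(sorted(members) for members in groups.values())


families = gene_families(
    genes=["a", "b", "c", "d", "e", "f"],
    hits=[("a", "b"), ("b", "c"), ("d", "e")],
)
```

Here `a`, `b` and `c` end up in one family even though `a` and `c` have no direct hit, `d` and `e` form a second family, and `f` is a singleton. Transitive grouping is a property of connected components in general, which is one reason alternative family-generation methods are worth exploring.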

  20. PABdb – tools • Query tools: query facilitators • Visualization tools: genome overview; comparison of orthologous genes of two genomes

  21. Query facilitator (diagram) • Example question: What are the genes in Xanthomonas axonopodis pv. citri and Xylella fastidiosa cvc and not in Xanthomonas campestris pv. campestris and Xylella fastidiosa Temecula1? • Components: query facilitator; search mechanism; SQL query; DBMS; result table; XML result file; browser
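One way such a question could be turned into SQL is sketched below in Python with `sqlite3`. It assumes that "genes in A and B and not in C and D" means genes whose family has members in every listed "in" genome and in none of the "not in" genomes, and it queries only `gene_family_tbl` (family_id, gene_id, genome_id). The function name, the short genome identifiers and the sample rows are hypothetical; the SQL actually generated by the PABdb search mechanism is not shown in the slides.

```python
import sqlite3


def genes_in_and_not_in(conn, present, absent):
    """Genes of the `present` genomes whose family occurs in all of them
    and in none of the `absent` genomes. Both lists must be non-empty."""
    p = ",".join("?" * len(present))
    a = ",".join("?" * len(absent))
    sql = f"""
        SELECT family_id, gene_id, genome_id
        FROM gene_family_tbl
        WHERE genome_id IN ({p})
          AND family_id IN (
              SELECT family_id FROM gene_family_tbl
              WHERE genome_id IN ({p})
              GROUP BY family_id
              HAVING COUNT(DISTINCT genome_id) = ?)
          AND family_id NOT IN (
              SELECT family_id FROM gene_family_tbl
              WHERE genome_id IN ({a}))
        ORDER BY family_id, gene_id
    """
    # Parameters follow the order of the placeholders in the SQL text.
    params = [*present, *present, len(present), *absent]
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_family_tbl (family_id INTEGER, gene_id TEXT, genome_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene_family_tbl VALUES (?, ?, ?)",
        [
            (1, "xac_g1", "Xac"), (1, "xf_g1", "Xfcvc"),   # in both, absent elsewhere
            (2, "xac_g2", "Xac"), (2, "xf_g2", "Xfcvc"),
            (2, "xcc_g2", "Xcc"),                           # also in an excluded genome
            (3, "xac_g3", "Xac"),                           # missing from Xfcvc
        ],
    )
    return conn


rows = genes_in_and_not_in(demo_connection(), ["Xac", "Xfcvc"], ["Xcc", "XfTem"])
```

In this toy data only family 1 qualifies, so `rows` holds its two genes. Values are passed as bound parameters rather than formatted into the SQL string; only the number of placeholders is generated dynamically.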

  22. Screenshot (1) – search tool

  23. Genes in Xanthomonas axonopodis pv. citri and Xylella fastidiosa cvc and not in Xanthomonas campestris pv. campestris and Xylella fastidiosa Temecula1

      | family_id | gene_id | categ_id | product |
      |---|---|---|---|
      | 2288 | Xac-chromosome | I.D.2 | transcriptional regulator |
      | 2288 | Xfcvc-chromosome | I.D | transcriptional regulator |
      | 2730 | Xac-chromosome | VI.B | plasmid stability protein |
      | 2730 | Xfcvc-chromosome | VI.B | plasmid stabilization protein |
      | 2739 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 2739 | Xfcvc-pXF51 | VIII.A | conserved hypothetical protein |
      | 3402 | Xac-chromosome | I.C.3 | cytochrome like B561 |
      | 3402 | Xfcvc-chromosome | I.C.3 | cytochrome B561 |
      | 4520 | Xac-chromosome | VI.A | phage-related integrase |
      | 4520 | Xfcvc-chromosome | VI.A | phage-related integrase |
      | 5376 | Xac-chromosome | V.B | chromosome partitioning related protein |
      | 5376 | Xfcvc-chromosome | V.B | chromosome partitioning related protein |
      | 5377 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5377 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5377 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5378 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5378 | Xfcvc-chromosome | VIII.A | conserved hypothetical protein |
      | 5379 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5379 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5380 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5380 | Xfcvc-chromosome | VIII.A | hypothetical protein |

  24. Query result (continued)

      | family_id | gene_id | categ_id | product |
      |---|---|---|---|
      | 5381 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5381 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5382 | Xac-chromosome | III.A.2 | single-stranded DNA binding protein |
      | 5382 | Xfcvc-chromosome | III.A.2 | single-stranded DNA binding protein |
      | 5383 | Xac-chromosome | III.A.5 | cytosine-specific DNA methyltransferase |
      | 5383 | Xfcvc-chromosome | III.A.5 | DNA methyltransferase |
      | 5384 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5384 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5385 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5385 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5386 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5386 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5387 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5387 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5388 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5388 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5389 | Xac-chromosome | VI.B | plasmid-related protein |
      | 5389 | Xfcvc-chromosome | VI.B | conserved plasmid protein |
      | 5390 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5390 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5391 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5391 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5413 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5413 | Xfcvc-chromosome | VIII.A | hypothetical protein |
      | 5414 | Xac-chromosome | VIII.A | conserved hypothetical protein |
      | 5414 | Xfcvc-chromosome | VIII.A | hypothetical protein |

  25. Query facilitator with visualization (diagram) • Example questions: Given the genomes Xanthomonas axonopodis pv. citri and Xanthomonas campestris pv. campestris, what are the genes shared between them (orthologous genes)? What are the genes specific to one genome in relation to the other? • Components: query facilitator; search mechanism; SQL queries; DBMS; result tables; XML result file; SVG result file; visualization tool

  26. Screenshot (2) – search tool

  27. Xanthomonas axonopodis pv. citri chromosome compared with Xanthomonas campestris pv. campestris chromosome

  28. Query facilitator with visualization (diagram) • Example question: Given the genomes Xanthomonas axonopodis pv. citri and Xanthomonas campestris pv. campestris, what are the genes shared between them (orthologous genes)? • Components: query facilitator; search mechanism; SQL query; DBMS; result table; XML result file; SVG result file; visualization tool

  29. Screenshot (3) – visualization tool

  30. Comparison of orthologous genes of Xanthomonas axonopodis pv. citri and Xanthomonas campestris pv. campestris

  31. Distribution of genes of each genome by category

  32. Conclusions • Information systems for genome data management must be scalable and allow exchange of data and operations; • This work presented a simple but flexible and extensible data model for comparative genomics, a first step in the design of a large information system; • The data model was used in a real application (the PABdb system).

  33. Future work • Extend the data model to a richer context (e.g. metabolic pathways); • Extend the model to include subdivisions between “family” and “category”; • Use of metadata to describe services and data; • Use of different methods to generate the gene families.

  34. Thank you! • Laboratory for Bioinformatics: www.lbi.ic.unicamp.br • Institute of Computing (IC): www.ic.unicamp.br • University of Campinas (UNICAMP): www.unicamp.br • Luciano Antonio Digiampietri: luciano@ic.unicamp.br
