
The GeneCards™ Project at the Weizmann Institute of Science


Presentation Transcript


  1. The GeneCards™ Project at the Weizmann Institute of Science

  2. http://bioinformatics.weizmann.ac.il/cards/
     • For each gene: a card with displayed data and links to entries in major databases
     • Genes with HUGO nomenclature symbols, and others
     • Automatic data mining and integration
     • Advanced human-computer interaction

  3. [Slide graphic: gene-related terms] chromosome, gene, DNA, sequence, disease, mutation, medical applications, protein, research article, RNA, gene, genetic, chromosomal, alias, map location, marker, similar mouse gene

  4. GeneCards: From Chaos to Order
     A card for each gene:
     • Aliases
     • DNA, RNA
     • Protein
     • Chromosomal location
     • Disorders
     • Medical applications
     • Related mouse gene
     • Research articles
     • Links to more data
     Data is retrieved and integrated automatically.

  5. GeneCard: Integrated Data and Starting Point
     [Diagram: data from the data sources of GeneCards passes through mining and integration into a GeneCard. The GeneCard is a starting point for more data, with links to entries in the data sources of GeneCards and to other data sources.]

  6. A typical GeneCard: RUNX1
     [Screenshot callouts: HUGO nomenclature gene symbol; accession ID to other databases; LocusLink or HUGO location; if chromosome 21]

  7. [Screenshot callouts: for chromosome 21 only; sequence accessions; information on proteins]

  8. [Screenshot callouts: homologues; single nucleotide polymorphisms; disorders and mutations; medical news from Doctor’s Guide; published literature]

  9. Snapshot of additional GeneCard fields
     [Screenshot callouts: start new search; additional information]

  10. Improved Single Nucleotide Polymorphisms Summaries

  11. Current GeneCards Data Sources and Links
      HUGO, GDB, OMIM, SWISS-PROT, LocusLink, UDB, UniGene, MGD, DOTS, UCSC, GenBank, PubMed, CroW 21, Doctor’s Guide, HUGE, euGenes, Genatlas, ATLAS, HGMD, TGDB, BCGD, MTDB, RZPD, MIPS, PDB, BLOCKS, HORDE, dbSNP, ENSEMBL, SBCELEGANS, GeneLynx, IMGT, SOURCE

  12. Gene sources 13,046 HUGO 360 LocusLink MGD 8,951 CroW 21 63

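  13. How to search and find?
      [Diagram: a simple search box takes search keywords. When there are results, each matching gene is listed by name with the keyword shown in context (gene 1: name - ... keyword ...; gene 2: name - keyword ...). When there are no results, the options are spell corrections, query modification and outside resources.]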
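The search flow on this slide (keywords in, gene hits out, spelling corrections when nothing matches) can be sketched in a few lines. This is only an illustration in Python, not the GeneCards search code: the `CARDS` index, the `search` function and the use of the standard library's `difflib` for spelling suggestions are all assumptions made for the example.

```python
import difflib

# Hypothetical mini-index: gene symbol -> free-text description.
CARDS = {
    "RUNX1": "runt-related transcription factor 1, acute myeloid leukemia",
    "TP53": "tumor protein p53, Li-Fraumeni syndrome",
    "APP": "amyloid beta precursor protein, Alzheimer disease",
}

def search(keyword, cards=CARDS):
    """Return (hits, suggestions) for a single keyword."""
    needle = keyword.lower()
    hits = [
        symbol for symbol, text in cards.items()
        if needle in symbol.lower() or needle in text.lower()
    ]
    if hits:
        return hits, []
    # No results: offer spelling corrections drawn from the indexed words.
    vocabulary = set()
    for symbol, text in cards.items():
        vocabulary.add(symbol.lower())
        vocabulary.update(text.replace(",", " ").lower().split())
    suggestions = difflib.get_close_matches(
        needle, sorted(vocabulary), n=3, cutoff=0.75
    )
    return [], suggestions
```

Calling `search("leukemia")` returns the matching gene symbols, while a misspelt or variant keyword such as `"leukaemia"` returns no hits and a short list of close vocabulary words to retry with.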

  14. Some GeneCards Statistics
      • 27,612 GeneCards (November 2001)
      • 13,548 HUGO approved genes
      • 2,646,185 accesses to GeneCards (at WIS since January 1, 1998)
      • 25 mirror sites around the world

  15. The Affymetrix System

  16. GeneChip Procedure
      Steps: sample preparation, hybridization, signal detection, data analysis
      Equipment: fluidics station, scanner, software

  17. ChipCards - A Functional Integration Tool for DNA Array Data
      Tsviya Olender, Shirley Horn-Saban, Marilyn Safran, Vered Chalifa-Caspi, Michal Ronen and Doron Lancet
      The Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100

  18. About ChipCards
      • ChipCards correlates DNA array data with comprehensive information from gene-specific databases. It is currently implemented for the Affymetrix GeneChip.
      • The ChipCards output is an HTML table with essential additional information for each gene, including gene symbol, functional definition, accession number, protein information, chromosomal location and EST data.
      • Human data is integrated with GeneCards, UDB and UniGene.
      • Mouse data is integrated with information about the human orthologue via GeneCards, HomoloGene and MGD.
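The core idea described here, joining each probe set on the array with gene-specific annotation and emitting an HTML table, can be illustrated with a small Python sketch. It is not ChipCards itself: the probe set IDs, the `ANNOTATION` lookup and the function names `annotate` and `to_html` are hypothetical, and the real tool draws its annotation from GeneCards, UDB and UniGene rather than from an in-memory dictionary.

```python
from html import escape

# Hypothetical annotation lookup keyed by made-up probe set IDs.
ANNOTATION = {
    "PROBE_A": {"symbol": "RUNX1", "location": "chromosome 21"},
    "PROBE_B": {"symbol": "TP53", "location": "chromosome 17"},
}

def annotate(rows, annotation=ANNOTATION):
    """Attach gene symbol and location to (probe_set, signal) rows."""
    out = []
    for probe_set, signal in rows:
        info = annotation.get(probe_set, {})
        out.append({
            "probe_set": probe_set,
            "signal": signal,
            "symbol": info.get("symbol", ""),      # blank when unannotated
            "location": info.get("location", ""),
        })
    return out

def to_html(records):
    """Render annotated records as a simple HTML table."""
    columns = ["probe_set", "signal", "symbol", "location"]
    header = "".join(f"<th>{c}</th>" for c in columns)
    lines = ["<table>", f"<tr>{header}</tr>"]
    for rec in records:
        cells = "".join(f"<td>{escape(str(rec[c]))}</td>" for c in columns)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

For example, `to_html(annotate([("PROBE_A", 812.5)]))` yields a two-row table: the header and one annotated probe set. Probe sets with no annotation keep their expression value and get empty annotation cells rather than being dropped.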

  19. Example of GeneChip output before ChipCards processing

  20. An Extract of Human Expression Data After ChipCards Processing
      [Screenshot callouts: NCBI link; GeneCards link; UDB link]
      A snapshot of ChipCards output, with human Affymetrix expression data as input. Each probe set has a link to NCBI, GeneCards and UDB. Information about the cDNA sources of the gene is extracted from UniGene and given in a separate column of the table, as are the UDB coordinates.

  21. Murine Expression Data After ChipCards Processing
      [Screenshot callouts: human orthologue data; human UniGene link; NCBI link; GeneCards link; NCBI link; murine UniGene link]
      A snapshot of ChipCards output for mouse Affymetrix expression data. Each probe set is linked to NCBI and UniGene. Information about the human orthologue is integrated into the table and includes links to NCBI, GeneCards and UniGene.

  22. Current Research - Adding Cards for Genes that Don’t Yet Have a Name
      [Diagram with numbered steps 1 to 5; labels: assembly-based resources; UniGene cluster; gene sequence tag; GeneCard for novel gene; unique persistent gene identifier]

  23. Version 3.0 Project Goals
      • Improving flexibility, allowing automated parameterized generation from partial sets of sources and/or genes, and appending to an existing database
      • Providing an Application Programming Interface for users of the generation software to incorporate their own data
      • Standardizing the format of the database to use XML

  24. Project Goals (cont’d)
      • Providing a foundation for supplying a stable identifier for each GeneCard, even when no known gene symbol exists
      • Improving the maintainability, testability, and quality of the software
      • Providing a seamless migration path from Version 2.xx while maintaining the current look and feel and functionality

  25. Pros and Cons of Using OOP
      • Allows for more robust implementations
      • Greater modularity
      • More comprehensible interface to modules
      • Better abstraction of software components
      • Less namespace pollution
      • Greater code reusability
      • Software scalability
      • Cleaner and more compact code
      BUT
      • Perl was not originally designed as an OOP language
      • Type safety, proper encapsulation and aggregation aren’t enforced
      • Can be between 20 and 50% slower

  26. The 3.0 Hybrid Solution
      • Combines an object-oriented skeleton with some non-object-oriented internals
      • The large data structure of gene-based data is implemented as a hash of hashes, avoiding numerous costly instantiations
      • All other major components, including the extractors and administration classes, are implemented as objects
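The hybrid design can be illustrated with a short sketch: the gene data lives in one plain nested mapping, while the components that fill it are objects. The example is in Python, where a dict of dicts plays the role of the Perl hash of hashes; the `Extractor` class, its `fill` method and the sample records are hypothetical and stand in for the real source-specific extractors.

```python
# Gene-based data as a dict of dicts (Python's analogue of a Perl hash of
# hashes): gene symbol -> field name -> list of (source, value) pairs.
# No object is instantiated per gene.
cards = {}

class Extractor:
    """Hypothetical extractor object; real extractors are source-specific."""

    def __init__(self, source, records):
        self.source = source
        self.records = records  # gene symbol -> {field: value}

    def fill(self, cards):
        """Merge this source's fields into the shared nested dict."""
        for symbol, fields in self.records.items():
            card = cards.setdefault(symbol, {})
            for field, value in fields.items():
                card.setdefault(field, []).append((self.source, value))
        return cards

# Placeholder records, not real database content.
swissprot = Extractor("SWISSPROT", {"RUNX1": {"protein": "placeholder protein entry"}})
omim = Extractor("OMIM", {"RUNX1": {"disorder": "placeholder disorder entry"}})

for extractor in (swissprot, omim):
    extractor.fill(cards)
```

After the loop, `cards["RUNX1"]` holds both a `protein` and a `disorder` field, each tagged with the source it came from. Only two extractor objects were created, regardless of how many genes the mapping holds, which is the saving the slide attributes to the hash-of-hashes choice.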

  27. GeneCards Architecture
      [Diagram components: generation software; UniGene extractor; SwissProt extractor; customized extractor; API; support functions; GeneCards database; display software]

  28. Generation Software Classes
      • An underlying layer of support tools that manage extracting data from locally mirrored files and the internet, proxy connections, verification, security, file management, caching, conflict detection, error handling, statistics, and XML output formatting
      • A set of extractor classes, one for each source of information, using source-specific algorithms and heuristics (adapted from previous versions of GeneCards). Methods include new, prepare and search
      • A template for building extractor classes. All such classes can create new entries or append to old ones, and can generate data for all entries (genes) at once or one at a time
      • A main class that handles building sets of cards according to parameterized partial ordering rules
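A template for extractor classes along these lines might look like the following Python sketch. Only the method names prepare and search (and new, here the constructor) come from the slide; the class names, the `extract` driver, the `ready` flag and the toy data are assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class ExtractorTemplate(ABC):
    """Hypothetical template for source-specific extractor classes."""

    def __init__(self, source_name):  # plays the role of Perl's `new`
        self.source_name = source_name
        self.ready = False

    @abstractmethod
    def prepare(self):
        """Load or index the source (a local mirror or the internet)."""

    @abstractmethod
    def search(self, symbol):
        """Return this source's fields for one gene, or an empty dict."""

    def extract(self, symbols, cards=None):
        """Create new entries or append to existing ones, gene by gene."""
        cards = {} if cards is None else cards
        if not self.ready:
            self.prepare()
            self.ready = True
        for symbol in symbols:
            found = self.search(symbol)
            if found:
                cards.setdefault(symbol, {})[self.source_name] = found
        return cards

class ToyExtractor(ExtractorTemplate):
    """Minimal concrete extractor backed by an in-memory table."""

    def prepare(self):
        self.table = {"RUNX1": {"alias": "AML1"}}

    def search(self, symbol):
        return self.table.get(symbol, {})
```

Calling `ToyExtractor("TOY").extract(["RUNX1"])` builds a new set of entries, while passing an existing dictionary as `cards` appends to it. Passing one symbol or many covers the one-at-a-time and all-at-once modes mentioned above.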

  29. The XML-Based Database
      • XML is a meta-language that supports customized tags for describing and providing semantic meaning to structured data
      • Typed elements are arranged within other elements to form a nested hierarchy
      • The data is grouped by source in the XML files, but can be retrieved by function:

            <GCresource>SWISSPROT
              <protein>
                <disorder>Germline Cancer</disorder>
              </protein>
            </GCresource>
            <GCresource>OMIM
              <disorder>Colorectal Cancer</disorder>
            </GCresource>
            <GCresource>GENECLINICS
              <disorder>Li-Fraumeni Syndrome</disorder>
            </GCresource>

      • Each extractor module is responsible for its own Document Type Definition (DTD) specification to ensure that the XML is well formed and valid
      • Files are stored in a hierarchical directory structure, one file per gene
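Retrieval "by function" from data that is stored "by source" can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch reads the slide's example as three source blocks (SWISSPROT, OMIM, GENECLINICS); the pairing of each disorder with its source follows that reading of the slide layout, and the enclosing `<GeneCard>` root element and the `by_function` helper are assumptions added so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The slide's example, wrapped in a hypothetical <GeneCard> root element.
DOC = """
<GeneCard>
  <GCresource>SWISSPROT
    <protein><disorder>Germline Cancer</disorder></protein>
  </GCresource>
  <GCresource>OMIM
    <disorder>Colorectal Cancer</disorder>
  </GCresource>
  <GCresource>GENECLINICS
    <disorder>Li-Fraumeni Syndrome</disorder>
  </GCresource>
</GeneCard>
"""

def by_function(xml_text, tag):
    """Collect (source, value) pairs for one element type across all sources."""
    root = ET.fromstring(xml_text)
    pairs = []
    for resource in root.iter("GCresource"):
        # The source name is the text directly inside <GCresource>.
        source = (resource.text or "").strip()
        # iter() also finds elements nested deeper, e.g. inside <protein>.
        for element in resource.iter(tag):
            pairs.append((source, (element.text or "").strip()))
    return pairs
```

`by_function(DOC, "disorder")` gathers the disorder entries from all three sources in document order, each paired with the source that supplied it, even though the file itself is organized source by source.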

  30. The Display Software
      • Currently in the design phase
      • Want to maintain the current look and feel while providing the flexibility of easy customization
      • Will use XML Perl parser modules in CGI scripts
      • Search will be expanded beyond current text-based capabilities to include context-specific searches

  31. 3.0 Project Status and Open Issues
      • From procedural programs and an ad-hoc flat file format to an object-oriented methodology and standardized XML
      • Easy to add new extractors
      • Flexible and extensible
      • Open issues: performance, searching strategies

  32. Unified Database (UDB): Data mining and integration
      [Diagram labels: original public databases; data mining; source-specific information; semantic integration; thesaurus; megabase integration; integrated chromosomal maps; UDB]

  33. Sequence-Based Repositioning (SBR)
      • Placing finished genomic sequences on the UDB map
      • Map fine-tuning in sequenced regions

  34. SBR (Sequence-Based Repositioning)
      • Elimination of overlaps between contigs
      • Object repositioning
      [Diagram: UDB original map; SBR map]

  35. Search Results - a Map Slice
      [Screenshot callouts: to GeneCard; to UniGene; to MarkerCard]

  36. A MarkerCard

  37. GeneCards Success Stories
      • GeneCards as a bookmark for linkage analysis
      • Mutations that were polymorphisms and not disease-causing
      • Adult-onset diabetes without obesity in India
      • Work on Chromosome 21 at the Weizmann Institute
      • PVT: a heart disease found in Israeli Bedouins
      • Parkinson’s disease paper

  38. Frequently Asked Questions
      • What’s special about GeneCards?
      • Can I interface my own data?
      • Can I access my own in-house database mirrors instead of public internet sites?

  39. GeneCards/UDB Team
      Current: Avital Adato, Vered Chalifa-Caspi, Michal Lapidot, Zvia Olender, Naomi Rosen, Marilyn Safran (head), Orit Shmueli, Irina Solomon, Doron Lancet (PI)
      Alumni: Michael Rebhan, Shai Shen-Orr, Inga Peter, Jaime Prilusky, Michal Ronen, Hershel Safer, Julie Stampnitzky, Liora Yaar
