Finding What you Need in Biological Databases



Presentation Transcript


  1. Finding What you Need in Biological Databases. Cédric Notredame

  2. Databases: Where is my Needle ?

  3. Our Scope
     • Give you the means to answer simple questions
     • Databases are UNFRIENDLY INFORMATION DESKS
     • Give you an idea of what is possible
     • WHAT can you ask?
     • HOW can you ask it?

  4. Outline
     - An overall view
     - Asking a biological question to a database
     - Turning a question into a query
     - Bibliographic databases: Medline, OMIM
     - Gene databases: GenBank, LocusLink, ENSEMBL
     - Protein databases: SwissProt, InterPro, Prodom
     - SRS

  5. Database: What is a Database ?

  6. DataBase Entries
     • 1 entry = 1 sequence (SEQ), e.g. AGCTGTCGAGGGATAGGACA TATACATAAATTAATATAAT
     • 1 entry = 1 file = sequence (SEQ) + DOC = flat file
     • Database = collection of flat files
     [Diagram: seven SEQ + DOC pairs, one per entry]

  7. DataBase Entries: Flat Files

     Accession number: 1
     First Name: Amos
     Last Name: Bairoch
     Course: DEA=oct-nov-dec 2002
     http://www.expasy.org/people/amos.html
     //
     Accession number: 2
     First Name: Laurent
     Last Name: Falquet
     Course: EMBnet=sept 2000, sept 2001; DEA=oct-nov-dec 2000;
     //
     Accession number: 3
     First Name: Marie-Claude
     Last Name: Blatter Garin
     Course: EMBnet=sept 2000; sept 2001; DEA=oct-nov-dec 2000;
     http://www.expasy.org/people/Marie-Claude.Blatter-Garin.html
     //
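The flat-file layout on this slide can be read with a few lines of code. The following is a minimal sketch, assuming each entry consists of "Field: value" lines and is closed by a "//" line. The URL lines of the slide are left out of the toy data, and the function name `parse_flat_file` is made up for this illustration.

```python
# Toy flat file modeled on the slide: "Field: value" lines, "//" closes an entry.
# The exact layout is an assumption made for this illustration.
FLAT_FILE = """\
Accession number: 1
First Name: Amos
Last Name: Bairoch
Course: DEA=oct-nov-dec 2002
//
Accession number: 2
First Name: Laurent
Last Name: Falquet
Course: EMBnet=sept 2000, sept 2001; DEA=oct-nov-dec 2000;
//
"""


def parse_flat_file(text):
    """Split a flat file into entries, each a dict mapping field name to value."""
    entries = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "//":                  # end-of-entry marker
            if current:
                entries.append(current)
            current = {}
        elif ":" in line:
            # Split on the first colon only, so values may contain colons.
            field, value = line.split(":", 1)
            current[field.strip()] = value.strip()
    if current:                           # tolerate a missing final "//"
        entries.append(current)
    return entries


entries = parse_flat_file(FLAT_FILE)
# A simple index: accession number -> entry, so lookups do not scan the file.
index = {entry["Accession number"]: entry for entry in entries}
print(index["2"]["Last Name"])
```

The `index` dictionary plays the role of the table of contents that makes a collection of flat files searchable.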

  8. DataBase: Relational Databases Relational database (« table file »):

  9. To Summarize: What’s a database ?
     • Collection of data that is:
        - Structured
        - Searchable (index) -> table of contents
        - Updated periodically (release) -> new edition
        - Cross-referenced (hyperlinks) -> links with other db
     • Collection of tools (software) necessary for: searching, updating, releasing
     • Data storage management: flat files, relational databases…
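For comparison with flat files, here is a minimal sketch of the same toy entries stored relationally, using Python's built-in `sqlite3` module. The table and column names (`person`, `course`, `accession`, and so on) are assumptions made for this illustration, not a schema used by any real biological database.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "accession INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute(
    "CREATE TABLE course ("
    "accession INTEGER REFERENCES person(accession), name TEXT, period TEXT)"
)
# The index is what makes the table quickly searchable by course name.
conn.execute("CREATE INDEX idx_course_name ON course(name)")

conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Amos", "Bairoch"), (2, "Laurent", "Falquet"),
     (3, "Marie-Claude", "Blatter Garin")],
)
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(1, "DEA", "oct-nov-dec 2002"),
     (2, "EMBnet", "sept 2000"), (2, "EMBnet", "sept 2001"),
     (2, "DEA", "oct-nov-dec 2000"),
     (3, "EMBnet", "sept 2000"), (3, "EMBnet", "sept 2001"),
     (3, "DEA", "oct-nov-dec 2000")],
)

# Cross-referencing: join the two tables through the accession number.
rows = conn.execute(
    "SELECT DISTINCT p.last_name FROM person p "
    "JOIN course c ON c.accession = p.accession "
    "WHERE c.name = ? ORDER BY p.last_name",
    ("EMBnet",),
).fetchall()
embnet_people = [row[0] for row in rows]
print(embnet_people)
```

The accession number does the work of the hyperlink in the summary above: it is the key that connects one table to the other.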

  10. Database: What’s on the Menu?

  11. A large amount of information
     • More than 1000 different databases
     • Generally accessible through the web
        - EBI: http://www.ebi.ac.uk/
        - NCBI: http://www.ncbi.nlm.nih.gov
        - Google: http://www.google.com
     • Variable size: <100Kb to >10Gb
        - DNA: > 10 Gb
        - Protein: 1 Gb
        - 3D structure: 5 Gb
        - Other: smaller
     • Update frequency: daily to annually

  12. A Non Exhaustive List: AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc. There exists a specialized database on almost anything you can think of!

  13. A database of databases

  14. What’s on the Menu: The Art of Eating Well
     • Always use fresh data: the latest update of your database
     • Make sure the database is maintained: many databases are poorly maintained
     • Treat databases like publications: some journals are better than others

  15. Bio-Google: How Can I Search a Database ?

  16. Searching Databases: there are 2 ways to search databases
     • Similarity searches on the sequence (SEQ): BLAST, e.g. AGCTGTCGAGGGATAGGACA TATACATAAATTAATATAAT
     • Text-based queries on the documentation (DOC): Medline, Entrez, e.g. search for « Smith AND dUTPase »

  17. Searching Databases: each database is a little kingdom…
     • Has its own query system
     • Has its own information structure
     • The main databases are well documented and this documentation is available online
     • Most databases can be searched using SRS or Entrez

  18. Databases: Asking the right Question. When you search a database you must have an idea of what your needle in a haystack looks like. Databases ARE NOT meant for browsing.

  19. Databases: Asking the right Question. Browsing a database is like using your phone book in place of a dating agency…

  20. Databases: Asking the right Question
     • Finding data: database search
     • Finding questions: data mining

  21. The Kind Of Questions We Can Ask: SEQUENCE Based
     • InterPro: any known domain in my protein?
     • SwissProt: any protein like mine?
     These ARE predictions

  22. The Kind Of Questions We Can Ask: TEXT Based
     • Medline: who worked on my protein?
     • SwissProt: function of my protein?
     • PDB: structure of my protein?
     These are NOT predictions

  23. Just like when you Google: specific queries give precise answers

  24. Medline: Who worked on my Protein ?

  25. Medline (PubMed)

  26. What is in Medline ?
     • MEDLINE covers the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences
     • More than 4,000 biomedical journals and more than 10 million citations, from 1966 to the present
     • Contains links to biological db and to some journals
     • Many papers not dealing with human are not in Medline
     • Before 1970, keeps only the first 10 authors !

  27. Using Medline: Asking a question During the last Lab Meeting, I heard the word dUTPase. What can it be ? What has been published on this ?

  28. Using Medline: Asking a question

  29. Using Medline: Asking a question

  30. Using Medline: Asking a question

  31. Using Medline: Asking a question. By default, Medline assumes you mean: Abergel AND dUTPase
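The same kind of query can also be sent programmatically. The following is a minimal sketch that only builds the request URL for the NCBI E-utilities `esearch` service (parameters `db`, `term`, `retmax`) and makes no network call. The slides themselves use the web interface, so this route is an assumption added for illustration, and the helper names `pubmed_query` and `esearch_url` are made up.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(*terms):
    """Join terms with an explicit AND; quote multi-word terms as phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return " AND ".join(quoted)


def esearch_url(query, retmax=20):
    """Build (but do not send) an esearch request against PubMed."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


query = pubmed_query("Abergel", "dUTPase")
url = esearch_url(query)
print(query)
print(url)
```

Writing the AND explicitly makes the query say what the default would assume anyway, which helps when a query is later extended with OR or NOT.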

  32. Using Medline: Asking a question. I have found the reference I wanted. Now I want to save it so that I can use it later, for instance to import it into EndNote, my reference manager. Save your data in the proper database format.

  33. Using Medline: Storing your results

  34. Using Medline: Storing your results

  35. Restricted fields ([AB], [AD]): retrieving EXACTLY the information that you need

  36. Using Medline: Storing your results AB AD
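References saved in the MEDLINE text format carry one tagged field per line, with AB for the abstract and AD for the author affiliation. The following is a minimal sketch of a reader for that layout, assuming the usual convention of a tag padded to four characters, then "- ", with continuation lines indented by six spaces. The record below is entirely invented (placeholder PMID, authors, and text), and `parse_medline` is a made-up name.

```python
# An invented record in MEDLINE-style tagged format (not a real citation).
RECORD = """\
PMID- 00000000
TI  - A made-up title about dUTPase.
AB  - This abstract is invented for illustration and wraps
      onto a second line.
AD  - Hypothetical Institute, Example City.
AU  - Doe J
AU  - Roe R
"""


def parse_medline(record):
    """Return a dict mapping each field tag to the list of its values."""
    fields = {}
    tag = None
    for line in record.splitlines():
        if len(line) > 5 and line[4:6] == "- ":
            # New field: the tag sits in the first four columns.
            tag = line[:4].strip()
            fields.setdefault(tag, []).append(line[6:].strip())
        elif tag and line.startswith("      "):
            # Continuation line: append to the most recent value.
            fields[tag][-1] += " " + line.strip()
    return fields


fields = parse_medline(RECORD)
print(fields["AB"][0])
print(fields["AD"][0])
```

Repeated tags such as AU are kept as a list, so the order of authors is preserved.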

  37. Using Medline: Looking for a Review. I want to find the LATEST REVIEW on dUTPase. Use the Limits option of Medline.

  38. Using Medline: Looking for a Review. 1-Limits: Title OR Abstract; Language; Article type

  39. Using Medline: A Few Tips
     • Quoted queries (e.g. "down syndrome") behave as a single word, and are great for improving the relevance of your search
     • Adding initials to names (e.g. "Abergel C"), if you can, also reduces your output
     • Write down the PubMed Identifier (the number in the PMID field) of that interesting paper you just found. It could be very useful in your subsequent search for related items such as associated gene and protein sequences

  40. Using Medline: A Few Tips
     • If a search goes wrong, check for spelling mistakes, wrong field restrictions, or wrong Limits settings: any of these may be the problem
     • Use abstracts to enlarge your vocabulary and look for synonyms: some papers on dUTPase might use "dUTP pyrophosphatase" instead!
     • Try the "related papers" button (on the extreme right of the PubMed output) from time to time, to enlarge a search that is not giving you enough references

  41. Using Medline: A Few Tips
     • Storing your PDFs: memory is cheap, access is sometimes strange…
     • Storing your favourite PDFs is a good idea
     • Which name on your disk? THE MEDLINE ID NUMBER !!!
     • With a reference manager like EndNote
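Naming each stored PDF after its Medline ID is easy to automate. The following is a minimal sketch; the folder name `papers` and the helper `pdf_path_for` are made up for this illustration, and the PMID used is a placeholder, not a real paper.

```python
from pathlib import Path


def pdf_path_for(pmid, library_dir="papers"):
    """Return the path under which the PDF for a given PMID should be stored."""
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        # A PMID is a plain number; reject anything else early.
        raise ValueError(f"not a PMID: {pmid!r}")
    return Path(library_dir) / f"{pmid}.pdf"


# Placeholder PMID for illustration only.
path = pdf_path_for(" 12345678 ")
print(path.name)
```

Because the file name is the identifier, the PDF can always be matched back to its Medline record, whatever the title or author spelling.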

  42. GenBank: What is the Sequence of my Gene ?

  43. GenBank: an Overview

  44. GenBank: an Overview

  45. GenBank: an Overview. EMBL, GenBank and DDBJ are the same database. They are synchronized every day. [Diagram: GenBank, EMBL, DDBJ]

  46. GenBank: an Overview
     • GenBank contains EVERY piece of DNA that has been sequenced and made publicly available
     • It contains GOOD and BAD data
     • There is a historical aspect in the GenBank data: complex genes are spread across many entries

  47. GenBank Entries Are Complex because Genes are complex. Prokaryotic example. [Diagram labels: Gene, Promoter, RBS, ATG, ORF, STOP, mRNA, Protein]
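The ATG, ORF, STOP and Protein labels of this slide can be illustrated on a toy sequence. The following is a minimal sketch, assuming the standard genetic code and a made-up DNA string. It reads only the given strand, works on DNA letters rather than mRNA, and ignores the promoter and the RBS, so it is a simplification and not a gene finder.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_orf(dna):
    """Return the first ATG...STOP stretch read in frame, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        for i in range(start, len(dna) - 2, 3):
            # Unknown codons (e.g. containing N) are treated as non-stop.
            if CODON_TABLE.get(dna[i:i + 3], "X") == "*":
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


def translate(orf):
    """Translate an ORF codon by codon, stopping at the stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up sequence: two flanking bases on each side of a short ORF.
orf = first_orf("CCATGAAATGGTAACC")
protein = translate(orf)
print(orf, protein)
```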

  48. GenBank Entries Are Complex because Genes are complex. [Diagram labels: Gene with Promoter and six exons; mRNA (form 1) -> Protein (form 1); mRNA (form 2) -> Protein (form 2)]
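One gene giving two mRNA forms can be sketched by joining different subsets of exons. The following is a minimal illustration, assuming a made-up gene with three exons (uppercase) separated by two introns (lowercase). The sequences, coordinates, and the helper `splice` are invented for this example.

```python
# Made-up gene: exons in uppercase, introns in lowercase.
GENE = "ATGGCC" + "gtaagtcag" + "AAATTT" + "gtatgcag" + "GGGTAA"

# Exon coordinates as (start, end), 0-based with end excluded.
EXONS = [(0, 6), (15, 21), (29, 35)]


def splice(gene, exons, keep):
    """Join the exons whose position in the list is in `keep`, in gene order."""
    return "".join(
        gene[start:end]
        for position, (start, end) in enumerate(exons)
        if position in keep
    )


form1 = splice(GENE, EXONS, keep={0, 1, 2})   # all three exons
form2 = splice(GENE, EXONS, keep={0, 2})      # middle exon skipped
print(form1)
print(form2)
```

A sequence database has to describe both forms against the same stretch of DNA, which is one reason a single gene can end up with several coordinate lists or several entries.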
