GeneKeyDB: Enhancing Gene Data Mining

Creation and Maintenance of GeneKeyDB • Research conducted by Kevin Kastner, under the direction of Dr. Erich Baker

The Problem • There exist thousands of biomedical data sources. • In 2006, there were ~557 relevant public resources in molecular biology. • The number has been growing rapidly: 203 sources in 1999, 226 in 2000, and 277 in 2001.

The Problem • Traditional database approaches are too rigidly structured. • Scientific objects change identifiers over time. • Gene names change over time. • The Human Genome Nomenclature Database (HUGO) contains 13,594 active symbols, 9,635 literature aliases, and 2,739 withdrawn symbols. • SIR2L1 (withdrawn) is a synonym for SIRT1 and sir2-like 1.
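To make the naming problem concrete, here is a minimal sketch of how a withdrawn symbol or alias could be resolved to a current symbol. The class SymbolResolver and its methods are hypothetical names invented for this illustration, not part of GeneKeyDB. The only data used is the SIR2L1 / SIRT1 / sir2-like 1 example from the slide, and treating SIRT1 as the approved symbol is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not GeneKeyDB code): maps withdrawn symbols and
// aliases to the symbol that is treated as approved.
class SymbolResolver {
    // Key: normalized name (approved, alias, or withdrawn). Value: approved symbol.
    private final Map<String, String> toApproved = new HashMap<>();

    // Register an approved symbol; it resolves to itself.
    void addApproved(String symbol) {
        toApproved.put(normalize(symbol), symbol);
    }

    // Register an alias or withdrawn symbol that points at an approved symbol.
    void addSynonym(String synonym, String approvedSymbol) {
        toApproved.put(normalize(synonym), approvedSymbol);
    }

    // Return the approved symbol for any known name, or empty if the name is unknown.
    Optional<String> resolve(String name) {
        return Optional.ofNullable(toApproved.get(normalize(name)));
    }

    // Case-insensitive matching: trim whitespace and upper-case the name.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SymbolResolver resolver = new SymbolResolver();
        resolver.addApproved("SIRT1");
        resolver.addSynonym("SIR2L1", "SIRT1");      // withdrawn symbol from the slide
        resolver.addSynonym("sir2-like 1", "SIRT1"); // name from the slide
        System.out.println(resolver.resolve("SIR2L1").orElse("unknown")); // prints SIRT1
    }
}
```

A lookup table like this only covers the simplest case, where each old name points at exactly one current symbol; a name that has been reused for different genes over time would need more than a single map.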

Scientific Object Identities

The Solution • GeneKeyDB • A gene-centered relational database developed to enhance data mining in biological data sets. • GeneKeyDB relies primarily on existing database identifiers derived from community databases (NCBI, GO, Ensembl, et al.) as well as the known relationships among those identifiers. • Version 1 is already out! • http://www.biomedcentral.com/1471-2105/6/72

Weaknesses of Version 1 • It can no longer be updated. • Obtaining the desired information requires complex queries against the database.

Complex Queries

    SELECT ll_xp_cdd.cdd_name, ll_np_cdd.cdd_name, organism
    FROM ll_xp_cdd, ll_np_cdd, ll_locus
    WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score
      AND ll_id IN (
        SELECT ll_id FROM ll_refseq_xm
        WHERE ll_refseq_xm_id IN (
          SELECT ll_refseq_xm_id FROM ll_xp_cdd, ll_np_cdd
          WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score))
      AND ll_id IN (
        SELECT ll_id FROM ll_refseq_nm
        WHERE ll_refseq_nm_id IN (
          SELECT ll_refseq_nm_id FROM ll_xp_cdd, ll_np_cdd
          WHERE ll_xp_cdd.cdd_score = ll_np_cdd.cdd_score));

Current Research • Creating APIs that validate data in the database and make querying much easier for the user. • One-step updating of the database and the information it contains.

API Alternative

    // fxn(search_params, desired_info), returns ll_id
    curated.cdd(score[ ], null)
    curated_score[ ] <- score[ ]
    locus_id1[ ] <- gaa.cdd((name[ ], score[ ]), score[ ])
    gaa_name[ ] <- name[ ]
    gaa_score[ ] <- score[ ]
    locus_id2[ ] <- curated.cdd(name[ ], score[ ])
    curated_name[ ] <- name[ ]
    locus_id[ ] <- intersect(locus_id1[ ], locus_id2[ ])
    locus(organism[ ], locus_id[ ])
    print(gaa_name[ ], curated_name[ ], organism[ ])
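The last steps of the pseudocode (intersecting two lists of locus IDs, then looking up the organism for each shared ID) could look roughly like the Java sketch below. Everything here is an assumption made for illustration: the class GeneKeySketch, the method names, and the in-memory collections stand in for whatever the real API and database calls turn out to be, and the IDs and organism labels are placeholders, not GeneKeyDB data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not the GeneKeyDB API): in-memory stand-ins for the
// intersect(...) and locus(...) steps of the pseudocode.
class GeneKeySketch {
    // Stand-in for intersect(locus_id1[], locus_id2[]): IDs present in both sets,
    // kept in the iteration order of the first set.
    static Set<Integer> intersect(Set<Integer> first, Set<Integer> second) {
        Set<Integer> shared = new LinkedHashSet<>(first);
        shared.retainAll(second);
        return shared;
    }

    // Stand-in for locus(organism[], locus_id[]): organism for each locus ID
    // that has an entry; IDs without an entry are skipped.
    static List<String> organisms(Map<Integer, String> locusToOrganism, Set<Integer> locusIds) {
        List<String> result = new ArrayList<>();
        for (Integer id : locusIds) {
            String organism = locusToOrganism.get(id);
            if (organism != null) {
                result.add(organism);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder locus IDs, as if returned by the two CDD lookups.
        Set<Integer> curatedIds = new LinkedHashSet<>(List.of(1, 2, 3));
        Set<Integer> gaaIds = new LinkedHashSet<>(List.of(2, 3, 4));
        // Placeholder locus table.
        Map<Integer, String> locusTable = Map.of(2, "organism A", 3, "organism B");

        Set<Integer> shared = intersect(curatedIds, gaaIds);
        System.out.println(organisms(locusTable, shared));
    }
}
```

The point of the design is that the caller composes a few small calls and a set intersection, rather than writing the nested IN subqueries by hand.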

External Implementations • Some databases have APIs as well. • Ensembl • APIs are done in Perl. • APIs for GeneKeyDB will be done in Java. • More structured language. • Easier to read.

The Future of GeneKeyDB • GeneKeyDB will join even more external and widely used databases together. • Code for updating GeneKeyDB will tie into database information that will change in expected ways. • Lowers the required number of code rewrites. • GeneKeyDB will be dynamically updated.

The Future of GeneKeyDB • APIs will also be written in Perl. • Biologists use Perl often, almost exclusively. • The Perl APIs can tie into the Java APIs, rather than being created from scratch.

Comments? Questions? • http://genereg.ornl.gov/gkdb/
