Linking Open Drug Data Susie Stephens, Principal Research Scientist, Eli Lilly

Presentation Transcript


  1. Linking Open Drug Data. Susie Stephens, Principal Research Scientist, Eli Lilly

  2. The Linked Data Cloud. Source: Chris Bizer

  3. Linking Open Drug Data
     • HCLSIG task started October 1, 2008
     • Primary Objectives
       • Survey publicly available data sets about drugs
       • Publish and interlink these data sets on the Web
       • Explore interesting questions in competitive intelligence that could be answered if the data sets are linked
     • Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao

  4. Assessment of Data Sources. Mark Sharp et al. A Framework for Characterizing Drug Information Sources. AMIA 2008

  5. Published Data Sets
     • LinkedCT (http://linkedct.org)
       • Online registry of more than 60,000 clinical trials
       • Published in XML
       • 7,011,000 triples (290,000 interlinking)
     • DrugBank (http://www4.wiwiss.fu-berlin.de/drugbank)
       • A repository of almost 5,000 FDA-approved drugs
       • Published as DrugBank DrugCards
       • 1,153,000 triples (23,000 interlinking)
     • DailyMed (http://www4.wiwiss.fu-berlin.de/dailymed/)
       • High quality information about marketed drugs
       • Flat file representation
       • 124,000 triples (29,600 interlinking)
     • Diseasome (http://www4.wiwiss.fu-berlin.de/diseasome)
       • Information about 4,300 disorders and disease genes linked by known disorder-gene associations
       • Published in XML
       • 88,000 triples (23,000 interlinking)
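
The triple counts on this slide can be compared directly. The short Python sketch below uses only the figures quoted above to compute what share of each data set consists of interlinking triples; the variable and function names are illustrative and not part of any LODD tooling.

```python
# Triple counts as reported on the slide: (total triples, interlinking triples).
DATASETS = {
    "LinkedCT": (7_011_000, 290_000),
    "DrugBank": (1_153_000, 23_000),
    "DailyMed": (124_000, 29_600),
    "Diseasome": (88_000, 23_000),
}


def interlink_share(total, links):
    """Return interlinking triples as a percentage of all triples."""
    return 100.0 * links / total


# Percentage of interlinking triples per data set, rounded to one decimal.
shares = {
    name: round(interlink_share(total, links), 1)
    for name, (total, links) in DATASETS.items()
}
total_triples = sum(total for total, _ in DATASETS.values())
total_links = sum(links for _, links in DATASETS.values())

for name, pct in shares.items():
    print(f"{name}: {pct}% interlinking")
print(f"All four: {total_triples:,} triples, {total_links:,} interlinking")
```

On these numbers the two small data sets (DailyMed and Diseasome) are far more densely interlinked, at roughly a quarter of their triples, than LinkedCT or DrugBank.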

  6. Classes of Links
     • Based on common identifiers
       • Links present in the source data sets
     • Based on link discovery and record linkage techniques
       • String matching
         • E.g., “Alzheimer’s disease” in LinkedCT was matched with “Alzheimer_disease” in Diseasome
       • Semantic matching
         • E.g., “Varenicline” has the synonym “Varenicline Tartrate” and the brand names “Champix” and “Chantix”
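
The two matching styles on this slide can be illustrated with a minimal Python sketch. It is not the LODD implementation and not how any particular link discovery tool works: string matching is approximated here by simple label normalization, and semantic matching by a hand-written synonym table containing only the Varenicline names quoted on the slide. The helper names (`normalize`, `build_index`, `same_entity`) are hypothetical.

```python
import re


def normalize(label):
    """Lowercase, drop possessive 's, and turn punctuation/underscores into spaces."""
    label = label.lower().replace("’", "'")
    label = re.sub(r"'s\b", "", label)
    label = re.sub(r"[^a-z0-9]+", " ", label)
    return label.strip()


# Synonym table taken from the slide's example; a real system would load
# synonyms and brand names from the data sets themselves.
SYNONYMS = {
    "Varenicline": {"Varenicline Tartrate", "Champix", "Chantix"},
}


def build_index(synonyms):
    """Map every normalized name (canonical or synonym) to its canonical name."""
    index = {}
    for canonical, names in synonyms.items():
        for name in {canonical, *names}:
            index[normalize(name)] = canonical
    return index


def same_entity(a, b, index):
    """True if two labels match after normalization or share a canonical name."""
    na, nb = normalize(a), normalize(b)
    if na == nb:  # string matching
        return True
    ca, cb = index.get(na), index.get(nb)  # semantic (synonym) matching
    return ca is not None and ca == cb


index = build_index(SYNONYMS)
print(same_entity("Alzheimer’s disease", "Alzheimer_disease", index))
print(same_entity("Chantix", "Varenicline Tartrate", index))
```

Exact comparison after normalization is deliberately strict; fuzzy similarity measures would catch more variants at the cost of more false matches.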

  7. Business Use Case
     • A neuroscience-focused business manager is interested in seeing an update on new clinical trials by competitors on Alzheimer’s Disease (AD)
     • A phase III trial by Pfizer for a drug called Varenicline has just been listed in LinkedCT
     • More information of interest is found in DBpedia, DailyMed, and DrugBank
     • DailyMed indicates the drug is already on the market for nicotine addiction and has minimal side effects
     • DrugBank allows the manager to see the targets for Varenicline
     • Diseasome, however, indicates that the corresponding genes are only implicated in nicotine addiction, rather than AD
     • This suggests a more complex relationship between the diseases than just the drug target
     • Extending the browsing to the SWAN Knowledgebase shows that there are hypotheses relating AD to nicotine receptors through amyloid beta
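
The browsing path in this use case amounts to following links from a trial to its drug, from the drug to its targets, and from the targets to associated diseases. The Python sketch below mimics that traversal over a tiny in-memory list of triples. Every identifier and predicate name in it is a made-up placeholder rather than a real LinkedCT, DrugBank, or Diseasome URI, and the single target gene is a stand-in because the slide does not name the targets.

```python
# Toy (subject, predicate, object) triples. All identifiers are hypothetical
# placeholders chosen to mirror the slide's narrative, not real data set URIs.
TRIPLES = [
    ("linkedct:trial1", "sponsor", "Pfizer"),
    ("linkedct:trial1", "phase", "III"),
    ("linkedct:trial1", "condition", "diseasome:alzheimer_disease"),
    ("linkedct:trial1", "intervention", "drugbank:varenicline"),
    ("drugbank:varenicline", "target", "gene:placeholder_target"),
    ("gene:placeholder_target", "associatedWith", "diseasome:nicotine_addiction"),
]


def objects(subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def diseases_via_targets(drug):
    """Follow drug -> target gene -> disease links."""
    found = set()
    for gene in objects(drug, "target"):
        found |= objects(gene, "associatedWith")
    return found


trial = "linkedct:trial1"
(drug,) = objects(trial, "intervention")   # the trial's single drug
(condition,) = objects(trial, "condition")  # the trial's single condition
linked = diseases_via_targets(drug)

# Is the trial's condition among the diseases reachable through the targets?
print("condition explained by drug targets:", condition in linked)
print("diseases reachable via targets:", sorted(linked))
```

In this toy graph the trial's condition is not reachable through the drug's targets, which is the gap the slide describes between the target genes and AD.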

  8. Technical Challenges
     • Life sciences data is difficult to connect due to inconsistent terminology and the prevalence of synonyms and homonyms
     • Refinement of tools and techniques to enable more automatic linking of entities across data sets
     • Selection of ontologies to enable consistent mappings
     • Development of a sufficiently robust platform to enable inferencing
     • Provision of a user interface that supports browsing, querying, and filtering data
     • Persuading data providers to publish in RDF, which would alleviate the need for us to update data and would provide some of the interlinking

  9. Next Steps
     • Ensure that existing data are accurately and comprehensively linked
     • Incorporate additional data sources into the LODD cloud that are of interest to competitive intelligence (e.g., Traditional Chinese Medicine)
     • Use novel link discovery tools and frameworks including Silk and LinQuer
     • Explore using SIOC to aggregate information about what patients are saying about drugs
     • Submit paper to the iTriplify Challenge

  10. Task Alignment
     • LODD is looking to use Pharma Ontology’s work to help inform the mappings
     • Data converted to RDF is also loaded into BioRDF’s HCLS KB

  11. Conclusions
     • Added 4 drug-related data sets into the cloud for competitive intelligence
     • Will add further data sources to the LODD cloud to enable more insights to be gleaned
     • Will continue to explore and test tools that are being developed for LOD
