
The Vision: Scientist as knowledge worker

Information Management for the Life Sciences. M. Scott Marshall, Marco Roos. Adaptive Information Disclosure, University of Amsterdam.





Presentation Transcript


  1. Information Management for the Life Sciences • M. Scott Marshall • Marco Roos • Adaptive Information Disclosure • University of Amsterdam

  2. The Vision: Scientist as knowledge worker • For Knowledge Workers: • Knowledge is the data (i.e. rules, relations, properties, hypotheses, etc.) • For Today's Biologist: • Numbers, sequences, organisms(!), and images are the data • Manipulate knowledge instead of data • Find support for relations between concepts instead of discovering table and column names and numbers. • In the virtual laboratory, everything is a resource that can be described and manipulated with semantics

  3. Vision: Concept-based interfaces • The scientist should be able to work in terms of commonly used concepts. • The scientist should be able to work in terms of personal concepts and hypotheses. • The scientist should not be forced to map concepts to the terms that the application builder chose for a given application.

  4. Interface Sketch: Finding a basis for relation Hypothesis Epigenetic Mechanisms Transcription “There is a relation” Chromatin Transcription Factors Histone Modification Transcription Factor Binding Sites Classes Instances Common Domain position

  5. Biological cartoon as interface KSinBIT’06 Source: Marco Roos

  6. Biology in a nutshell: Bigger isn’t better • Central dogma: DNA -> mRNA -> protein (transcription, then translation) • Molecular pathways allow biologists to ‘connect’ one process to another. • Huntington’s mutation was mapped in 1993, yet there is still no understanding of the mechanism that causes the neurodegeneration. • Semantic models are necessary to create a ‘systems view’ of biology.

  7. Show Bigger isn’t Better • Scaling up should be done in small increments but once you’ve reached a certain threshold..

  8. What is metadata (in this course)? • Metadata: data about data • Metadata can be syntactic such as a data type, e.g. Integer. • Metadata can be semantic such as chromosome number. • Note: not always ontology, but metadata can be stored in OWL
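The distinction on this slide can be sketched in a few lines of Python. This is only an illustration: the `AnnotatedValue` class, its field names, and the "number of samples" meaning are assumptions made for the example and do not come from the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedValue:
    """A raw value plus the two kinds of metadata named on the slide."""
    value: object
    datatype: str  # syntactic metadata, e.g. "Integer"
    meaning: str   # semantic metadata, e.g. "chromosome number"


# Same raw value and same data type, but different meanings.
chromosome = AnnotatedValue(21, "Integer", "chromosome number")
sample_count = AnnotatedValue(21, "Integer", "number of samples")  # hypothetical


def same_syntax(a: AnnotatedValue, b: AnnotatedValue) -> bool:
    """True when the two values share a data type."""
    return a.datatype == b.datatype


def same_semantics(a: AnnotatedValue, b: AnnotatedValue) -> bool:
    """True when the two values mean the same kind of thing."""
    return a.meaning == b.meaning
```

Syntactic metadata alone cannot tell the two values apart. Only the semantic annotation shows that comparing them would be meaningless.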

  9. Common approaches to metadata • Code it into the GUI or application (in data structures, object types, etc.) • Create special tables or fields for it in a relational database • Map it into substrings of filenames • Mix it in with data in proprietary file formats • Let the user figure it out • Conclusion: There is a need for semantic disclosure.

  10. The Semantic Gap Application Middleware Resources User

  11. The Model in the middle My Model Model Model Application Middleware Resources User

  12. What is knowledge (in this course)? • “data”, “information”, “facts”, “knowledge” • Knowledge is a statement that can be tested for truth (by a machine). • Otherwise, computing can’t add much.
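One possible reading of "a statement that can be tested for truth by a machine" is a subject-relation-object statement checked against a set of asserted statements. The sketch below assumes that reading. The relation names `is_transcribed_to` and `is_translated_to` are invented for the example, and the two facts are the DNA -> mRNA -> protein chain from slide 6.

```python
from typing import Set, Tuple

# A statement is a (subject, relation, object) triple.
Statement = Tuple[str, str, str]

# A tiny knowledge base; the relation names are hypothetical.
KNOWLEDGE_BASE: Set[Statement] = {
    ("DNA", "is_transcribed_to", "mRNA"),
    ("mRNA", "is_translated_to", "Protein"),
}


def is_supported(statement: Statement, kb: Set[Statement]) -> bool:
    """Return True if the statement is asserted in the knowledge base."""
    return statement in kb
```

A `False` result here only means the statement is not asserted. Whether an absent statement counts as false or merely unknown is a modeling choice that this sketch leaves open.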

  13. Resources are shared on the grid • Shared: • CPU time • network bandwidth • memory • storage space • But also: • Data • Knowledge: ontologies, rules, vocabularies • Services

  14. Abundance of resources in Grid: A Challenge • Knowledge Sharing • How will we find the relevant resources (data, services)? • How can we automatically integrate them into an application? • How will we leverage existing knowledge in our analysis? • How will we integrate our results as usable data for a new (computational) experiment? • And link to the evidence (data) for the new knowledge?

  15. Knowledge Capture • How will we acquire the knowledge? • Literature • Other forms of discourse • Data analysis • How will we represent and store it? • In Semantic Web formats such as RDF, OWL, RIF

  16. Knowledge capture from a computational experiment Database Database Computational experiment in workflow environment ... Database

  17. What will we do with knowledge? • How will we use it? • Query it • Reason across it • Integrate it with other data • Link it up

  18. Linked Data Principles • Use URIs as names for things. • Use HTTP URIs so that people can look up those names. • When someone looks up a URI, provide useful RDF information. • Include RDF statements that link to other URIs so that they can discover related things. • Tim Berners-Lee 2007 • http://www.w3.org/DesignIssues/LinkedData.html

  19. Background of the HCLS IG • Originally chartered in 2005 • Chairs: Eric Neumann and Tonya Hongsermeier • Re-chartered in 2008 • Chairs: Scott Marshall and Susie Stephens • Team contact: Eric Prud’hommeaux • Broad industry participation • Over 100 members • Mailing list of over 600 • Background Information • http://www.w3.org/2001/sw/hcls/ • http://esw.w3.org/topic/HCLSIG

  20. Mission of HCLS IG • The mission of HCLS is to develop, advocate for, and support the use of Semantic Web technologies for • Biological science • Translational medicine • Health care • These domains stand to gain tremendous benefit by adoption of Semantic Web technologies, as they depend on the interoperability of information from many domains and processes for efficient decision support

  21. Translating across domains • Translational medicine – use cases that cross domains • Link across domains and research: • What are the links? • gene – transcription factor – protein • pathway – molecular interaction – chemical compound • drug – drug side effect – chemical compound

  22. Group Activities • Document use cases to aid individuals in understanding the business and technical benefits of using Semantic Web technologies • Document guidelines to accelerate the adoption of the technology • Implement a selection of the use cases as proof-of-concept demonstrations • Develop high-level vocabularies • Disseminate information about the group’s work at government, industry, and academic events

  23. Current Task Forces • BioRDF – integrated neuroscience knowledge base • Kei Cheung (Yale University) • Clinical Observations Interoperability – patient recruitment in trials • Vipul Kashyap (Cigna Healthcare) • Linking Open Drug Data – aggregation of Web-based drug data • Chris Bizer (Free University Berlin) • Pharma Ontology – high level patient-centric ontology • Christi Denney (Eli Lilly) • Scientific Discourse – building communities through networking • Tim Clark (Harvard University) • Terminology – Semantic Web representation of existing resources • John Madden (Duke University)

  24. BioRDF Task Force • Task Lead: Kei Cheung • Participants: M. Scott Marshall, Eric Prud’hommeaux, Susie Stephens, Andrew Su, Steven Larson, Huajun Chen, TN Bhat, Matthias Samwald, Erick Antezana, Rob Frost, Ward Blonde, Holger Stenzhorn, Don Doherty

  25. BioRDF: Answering Questions • Goals: Get answers to questions posed to a body of collective knowledge in an effective way • Knowledge used: Publicly available databases, and text mining • Strategy: Integrate knowledge using careful modeling, exploiting Semantic Web standards and technologies

  26. BioRDF: Looking for Targets for Alzheimer’s • Signal transduction pathways are considered to be rich in “druggable” targets • CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease • Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons? Source: Alan Ruttenberg

  27. BioRDF: Integrating Heterogeneous Data PDSPki NeuronDB Reactome Gene Ontology BAMS Allen Brain Atlas BrainPharm Antibodies Entrez Gene MESH Literature PubChem Mammalian Phenotype SWAN AlzGene Homologene Source: Susie Stephens

  28. BioRDF: SPARQL Query Source: Alan Ruttenberg

  29. BioRDF: Results: Genes, Processes • DRD1, 1812 adenylate cyclase activation • ADRB2, 154 adenylate cyclase activation • ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway • DRD1IP, 50632 dopamine receptor signaling pathway • DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway • DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway • GRM7, 2917 G-protein coupled receptor protein signaling pathway • GNG3, 2785 G-protein coupled receptor protein signaling pathway • GNG12, 55970 G-protein coupled receptor protein signaling pathway • DRD2, 1813 G-protein coupled receptor protein signaling pathway • ADRB2, 154 G-protein coupled receptor protein signaling pathway • CALM3, 808 G-protein coupled receptor protein signaling pathway • HTR2A, 3356 G-protein coupled receptor protein signaling pathway • DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger • SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger • MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger • CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger • HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger • GRIK2, 2898 glutamate signaling pathway • GRIN1, 2902 glutamate signaling pathway • GRIN2A, 2903 glutamate signaling pathway • GRIN2B, 2904 glutamate signaling pathway • ADAM10, 102 integrin-mediated signaling pathway • GRM7, 2917 negative regulation of adenylate cyclase activity • LRP1, 4035 negative regulation of Wnt receptor signaling pathway • ADAM10, 102 Notch receptor processing • ASCL1, 429 Notch signaling pathway • HTR2A, 3356 serotonin receptor signaling pathway • ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization) • PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway • EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway • NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway • CTNND1, 1500 Wnt receptor signaling pathway • Many of the genes are related to AD through gamma secretase (presenilin) activity. Source: Alan Ruttenberg

  30. Linking Open Drug Data • HCLSIG task started October 1st, 2008 • Primary Objectives • Survey publicly available data sets about drugs • Explore interesting questions from pharma, physicians and patients that could be answered with Linked Data • Publish and interlink these data sets on the Web • Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao

  31. The Classic Web • Single information space • Built on URIs: globally unique IDs, retrieval mechanism • Hyperlinks are the glue that holds everything together • Diagram: search engines and Web browsers operate over HTML documents A, B, and C, connected by hyperlinks. Source: Chris Bizer

  32. Linked Data • Use Semantic Web technologies to publish structured data on the Web and set links between data from one data source and data from other data sources. • Diagram: Linked Data browsers, Linked Data mashups, and search engines operate over Things in data sources A to E, connected by typed links. Source: Chris Bizer

  33. Data Objects Identified with HTTP URIs • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name Richard Cyganiak • pd:cygri foaf:based_near dbpedia:Berlin • pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri • dbpedia:Berlin = http://dbpedia.org/resource/Berlin • Forms an RDF link between two data sources Source: Chris Bizer
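The prefixed names on this slide are shorthand for full HTTP URIs. The sketch below shows that expansion in plain Python, with no RDF library. The `pd` and `dbpedia` namespaces are inferred from the two expansions on the slide, and `rdf` and `foaf` use their standard namespace URIs. The helper names and the four-element row layout are assumptions made for the example.

```python
# Prefix table: pd and dbpedia follow the slide's expansions; rdf and foaf
# are the standard namespace URIs for those vocabularies.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'pd:cygri' into a full URI."""
    prefix, local_name = name.split(":", 1)
    return PREFIXES[prefix] + local_name


# (subject, predicate, object, object_is_literal) -- layout is hypothetical.
STATEMENTS = [
    ("pd:cygri", "rdf:type", "foaf:Person", False),
    ("pd:cygri", "foaf:name", "Richard Cyganiak", True),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin", False),
]


def expand_statement(s, p, o, o_is_literal):
    """Expand subject and predicate; leave literal objects untouched."""
    return (expand(s), expand(p), o if o_is_literal else expand(o))


EXPANDED = [expand_statement(*row) for row in STATEMENTS]
```

The third expanded statement is the RDF link between the two data sources: its subject lives under `richard.cyganiak.de` and its object under `dbpedia.org`.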

  34. Dereferencing URIs over the Web • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name Richard Cyganiak • pd:cygri foaf:based_near dbpedia:Berlin • dbpedia:Berlin dp:population 3.405.259 • dbpedia:Berlin skos:subject dp:Cities_in_Germany Source: Chris Bizer

  35. Dereferencing URIs over the Web • pd:cygri rdf:type foaf:Person • pd:cygri foaf:name Richard Cyganiak • pd:cygri foaf:based_near dbpedia:Berlin • dbpedia:Berlin dp:population 3.405.259 • dbpedia:Berlin skos:subject dp:Cities_in_Germany • dbpedia:Hamburg skos:subject dp:Cities_in_Germany • dbpedia:Meunchen skos:subject dp:Cities_in_Germany Source: Chris Bizer
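The "follow your nose" behaviour in slides 34 and 35 can be sketched without any network access. In the sketch below a dictionary named `WEB` stands in for the Web: looking up a key plays the role of an HTTP request for that URI. The dictionary, the function names, and the hop limit are assumptions made for the example, and only a subset of the slide's statements is included.

```python
# A stand-in for the Web: each URI maps to the statements a server would
# return for it. Prefixed names are kept short for readability.
WEB = {
    "pd:cygri": [
        ("pd:cygri", "rdf:type", "foaf:Person"),
        ("pd:cygri", "foaf:name", "Richard Cyganiak"),
        ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "dp:population", 3405259),
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
    ],
}

KNOWN_PREFIXES = {"pd", "rdf", "foaf", "dbpedia", "dp", "skos"}


def is_uri(term) -> bool:
    """Treat a string with a known prefix as a URI, anything else as a literal."""
    return isinstance(term, str) and term.split(":", 1)[0] in KNOWN_PREFIXES


def dereference(uri: str):
    """Simulated lookup: return the statements published at this URI."""
    return WEB.get(uri, [])


def crawl(start: str, max_hops: int) -> set:
    """Collect statements by following object URIs for up to max_hops lookups."""
    graph, seen, frontier = set(), {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for s, p, o in dereference(uri):
                graph.add((s, p, o))
                if is_uri(o) and o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return graph
```

Starting from `pd:cygri`, one hop yields the three statements of slide 33, a second hop adds the Berlin statements of slide 34, and a third hop reaches the other cities of slide 35 through the shared category.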

  36. LODD Data Sets Source: Anja Jentzsch

  37. LODD in Marbles Source: Anja Jentzsch

  38. The Linked Data Cloud Source: Chris Bizer

  39. Accomplishments • Technical • HCLS KB hosted at 2 institutes • Linked Open Data contributions • Demonstrator of querying across heterogeneous EHR systems • Integration of SWAN and SIOC ontologies for Scientific Discourse • Outreach • Conference Presentations and Workshops: • Bio-IT World, WWW, ISMB, AMIA, C-SHALS, etc. • Publications: • Proceedings of LOD Workshop at WWW 2009: Enabling Tailored Therapeutics with Linked Data • Proceedings of the ICBO: Pharma Ontology: Creating a Patient-Centric Ontology for Translational Medicine • AMIA Spring Symposium: Clinical Observations Interoperability: A Semantic Web Approach • BMC Bioinformatics. A Journey to Semantic Web Query Federation in Life Sciences • Briefings in Bioinformatics.  Life sciences on the Semantic Web: The Neurocommons and Beyond

  40. New Technologies • SPARQL-DL • Semantic Wiki (integration with KB’s) • Cloud Computing (e.g. Amazon) • Query rewriting: SPARQL -> SQL • Legacy integration • Improve interfaces • FeDeRate: Federated query
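To give a feel for "Query rewriting: SPARQL -> SQL", the sketch below translates a single triple pattern into SQL over a three-column table and runs it with Python's built-in `sqlite3` module. It is a toy under stated assumptions: the `triples` table layout, the `pattern_to_sql` function, and the `process` predicate are invented for the example, and real rewriters handle joins, repeated variables, filters, and existing legacy schemas. The rows reuse gene and process pairs from slide 29.

```python
import sqlite3


def pattern_to_sql(s: str, p: str, o: str):
    """Translate one triple pattern into (sql, params).

    Terms starting with '?' are variables and become selected columns;
    all other terms become equality constraints bound as parameters.
    """
    terms = {"s": s, "p": p, "o": o}
    select, where, params = [], [], []
    for column, term in terms.items():
        if term.startswith("?"):
            name = term[1:]
            if not name.isidentifier():  # keep aliases safe to embed in SQL
                raise ValueError(f"bad variable name: {term}")
            select.append(f"{column} AS {name}")
        else:
            where.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT " + (", ".join(select) or "1") + " FROM triples"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Hypothetical legacy store: one table of (subject, predicate, object) rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("DRD1", "process", "adenylate cyclase activation"),
        ("ADRB2", "process", "adenylate cyclase activation"),
        ("DRD2", "process", "dopamine receptor, adenylate cyclase inhibiting pathway"),
    ],
)

# Pattern: ?gene process "adenylate cyclase activation"
sql, params = pattern_to_sql("?gene", "process", "adenylate cyclase activation")
genes = sorted(row[0] for row in conn.execute(sql, params))
```

Constants are passed as bound parameters rather than pasted into the SQL string, so quoting stays with the database driver.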

  41. We’ve come a long way • Triplestores have gone from millions to billions • Linked Open Data cloud • http://lod.openlinksw.com/ • On demand Knowledge Bases: Amazon’s EC2 • Terminologies: SNOMED-CT, MeSH, UMLS, .. • Neurocommons, Flyweb, Biogateway, Bio2RDF, Linked Life Data, ..

  42. Penetrance of ontology in biology • OBO Foundry - http://www.obofoundry.org • BioPortal - http://bioportal.bioontology.org • National Centers for Biomedical Computing http://www.ncbcs.org/ • Shared Names • Concept Web Alliance • Semantic Web Interest Group PRISM Forum • Work packages in ELIXIR

  43. Recipe for a Semantic Web • Follow Linked Open Data principles • Attempt to use Shared Names (same URI’s) • Query rewriting to map from: • SPARQL -> (query language) • SPARQL (term1) -> SPARQL (term2) • Add federated query support to SPARQL engine implementations
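The "SPARQL (term1) -> SPARQL (term2)" step can be sketched as a textual substitution of URIs using a mapping from local names to shared names. The mapping, the `example.org` URIs, and the `rewrite_terms` function below are all hypothetical. A production rewriter would work on a parsed query rather than on raw text.

```python
import re

# Hypothetical mapping from one group's URIs to agreed shared names.
SHARED_NAMES = {
    "http://example.org/lab-a/gene/DRD1": "http://example.org/shared/gene/1812",
}


def rewrite_terms(query: str, mapping: dict) -> str:
    """Replace every <uri> in the query that has an entry in the mapping."""
    def swap(match):
        uri = match.group(1)
        return "<" + mapping.get(uri, uri) + ">"

    # Matches <...> tokens containing no whitespace or angle brackets.
    return re.sub(r"<([^<>\s]+)>", swap, query)


original = (
    "SELECT ?process WHERE { "
    "<http://example.org/lab-a/gene/DRD1> "
    "<http://example.org/lab-a/vocab/involvedIn> ?process }"
)
rewritten = rewrite_terms(original, SHARED_NAMES)
```

URIs without a mapping entry, such as the predicate above, pass through unchanged, so a partial mapping still produces a valid query.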

  44. The End “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.”  – Henri Poincaré, Science and Hypothesis, 1905
