
Semantic Web In Industry




  1. Semantic Web In Industry R. Guha

  2. Two Levels of the Semantic Web • Deep Semantic Web: • Intelligent agents performing inference • Semantic Web as distributed AI • Small problem … the AI problem is not yet solved • Shallow Semantic Web: using SW/Knowledge Representation techniques for • Data integration • Search • Is starting to see traction in industry

  3. Integration: The new buzzword in business • Huge explosion in the number of new databases, applications, documents, … in the 90s • Lots of redundancy, duplication … => high inefficiency • Economic pressures forcing consolidation and efforts to reduce inefficiency • Two aspects to integration: Process & Data • Process integration depends on data integration

  4. Data Integration for Science • Many experimental fields will generate more data in the next 2 years than exists today • Large part of research consists of writing programs to analyze data, e.g., NASA • Tools to normalize, share, integrate data stuck in the 80s (ftp, perl, …) • Semantic Web could create a “web of data” that changes all this. • Example of the Internet Observatory

  5. Varieties of Data Integration: Data Transformation • Data Transformation Example • Contact Information in SAP, Siebel, PeopleSoft, … • We want to reflect updates in one data source into another • [Diagram labels: XSLT, etc.; App. Server; Clarify; Siebel; PeopleSoft]

  6. Varieties of Data Integration: Data Aggregation • Data Aggregation Example • Clinical trial data at Stanford, UCSF, Mayo … • We want to give a Meta-analyst a uniform view of data from these different clinical trials • Example of how this would have helped recent meta studies such as the estrogen study • [Diagram labels: Relational Views; DBMS; Meta-Analyst; UCSF; Stanford; Mayo]

  7. Data Integration Layers • Coping with software from different vendors • Oracle vs. DB2 vs. SQL Server … this is a solved problem • Coping with different formats • Relational vs. XML vs. ISAM… this too is a solved problem • Coping with different schemas • Solved for the small case where one person understands all the schemas • No products for the case where it is truly distributed • We know how to do it in theory, but lots of practical problems • Coping with data from unknown sources • Wide open … lots of unsolved problems

  8. Typical Data Integration Methodology • Use a common namespace of terms for the concepts in the domain of the data sources being integrated, e.g., Employee, Customer, Patient, weight, height, bodyTemperature, … • Mappings relate data items in data sources to terms in the namespace • Transformation algorithms map queries in terms of the common namespace into corresponding queries in terms of data source vocabularies • Background knowledge about terms is essential for transformations … e.g., Employee subClassOf Person; 2 people with the same last name, first name and street address are likely to be the same; i.e., the common namespace is really an ontology • Mappings and common namespace are the workhorse
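The query-transformation step on this slide can be illustrated with a minimal sketch. It assumes each data source has a dictionary mapping common-namespace terms to its own table and column names; the source names (`db1`, `db2`), the column names, and the `rewrite_query` helper are all hypothetical and not taken from the presentation.

```python
# Hypothetical mappings from common-namespace terms to source-specific names.
# All table and column names here are invented for illustration.
MAPPINGS = {
    "db1": {"Patient": "patient", "weight": "wt_kg", "height": "ht_cm"},
    "db2": {"Patient": "pat_records", "weight": "weight", "height": "height"},
}


def rewrite_query(source, concept, properties):
    """Translate a query over common-namespace terms into SQL for one source."""
    mapping = MAPPINGS[source]
    table = mapping[concept]
    # Alias each source column back to its common-namespace name so that
    # results from different sources line up under the same column names.
    columns = ", ".join(f"{mapping[p]} AS {p}" for p in properties)
    return f"SELECT {columns} FROM {table}"


# One query in the common namespace, rewritten once per data source.
queries = {src: rewrite_query(src, "Patient", ["weight", "height"]) for src in MAPPINGS}
for src, sql in queries.items():
    print(src, "->", sql)
```

The caller only ever names `Patient`, `weight` and `height`; everything source-specific lives in the mapping table, which is why the slide calls mappings and the common namespace the workhorse.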

  9. Role of Semantic Web in Data Integration • The XML stack (XML, XSD, XPath, XQuery, …) does not have the concepts (objects, classes, properties, …) required for representing ontologies • RDF/S does … • Neither of them has a language for expressing mappings • But RDF/S, being closer to logic, has more of the machinery that is required
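To make the "closer to logic" point concrete, here is a small sketch of the kind of class reasoning that RDF/S vocabulary supports. It uses plain Python tuples rather than a real RDF library, and the `Manager` class and the `alice` instance are invented for illustration; only `Employee subClassOf Person` comes from the slides.

```python
# Triples as (subject, predicate, object). Plain tuples stand in for RDF here;
# "Manager" and "alice" are hypothetical additions for the example.
TRIPLES = {
    ("Employee", "subClassOf", "Person"),
    ("Manager", "subClassOf", "Employee"),
    ("alice", "type", "Manager"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(instance, triples):
    """Return the asserted types of an instance plus all inferred superclasses."""
    direct = {o for s, p, o in triples if s == instance and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(types_of("alice", TRIPLES)))
```

A query for all instances of `Person` can then return `alice` even though no source ever asserted that directly, which is the sort of background knowledge the transformation step relies on.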

  10. Kinds of Mappings • Simple structural • DB1.patient.weight corresponds to Patient’s weight • Conditional structural • If DB1.patient.type equals Outpatient then DB1.patient.foo corresponds to Patient’s visits duration … • Term mappings • CA in DB1 corresponds to California in domain namespace • Object with ssn 7687667 in database 1 corresponds to object with id “aksdks” in database 2
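The first three kinds of mapping can be sketched as a single function that converts one source row into common-namespace properties. The row layout, the `state` column, the `visitDuration` property name, and the `map_patient` helper are assumptions made for this example; the slide only names `weight`, `type`, `foo`, and the CA/California pair.

```python
# Term mapping: source-specific codes -> terms in the common namespace.
STATE_TERMS = {"CA": "California"}


def map_patient(row):
    """Map one hypothetical DB1.patient row into common-namespace properties."""
    out = {}
    # Simple structural mapping: DB1.patient.weight -> Patient's weight.
    out["weight"] = row["weight"]
    # Conditional structural mapping: the column "foo" is read as the visit
    # duration only when the patient is an outpatient.
    if row.get("type") == "Outpatient":
        out["visitDuration"] = row["foo"]
    # Term mapping: translate a source code into the shared term;
    # codes with no known mapping pass through unchanged.
    if "state" in row:
        out["state"] = STATE_TERMS.get(row["state"], row["state"])
    return out


print(map_patient({"weight": 70, "type": "Outpatient", "foo": 30, "state": "CA"}))
print(map_patient({"weight": 82, "type": "Inpatient", "foo": 5, "state": "NV"}))
```

Object mappings (the fourth bullet) are not covered here; they need a lookup or matching step between identifiers in two databases rather than a per-row rule.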

  11. Challenges and non-challenges in data integration • Non-challenge: algorithms for doing the transformations (ISI, MCC, SU & AT&T) • Engineering Challenges • Creating large, useful ontologies that are shared by many • Creating mappings • Research Challenges • Semantic Drift • Fuzzy terms, probabilistic mappings • Trust

  12. Engineering Challenges • Creating large, detailed ontologies is complex and expensive • But it is happening … CrossWorlds for business concepts, MAGE, etc. for medicine • Danger: some of them might turn out to be proprietary • Creating mappings is tedious and time-consuming • Object mappings pose special challenges • Mappings need to be dynamic and constantly updated

  13. Research Challenges with mappings • Semantic Drift • The meaning of terms, as interpreted by different members of a community, could drift over time • Cyc experience shows that Description Logic mechanisms are not adequate for either detecting or fixing these • Fuzzy mappings • E.g., Walmart’s concept of chair is similar to but not the same as MOMA’s concept of chair • Probabilistic mappings • There is an 82% likelihood that Michael Jordan in database 1 is the same as Michael Jordan in database 2
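A probabilistic object mapping can be approximated, very roughly, by scoring how many identifying fields two records share. The sketch below is a naive weighted-agreement score, not a calibrated probability and not the method used in the presentation; the field weights and the sample records are invented.

```python
# Hypothetical field weights summing to 1.0; a real system would estimate
# these from labeled matches rather than hard-code them.
WEIGHTS = {"last_name": 0.3, "first_name": 0.2, "street_address": 0.5}


def match_score(a, b):
    """Return a score in [0, 1]: the total weight of fields on which a and b agree."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


# Two invented records sharing a name but not an address.
rec1 = {"last_name": "Jordan", "first_name": "Michael", "street_address": "1 Main St"}
rec2 = {"last_name": "Jordan", "first_name": "Michael", "street_address": "9 Oak Ave"}

print(match_score(rec1, rec2))  # name agrees, address does not
print(match_score(rec1, rec1))  # all fields agree
```

The score only becomes useful as a likelihood once the weights are fitted to data, which is part of why the slide lists this under research challenges.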

  14. Other data web related challenges • Trust: How should the program know whether to trust some new data source? • Without this, we will only have closed systems • Options: centralized approaches like UDDI or decentralized approaches like WOTs • Inverse trust: how can I trust you not to indiscriminately distribute my data? A big issue in fresh scientific data • Systems challenges • Caching • Preventing accidental DOS attacks

  15. Forecast for SW and Data Integration • We already have a number of data integration tools on the market • We are seeing the first generation of ontology based data integration tools from small companies • At least some of the big players will probably have some offerings for doing data integration based on Semantic Web concepts in the near future • Whether they use Semantic Web formats and acronyms is an open question … • These common vocabularies will exhibit very strong network effects

  16. Semantic Web for Search: Going beyond search as Location Bar • Keywords → a particular page • Typically a home page or well known hub page • United airlines → www.united.com • Unix → gnu.org, linux.org, freebsd.org • Search as a smarter location bar • Page rank is ideally suited for this • This is largely a solved problem

  17. Varieties of Search: Research searches • User is searching for info about something • Could be directed – user is looking for a particular property • Price of something, location of some event, … • Or undirected – user is looking for some general class of properties • Reviews/feedback on product, info on person or country • If there is no hub page on the thing, existing search engines perform very poorly • New focus is on this class of searches

  18. Semantic Web for Search • Keyword based approaches haven’t made significant advances since PageRank • Improvements may be gained by adding a modicum of understanding about the *object* denoted by the search query • Improvements not just in search itself but also in the relevance of search related advertising

  19. Basic Issues • Need database of potential objects user may be referring to, along with some properties of the object … e.g., its type • Too many objects to manually construct DB • At least 300 million distinct object references on Web • If it does know something more about the search term’s denotation, (e.g., it denotes a musician), how can the search engine do better?

  20. Building the Web KB • Many different automated approaches • Simple natural language processing (Riloff, TAP, …) • Scrapers • Machine Learning • Most commercial efforts lead to proprietary KBs • Huge opportunity for wider SW community • Collaborate to actually create the KB

  21. Using the KB • Word sense disambiguation, e.g., MSN Search, Teoma • Incorporating data feeds into search results, e.g., MSN with popular musicians • Incorporating object-type-specific actions, e.g., Google with addresses and stock symbols • Coming soon … KB construction driven by ads
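As a rough illustration of how a search front end might use such a KB, the sketch below looks up the objects a query could denote and attaches an action based on each object's type. The KB entries, the type names, the action strings, and the `interpret` helper are all hypothetical; no real search engine's behavior is being described.

```python
# Tiny hypothetical KB: normalized query string -> candidate objects with a type.
KB = {
    "yo-yo ma": [{"id": "yo_yo_ma", "type": "Musician"}],
    "jaguar": [
        {"id": "jaguar_animal", "type": "Animal"},
        {"id": "jaguar_cars", "type": "CarMaker"},
    ],
}

# Hypothetical type-specific actions a results page could offer.
ACTIONS = {
    "Musician": "show albums and tour dates",
    "CarMaker": "show models and dealers",
}


def interpret(query):
    """Return (object id, type, action) for each object the query may denote."""
    candidates = KB.get(query.strip().lower(), [])
    return [(c["id"], c["type"], ACTIONS.get(c["type"])) for c in candidates]


print(interpret("Yo-Yo Ma"))  # one sense, with a musician-specific action
print(interpret("jaguar"))    # two senses: a disambiguation prompt is needed
print(interpret("unknown"))   # not in the KB: fall back to keyword search
```

More than one candidate signals that word sense disambiguation is needed, and an empty list means the engine has nothing beyond keywords to go on.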

  22. Conclusions • Please help Eric Miller
