
Introduction

Information Integration, Lecture 1: Introduction. Michael Genesereth, Autumn 2001.


Presentation Transcript


  1. Information Integration, Lecture 1: Introduction. Michael Genesereth, Autumn 2001.

  2. Information Processors
      Rule: x>y & y>z => x>z.
      Facts: a>b. b>c.
      Query: a>c?
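The slide's rule, facts, and query can be run as a small forward-chaining loop. This is only a minimal sketch: the function name `transitive_closure` and the encoding of each fact `x>y` as a pair `(x, y)` are assumptions made for illustration, not something taken from the lecture.

```python
def transitive_closure(facts):
    """Return every pair derivable from `facts` with the rule x>y & y>z => x>z."""
    closure = set(facts)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


facts = {("a", "b"), ("b", "c")}   # a>b. b>c.
derived = transitive_closure(facts)
print(("a", "c") in derived)       # answers the query a>c?
```

The query succeeds even though `a>c` is stored nowhere; it follows from the two stored facts and the rule.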

  3. Universal Connectivity

  4. Information Broker (diagram labels: Client, Client, Client; Information Broker; Source, Source, Source)

  5. Syntactic Search Engines (diagram labels: Google; Search Words; Document References; five Documents)

  6. Too Many Results
      Query: Who is older -- Jane or John?
      Search Words: John Jane older
      Document Fragments:
        ...John is older than Jane...
        ...Jill wants to know whether John is older than Jane...
        ...John is older than Jill...
        ...Jim is older than Jane...
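The "too many results" effect can be reproduced with a purely syntactic matcher. This is a rough sketch, not how any real search engine works: the helper `tokens` and the two matching policies (any word, all words) are assumptions for illustration. The fragments are the ones on the slide.

```python
import re


def tokens(text):
    """Lowercased alphabetic words of a text fragment."""
    return set(re.findall(r"[a-z]+", text.lower()))


fragments = [
    "John is older than Jane",
    "Jill wants to know whether John is older than Jane",
    "John is older than Jill",
    "Jim is older than Jane",
]
search_words = {"john", "jane", "older"}

# Policy 1: a fragment matches if it contains any search word.
any_word = [f for f in fragments if tokens(f) & search_words]
# Policy 2: a fragment matches only if it contains every search word.
all_words = [f for f in fragments if search_words <= tokens(f)]

print(len(any_word))   # 4
print(len(all_words))  # 2
```

Even the stricter policy still returns the "Jill wants to know whether..." fragment, which contains all three words but does not itself assert who is older.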

  7. Too Few Results
      Query: Is it the case that John is older than Jane?
      Document Fragments:
        ...John is more advanced in years than Jane...
        ...Jane is younger than John...
        ...John is the father of Jane...

  8. No Integration
      Query: Is it the case that John is older than Jane?
      Documents:
        ...John is older than Jill...
        ...Jill is older than Jane...

  9. Content versus Form
      Semantic View, Syntactic View

      Thosewhowillnotr
      easonPerishinthe
      act;Thosewhowill
      notactPerishfort
      hatreason.

      Those who will not reason
      Perish in the act;
      Those who will not act
      Perish for that reason.

  10. Structured Data
      Free Form Text
        Easy to Use but limited capability
        Too Few answers, too many answers
        Impossible to Aggregate effectively
      Structured Data
        Taxonomy, Attributes, Typed Values
        Powerful search possible
        Aggregation possible

  11. Databases
      name   manager  office   phone
      John   Jill     MJH222   38086
      Jane   Jerry    Cedar12  57493
      Jill            MJH222
      Jerry           420-032  56777

  12. Fragmentation
      Horizontal fragmentation
        name   manager  office   phone
        John   Jill     MJH222   38086
        Jane   Jerry    Cedar12  57493

        name   manager  office   phone
        Jill            MJH222
        Jerry           420-032  56777
      Vertical fragmentation
        name   manager
        John   Jill
        Jane   Jerry
        Jill
        Jerry

        name   office   phone
        John   MJH222   38086
        Jane   Cedar12  57493
        Jill   MJH222
        Jerry  420-032  56777
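Both kinds of fragments can be put back together: horizontal fragments by taking the union of their rows, vertical fragments by joining on a shared key. The sketch below uses the slide's data, under some assumptions: rows are modelled as dictionaries, empty cells as `None` (which cells are empty is inferred from the slide's flattened layout), and `name` is the join key. The helper names `union` and `join_on_name` are illustrative only.

```python
# Horizontal fragments: each holds some of the rows, with all columns.
horizontal_1 = [
    {"name": "John", "manager": "Jill", "office": "MJH222", "phone": "38086"},
    {"name": "Jane", "manager": "Jerry", "office": "Cedar12", "phone": "57493"},
]
horizontal_2 = [
    {"name": "Jill", "manager": None, "office": "MJH222", "phone": None},
    {"name": "Jerry", "manager": None, "office": "420-032", "phone": "56777"},
]

# Vertical fragments: each holds all of the rows, with some of the columns.
vertical_1 = [
    {"name": "John", "manager": "Jill"},
    {"name": "Jane", "manager": "Jerry"},
    {"name": "Jill", "manager": None},
    {"name": "Jerry", "manager": None},
]
vertical_2 = [
    {"name": "John", "office": "MJH222", "phone": "38086"},
    {"name": "Jane", "office": "Cedar12", "phone": "57493"},
    {"name": "Jill", "office": "MJH222", "phone": None},
    {"name": "Jerry", "office": "420-032", "phone": "56777"},
]


def union(*fragments):
    """Reassemble horizontal fragments by concatenating their rows."""
    return [row for fragment in fragments for row in fragment]


def join_on_name(left, right):
    """Reassemble vertical fragments by joining rows that share a name."""
    right_by_name = {row["name"]: row for row in right}
    return [
        {**row, **right_by_name[row["name"]]}
        for row in left
        if row["name"] in right_by_name
    ]


from_horizontal = union(horizontal_1, horizontal_2)
from_vertical = join_on_name(vertical_1, vertical_2)
print(from_horizontal == from_vertical)  # True
```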

  13. Replication
      • Network Issues
          • Latency
          • Bandwidth
          • Reliability
      • Information Source Issues
          • Limited Availability
          • Performance
          • Unscheduled Failures
      • Solution - Replication
      • Problems - Cost and Update

  14. Heterogeneity
      name   manager  office   phone
      John   Jill     MJH222   38086
      Jane   Jerry    Cedar12  57493
      Jill            MJH222
      Jerry           420-032  56777

      name   employee  location  telephone
      John             MJH222    7238086
      Jane             Cedar12   7257493
      Jill   John      MJH222
      Jerry  Jane      420-032   7256777

      “The biggest problem facing anyone who wants to search multiple structured databases ... is that many organizations use different words to describe the same thing.”
      Martin Marshall, Communications Week
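One way to bridge the two schemas on this slide is an explicit translation from the second vocabulary into the first. The sketch below rests on assumptions read off the example data rather than stated in the lecture: `location` corresponds to `office`, `employee` is the inverse of `manager`, each person lists at most one employee, and `telephone` is the five-digit `phone` with a "72" prefix. The function name `to_first_schema` is hypothetical.

```python
# Rows in the second schema (name, employee, location, telephone).
second_schema_rows = [
    {"name": "John", "employee": None, "location": "MJH222", "telephone": "7238086"},
    {"name": "Jane", "employee": None, "location": "Cedar12", "telephone": "7257493"},
    {"name": "Jill", "employee": "John", "location": "MJH222", "telephone": None},
    {"name": "Jerry", "employee": "Jane", "location": "420-032", "telephone": "7256777"},
]


def to_first_schema(rows, phone_prefix="72"):
    """Translate rows into the (name, manager, office, phone) schema."""
    # Invert the employee relation: if X lists Y as employee, Y's manager is X.
    manager_of = {r["employee"]: r["name"] for r in rows if r["employee"]}
    translated = []
    for r in rows:
        telephone = r["telephone"]
        if telephone and telephone.startswith(phone_prefix):
            phone = telephone[len(phone_prefix):]  # assumed prefix convention
        else:
            phone = telephone
        translated.append({
            "name": r["name"],
            "manager": manager_of.get(r["name"]),
            "office": r["location"],
            "phone": phone,
        })
    return translated


for row in to_first_schema(second_schema_rows):
    print(row)
```

With these mappings the translated rows agree with the first table on the slide, so a client could query either source through a single vocabulary.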

  15. Automatic Information Integration
      Integrated access to fragmented, heterogeneous, distributed data sources, giving the illusion of a homogeneous data management system
      (diagram labels: Client, Client, Client; Information Broker; Source, Source, Source)

  16. Potential Application Areas
      Corporate Logistics - Enterprise Resource Directories
        Personnel, locations, organizations, equipment, orders
      Electronic Commerce - Integrated Product Catalogs
        Catalogs, inventories, product ratings, contracts
      Health Care - Consolidated Patient Records
        Doctors, nurses, lab technicians, administrators, patients
      Multidisciplinary Engineering - Concurrent Engineering
        Architects, engineers, construction planners
      Command and Control - Situation Assessment
        Commanders, intelligence, field officers, consultants

  17. Question: Give me a list of 15-inch aluminum skillets with nonstick coating, rated at least 4 out of 5 by Consumer Reports, that sell for under $30 and are currently in stock.
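Answering this question means combining attributes that live in different sources: product attributes, ratings, prices, and inventory. The sketch below shows the shape of such a query. Everything in it is hypothetical: the product ids, the sample values, the four dictionaries standing in for separate sources, and the function name `matching_skillets`.

```python
# Four hypothetical sources, all keyed by an assumed shared product id.
catalog = {
    "sk-1": {"type": "skillet", "diameter_in": 15, "material": "aluminum", "nonstick": True},
    "sk-2": {"type": "skillet", "diameter_in": 15, "material": "aluminum", "nonstick": True},
    "sk-3": {"type": "skillet", "diameter_in": 12, "material": "aluminum", "nonstick": True},
    "sk-4": {"type": "skillet", "diameter_in": 15, "material": "cast iron", "nonstick": False},
}
ratings = {"sk-1": 4.5, "sk-2": 3.0, "sk-3": 5.0, "sk-4": 4.0}        # out of 5
prices_usd = {"sk-1": 27.99, "sk-2": 19.99, "sk-3": 24.99, "sk-4": 29.50}
units_in_stock = {"sk-1": 12, "sk-2": 40, "sk-3": 0, "sk-4": 3}


def matching_skillets():
    """Ids of products that satisfy every condition in the question."""
    return sorted(
        pid
        for pid, product in catalog.items()
        if product["type"] == "skillet"
        and product["diameter_in"] == 15
        and product["material"] == "aluminum"
        and product["nonstick"]
        and ratings.get(pid, 0) >= 4
        and prices_usd.get(pid, float("inf")) < 30
        and units_in_stock.get(pid, 0) > 0
    )


print(matching_skillets())  # ['sk-1']
```

The sketch assumes the sources already share product ids and units. In practice that alignment is the hard part, which is what the heterogeneity slide is about.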

  18. Data Sources
      Retailer Product Data
      Vendor Catalogs
      Consumer Reports Ratings
      Currency Conversion Tables
      Price Sheets
      Inventory Data
      Demographic Data
      Company Data

  19. Quotes
      “The catalog … is what I believe is blocking the growth of Internet commerce.” - Geoffrey Moore, Red Herring
      “Content catalogs are critical to enabling an electronic conversation between business partners.” - Goldman Sachs, November 2000
      “You can’t buy it if you can’t find it.” - Amos Barzilay

  20. Infomaster
      Data Integration System - integrated access to heterogeneous data sources, giving the illusion of a homogeneous data management system
      (diagram labels: Client, Client, Client; Infomaster; Source, Source, Source)
      “Infomaster creates an environment that makes it easier for information consumers to get the information they need to answer their questions, while making it easier for owners to publish and share their databases.”
      Dennis Rayer, Manager, Data Warehouse, Stanford University

  21. Demonstration Architecture (diagram labels: Costco Buyer, GTW Catalog User, Payless Buyer; Costco Interface, GTW Interface, Payless Interface; Rule Library, Integrator, Internal Warehouse; Corning Agent, Mirro Agent, Regal Agent; Corning Data Source, Mirro Data Source, Regal Data Source)

  22. Demonstration Architecture (diagram labels: Costco Buyer, GTW Catalog User, Payless Buyer; Costco Interface, GTW Interface, Payless Interface; Rule Library, Integrator, Internal Warehouse; Corning Agent, Mirro Agent, Regal Agent; iMerge; Corning Data Source, Mirro Data Source, Regal Data Source)

  23. Course Schedule
      1. MRG - Introduction
      2. MRG - Data Model - Project Phase Ia
      3. MRG - Knowledge Model - Project Phase Ib
      4. MRG - Data Integration
      5. MRG - Data Integration in Infomaster - Project Phase II
      6. MRG - Data Aggregation
      7. MRG - Data Aggregation in Infomaster - Project Phase III
      8. xxx - View inversion, Containment, and bucket method
      9. xxx - Qian, Duschka and Genesereth, Master Schema

  24. Course Schedule
      10. xxx - XML, RDF
      11. xxx - xCBL, ebXML, cXML
      12. xxx - XPath and XSL
      13. xxx - standards (e.g. D&B) and directories (e.g. UDDI)
      14. yyy - iMerge
      15. yyy - Cohera, requisite
      16. yyy - a2i, goto
      17. xxx - student papers and projects
      18. xxx - student papers and projects

  25. Grade Requirements
      Participation (20%)
        Attendance
        Good Questions
        Good Ideas
      Project (20%)
        Functionality
        Performance
      Presentations (20%)
        Familiarity with Material
        Strengths and Weaknesses
        Additional Perspectives
        Clear Exposition and Good Discussion
      Paper (40%)
        Correctness and Completeness
        Appropriate Incorporation of Existing Material
        Inherent Interest
        Heft

  26. Deadlines
      October 4 - Volunteer for Topic Presentation
      October 9 - Project Phase I complete
      October 16 - Project Phase II complete
      October 23 - Project Phase III complete
      October 30 - Paper Proposal
      November 20 - Paper Complete
      December 4 - Paper Ready for Presentation

  27. Assignments
      Find Teammates
      Register on Course Website
      Volunteer or Be Assigned
      Read Introduction Papers (logic and cghipuw)
      Read Data Model Papers (rdf and graphs)
      Think

  28. Pre-Lecture Exercise
      What is Information Integration?
        multiple users and multiple sources
        fragmentation, replication, heterogeneity
        update and query
      What are some examples?
        movies
        catalogs
        patient records
        collaborative design
      What is not included?
        observation, including parsing of images, etc.
        action, including fancy graphics, etc.
        planning and execution beyond info exchange
