Heterogeneous Information Management

prepared for Herbstschule, Berlin. Heterogeneous Information Management. October 1998 Gio Wiederhold Stanford University. Abstract. Information is created by applying knowledge (encoded as programs or rules) to collected data and messages received.

Presentation Transcript


  1. prepared for Herbstschule, Berlin Heterogeneous Information Management October 1998 Gio Wiederhold Stanford University

  2. Abstract Information is created by applying knowledge (encoded as programs or rules) to collected data and messages received. Data and computation resources are provided by a variety of suppliers, public and private. The autonomy of the suppliers causes heterogeneity and inconsistencies. The number of potential suppliers and their autonomy also create information overload. To cope with these issues, novel intermediate services are needed, opening up new opportunities. Many traditional relationships among consumers and vendors will change. We will present the concepts and status of such services. Collaboration, security, and payment schemes are some of the considerations.

  3. Section 1 • Motivation and Background • Functions needed • Architecture • Maintenance • Basic Concepts • Applications • Security ? • Current Status • Research • Effects

  4. Focus on Information Systems • Computing Systems: real-time control of processes, factories, . . . ; processing as analyses, payroll, . . . ; information systems (on-line and distributed, . . . )

  5. Industry Needs Information (more from remote sources) • Engineering and Manufacturing • own capability → suppliers’ capabilities • demand → global demand • Distribution and Transportation • costs for alternate means of shipping • Finance • project demand → project cost of funds • Marketing and Service • taste and style → demographics

  6. Data and Knowledge • Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future) [diagram labels: Knowledge Loop; Data Loop; Storage; Education; Selection; Recording; Integration; Abstraction; Experience; State changes; Decision-making; Action]

  7. Knowledge Manifestations • Procedural: system analysts, programmers • Declarative: domain analysts, knowledge engineers, rule writers • Creators faster, Maintainers easier

  8. Information Leverage • Tactical: Customers, Inventory, Suppliers • Strategic: Planning, Capabilities, Opportunities

  9. Current State raises Issues • Plethora of Resources • Public and Private • Autonomous & diverse • World-wide • Sensor resolution (5cm SAR) • Communication • High accessibility • Modest cost • New distance models

  10. Digital Earth [Vice President Gore, 1998] • A multi-resolution, three-dimensional representation of the planet, into which we can embed vast quantities of geo-referenced data. • a ‘collaboratory’ – a laboratory without walls – for research scientists seeking to understand the complex interactions between humanity and our environment. • a “user interface” – a browsable, 3-D version of the planet available at various levels of resolution, a rapidly growing universe of networked geospatial information, and the mechanisms for integrating and displaying information from multiple sources. Source: NSF 1998 planning viewgraph

  11. Information overload Data starvation • More databases • public & corporate • Faster communication • digital • packeting: TCP-IP, ATM • World-wide connectivity • internet • world-wide web • Disintermediation • ubiquitous publishing

  12. Components of GI Science [diagram labels: Humans; Perception; Cognition; Action; Search & control; GI systems; Reality; Data acquisition; Tools: sensing, transforming, integration, presentation] Courtesy of Michael Goodchild et al., NCGIA workshop, NSF 14Jan199; Rediscovering the world through GIS, GIS planet meeting, Lisbon, Sept. 1998.

  13. Support for Decision-Making • Report current status • own status -- under control of decision-maker • state of the world -- not under control • Trends from past history • temporal databases • Projection into the future • effect of decisions • effect of external events • Provide a limited number of interesting choices • avoid overload • leave choices to account for human insight

  14. Issue: Link from resources to consumer?

  15. Transforming Data to Information • Application Layer: users at workstations • Mediation Layer: value-added services • Foundation Layer: data and simulation resources

  16. Change in Supply vs Demand What information consumes is rather obvious, it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it. [Herbert Simon]

  17. Definition* A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. It should be small and simple, so that it can be maintained by one expert or, at most, a small and coherent group of experts. * Wiederhold: IEEE Computer March 1992

  18. Section 2 • Motivation • Function needed • Architecture • Maintenance • Fundamental Concepts • Applications • Security ? • Current Status • Research • Effects

  19. Functional Layer • Human-computer Interaction • User interface • Application-specific code • Service interface • Domain-specific code (MEDIATION) • Resource access interface • Source-specific code • Real-world interface

  20. Function of Mediation Apply Domain-specific Specialist Knowledge to add value • to locate data sources • to convert for consistency • to describe data for processing • to abstract for insight / models • to extrapolate to new situations • to integrate from diverse sources • to summarize for presentation • INFORMATION

  21. Making data relevant • Data reduction • Data abstraction • Level changing • Summarization • Exception search • Level change to integrate with other data sources • Follow Customer Model: hierarchical, divide-and-conquer, a common paradigm

  22. Functions inside Mediation [diagram labels: Summarize; Articulation; Integration; Heterogeneous resources; Transform; Selection]

  23. Dealing With Heterogeneity • Hardware platform: hidden by operating system • Operating system: choices are reducing (POSIX) • Programming language: fewer choices, irrelevant in remote access • Database system model: relational and E-R common • Database system: standards, convergence • Coverage: source dependent • Attributes: documented, additive • Scope: undocumented, intersecting • Data representation: conversion problems, nulls

  24. Mediation on the WWW • Resources on the World-Wide-Web • are plentiful • autonomous • incoherent • Opportunity for value-added services • select best source • improve coverage • minimize overlap • resolve inconsistencies • summarize results

  25. Wrappers • Must deal with varied data • Must deal with legacy data • Wrappers reduce the number of distinct data representations the mediator has to deal with • Wrappers reduce the number of distinct communication methods • Wrappers are locally maintained
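
The wrapper idea on this slide can be illustrated with a minimal Python sketch. Everything named here (`wrap_csv`, `wrap_json`, `mediate`, the column names, the sample data) is hypothetical and not taken from the talk; the point is only that each wrapper hides one source format, so the mediator sees a single record shape.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

# The single representation the mediator works with (an assumption of this sketch).
Record = Dict[str, object]


def wrap_csv(text: str) -> List[Record]:
    """Hypothetical wrapper for a legacy CSV source with its own column names."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"part": row["PART_NO"], "qty": int(row["QTY"])} for row in rows]


def wrap_json(text: str) -> List[Record]:
    """Hypothetical wrapper for a JSON source that names the same fields differently."""
    return [{"part": item["partId"], "qty": int(item["quantity"])}
            for item in json.loads(text)]


def mediate(sources: List[Tuple[Callable[[str], List[Record]], str]]) -> List[Record]:
    """The mediator only ever handles Record dicts, never the raw formats."""
    records: List[Record] = []
    for wrapper, raw in sources:
        records.extend(wrapper(raw))
    return records


legacy_csv = "PART_NO,QTY\nA-1,3\nB-2,5\n"
modern_json = '[{"partId": "C-3", "quantity": "7"}]'
combined = mediate([(wrap_csv, legacy_csv), (wrap_json, modern_json)])
```

Adding a third source format would mean writing one more wrapper; `mediate` itself would not change, which is one way to read "wrappers are locally maintained".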

  26. Selection for Quality • User Model: f(S,C,T) • S= source reliability, C= confidence, T= delay time • Expert assessments: S1=.8 S2=.9 S3=8 • Estimates from suppliers: C1= 5±1 T1=100±160; C2= 8±1 T2=70±30; C3= 10±1 T3=50±80 • BEST= low cost, rapid response, reliable delivery, trustworthiness
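
The slide does not define f(S,C,T), so the following Python sketch is one possible reading under loudly stated assumptions: C is treated as a quantity to minimize (the slide's legend calls it "confidence" while its BEST list asks for low cost), the ± spreads are added to get pessimistic estimates, S3 is taken as 0.8 although the slide prints "8", and the scoring formula S / (C × T) is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    name: str
    reliability: float  # S, expert assessment
    c_mean: float       # C estimate from the supplier
    c_spread: float     # the "±" part of C
    t_mean: float       # T, delay time estimate
    t_spread: float     # the "±" part of T


def score(src: Source) -> float:
    """Hypothetical f(S, C, T): reliability divided by pessimistic C and T."""
    worst_c = src.c_mean + src.c_spread
    worst_t = src.t_mean + src.t_spread
    return src.reliability / (worst_c * worst_t)


SOURCES: List[Source] = [
    Source("S1", 0.8, 5, 1, 100, 160),
    Source("S2", 0.9, 8, 1, 70, 30),
    Source("S3", 0.8, 10, 1, 50, 80),  # assumption: the slide's "S3=8" read as 0.8
]

ranked = sorted(SOURCES, key=score, reverse=True)
best = ranked[0].name
```

Under these assumptions the large spread on T1 and T3 counts against S1 and S3, so S2 ranks first; a different user model could weigh the same estimates differently.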

  27. Abstraction for Input Reduction • Abstract to match levels of granularity • sources collect to differing level of detail ownership / government unit • Omit replicated or known information • autonomous sources are not disjoint • avoid known data
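
A small Python sketch of the two steps on this slide, assuming a hypothetical county-to-state mapping and made-up figures: detailed records are rolled up to the coarser granularity, and records that are replicated or already known are dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from a fine-grained unit to a coarser government unit.
COUNTY_TO_STATE: Dict[str, str] = {"Alameda": "CA", "Marin": "CA", "Travis": "TX"}


def abstract_to_state(county_rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Roll county-level values up to state level so they match a coarser source."""
    totals: Dict[str, int] = defaultdict(int)
    for county, value in county_rows:
        totals[COUNTY_TO_STATE[county]] += value
    return dict(totals)


def omit_known(rows: Iterable[Tuple[str, int]],
               known: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep only rows that are neither already known nor repeated in the input."""
    seen = set(known)
    fresh: List[Tuple[str, int]] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            fresh.append(row)
    return fresh


state_totals = abstract_to_state([("Alameda", 10), ("Marin", 5), ("Travis", 7)])
fresh = omit_known([("CA", 15), ("TX", 7), ("CA", 15)], known=[("TX", 7)])
```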

  28. Caching in Mediators • Temporary storage to • Synchronize information sources that have differing time bases / update times • Increase performance of recurring requests • Common in planning applications where alternatives are being evaluated • Reduce cost of access of costly resources • Increase availability • Extreme case: build a Warehouse • Early selection of relevant source material
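
As a rough illustration of caching for recurring requests, here is a Python sketch of a mediator that keeps results for a fixed time-to-live. The class name `CachingMediator`, the TTL policy, and the injectable clock are assumptions of this sketch, not something the slide specifies.

```python
import time
from typing import Callable, Dict, Hashable, List, Tuple


class CachingMediator:
    """Keeps fetched results for ttl_seconds to avoid repeated costly accesses."""

    def __init__(self, fetch: Callable[[Hashable], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be exercised
        self._cache: Dict[Hashable, Tuple[float, object]] = {}

    def query(self, key: Hashable) -> object:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # fresh enough: serve from temporary storage
        value = self._fetch(key)       # missing or stale: go back to the source
        self._cache[key] = (now, value)
        return value


calls: List[Hashable] = []


def slow_source(key: Hashable) -> str:
    """Stand-in for a costly resource; records each real access."""
    calls.append(key)
    return f"data for {key}"


fake_now = [0.0]
mediator = CachingMediator(slow_source, ttl_seconds=60, clock=lambda: fake_now[0])
first = mediator.query("plan-A")
second = mediator.query("plan-A")   # served from the cache
fake_now[0] = 61.0
third = mediator.query("plan-A")    # entry expired, source is accessed again
```

The warehouse mentioned on the slide corresponds to the limiting case where entries are loaded ahead of time and never expire on their own.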

  29. Articulation: Intersection • Terms useful for service planning • Articulation ontology: matching rules that use terms from the 2 source domains • Now done implicitly during conversations [diagram labels: Hydrology terms; Agriculture terms]

  30. Integration Increases the Volume • Integrate associated material from diverse domains • Resolve scope mismatches • Rank material from diverse sources: 1. 45 2. 43 3. 33 4. 28 . . . [figure labels: transistors, semiconductors]

  31. Result modes for ranking • Databases: Completeness • All the answers • Prolog model: Correctness • The first answer • Optimization: The best answer • Assumes all factors are known, no human decision • Customer: wants choices • source ids • explanation (sometimes)

  32. Ranking • Qualitative Significant Differences: in terms of the customer model • Plan 1. UA59 dep. Wash. Dulles 17:10, arr. LAX 19:49 • Plan 2. AA75 dep. Wash. Dulles 18:00, arr. LAX 20:10 • Plan 3. UA119 dep. Wash. Dulles 9:25, arr. LAX 12:00 • Busy Joe: P1=P2, P3 • Speedy Mike: P2, P1=P3 • Greedy Pete: P1=P3, P2
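
One way to read "significant differences" is that plans whose scores differ by less than some threshold are ranked as equal. The Python sketch below applies that reading to the Speedy Mike row, assuming he cares only about trip duration; the 15-minute threshold and the function names are assumptions, and clock times are taken as printed without time-zone adjustment.

```python
from typing import Dict, List, Optional, Tuple

# Departure and arrival times as printed on the slide.
PLANS: Dict[str, Tuple[str, str]] = {
    "P1": ("17:10", "19:49"),
    "P2": ("18:00", "20:10"),
    "P3": ("9:25", "12:00"),
}


def minutes(hhmm: str) -> int:
    """Convert 'H:MM' or 'HH:MM' to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def rank_with_ties(scores: Dict[str, int], threshold: int) -> List[List[str]]:
    """Sort ascending; a plan joins the current group if it lies within
    `threshold` of the first plan in that group (hypothetical tie rule)."""
    groups: List[List[str]] = []
    group_start: Optional[int] = None
    for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
        if groups and group_start is not None and value - group_start <= threshold:
            groups[-1].append(name)
        else:
            groups.append([name])
            group_start = value
    return groups


durations = {name: minutes(arr) - minutes(dep) for name, (dep, arr) in PLANS.items()}
speedy_mike = rank_with_ties(durations, threshold=15)
```

With these inputs the printed durations are 159, 130, and 155 minutes, so P2 stands alone in front and P1 and P3 fall into one group, which agrees with the slide's "P2, P1=P3". The other two customers would need criteria (departure time, price) that the slide does not quantify.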

  33. Summarize to Reduce Load Based on a hierarchical customer model (because we understand processing in hierarchies) • Statistical aggregation: • sums / means / SDs of subsets • Assessment of completeness • Temporal aggregation: • compute intervals from events • Exception seeking • find significant difference from expectation or simulation-projection
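
The three kinds of summarization listed on this slide can be sketched in a few lines of Python. The function names, the tolerance, and all sample numbers are illustrative assumptions; the population standard deviation is used here, though a sample standard deviation would be an equally reasonable choice.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def aggregate(values: Sequence[float]) -> Dict[str, float]:
    """Statistical aggregation of one subset: sum, mean, standard deviation."""
    return {"sum": sum(values), "mean": mean(values), "sd": pstdev(values)}


def intervals(event_times: Sequence[float]) -> List[float]:
    """Temporal aggregation: gaps between consecutive, already sorted, events."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def exceptions(observed: Dict[str, float], expected: Dict[str, float],
               tolerance: float) -> Dict[str, float]:
    """Exception seeking: keep only values that differ from expectation
    by more than `tolerance`."""
    return {key: value for key, value in observed.items()
            if abs(value - expected[key]) > tolerance}


summary = aggregate([2, 4, 4, 4, 5, 5, 7, 9])
gaps = intervals([1, 4, 9])
outliers = exceptions({"north": 102, "south": 140},
                      {"north": 100, "south": 100}, tolerance=10)
```

Only `outliers` would be passed on to the customer in an exception-seeking mode, which is where the reduction in load comes from.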

  34. Delivery to Customer • Transform to create an effective presentation • Adapt to bandwidth & media needs of the client • Leave GUI to client software on customer’s workstation

  35. Flow in mediation • QUERY / DELIVERY • EXPANSION / SUMMARIZATION • SELECTION / INTEGRATION • REFINEMENT / ABSTRACTION • ACCESS / RETRIEVAL

  36. Summary: Mediation Functions • Delivery to client • Summarization or determination of exceptions from expected values or trends • Omission of replicated or known information • Integration of data from diverse domains • Resolution of scope mismatch • Abstraction to match levels of granularity • Conversion to compatible protocols and representations • Assessment of quality of diverse sources • Access to sources via wrappers [figure labels: transistors, semiconductors]
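
Read from the bottom up, the functions in this summary suggest a pipeline in which each stage takes records and returns fewer or cleaner records. The Python sketch below chains three of them; the stage implementations, field names, and data are hypothetical and only meant to show the composition, not how any real mediator was built.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def convert(records: List[Record]) -> List[Record]:
    """Conversion: bring every record to one representation (lower-case keys)."""
    return [{key.lower(): value for key, value in rec.items()} for rec in records]


def omit_replicated(records: List[Record]) -> List[Record]:
    """Omission: drop records that repeat an earlier one exactly."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique


def summarize(records: List[Record]) -> List[Record]:
    """Summarization: deliver one small record instead of the full list."""
    return [{"items": len(records), "total_qty": sum(rec["qty"] for rec in records)}]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Apply the stages in order, from source-side to client-side."""
    for stage in stages:
        records = stage(records)
    return records


raw = [{"Part": "A-1", "Qty": 3}, {"part": "A-1", "qty": 3}, {"PART": "B-2", "QTY": 5}]
result = run_pipeline(raw, [convert, omit_replicated, summarize])
```

Because each stage has the same signature, stages can be added, removed, or reordered per domain without touching the others.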

  37. Section 3 • Motivation • Function needed • Architecture and development • Maintenance • Fundamental Concepts • Applications • Security ? • Current Status • Research • Effects

  38. Evolution of mediation [diagram labels: applications A1 to A6; integrators I1, I2; mediators M1, M2; network; wrappers W1 to W3; datasources D1 to D6; stages a. to e.]

  39. Central Solutions do not Scale • What works with 7 modules and one person in charge fails when there are 100 modules and a committee is needed. • Any changes in resources affect the central module.

  40. Partition mediation by domain (7 ± 2) • Manage complexity • Limit sources per mediator • n sources ⇒ n changes / year/month/... • Limit conceptual scope per mediator • finance / agriculture / geography / … • Make reuse likely • New applications use existing, reliable value-added services [figure label: committees]

  41. Integration at 2 Levels • Application • Informal, pragmatic • Client-control • Use up to 7±2 mediators • Mediation • Formal, reliable service • Domain-Expert control • Use up to 7±2 sources

  42. Allocation Flexibility • Mediators are only code • Copy if high intensity of interaction with 1. Application (M2) 2. Resources (N1,2) 3. Processing (M1) [diagram labels: User Interfaces; Application B, Application C, Application I; mediators M, M1, M2, N, N1, N2; Provider of Mediator M; Provider of mediator N; HPC; Databases DB P, DB Q, DBS R]

  43. Interfaces • Human ↔ Computer {x-widgets, HTML} • Application ↔ Mediator {OQL, KQML, ...} • Mediator ↔ Data sources {SQL, TQL, XML, … } • Data ← real world {sensors, clerks, … }

  44. Current Technologies • SQL • One Verb - SELECT with primitive aggregation • One Database at a time • One Datatype: Tables • Object-orientation • Group data into objects = predefined aggregation • Program snippets -- methods -- with the data • Middleware (ex.: CORBA) • Fetch objects from server • Assume coherent domains

  45. Middleware: Many standards by many vendor groups • CORBA (Common Object Request Broker Architecture) • IBM SOM, DSOM • DOE (Distributed Objects Everywhere), SunSoft • DOME • EZ-bridge, System Strategies Inc. • ILU (InterLanguage Unification), Xerox • ISIS • KQML (Knowledge Query & Manipulation Language) • MQM (Message Queuing Middleware), IBM (for mainframe connections) • OLE (Microsoft: Object Linking and Embedding) • OpenDoc (Apple) • PDES (Product Data Interchange using STEP) • TIB (Teknekron Information Bus) [figure label: Shared specification]

  46. KQML: KNOWLEDGE QUERY & MANIPULATION LANGUAGE • Performatives: Get, Put, Infer, Subscribe, Advertise, . . . • Ontology • Representation: speak KIF, objects, tuples, equations Hq97

  47. Section 4 • Motivation • Function needed • Architecture • Maintenance • Basic Concepts • Applications • Security ? • Current Status • Research • Effects

  48. Maintenance is good for you [chart: relative annual maintenance cost (0 to 100%) against lifetime (0 to 13 years) for automobile, hardware, software; depreciation = 1 / lifetime]

  49. Client-Server Architecture • Fast build of clients by resource reuse • Changes (x) are difficult, can affect many clients [diagram labels: Client system; data and simulation resources]

  50. Systems with Mediators [diagram layers: Applications; Mediators; Data Resources] Gio Wiederhold, 1995
