
Web services and data integration

Web services and data integration. S. Abiteboul Omar Benjelloun Tova Milo INRIA and Xyleme INRIA INRIA and Tel Aviv. Organization. The context Accessing information on the Web Web services SOAP WSDL UDDI Active XML AXML documents AXML services


Presentation Transcript


  1. Web services and data integration • S. Abiteboul (INRIA and Xyleme) • Omar Benjelloun (INRIA) • Tova Milo (INRIA and Tel Aviv)

  2. Organization • The context • Accessing information on the Web • Web services • SOAP • WSDL • UDDI • Active XML • AXML documents • AXML services • Architecture and implementation • Applications • Conclusion

  3. The context • The Web and XML are dramatically changing the management of distributed information

  4. Distributed data management • Warehousing • Mediation • Management of data in cooperative work • Management of data in distributed scientific applications • Mobile data management • Document management • Web sites • Portals, etc. • Information used to live in islands and this is changing

  5. The Web of yesterday • Protocol: HTTP • Documents: HTML • Millions of independent Web sites and billions of documents • Browsing and full-text indexing • Publication of databases using forms • Data management with the Web • HTML is primarily meant to be read by humans • Data management applications over Web data • Based on hand-made wrappers • Expensive, incomplete, short-lived, not adapted to the Web's constant change • No real support for distributed data management!

  6. Information used to live in islands but it is changing • Different formats: relational, metadata, documents, text • A Web standard for data exchange, XML, is fixing it • XML captures all kinds of information over a wide spectrum • XML comes with a family of emerging standards: XML Schema, XSLT, XQuery, domain-specific schemas… • Different computers, platforms, languages, applications • A standard for Web services, SOAP, is fixing it • SOAP allows ubiquitous computing on the Internet • SOAP comes with a family of emerging standards: WSDL, UDDI • This provides uniform access to information… the dream for distributed data management

  7. The information spectrum • [Figure: a spectrum running from minimal structure to structured data, with semi-structured data and XML (hierarchy + metadata) in between] • Examples placed along the spectrum: books, contracts, catalogs, bank accounts, emails, financial reports, insurance policies, economic analysis, derivatives, inventory, political analysis, insurance claims, financial news, sports news, resumes

  8. What can be captured with XML? • Very structured information such as databases and knowledge bases • Most DBMSs now export in XML • Semi-structured data such as data exchange formats (ASN.1, SGML), e.g., technical documentation • Less structured data: documents • Metadata: author, date, status • Existing structure in them: chapter, section, table of contents and index • Possibly tagging of elements in them (citations, lists) • Links to other documents • Plain text • Metadata for unstructured data such as images and sound

  9. A standard for information: XML labeled ordered trees where leaves are text • Marriage of document and database worlds • Marriage of full text indexing and structure indexing • Is it the ultimate data model? No • Purely syntax – more semantics needed • Is it OK for now? Definitely yes (because it is a standard)

  10. Trees • Semantics and structure are in tags and paths, e.g., product-table/product/reference and product-table/product/price • [Figure: a tree with root product-table, a product node below it, and the leaves reference, price, designation, description] • The main asset of XML: typing • Applications need typing, and XML data can be typed if needed (DTD and XML Schema)
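The idea that structure lives in tags and paths can be illustrated with a small sketch. The element names come from the slide; the catalog values (references, prices, descriptions) and the helper `values_at` are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the element names are taken from the slide.
CATALOG = """
<product-table>
  <product>
    <reference>X23</reference>
    <designation>camera</designation>
    <price>120.00</price>
    <description>compact digital camera</description>
  </product>
  <product>
    <reference>R2D2</reference>
    <designation>robot</designation>
    <price>35.50</price>
    <description>toy robot</description>
  </product>
</product-table>
"""


def values_at(root, path):
    """Return the text of every element reached by a path relative to the root."""
    return [node.text for node in root.findall(path)]


root = ET.fromstring(CATALOG)
# product-table/product/reference, written relative to the root element
references = values_at(root, "product/reference")
# Text leaves are untyped; an application that needs numbers converts them itself
prices = [float(p) for p in values_at(root, "product/price")]
```

The explicit `float` conversion hints at why typing (DTD, XML Schema) matters: without a declared type, every leaf is just text.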

  11. A standard for distributed computing: Web services • Possibility to activate a method on some remote Web server • Exchange information in XML: input and result are in XML • Ubiquitous XML distributed computing infrastructure • 2 main applications • E-commerce • Access to remote data • With XML and Web services, it is possible • To get information from virtually anywhere • To provide information to virtually anywhere

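  12. The basic picture • [Figure: a Web client sends a query in XML over the Internet to a SOAP service, seen as a black box exposing a method m(), and receives the answer in XML; query and answer travel as SOAP messages]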
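To make the exchange concrete, here is a minimal sketch of what a SOAP request looks like on the wire and how a server might unpack it. It uses the SOAP 1.1 envelope namespace; the service namespace `urn:example:gismo`, the parameter name `keyword`, and both helper functions are assumptions for illustration, not part of the talk.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:gismo"  # hypothetical namespace of the remote service


def build_request(method, params):
    """Wrap a method call and its parameters in a SOAP envelope (as a string)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(message):
    """Server side: recover the method name and parameters from the envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # the first child of Body is the call element
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


message = build_request("m", {"keyword": "gismo"})
```

Both the input and the result are plain XML, which is what lets the client treat the service as a black box.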

  13. Accessing and integrating information

  14. Accessing remote information • [Figure: an application using gene banks] • Query some data services that provide candidate genes (gene banks) • Multiple formats + multiple protocols • Use some processing services

  15. Same with Web services • [Figure: the same application using gene banks, now reaching every service through the Web] • Query some data services that provide candidate genes (gene banks) • Use some processing services

  16. The big picture: peer2peer • [Figure: peers exchanging queries over the Web; each peer exposes Web services over its own data] • Kinds of peers shown: data warehouses, databases, Web pages, PCs, PDAs, cell phones…

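  17. The main roles • [Figure: three roles and their interactions] • Service provider: publishes its services in the registry • Service registry: answers look-ups • Client: looks up the registry, then binds to the service provider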
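The publish, look up, bind cycle can be sketched with an in-memory registry. This is only a toy model of the three roles: the class names, the provider name `gismo-shop.example`, and the operation are hypothetical, and a real registry such as UDDI is a remote service with a much richer data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Offer:
    topic: str                        # what the service is about
    provider: str                     # hypothetical provider address
    operation: Callable[[str], str]   # stands in for the remote method


@dataclass
class Registry:
    offers: Dict[str, List[Offer]] = field(default_factory=dict)

    def publish(self, offer: Offer) -> None:
        """Provider side: advertise a service under a topic."""
        self.offers.setdefault(offer.topic, []).append(offer)

    def look_up(self, topic: str) -> List[Offer]:
        """Client side: find the services advertised for a topic."""
        return list(self.offers.get(topic, []))


registry = Registry()
registry.publish(Offer("gismos", "gismo-shop.example",
                       lambda q: f"catalog entry for {q}"))

found = registry.look_up("gismos")          # look up
answer = found[0].operation("blue gismo")   # bind: call the provider directly
```

After the look-up, the registry is out of the loop: the client talks to the provider directly.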

  18. Simple view: Looking for information about Gismos • 1. Query some yellow pages: Who knows about Gismos? • 2. Negotiate with Gismo specialists: nature of the service, quality, cost • 3. Get the information: order, payment, delivery; integration in my information system • 4. Eventually publish information • … and all this automatically…

  19. Data integration – Logical view • [Figure: a mediator or warehouse sits above wrapper1, wrapper2 and wrapper3, which wrap source1, source2 and source3] • Service directories • Service descriptions: get the service description • Ontologies: find ontologies to build wrappers

  20. The Web service solution • [Figure: the standards stack on top of the Web] • Data and service repository: UDDI • Data and service description: WSDL • Data and service semantics: RDF • Workflow: WSFL • XML + SOAP

  21. Mediation with Web services • [Figure: over the Web, a mediator reaches source1, source2 and source3 through wrapper1, wrapper2 and wrapper3, using service directories and service descriptions] • Web services: • Service directories • Service descriptions • Wrappers • Sources • Mediators/warehouses

  22. Advantages for data integration • A universal model for data integration = XML • Solves the heterogeneity issue • A universal protocol for distribution = SOAP • Simple Object Access Protocol (something like CORBA) • A language for describing the interface of data sources = WSDL • Web Services Description Language (something like IDL) • Solves the interoperability issue • A standard for publication and discovery of information = UDDI • Universal Description, Discovery and Integration • A standard for describing the semantics of sources = RDF • Resource Description Framework

  23. Advantages – continued – the goal • The system can find a new source of information using UDDI • Understand its syntax using WSDL • Understand its semantics using RDF • Get it using SOAP • The information is in XML, can be restructured and integrated automatically • Not yet… But soon?

  24. Active XML Joint work with: Bernd Amann, Jerôme Baumgarten, Angela Bonifati, Ioana Manolescu, Frederic Ngoc and others

  25. AXML = XML + embedded SOAP calls • [Figure: an AXML peer is both client and server. As a Web client it sends queries and receives answers over the Internet as SOAP messages carrying AXML; as a Web server it answers calls m() defined by queries q1($1,$2), Q2, Q3… (XPath, XQuery) over its AXML data]

  26. Active XML: the AXML peer • Peer-to-peer architecture • Each Active XML peer has: • Repository: manages Active XML data with embedded Web service calls • Web client: activates the calls in the documents • Web server: provides Web services defined as (parameterized) queries over the repository • Peers communicate through SOAP

  27. Built on existing standards • Tree data: XML, used for internal data representation and for data exchange • Web services: SOAP, WSDL • Query languages: XQuery/XPath

  28. AXML peer: repository of AXML documents <directory> <dep name="Toy"> <sc>toy.xyz.com/GetToyPersonel()</sc> </dep> <dep name="DVD"> <sc>dvd2000.com/GetDVDPersonnel()</sc> </dep> </directory> • Service calls: documents may contain calls to any SOAP Web service (e-bay.net, google.com, etc.) and to any AXML Web service

  29. AXML peer: Web client • Result of activating the first call: <directory> <dep name="Toy"> <person pname="Smith"> <phone>01…</phone> <pda> <sc>toy.xyz.com/GetPDA(../../@pname)</sc> </pda> </person> <sc>toy.xyz.com/GetToyPersonel()</sc> </dep> <dep name="DVD"> <sc>dvd2000.com/GetDVDPersonnel()</sc> </dep> </directory>
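What the Web client does to a document can be sketched as follows: walk the tree, find each `<sc>` element, invoke the service, and insert the result next to the call, which stays in place so it can be activated again later. This is a rough sketch under stated assumptions, not the AXML implementation: the services are local stub functions instead of SOAP calls, and the returned `person` elements are invented.

```python
import xml.etree.ElementTree as ET

DOC = """
<directory>
  <dep name="Toy"><sc>toy.xyz.com/GetToyPersonel()</sc></dep>
  <dep name="DVD"><sc>dvd2000.com/GetDVDPersonnel()</sc></dep>
</directory>
"""

# Hypothetical local stand-ins for the remote services named in the document.
STUBS = {
    "toy.xyz.com/GetToyPersonel()":
        lambda: [ET.fromstring('<person pname="Smith"/>')],
    "dvd2000.com/GetDVDPersonnel()":
        lambda: [ET.fromstring('<person pname="Jones"/>')],
}


def activate_calls(root, services):
    """Insert each call's result just before its <sc> element; keep the call."""
    for parent in list(root.iter()):        # snapshot: results are not revisited
        for child in list(parent):
            if child.tag != "sc":
                continue
            service = services.get((child.text or "").strip())
            if service is None:
                continue                    # unknown service: leave the call as is
            position = list(parent).index(child)
            for offset, result in enumerate(service()):
                parent.insert(position + offset, result)
    return root


doc = activate_calls(ET.fromstring(DOC), STUBS)
toy = doc.find("dep[@name='Toy']")
```

Because the traversal works on a snapshot, calls embedded in a returned result (such as the `GetPDA` call on the slide) are left for a later activation round.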

  30. Controlling the evaluation • Activation of calls and data lifespan are controlled • frequency: when is the service called? (e.g., "call each day") • validity: how long is the retrieved data valid? • mode: immediate or lazy?

  31. Example: control attributes <directory> <dep name="Toy"> <sc valid="rt+1 week" mode="immediate"> toy.xyz.com/GetToyPersonel() </sc> </dep> <dep name="DVD"> <sc valid="0" mode="lazy"> dvd2000.com/GetDVDPersonnel() </sc> </dep> </directory>
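One way to read these attributes is as inputs to a decision procedure run by the peer. The sketch below is an interpretation, not the AXML semantics: it assumes that `valid` is a duration counted from the retrieval time (reading "rt+1 week" as one week after retrieval), that an immediate call is refreshed as soon as its data expires, and that a lazy call is made only when its data is actually requested. `CallState` and `should_call` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

WEEK = 7 * 24 * 3600  # seconds


@dataclass
class CallState:
    mode: str                             # "immediate" or "lazy"
    valid_for: float                      # seconds the retrieved data stays valid
    retrieved_at: Optional[float] = None  # None means the call was never made


def should_call(state: CallState, now: float, data_requested: bool) -> bool:
    """Decide whether the service call must be (re)activated at time `now`."""
    fresh = (state.retrieved_at is not None
             and now - state.retrieved_at < state.valid_for)
    if fresh:
        return False            # the stored result is still valid
    if state.mode == "immediate":
        return True             # refresh as soon as the data is stale
    return data_requested       # lazy: wait until someone needs the data


toy = CallState(mode="immediate", valid_for=WEEK, retrieved_at=0.0)
dvd = CallState(mode="lazy", valid_for=0.0, retrieved_at=0.0)
```

Under this reading, `valid="0"` with `mode="lazy"` means the DVD personnel list is never cached and is fetched anew each time it is needed.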

  32. AXML peer: Web server • AXML Web services: defined using XQuery over AXML documents let service Get-Toy-Personnel( ) be for $a in document("toy.xyz.com/members.axml")/member, $b in $a//name, $c in $a//phone, $d in $a//pda return <person pname={ $b/text() }> { $c } { $d } </person>
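For readers less familiar with XQuery, the nested `for` clauses above behave like nested loops. The following sketch mirrors the query over a small in-memory document; the member data is invented, and the function takes the member elements directly rather than loading `members.axml`.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical contents for the members document; all values are made up.
MEMBERS = ET.fromstring("""
<members>
  <member><name>Smith</name><phone>555-0100</phone><pda>pda-1</pda></member>
  <member><name>Jones</name><phone>555-0101</phone></member>
</members>
""")


def get_toy_personnel(members):
    """Mirror of the XQuery: one <person> per (name, phone, pda) combination."""
    people = []
    for member in members:                        # for $a in .../member
        for name in member.iter("name"):          # $b in $a//name
            for phone in member.iter("phone"):    # $c in $a//phone
                for pda in member.iter("pda"):    # $d in $a//pda
                    person = ET.Element("person", {"pname": name.text or ""})
                    person.append(copy.deepcopy(phone))
                    person.append(copy.deepcopy(pda))
                    people.append(person)
    return people


people = get_toy_personnel(MEMBERS.findall("member"))
```

Note that a member with no `pda` element contributes nothing to the result, exactly as in the query, because the innermost loop has nothing to iterate over.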

  33. The crux: the exchange of AXML data • Arguments and results of calls are AXML • Data is thus intensional and dynamic • Distributed computing: by sending data containing service calls, one can delegate some work to other peers • Partial computations: by returning data containing service calls, one can give the receiver control of these calls • All this can be controlled

  34. Example: Tourist guide • The document contains … <sc>yahoo.com/Temp("Paris")</sc> … • I need to evaluate the temperature of Paris: • I call Yahoo, which returns <sc>meteoF.com/t("Paris")</sc> • I call meteoF, which returns <t type="celsius">0</t> • I am asked what the temperature of Paris is; I can answer with any of: • … <t type="celsius">0</t> … • … <sc>meteoF.com/t("Paris")</sc> … • … <sc>yahoo.com/Temp("Paris")</sc> …
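The three possible answers differ only in how many calls the peer chooses to resolve before replying. A minimal sketch, assuming the two services are simulated by a lookup table (`SERVICES`, `call` and `resolve` are hypothetical names; the returned values are those on the slide):

```python
import xml.etree.ElementTree as ET

# Simulated remote services: each call returns an XML fragment,
# which may itself be another service call (an intensional answer).
SERVICES = {
    'yahoo.com/Temp("Paris")': '<sc>meteoF.com/t("Paris")</sc>',
    'meteoF.com/t("Paris")': '<t type="celsius">0</t>',
}


def call(service_call):
    """Stand-in for a SOAP invocation of the named service."""
    return ET.fromstring(SERVICES[service_call])


def resolve(node, max_calls):
    """Follow at most max_calls service calls, then return whatever we have."""
    used = 0
    while node.tag == "sc" and used < max_calls:
        node = call(node.text.strip())
        used += 1
    return node


question = ET.fromstring('<sc>yahoo.com/Temp("Paris")</sc>')
unevaluated = resolve(question, 0)   # hand the Yahoo call to the receiver
partial = resolve(question, 1)       # hand over the meteoF call instead
concrete = resolve(question, 2)      # fully evaluated temperature
```

Bounding the number of calls also guarantees that the loop stops even if services keep returning new calls.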

  35. Continuous services • Inside the tourist guide: new events • Pull mode : standard SOAP query • Ask once a week • Push mode : subscription to a continuous service • When new events are announced, they are pushed to the AXML document • Possibility to define AXML continuous services
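The difference between the two modes can be sketched with a small in-memory event service. In pull mode the document owner asks for the current list; in push mode it registers a callback once and receives each new event as it is announced. The class and method names are hypothetical, and a plain Python list stands in for the AXML document.

```python
from typing import Callable, List


class EventService:
    """Toy event source offering both a pull and a push interface."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def query(self) -> List[str]:
        """Pull mode: a standard request/response call."""
        return list(self._events)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Push mode: register once, then receive every new event."""
        self._subscribers.append(callback)

    def announce(self, event: str) -> None:
        self._events.append(event)
        for callback in self._subscribers:
            callback(event)


service = EventService()
guide_events: List[str] = []           # stands in for the tourist guide document
service.subscribe(guide_events.append)
service.announce("jazz concert")
service.announce("street market")
```

With the subscription in place, the guide is updated without ever polling the service.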

  36. Architecture and implementation

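  37. Global architecture • [Figure: AXML peers S1, S2 and S3 exchange AXML data through SOAP] • Components of a peer: • AXML document store • Evaluator: reads and updates the documents, sends service calls through the SOAP client and receives service results • SOAP wrapper: consults the SOAP service descriptions and exposes the peer's services • XQuery processor: evaluates queries, reading and updating the document store • External SOAP services exchange plain XML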

  38. Illustration: 3 applications

  39. Application 1: Warehousing (DEC9) • Construction of warehouses with Web data • Monitoring of changes on the Web • Kinds of services that are used • Google search engine • wget • Classification • XML Diff and site changes • Page monitoring system • etc.

  40. Application 2: Mobile data • AXML peers as mobile entities • Active data store with query capabilities • Metadata and object profiles • Issues • Storage services for mobile objects • Processing services for mobile objects • Use proxies for that • European Project DBGlobe

  41. Application 2: Mobile data • Light-weight AXML peers • PDA, cellular phone, laptop… • Limited storage, network bandwidth • Sometimes disconnected • Limited functionalities • E.g., support for continuous services based on a mail server and SMTP

  42. Application 2 : context awareness • Where am I? (geographical position) • Where is the « nearest » AXML proxy? (network position) • Active use of this information • For providing context dependent data (e.g., time, temperature, nearest restaurants, etc.) • For selecting services (e.g., choose a nearby proxy for caching)

  43. Application 3: P2P Auction • Each peer proposes some auctions • The document records the peer's items and the bids • Each peer knows about some auctions of other peers • Each peer can bid on any auction • The peer remembers the bids it has placed • When an auction closes, the winner is notified • No centralization
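The state each peer keeps for one of its auctions can be sketched as a small record of bids. This is a hedged illustration only: the `Auction` class, the rule that the highest bid wins, and the peer names are assumptions; in the application described above the bids would arrive as calls to the peer's AXML services.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Auction:
    item: str
    bids: Dict[str, float] = field(default_factory=dict)  # bidder peer -> best bid
    closed: bool = False

    def bid(self, peer: str, amount: float) -> None:
        """Record a bid; a peer's bid only ever goes up."""
        if self.closed:
            raise ValueError("auction is closed")
        if amount > self.bids.get(peer, 0.0):
            self.bids[peer] = amount

    def close(self) -> Optional[str]:
        """Close the auction and return the winner to notify (None if no bids)."""
        self.closed = True
        if not self.bids:
            return None
        return max(self.bids, key=self.bids.get)  # assumed rule: highest bid wins


auction = Auction("lamp")           # hosted by the peer that proposes it
auction.bid("peer2", 10.0)
auction.bid("peer3", 12.0)
auction.bid("peer2", 11.0)
winner = auction.close()
```

Each auction lives entirely on the proposing peer, so no central server is involved.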

  44. Conclusion and on-going work

  45. AXML services • A simple, declarative way to create Web services compatible with current standards for Web service invocation • AXML services are powerful tools for data integration • They allow new, powerful features • Intensional parameters and results: the AXML documents that are exchanged contain service calls • Continuous services send back a stream of answers (SOAP messages) to the caller

  46. Many issues • Security • Typing of parameters • Lazy evaluation and optimization • Replication • Mobility: DBGlobe project • Termination • Implementation • Foundations • And more

  47. Using types to control the use of services • Peer1 tells which kind of data it exports and Peer2 which kind it accepts • [Figure: Peer1 holds data with calls to f and g; Peer2 accepts calls to f only, so Peer1 evaluates g before sending the data]
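A rough sketch of this negotiation, assuming the "type" a receiver accepts is reduced to the set of service names it is willing to find in incoming data: before sending, the sender replaces every call outside that set by its result. The function `prepare_for`, the stub for g, and the value it returns are hypothetical; real typing of AXML data would be expressed with schemas rather than a set of names.

```python
import xml.etree.ElementTree as ET


def prepare_for(doc, accepted, services):
    """Replace each <sc> the receiver does not accept by the call's result."""
    for parent in list(doc.iter()):
        for child in list(parent):
            if child.tag != "sc":
                continue
            name = (child.text or "").strip()
            if name in accepted:
                continue                    # the receiver will make this call
            index = list(parent).index(child)
            parent.remove(child)
            parent.insert(index, services[name]())  # evaluate before sending
    return doc


# Peer1's data contains calls to f and g; Peer2 accepts calls to f only.
data = ET.fromstring("<data><sc>f</sc><sc>g</sc></data>")
peer1_services = {"g": lambda: ET.fromstring("<value>42</value>")}  # stub result
to_send = prepare_for(data, accepted={"f"}, services=peer1_services)
```

The call to f travels unevaluated, so Peer2 keeps control over when and whether to activate it.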

  48. Distribution and replication • Motivated by mobile devices with limited resources • Allows distributing one XML document over several peers • Allows replicating an XML subtree on several peers • Query optimization
