
Matching and Reuse of XML Schemas




Presentation Transcript


  1. Matching and Reuse of XML Schemas

  2. Sample XML Schema
     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="car">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="make" type="xs:string"/>
             <xs:element name="model" type="xs:string"/>
             <xs:element name="year" type="xs:string"/>
             <xs:element name="color" type="xs:string"/>
             <xs:element name="driver">
               <xs:complexType>
                 <xs:sequence>
                   <xs:element name="first" type="xs:string"/>
                   <xs:element name="last" type="xs:string"/>
                   <xs:element name="license" type="xs:string"/>
                 </xs:sequence>
               </xs:complexType>
             </xs:element>
           </xs:sequence>
         </xs:complexType>
       </xs:element>
     </xs:schema>

  3. What is XML schema matching
     • Matching – identifying the relations among the corresponding elements of two schemas
       • e.g. customer/firstName <==> client/name/first
       • e.g. customer/name <==> concatenate(client/name/first, client/name/last)
     • Calculating the distance between two schemas
       • e.g., the distance between customer.xsd and client.xsd is 0.67.

  4. Why XML Schema matching
     • From the data integration point of view:
       • Purpose: automatically identifying corresponding elements between two schemas
       • Relevant works:
         • Database schema matching/mapping, e.g., A. Doan, et al., Reconciling schemas of disparate data sources: A machine-learning approach. SIGMOD, 2001.
         • Generic schema mapping, e.g., J. Madhavan, P. A. Bernstein, E. Rahm. Generic schema matching with Cupid. VLDB, 2001.
         • XML Schema matching, e.g., H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. VLDB, 2002.
     • From the web service composition point of view:
       • e.g., matching the output type of one service with the input of another in sequential composition
     • From the software reuse point of view:
       • Purpose: building XML Schema categories and search engines
       • Relevant works:
         • Software component search: A. Mili, R. Mili, R. T. Mittermeir. A survey of software reuse libraries. Annals of Software Engineering, 1998.
         • Agent and service matching: Katia Sycara, Jianguo Lu, Matthias Klusch. Interoperability among Heterogeneous Software Agents on the Internet. Technical Report CMU-RI-TR-98-22, CMU.

  5. What are the problems • Modelling • As graph • As tree matching • Node similarity • Name, type, cardinality. • Structure similarity • Tree edit distance • K. Zhang, D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 1989.

  6. Overview of our system [Diagram; labels: XML Schema (two inputs), Modelling, Name Similarity, Node Similarity, Structural Similarity, Name Relations, Node Relations, Structural Relations, Results retrieval]

  7. Three similarities [Diagram] • Name similarity: node name; WordNet, string matching, Hungarian method • Node similarity: built-in data type, user-defined data type, cardinality; compatibility tables • Structural similarity: hierarchical structure; tree matching algorithm

  8. Model schemas as trees <xs:element name="driver" type="driverType"/> <xs:attribute name="license" type="xs:string"/> Modelling

  9. Model schemas as trees Modelling [Figure: tree models of Address_ca.xsd and Address_us.xsd; node labels include schema, customerOrder, address, shipping, billing, street, postcode, province, zip, state, date, bill2Add, ship2Add, paper, reference, contents, author, title, refNo] • Issues illustrated: Recursion • Reference • Importing and Inclusion

  10. Model schemas as trees Information excluded in Modelling [Figure: two "name" trees whose children first and last appear in different orders] • Related to elements or attributes • Default value, value range, unique, nullable… • Related to structure • Sequence • All • Choice
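
A minimal sketch of the tree modelling described above, assuming a schema that uses only inline (anonymous) complex types. It keeps element and attribute declarations as tree nodes and skips the `complexType`, `sequence`, `all` and `choice` wrappers. The helper names (`named_children`, `to_tree`, `schema_to_trees`) are hypothetical and not taken from the presentation; type references, recursion, and import/include are not resolved here.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened version of the sample car schema (assumption: inline types only).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="driver">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="last" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def named_children(node):
    """Yield the nearest xs:element / xs:attribute descendants of node,
    looking through complexType, sequence, all and choice wrappers."""
    for child in node:
        if child.tag in (XS + "element", XS + "attribute"):
            yield child
        else:
            yield from named_children(child)


def to_tree(node):
    """Return a (name, type, children) tuple for one declaration."""
    return (
        node.get("name"),
        node.get("type"),  # None for anonymous complex types
        [to_tree(c) for c in named_children(node)],
    )


def schema_to_trees(xsd_text):
    """Parse schema text and return one tree per top-level declaration."""
    root = ET.fromstring(xsd_text)
    return [to_tree(n) for n in named_children(root)]


trees = schema_to_trees(SCHEMA)
print(trees)
```

The children are kept in document order only because a Python list is ordered; a matcher that follows the slides would treat them as an unordered set.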

  11. Node similarity Computing node similarity • Computing name similarity with the help of: • WordNet and its API • String matching • Hungarian method • Add the similarity of other information • Data type • Minimum cardinality • Maximum cardinality

  12. Node similarity Name similarity from token lists • Tokenize names • E.g. clientName -> client name; submittedReports -> submit report • Similarity between two token lists • Using the Hungarian method for Weighted Bipartite Graph Matching (WBGM) • Example: customerDeliveryAddress vs. clientRequiredShippingAddress [Figure: bipartite graph between the tokens customer, delivery, address and client, require, shipping, address, with edge weights sim(0,0) … sim(i,j)]
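
The token-list comparison above can be sketched as follows. Several parts are assumptions rather than the presenters' method: `difflib.SequenceMatcher` stands in for the WordNet-based token similarity, the tokenizer does no stemming (so `submitted` is not reduced to `submit`), the score is normalised by the longer token list, and the optimal assignment is found by exhaustive search over permutations instead of the Hungarian method. On short token lists the exhaustive search returns the same optimum; the Hungarian method only matters for speed.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokenize(name):
    """Split a camelCase identifier into lower-case tokens (no stemming)."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def token_sim(a, b):
    """String similarity in [0, 1]; a stand-in for a WordNet-based measure."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1, name2):
    """Best one-to-one token assignment, normalised by the longer list."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if len(t1) > len(t2):
        t1, t2 = t2, t1  # make t1 the shorter list
    if not t1:
        return 0.0
    # Exhaustive maximum-weight bipartite matching (fine for a few tokens).
    best = max(
        sum(token_sim(a, b) for a, b in zip(t1, perm))
        for perm in permutations(t2, len(t1))
    )
    return best / len(t2)


print(tokenize("customerDeliveryAddress"))
print(name_similarity("customerDeliveryAddress", "clientRequiredShippingAddress"))
```

Because the assignment ignores token order, `nameFirst` and `firstName` receive the maximum score under this sketch.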

  13. Structure similarity Tree 1 Tree 2 Determine the structural relation

  14. Structure similarity Common substructure [Figure: two schema trees over the nodes car, make, model, year, color, driver and license, with first/last in one tree and firstName/lastName in the other]

  15. Structure similarity Approximate Common Structure [Figure: the same two schema trees, with the approximate common structure marked]

  16. Structure similarity Mappings in an ACS [Figure: ACS1 and ACS2 over the nodes car, make, model, year, color, driver, first (firstName), last (lastName), license] mACS1 = {(s1.car, s2.car), (s1.make, s2.make), (s1.year, s2.year), (s1.color, s2.color)} mACS2 = {(s1.driver, s2.driver), (s1.first, s2.firstName), (s1.last, s2.lastName), (s1.license, s2.license)}

  17. Evaluation • Criteria • Matching outcomes • Mappings • Schema similarity • Execution time • Collected four groups of schemas • Purchase orders used in COMA (5) • Large schemas from XML.org (86) • Schemas in the hospitality domain (95) • Schemas extracted from WSDL files (419)

  18. Evaluation Comparison with the edit distance algorithm: element mapping on data group 1 [Chart] Method 1: our algorithm; Method 2: edit distance

  19. Evaluation Comparison with edit distance: schema similarity on data groups 3 and 4 [Chart] Method 1: our algorithm; Method 2: edit distance

  20. Evaluation Comparison with edit distance: performance on data group 2 [Chart] Method 1: our algorithm; Method 2: edit distance

  21. Evaluation Comparison with COMA (Mapping) Overall is a measure that combines precision and recall. It reflects the efforts of removing incorrect mappings and adding missing ones.
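
The slide does not give a formula for Overall. The sketch below assumes the definition used in the COMA evaluation, Overall = Recall × (2 − 1/Precision), which equals 1 − (false positives + missed mappings) / (real mappings). The function name and the sample mappings are illustrative only.

```python
def precision_recall_overall(found, real):
    """Compare a set of found mappings against the manually determined real ones."""
    found, real = set(found), set(real)
    correct = found & real
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(real) if real else 0.0
    # Each false positive must be removed and each missed mapping added by hand.
    false_positives = len(found - real)
    missed = len(real - found)
    overall = (1 - (false_positives + missed) / len(real)) if real else 0.0
    return precision, recall, overall


real = {("car", "car"), ("make", "make"), ("year", "year"), ("color", "color")}
found = {("car", "car"), ("make", "make"), ("year", "year"), ("model", "color")}
print(precision_recall_overall(found, real))
```

Under this definition Overall becomes negative once precision drops below 0.5, meaning that correcting the result costs more effort than matching by hand.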

  22. Conclusion • Scalable schema matching • Wang Lian, David W. Cheung, Nikos Mamoulis, and Siu-Ming Yiu, An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, TKDE, 2005. • Subtyping • Apply to web service matching

  23. Web service synthesis

  24. composition Web Service Composition • Composite web service: “service implemented by combining the functionality provided by other web services” –G. Alonso et al. • Web service composition: the process of developing a composite web service • Approaches to web service composition: • Conventional programming languages, such as Java, C#; • Web service composition languages, such as BPEL; • Workflow, pi-calculus, petri net, automata… • Web service synthesis.

  25. composition Web Service Synthesis • BPEL and the like are still programming languages • They describe exactly how to compose the web services. • Web service synthesis • We describe what the service is, but not how to implement it; • We don’t even know which component services are involved; • The relevant services are discovered and invoked dynamically; • The implementation is synthesized automatically from the web service specification. • Program synthesis has a long history.

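  26. composition Web Service Synthesis [Diagram: each component web service (WS1, WS2) has a syntactic specification (WSDL), a semantic specification (Datalog) and a service implementation; for the new WS, the service specification (WSDL/Datalog) is turned by web service synthesis into a service implementation (BPEL)]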

  27. composition Synthesis Example
     • MetaSearchService
       • Service specification
         • Syntactic: interface definition defined by WSDL
         • Semantic: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
       • MetaSearchService implementation: ??
     • Chapters
       • Syntactic specification: …
       • Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
     • amazon
       • Syntactic specification: WSDL file
       • Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
       • Service implementation: Java code, database

  28. composition Generate the abstract implementation by query rewriting
     • MetaSearchService
       • Service specification
         • Syntactic: interface definition defined by WSDL
         • Semantic: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
       • Abstract implementation: Q(ISBN, PRICE, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
     • Chapters
       • Syntactic specification: …
       • Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
     • amazon
       • Syntactic specification: WSDL file
       • Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
       • Service implementation: Java code, database

  29. composition Generate the Concrete Implementation
     • MetaSearchService
       • Service specification
         • Syntactic: interface definition defined by WSDL
         • Semantic: Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- …
       • Abstract implementation: Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
       • Concrete implementation: Invoke amazon; Invoke chapters; Combine the output;
     • Chapters
       • Syntactic specification: …
       • Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
     • amazon
       • Syntactic specification: WSDL file
       • Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
       • Service implementation: Java code, database
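
The three steps of the concrete implementation (invoke amazon, invoke chapters, combine the output) amount to a join on ISBN followed by a projection. The sketch below is illustrative only: `invoke_amazon` and `invoke_chapters` are hypothetical stubs returning made-up rows, whereas a real implementation would call the services through their WSDL interfaces.

```python
def invoke_amazon():
    """Hypothetical stub: rows of amazon(ISBN, PRICE, RATE, TITLE, AUTHOR)."""
    return [
        ("111", 30.0, 4.5, "XML Basics", "Lee"),
        ("222", 45.0, 3.9, "Datalog", "Kim"),
    ]


def invoke_chapters():
    """Hypothetical stub: rows of chapters(ISBN, PRICE, TITLE, AUTHOR)."""
    return [
        ("111", 28.0, "XML Basics", "Lee"),
        ("333", 20.0, "Web Services", "Roy"),
    ]


def meta_search():
    """Evaluate Q(ISBN, PRICE, PRICE0, TITLE, RATE) by joining on ISBN."""
    # Index the chapters rows by ISBN so the join is a dictionary lookup.
    chapters_by_isbn = {}
    for isbn, price0, title, _author in invoke_chapters():
        chapters_by_isbn.setdefault(isbn, []).append((price0, title))

    results = []
    # TITLE' and AUTHOR' from amazon are not used in the head of Q.
    for isbn, price, rate, _title, _author in invoke_amazon():
        for price0, title in chapters_by_isbn.get(isbn, []):
            results.append((isbn, price, price0, title, rate))
    return results


print(meta_search())
```

Only books offered by both services appear in the result, which is what the shared ISBN variable in the rule body requires.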

  30. composition It is a lightweight approach… • Web services are restricted to be database queries or functions that can be described by database queries or Datalog; • Semantic specification is Datalog instead of a more powerful specification mechanism employing ontology; • Compositions are restricted to data composition instead of full-blown process specification such as BPEL. • All those choices are meant for the construction of a practical web service synthesis system…

  31. composition Mapping between Datalog and Web Services • Database vendors also provide wrappers for web services • Behind a web service there is a SQL query that corresponds to the web service; • SQL defines the semantics of the web service. • Major database vendors support the mapping between SQL and web services; • We experimented with DB2WS. Malaika, S. et al. DB2 and Web Services. IBM Systems Journal, 41(4), pp. 666-685, 2002.

  32. composition Generate the Abstract Implementation by Query rewriting
     • Definition: Given a query Q and a set of views V, a rewriting of Q using V is a query Q’ such that Q’ is equivalent to Q (Q = Q’) and Q’ refers to one or more views in V.
     • Query: Q <- T1, T2, T3.
     • Views: V1 <- T1, T2. V2 <- T2, T3.
     • Rewriting 1: Q <- V1, T3.
     • Rewriting 2: Q <- V1, V2.
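
For the variable-free example above, a candidate rewriting can be checked by unfolding: replace each view by its definition and compare the resulting set of base atoms with the query body. This is a toy check under that simplifying assumption. Real conjunctive queries carry variables, and testing equivalence then requires containment mappings, which this sketch does not attempt. The names `unfold` and `is_rewriting` are hypothetical.

```python
VIEWS = {"V1": ["T1", "T2"], "V2": ["T2", "T3"]}
QUERY = ["T1", "T2", "T3"]


def unfold(body, views):
    """Replace every view atom in body by the base atoms that define it."""
    atoms = set()
    for atom in body:
        atoms.update(views.get(atom, [atom]))
    return atoms


def is_rewriting(candidate, query, views):
    """True if candidate uses at least one view and unfolds to the query body."""
    uses_view = any(atom in views for atom in candidate)
    return uses_view and unfold(candidate, views) == set(query)


print(is_rewriting(["V1", "T3"], QUERY, VIEWS))  # Rewriting 1
print(is_rewriting(["V1", "V2"], QUERY, VIEWS))  # Rewriting 2
print(is_rewriting(["V1"], QUERY, VIEWS))        # T3 is missing
```

Rewriting 2 mentions T2 twice after unfolding; treating the body as a set removes the duplicate, which is sound for conjunctive queries under set semantics.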

  33. composition Our query rewriting system

  34. composition Limitations of our approach • Focus on database web services; • Datalog is not expressive enough. • Query rewriting in Description Logic, or OWL. • Assume the existence of global database schemas: • Service providers need to provide the semantic definition of web services in terms of a global database schema; • New service specifications are also defined using the common schema • Schema matching

  35. Other threads • Web service collection and clustering • From UDDI, crawlers, search engines such as Google • Master’s thesis to be finished this summer • Web service metrics • Schema subtyping • Based on regular tree grammar • Master’s thesis to be finished this summer • Bottom-up web service composition • Semantic web service

  36. Service Oriented Architecture [Diagram: Provider, Requester and Discovery agency, connected by the operations publish, find and interact]

  37. Web service discovery • Keywords search • Based on IR techniques, such as vector space model • Fast, but not accurate • Signature matching • Decide subtype relations between input and output of web services • Used in service composition, to find composable web services • Relaxed matching • Approximate matching, allowing small deviations in both structure and words/tags • Semantic matching • Matching functional requirements of web services • Used in adaptive, autonomous systems
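
As an illustration of keyword search with a vector space model, the sketch below ranks service descriptions by the cosine similarity of raw term-frequency vectors. The service names and descriptions are invented, and the choice of plain term counts (no TF-IDF weighting, no stemming) is an assumption made to keep the example short.

```python
import math
from collections import Counter


def cosine(text1, text2):
    """Cosine similarity between the term-frequency vectors of two texts."""
    v1 = Counter(text1.lower().split())
    v2 = Counter(text2.lower().split())
    dot = sum(count * v2[word] for word, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def search(query, services):
    """Return service names ordered from most to least similar to the query."""
    return sorted(services, key=lambda name: cosine(query, services[name]), reverse=True)


# Hypothetical service descriptions.
SERVICES = {
    "BookSearch": "search book by title author price",
    "Weather": "weather forecast by city",
}

print(search("book price", SERVICES))
```

The ranking is fast to compute but looks only at shared words, which matches the "fast, but not accurate" remark on the slide.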
