
A Comparison of Web Service Interface Similarity Measures



  1. A Comparison of Web Service Interface Similarity Measures Natallia Kokash

  2. Introduction
  • Semantic Service Descriptions
  • Web Service Description Languages
  • Questions
  • Motivating Example
  • Related Work
  • Matching Algorithm
    • Structural Matching
    • Lexical Matching
      • Vector-Space Model (VSM)
      • VSM + WordNet
    • Semantic Matching
  • Experimental Results
  • Conclusions and Future Work

  3. Semantic Service Descriptions
  • Case Frames (FrameNet)
    • Agentive, dative, instrumental, factive, locative, and objective cases for verbs
    • Vast semantic descriptions are required
  • Language for Advertisement and Request for Knowledge Sharing (LARKS)
    • Ontological description of terms, constraints and effects (rules), inputs and outputs, context
    • Does not describe what the service does
  • OWL-S
    • Provides means to describe semantics, restrictions, quality of service, and a usage guide
    • Does not provide context identification
    • Does not describe objects used by the service but not provided by the client
    • Does not describe what the service does
  • Semantic Web Rule Language (SWRL)
    • Describes relations between inputs, outputs, preconditions, and effects
    • Fits only domains that can be easily formalized
  • Capability conceptual model (Oaks, P., ter Hofstede, A.H.M., Edmond, D.: "Capabilities: Describing What Services Can Do", ICSOC, 2003, pp. 1-16)
    • Roles: performs action, has synonym, … Objects: is performed with, in context of, operates on, …
    • Does not address specialization of capability descriptions (more general or more specific capabilities)
    • Vast semantic descriptions are required
  • Logic-based approaches (Description Logics, First-Order Logic, Logic Programming, Transaction Logic, F-Logic, action theories, etc.)
    • Web Service Modeling Language (WSML) – formal syntax and semantics for the Web Service Modeling Ontology (WSMO)

  4. Web Service Description Languages
  • Web Service Description Language (WSDL)
    • Identity – unique identity of the interface
    • Syntax
      • Input – the meaning of input parameters
      • Output – the meaning of output parameters
      • Faults – specify the abstract message format for any error messages that may be output as a result of the operation
      • Types – declare the data types used in the interface (XML Schema)
    • Documentation – natural-language service description and usage guide
  • Web Ontology Language for Services (OWL-S)
    • Semantics
      • Preconditions – a set of semantic statements that must be true before an operation can be invoked successfully
      • Effects – a set of semantic statements that must be true after an invoked operation completes execution; different effects can hold depending on whether the operation completed successfully or unsuccessfully
      • Restrictions – a set of assumptions about the environment that must be true
      • Quality of service – a set of quality attributes such as performance or reliability
  • WSDL-S
    • References to external ontologies such as OWL-S

  5. Questions
  • Realistic expectations on Web service descriptions
  • Accurate but still efficient discovery
  • Pragmatic evaluation of matching methods
  • Domain-specific ontologies
  • Domain-independent ontologies
    • WordNet
      • Hypernyms – words that are more generic than a given word
      • Synonyms – words with similar or identical meanings
      • …
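To make the hypernym/synonym idea concrete, here is a toy sketch in which a hand-rolled lexicon stands in for WordNet (the dictionaries and the `expand` function are illustrative assumptions, not real WordNet data or the paper's code):

```python
# Toy stand-in for WordNet lookups; a real system would query the
# WordNet database instead of these hand-written tables.
SYNONYMS = {"cost": {"price", "charge"}}
HYPERNYMS = {"currency": {"money", "medium of exchange"}}

def expand(term):
    """Expand a term with its synonyms and hypernyms, so two service
    descriptions using related but different words can still match."""
    return {term} | SYNONYMS.get(term, set()) | HYPERNYMS.get(term, set())

print(expand("currency"))  # contains 'currency', 'money', 'medium of exchange'
```

Query terms expanded this way can then be fed into the lexical matching stage described later, at the cost of some extra noise from overly general senses.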

  6. Motivating Example

  7. Related Work
  • [Sajjanhar'04] Sajjanhar, A., Hou, J., Zhang, Y.: "Algorithm for Web Services Matching", Proceedings of APWeb, 2004, pp. 665-670.
  • [Bruno'05] Bruno, M., Canfora, G., et al.: "An Approach to Support Web Service Classification and Annotation", IEEE International Conference on e-Technology, e-Commerce and e-Service, 2005.
  • [Corella'06] Corella, M.A., Castells, P.: "Semi-automatic Semantic-based Web Service Classification", International Conference on Knowledge-Based Intelligent Information and Engineering Systems, 2006.
  • [Dong'04] Dong, X.L., et al.: "Similarity Search for Web Services", Proceedings of VLDB, 2004.
  • [Platzer'05] Platzer, C., Dustdar, S.: "A Vector Space Search Engine for Web Services", Proceedings of the IEEE European Conference on Web Services (ECOWS), 2005.
  • [Stroulia'05] Stroulia, E., Wang, Y.: "Structural and Semantic Matching for Assessing Web Service Similarity", International Journal of Cooperative Information Systems, Vol. 14, No. 4, 2005, pp. 407-437.
  • [Wu'05] Wu, J., Wu, Z.: "Similarity-based Web Service Matchmaking", IEEE International Conference on Services Computing, 2005, pp. 287-294.
  • [Zhuang'05] Zhuang, Z., Mitra, P., Jaiswal, A.: "Corpus-based Web Services Matchmaking", AAAI, 2005.
  • [Syeda-Mahmood'05] Syeda-Mahmood, T., Shah, G., et al.: "Searching Service Repositories by Combining Semantic and Ontological Matching", International Conference on Web Services, 2005, pp. 13-20.
  • [Verma'05] Verma, K., Sivashanmugam, K., et al.: "METEOR-S WSDI: A Scalable P2P Infrastructure of Registries for Semantic Publication and Discovery of Web Services", Journal of Information Technology and Management, Special Issue on Universal Global Integration, Vol. 6, No. 1, 2005, pp. 17-39.

  8. Matching Algorithm Implementation
  • Available at: http://dit.unitn.it/~kokash/sources
  • Preprocessing:
    • tokenization
      • sequences of more than one uppercase letter
      • sequences of an uppercase letter and the following lowercase letters
      • sequences between two non-word symbols
      • Example: "tns:GetDNSInfoByWebAddressResponse" → {tns, get, dns, info, by, web, address, response}
    • word stemming
    • stopword removal
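The tokenization rules above can be sketched in a few lines of Python (a minimal illustration of the same splitting rules, not the released implementation; the function name is mine):

```python
import re

def tokenize(identifier):
    """Split a WSDL identifier into lowercase word tokens: acronym runs
    (DNS), capitalized words (Info), lowercase runs, and digit runs,
    after first splitting on non-word symbols such as ':' or '-'."""
    tokens = []
    for part in re.split(r"\W+", identifier):  # sequences between non-word symbols
        # [A-Z]+(?![a-z]) keeps 'DNS' whole while leaving 'Info' intact
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

print(tokenize("tns:GetDNSInfoByWebAddressResponse"))
# -> ['tns', 'get', 'dns', 'info', 'by', 'web', 'address', 'response']
```

Stemming and stopword removal would then be applied to this token list before indexing.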

  9. Structural Similarity
  • Maximum-weight bipartite matching
    • Kuhn's Hungarian method
    • Polynomial time
  • Define an overall similarity score
  • Query type: similarity or inclusion
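The structural step can be illustrated with a brute-force maximum-weight bipartite matching over parameter lists. This exhaustive sketch is my own and is only practical for the handful of parameters a typical operation has; the paper uses Kuhn's Hungarian method precisely to get polynomial time:

```python
from itertools import permutations

def max_weight_matching(sim):
    """Best one-to-one assignment between two parameter lists, given a
    matrix sim[i][j] of pairwise similarity scores (query i, candidate j)."""
    rows, cols = len(sim), len(sim[0])
    if rows > cols:  # always assign the smaller side into the larger one
        sim = [list(col) for col in zip(*sim)]
        rows, cols = cols, rows
    return max(sum(sim[i][j] for i, j in enumerate(p))
               for p in permutations(range(cols), rows))

# Two query parameters vs. two candidate parameters:
score = max_weight_matching([[0.9, 0.1],
                             [0.2, 0.8]])
# best total weight is about 1.7 (pairing 0 with 0 and 1 with 1)
```

The total weight can then be normalized into an overall score, e.g. dividing by the query size for an inclusion query or by the larger parameter list for a similarity query.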

  10. Lexical Similarity
  • Vector-Space Model (VSM)
    • Term Frequency-Inverse Document Frequency (TF-IDF)
  • VSM + WordNet
  • Semantic Similarity
    • part-of-speech tagging
    • word sense disambiguation
    • semantic matching of word pairs
      • a WordNet-based semantic similarity measure: Seco, N., Veale, T., Hayes, J.: "An Intrinsic Information Content Metric for Semantic Similarity in WordNet", ECAI, 2004, pp. 1089-1090
    • semantic matching of sentences
      • maximum-weight bipartite matching
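A minimal pure-Python sketch of the VSM step, assuming service descriptions are already tokenized (function names and the exact TF-IDF weighting are my assumptions; implementations vary in how they normalize):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse TF-IDF vector
    (a dict mapping term -> weight) per document."""
    df = Counter()                       # document frequency per term
    for d in docs:
        df.update(set(d))
    n = len(docs)
    return [{t: (c / len(d)) * math.log(n / df[t])   # tf * idf
             for t, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0
```

Note that a term occurring in every document gets idf = log(n/n) = 0 and thus drops out, which is exactly the discriminative behavior TF-IDF is chosen for.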

  11. Experimental Results (Test 1)
  • 40 web services in 5 groups ("ZIP", "Weather", "DNA", "Currency", "SMS") from XMethods.com (Wu & Wu 2005)
  • Average precision:
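Average precision here can be read as the mean of the precision values at the ranks where relevant services are retrieved; a small sketch of that computation (my formulation, assuming a per-query set of relevant services):

```python
def average_precision(ranked, relevant):
    """Precision at each rank where a relevant item appears,
    averaged over the total number of relevant items."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k            # precision at rank k
    return total / len(relevant) if relevant else 0.0

# Relevant services retrieved at ranks 1 and 3:
ap = average_precision(["a", "x", "b", "y"], {"a", "b"})
# ap = (1/1 + 2/3) / 2, roughly 0.833
```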

  12. Experimental Results (Test 1)

  13. Experimental Results (Test 2)
  • 447 web services in 68 groups ("business", "communication", "games", …) (Stroulia & Wang 2005)
  • VSM is consistently better than VSM + WordNet (p-value 0.003465 < 0.01)

  14. Conclusions and Future Work
  Observations:
  • Data types are the most informative part of WSDL
  • VSM outperforms Semantic Matching
  • WordNet context is too general:
    • Batch – deal, flock, hatful, spate, lot, muckle, great deal, wad, mickle, mint, clutch, mass, quite a little, good deal, heap, peck, stack, pile, plenty, mess, raft, pot, whole lot, sight, slew, tidy sum, whole slew
  • WordNet context is not sufficient:
    • sim(country, currency) = 0 !?
    • Currency is:
      • a country's unit of exchange, issued by its government or central bank, whose value is the basis for trade
      • the type of money that a country uses
    • sim(getRateRequest(country1, country2), ConversionRate(fromCurrency, toCurrency)) = ?
  Future Work:
  • Evaluation of other algorithms and metrics (> 80), e.g., Latent Semantic Indexing
  • Hybrid algorithms (Rocha, C., et al.: "A Hybrid Approach for Searching in the Semantic Web", Proceedings of the International World Wide Web Conference, 2004, pp. 374-383)
  • Evaluation using a collection of services with richer semantic descriptions
  • Enhancing service requests with semantic information
