On the Evaluation of Semantic Web Service Matchmaking Systems

Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory
Department of Informatics and Telecommunications, University of Athens, Greece
ECOWS ’06, Zurich

Outline
• Introduction
• Problem Statement
• A Generalized Fuzzy Evaluation Scheme for Service Retrieval
• Experimental Results
• A Pragmatic View
• Conclusions

SWS Matchmaking
• Matching service requests and advertisements, based on their semantic annotations (expressed through ontologies)
• Numerous matchmaking approaches
  • Logic-, similarity-, structure-based (graph matching)
• Various matched entities
  • Functional service parameters (e.g., IOPE attributes)
  • Non-functional parameters (e.g., QoS attributes)
• Ultimate goal: more effective service discovery, based on the semantics and not just the syntax of service descriptions

Degree of Match
• A value that expresses how similar two entities are, with respect to some similarity metric(s)
• Important feature of almost all SWS matchmaking approaches
• Allows for ranking of discovered services
• Example DoM set: exact, plugin, subsumes, subsumed-by, fail

Evaluation Basics
• Most works evaluate the performance of SWS discovery (i.e., response times, scalability)
• Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services)

[Figure: for a request R and advertisements S1, ..., Sn, the matchmaking engine produces e(R,Si) and the expert provides r(R,Si).]

Q: possible service requests
S: advertisements of published services
e: Q × S → W (DoM, analogous to Retrieval Status Value in IR)
r: Q × S → W (expert mappings)

Evaluation is the determination of how closely vector e approximates vector r.

Evaluation Schemes
• W is the set of values denoting DoM (for e) or degree of relevance (for r)
• W defines different evaluation schemes (EVS):

Boolean Evaluation (EVS1)
W = {0, 1}
Information Retrieval (IR) measures can be used: Precision (PB) and Recall (RB)
RT: set of retrieved advertisements
RL: set of relevant advertisements
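The Boolean measures named above are the standard IR ratios, PB = |RT ∩ RL| / |RT| and RB = |RT ∩ RL| / |RL|. A minimal Python sketch follows; the advertisement identifiers are hypothetical, and returning 0 for an empty denominator is an assumed convention, not something stated on the slide.

```python
def boolean_precision_recall(retrieved, relevant):
    """Return (PB, RB) for the retrieved set RT and the relevant set RL."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # |RT ∩ RL|
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example data (not taken from the presentation).
RT = {"S1", "S2", "S3", "S7"}
RL = {"S1", "S3", "S4"}
p_b, r_b = boolean_precision_recall(RT, RL)
print(p_b, r_b)
```

Both measures only see set membership, so any grading of the retrieved or relevant advertisements is invisible to them.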

Problem Statement (1/2)
• Since SWS matchmaking systems produce multi-valued vectors e, applying Boolean evaluation requires the introduction of a relevance threshold
• Problem 1: This “Booleanization” process filters out any service semantics captured through DoM
• Problem 2: An optimal threshold value is hard to find

Example with Threshold = “B”:

Si    e(R,Si)    e’(R,Si)
S1    A          1
S2    B          1
S3    A          1
S4    D          0
S5    D          0
S6    C          0
S7    B          1
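The thresholding step in this example can be sketched in Python as follows. The grade ordering A > B > C > D is an assumption inferred from the example, and `booleanize` is a hypothetical helper name.

```python
# DoM grades from the slide's example, best first (ordering A > B > C > D is assumed).
GRADES = ["A", "B", "C", "D"]


def booleanize(dom_by_service, threshold):
    """Map each DoM grade to 1 if it is at least as good as the threshold, else 0."""
    cutoff = GRADES.index(threshold)
    return {s: int(GRADES.index(g) <= cutoff) for s, g in dom_by_service.items()}


e = {"S1": "A", "S2": "B", "S3": "A", "S4": "D", "S5": "D", "S6": "C", "S7": "B"}
e_bool = booleanize(e, "B")
print(e_bool)
```

After thresholding, S1 (grade A) and S2 (grade B) are indistinguishable, as are S4 (grade D) and S6 (grade C), which is the information loss described in Problem 1.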

Problem Statement (2/2)
• Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert
• Experiment
  • Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2 with W = {0, 0.25, 0.5, 0.75, 1}
  • Calculation of deviation from existing Boolean mappings
  • Only ~33% of the Boolean mappings agree with the multi-valued ones
  • ~40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
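The comparison in this experiment can be sketched as below. The slide does not state the exact deviation measure; absolute difference per request/advertisement pair is assumed here, and the mappings are hypothetical, not the TC2 data.

```python
def deviation_stats(boolean_r, multi_r, tolerance=0.25):
    """Return (share of exact agreements, share deviating by more than tolerance)."""
    # Assumed deviation measure: absolute difference between the two mappings.
    devs = [abs(boolean_r[k] - multi_r[k]) for k in boolean_r]
    n = len(devs)
    agree = sum(d == 0 for d in devs) / n
    far = sum(d > tolerance for d in devs) / n
    return agree, far


# Hypothetical expert mappings for one request.
boolean_r = {"S1": 1, "S2": 1, "S3": 0, "S4": 0}
multi_r = {"S1": 1.0, "S2": 0.5, "S3": 0.25, "S4": 0.75}
agree, far = deviation_stats(boolean_r, multi_r)
print(agree, far)
```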

A Generalized Fuzzy Evaluation Scheme
• Such a scheme (EVS2) can address the aforementioned problems
• Main design decisions
  • Expert mappings are fuzzy linguistic terms
  • DoM are fuzzy sets
  • Boolean measures are substituted by generalized ones
• Why fuzzy modeling?
  • Relevance is an “amorphic” concept (L. Zadeh), i.e., it is too complex to be defined mathematically
  • Numeric values have vague semantics
  • Fuzzy linguistic variables assume values from a linguistic term set, with each term being a fuzzy variable
• Warning: Fuzziness does not refer to the matchmaking process per se

Fuzzification of e and r

[Figure: two membership-function plots, each with membership value (0.0 to 1.0) on the vertical axis and a horizontal axis from 0.0 to 1.0: one for Degree of Relevance, one for Degree of Match.]

Degree of Relevance terms: I: Irrelevant, S: Slightly relevant, SW: Somewhat relevant, R: Relevant, V: Very relevant
Degree of Match terms: F: FAIL, SB: SUBSUMED-BY, S: SUBSUMES, P: PLUGIN, E: EXACT

fr: Q × S → [0,1]
fe: Q × S → [0,1]

If the two term sets do not contain the same number of fuzzy variables, fuzzy modifiers could be used (e.g., dilation, concentration).
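The shape of the membership functions is not recoverable from this transcript. As an illustration only, the sketch below assumes five evenly spaced triangular functions over [0, 1] and Zadeh's classic modifiers (concentration as squaring, dilation as square root); the term names, peaks and widths are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Membership of x in a triangular fuzzy set with support (left, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical term set: peaks at 0, 0.25, 0.5, 0.75 and 1.
TERMS = ["irrelevant", "slightly", "somewhat", "relevant", "very"]
PEAKS = {term: i / 4 for i, term in enumerate(TERMS)}


def membership(term, x, width=0.25):
    """Membership of the crisp value x in the fuzzy set of the given term."""
    peak = PEAKS[term]
    return triangular(x, peak - width, peak, peak + width)


def concentrate(mu):
    """Concentration modifier: sharpens a fuzzy set."""
    return mu ** 2


def dilate(mu):
    """Dilation modifier: widens a fuzzy set."""
    return mu ** 0.5


print(membership("relevant", 0.625), concentrate(0.5), dilate(0.25))
```

A modifier applied to an existing term yields an extra term, which is one way to align two term sets of different sizes.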

Generalized Evaluation Measures
• Based on [Buell and Kraft, “Performance measurement in a fuzzy retrieval system”, 1981], generalized precision (PG) and generalized recall (RG) are defined
• The cardinalities of the sets RT and RL are transformed to fuzzy set cardinalities, since these sets are fuzzy
• Note: the evaluation measures take into account all services Si
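The formulas themselves did not survive in this transcript. The sketch below assumes the usual fuzzy-set conventions: cardinality as the sum of membership values and intersection as the minimum, giving PG = Σ min(fe, fr) / Σ fe and RG = Σ min(fe, fr) / Σ fr, with the sums running over all services Si. The input values are hypothetical.

```python
def generalized_precision_recall(fe, fr):
    """Return (PG, RG) from fuzzified engine values fe and expert values fr."""
    services = fe.keys() | fr.keys()
    # Fuzzy cardinality of RT ∩ RL: sum of min memberships over all services.
    overlap = sum(min(fe.get(s, 0.0), fr.get(s, 0.0)) for s in services)
    card_rt = sum(fe.values())  # fuzzy cardinality of the retrieved set
    card_rl = sum(fr.values())  # fuzzy cardinality of the relevant set
    precision = overlap / card_rt if card_rt else 0.0
    recall = overlap / card_rl if card_rl else 0.0
    return precision, recall


# Hypothetical fuzzified values for one request.
fe = {"S1": 1.0, "S2": 0.5, "S3": 0.0, "S4": 0.25}
fr = {"S1": 0.75, "S2": 0.5, "S3": 0.25, "S4": 0.0}
p_g, r_g = generalized_precision_recall(fe, fr)
print(p_g, r_g)
```

With values restricted to {0, 1}, these expressions reduce to the Boolean precision and recall, which is why they are a generalization rather than a replacement.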

Experimental Results (1/3)
• Manual assessment of fuzzy relevance in the “Education” subset of TC v2
• Matchmaking engine: OWLS-MX Matcher
• Used only logic-based matching algorithms
• Threshold = FAIL

The difference between RG and RB is due to the considerable deviation between Boolean and fuzzy expert mappings.

Experimental Results (2/3)
• Sensitivity of the proposed scheme
• Only the generalized measures are affected by “stronger” false negatives/positives

Experimental Results (3/3)
• Similar overall behavior but better accuracy/sensitivity, as already shown

[Figure legend: EVS1, EVS2, EVS1 (average), EVS2 (average)]

A Pragmatic View
• A reasonable assumption
  • Experts are not willing to provide more than Boolean mappings
• Automatic fuzzification of Boolean expert mappings would be valuable

[Figure: reasoning about “relevance” turns a Boolean value (e.g., “1”) into an adjusted fuzzy value (e.g., “relevant”), drawing on statistics, logic implications and other inference rules.]

A First Approach
• Services are represented as concepts and form a service profile ontology
• Then an inference matrix is used for adjusting the Boolean r values

[Figure: service profile ontology with nodes Service, S1, Sx, S3, R, S5, S6, S7.]

Experimental Results
• The new scheme (EVS2’) approximates EVS2 better than EVS1 does
• Under the assumption that EVS2 is more accurate, EVS2’ seems promising

[Figure legend: EVS1, EVS2, EVS1 (average), EVS2 (average), EVS2’]

Conclusions
• Service retrieval evaluation should be semantics-aware
• A generalization of the current evaluation measures is deemed necessary
• Fuzzy Set Theory may assist in this direction
• However, many practical issues remain open

Thank You! Questions??? http://p-comp.di.uoa.gr
