On the Evaluation of Semantic Web Service Matchmaking Systems

Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades

Pervasive Computing Research Group

Communication Networks Laboratory

Department of Informatics and Telecommunications

University of Athens – Greece

ECOWS ’06 @ Zurich

- Introduction
- Problem Statement
- A Generalized Fuzzy Evaluation Scheme for Service Retrieval
- Experimental Results
- A Pragmatic View
- Conclusions

- Matching service requests and advertisements, based on their semantic annotations (expressed through ontologies)
- Numerous matchmaking approaches
- Logic-, similarity-, structure-based (graph matching)

- Various matched entities
- Functional service parameters (e.g., IOPE attributes)
- Non-functional parameters (e.g., QoS attributes)

- Ultimate goal: More effective service discovery, based on semantics and not just on syntax of service descriptions

- Degree of Match (DoM): a value that expresses how similar two entities are, with respect to some similarity metric(s)
- Important feature of almost all SWS matchmaking approaches
- Allows for ranking of discovered services
- Example DoM set: exact, plugin, subsumes, subsumed-by, fail

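[Figure: for a request R and advertisements S1, S2, ..., Sn, the Matchmaking Engine produces the values e(R,S1), e(R,S2), ..., e(R,Sn), and the Expert provides the values r(R,S1), r(R,S2), ..., r(R,Sn).]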

- Most works evaluate the performance of SWS Discovery (i.e., response times, scalability)
- Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services)

Q: possible service requests

S: advertisements of published services

e: Q × S → W (DoM, analogous to Retrieval Status Value in IR)

r: Q × S → W (expert mappings)

Evaluation is the determination of how closely vector e approximates vector r

- W is the set of values denoting DoM (for e) or degree of relevance (for r)
- W defines different evaluation schemes (EVS):

W={0,1}

Information Retrieval (IR) measures can be used:

Precision (PB) and Recall (RB)

RT: set of retrieved advertisements

RL: set of relevant advertisements
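
The formulas for these measures are not reproduced in the text. In their standard IR form, which is assumed here, with RT and RL as defined above, they read:

```latex
P_B = \frac{|RT \cap RL|}{|RT|}, \qquad R_B = \frac{|RT \cap RL|}{|RL|}
```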

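Example of Booleanization with Threshold = “B” (e: DoM returned by the engine, e’: Booleanized value):

- S1: e(R,S1) = A, e’(R,S1) = 1
- S2: e(R,S2) = B, e’(R,S2) = 1
- S3: e(R,S3) = A, e’(R,S3) = 1
- S4: e(R,S4) = D, e’(R,S4) = 0
- S5: e(R,S5) = D, e’(R,S5) = 0
- S6: e(R,S6) = C, e’(R,S6) = 0
- S7: e(R,S7) = B, e’(R,S7) = 1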

- Since SWS matchmaking systems produce multi-valued vectors e, applying Boolean evaluation implies the introduction of a relevance threshold

- Problem 1: This “Booleanization” process filters out any service semantics captured through DoM
- Problem 2: An optimal threshold value is hard to find
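
The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' code: it reuses the seven-service example with threshold “B”, assumes the ordering A > B > C > D for the DoM labels, and uses hypothetical Boolean expert mappings (`r_bool`) that do not come from the slides.

```python
# DoM labels from the seven-service example, strongest first.
# The ordering A > B > C > D is an assumption made for this sketch.
DOM_ORDER = ["A", "B", "C", "D"]


def booleanize(e, threshold):
    """Map each DoM label to 1 if it is at least as strong as the threshold, else 0."""
    cutoff = DOM_ORDER.index(threshold)
    return {s: 1 if DOM_ORDER.index(d) <= cutoff else 0 for s, d in e.items()}


def precision_recall(e_bool, r_bool):
    """Boolean precision and recall from retrieved (e) and relevant (r) flags."""
    retrieved = {s for s, v in e_bool.items() if v == 1}
    relevant = {s for s, v in r_bool.items() if v == 1}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Engine output e(R, Si) from the example.
e = {"S1": "A", "S2": "B", "S3": "A", "S4": "D", "S5": "D", "S6": "C", "S7": "B"}
e_bool = booleanize(e, "B")

# Hypothetical Boolean expert mappings r(R, Si), for illustration only.
r_bool = {"S1": 1, "S2": 0, "S3": 1, "S4": 0, "S5": 0, "S6": 1, "S7": 1}
p_b, r_b = precision_recall(e_bool, r_bool)
print(e_bool, p_b, r_b)
```

After thresholding, S1 and S2 are indistinguishable although their DoM differs, which is the loss of semantics described in Problem 1.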

- Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert.
- Experiment
- Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2 with W={0, 0.25, 0.5, 0.75, 1}
- Calculation of deviation from existing Boolean mappings

- Only ~33% of the Boolean mappings agree with the multi-valued ones
- ~40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
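
A deviation analysis of this kind could be computed as in the sketch below. The function name and the sample mappings are hypothetical and serve only to show the arithmetic. They are not the data of the experiment.

```python
def deviation_stats(r_bool, r_multi, close=0.25):
    """Return (share of exact agreements, share of mappings deviating by more than `close`)."""
    deviations = [abs(r_bool[k] - r_multi[k]) for k in r_bool]
    n = len(deviations)
    agree = sum(d == 0 for d in deviations) / n
    far = sum(d > close for d in deviations) / n
    return agree, far


# Hypothetical mappings for one request and six advertisements.
r_bool_sample = {"S1": 1, "S2": 1, "S3": 0, "S4": 0, "S5": 1, "S6": 0}
r_multi_sample = {"S1": 1.0, "S2": 0.75, "S3": 0.5, "S4": 0.0, "S5": 0.25, "S6": 0.25}

agree_share, far_share = deviation_stats(r_bool_sample, r_multi_sample)
print(agree_share, far_share)
```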

- A generalized fuzzy evaluation scheme (EVS2) can provide solutions to the aforementioned problems
- Main design decisions
- Expert mappings are fuzzy linguistic terms
- DoM are fuzzy sets
- Boolean measures are substituted by generalized ones

- Why fuzzy modeling?
- Relevance is an “amorphic” concept (L. Zadeh), i.e., its complexity prevents a precise mathematical definition
- Numeric values have vague semantics
- Fuzzy linguistic variables assume values from a linguistic term set, with each term being a fuzzy variable
- Warning: Fuzziness does not refer to the matchmaking process per se

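[Figure: two membership-function plots, each showing Membership Value (0.0 to 1.0) over a domain from 0.0 to 1.0. One plot covers Degree of Relevance with the terms I, S, SW, R, V; the other covers Degree of Match with the terms F, SB, S, P, E.]

- Degree of Relevance terms: I: Irrelevant, S: Slightly relevant, SW: Somewhat relevant, R: Relevant, V: Very relevant
- Degree of Match terms: F: FAIL, SB: SUBSUMED-BY, S: SUBSUMES, P: PLUGIN, E: EXACT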

fr: Q × S → [0,1]

fe: Q × S → [0,1]

If there is no one-to-one correspondence between the fuzzy variables of the two term sets, fuzzy modifiers could be used (e.g., dilation, concentration)
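
As an illustration of such modifiers, the sketch below uses the classic Zadeh definitions, where concentration squares a membership value and dilation takes its square root. Whether the authors used exactly these operators is an assumption. The function names are hypothetical.

```python
import math


def concentration(mu):
    """Concentration modifier (often read as "very"): sharpens a membership value in [0, 1]."""
    return mu ** 2


def dilation(mu):
    """Dilation modifier (often read as "more or less"): relaxes a membership value in [0, 1]."""
    return math.sqrt(mu)


# Membership of some service in the term "relevant" (illustrative value).
mu_relevant = 0.64
mu_very = concentration(mu_relevant)        # derived "very relevant" term
mu_more_or_less = dilation(mu_relevant)     # derived "more or less relevant" term
print(mu_very, mu_more_or_less)
```

Applying a modifier to an existing term yields an extra term, which is one way to align two term sets of different sizes.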

- Based on [Buell and Kraft, “Performance measurement in a fuzzy retrieval system”, 1981], generalized precision and recall measures are defined

- The cardinalities of the sets RT and RL are transformed to fuzzy set cardinalities, since the above sets are fuzzy.
- Note: the evaluation measures take into account all services Si
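
The formulas themselves are not reproduced in the text. A minimal sketch of generalized precision and recall, assuming the common fuzzy-set formulation (minimum as intersection, sum of memberships as fuzzy cardinality), is given below. The function name and the sample values of fe and fr are hypothetical.

```python
def generalized_precision_recall(fe, fr):
    """Fuzzy precision and recall over all services.

    fe, fr: dicts mapping a service id to a membership degree in [0, 1]
    (engine DoM and expert relevance, respectively).
    """
    overlap = sum(min(fe[s], fr[s]) for s in fe)   # fuzzy |RT ∩ RL|
    card_rt = sum(fe.values())                     # fuzzy |RT|
    card_rl = sum(fr.values())                     # fuzzy |RL|
    precision = overlap / card_rt if card_rt else 0.0
    recall = overlap / card_rl if card_rl else 0.0
    return precision, recall


# Illustrative membership values for one request and four advertisements.
fe_sample = {"S1": 1.0, "S2": 0.5, "S3": 0.0, "S4": 0.5}
fr_sample = {"S1": 0.5, "S2": 0.5, "S3": 1.0, "S4": 0.5}

p_g, r_g = generalized_precision_recall(fe_sample, fr_sample)
print(p_g, r_g)
```

With membership values restricted to {0, 1}, this reduces to Boolean precision and recall, so the sketch behaves as a generalization rather than a replacement.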

- Manual assessment of fuzzy relevance in the “Education” subset of TC v2
- Matchmaking engine: OWLS-MX Matcher
- Used only logic-based matching algorithms
- Threshold = FAIL

The difference between RG and RB is due to the considerable deviation between Boolean and fuzzy expert mappings

- Sensitivity of the proposed scheme

- Only the generalized measures are affected by “stronger” false negatives/positives

[Figure: plot comparing EVS1 and EVS2, with the series EVS1, EVS2, EVS1 (average) and EVS2 (average).]

- Similar overall behavior but better accuracy/sensitivity as already shown

[Figure: Reasoning about “Relevance”. A Boolean value (e.g., “1”) is turned into an adjusted fuzzy value (e.g., “relevant”) using statistics, logic implications and other inference rules.]

- A reasonable assumption
- experts are not willing to provide more than Boolean mappings

- Automatic fuzzification of Boolean expert mappings would be valuable

[Figure: service profile ontology with the concepts Service, S1, Sx, S3, R, S5, S6 and S7.]

- Services are represented as concepts and form a service profile ontology
- Then an inference matrix is used for adjusting the Boolean r values

- The new scheme (EVS2’) approximates EVS2 better than EVS1 does
- Under the assumption that EVS2 is more accurate, EVS2’ seems promising

[Figure: plot comparing the evaluation schemes, with the series EVS1, EVS2, EVS1 (average), EVS2 (average) and EVS2’.]

- Service retrieval evaluation should be semantics-aware
- A generalization of the current evaluation measures is deemed necessary
- Fuzzy Set Theory may help in this direction
- However, many practical issues remain open

Questions???

http://p-comp.di.uoa.gr