An Improved Discovery Engine for Efficient and Intelligent Discovery of Web Service with Publication Facility
Vandan Tewari (1), N. Dagdee (2), Inderjeet Singh (1), Nipur Garg (1), Preeti Soni (1)
1. Shri G.S. Institute of Technology & Science, Indore
2. Shri S.D. Bansal Institute of Tech. & Science, Indore

Contents
• Background
• Related work on Web Service Discovery
• Addressed Issues
• Our proposal
• Proposed architecture
• Proposed Algorithm
• Modules implemented
• Test Case and Results
• Conclusion & Future Enhancements

Background
• SOA (Service Oriented Architecture): Service Oriented Architecture is the latest evolution of distributed computing, which enables software components to be exposed as services.
• Web Service: A web service is a standalone software component designed to support interoperable machine-to-machine interaction over a network.

Web Service & SOA
[Figure: an example web service scenario with the Find, Publish, and Bind operations]

Related Work on Web Service Discovery
Available sources for service discovery and their respective drawbacks:
A. Centralized Service Broker (UBRs) *
• Single point of failure.
• Performance bottlenecks.
B. Federated Registries *
• Inconsistent policies across registries make real-time search inefficient.
• No advanced search facility is available.
* Ref. [E. Al-Masri, WWW 2008, pp. 795-804]

Continued…
C. Search Engine **
• Inability to distinguish between a web page and a web service document (WSDL) leads to irrelevant results.
D. Web Crawler Engine ***
• The problem of service overload still exists.
** Ref. [K. Sivashanmugam et al., ISWC, pp. 270-278, 2004]
*** Ref. [E. Al-Masri, Q. H. Mahmoud, IEEE ICWS 2007, pp. 1104-1111]

Technical Limitations of UDDI
• Passivity of UDDI: because service revocation is voluntary, entries for services that are no longer active remain in UDDI as passive data.
• Absence of QoS parameters for web services.
• Absence of web service life cycle management.
Ref. [K. Sivashanmugam et al., IEEE, ISWC, 2004, pp. 270-278]

Addressed Issues
• How to deal with the passivity of UBRs in order to increase service availability.
• How to discover appropriate services when UBRs are overloaded with services.
• How to suggest appropriate services to the service requester based on service feedback and frequency of usage.

Our Proposal
A “Discovery cum Publishing Engine” has been designed which:
• increases service availability by removing passive web services from the UBRs;
• improves service search time by applying data mining techniques to the contents of the UBRs;
• uses past user feedback and usage frequency to suggest appropriate services to the service consumer.

Assumptions
• A domain of trust among the UBRs is already established.
• The test case is developed on a small set of experimental data.
• A predefined classification scheme is used, based on the “Location parameter” of the Travel service.

Proposed Architecture of Our System
[Figure: the Discovery cum Publishing Engine, comprising the Publish Manager, Search Manager, Validation Module, and Add Review components, crawls UBR1, UBR2, UBR3, …, UBRn. The Service Provider publishes through the engine; the Service Consumer discovers and binds to services.]

Modules Implemented
A. Publish Manager
B. Search Manager
• UBRs Crawl Module
• Search Module
• Dynamic IP Module
• Cluster Module
C. Validate Module
• WSDL Parser Module
• Delete Module
D. Add Review Module

Working of Proposed System [figure]

Mechanism of Dynamic IP Module
The module updates the IP table of the engine dynamically:
Step 1: Start crawling on the initial seeds.
Step 2: From each initial seed, find the IP addresses of the service providers.
Step 3: From each provider, fetch the IP addresses of the UBRs in which it has published its other web services.
Step 4: Compare the fetched IP addresses with the initial seeds. Any new IP address is stored in the local IP table; the rest are ignored in order to avoid redundancy.
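The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two lookup callables `providers_of` and `ubrs_used_by` are hypothetical stand-ins for whatever registry queries the engine actually issues, and a set is used so that already-known addresses are ignored automatically.

```python
from typing import Callable, Iterable, Set


def update_ip_table(
    seeds: Iterable[str],
    providers_of: Callable[[str], Iterable[str]],   # hypothetical: UBR IP -> provider ids
    ubrs_used_by: Callable[[str], Iterable[str]],   # hypothetical: provider id -> UBR IPs
) -> Set[str]:
    """Return the engine's IP table after one pass over the initial seeds."""
    ip_table = set(seeds)                           # Step 1: start from the initial seeds
    for seed in list(ip_table):                     # iterate over a snapshot of the seeds
        for provider in providers_of(seed):         # Step 2: providers found in this UBR
            for ubr_ip in ubrs_used_by(provider):   # Step 3: other UBRs they publish to
                ip_table.add(ubr_ip)                # Step 4: known IPs are not duplicated
    return ip_table
```

Calling the function again with the returned table as the new seed list would let the table keep growing as further UBRs are found.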

Proposed Algorithm for Publishing (Publish Manager)
Step 1: Start.
Step 2: Select the IP address of the UBR where publishing is required.
Step 3: Accept the details of the web service from the service provider, along with its location, which acts as a predefined class.
Step 4: Classify the web service based on its location.
Step 5: Store the web service details in the selected UBR, under the class to which the service belongs.
Step 6: Stop.
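A minimal sketch of Steps 3 to 5, assuming an in-memory dictionary stands in for the UBR storage. The `ServiceRecord` fields and the normalisation of the location string are illustrative assumptions, not the registry schema used by the authors.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceRecord:
    name: str
    access_point: str
    location: str  # acts as the predefined class


# registry[ubr_ip][location_class] -> services published under that class
Registry = Dict[str, Dict[str, List[ServiceRecord]]]


def publish(registry: Registry, ubr_ip: str, service: ServiceRecord) -> str:
    """Store the service in the selected UBR under its location class."""
    # Step 4: classify by location (normalising case and spaces is an assumption)
    location_class = service.location.strip().lower()
    # Step 5: store the details under that class in the selected UBR
    registry.setdefault(ubr_ip, {}).setdefault(location_class, []).append(service)
    return location_class
```

Keeping one bucket per location class is what later lets the search fetch only the classes in the selected cluster.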

Classification Scheme Followed
Table 1.1: List of classes along with the number of published web services (an example scenario only).

Proposed Algorithm for Searching
Step 1: Start.
Step 2: Enter the keyword for which services are to be searched (e.g. Travel, i.e. choose the super class).
Step 3: Select the location of the service (it is denoted by a class and serves as the centroid for cluster selection).
Step 4: Initialize the IP table with the initial seeds of the UBRs.
Do
Step 4a: Call the Dynamic IP Module.
Step 4b: Call the Cluster Module to create clusters based on the location attribute.
Step 4c: If (location is not chosen), treat all classes as a single cluster and go to Step 4d. Else, if (maximal distance <= minimum threshold), put the location classes in the same cluster, then select the cluster to which the centroid belongs.
Step 4d: For each location class in the selected cluster, fetch all services belonging to that class along with their usage frequency data.
Step 4e: Call the Cluster Module to create clusters based on service usage frequency.

Continued…
Step 4f: Parse the WSDL document at the access point URL of each discovered web service, i.e. validate the web service.
Step 4g: If the web service is active, store it locally. Else, fetch the service key for that access point URL from the UBRs and pass it to the Delete Module, which stores it locally for future use and deletes the web service from the respective UBR.
Until all IP seeds in the UBR crawl queue have been visited.
Step 5: Add the service reviews, taken from the virtual UBR on which the engine resides, to each service in the locally stored list of active services.
Step 6: Display the list of web services to the end user.
Step 7: If the user binds to a service, ask the user to write feedback on the service used. Accept the user's details along with a comment and a rating for the service, and store them in the extended service registry structure.
Step 8: End.
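Steps 4f and 4g can be illustrated with the sketch below. It assumes that a service counts as active when its access point returns a document whose root element is a WSDL 1.1 `definitions` element; the slides do not spell out the exact check, so this criterion is an assumption. The `fetch` callable is a hypothetical stand-in for the HTTP retrieval.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def looks_like_wsdl(document: Optional[str]) -> bool:
    """True if the text parses as XML whose root is a WSDL 1.1 <definitions>."""
    if not document:
        return False
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    return root.tag == f"{{{WSDL_NS}}}definitions"


def validate_services(
    access_points: Dict[str, str],            # service key -> access point URL
    fetch: Callable[[str], Optional[str]],    # hypothetical: URL -> document text or None
) -> Tuple[List[str], List[str]]:
    """Split service keys into (active, passive), as in Steps 4f and 4g."""
    active: List[str] = []
    passive: List[str] = []
    for key, url in access_points.items():
        if looks_like_wsdl(fetch(url)):
            active.append(key)    # active: kept in the local result list
        else:
            passive.append(key)   # passive: key would be handed to the Delete Module
    return active, passive
```

The passive keys returned here correspond to the service keys that Step 4g passes on for deletion from the respective UBR.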

Agglomerative Algorithm for Complete-Link Clustering
• It looks for cliques.
• The distance between two clusters is the maximal distance between any pair of their members; two clusters are merged if this maximum distance is less than or equal to the distance threshold.
• The Euclidean distance between points p and q can be calculated as $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$.
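A naive sketch of threshold-based complete-link clustering, written for readability rather than speed. The function names and the choice to stop at the first merge that would exceed the threshold are assumptions made for illustration; the slides do not show the authors' implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complete_link(points: List[Point], threshold: float) -> List[List[int]]:
    """Cluster point indices; merge while the complete-link distance <= threshold."""
    clusters = [[i] for i in range(len(points))]  # start with singleton clusters

    def dist(a: List[int], b: List[int]) -> float:
        # complete link: the largest pairwise distance between the two clusters
        return max(euclidean(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # find the closest pair of clusters under the complete-link distance
        d, x, y = min(
            (dist(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break                                 # no pair is close enough to merge
        clusters[x] = clusters[x] + clusters[y]   # merge y into x (y > x always)
        del clusters[y]
    return clusters
```

The same routine works for one-dimensional data such as usage frequencies by passing each value as a one-element tuple.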

Adjacency Matrix for Maximal distance (Based on the location attribute)

Mechanism for Service Rating
• An extended service registry design is proposed. The schema design of this template table is as follows.
• Data on the invocation frequency of services is also kept in the virtual root registry of the proposed architecture and is to be published periodically by the service provider.
• The average rating of a service is calculated by giving equal weight to user reviews and to the usage frequency of the service.
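One way to read the equal-share rule is as a 50/50 blend of the mean user rating and the usage frequency. The sketch below assumes both parts are first scaled to the range 0 to 1, with ratings on a 5-point scale and frequency divided by the highest frequency observed; the slides do not state the scaling, so these choices are assumptions.

```python
from typing import Sequence


def service_score(
    ratings: Sequence[float],
    usage: int,
    max_usage: int,
    max_rating: float = 5.0,   # assumed rating scale
) -> float:
    """Equal-weight blend of mean user rating and usage frequency, each in [0, 1]."""
    rating_part = (sum(ratings) / len(ratings)) / max_rating if ratings else 0.0
    usage_part = usage / max_usage if max_usage > 0 else 0.0
    return 0.5 * rating_part + 0.5 * usage_part
```

A service with no reviews or no recorded invocations simply contributes zero for that half of the score.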

How service usage frequency is used for clustering the services: An Example
Here Ts1, Ts2, Ts3, Ts4, and Ts5 represent the travel services used by various users.
Choosing threshold t = 6, the clusters formed are (Ts1, Ts2, Ts3), (Ts4), and (Ts5).
The user is presented with the first cluster, since its average invocation frequency is the highest.
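Picking the cluster to present can be sketched as below. The frequency values are hypothetical, chosen only so that the clusters match the grouping in the example; the actual figures from the slide are not available in this transcript.

```python
from typing import Dict, List


def best_cluster(clusters: List[List[str]], frequency: Dict[str, int]) -> List[str]:
    """Return the cluster with the highest average invocation frequency."""
    def average(cluster: List[str]) -> float:
        return sum(frequency[s] for s in cluster) / len(cluster)
    return max(clusters, key=average)


# Hypothetical invocation counts, for illustration only
frequency = {"Ts1": 40, "Ts2": 38, "Ts3": 36, "Ts4": 20, "Ts5": 5}
clusters = [["Ts1", "Ts2", "Ts3"], ["Ts4"], ["Ts5"]]
chosen = best_cluster(clusters, frequency)
```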

Test Case and Results
The user searches for Travel web services at a location such as Annapurna Road.

Continued…
List of search results for Annapurna Road.

Continued…
Search results when the user does not choose any location.
To execute a service, the user clicks on the service name.

Continued…
Here users can rate as well as review the service they used.

The proposed engine can…
• Reduce the population of passive web services in the UBRs using a validation mechanism.
• Crawl over an IP list which can grow dynamically.
• Narrow down the search space of the UBRs.
• Suggest services to the user based on user feedback and service usage frequency.
• Provide a web service publication facility to the user.

Conclusion
A Discovery cum Publishing Engine for searching web services has been proposed, which uses service ranking techniques for efficient and effective web service discovery. We have used data mining techniques to narrow down the search space in the UBRs. In addition, an extended service registry design has been proposed which stores service feedback and service usage frequency along with the service information; this data is used to rank services within a selected cluster. This work may be generalized further if, instead of a single service attribute, non-functional parameters or the semantics of services are considered when applying the data mining techniques.

References
• E. Al-Masri and Q. H. Mahmoud, “WSCE: A crawler engine for large-scale discovery of web services”, in Proceedings of IEEE ICWS, pp. 1104-1111, 2007.
• K. Sivashanmugam, K. Verma and A. Seth, “Discovery of web services in a federated environment”, in Proceedings of ISWC, pp. 270-278, 2004.
• Yan Li, Yao Liu, Liangjie Zhang, Ge Li, Bing Xie and Jiasu Sun, “An Exploratory study of Web Services on the internet”, in ICWS 2007 (IEEE).
• E. Al-Masri and Q. H. Mahmoud, “Crawling Multiple UDDI Business Registries”, Proc. 16th Int’l World Wide Web Conf., ACM.
• E. Al-Masri and Q. H. Mahmoud, “Discovering Web Services in Search Engine”, WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.
• Margaret H. Dunham and S. Sridhar, “Data Mining Introductory and Advanced Topics”.

THANK YOU
