Swoogle

Presentation Transcript


  1. Swoogle: Semantic Search Engine • Web-enhanced Information Management • Bin Wang

  2. Outline • Background Introduction • Semantic Web • Semantic Search • Swoogle – Semantic Search Engine • Swoogle Architecture • Semantic Web documents • Finding SWDs • Ranking SWDs • Swoogle Indexing and Retrieval • Conclusion

  3. Background Introduction • What is the Semantic Web? • An evolving extension of the WWW. • The semantics of information and services on the web are well defined. • This makes it possible for the web to understand and satisfy requests from people and machines to use web content.

  4. Background Introduction • What is the Semantic Web? • [Figure: The Semantic Web layers]

  5. Background Introduction • What is Semantic Search? • A set of techniques for managing documents, especially semantically supported document retrieval. • There are two forms of search: navigational search and research search. Semantic search belongs to the second category. • It attempts to augment and improve traditional search results by using data from the Semantic Web.

  6. Swoogle – Semantic Search Engine • Swoogle is a crawler-based indexing and retrieval system for Semantic Web documents (SWDs): RDF and OWL documents encoded in XML or N3. • It automatically discovers SWDs, indexes their metadata, and answers queries about them. • SWDs are characterized by semantic annotation and meaningful references to other SWDs; conventional search engines do not take advantage of these features.

  7. Swoogle Search Interface • Developed by UMBC (University of Maryland, Baltimore County)

  8. What Swoogle can do • Finding appropriate ontologies: users can query for ontologies that contain specified terms anywhere in the document. The ontologies returned are ranked by the Ontology Rank algorithm. • Finding instance data: it helps users integrate data distributed across the web. • Characterizing the Semantic Web: it reveals structural properties of the Semantic Web by extracting metadata, especially inter-document relations.

  9. Swoogle Architecture • Four main components: SWD discovery, metadata creation, data analysis and interface

  10. Swoogle Architecture • SWD discovery component: • discovers potential SWDs throughout the web • keeps up-to-date information about SWDs. • Metadata creation component: • generates objective metadata about SWDs at both the syntactic level and the semantic level. • Data analysis component: • derives analytical reports, such as the classification of SWOs and SWDBs, the rank of SWDs, and the IR index of SWDs. • Interface component:

  11. Semantic Web Documents (SWDs) • A SWD is a document in a Semantic Web language (based on RDF, e.g. RDFS, DAML+OIL, and OWL) that is online and accessible to web users and software agents. • There are two kinds of SWDs: • SWOs (Semantic Web Ontologies) • SWDBs (Semantic Web Databases)

  12. Semantic Web Documents (SWDs) • SWOs (Semantic Web Ontologies): a SWD in which a significant proportion of the statements define new terms or extend the definitions of terms defined in other SWDs. • SWDBs (Semantic Web Databases): a SWD that does not define or extend a significant number of terms.
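
The SWO/SWDB distinction above can be sketched as a ratio test. This is a minimal illustration, not Swoogle's actual classifier: the `SwdStats` record, the `ontology_ratio` helper, and the 0.5 threshold are all assumptions, since the slide only says "a significant proportion".

```python
from dataclasses import dataclass


@dataclass
class SwdStats:
    """Hypothetical per-document counts gathered while parsing a SWD."""
    url: str
    total_statements: int
    defining_statements: int  # statements that define or extend terms


def ontology_ratio(stats: SwdStats) -> float:
    """Fraction of statements that define or extend terms."""
    if stats.total_statements == 0:
        return 0.0
    return stats.defining_statements / stats.total_statements


def classify(stats: SwdStats, threshold: float = 0.5) -> str:
    """Label a document "SWO" or "SWDB"; the threshold is an assumed cutoff."""
    return "SWO" if ontology_ratio(stats) >= threshold else "SWDB"
```

A document with 80 defining statements out of 100 would be labelled an SWO under this cutoff, while one with 5 out of 100 would be labelled an SWDB.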

  13. Finding SWDs • Develop a Google Crawler to search for URLs using the Google Web Service. • It starts with file-type extensions (e.g. .rdf, .owl, .daml, and .n3), which are good SWD indicators. • Develop a Focused Crawler to crawl documents within a given website. • It only crawls URLs relative to the given base URL. • The SW community is also invited to submit URLs.
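
The two crawler heuristics on this slide can be sketched with the Python standard library. The function names `looks_like_swd` and `resolve_within_site` are hypothetical, and the prefix check is only one simple reading of "relative to the given base URL".

```python
import posixpath
from typing import Optional
from urllib.parse import urljoin, urlparse

# Extensions the slide lists as good SWD indicators.
SWD_EXTENSIONS = {".rdf", ".owl", ".daml", ".n3"}


def looks_like_swd(url: str) -> bool:
    """True if the URL path ends in one of the indicator extensions."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() in SWD_EXTENSIONS


def resolve_within_site(base_url: str, link: str) -> Optional[str]:
    """Resolve a link against the base URL; return None if it leaves the site."""
    absolute = urljoin(base_url, link)
    return absolute if absolute.startswith(base_url) else None
```

A focused crawl would keep only links for which `resolve_within_site` returns a URL, and would prioritize those for which `looks_like_swd` is true.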

  14. Finding SWDs • Develop the Jena2-based Swoogle Crawler. • It verifies whether a document is a SWD. • It revisits discovered URLs to check for updates. • Some heuristics are used to discover new SWDs through semantic relations: -- A URIref is highly likely to be the URL of a SWD. -- owl:imports links to an external ontology, which is a SWD. -- etc.
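
The owl:imports heuristic can be illustrated as follows. This sketch uses Python's standard XML parser as a stand-in for Jena2, handles only RDF/XML in which the import target is given as an `rdf:resource` attribute, and the function name `imported_ontologies` is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def imported_ontologies(rdfxml: str) -> List[str]:
    """Return the URLs named by owl:imports elements in an RDF/XML string."""
    root = ET.fromstring(rdfxml)
    resource_attr = f"{{{RDF_NS}}}resource"
    return [
        element.attrib[resource_attr]
        for element in root.iter(f"{{{OWL_NS}}}imports")
        if resource_attr in element.attrib
    ]


SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://example.org/base.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

# Each returned URL is a candidate SWD to add to the crawl queue.
candidates = imported_ontologies(SAMPLE)
```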

  15. SWD Metadata • Metadata is collected to make SWD search more efficient and effective. • It is derived from the content of SWDs as well as the relations among SWDs. • Swoogle identifies three categories of metadata: • Basic metadata; • Relations; • Analytical results such as SWO/SWDB classification and SWD ranking.

  16. SWD Metadata • 1. Basic metadata: considers the syntactic and semantic features of a SWD. • Language feature: the properties describing the syntactic or semantic features of a SWD. • RDF statistics: the properties summarizing the node distribution of the RDF graph of a SWD. • Ontology annotation: the properties that describe a SWD as an ontology.

  17. SWD Metadata • 2. Relations among SWDs: Swoogle focuses on SWD-level relations, which generalize RDF node-level relations. The following relations are captured: • TM/IN: captures term reference relations between two SWDs; • IM: shows that an ontology imports another ontology; • EX: shows that an ontology extends another ontology; • PV: shows that an ontology is a prior version of another; • CPV: shows that an ontology is a prior version of and is compatible with another; • IPV: shows that an ontology is a prior version of and is incompatible with another.

  18. Ranking SWDs • Rational Random Surfer • A user arrives at a given page either by addressing it directly or by following one of the links pointing to it. • Different links may stand for different relations and thus have different weights. • [Flowchart with decision nodes "SWO?" and "bored?" (yes/no) and actions "Explore all linked SWOs", "Jump to a random page", and "Follow a random link"]

  19. Ranking SWDs • Rational Random Surfer: raw rank • T(x): the set of SWDs that x links to; L(a): the set of SWDs that link to a; d: a damping factor, typically set to 0.85.
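
A weighted, PageRank-style recurrence consistent with the symbols defined on this slide can be written as below. The link-weight function f(x, a) (the summed weight of all links from x to a) and its normalizer f(x) are assumptions introduced for illustration, following the form given in Ding et al. (2004); the slide text itself only defines T(x), L(a), and d.

```latex
\mathrm{rawPR}(a) = (1 - d) + d \sum_{x \in L(a)} \mathrm{rawPR}(x)\,\frac{f(x, a)}{f(x)},
\qquad
f(x) = \sum_{a' \in T(x)} f(x, a')
```

With all link weights equal, f(x, a)/f(x) reduces to 1/|T(x)| and the recurrence takes the familiar unweighted PageRank form.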

  20. Ranking SWDs • Rational Random Surfer: final rank • TC(a) is the transitive closure of SWOs imported by a. • Swoogle computes the rank for SWDBs using the first formula and the rank for SWOs using the second.
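
The two-stage ranking can be sketched in Python. This is an illustration under stated assumptions rather than Swoogle's implementation: raw rank is taken to be a weighted PageRank-style iteration, the final rank of an SWO is taken to be the sum of raw ranks over TC(a), and TC(a) is assumed to include a itself. The function names and the `links`/`imports` dictionaries are hypothetical.

```python
from typing import Dict, Iterable, Set


def raw_rank(links: Dict[str, Dict[str, float]], d: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Weighted PageRank-style rank; links[x][a] is the total link weight x -> a."""
    nodes = set(links) | {a for outs in links.values() for a in outs}
    rank = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {node: 1.0 - d for node in nodes}
        for x, outs in links.items():
            total = sum(outs.values())
            if total == 0:
                continue
            for a, weight in outs.items():
                # x passes on its rank in proportion to each link's weight.
                updated[a] += d * rank[x] * weight / total
        rank = updated
    return rank


def import_closure(imports: Dict[str, Iterable[str]], a: str) -> Set[str]:
    """SWOs reachable from a through imports (assumed to include a itself)."""
    seen = {a}
    stack = [a]
    while stack:
        current = stack.pop()
        for target in imports.get(current, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def final_rank(raw: Dict[str, float], imports: Dict[str, Iterable[str]],
               swos: Set[str]) -> Dict[str, float]:
    """SWDBs keep their raw rank; SWOs sum raw rank over their import closure."""
    return {
        node: sum(raw[x] for x in import_closure(imports, node))
        if node in swos else raw[node]
        for node in raw
    }


# Hypothetical three-document web: one SWDB and two SWOs.
links = {"db1": {"onto_a": 2.0, "onto_b": 1.0}, "onto_a": {"onto_b": 1.0}}
imports = {"onto_a": ["onto_b"]}
raw = raw_rank(links)
final = final_rank(raw, imports, swos={"onto_a", "onto_b"})
```

The per-relation weights stored in `links` are where the different relation types (for example imports versus term references) would influence the rank.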

  21. Swoogle Indexing and Retrieval • Swoogle adapts Sire, a custom indexing and retrieval engine: • It employs a TF/IDF model with a standard cosine similarity metric. • It indexes discovered documents using either character n-grams or URIrefs as keywords, both to find relevant documents and to compute the similarity among a set of documents.
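
The retrieval model on this slide can be illustrated with a small, self-contained sketch. It is not Sire's code: the trigram size, the smoothed IDF variant, and the function names are assumptions chosen for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Lower-cased overlapping character n-grams of the text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs: List[str], n: int = 3) -> List[Dict[str, float]]:
    """One sparse TF/IDF vector per document, keyed by character n-gram."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq: Counter = Counter()
    for count in counts:
        doc_freq.update(count.keys())
    total = len(docs)
    # Smoothed IDF (one common variant) keeps every weight positive.
    idf = {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    return [{g: tf * idf[g] for g, tf in count.items()} for count in counts]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["person ontology", "person ontologies", "wine vintage"]
vectors = tfidf_vectors(docs)
```

Character n-grams let near-identical strings such as "ontology" and "ontologies" match without stemming; using URIrefs as keywords would instead replace `char_ngrams` with a function that returns the URIrefs found in each document.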

  22. Conclusion • The work introduces a prototype crawler-based indexing and retrieval system for Semantic Web documents. • One of the interesting properties computed for each SWD is its ontology rank, which uses the rational random surfer model rather than the random surfer model used in conventional search engines.

  23. References • Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, Joel Sachs. Swoogle: a search and metadata engine for the semantic web. Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, November 8-13, 2004, Washington, D.C., USA. • R. Guha, Rob McCool, Eric Miller. Semantic search. Proceedings of the 12th International Conference on World Wide Web, May 20-24, 2003, Budapest, Hungary. • Tim Berners-Lee, James Hendler, Ora Lassila. The Semantic Web. Scientific American, May 17, 2001.

  24. Thank You!
