
Swoogle: A Semantic Web Search and Metadata Engine

Swoogle: A Semantic Web Search and Metadata Engine. Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, Joel Sachs. Department of Computer Science and Electronic Engineering, University of Maryland Baltimore County. CIKM '04. Presented by Dongmin Shin.


Presentation Transcript


  1. Swoogle: A Semantic Web Search and Metadata Engine
     Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, Joel Sachs
     Department of Computer Science and Electronic Engineering, University of Maryland Baltimore County
     CIKM '04
     Presented by Dongmin Shin, IDS Lab, 2008.10.22

  2. Index
     • Introduction
     • Semantic Web Documents
     • Swoogle Architecture
     • Finding SWDs
     • SWD Metadata
     • Ranking SWDs
     • Indexing and Retrieval of SWDs
     • Conclusions
     • Evaluation and Discussion
     Center for E-Business Technology

  3. Introduction
     • Semantic Web documents (SWDs) are characterized by semantic annotation and meaningful references to other SWDs
     • Conventional search engines do not take advantage of these features
     • A search engine customized for SWDs is needed
     • Swoogle is a crawler-based indexing and retrieval system for the Semantic Web

  4. Introduction
     • Three activities of Swoogle
       • Finding appropriate ontologies
         • Allows users to query for ontologies that contain specified terms anywhere in the document
         • The ontologies returned are ranked
       • Finding instance data
         • Enables querying SWDs with constraints on the classes and properties they use or define
       • Characterizing the Semantic Web
         • By collecting metadata about the Semantic Web, Swoogle reveals interesting structural properties
     • Swoogle automatically discovers SWDs, indexes their metadata, and answers queries about it

  5. Semantic Web Documents
     • SWD
       • A document in a Semantic Web language that is online and accessible to web users and software agents
     • Two kinds of SWD
       • SWOs (Semantic Web Ontologies)
         • Correspond to T-Boxes
         • A significant proportion of an SWO's statements define new terms or extend the definitions of terms defined in other SWDs
       • SWDBs (Semantic Web Databases)
         • Correspond to A-Boxes
         • An SWDB does not define or extend a significant number of terms
         • It can introduce individuals and make assertions about them, or make assertions about individuals defined in other SWDs

  6. Swoogle Architecture
     • SWD discovery
       • Discovers potential SWDs throughout the Web
     • Metadata creation
       • Caches a snapshot of each SWD and generates objective metadata about it
     • Data analysis
       • Uses the cached SWDs and the created metadata to derive analytical reports
     • Interface
       • Provides data services to the Semantic Web community

  7. Finding SWDs
     • Google Crawler
       • Uses the Google Web Service
       • Starts with file type extensions
       • Appends constraints (keywords) to construct more specific queries, then combines their results
     • Focused Crawler
       • Crawls documents within a given website
       • Extension constraint
         • e.g. not ".jpg" or ".html"
       • Focus constraint
         • Only crawls URLs relative to the given base URL
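
The two Focused Crawler constraints can be sketched as a single URL filter. This is a minimal illustration, not Swoogle's actual crawler code: the function name `should_crawl`, the blacklist contents, and the reading of "relative to the base URL" as "same host and under the base directory" are all assumptions.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical blacklist; the slide names ".jpg" and ".html" as examples.
EXCLUDED_EXTENSIONS = (".jpg", ".html")


def should_crawl(base_url: str, link: str) -> bool:
    """Return True if `link` passes both the extension and focus constraints."""
    absolute = urljoin(base_url, link)
    parsed = urlparse(absolute)
    # Extension constraint: skip file types unlikely to be SWDs.
    if parsed.path.lower().endswith(EXCLUDED_EXTENSIONS):
        return False
    # Focus constraint (assumed reading): same host, under the base directory.
    base = urlparse(base_url)
    base_dir = base.path.rsplit("/", 1)[0] + "/"
    return parsed.netloc == base.netloc and parsed.path.startswith(base_dir)
```

With a base of `http://example.org/onto/index.rdf`, a link such as `foaf.rdf` would be kept, while `logo.jpg` or a link to another host would be skipped.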

  8. Finding SWDs
     • Web interface
       • Registered users can submit a URL of either a SWD or a web directory
     • JENA2-based Swoogle Crawler
       • Analyzes the content of a SWD and discovers new SWDs
       • e.g. uses URIrefs, owl:imports, rdfs:seeAlso, foaf:Person
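
To illustrate how a crawler might discover new SWDs from the content of one it has already fetched, the sketch below collects the targets of `owl:imports` and `rdfs:seeAlso` from an RDF/XML document. Swoogle's crawler is built on Jena2; this stand-in uses only Python's standard XML parser, handles only the `rdf:resource` attribute form, and the function name `discover_links` and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Properties whose rdf:resource target may point at another SWD.
LINK_PROPERTIES = {f"{{{OWL}}}imports", f"{{{RDFS}}}seeAlso"}


def discover_links(rdf_xml: str) -> list[str]:
    """Collect rdf:resource targets of owl:imports and rdfs:seeAlso elements."""
    root = ET.fromstring(rdf_xml)
    found = []
    for element in root.iter():
        if element.tag in LINK_PROPERTIES:
            target = element.get(f"{{{RDF}}}resource")
            if target and target not in found:
                found.append(target)
    return found


# Hypothetical sample document used only for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/a">
    <owl:imports rdf:resource="http://example.org/b.owl"/>
    <rdfs:seeAlso rdf:resource="http://example.org/c.rdf"/>
  </owl:Ontology>
</rdf:RDF>"""

print(discover_links(SAMPLE))
```

Each URL returned this way would be queued as a candidate SWD for the crawler to fetch next.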

  9. SWD Metadata – Basic Metadata
     • Language features
       • Properties describing the syntactic or semantic features of a SWD
       • Encoding: the syntactic encoding of a SWD (RDF/XML, N-TRIPLE, N3)
       • Language: the Semantic Web language used by a SWD (OWL, DAML, RDFS, RDF)
       • OWL Species: the language species of a SWD written in OWL (OWL-LITE, OWL-DL, OWL-FULL)
     • RDF statistics
       • Properties summarizing the node distribution of the RDF graph
       • Focus on how SWDs define new classes, properties, and individuals
       • SWDBs and SWOs are distinguished by the ontology ratio R(foo)
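
The transcript does not spell out how R(foo) is computed. The sketch below assumes it is the share of class and property definitions among all classes, properties, and individuals defined in the document, and it uses an illustrative cut-off of 0.8 to separate SWOs from SWDBs. Both the formula and the threshold are assumptions made for the example, as are the function names.

```python
def ontology_ratio(num_classes: int, num_properties: int, num_individuals: int) -> float:
    """Assumed definition: (C + P) / (C + P + I); 0.0 for an empty document."""
    total = num_classes + num_properties + num_individuals
    if total == 0:
        return 0.0
    return (num_classes + num_properties) / total


def classify_swd(ratio: float, threshold: float = 0.8) -> str:
    """Label a document SWO or SWDB; the 0.8 cut-off is an illustrative assumption."""
    return "SWO" if ratio >= threshold else "SWDB"
```

Under these assumptions, a document defining 8 classes, 1 property, and 1 individual would count as an SWO, while one that only introduces individuals would count as an SWDB.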

  10. SWD Metadata – Basic Metadata
     • Ontology annotation
       • Properties that describe a SWD as an ontology
       • label, i.e. rdfs:label
       • comment, i.e. rdfs:comment
       • versionInfo, i.e. owl:versionInfo and daml:versionInfo

  11. SWD Metadata – Relations among SWDs
     • TM/IN
       • Term reference relations between two SWDs
       • i.e. a SWD uses terms defined by some other SWDs
     • IM
       • An ontology imports another ontology
     • EX
       • An ontology extends another
       • i.e. ontology A defines class AC, which has the "rdfs:subClassOf" relation with class BC defined in ontology B
     • PV
       • An ontology is a prior version of another
     • CPV
       • An ontology is a prior version of, and is compatible with, another
     • IPV
       • An ontology is a prior version of, but is incompatible with, another

  12. Ranking SWDs
     • Random surfing model (PageRank)
       • Not appropriate for the Semantic Web
       • The semantics of links lead to a non-uniform probability of following a particular outgoing link
     • Rational random surfing model
       • Inter-SWD links are classified into four categories
         • imports(A,B), uses-term(A,B), extends(A,B), asserts(A,B)
       • The more terms in B referenced by A, the more likely a surfer will follow the link from A to B
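
One way to read the rational random surfing model is that each typed link contributes a weight, and the probability of following a link is its share of the source document's total outgoing weight. The sketch below follows that reading. The per-category weights in `LINK_WEIGHTS` are hypothetical (the slides give no values), and the function name is made up for the example.

```python
from collections import defaultdict

# Hypothetical per-category weights; the slides do not give actual values.
LINK_WEIGHTS = {"imports": 4.0, "extends": 3.0, "uses-term": 2.0, "asserts": 1.0}


def transition_probabilities(links):
    """Map (source, target, category, count) links to P(source -> target).

    `count` is how many times the relation occurs, e.g. how many terms
    of the target are referenced by the source.
    """
    weight = defaultdict(float)     # (source, target) -> summed link weight
    out_total = defaultdict(float)  # source -> total outgoing weight
    for source, target, category, count in links:
        w = LINK_WEIGHTS[category] * count
        weight[(source, target)] += w
        out_total[source] += w
    return {(s, t): w / out_total[s] for (s, t), w in weight.items()}
```

If A references three terms from B and one from C, the surfer leaving A would go to B with probability 0.75 and to C with probability 0.25, rather than 0.5 each as in the uniform model.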

  13. Ranking SWDs
     [Figure: example link graphs over documents A, B, C, and D, one for Google and one for Swoogle with weighted links]
     • Google: PR(A) = (1-d) + d(1/4 + 1/2 + 1/3)
     • Swoogle: rawPR(A) = (1-d) + d(0.4/(0.4+0.3+0.2+0.4) + 0.6/(0.6+0.1) + 0.5/(0.5+0.1+0.7))
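
The two formulas on this slide can be checked numerically. The sketch below evaluates one update step for A under both models, taking every neighbour's rank as 1 as the slide's formulas do. The damping factor d = 0.85 is an assumption, since the slide leaves d symbolic, and `rank_step` is a name invented for the example.

```python
def rank_step(in_links, d=0.85):
    """One PageRank-style update: (1 - d) + d * sum(share of each in-link).

    `in_links` lists, per linking document, (weight of its link to the
    target, all of its outgoing link weights). Neighbour ranks are taken
    as 1, as in the slide's example. d = 0.85 is an assumed damping factor.
    """
    return (1 - d) + d * sum(w / sum(outgoing) for w, outgoing in in_links)


# Uniform model: B, C, and D have 4, 2, and 3 equally likely out-links.
uniform = rank_step([(1, [1] * 4), (1, [1] * 2), (1, [1] * 3)])

# Weighted model: link weights read off the slide's formula.
weighted = rank_step([
    (0.4, [0.4, 0.3, 0.2, 0.4]),
    (0.6, [0.6, 0.1]),
    (0.5, [0.5, 0.1, 0.7]),
])

print(round(uniform, 4), round(weighted, 4))
```

With the assumed d, the weighted value comes out noticeably higher than the uniform one (roughly 1.47 against 1.07), because C and D put most of their outgoing weight on their links to A.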

  14. Ranking SWDs

  15. Indexing and Retrieval of SWDs
     • Using traditional IR techniques
       • Reasoning over large collections of documents can be expensive
       • IR techniques have the advantage of being faster, while taking a somewhat coarser view of the text
       • They include well-researched methods for ranking matches and computing similarity between documents
     • Using N-grams
       • Can result in a larger vocabulary
       • Inter-word relationships are preserved
       • Somewhat resistant to certain kinds of errors
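
A small example shows why N-gram indexing tolerates certain errors and variations. The sketch assumes character n-grams (the slide does not say whether characters or words are meant) and compares two strings by Jaccard similarity over their n-gram sets, which is a stand-in measure chosen for illustration rather than Swoogle's actual scoring.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of `text`, lowercased."""
    normalized = text.lower()
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Here "ontology" and "ontologies" share five of their nine distinct trigrams, so they still match strongly even though a word-level index would treat them as different terms.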

  16. Conclusions
     • Current web search engines
       • Do not work well with SWDs, as they are designed to work with natural languages and expect documents to contain unstructured text composed of words
     • Swoogle
       • A prototype crawler-based indexing and retrieval system for Semantic Web documents

  17. Evaluation and Discussion
     • Pros
       • Clear contribution on the method:
         • How to discover potential SWDs
         • How to rank SWDs
     • Cons
       • Poor explanation of the ranking algorithm
         • The reason for differentiating between SWOs and SWDBs
         • How the ranking formulas (which differ depending on the type of SWD) are derived
     • Discussion
       • How can a Semantic Web retrieval system handle conflicts between SWDs?
         • By ranking? By TF-IDF? Or by some other method?

  18. Current Status (@ 2005)
     • Referenced from
       • Li Ding et al., "Finding and Ranking Knowledge on the Semantic Web", Proceedings of the 4th International Semantic Web Conference, November 2005.
       • Tim Finin et al., "Swoogle: Searching for knowledge on the Semantic Web", AAAI 05 (intelligent systems demo), July 2005.
     • System architecture
       • Metadata creation -> Digest
         • Computes metadata for SWDs and Semantic Web terms (SWTs), and identifies relations among them

  19. Current Status (@ 2005)
     • Size
       • SWDs: 135K -> 368K
       • SWOs: 13.29% of SWDs -> 1% of SWDs
     • Ranking SWDs and SWTs
