Finding and Ranking Knowledge on the Semantic Web. Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari University of Maryland, Baltimore County. http://creativecommons.org/licenses/by-nc-sa/2.0/
Finding and Ranking Knowledge on the Semantic Web. Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari, University of Maryland, Baltimore County. http://creativecommons.org/licenses/by-nc-sa/2.0/ This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
This talk • Motivation • Swoogle overview • Bots navigate the Semantic Web • Ranking Semantic Web content • Use cases and applications • Conclusions
But what about our agents? Both people and software agents need a Google for knowledge on the Semantic Web.
Swoogle Architecture (diagram) • SWD discovery: Web Crawler, Candidate URLs, SWD Reader (input: the Web) • Metadata creation: SWD Cache, SWD Metadata • Data analysis: IR analyzer, SWD analyzer • Interface: Web Server, Web Service, Agent Service • Swoogle 2 (4/05): 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals • Swoogle 3 (11/05): 700K SWDs, 135M triples, 7.7K SWOs
Demo1 Find “Time” Ontology. We can search for an ontology using a set of keywords. For example, “time, before, after” are basic concepts of a “Time” ontology.
Demo2(a) Digest “Time” Ontology (document view)
Demo2(b) Digest “Time” Ontology (term view). Terms shown: TimeZone, before, …, intAfter
Demo3 Find Term “Person” Not capitalized! URIref is case sensitive!
Demo4 Digest Term “Person” 167 different properties 562 different properties
Demo5(a) Swoogle Today
Demo5(b) Swoogle Statistics FOAF Trustix W3C Stanford
Swoogle’s Triple Store lets you shop and check out your triples into any of several reasoners
Summary • Swoogle (Mar 2004): automated SWD discovery; SWD metadata creation and search; ontology rank (rational surfer model); Swoogle watch; web interface • Swoogle2 (Sep 2004): ontology dictionary; Swoogle statistics; web service interface (WSDL); bag of URIref IR search; triple shopping cart • Swoogle3 (July 2005): better (re-)crawling strategies; better navigation models; index instance data; more metadata (ontology mapping and OWL-S services); better web service interfaces; IR component for string literals
The Semantic Web Onion (layers, outermost first) • Universal RDF Graph: the “Semantic Web” (about 10M documents) • RDF Document: physically hosting knowledge (about 100 triples per SWD on average) • Class-instance: triples modifying the same subject • Molecule: finest lossless set of triples • Triple: atomic knowledge block • Resource and Literal • Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
Semantic Web Navigation Model (diagram) • Nodes: the Web, SWD, SWO, SWT, RDF graph, Resource, literal • Entry points: Term Search and Document Search • Seven numbered groups of links (1 to 7), with labels including: sameNamespace, sameLocalname, extends, class-property bond; uses, populates, defines; isUsedBy, isPopulatedBy, officialOnto, isDefinedBy; rdfs:subClassOf; rdfs:seeAlso, rdfs:isDefinedBy, owl:imports, … • Navigating the HTML web is simple; there’s just one kind of link. The SW has more kinds of links and hence more navigation paths.
Semantic Web Navigation Model (same diagram, continued) • Relations in 1 and 3 and parts of 4 require a global view to discover
Rank has its privilege • Google introduced a new approach to ranking query results using a simple “popularity” metric. • It was a big improvement! • Swoogle also ranks its query results • When searching for an ontology, class or property, wouldn’t one want to see the most used ones first? • Ranking SW content requires different algorithms for different kinds of SW objects • For SWDs, SWTs, individuals, “assertions”, molecules, etc.
Google’s PageRank • A page’s rank is a function of how many links point to it and the rank of the pages hosting those links. • The “random surfer” model provides the intuition: (1) Jump to a random page (2) Select and follow a random link on the page, and repeat until ‘bored’ (3) If bored, go to (1) • Pages are ranked by the relative frequency with which they are visited.
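The random surfer intuition can be sketched as a short power iteration. This is a minimal illustration, not the code behind Google or Swoogle: the `damping` value of 0.85, the iteration count, and the toy graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate how often a random surfer visits each page.

    links maps each page to the list of pages it links to.
    damping is the probability of following a link rather than
    getting 'bored' and jumping to a random page (assumed value).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Mass from bored surfers who jump to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Follow one of the page's links uniformly at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No link to follow: treat it as a random jump.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web used only for illustration.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph, page C collects links from three pages and ends up with the highest visit frequency, while D, which nothing links to, only receives the random-jump share.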
Ranking Semantic Web Documents • Target: a pure SW dataset • Nodes: a collection of online SWDs (330K SWDs, 1.5% of them labeled as ontologies) • Links: in addition to hyperlinks, term-level relations are generalized into TM, EX, IM. • Rational surfer model (an extension of weighted PageRank) • Semantic content (term-level relations) is encoded into links • The rank of a node is spread iteratively via links • The weight/capacity of a link varies according to link semantics • Weight is propagated to imported ontologies • Evaluation • Method: compare how well OntoRank and PageRank promote ontologies, using the same pure SW dataset
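One way to picture the rational surfer is a PageRank variant in which the surfer prefers some link types over others. The sketch below is an assumption-laden illustration: the slide names the link types (TM, EX, IM) but gives no weights, so the numbers in `LINK_WEIGHT`, the damping value, and the document names are all hypothetical.

```python
# Hypothetical weights per link type; the talk does not state the real values.
LINK_WEIGHT = {"hyperlink": 1.0, "TM": 1.0, "EX": 2.0, "IM": 3.0}


def weighted_rank(links, damping=0.85, iterations=50):
    """Weighted PageRank over (source, target, link_type) tuples."""
    nodes = {s for s, _, _ in links} | {t for _, t, _ in links}
    n = len(nodes)
    # Total outgoing weight of each node, used to normalise its links.
    out_weight = {node: 0.0 for node in nodes}
    for source, _, kind in links:
        out_weight[source] += LINK_WEIGHT[kind]
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links hand their rank to everyone equally.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, target, kind in links:
            # A link carries rank in proportion to its semantic weight.
            fraction = LINK_WEIGHT[kind] / out_weight[source]
            new_rank[target] += damping * rank[source] * fraction
        rank = new_rank
    return rank


# Hypothetical document that uses terms from one ontology and imports another.
example_links = [("doc", "ontoA", "TM"), ("doc", "ontoB", "IM")]
example_rank = weighted_rank(example_links)
```

With these placeholder weights, the imported ontology `ontoB` receives three quarters of the rank that `doc` passes on, so it ranks above `ontoA`.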
An Example • http://www.w3.org/2000/01/rdf-schema: wPR = 300, OntoRank = 403 • http://xmlns.com/wordnet/1.6/: wPR = 3, OntoRank = 103 • http://xmlns.com/foaf/1.0/: wPR = 100, OntoRank = 100 • http://www.cs.umbc.edu/~finin/foaf.rdf: wPR = 0.2, OntoRank = 0.2 • The links between these documents are labeled TM (three links) and EX (one link)
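The numbers in this example are consistent with a simple rule: an ontology's OntoRank is its own wPR plus the wPR of every ontology that reaches it through links, directly or transitively. The sketch below reproduces the slide's values under that inferred rule. Two things are assumptions rather than statements from the talk: the direction of each link (read off the flattened diagram) and the treatment of foaf.rdf as instance data whose rank is not propagated. The short document names are labels for this example only.

```python
# wPR values from the slide, keyed by short labels for the four documents.
wpr = {"rdfs": 300, "wordnet": 3, "foaf": 100, "finin_foaf_rdf": 0.2}

# Assumed link directions (source, target): three TM links and one EX link.
links = [
    ("finin_foaf_rdf", "foaf"),  # TM
    ("foaf", "rdfs"),            # TM
    ("wordnet", "rdfs"),         # TM
    ("foaf", "wordnet"),         # EX
]

# Assumption: only these three documents are ontologies.
ontologies = {"rdfs", "wordnet", "foaf"}


def ontorank(wpr, links, ontologies):
    """Add to each node the wPR of all ontologies that transitively link to it."""
    incoming = {}
    for source, target in links:
        incoming.setdefault(target, set()).add(source)
    result = {}
    for node in wpr:
        reached = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for source in incoming.get(current, ()):
                # Only ontologies pass their rank along.
                if source in ontologies and source not in reached and source != node:
                    reached.add(source)
                    stack.append(source)
        result[node] = wpr[node] + sum(wpr[s] for s in reached)
    return result


scores = ontorank(wpr, links, ontologies)
```

Under these assumptions rdfs collects 300 + 100 + 3, wordnet collects 3 + 100, and the other two documents keep their wPR, matching the OntoRank values on the slide.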
Ontology Dictionary • Motivation • One ontology does not always provide all the needed vocabulary • Many scenarios require assembling terms from multiple ontologies • DIY ontology engineering • (1) Search for an appropriate class C • (2) Search for popular properties used to modify instances of C • (3) Go back to step 1 if more classes are needed
Ranking Semantic Web Terms • Pr(Term|Doc) can be measured by the normalized value of the product of the term’s • Popularity: how many SWDs use the term • Frequency: how many times the term is used in the SWD • SWDs are accessed non-uniformly, as reflected by OntoRank • TermRank estimates a term’s importance as ∑ Pr(Term|Doc) * OntoRank(Doc), summed over the SWDs • Evaluation • Compare TermRank with term popularity for the 10 highest-rated terms and give an analytical evaluation.
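The TermRank formula can be sketched in a few lines. The slide does not say how the popularity-frequency product is normalized, so this sketch assumes it is normalized within each document so that the Pr(Term|Doc) values of a document sum to 1. The document names, term counts, and OntoRank values are made up for illustration.

```python
def term_rank(term_counts, onto_rank):
    """Compute sum over docs of Pr(term|doc) * OntoRank(doc).

    term_counts[doc][term] is how many times the term is used in doc.
    onto_rank[doc] is the document's OntoRank.
    """
    # Popularity: number of documents that use the term at all.
    popularity = {}
    for counts in term_counts.values():
        for term in counts:
            popularity[term] = popularity.get(term, 0) + 1
    scores = {}
    for doc, counts in term_counts.items():
        # Unnormalized weight: popularity times in-document frequency.
        raw = {term: popularity[term] * count for term, count in counts.items()}
        total = sum(raw.values())
        if total == 0:
            continue  # a document with no term usage contributes nothing
        for term, value in raw.items():
            # Assumed normalization: per-document, so weights sum to 1.
            scores[term] = scores.get(term, 0.0) + (value / total) * onto_rank[doc]
    return scores


# Hypothetical data: two documents and their OntoRank values.
term_counts = {
    "d1": {"foaf:Person": 2, "foaf:name": 2},
    "d2": {"foaf:name": 1},
}
onto_rank = {"d1": 10.0, "d2": 5.0}
term_scores = term_rank(term_counts, onto_rank)
```

With per-document normalization, each document distributes exactly its OntoRank over the terms it uses, so the term scores here add up to 15, and foaf:name outranks foaf:Person because it is both more popular and used in both documents.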
Class-Property Bonds (example: foaf:Person, with triples spread over SWD1, SWD2 and SWD3) • Class definition • rdf:type: owl:Class • rdfs:subClassOf: foaf:Agent • rdfs:label: “Person” • rdfs:comment: “a human being” • Class-property bonds introduced by the ontology (through rdfs:domain) • foaf:mbox • foaf:name • Class-property bonds introduced by instances • foaf:name • dc:title • Instance data: a resource with rdf:type foaf:Person, foaf:name “Tim Finin” and dc:title “Tim’s FOAF File”
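The two kinds of class-property bond can be collected from plain triples, as in the sketch below. It is a simplified reading of the slide, not Swoogle's implementation: triples are Python tuples of prefixed names, the blank node `_:tim` is a hypothetical subject, and the split of triples into an ontology list and an instance list is an assumption.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def bonds_from_ontology(triples):
    """A property whose rdfs:domain is class C is bonded to C."""
    return {(obj, subj) for subj, pred, obj in triples if pred == RDFS_DOMAIN}


def bonds_from_instances(triples):
    """A property used on an instance of class C is bonded to C."""
    types = {}
    for subj, pred, obj in triples:
        if pred == RDF_TYPE:
            types.setdefault(subj, set()).add(obj)
    return {
        (cls, pred)
        for subj, pred, _ in triples
        if pred != RDF_TYPE
        for cls in types.get(subj, ())
    }


# Triples modelled on the slide's foaf:Person example.
ontology_triples = [
    ("foaf:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
]
instance_triples = [
    ("_:tim", "rdf:type", "foaf:Person"),  # hypothetical blank node
    ("_:tim", "foaf:name", "Tim Finin"),
    ("_:tim", "dc:title", "Tim's FOAF File"),
]

ontology_bonds = bonds_from_ontology(ontology_triples)
instance_bonds = bonds_from_instances(instance_triples)
```

The ontology yields bonds from foaf:Person to foaf:mbox and foaf:name, while the instance data yields bonds to foaf:name and dc:title. The dc:title bond shows how instance usage can reveal properties the ontology never declared for the class.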
Applications and use cases • Supporting Semantic Web developers, e.g., • Ontology designers • Vocabulary discovery • Who’s using my ontologies or data? • Etc. • Searching specialized collections, e.g., • Proofs in Inference Web • Text Meaning Representations of news stories in SemNews • Supporting SW tools, e.g., • Discovering mappings between ontologies
Will it Scale? How? Here’s a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle’s crawling. We think Swoogle’s centralized approach can be made to work for the next few years, if not longer.
How much reasoning? • SwoogleN (N<=3) does limited reasoning • It’s expensive • It’s not clear how much should be done • More reasoning would benefit many use cases • e.g., type hierarchy • Recognizing specialized metadata • e.g., that ontology A maps some terms from B to C
Conclusion • The web will contain the world’s knowledge in forms accessible to people and computers • We need better ways to discover, index, search and reason over SW knowledge • SW search engines address different tasks than HTML search engines • So they require different techniques and APIs • Swoogle-like systems can help create consensus ontologies and foster best practices
For more information http://ebiquity.umbc.edu/ Annotated in OWL