
Helping people find content … preparing content to be found

Helping people find content … preparing content to be found. Enabling the Semantic Web. Joseph Busch. Outline: Why Semantics Matter; What is the Semantic Web; Semantic Content Management. Why Semantics Matter: When you own a Rembrandt you can spell his name any way you want.


Presentation Transcript


  1. Helping people find content … preparing content to be found • Enabling the Semantic Web • Joseph Busch

  2. Outline • Why Semantics Matter • What is the Semantic Web • Semantic Content Management

  3. Why Semantics Matter

  4. When you own a Rembrandt you can spell his name any way you want.

  5. But when you want to find a Rembrandt … you better spell his name correctly.

  6. Vocabulary resources can help find the right artist even if their name is typed incorrectly.

  7. Users cannot type in the complex queries needed to find all the relevant items... But this can be done automatically.
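
The automatic query building described on slides 6 and 7 can be sketched as a simple vocabulary lookup. This is a minimal illustration, not the presenter's system: the `ARTIST_VARIANTS` table, its entries, and the `expand_query` function are hypothetical names made up for this example.

```python
# Hypothetical vocabulary resource: preferred name -> known variant spellings.
# The entries are illustrative only.
ARTIST_VARIANTS = {
    "Rembrandt van Rijn": [
        "Rembrandt",
        "Rembrant",
        "Rembrandt Harmensz. van Rijn",
        "Rijn, Rembrandt van",
    ],
}


def expand_query(user_term, vocabulary=ARTIST_VARIANTS):
    """Return an OR query covering every known variant of the matched name.

    If the term is not in the vocabulary, it is returned as a single
    quoted phrase, unchanged apart from surrounding whitespace.
    """
    needle = user_term.strip().lower()
    for preferred, variants in vocabulary.items():
        names = [preferred] + variants
        if needle in (name.lower() for name in names):
            return " OR ".join(f'"{name}"' for name in names)
    return f'"{user_term.strip()}"'


query = expand_query("rembrant")
```

The user types one (possibly misspelled) name; the system issues the longer query on their behalf.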

  8. Complex queries are even more important when you search the entire web.

  9. So you find Rembrandt the Dutch guy...

  10. … And not Rembrandt the toothpaste.

  11. Search Failure • 19% Character errors (Young et al.) • 40% Vocabulary errors (Seaman) • 20% Index confusion • 21% Successful (Nielsen)

  12. Search Solution • Generate more consistent content to search on. • Correct user errors. • Map the language of users to the language of the target content.
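
Two of the points on slide 12, correcting user errors and mapping the users' language to the language of the target content, might look like the sketch below. It assumes a small hypothetical synonym table (`SYNONYMS`) and uses approximate string matching from Python's standard `difflib` module; the slide does not name a specific technique, so this is only one possible approach.

```python
import difflib

# Hypothetical mapping: term a user might type -> preferred index term.
SYNONYMS = {
    "car": "automobiles",
    "auto": "automobiles",
    "film": "motion pictures",
    "movie": "motion pictures",
}


def normalize(term, synonyms=SYNONYMS, cutoff=0.8):
    """Map a user's term to the preferred term, tolerating small typos.

    Returns None when nothing in the vocabulary is close enough.
    """
    key = term.strip().lower()
    if key in synonyms:
        return synonyms[key]
    # Fall back to the closest known term above the similarity cutoff.
    close = difflib.get_close_matches(key, list(synonyms), n=1, cutoff=cutoff)
    return synonyms[close[0]] if close else None
```

An exact synonym is mapped directly, a near miss such as "fillm" is corrected first, and an unknown word yields `None` so the caller can fall back to a plain text search.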

  13. Search Alternatives

  14. Solution for Search Alternatives • Predictable standardized structures, and • Consistent semantics to work on … so machines can understand it.

  15. What is the Semantic Web

  16. Berners-Lee’s Semantic Web • Formatting content so that machines can understand it. • Use XML/RDF: • Infinitely flexible markup language. • Process content in many more ways than simply for viewing it. • Problem: Mostly syntax … not semantics (in the human sense of meaning, i.e., language)

  17. XML is a Grail-like Object • XML is just a means for encoding information—an envelope standard. The real value is still in the information that you put in the envelope. • Filling XML placeholders such as <meta>, <subject>, and <maker> requires semantic information management.

  18. Soergel’s SemWeb Proposal • System of integrated access to data on concepts and terminology. • Bring together variety of sources that exist largely in separate worlds, including dictionaries, thesauri, classification schemes, etc. • Federated system with multiple collaborators. • Common interface to all concept & terminology knowledge bases on the Internet.

  19. The Real Semantic Web • Namespace for uniquely identifying a semantic scheme & each concept within each scheme. • Broad template or conceptual schema for holding all types of semantic information & specifying relationships among them. • Definitions of services for interacting with the System.

  20. Vocabulary Markup Language (VocML) • XML schema for the Semantic Web. • Broad template for structured representation of semantic schemes. • Dublin Core metadata. • Tags and syntax for uniquely identifying each concept. • Typed relationships (hierarchical, associative, etc.). • Typed notes. • See: Networked Knowledge Organization Systems, nkos.slis.kent.edu

  21. Dublin Core • Unique ID • Typed Relationships
      <?xml version="1.0"?>
      <!DOCTYPE VocML SYSTEM "VocML.dtd">
      <VocML version="1.1">
        <SrcVocab>
          <SVHeader>
            <dc:Title>DFSIC-1998</dc:Title>
            <dc:Source>Standard Industrial Classification (1987)</dc:Source>
            <dc:Creator>Interwoven</dc:Creator>
            <dc:Contributor>U.S. Department of Commerce</dc:Contributor>
            …
            <workNum UIDprefix="DFSIC-1998" DisplayTitle="Standard Industrial Classification" BriefDisplay="SIC"/>
          </SVHeader>
          <SVTerm UID="DFSIC-1998::0139" CCID="104:43">
            <label>Field Crops, except Cash Grains, not elsewhere classified</label>
            <definition>Establishments primarily engaged in the production of field crops, except cash grains, not elsewhere classified. This industry also includes establishments deriving 50 percent or more of their total value of sales of agricultural products from field crops, except cash grains (Industry Group 013), but less than 50 percent from products of any single industry.</definition>
            <cla>0139</cla>
            <typedRelation UREF="DFSIC-1998::013" UTYPE="Z39.19-1980::2" Name="BT"/>
            <typedRelation UREF="DFSIC-1998::013900" UTYPE="Z39.19-1980::3" Name="NT"/>
            …
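
To show how a machine could consume a record like the one on slide 21, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The fragment is a simplified, well-formed excerpt assumed for illustration (one term, no DOCTYPE, no Dublin Core header); the element and attribute names come from the slide, while `read_term` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed excerpt based on the slide (an assumption:
# the full VocML structure and DTD are not shown in the presentation).
VOCML_FRAGMENT = """
<SVTerm UID="DFSIC-1998::0139">
  <label>Field Crops, except Cash Grains, not elsewhere classified</label>
  <cla>0139</cla>
  <typedRelation UREF="DFSIC-1998::013" UTYPE="Z39.19-1980::2" Name="BT"/>
  <typedRelation UREF="DFSIC-1998::013900" UTYPE="Z39.19-1980::3" Name="NT"/>
</SVTerm>
"""


def read_term(xml_text):
    """Extract the unique ID, label, and typed relations of one term."""
    term = ET.fromstring(xml_text.strip())
    relations = {}
    for rel in term.findall("typedRelation"):
        # Group target IDs by relation name (BT = broader, NT = narrower).
        relations.setdefault(rel.get("Name"), []).append(rel.get("UREF"))
    return {
        "uid": term.get("UID"),
        "label": term.findtext("label"),
        "relations": relations,
    }


term = read_term(VOCML_FRAGMENT)
```

Because every concept carries a unique ID and every relationship is typed, the broader and narrower terms can be followed mechanically.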

  22. Implementing the Semantic Web

  23. The Holy Grail is ... • Accurate information automatically processed so that it can easily be found and used for applications. • A rich web of linked information, with markup allowing machines to route relevant information to the audiences that value it most.

  24. Metatagging • The hard work is mining content to extract key information: • Recognize the mentions of people, organizations, places, and things. • Infer the subject matter. • And putting it into formats with standard labels for effective exploitation.
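
The "recognize the mentions" step on slide 24 can be approximated with a lookup list (a gazetteer). The sketch below is deliberately naive and is not the product the presentation describes: `GAZETTEER`, its entries, and `tag_mentions` are hypothetical, and real systems would also handle ambiguity and inflected forms.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, standard label).
GAZETTEER = {
    "Interwoven": ("organization", "Interwoven"),
    "U.S. Department of Commerce": ("organization", "U.S. Department of Commerce"),
    "Rembrandt": ("person", "Rembrandt van Rijn"),
    "Amsterdam": ("place", "Amsterdam"),
}


def tag_mentions(text, gazetteer=GAZETTEER):
    """Return entity mentions found in text, ordered by position."""
    tags = []
    for surface, (entity_type, label) in gazetteer.items():
        # Match the surface form only when it is not part of a longer word.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            tags.append({"type": entity_type, "label": label, "start": match.start()})
    return sorted(tags, key=lambda tag: tag["start"])


tags = tag_mentions("Rembrandt worked in Amsterdam.")
```

Each mention is emitted with a standard label, which is the form that downstream search and routing can exploit.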

  25. Semantic Content Management (diagram) • Raw Content: unstructured text, untagged data • Tag It, using Vocabularies • Structured Content: metadata, XML/RDF • Exploit It, with User Queries: database search, text search • Relevant Information: found items, granular text

  26. Exploiting the Semantic Web • Route content to audience segments that value it most. • Link mentions of people, organizations, places, and things to other information related to those entities. • Populate portal directories. • Precisely search heterogeneous content items.
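
Routing content to the audience segments that value it, the first point on slide 26, reduces to matching an item's tags against each segment's interests once content is consistently tagged. The segment names and interest sets below are invented for illustration, as is the `route` function.

```python
# Hypothetical audience segments and the tags each one cares about.
SUBSCRIPTIONS = {
    "art-curators": {"Rembrandt van Rijn", "painting"},
    "agri-analysts": {"DFSIC-1998::0139"},
}


def route(item_tags, subscriptions=SUBSCRIPTIONS):
    """Return the segments whose interests overlap the item's tags."""
    tags = set(item_tags)
    return sorted(
        segment
        for segment, interests in subscriptions.items()
        if interests & tags
    )
```

An item tagged "Rembrandt van Rijn" goes to the art curators; an item about the toothpaste matches no segment and is routed nowhere.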

  27. Predictions

  28. Predictions • VocabularyML. • Semantic standard for unique identifiers (a namespace) for people, organizations, places, and things and the relationships among them. • See: nkos.slis.kent.edu • Technologies that enable the persistent naming of the information inside XML envelopes. • Generation of enormous value through interoperability among web applications.

  29. Joseph A. Busch • Content Intelligence Evangelist • ASIST President, 2001 • 415-778-3129 • fax 415-778-3131 • jbusch@interwoven.com • Moving business to the Web • www.interwoven.com
