Enabling Semantic Searching

Enabling Semantic Searching by Stefano Mazzocchi <stefano@apache.org>

What is the “Semantic Web”?
The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Tim Berners-Lee, James Hendler, Ora Lassila]

Didn’t get it? Let’s try again
• The web is the most successful publishing medium in the history of mankind.
• And it is still growing!
• The ‘semantic web’ dream is to build machines that help us consume that much information!

What do we need to build a semantic web?
• Data identification and retrieval
• Development of vocabularies
• Model constraints
• Assertions and proofs
[Eric Prud’hommeaux]

All that?
• Unfortunately, yes…
• …but each time we reach one of these steps, the resulting capabilities turn out to be surprising!

A case in point: Google!
• Google infers page importance from the global hyperlink topology of the web.
• This is possible because the semantics of hyperlinks are well defined, and therefore understandable by machines.
• The results of such a simple computation are astonishing.
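The best-known published form of this idea is the PageRank algorithm, which treats each hyperlink as a vote and scores a page by how likely a "random surfer" following links is to land on it. The following is a minimal power-iteration sketch in Python over a hypothetical three-page link graph; the damping factor 0.85 and the iteration count are conventional assumptions, and Google's production ranking combines many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Toy sketch: pages with no outgoing links ("dangling" pages) spread
    their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "teleport" share, then rank flows along links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: each key links to the pages in its list.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The only "semantics" the computation relies on is that a link from A to B means "A points its readers to B", which is exactly why machines can exploit it.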

Semantic Searching
The act of looking for data with the help of information inferred from some well-defined meaning of the data itself.

Warning: Problems Ahead!
• The Babel Problem
• The Chicken-Egg Problem
• The ROI Problem
• The Screen-Scrape Problem
• The Marginal Cost Problem

The Babel Problem (1)
• XML makes it possible to create new markup languages to fit every small need.
• In many situations, existing markups are complex and their learning curve is too steep… thus:
• We see an explosion of markup languages.

The Babel Problem (2)
• It is not obvious that this trend will ever saturate (especially with the advent of SOAP-based web services).
• Automatic translation between markups is not always algorithmically possible.

The Chicken-Egg Problem
• People won’t feel the need to publish information in more semantically meaningful languages until there is some use for them.
• And no use will emerge until there is enough such semantic information to work on.

The ROI Problem
• If writing ‘semantized’ information is more expensive than writing ‘non-semantized’ information…
• …and the return on these extra costs doesn’t pay them off, it simply won’t happen!

The Screen-Scrape Problem
• The great majority of web information is published using HTML, which has intrinsically poor semantic capabilities.
• If semantic information is extracted from HTML by ‘screen-scraping’, the costs will always exceed the benefits!

The Marginal Cost Problem
• If the marginal cost of adding semantic information while authoring some text grows linearly with the text size, the whole semantic web might never scale economically! (especially when combined with the ROI problem)

Enabling semantic searching
• We need a way to solve all the previous problems, or there will never be anything better than Google.

Enter the solutions!
• XML-based Web Publishing
• Standardized semantic HTTP variants
• Semantic-aware content editors

XML-based Web Publishing
• XML-based web publishing systems make it ‘economically worthwhile’ to create XML content.
• This partially solves the chicken-egg and ROI problems, since such systems give people immediate benefits (especially those with cross-media publishing needs).
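The cross-media benefit comes from writing content once in XML and deriving every output format from that single source. A minimal sketch in Python with the standard `xml.etree.ElementTree` module, assuming a hypothetical `article`/`title`/`para` vocabulary; a system such as Apache Cocoon would typically express these steps as XSLT stylesheets in a pipeline rather than as Python functions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source vocabulary: <article><title/><para/>...</article>
SOURCE = """<article>
  <title>Enabling Semantic Searching</title>
  <para>The web is still growing.</para>
  <para>Machines should help us consume it.</para>
</article>"""


def to_html(article):
    """Render the article as a minimal HTML page."""
    title = escape(article.findtext("title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in article.iter("para"))
    return f"<html><head><title>{title}</title></head><body><h1>{title}</h1>{paras}</body></html>"


def to_text(article):
    """Render the same article as plain text, e.g. for e-mail or a terminal."""
    title = article.findtext("title", default="")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in article.iter("para")]
    return "\n".join(lines)


article = ET.fromstring(SOURCE)
print(to_html(article))
print(to_text(article))
```

Adding a new medium means adding one more renderer, while the authored XML stays untouched; that immediate payoff is what makes the extra structure worth writing.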

HTTP Variants!
• HTTP/1.1 has the notion of ‘resource variants’, so it is possible to ask for a specific flavor of a given resource.
• If ‘semantic variants’ were standardized, this, together with XML-based publishing systems, might solve the Screen-Scrape problem.
• Apache Cocoon already implements such a concept with ‘resource views’.
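In HTTP/1.1, variant selection can be done through proactive content negotiation: the client lists acceptable media types in the `Accept` header, optionally weighted with `q` values, and the server returns the best variant it has. The Python sketch below assumes that a ‘semantic variant’ would be served as `application/rdf+xml`; that choice is only an illustration, since the point of the slide is that no such variant is standardized. It also ignores media-type parameters other than `q` and the other `Accept-*` headers, and it does not model Cocoon's views.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media, q))
    return ranges


def covers(media_range, media_type):
    """True if a range such as 'text/*' or '*/*' covers media_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    t_type, _, t_sub = media_type.partition("/")
    return r_type == t_type and r_sub in ("*", t_sub)


def choose_variant(accept_header, variants):
    """Return the available variant with the highest q-value, or None."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for variant in variants:
        matching = [(r.count("*"), q) for r, q in ranges if covers(r, variant)]
        if not matching:
            continue
        # The most specific matching range (fewest wildcards) decides the q-value.
        _, q = min(matching, key=lambda m: (m[0], -m[1]))
        if q > best_q:
            best, best_q = variant, q
    return best


# Hypothetical variants of one resource: the human page and a "semantic" one.
VARIANTS = ["text/html", "application/rdf+xml"]

print(choose_variant("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8", VARIANTS))
print(choose_variant("application/rdf+xml, text/html;q=0.5", VARIANTS))
```

A typical browser header selects `text/html`, while an agent that prefers RDF gets `application/rdf+xml` from the same URI; when two variants are equally acceptable, the first one the server lists wins.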

Semantic-aware Content Editors
• A simple and cost-effective solution for semantic-aware content editing is a conditio sine qua non for the production of semantically meaningful content.

Conclusions (1)
• Searching is the first use case for semantic web technologies, since it doesn’t require the whole infrastructure to be in place.
• Still, many problems must be faced, especially the socio-economic ones that academia is currently ignoring.

Conclusions (2)
• Without an incremental and economically feasible plan of adoption, the semantic web is unlikely to happen.
• The proposed plan of adoption uses XML publishing on the server side along with standardized semantic HTTP variants.

Conclusions (3)
• Still, the biggest problems to face are semantic-aware content editing and solving the Babel problem without requiring the creation of huge ontologies, which are very unlikely to be manageable for the entire web.

ToDo (1)
• Agree on a way to publish the different resource variants!
• Agree on markups/metadata or, at least, provide mechanical ways to translate one into another.
• Enforce the use of namespaced XML (despite the lack of validation support in DTDs and the lack of coherence between the infoset and the syntax).
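Namespaces make the ‘mechanical translation’ item concrete: once every element name is qualified by a namespace URI, a mapping between two vocabularies can be written as a plain lookup table without identically named elements from different vocabularies clashing. The Python sketch below maps a hypothetical in-house vocabulary (the URI `http://example.org/ns/mydoc` is an assumption) onto the Dublin Core element set. Real markups rarely map one element to one element, which is one reason such translation is not always algorithmically possible.

```python
import copy
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
MY = "http://example.org/ns/mydoc"       # hypothetical in-house vocabulary

# Hypothetical mapping keyed by fully qualified "{namespace}local" names,
# so identically named elements from different vocabularies cannot clash.
MAPPING = {
    f"{{{MY}}}heading": f"{{{DC}}}title",
    f"{{{MY}}}author": f"{{{DC}}}creator",
}

SOURCE = f"""<doc xmlns:my="{MY}">
  <my:heading>Enabling Semantic Searching</my:heading>
  <my:author>Stefano Mazzocchi</my:author>
  <my:mood>optimistic</my:mood>
</doc>"""


def translate(root, mapping):
    """Return a copy of the tree with every mapped element name replaced.

    Unmapped elements are left untouched: the translation only covers
    what the two vocabularies actually share.
    """
    result = copy.deepcopy(root)
    for element in result.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return result


ET.register_namespace("dc", DC)
ET.register_namespace("my", MY)
translated = translate(ET.fromstring(SOURCE), MAPPING)
print(ET.tostring(translated, encoding="unicode"))
```

The `my:mood` element survives unchanged because Dublin Core has no counterpart for it, a small example of the partial coverage the Babel problem predicts.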

ToDo (2)
• Think about semantic-aware editing (which is not only XML-aware, but also RDF-aware!)
• Research solutions that are less expressive than RDF but more practical and cost-effective, encoding semantic information into the schemas instead of their content (semantic-sheets? semantic relevance ratings?)
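One possible reading of the ‘semantic-sheets’ idea, offered only as an illustration and not as the author's design: semantic annotations are written once per schema, much like a stylesheet, and then applied to every document that uses that schema, so the marginal semantic cost per authored document stays close to zero. The Python sketch assumes a hypothetical unnamespaced `article` vocabulary and maps two of its elements to Dublin Core properties, emitting RDF-style (subject, predicate, object) tuples.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical "semantic sheet": written once for the whole vocabulary,
# not once per document. It states which elements carry which property.
SEMANTIC_SHEET = {
    "title": DC + "title",
    "author": DC + "creator",
}


def extract_triples(doc_uri, xml_text, sheet):
    """Yield (subject, predicate, object) tuples for elements the sheet knows."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        predicate = sheet.get(element.tag)
        if predicate and element.text and element.text.strip():
            yield (doc_uri, predicate, element.text.strip())


# The author writes ordinary structured XML; the sheet supplies the semantics.
doc = ("<article><title>Enabling Semantic Searching</title>"
       "<author>Stefano Mazzocchi</author><para>Plain prose.</para></article>")
for triple in extract_triples("http://example.org/talk", doc, SEMANTIC_SHEET):
    print(triple)
```

The `para` element yields nothing because the sheet says nothing about it; the expressiveness is far below RDF, but the authoring cost is paid once per schema rather than once per document.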

Thanks! Any questions?
