
Enabling Semantic Searching

Enabling Semantic Searching, by Stefano Mazzocchi <stefano@apache.org>. What is the “Semantic Web”? The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.



  1. Enabling Semantic Searching by Stefano Mazzocchi <stefano@apache.org>

  2. What is the “Semantic Web”? The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Tim Berners-Lee, James Hendler, Ora Lassila]

  3. Didn’t get it? Let’s try again • The web is the most successful publishing medium in the history of mankind. • And it is still growing! • The ‘semantic web’ dream is to have machines that help us consume that much information!

  4. What do we need to build a semantic web? • Data identification and retrieval • Development of vocabularies • Model constraints • Assertion and proofs [Eric Prud’hommeaux]

  5. All that? • Unfortunately, yes… • …but each time we reach one of these steps, the resulting capabilities turn out to be surprising!

  6. One example for all: Google! • Google infers page importance from the global web hyperlink topology. • This is possible because the semantics of hyperlinks are well defined, and thus understandable by machines. • The results of such a simple computation are astonishing.
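To make "inferring importance from hyperlink topology" concrete, here is a minimal power-iteration sketch in the spirit of the published PageRank idea. It is not Google's production algorithm: the toy graph, the page names, the damping factor of 0.85, and the function name `rank_pages` are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict mapping page -> list of linked pages.

    Hypothetical helper for illustration only; damping and iteration count
    are assumed values, not taken from the presentation.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy web (assumed data): "home" is linked from every other page.
toy_web = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = rank_pages(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

The only input is the link structure itself: because a hyperlink has one well-defined meaning ("this page points to that page"), a machine can process it without understanding the page text.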

  7. Semantic Searching The act of looking for data with the help of information inferred from some well-defined meaning of the data itself.

  8. Warning: Problems Ahead! • The Babel Problem • The Chicken-Egg Problem • The ROI Problem • The Screen-Scrape Problem • The Marginal Cost Problem

  9. The Babel Problem (1) • XML makes it possible to create new markup languages to fit each little need. • In many situations, existing markups are complex and their learning curve is too steep… thus: • We see an explosion of markup languages

  10. The Babel Problem (2) • It is not obvious that this trend will ever reach saturation (especially with the advent of SOAP-based web services). • Automatic translation between markups is not always algorithmically possible.

  11. The Chicken-Egg Problem • People won’t feel the need to publish information in more semantically meaningful languages until there is some use for it. • And no use will emerge until there is enough such semantic information to work on.

  12. The ROI Problem • If writing ‘semantized’ information is more expensive than writing ‘non-semantized’ information… • … and the return on these extra costs doesn’t pay them off, it simply won’t happen!

  13. The Screen-Scrape Problem • The great majority of web information is published using HTML, which has intrinsically poor semantic capabilities. • If the extraction of semantic information from HTML is done using ‘screen-scraping’, the costs will always exceed the benefits!

  14. The Marginal Cost Problem • If the marginal cost of adding semantic information while authoring some text grows linearly with the text size, the whole semantic web might never scale economically! (especially together with the ROI problem)

  15. Enabling semantic searching • We need a way to solve all the previous problems, or there will never be anything better than Google.

  16. Enter the solutions! • XML-based Web Publishing • Standardized semantic HTTP variants • Semantic-aware content editors

  17. XML-based Web Publishing • XML-based web publishing systems make it ‘economically worthwhile’ to create XML content. • This partially solves the chicken-egg and the ROI problems, since such systems give people immediate benefits (especially those with cross-media publishing needs).

  18. HTTP Variants! • HTTP/1.1 has the notion of ‘resource variants’. So it is possible to ask for a specific flavor of a given resource. • If ‘semantic variants’ were standardized, this might solve, together with XML-based publishing systems, the Screen-Scrape problem. • Apache Cocoon already implements such a concept with ‘resource views’.
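As a rough sketch of how a server might pick a ‘semantic variant’, the code below selects among the variants of one resource based on the request's `Accept` header. Using `application/rdf+xml` as the semantic flavor, the `VARIANTS` table, and the function names are assumptions for illustration; the matching is deliberately simplified compared with full HTTP/1.1 content negotiation (it ignores specificity precedence between ranges), and it is not how Apache Cocoon implements resource views.

```python
# Hypothetical variants of a single resource, keyed by media type.
VARIANTS = {
    "text/html": "<html><body><h1>Enabling Semantic Searching</h1></body></html>",
    "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",  # assumed 'semantic' flavor
}


def parse_accept(header):
    """Split an Accept header into (media_range, quality) pairs."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        quality = 1.0  # q defaults to 1 when not given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences.append((media_range, quality))
    return preferences


def matches(media_range, media_type):
    """True if a media range such as 'text/*' or '*/*' covers media_type."""
    if media_range in ("*/*", media_type):
        return True
    main_type, _, sub_type = media_range.partition("/")
    return sub_type == "*" and media_type.startswith(main_type + "/")


def choose_variant(accept_header, variants):
    """Return the media type of the best available variant, or None."""
    best_type, best_quality = None, 0.0
    for media_range, quality in parse_accept(accept_header):
        for candidate in variants:
            if matches(media_range, candidate) and quality > best_quality:
                best_type, best_quality = candidate, quality
    return best_type


# A crawler that prefers the semantic flavor but tolerates HTML:
print(choose_variant("application/rdf+xml, text/html;q=0.5", VARIANTS))
# A plain browser:
print(choose_variant("text/html", VARIANTS))
```

With something like this in place, a search engine would not need to screen-scrape the HTML flavor: it could ask for the semantic one directly at the same URL.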

  19. Semantic-aware Content Editors • A simple and cost-effective solution for semantic-aware content editing is a conditio sine qua non for the production of semantically-meaningful content.

  20. Conclusions (1) • Searching is the first usage scenario for semantic web technologies, since it doesn’t require all the infrastructure to be present. • Still, many problems must be faced, especially the socio-economic ones that academia is currently ignoring.

  21. Conclusions (2) • Without an incremental and economically feasible plan of adoption, the semantic web is unlikely to happen. • The proposed plan of adoption uses XML publishing on the server side along with standardized semantic HTTP variants.

  22. Conclusions (3) • Still, the biggest problems to face are semantically-aware content editing and solving the Babel problem without requiring the creation of huge ontologies that are very unlikely to be manageable for the entire web.

  23. ToDo (1) • Agree on a way to publish the different resource variants! • Agree on markups/metadata or, at least, provide mechanical ways to translate one into another. • Enforce the use of namespaced XML (despite the lack of validation support in DTDs and the lack of coherence between the infoset and the syntax).
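A minimal sketch of what a "mechanical" translation between two namespaced markups could look like, using Python's standard `xml.etree.ElementTree`. The source namespace `http://example.org/ns/article`, its `headline` and `writer` elements, the `MAPPING` table, and the `metadata` wrapper element are hypothetical; the target uses the Dublin Core element namespace. It only covers the easy case of a one-to-one element mapping, which, as noted earlier, is not always available.

```python
import xml.etree.ElementTree as ET

SRC_NS = "http://example.org/ns/article"    # hypothetical source vocabulary
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements

# One-to-one mapping between qualified names (assumed for illustration).
MAPPING = {
    f"{{{SRC_NS}}}headline": f"{{{DC_NS}}}title",
    f"{{{SRC_NS}}}writer": f"{{{DC_NS}}}creator",
}


def translate(xml_text):
    """Copy every mapped element into a new tree under its target name."""
    source = ET.fromstring(xml_text)
    target = ET.Element("metadata")
    for element in source.iter():
        mapped_tag = MAPPING.get(element.tag)
        if mapped_tag is not None:
            ET.SubElement(target, mapped_tag).text = (element.text or "").strip()
    return target


SAMPLE = """<a:article xmlns:a="http://example.org/ns/article"
                       xmlns:x="http://example.org/ns/other">
  <a:headline>Enabling Semantic Searching</a:headline>
  <a:writer>Stefano Mazzocchi</a:writer>
  <x:headline>A headline from an unrelated vocabulary</x:headline>
</a:article>"""

result = translate(SAMPLE)
ET.register_namespace("dc", DC_NS)
print(ET.tostring(result, encoding="unicode"))
```

The namespaces are what make this safe: `x:headline` shares a local name with `a:headline` but is left alone, because the mapping is keyed on the fully qualified name.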

  24. ToDo (2) • Think about semantic-aware editing (which is not only XML-aware, but also RDF-aware!) • Research solutions that are less expressive than RDF but more practical and cost-effective, encoding semantic information into the schemas instead of their content (semantic-sheets? semantic relevance ratings?)

  25. Thanks! Any questions?
