Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University

Semantic Annotation and Hyperlinking for Associative Digital Memories Vision, Methods, Applications. Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University. The Vision of the Semantic Web.

Presentation Transcript


  1. Semantic Annotation and Hyperlinking for Associative Digital Memories: Vision, Methods, Applications • Hans Uszkoreit • German Research Center for Artificial Intelligence • and Saarland University

  2. The Vision of the Semantic Web • A new development in web technology is aimed at structuring some of the rich knowledge contained in unstructured data. • The envisaged result will be a growing layer of formalized knowledge above and associated with the wealth of unstructured data. • A multitude of ontologies will provide the conceptual texture for annotating rich unstructured content. • The result will be a semantically structured densely associated web of knowledge.

  3. Structure of the Semantic Web • The well-known layer cake of the Semantic Web proposed by Tim Berners-Lee employs... • XML for markup, • relational ontologies as the basis for describing information resources, • RDF coded in XML as the language for such semantic descriptions, • a logic language such as OWL coded in RDF as the format for further logical descriptions such as rules and constraints.
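
To make "RDF coded in XML" concrete, here is a minimal sketch that serializes one semantic description with Python's standard library. The resource URI `http://example.org/talk` is a hypothetical placeholder, and the choice of the Dublin Core `creator` property is an assumption made for illustration (Dublin Core is mentioned later in the talk); the slide itself does not prescribe a vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF and of the Dublin Core element set
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(resource, properties):
    """Serialize Dublin Core properties of one resource as RDF/XML text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource}
    )
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource URI; the creator is the author of this talk
XML_TEXT = describe("http://example.org/talk", {"creator": "Hans Uszkoreit"})
print(XML_TEXT)
```

Each `rdf:Description` element groups the statements about one resource, so the layer above (ontologies, OWL) can refer to the same URIs.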

  4. Semantic Web and Language Technology • I will first point out five central issues for language technology resulting from the vision of the Semantic Web. I will then briefly argue that there are no feasible models for realizing the Semantic Web through creation or evolution. Next, I will argue that there is a less ambitious stage of a semantically enriched web that can be realized gradually. This vision is built on the notion of associative digital memories lying in between digital repositories and digital knowledge. Then I will describe the language and web technologies needed for realizing such digital memories.

  5. Semantic Web and Language Technology 1 • The employment of language technology for the construction of useful ontologies: • One of the shortcomings of hand-crafted AI ontologies has been their artificial nature. Useful ontologies rarely meet the high aesthetic standards of philosophers or domain-specialized theoreticians. Can data-oriented language technology facilitate the detection of useful ontologies that reflect the needs and daily tasks of their users?

  6. Semantic Web and Language Technology 2 • The exploitation of Semantic Web ontologies for LT applications such as information extraction: • Domain modelling is a serious bottleneck for many language technology applications. Can the Semantic Web movement help us by providing well-designed ontologies for a multitude of knowledge domains?

  7. Semantic Web and Language Technology 3 The challenge of (partially) automating the detection and annotation of concepts: • One of the major shortcomings of the original Semantic Web vision is its reliance on extensive hand annotation of large volumes of digital resources. As we know from daily experience, content developers (authors) do not even exploit the modest means for encoding meta-information that is provided by HTML. They do not have the time and patience to find and insert the most useful hyperlinks. How can one expect that the web will become semantified by human annotation?

  8. Semantic Web and Language Technology 4 The utilization of the Semantic Web as a resource for machine learning in NLP: • Supervised learning from hand-annotated texts plays a major role in language technology research and development. Will the Semantic Web movement create large volumes of annotated texts? Can these texts be used for machine learning techniques that improve topic detection, information extraction, question answering and other language technologies? Can systems for automatic annotation be trained in a bootstrapping fashion?

  9. Is the vision realistic? Authors make little use of the available means of annotation/markup such as • hyperlinks • metainformation The enrichment of the available volumes of digital information is a huge task.

  10. Semantic Web and Language Technology 5 The relationship between the Semantic Web and multilinguality: • The planned dense semantic markup will facilitate cross-lingual navigation and information retrieval. Will the semantic web really contribute to overcoming language barriers by making information better accessible across languages? Will contents in all languages be annotated and crosslinked at the same time and in comparable proportions? What is the role of language technology in this process? Will the Semantic Web help to reduce the knowledge gap among or will this gap be widened?

  11. outline • the concept of digital memory • automatic semantic hyperlinking • personal digital memories • collective digital memories • conclusions

  12. more than metaphors? • digital libraries • digital archives • digital knowledge • digital memories

  13. (associative) memory • stored information • associatively interconnected • immediately accessible by association • grounded in experience • Special form of memory: episodic memory

  14. knowledge • stored information • strongly semantically interconnected • immediately accessible • suited for inferencing • grounded in more basic knowledge and perception

  15. associations • neighborhood in a high-dimensional space • accessibility paths • connections in a graph
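
The three views of association on this slide can be related in a few lines. The following is only a sketch under stated assumptions, not the method of the talk: items are represented by made-up feature vectors, "neighborhood" is measured by cosine similarity, and linking each item to its nearest neighbor yields a small graph of access paths. All item names and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbors(items, key, k=1):
    """Return the names of the k items most similar to items[key]."""
    scored = [
        (cosine(items[key], vec), name)
        for name, vec in items.items()
        if name != key
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical items in a three-dimensional feature space
ITEMS = {
    "email_from_anna": [1.0, 0.0, 1.0],
    "photo_of_anna": [1.0, 0.0, 0.5],
    "tax_report": [0.0, 1.0, 0.0],
}

# Graph view: every item points to its nearest neighbor
GRAPH = {name: neighbors(ITEMS, name) for name in ITEMS}
```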

  16. hyperlinks • the concept behind the success of the Internet • hypertext: associatively interconnected text • hypermedia: associatively interconnected media representations of information • an association is more than a reference: it is an access mechanism

  17. Company Info Homepage Other News Products Indicators Contact Experts Contacts Accounts THE ONE-CLICK APPROACH • New wireless voice technology introduced • Posted at 5:09 PM PT, Feb 8, 1999 • By Stephen Lawson, InfoWorld Electric • NTT Labs on Monday brought Dick Tracy into the enterprise, introducing a wireless voice and data system that can use a wrist radio at the Demo 99 conference. • AirWave technology, demonstrated for the first time in the United States at this week's conference in Indian Wells, Calif., is based on a wireless PBX. Small, handheld phones -- and a wrist radio that looks like an oversized watch -- can be used to make voice calls and exchange data around a building or campus. The handheld phones can be switched to a public cellular mode to become conventional cell phones. • Company representatives touted the system as offering higher voice quality than a typical PBX. AirWave is based on NTT's Personal Handyphone System, which is currently deployed by more than 600 users in Japan, according to the company. • Modems built into both devices allow users to plug in a notebook or portable device for dial-up data connections as fast as 64Kbps. Users can exchange files or e-mail, or access a LAN or the Internet. There is no airtime charge for AirWave communications in the building or campus. • AirWave systems are scheduled to be available through distribution partners by the end of this year, priced as low as $400 per user. • NTT Labs, the research and development arm of NTT Corp., in Tokyo, can be reached at www.nttlabs.com.

  18. Language Technology • recognition of domain-relevant named entities with statistical and rule-based methods • tolerance with respect to morphological and syntactic variation • recognition of synonyms • exploitation of thesauri and ontologies with conceptual relations • recognition of syntactic functions and thematic roles for appropriate anchor specification • annotation of documents with hyperlink designators
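
As a rough illustration of the first and last bullets, the sketch below marks named entities with a hand-made gazetteer and returns the spans that could later carry hyperlink designators. It is a deliberately simple rule-based stand-in, not the system described in the talk: the `GAZETTEER` table and the `annotate` function are hypothetical, and the only variation it tolerates is letter case. Statistical recognition, synonym handling and syntactic analysis are left out.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> (canonical name, type)
GAZETTEER = {
    "ntt labs": ("NTT Labs", "organization"),
    "ntt corp.": ("NTT Corp.", "organization"),
    "airwave": ("AirWave", "product"),
    "tokyo": ("Tokyo", "location"),
}


def annotate(text):
    """Return sorted (start, end, canonical name, type) tuples for all matches."""
    hits = []
    # Try longer surface forms first so they win over embedded shorter ones
    for variant in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(re.escape(variant), text, flags=re.IGNORECASE):
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end, _, _ in hits
            )
            if overlaps:
                continue
            canonical, entity_type = GAZETTEER[variant]
            hits.append((match.start(), match.end(), canonical, entity_type))
    return sorted(hits)


SENTENCE = "Airwave is based on technology from NTT Labs in Tokyo."
ANNOTATIONS = annotate(SENTENCE)
```

Each returned span is a candidate anchor; a later step would attach the typed links for the canonical entity to it.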

  19. Web Technology • Hyperlinks need to be: • relational • typed • external • possibly multidirectional

  20. functional hyperlinks • today's hyperlinks are • functional • unidirectional • untyped

  21. relational hyperlinks • Relational Hyperlink {person, homepage • person, email-address} • Relational Labelled Hyperlink • {person, „homepage“, homepage • person, „email“, email-address} • Relational Typed Labelled Hyperlink • person: {person, „homepage“, homepage • person, „email“, email-address}
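
One possible way to represent the three variants above as data is sketched here. The class names `Link` and `TypedLinkSet`, the person "Jane Doe" and both targets are hypothetical and only serve the illustration. A link is relational because one anchor can have several targets, labelled because each target carries a relation name, and typed because the whole set is declared for anchors of one type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    source: str  # the anchor entity, e.g. a person
    label: str   # the relation name, e.g. "homepage"
    target: str  # the link target


@dataclass
class TypedLinkSet:
    anchor_type: str                 # e.g. "person"
    links: list = field(default_factory=list)

    def targets(self, source, label=None):
        """All targets of a source, optionally restricted to one label."""
        return [
            link.target
            for link in self.links
            if link.source == source and (label is None or link.label == label)
        ]


# Hypothetical example data
PERSON_LINKS = TypedLinkSet(
    anchor_type="person",
    links=[
        Link("Jane Doe", "homepage", "https://example.org/~jdoe"),
        Link("Jane Doe", "email", "jdoe@example.org"),
    ],
)
```

In contrast to a functional hyperlink, `PERSON_LINKS.targets("Jane Doe")` yields a list of targets rather than a single one.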

  22. Relational Hyperlinks as Types

  23. Link Ontologies • Link Ontologies • and link DBs

  24. Customized Ontologies • Ontologies can be customized by • Extension • Expansion • Overwriting • Merging

  25. Recursion • A type can have an attribute a whose value is of the same type • person := [ name: [ title: string, first_name: string, other_given_names: string, last_name: string, aka.: string, ... ], father: person, ... ]

  26. Recursion • The embedded type can be expanded: • person := [ name: [ title: string, first_name: string, other_given_names: string, last_name: string, aka.: string, ... ], father: [ name: name ..., father: person ... ] ]

  27. Multiple Inheritance location building palace

  28. Equality • person := [ name: [ title: string, first_name: string, other_given_names: string, last_name: #1 string, aka.: string, ... ], father: [ name: [ last_name: #1 string ... ], father: person ... ] ]
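
Recursion and equality can be sketched together in Python. The assumptions are labelled here and in the comments: the classes `Cell`, `Name` and `Person` and the sample names are hypothetical, and a small mutable `Cell` object stands in for the shared index on last_name, because sharing one value token needs an object with identity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """A mutable holder so that two attributes can share one value token."""
    value: str


@dataclass
class Name:
    first_name: str
    last_name: Cell


@dataclass
class Person:
    name: Name
    father: Optional["Person"] = None  # recursion: the value is again a Person


# Hypothetical example: father and child share the same last_name cell
shared_last_name = Cell("Doe")
father = Person(Name("John", shared_last_name))
child = Person(Name("Jane", shared_last_name), father=father)

# Changing the shared cell is visible through both paths
shared_last_name.value = "Roe"
```

Equality in this sense is stronger than having equal strings: both paths lead to the same object, so a correction made once holds everywhere.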

  29. Extension • New attributes are added • examples: • new attribute: restored • new attribute for: citation in <Coleman et al.>

  30. Expansion • An atomic type is expanded into an AVM • location: Berlin is expanded into an address • technique: oil_on_canvas • is expanded into canvas: paints: layers: etc.

  31. Overwriting • The value of attributes can be overwritten • for corrections • alternative pointers to information sources • alternative representations

  32. Merging • The attributes of concepts from two ontologies can be merged • ontologies from different disciplines or from a discipline and a metadata initiative • equality can be employed to state identity between values • example: the value of "creator" in the Dublin Core can be set equal to "author" in the BibTeX bibliography format
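
The four customization operations (extension, expansion, overwriting, merging) can be sketched on plain dictionaries standing in for attribute-value structures. This is an illustration under assumptions, not the formalism of the talk: the function names, the sample painting record and all values in it are hypothetical, and the `equal` parameter of `merge` is an invented way to state that two attribute names denote the same value.

```python
import copy


def extend(concept, attribute, value):
    """Extension: add a new attribute without touching existing ones."""
    if attribute in concept:
        raise KeyError(f"{attribute!r} already exists; use overwrite()")
    result = copy.deepcopy(concept)
    result[attribute] = value
    return result


def expand(concept, attribute, structure):
    """Expansion: replace an atomic value by an attribute-value structure."""
    if isinstance(concept[attribute], dict):
        raise TypeError(f"{attribute!r} is already a complex value")
    result = copy.deepcopy(concept)
    result[attribute] = structure
    return result


def overwrite(concept, attribute, value):
    """Overwriting: replace the value of an existing attribute."""
    if attribute not in concept:
        raise KeyError(attribute)
    result = copy.deepcopy(concept)
    result[attribute] = value
    return result


def merge(first, second, equal=()):
    """Merging: union of attributes; equal holds (name_in_first, name_in_second) pairs."""
    renamed = {name_b: name_a for name_a, name_b in equal}
    result = copy.deepcopy(first)
    for attribute, value in second.items():
        result.setdefault(renamed.get(attribute, attribute), copy.deepcopy(value))
    return result


# Hypothetical record and operations
PAINTING = {"location": "Berlin", "technique": "oil_on_canvas"}
extended = extend(PAINTING, "restored", True)
expanded = expand(PAINTING, "location", {"city": "Berlin"})
corrected = overwrite(PAINTING, "technique", "tempera")
merged = merge(
    {"creator": "A. Painter"},
    {"author": "A. Painter", "year": "1999"},
    equal=[("creator", "author")],
)
```

Every operation returns a new structure and leaves its input unchanged, which keeps the shared ontology intact while users customize their own copies.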

  33. Problem: Ambiguity • Im Jahr 1942 wurde von Essen in einer kleinen Stadt in Südschweden geboren. • In the year 1942 von Essen was born in a small town in the south of Sweden. • "Essen" may be • the name of a city • the plural of "Esse" meaning smokestack • the word for food • a family-name, • the name of a Bank "Von Essen Bank"

  34. Problem: Polysemy, Aspects, Views • Often a descriptor or designator can be used in different aspects of meaning. • One of the sources of this type of uncertainty is systematic polysemy. • Another one is the aspects or views associated with a context or a user type.

  35. Polysemy • The assembly takes five minutes. • The assembly is in Building Five. • The iBook has a G3 processor and a DVD drive. • The PowerBook can be checked out but the iBook is currently in use by the project BABEL. • CNN has a special at 5 p.m. • Then he became Senior Vice President of CNN.

  36. Aspects • The iBook has a G3 processor and a DVD drive. • The iBook is reduced by 15% in our clearance sale. • Peter Norman will answer your questions. • The new Department Chair is Peter Norman.

  37. The Ultimate Information Management • Provide: • the right information • to the right people • at the right time • and in the right form

  38. decision triggers • All kinds of forms requiring • Approvals, • Recommendations, • Selections • Examples: • Application for a Building Permit • Credit application • Request for a comment on a hiring decision • Good decision triggers contain information relevant for the decision or references to such pieces of information

  39. Possible Targets • Short Pieces of Information (e.g., translation into Spanish) • Regular Hyperlink (e.g., homepage) • DB Access (e.g., lookup of account status) • Start of a Process (e.g., start a credit check) • Notification of a person (e.g., send query to expert) • Search out of context (e.g., search in inter-, intra-, extranet)
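
Because a link target need not be a page, following a link amounts to dispatching on the kind of target. A minimal sketch, assuming hypothetical kind names and stub handlers that only describe what a real system would do (no database, process or mail system is touched):

```python
# Hypothetical target kinds mapped to stub handlers; each handler only
# returns a description of the action a real system would perform.
HANDLERS = {
    "info": lambda anchor: f"translation of {anchor!r} into Spanish",
    "hyperlink": lambda anchor: f"open homepage of {anchor!r}",
    "db": lambda anchor: f"look up account status of {anchor!r}",
    "process": lambda anchor: f"start credit check for {anchor!r}",
    "notify": lambda anchor: f"send query about {anchor!r} to expert",
    "search": lambda anchor: f"search intranet for {anchor!r}",
}


def follow(anchor, kind):
    """Activate the target of the given kind for an anchor."""
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unknown target kind: {kind}") from None
    return handler(anchor)


def menu(anchor):
    """List every available target for an anchor, as a one-click menu would."""
    return {kind: follow(anchor, kind) for kind in HANDLERS}
```

New kinds of targets can be added by registering one more handler, without changing the annotated text.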

  40. first step: densely hyperlinked texts • in the ideal case: every meaningful unit carries typed relational hyperlinks • words, names, symbols, pictures, elements of pictures,

  41. Possible Targets • Short Pieces of Information (e.g., translation into Spanish) • Regular Hyperlink (e.g., homepage) • DB Access (e.g., lookup of account status) • Start of a Process (e.g., start a credit check) • Notification of a person (e.g., send query to expert) • Search in a context (e.g., search in inter-, intra-, extranet)

  42. Applications • Enrichment of specialized Web Sites (example: SOG and Saarland Online) • Enrichment of Portals (example: LT World) • Email Processing (example: MailMinder Extension) • Legacy Code (example: Dresdner Bank HyperCode) • Information and Knowledge Management (no example yet) • Associative Digital Memories

  43. second step: grounding in experience • From densely associative hypertexts to episodically enriched memories • calendar, biography, timeline • media • situations

  44. example 1: personal digital memory • calendar: 2,000-12,000 calendar entries • email: 20,000-100,000 messages • addresses: 100-2,000 addresses • photographs: 1,000-30,000 pictures • written papers, reports, reviews: 100-1,000 documents • music: 500-5,000 titles • talks: 50-500 slide sets • read electronic papers: 200-2,000 documents • visited web pages: 20,000-100,000 pages

  45. example 1: personal digital memory • establish a set of relevant entities and concepts • dates • persons • themes • locations • functions • sources • find connections among email, calendar, photographs, addresses, papers,
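
Finding connections among email, calendar, photographs and other items can be sketched as an inverted index over the shared entities. The item identifiers, entity types and values below are hypothetical sample data, and the entities are assumed to have been extracted already; the sketch shows only the linking step.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical items with already extracted entities
ITEMS = {
    "email:42": {"persons": {"Anna"}, "dates": {"2003-05-12"}},
    "calendar:7": {
        "persons": {"Anna"},
        "dates": {"2003-05-12"},
        "locations": {"Berlin"},
    },
    "photo:311": {"locations": {"Berlin"}},
}


def build_index(items):
    """Map every (entity type, value) pair to the items that mention it."""
    index = defaultdict(set)
    for item_id, entities in items.items():
        for entity_type, values in entities.items():
            for value in values:
                index[(entity_type, value)].add(item_id)
    return index


def connections(items):
    """Map every pair of items to the set of entities they share."""
    links = defaultdict(set)
    for entity, item_ids in build_index(items).items():
        for first, second in combinations(sorted(item_ids), 2):
            links[(first, second)].add(entity)
    return dict(links)


LINKS = connections(ITEMS)
```

Because the index is keyed by entity, a newly recognized person or location only adds entries, which fits the later requirement to adapt links dynamically to new entities and concepts.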

  46. example 1: personal digital memory • entry points • words • names • topics • dates • dynamically adapt the links to new entities and concepts

  47. long-term vision: third level • First Level: structure the personal information space, i.e., information that is already on your machine, e.g., texts, correspondence, direct messages (SMS etc.), calendar, graphics • Second Level: add personal archives by digitizing additional information: e.g., photographs, personal sound recordings, musical records, movie clips, etc. • Third Level: add episodic memory (life records), create extensive sound (and image) archives of selected episodes of your daily life, including meetings and pictures of people, sights, documents • Not manageable without dense associative hyperlinking

  48. example 1: personal digital memory • related projects • MyLifeBits (Microsoft) • Haystack (MIT) • and relevant but less related • LifeLog (DARPA)

  50. example 2: collective/social memories • historical memories • memories of scientific developments • a combination of both
