NLP Interchange Format (NIF)

Presented by: Swaran Lata. Email: slata@mit.gov.in. Dated: 1st March 2013.


Presentation Transcript


  1. NLP Interchange Format (NIF) • Presented by: Swaran Lata • Email: slata@mit.gov.in • Dated: 1st March 2013

  2. Paradigm shift in the evolution of the Internet • “Internet is the network of networks.” • Stages: Web 1.0, Web 2.0, Web 3.0

  3. Web 1.0 • The first stage was linking web pages and sharing information through them • Hyperlinked web pages came into wide use around 1993 • Characteristics • Personal web pages • Static web pages • HTML-based sites • HTML forms sent via email • Use of framesets • The main type of connection was dial-up, with about 50 kbit/s of bandwidth • Read-only content • E.g. YouTube (business paradigm shift on the web) • Rebecca Black and Justin Bieber became international stars overnight • Dhanush’s “Kolaveri Di” became an international hit

  4. Web 1.0 era (diagram labels): Portals, Directories, HTML static web pages, Content Management Systems, Netscape

  5. Web 2.0 • Web 1.0 graduated into Web 2.0 during 2003-06 • Web 2.0 is about user-generated content and the read-write web: people consume as well as contribute information, for example through blogs • Concept of the “prosumer”, i.e. minimal differentiation between producer and consumer of content • Examples • Social networking sites • Blogs • Wikis • Video sharing sites • Hosted services • Web applications • Mashups • Folksonomies

  6. Web 2.0 era (diagram labels): RSS Feed

  7. Web 3.0 • Expected to be a metaverse • Will be a web development layer with the following characteristics • TV-quality open video • 3D simulations • Augmented reality • Human-constructed semantic standards • Pervasive broadband, wireless, and sensors • A time when “the internet swallows the television.” • Web 3.0 will allow users to sit back and let the Internet do all of the work for them

  8. Web 3.0 (Contd.) • Web 3.0 technologies (Semantic Web) include: 1. Artificial intelligence 2. Automated reasoning 3. Cognitive architecture 4. Composite applications 5. Distributed computing 6. Knowledge representation 7. Ontology (computer science) 8. Recombinant text 9. Scalable vector graphics 10. Semantic Web 11. Semantic Wiki 12. Software agents

  9. Why we are moving towards Semantic Web 3.0

  10. Web 3.0 era (diagram labels): Cloud, Ontologies, Better Search Engines, SPARQL, RDF, Linked Data, Machine-Readable Data

  11. What is the Semantic Web • A web of data • The Semantic Web is an extension of the current Web • It gives information well-defined meaning, enabling computers and people to work in cooperation • A framework for sharing and reusing data • Correlates data with real-world objects

  12. Important components of the Semantic Web • Major components: • Resource Description Framework (RDF) • Web Ontology Language (OWL) • Linked Data • Vocabularies • SPARQL • Simple Knowledge Organization System (SKOS)

  13. Resource Description Framework (RDF) • A graph-based data model used to describe resources; RDF/XML is one of its serializations • Resources can include entities, concepts, properties and relations • Captures the metadata about the “externals” of a document • Data can be described using a serialized model, RDF triples, special notation, or graphs

  14. Resource Description Framework Triples
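
A minimal sketch of what an RDF triple looks like in practice, assuming a hypothetical document URI under example.org and the Dublin Core `title` property. The `Triple` type and the `to_ntriples` helper are illustrative names, not part of any library, and the serializer does not escape special characters in literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # URI, or a literal already wrapped in double quotes


def to_ntriples(triple: Triple) -> str:
    """Render a triple as one N-Triples line (no literal escaping)."""
    obj = triple.obj if triple.obj.startswith('"') else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical example: a document and its title.
title = Triple(
    "http://example.org/doc1",
    "http://purl.org/dc/terms/title",
    '"NLP Interchange Format"',
)
print(to_ntriples(title))
```

Each statement stays a flat subject-predicate-object record, which is why triples from different sources can be pooled into one graph.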

  15. Web Ontologies (OWL) • An ontology is an explicit specification of a conceptualization. • An ontology consists of a set of axioms which place constraints on sets of individuals (called "classes") and the types of relationships permitted between them. • OWL is used to define and instantiate Web ontologies. • OWL is a family of knowledge representation languages for authoring ontologies. • OWL differs from an XML schema in that it is a knowledge representation, not a message format. • Documents from different domains can be merged together to answer a user query.

  16. Linked Data and its components • Linked Data describes a method of publishing structured data that makes it more useful and easier to understand. • Linked Data publishes data on the web in such a way that it is machine readable. • Linked Data may be as diverse as databases maintained by two organisations in different geographical locations, or heterogeneous systems within one organisation that have not easily interoperated at the data level. Components: • Use URIs to identify things. • Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents. • Provide useful information about the thing in standard formats such as RDF/XML. • Include links to other, related URIs to improve discovery of related information on the Web.

  17. Linked Open Data (LOD2) Technology • The LOD2 stack is an integrated distribution of aligned tools which support the life-cycle of Linked (Open) Data, from extraction and authoring/creation, through enrichment, interlinking and fusing, to visualization and maintenance. The life-cycle comprises in particular the stages: • Extraction of RDF from text, XML and SQL • Querying and exploration using SPARQL • Authoring of Linked Data using a Semantic Wiki • Semi-automatic link discovery between Linked Data sources • Knowledge-base enrichment and repair
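
To give a feel for the querying stage, here is a toy stand-in for a single SPARQL triple pattern, written in plain Python. It is not SPARQL and not part of the LOD2 stack; the graph, the prefixed names (`ex:`, `dc:`) and the `match` helper are all assumptions made for illustration. The call at the end corresponds roughly to the pattern `?s dc:language "hi"`.

```python
# Hypothetical mini-graph; prefixed names stand in for full URIs.
GRAPH = [
    ("ex:doc1", "rdf:type", "ex:Document"),
    ("ex:doc1", "dc:language", "hi"),
    ("ex:doc2", "rdf:type", "ex:Document"),
    ("ex:doc2", "dc:language", "en"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    pattern = (s, p, o)
    return [
        triple
        for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Which resources are marked as being in Hindi?
hindi_docs = [subject for subject, _, _ in match(GRAPH, p="dc:language", o="hi")]
print(hindi_docs)
```

A real SPARQL engine joins many such patterns and works over full URIs; the sketch only shows the matching idea behind one pattern.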

  18. Linked Open Data (LOD2) Project • NLP2RDF is an LOD2 project that is developing the NLP Interchange Format (NIF). • NIF aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. • The output of NLP tools can be converted into RDF and used in the LOD2 stack.

  19. Semantic Web Structure

  20. What is NIF • The NLP Interchange Format (NIF) is an RDF/OWL-based format that allows several Natural Language Processing (NLP) tools to be combined and chained in a flexible, light-weight way. The core of NIF consists of three parts: 1. A set of URI recipes, used to create unique and potentially stable URIs to anchor annotations in documents. 2. A vocabulary, which can represent Strings, Words and Sentences as RDF resources. 3. Transformations for the programmatic usage of the Ontologies of Linguistic Annotations (OLiA).
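
The idea of a URI recipe can be sketched as follows. The slides do not spell out the fragment syntax, so this example assumes an offset-based recipe that borrows the `#char=begin,end` form from RFC 5147; the function names and the example.org URI are hypothetical.

```python
def offset_uri(document_uri: str, text: str, begin: int, end: int) -> str:
    """Build a URI that anchors the span text[begin:end] of a document."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets fall outside the text")
    return f"{document_uri}#char={begin},{end}"


def anchored_text(text: str, uri: str) -> str:
    """Recover the annotated string from the offsets stored in the fragment."""
    fragment = uri.rsplit("#char=", 1)[1]
    begin, end = (int(number) for number in fragment.split(","))
    return text[begin:end]


text = "NIF anchors annotations in documents."
uri = offset_uri("http://example.org/doc1", text, 0, 3)
print(uri, "->", anchored_text(text, uri))
```

Because the URI is derived only from the document URI and the offsets, two tools that annotate the same span independently arrive at the same URI.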

  21. Important Components of NIF • Structural Interoperability: URI recipes are used to anchor annotations in documents with the help of fragment identifiers. The URI recipes are complemented by two ontologies (String Ontology and Structured Sentence Ontology), which are used to describe the basic types of these URIs (i.e. String, Document, Word, Sentence) as well as the relations between them. • Conceptual Interoperability: The Structured Sentence Ontology (SSO) was especially developed to connect existing ontologies with the String Ontology and thus attach common annotations to the text fragment URIs. The NIF ontology can easily be extended and integrates several NLP ontologies. • Access Interoperability: A REST interface description for NIF components and web services allows NLP tools to interact on a programmatic level.

  22. Architecture Overview

  23. NIF – Integration Architecture (diagram labels): NLP Tool (×3), NIF Wrapper (×3), RDF Model, WordNet

  24. Associated Standards • Web Ontology Language (OWL) • NLP • Linked Data • RDF

  25. How NIF Helps NLP Requirements of Web • All URIs created by the URI recipes should be typed with the respective OWL class. • In each returned NIF model there should be at least one URI that relates to the document as a whole. • Every other annotated String should be related to the URI given to the Document by a property that is a sub-property of str:subString. • For each annotation, a reference model should be used, so that the annotations are machine-interpretable.
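
The first three requirements can be checked mechanically. The sketch below is only an illustration under stated assumptions: triples are plain tuples, `rdf:type`, `str:Document` and `str:String` are shorthand for the full URIs, the link is assumed to run from the document to the string, and `check_model` is a hypothetical helper. The fourth requirement (use of a reference model) is not checked.

```python
RDF_TYPE = "rdf:type"
DOCUMENT_CLASS = "str:Document"
# str:subString plus any of its sub-properties would be listed here.
SUBSTRING_PROPERTIES = {"str:subString"}


def check_model(triples):
    """Return a list of problems found in a NIF-style set of triples."""
    problems = []
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    documents = {s for s, p, o in triples if p == RDF_TYPE and o == DOCUMENT_CLASS}
    if not documents:
        problems.append("no URI typed as str:Document")
    # Requirement 1: every subject URI carries a type.
    for uri in sorted({s for s, _, _ in triples} - typed):
        problems.append(f"{uri} has no rdf:type")
    # Requirement 3: every other string is linked from a document.
    linked = {o for s, p, o in triples if p in SUBSTRING_PROPERTIES and s in documents}
    for uri in sorted(typed - documents - linked):
        problems.append(f"{uri} is not linked to a document")
    return problems


doc = "http://example.org/doc1#char=0,37"
word = "http://example.org/doc1#char=0,3"
good = [(doc, RDF_TYPE, DOCUMENT_CLASS), (word, RDF_TYPE, "str:String"),
        (doc, "str:subString", word)]
bad = [(word, RDF_TYPE, "str:String")]
print(check_model(good))
print(check_model(bad))
```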

  26. How NLP Tools are integrated with NIF Models • An NLP tool can be integrated with NIF if an adapter is created that is able to parse a NIF model into the tool's internal data structure and to output NIF as a serialization. An NLP pipeline can then be formed by either: • Passing the NIF RDF model from tool to tool • Passing the text to each tool and then merging the NIF outputs into one large model. The URI recipes of NIF are designed to make it possible to have zero overhead and use only one triple per annotation.
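
The second pipeline style (pass the text to each tool, then merge) can be sketched as below. The two "tools" are trivial regular-expression stand-ins for wrapped NLP tools, the `#char=begin,end` fragment syntax is an assumption, and `ex:isCapitalized` is a made-up property used only for illustration.

```python
import re


def span_uri(base, begin, end):
    """Offset-based URI for a span (fragment syntax is an assumption)."""
    return f"{base}#char={begin},{end}"


def tokenize(text, base):
    """Stand-in tool 1: type every word span."""
    return {(span_uri(base, m.start(), m.end()), "rdf:type", "str:Word")
            for m in re.finditer(r"\w+", text)}


def tag_capitalized(text, base):
    """Stand-in tool 2: one triple per capitalized word."""
    return {(span_uri(base, m.start(), m.end()), "ex:isCapitalized", "true")
            for m in re.finditer(r"\b[A-Z]\w*", text)}


def run_and_merge(text, base, tools):
    """Give the same text to every tool and union their triples."""
    model = set()
    for tool in tools:
        model |= tool(text, base)
    return model


model = run_and_merge("NIF links tools", "http://example.org/doc1",
                      [tokenize, tag_capitalized])
for triple in sorted(model):
    print(triple)
```

Since both tools derive the same URI for the span "NIF", their annotations end up on one resource after the merge without any alignment step.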

  27. Linking of NIF with WordNet and XML Models

  28. The Structure of WordNet (diagram) • Node types: wn:word, wn:word sense, wn:synset • Edge labels: wn:lexical form, wn:has sense, wn:in synset, rdf:type • Example entries: संज्ञा (Noun) बातचीत ("conversation"); क्रिया (Verb) कर्म ("action") • Relations to other word senses, e.g. antonym • Relations to other synsets, e.g. hypernym, hyponym

  29. How WordNet is related to the Semantic Web/RDF • Hindi WordNet: a database of lexical and semantic relations between Hindi words • Related technologies (diagram labels): RDF, OWL, Linked Data

  30. XML Model • XML is a tree-structured document • Nodes: element nodes (children can be ordered; elements can be recursive, i.e. parts under parts) and attribute nodes (mandatory or optional) • Edges: sub-element edges, attribute edges, IDRef edges • Constraints: references; value restrictions (OneOf); cardinality • Trees are more flexible than tables • Any number of nodes can be added anywhere without breaking the model
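
The tree model can be tried out with Python's standard `xml.etree.ElementTree` module. The `part` elements and their names are hypothetical; the sketch shows recursive elements (parts under parts), attribute nodes, and adding a node without disturbing the rest of the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: recursive <part> elements with a name attribute.
xml_text = (
    '<part name="engine">'
    '<part name="piston"/>'
    '<part name="valve"><part name="spring"/></part>'
    '</part>'
)
root = ET.fromstring(xml_text)


def depth(element):
    """Number of levels in the subtree rooted at this element."""
    return 1 + max((depth(child) for child in element), default=0)


# Add a new node under the root; existing nodes are left untouched.
ET.SubElement(root, "part", name="gasket")

names = [part.get("name") for part in root.iter("part")]
print(names, depth(root))
```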

  31. Future work • Converting WordNet to RDF format • Linking WordNet with other ontologies, e.g. Library, ISSN • Matching WordNet against a generic ontology • Proliferation of the Semantic Web/Linked Data through creating awareness • Development of Semantic Web/Linked Data resources for use with WordNet • Exploring opportunities for implementing the Semantic Web in Indian languages

  32. Thanks & Questions • slata@mit.gov.in • 91-11-24301272
