Linguistic Web Services for Semantic Web BT Short Term Research Fellowship Dr. Vassil T. Vassilev London Metropolitan University July - October 2003
Part I Semantic Web and Linguistic Data Processing
Content
1 Project Background: Semantic Web and NLP
2 RDF – Lingua Franca of Semantic Web
3 The need for linguistic support of Semantic Web
4 WordNet: Universal Linguistic Resource
• WordNet as a model of the word semantics
• WordNet as an online thesaurus
• WordNet as a relational database
5 Step One: Putting WordNet on the Web
6 Step Two: Extending WordNet
7 Step Three: LinguaShare
8 Problems and Directions
1 Project Background: Semantic Web and NLP
Semantic Web: Model-driven framework for semantically rich data processing over the Web
• RDF – Dublin Core (1999), W3C (1999)
• DAML – DARPA (2000); OIL – FP5 (2000)
http://www.w3c.org/2001/sw/
http://www.dublincore.org/documents/dces/
Semantic Thesaurus: Linguistic database containing word meanings and semantic relations
• WordNet – George Miller, Princeton Univ. (1990)
• EuroWordNet – FP4 (1997); BalkaNet – FP5 (2000)
http://www.cogsci.princeton.edu/~wn/
http://www.hum.uva.nl/~ewn#EuroWordnet
1.1. Semantic data processing over the Web
• Syntactic markup of the data (RDF, Topic Maps)
• A meta-language (schema) that provides the intended semantics of the represented data (RDFS, DAML)
• Domain ontologies that specify the restrictions, dependencies, regularities and rules for inference (KIF, OIL, OWL)
1.2. Computer-based semantic thesaurus
• Explaining the meaning of words
• Finding other words with the same meaning (synonyms)
• Finding other words with a similar meaning in the same context (synonymous usage)
• Finding semantically independent, related or dependent word forms (semantic referencing)
Determining ontological information using lexical information
EXAMPLE: Type inference through analysis of the argument structure of verb phrases and their syntactic appearance in texts:
• The varieties of argument structure for EVENT-verbs suggest seven major subtypes: PHENOMENON, ASPECTUAL, STATE, ACT, PSYCHOLOGICAL_EVENT, CHANGE and CAUSE_CHANGE
• Based on them, we can differentiate COGNITIVE_EVENT (experiencer is syntactic subject, e.g., fear) from ACT (experiencer is syntactic object, e.g., frighten)
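The distinction in the last bullet can be sketched as a lookup on the syntactic position of the experiencer. This is only a toy illustration: the two-verb lexicon and the function name `infer_event_subtype` are assumptions made for the example, not part of the project, and a real system would derive the experiencer position from parsed text.

```python
# Hypothetical mini-lexicon: where the experiencer appears for each verb.
# The two entries come from the slide's own examples (fear, frighten).
EXPERIENCER_POSITION = {
    "fear": "subject",     # "The child fears the dog": experiencer is subject
    "frighten": "object",  # "The dog frightens the child": experiencer is object
}


def infer_event_subtype(verb):
    """Map a verb to an event subtype from its experiencer position.

    Returns None when the verb is unknown, so callers can tell
    'no evidence' apart from a positive classification.
    """
    position = EXPERIENCER_POSITION.get(verb)
    if position == "subject":
        return "COGNITIVE_EVENT"
    if position == "object":
        return "ACT"
    return None


for verb in ("fear", "frighten", "walk"):
    print(verb, infer_event_subtype(verb))
```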
1.3 Project definition
Aims:
• utilizing the full potential of WordNet multilingual thesauri as a universal linguistic ontology for semantic verification of specialist terminology
• embedding it in applications for semantic data processing over the Web
• using contemporary Semantic Web Services technologies and tools
Methodology:
• Analytical research (WordNet)
• Modeling (relational models, UML)
• Software prototyping (Tomcat, MySQL)
2 RDF – Lingua Franca of Semantic Web
• Language with defined semantics for describing resources, primarily on the Web; it can also be used off the Web, e.g. Dublin Core for library catalogues
• Uses XML as the serialization syntax for RDF statements; there are alternative serializations (e.g. triples), but XML is the most popular
• The language can formulate statements about the language itself (meta-description): RDF Schema, or RDFS
• The statements can be stored, processed and transported over the Web (data persistence)
2.1 RDF Model
Resources – Things being described by RDF expressions. Resources are named by URIs. Examples: HTML document, XML element within the document, collection of pages, book
Properties – Specific attributes or relations used to describe a resource. Attributes and relations can also be used as resources. Examples: Creator, Title, Name
Values – Literals or references to resources
Statements – Combinations of a Subject (Resource), a Predicate (Property) and an Object (Value)
Example
"Vassil Vassilev, whose e-mail is v.vassilev@londonmet.ac.uk, is the creator of the web page http://www.lgu.ac.uk/~vassil/index.html"
Subject (Resource): 'http://www.lgu.ac.uk/~vassil/index.html'
Predicate (Property): 'Creator'
Object (Value): 'Vassil Vassilev'
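As a minimal sketch, the statement in this example can be held in memory as a three-field record. The class name `Statement` and the field name `obj` are illustrative choices; the predicate is spelled out with the Dublin Core element namespace on the assumption that 'Creator' refers to `dc:creator`.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a (subject, predicate, object) triple."""
    subject: str    # the resource being described, named by a URI
    predicate: str  # the property, also named by a URI
    obj: str        # the value: a literal or a reference to another resource


# Dublin Core element namespace (assumed to be what 'Creator' refers to).
DC = "http://purl.org/dc/elements/1.1/"

statement = Statement(
    subject="http://www.lgu.ac.uk/~vassil/index.html",
    predicate=DC + "creator",
    obj="Vassil Vassilev",
)

# Named tuples unpack positionally, in subject/predicate/object order.
subject, predicate, obj = statement
print(f'<{subject}> <{predicate}> "{obj}" .')
```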
Serialized representation in XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:vcard="http://imc.org/vCard/3.0#">
  <rdf:Description rdf:about="http://www.lgu.ac.uk/~vassil/index.html">
    <dc:creator>
      <rdf:Description>
        <vcard:FN>Vassil Vassilev</vcard:FN>
        <vcard:EMAIL>v.vassilev@londonmet.ac.uk</vcard:EMAIL>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
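To show how such a serialization maps back to statements, here is a rough sketch that reads the document with Python's standard XML parser and lists the triples it contains. It is not a general RDF/XML parser: it assumes exactly the two-level shape shown on this slide, uses the page URI as given in the example statement, and labels the nested anonymous resource with a made-up blank-node name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:vcard="http://imc.org/vCard/3.0#">
  <rdf:Description rdf:about="http://www.lgu.ac.uk/~vassil/index.html">
    <dc:creator>
      <rdf:Description>
        <vcard:FN>Vassil Vassilev</vcard:FN>
        <vcard:EMAIL>v.vassilev@londonmet.ac.uk</vcard:EMAIL>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) triples for this simple shape only."""
    triples = []
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            nested = prop.find(f"{{{RDF}}}Description")
            if nested is None:
                # Plain literal value.
                triples.append((subject, prop.tag, (prop.text or "").strip()))
                continue
            # Anonymous nested resource: give it a placeholder blank-node label.
            blank = f"_:b{len(triples)}"
            triples.append((subject, prop.tag, blank))
            for nested_prop in nested:
                value = (nested_prop.text or "").strip()
                triples.append((blank, nested_prop.tag, value))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

ElementTree reports each property as `{namespace}localname`, so the `dc:creator` predicate comes back as the full Dublin Core URI followed by `creator`.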
2.2 Semantic Web Applications
• Context-based Information Retrieval (search for semantic patterns)
• Personalized Information Delivery (data presentation based on user profiles)
• User tracking (dynamic construction of user profiles based on log analysis)
• Document summarizing (text generation based on models of the meaning)
• Automatic translation (text transformation which uses meaning models)
2.3 Semantic Web Tools
• Persistent storage and query interpreters (XML databases/XQuery, RDF repositories/RQL)
• Ontology visualizers and editors (OntoEdit, Protégé, etc.)
• Ontology navigators and semantic search engines (AskJeeves, RDF Quiz, OntoSearch)
• Ontology-based inference engines (Cyc, Kaon, OMM)
Some observations
• Layer separation (data storage, data communication, information description, terminology definition, fact inference)
• Layer isolation (syntactic wrapping vs. semantic mapping)
• Information processing concentrated on the most abstract level (ontology)
• Hierarchy of languages: SQL, XML, RDF, RDFS, OWL
3 The Need for Linguistic Support of Semantic Web
Why:
• For combining multiple namespaces and reconciling syntactic names
• For word disambiguation in text analysis
• For semantic indexing of text corpora
• For resolving semantic inaccuracies in texts (esp. similarity, alternatives, exclusion, generalization, etc.)
• For representing text meaning in transformations which use an intermediate model of the meaning
4 WordNet as Universal Linguistic Resource
• Word forms (nouns, verbs, adjectives and adverbs) and lexical relations between them
• Synsets and meaning relations (synonymy, antonymy, hyponymy, meronymy, troponymy, etc.)
• Lexical database (set of indexed files or a database)
• Command language interface (originally Tcl/Tk scripts for direct file manipulation, but APIs for Java and other languages are also available)
• Multilingual thesauri (network of WordNet databases for many languages)
4.1 WordNet semantics
• Relational model with both standard relations (ATTRIBUTE, ANTONYM, ENTAILMENT, CAUSE) and transitive relations (HYPERNYM, HOLONYM, MERONYM)
• Formally, it can be interpreted in first-order relational structures (Kripke structures), which requires modal logic
• Adequate representation of the relations requires either an object-relational database or a relational database with additional indexing of the transitive relations (transitive closure)
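The transitive closure mentioned in the last bullet can be illustrated with a short sketch. The hypernym pairs below are a simplified toy chain made up for the example, not rows from the WordNet database, and the function name is an assumption. The idea is that precomputing every (word, ancestor) pair lets a plain relational lookup answer "is X a kind of Y?" without recursive queries.

```python
from collections import defaultdict

# Toy direct HYPERNYM pairs (child, parent); illustrative only.
HYPERNYM = [
    ("dog", "canine"),
    ("canine", "carnivore"),
    ("carnivore", "mammal"),
    ("mammal", "animal"),
]


def transitive_closure(pairs):
    """Return every (descendant, ancestor) pair reachable via the relation."""
    direct = defaultdict(set)
    for child, parent in pairs:
        direct[child].add(parent)

    closure = set()
    for start in list(direct):
        stack = list(direct[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # guards against cycles and repeated paths
            seen.add(node)
            closure.add((start, node))
            stack.extend(direct.get(node, ()))
    return closure


closure = transitive_closure(HYPERNYM)
print(("dog", "animal") in closure)
print(len(closure))
```

Stored as an extra indexed table, these pairs trade storage space for constant-depth queries, which is the design choice the bullet points to for a purely relational database.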
4.2 Relational schema of the original WordNet thesaurus
word – represents the syntactic word forms, divided into four main categories: noun phrases, verb phrases, adjectives and adverbs
synset – defines the different meaning sets used to give a semantic interpretation of the word forms
sense – many-to-many relationship between word forms and synsets
lexrel – purely lexical relationships which hold between the word forms
semrel – semantic relationships between the word forms, which make up the semantic thesaurus
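A minimal sketch of how the word, synset and sense tables could support a synonym lookup follows. Only the three table names come from the slide; the column names, the sample rows and the `synonyms` helper are assumptions for illustration, SQLite in memory stands in for the MySQL server, and lexrel and semrel are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word   (wordid   INTEGER PRIMARY KEY, lemma TEXT NOT NULL, pos TEXT NOT NULL);
CREATE TABLE synset (synsetid INTEGER PRIMARY KEY, definition TEXT NOT NULL);
-- sense resolves the many-to-many link between word forms and synsets
CREATE TABLE sense  (
    wordid   INTEGER NOT NULL REFERENCES word(wordid),
    synsetid INTEGER NOT NULL REFERENCES synset(synsetid),
    PRIMARY KEY (wordid, synsetid)
);

INSERT INTO word VALUES (1, 'car', 'n'), (2, 'automobile', 'n'), (3, 'train', 'n');
INSERT INTO synset VALUES (10, 'a motor vehicle with four wheels'),
                          (11, 'railroad cars coupled together');
INSERT INTO sense VALUES (1, 10), (2, 10), (3, 11);
""")


def synonyms(connection, lemma):
    """Return other lemmas that share at least one synset with `lemma`."""
    rows = connection.execute(
        """
        SELECT DISTINCT w2.lemma
        FROM word w1
        JOIN sense s1 ON s1.wordid = w1.wordid
        JOIN sense s2 ON s2.synsetid = s1.synsetid
        JOIN word  w2 ON w2.wordid = s2.wordid
        WHERE w1.lemma = ? AND w2.wordid <> w1.wordid
        ORDER BY w2.lemma
        """,
        (lemma,),
    ).fetchall()
    return [row[0] for row in rows]


print(synonyms(conn, "car"))
```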
5 Putting WordNet on the Web
• Synchronous query/response working model (CGI calls)
• Purely relational database for storing the thesaurus (MySQL)
• Front end implemented as a set of servlets which query the thesaurus on behalf of other applications
• Query results returned in XML format
• Separated from the applications and hosted on an independent server (Tomcat)
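The response side of this query/response model might look roughly like the sketch below, which wraps a query result in XML. The slides do not give the actual response format, so the element and attribute names (`result`, `word`, `query`) are hypothetical, and the sketch is in Python purely for illustration; the project's front end is described as Java servlets.

```python
import xml.etree.ElementTree as ET


def build_response(lemma, synonyms):
    """Wrap a synonym query result in a small XML document (hypothetical format)."""
    root = ET.Element("result", {"query": "synonyms", "word": lemma})
    for synonym in synonyms:
        ET.SubElement(root, "word").text = synonym
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


response = build_response("car", ["automobile", "motorcar"])
print(response)

# A client application can parse the reply without knowing how it was stored.
parsed = ET.fromstring(response)
print([element.text for element in parsed.findall("word")])
```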
Part II LinguaShare: Linguistic Web Service for Semantic Web