
Comparability of language data and analysis




  1. Comparability of language data and analysis: Using an ontology for linguistics
     Scott Farrar, U Bremen
     Terry Langendoen, U Arizona
     Symposium on Best Practice LSA, Boston, MA

  2. Multiple language resources
     • Symposium focus so far has been on digital preservation of the work of individual projects.
     • Imagine there are 100,000 or more Web accessible digital language archives covering most of the world’s languages.
     • annotated texts, lexicons, grammatical descriptions, research papers, typological comparisons, ...

  3. Limits on access to content
     • Metadata gets you only a little way in.
     • String searching gets results, but it’s often not reliable (low “precision” and “recall”).
     • Database searches typically can only be carried out one site at a time.
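The “precision” and “recall” named on slide 3 can be made concrete with a minimal sketch. It assumes a hypothetical string search whose hits can be compared against a known set of relevant documents; the document identifiers and the search term are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of the hits that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant items that the search found.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: searching for the string "PST" to find past-tense forms.
retrieved = {"doc1", "doc2", "doc3", "doc4"}  # documents containing the string
relevant = {"doc1", "doc2", "doc5"}           # documents that really mark past tense
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")    # precision=0.50 recall=0.67
```

Here two of the four hits are spurious (the string matched something else) and one relevant document is missed (it uses a different label), which is the kind of unreliability the slide points to.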

  4. Smart searches need smart data
     • Use informational, not presentational, markup (cf. presentations by Simons and Lewis).
     • XML can be used to represent linguistic analyses to any desired degree of refinement.
     • Analyses in other formats (e.g. relational databases) can be migrated to XML for both archiving and smart web searching.
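A small sketch of why informational markup supports smarter searches than string matching. The element names, attribute names, and word forms below are invented for this illustration and do not come from any archive's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical informational markup: each morpheme is labeled by what it is,
# not by how it should be displayed. All names and forms are made up.
DOC = """
<text lang="xx">
  <word form="kanis">
    <morph form="kan" type="root" gloss="dog"/>
    <morph form="is" type="suffix" gloss="PL"/>
  </word>
  <word form="kan">
    <morph form="kan" type="root" gloss="dog"/>
  </word>
</text>
"""

root = ET.fromstring(DOC)

# Structural query: find every word that contains a plural suffix.
plural_words = [
    word.get("form")
    for word in root.iter("word")
    if any(m.get("type") == "suffix" and m.get("gloss") == "PL"
           for m in word.findall("morph"))
]
print(plural_words)  # ['kanis']
```

A plain string search for "PL" could not tell a plural gloss from any other occurrence of those letters; the query above relies on the structure instead.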

  5. Smart markup isn’t enough
     • Meaning and use of structural markup vary from site to site.
     • Same term used with different meanings.
     • Different terms used with the same meaning.
     • Markup element and attribute names and values, and structural content, may be in different natural languages.
     • Sites are encoded at different levels of granularity.

  6. How to say what you mean
     • Markup is syntax; its meaning can only be inferred for individual sites, or groups of sites that use a common markup scheme (e.g. TEI).
     • So if markup term T means “x” in archive A and “y” in archive B, then we need:
       • A resource (called an ontology) that provides the definitions “x” and “y” in a systematic and machine-interpretable format.
       • A mechanism to link T to “x” in A and T to “y” in B.
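The two requirements on slide 6 can be sketched as a pair of lookup tables: one holding the definitions, one linking each archive's markup terms to them. This is only an illustration under stated assumptions; the concept names, definitions, archive labels, and markup terms are all hypothetical and are not taken from GOLD or any real archive.

```python
# Hypothetical ontology: concept identifiers mapped to definitions.
ONTOLOGY = {
    "PastTense": "locates the event before the moment of speaking",
    "PassiveVoice": "the patient is expressed as the grammatical subject",
}

# Hypothetical per-archive links from markup terms to ontology concepts.
# The same term "PAS" means different things in archives A and B.
ARCHIVE_LINKS = {
    "A": {"PAS": "PastTense"},
    "B": {"PAS": "PassiveVoice", "PST": "PastTense"},
}

def resolve(archive, term):
    """Return the ontology concept that a markup term denotes in an archive."""
    return ARCHIVE_LINKS[archive][term]

def terms_for(concept):
    """Return (archive, term) pairs across all archives that denote the concept."""
    return sorted(
        (archive, term)
        for archive, links in ARCHIVE_LINKS.items()
        for term, linked in links.items()
        if linked == concept
    )

print(resolve("A", "PAS"))     # PastTense
print(resolve("B", "PAS"))     # PassiveVoice
print(terms_for("PastTense"))  # [('A', 'PAS'), ('B', 'PST')]
```

With the links in place, a cross-archive search can be phrased in terms of the concept rather than any one site's label.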

  7. What is an ontology?
     • A computational artifact;
     • A conceptualization of a domain;
     • A theory of what is;
     • The types in a knowledge base.
     • There can be many ontologies for a given domain.

  8. Why an ontology for linguistics?
     • Language documentation
     • need to decipher markup
     • semantics and markup
     • Semantic Web implementation
     • Natural language processing
     • conceptual basis for semantics (grounding)
     • as a common framework for linguistic and non-linguistic knowledge

  9. GOLD
     • General Ontology for Linguistic Description: http://emeld.org/gold
     • Incorporated in EMELD’s FIELD tool.
     • Built using an upper ontology (SUMO): http://ontology.teknowledge.com
     • Currently in a very early stage of development.

  10. Partial SUMO taxonomy
     [Figure: tree diagram; the hierarchy itself is not recoverable from the transcript. Node labels: Entity, Physical, Abstract, Object, Perdurant, Relation, Attribute, Proposition, Region, Agent, Quantity, SetOrClass, Collection, SelfConnectedObject]

  11. What currently is in GOLD?
     • Categories for:
     • linguistic form
     • morphosyntactic categories
     • features
     • values
     • semantics for morphosyntactic categories
     • using SUMO
     • documentation

  12. Format of GOLD
     • Semantic Web initiative
     • http://w3.org/2001/sw/
     • Web Ontology Language (OWL)
     • An emerging Web standard with a growing user base
     • Extensible
     • Lots of visualization tools and APIs are available for OWL.
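To give a feel for the OWL format named on slide 12, here is a minimal sketch that reads subclass relations from a tiny OWL fragment in RDF/XML. The class names are hypothetical and are not copied from GOLD. It uses only Python's standard XML parser to stay self-contained; a real application would more likely use a dedicated RDF or OWL library.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RDF Schema, and OWL.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical ontology fragment: two feature classes under a common parent.
OWL_DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="MorphosyntacticFeature"/>
  <owl:Class rdf:ID="Tense">
    <rdfs:subClassOf rdf:resource="#MorphosyntacticFeature"/>
  </owl:Class>
  <owl:Class rdf:ID="Number">
    <rdfs:subClassOf rdf:resource="#MorphosyntacticFeature"/>
  </owl:Class>
</rdf:RDF>
"""

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
RDF_ID = "{%s}ID" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

root = ET.fromstring(OWL_DOC)
subclass_of = {}
for cls in root.findall("owl:Class", NS):
    for parent in cls.findall("rdfs:subClassOf", NS):
        # Strip the leading "#" of the local reference to get the class name.
        subclass_of[cls.get(RDF_ID)] = parent.get(RDF_RESOURCE).lstrip("#")

print(subclass_of)
# {'Tense': 'MorphosyntacticFeature', 'Number': 'MorphosyntacticFeature'}
```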

  13. What’s still needed
     • Buildout of GOLD (and/or development of companion ontologies) to cover the entire field.
     • Mechanisms to link sites to ontologies.
     • Can be done in part using metadata.
     • Development of additional ontology-aware tools for data creation and migration.
     • A way of ensuring that ontologies endure just like the data they help interpret.
