Comparability of language data and analysis: Using an ontology for linguistics • Scott Farrar, U Bremen • Terry Langendoen, U Arizona • Symposium on Best Practice, LSA, Boston, MA
Multiple language resources • The symposium's focus so far has been on digital preservation of the work of individual projects. • Imagine there are 100,000 or more Web-accessible digital language archives covering most of the world’s languages. • These hold annotated texts, lexicons, grammatical descriptions, research papers, typological comparisons, ...
Limits on access to content • Metadata gets you only a little way in. • String searching returns results, but they are often unreliable (low “precision” and “recall”). • Database searches can typically be carried out only one site at a time.
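To make the precision and recall point concrete, here is a minimal sketch of a naive substring search over a handful of annotations. The records and the relevance judgments are invented for illustration and do not come from any real archive.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. relevant record ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical records: id -> free-text annotation found in some archive.
records = {
    1: "verb, past tense",
    2: "PST marker on the verb",
    3: "noun; see 'pasture'",
    4: "preterite suffix",
}

# Records a linguist would judge to be about past tense (assumed labels).
relevant = {1, 2, 4}

# Naive string search for the substring "past".
retrieved = {rid for rid, text in records.items() if "past" in text.lower()}

precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

The search matches "pasture" (a false hit) and misses "PST" and "preterite" (relevant records labeled differently), so both measures suffer.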
Smart searches need smart data • Use informational, not presentational, markup (cf. presentations by Simons and Lewis). • XML can be used to represent linguistic analyses to any desired degree of refinement. • Analyses in other formats (e.g. relational databases) can be migrated to XML for both archiving and smart web searching.
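As a rough illustration of informational markup, the sketch below encodes one analyzed word in XML and queries it by gloss rather than by surface string. The element and attribute names (`word`, `morpheme`, `form`, `gloss`, `type`) are assumptions made for this example and are not taken from any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical informational markup for one analyzed word; element and
# attribute names are illustrative only.
doc = """
<text lang="xx">
  <word>
    <form>walked</form>
    <morpheme type="stem"><form>walk</form><gloss>walk</gloss></morpheme>
    <morpheme type="suffix"><form>ed</form><gloss>PST</gloss></morpheme>
  </word>
</text>
"""

root = ET.fromstring(doc)


def morphemes_glossed(root, gloss):
    """Return the forms of all morphemes whose gloss equals `gloss`."""
    return [
        m.findtext("form")
        for m in root.iter("morpheme")
        if m.findtext("gloss") == gloss
    ]


print(morphemes_glossed(root, "PST"))
```

Because the markup says what each piece is (a morpheme, its form, its gloss) rather than how it should look, the query can target the analysis directly.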
Smart markup isn’t enough • Meaning and use of structural markup vary from site to site. • The same term is used with different meanings. • Different terms are used with the same meaning. • Markup element and attribute names and values, and structural content, may be in different natural languages. • Sites are encoded at different levels of granularity.
How to say what you mean • Markup is syntax; its meaning can only be inferred for individual sites, or for groups of sites that use a common markup scheme (e.g. TEI). • So if markup term T means “x” in archive A and “y” in archive B, then we need: • A resource (called an ontology) that provides the definitions “x” and “y” in a systematic and machine-interpretable format. • A mechanism to link T to “x” in A and T to “y” in B.
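The two requirements above can be sketched as a pair of lookup tables: one holding the definitions, one linking each archive's term to a definition. All names here are hypothetical; the concept identifiers (`onto:...`), the archive names, and the definitions are placeholders and are not drawn from GOLD or any actual archive.

```python
# Hypothetical ontology: concept identifier -> definition.
ONTOLOGY = {
    "onto:PerfectiveAspect": "aspect presenting an event as a completed whole",
    "onto:PerfectAspect": "aspect relating a prior event to a later reference time",
}

# Hypothetical links from (archive, markup term) to an ontology concept.
# The term "PERF" means different things in archives A and B, while
# "PERF" in A and "PFV" in B mean the same thing.
LINKS = {
    ("archiveA", "PERF"): "onto:PerfectiveAspect",
    ("archiveB", "PERF"): "onto:PerfectAspect",
    ("archiveB", "PFV"): "onto:PerfectiveAspect",
}


def concept_for(archive, term):
    """Return the concept a term denotes in a given archive, or None."""
    return LINKS.get((archive, term))


def terms_for(concept):
    """Return every (archive, term) pair linked to a concept."""
    return sorted(key for key, linked in LINKS.items() if linked == concept)


print(concept_for("archiveA", "PERF"), concept_for("archiveB", "PERF"))
print(terms_for("onto:PerfectiveAspect"))
```

A cross-archive search for perfective aspect would then look up `terms_for("onto:PerfectiveAspect")` and query each archive with its own term, instead of searching every site for the string "PERF".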
What is an ontology? • A computational artifact; • A conceptualization of a domain; • A theory of what is; • The types in a knowledge base. • There can be many ontologies for a given domain.
Why an ontology for linguistics? • Language documentation • need to decipher markup • semantics and markup • Semantic Web implementation • Natural language processing • conceptual basis for semantics (grounding) • as a common framework for linguistic and non-linguistic knowledge
GOLD • General Ontology for Linguistic Description (http://emeld.org/gold) • Incorporated in EMELD’s FIELD tool. • Built using an upper ontology, SUMO (http://ontology.teknowledge.com) • Currently in a very early stage of development.
Partial SUMO taxonomy (figure; the tree structure is not recoverable from the extracted text) • Node labels: Entity, Physical, Abstract, Object, Perdurant, Relation, Attribute, Proposition, Region, Agent, Quantity, SetOrClass, Collection, SelfConnectedObject
What currently is in GOLD? • Categories for: • linguistic form • morphosyntactic categories • features • values • semantics for morphosyntactic categories • using SUMO • documentation
Format of GOLD • Semantic Web initiative • http://w3.org/2001/sw/ • Web Ontology Language (OWL) • An emerging Web standard with a growing user base • Extensible • Lots of visualization tools and APIs are available for OWL.
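For readers who have not seen OWL, the sketch below shows a tiny class hierarchy in OWL's RDF/XML serialization and reads the subclass links back out using only the Python standard library. The class names are invented for illustration and are not claimed to match the actual GOLD file. A dedicated RDF or OWL library would be the usual choice in practice; plain XML parsing is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for OWL, RDF, and RDF Schema.
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical ontology fragment; class names are illustrative only.
owl_doc = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="#MorphosyntacticFeature"/>
  <owl:Class rdf:about="#Tense">
    <rdfs:subClassOf rdf:resource="#MorphosyntacticFeature"/>
  </owl:Class>
  <owl:Class rdf:about="#PastTense">
    <rdfs:subClassOf rdf:resource="#Tense"/>
  </owl:Class>
</rdf:RDF>
"""


def subclass_pairs(xml_text):
    """Return (subclass, superclass) pairs declared in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        child = cls.get(f"{{{RDF}}}about")
        for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
            pairs.append((child, sup.get(f"{{{RDF}}}resource")))
    return pairs


def ancestors(name, pairs):
    """Follow subclass links upward; assumes each class has one parent."""
    parents = dict(pairs)
    chain = []
    while name in parents:
        name = parents[name]
        chain.append(name)
    return chain


pairs = subclass_pairs(owl_doc)
print(ancestors("#PastTense", pairs))
```

Walking the subclass chain is what lets a search for a general category also find data annotated with a more specific one.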
What’s still needed • Buildout of GOLD (and/or development of companion ontologies) to cover the entire field. • Mechanisms to link sites to ontologies. • Can be done in part using metadata. • Development of additional ontology-aware tools for data creation and migration. • A way of ensuring that ontologies endure just like the data they help interpret.