Building a Large-Scale Knowledge Base for Machine Translation

Kevin Knight and Steve K. Luk

Presenter: Cristina Nicolae


Linguistic resources combined into PANGLOSS

  • PENMAN Upper Model (Bateman 1990)

    • top-level network of 200 nodes implemented in the LOOM KR language

    • makes extensive use of syntactic-semantic correspondences (taxonomy ↔ grammar)

  • ONTOS (Carlson & Nirenburg 1990)

    • top-level ontology designed to support machine translation

  • Longman Dictionary of Contemporary English (LDOCE)

    • words with definition, usage, syntactic code (e.g. [B3] for adj+to), semantic code (e.g. [H] for human), pragmatic code (e.g. [ECZB] for economics/business)

  • WordNet (Miller 1990)

    • semantic word database

  • Collins Bilingual Dictionary

    • Spanish-English dictionary


Merging resources


Merging resources – contributions

  • LDOCE: syntax and subject area

  • WordNet: synonyms and hierarchical structuring

  • the upper structures: organize knowledge for NLP in general and for English generation in particular

  • the bilingual dictionary: lets us index the ontology from a second language


Definition Match Algorithm

  • two word senses should be matched if their two definitions share words

  • also looks at related words and senses (e.g. synonyms)

    LDOCE

  • (batter_2_0) “mixture of flour, eggs and milk, beaten together and used in cooking”

  • (batter_3_0) “a person who bats, esp. in baseball – compare BATSMAN”

    WordNet

  • (BATTER-1) “ballplayer who bats”

  • (BATTER-2) “a flour mixture thin enough to pour or drop from a spoon”

  • Match:

    • (batter_2_0) with (BATTER-2)

    • (batter_3_0) with (BATTER-1)
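
The matching idea above can be illustrated with a minimal sketch. It assumes a plain count of shared content words as the score, a small hand-picked stopword list, and no use of related words or synonyms; the function names are hypothetical and this is not the authors' implementation.

```python
import re

# Assumed stopword list; small and hand-picked for this illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "or", "who",
             "from", "esp", "used", "with"}

def content_words(definition):
    """Lowercase the definition and return its set of non-stopword tokens."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w for w in words if w not in STOPWORDS}

def overlap(def_a, def_b):
    """Number of content words the two definitions share."""
    return len(content_words(def_a) & content_words(def_b))

def match_senses(ldoce, wordnet):
    """Pair each LDOCE sense with the WordNet sense whose definition
    shares the most content words; senses with no overlap stay unmatched."""
    matches = {}
    for l_id, l_def in ldoce.items():
        scores = {w_id: overlap(l_def, w_def) for w_id, w_def in wordnet.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            matches[l_id] = best
    return matches

# Definitions taken from the slide.
LDOCE = {
    "batter_2_0": "mixture of flour, eggs and milk, beaten together and used in cooking",
    "batter_3_0": "a person who bats, esp. in baseball - compare BATSMAN",
}
WORDNET = {
    "BATTER-1": "ballplayer who bats",
    "BATTER-2": "a flour mixture thin enough to pour or drop from a spoon",
}

print(match_senses(LDOCE, WORDNET))
```

Here batter_2_0 and BATTER-2 share "flour" and "mixture", while batter_3_0 and BATTER-1 share "bats", which reproduces the two matches listed on the slide.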


Definition Match Algorithm – Results

The algorithm was run on all nouns from LDOCE and WordNet.


Hierarchy Match Algorithm

  • uses sense hierarchies inside LDOCE and WordNet

  • once two senses are matched, it is a good idea to look at their respective ancestors and descendants for further matches

  • Match:

    • animal_1_2 with ANIMAL-1

    • and their respective animal-subhierarchies

  • start with unambiguous words and match them, then look downward and upward in the hierarchies rooted at them and match those too
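
A minimal sketch of this propagation follows. It assumes each hierarchy is given as a child-to-parent map, that each sense is labelled with its word, and that a neighbouring pair is matched only when exactly one unmatched candidate carries the same word. Apart from animal_1_2 and ANIMAL-1, every sense identifier below is hypothetical, and the procedure is an illustration rather than the authors' algorithm.

```python
from collections import deque

def hierarchy_match(seeds, l_word, w_word, l_parent, w_parent):
    """Spread seed matches to parents and children in both hierarchies."""
    def children(parent_map, node):
        return [c for c, p in parent_map.items() if p == node]

    def parents(parent_map, node):
        p = parent_map.get(node)
        return [] if p is None else [p]

    matches = dict(seeds)
    matched_w = set(seeds.values())
    queue = deque(seeds.items())
    while queue:
        l_sense, w_sense = queue.popleft()
        # Compare children with children (downward) and parents with parents (upward).
        neighbour_groups = [
            (children(l_parent, l_sense), children(w_parent, w_sense)),
            (parents(l_parent, l_sense), parents(w_parent, w_sense)),
        ]
        for l_near, w_near in neighbour_groups:
            for l_n in l_near:
                if l_n in matches:
                    continue
                candidates = [w for w in w_near
                              if w not in matched_w and w_word[w] == l_word[l_n]]
                if len(candidates) == 1:  # accept only an unambiguous neighbour
                    matches[l_n] = candidates[0]
                    matched_w.add(candidates[0])
                    queue.append((l_n, candidates[0]))
    return matches

# Toy hierarchies (hypothetical sense identifiers).
l_word = {"dog_1_1": "dog", "animal_1_2": "animal", "bat_1_1": "bat",
          "bat_2_1": "bat", "stick_1_1": "stick"}
l_parent = {"dog_1_1": "animal_1_2", "bat_1_1": "animal_1_2", "bat_2_1": "stick_1_1"}
w_word = {"DOG-1": "dog", "ANIMAL-1": "animal", "BAT-1": "bat",
          "BAT-2": "bat", "IMPLEMENT-1": "implement"}
w_parent = {"DOG-1": "ANIMAL-1", "BAT-1": "ANIMAL-1", "BAT-2": "IMPLEMENT-1"}

# "dog" is treated as the unambiguous starting point.
result = hierarchy_match({"dog_1_1": "DOG-1"}, l_word, w_word, l_parent, w_parent)
print(result)
```

Starting from the dog pair, the sketch first moves upward to match animal_1_2 with ANIMAL-1, then downward to match the animal sense of "bat"; the other sense of "bat" sits in an unrelated subhierarchy and is left unmatched.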


Hierarchy Match Algorithm – Results

  • In the end, the algorithm produced 11,128 noun sense matches at 96% accuracy.


Bilingual Match Algorithm

  • goal is to annotate the ontology with a large Spanish lexicon

  • from:

    • mappings between Spanish and English words (from Collins)

    • mappings between English words and ontological entities (from WordNet)

    • conceptual relations between ontological entities

  • we obtain:

    • direct links between Spanish words and ontological entities
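
The composition of the two mappings can be sketched as follows. This simplified illustration chains Spanish-to-English and English-to-entity lookups and, when a Spanish word has several translations, prefers the entities that most translations agree on. The voting heuristic, the function name, and all words and entity identifiers are assumptions made for the example; the conceptual relations between entities mentioned above are not modelled here.

```python
from collections import Counter

def link_spanish(spanish_to_english, english_to_concepts):
    """Link each Spanish word to the ontological entities reached through
    its English translations, keeping the entities with the most votes."""
    links = {}
    for sp_word, en_words in spanish_to_english.items():
        votes = Counter()
        for en_word in en_words:
            for concept in english_to_concepts.get(en_word, ()):
                votes[concept] += 1
        if votes:
            top = max(votes.values())
            links[sp_word] = sorted(c for c, n in votes.items() if n == top)
    return links

# Hypothetical dictionary entries and entity identifiers.
spanish_to_english = {"coche": ["car", "automobile"], "perro": ["dog"]}
english_to_concepts = {
    "car": ["CAR-1", "CAR-2"],   # e.g. motor vehicle vs. railway car
    "automobile": ["CAR-1"],
    "dog": ["DOG-1"],
}

links = link_spanish(spanish_to_english, english_to_concepts)
print(links)
```

Because both translations of "coche" point to CAR-1 and only one points to CAR-2, the sketch keeps CAR-1; a word with a single ambiguous translation would keep all of its candidate entities for later human verification.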


Discussion

  • the output of each merge algorithm presented above is verified by humans afterwards (humans verify information faster than they generate it from scratch)

  • semi-automatic merging brings together complementary sources of information

  • also allows us to detect errors and omissions where resources are redundant

