Felix Zhang
1 / 12

Development of a German-English Translator - PowerPoint PPT Presentation

  • Uploaded on

Felix Zhang Period 5 2007-2008 Thomas Jefferson High School for Science and Technology Computer Systems Research Lab. Development of a German-English Translator. Summary of Quarter 2. NP Chunking Lemmatization Dictionary Lookup Inflection Noun-verb agreement. Scope for this quarter.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Development of a German-English Translator' - dolan

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Felix Zhang

Period 5


Thomas Jefferson High School for Science and Technology Computer Systems Research Lab

Development of a German-English Translator

Summary of quarter 2
Summary of Quarter 2

  • NP Chunking

  • Lemmatization

  • Dictionary Lookup

  • Inflection

  • Noun-verb agreement

Scope for this quarter
Scope for this quarter

  • Focus less on statistical methods

  • Get rudimentary grammar system working

  • Fix all the bugs I’ve made since September

New and modified components
New and Modified Components

  • More info stored in NP chunking

  • Better noun-verb agreement

  • Grammar

    • Element Assignment

    • Priority Number Assignment

Noun verb agreement
Noun-verb agreement

  • Simple method to eliminate more ambiguities

    def eliminateother(attribs, sub, closest):

    for x in attribs:

    if x[0][1] == "nou" and x != sub:

    for y in x[1]:

    if y[0]== "nom":


    return attribs

Noun phrase chunking
Noun phrase chunking

  • Now used for English sentences

  • Stores more info for later methods

  • “the man make the children”

  • NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]

Element assignment
Element Assignment

  • Based on linguistic information

  • If case is nominative, chunk is subject

  • If accusative, chunk is direct object

  • [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]

Priority assignment
Priority Assignment

  • Each sentence element is assigned priority number

  • Based on position in English sentence

  • Assignments:

    • sub 1

    • mverb 2

    • auxverb 3

    • iobj 4

    • dobj 5

  • Sort by number for English grammar

Full run of program
Full run of program

input: “den Mann machen die kleinen Kinder”

The small children make the man

[email protected] ~/research $ python proj.py

Part of speech tags: [['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']]

Morphological analysis: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]

Disambiguated after noun-verb agreement: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]]

Lemmatized: [['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]]

Root translated: [['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']]

NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]

Inflected (only works before chunking):

['the', 'the'] ['man', ['akk', 'mas'], 'man'] ['man', ['dat', 'pl'], 'mans'] ['make', ['3', 'pl'], 'make'] ['the', 'the'] ['small', 'small'] ['child', ['nom', 'pl'], 'childs']

Assigned an element type:

[[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]

Assigned priority:

[['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]

Rearranged to English structure:

[['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]


  • Ambiguities (again)‏

    • One ambiguity can change the entire structure of the sentence

    • “I gave a horse the hat” vs. “I gave the hat a horse”

    • Attempt at all permutations possible

      • User disambiguation


  • Inflexible

    • Grammar can only be rearranged in one specific way

    • Subject – Main verb – Indirect – Direct – Auxiliary Verb

    • Does not accommodate for prepositions, conjunctions, etc.

Future research
Future research

  • Implement more statistical methods

    • Morphological info

    • Actual translation – bilingual corpus

  • Create better parse tree – Dependency grammar