ACL 4 NCLT Seminar Presentation, 7th June 2006. John Tinsley. Morphological Analysis of Spanish Using Finite-State Transducers. Introduction. What is this project about? Provide morphological information on Spanish strings. Generate strings from morphological descriptions.

ACL 4

NCLT Seminar Presentation, 7th June 2006

John Tinsley

Morphological Analysis of Spanish Using Finite-State Transducers

Introduction
  • What is this project about?
    • Provide morphological information on Spanish strings
    • Generate strings from morphological descriptions
  • What were my aims?
    • Robust, fast application, easily integrated into other systems
    • 80% token coverage on unrestricted text
    • 100% coverage of Spanish morphology
Design Methodology
  • Formalisation
    • Discovery of Spanish morphological rules
  • Implementation
    • Coding of morphological model with Xerox Finite-State Tools
  • Evaluation
    • Check for accuracy & well-formedness
    • Assess language coverage
Spanish Morphology - Verbs
  • Inflected for person, tense/mood, number
  • Regular verbs
    • 3 regular conjugations identified by infinitive endings
    • ‘-ar’, ‘-er’, and ‘-ir’
  • Irregular verbs
    • 66 distinct irregularities
    • Varying degrees of irregularity
Spanish Morphology - Nouns
  • Inflected for number, gender
  • 7 types of noun
    • Feminine, masculine, neutral, derivative, profession, number invariant, proper
  • Irregularities
    • All arise via pluralisation
    • Accentuation, character alterations
Spanish Morphology - Adjectives
  • Inflected for number, gender
  • 4 types of adjective
    • Neutral, derivative, profession, irregular
  • Adverbs derived from adjectives by addition of suffix ‘mente’
Xerox Finite-State Tools - lexc
  • Lexicon compiler
  • Compiles ‘continuation classes’ into lexical transducers
Xerox Finite-State Tools - xfst
  • Xerox finite-state tool
  • Compiles regular expressions into networks
  • Regular expression replace rules

[ String -> Replacement || left-context _ right-context ]
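
The rule template above can be imitated outside xfst. The following is a minimal Python sketch, not the Xerox tools themselves: `replace_rule` is a hypothetical helper, and it assumes the contexts are literal strings, whereas xfst accepts full regular expressions as contexts.

```python
import re

def replace_rule(target, replacement, left="", right=""):
    """Build a function that rewrites `target` as `replacement`
    only when it stands between the literal strings `left` and `right`."""
    lookbehind = f"(?<={re.escape(left)})" if left else ""
    lookahead = f"(?={re.escape(right)})" if right else ""
    pattern = re.compile(lookbehind + re.escape(target) + lookahead)
    # A callable replacement avoids any backslash interpretation.
    return lambda text: pattern.sub(lambda _match: replacement, text)

# Hypothetical rule [ a -> b || c _ d ]: only the "a" between "c" and "d" changes.
rule = replace_rule("a", "b", left="c", right="d")
print(rule("cad xax"))  # prints "cbd xax"
```

Leaving `left` or `right` empty corresponds to an unconstrained context on that side.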

Xerox Finite-State Tools - example
  • conocer - ‘to know’
  • 1st person, pres. ind. ‘conozco’
  • Lexical transducer mappings
    • conoc:conoc
    • er+Verb:ε
    • +PresInd:^PresInd
    • +1P+Sg:o
Xerox Finite-State Tools - example (cont.)
  • Composed replace rule

[ c -> {zc} || _ %^PresInd ]

  • Triggered by the ^PresInd tag
  • Makes the required change; the trigger is then removed
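
The conocer example can be traced end to end in a few lines. This is a rough Python emulation under stated assumptions, not the compiled transducer: the mapping table is copied from the slide, while the function names and the plain string matching are illustrative only.

```python
import re

# (lexical side, intermediate side) pairs from the slide, in path order.
MAPPINGS = [
    ("conoc", "conoc"),
    ("er+Verb", ""),          # maps to epsilon
    ("+PresInd", "^PresInd"),
    ("+1P+Sg", "o"),
]

def to_intermediate(lexical):
    """Consume the lexical string piece by piece, emitting the lower side."""
    out, rest = [], lexical
    for upper, lower in MAPPINGS:
        if not rest.startswith(upper):
            raise ValueError(f"no path for {lexical!r}")
        out.append(lower)
        rest = rest[len(upper):]
    if rest:
        raise ValueError(f"unconsumed input {rest!r}")
    return "".join(out)

def apply_rules(intermediate):
    # Emulates [ c -> {zc} || _ ^PresInd ]
    changed = re.sub(r"c(?=\^PresInd)", "zc", intermediate)
    # Emulates the trigger removal step
    return changed.replace("^PresInd", "")

def generate(lexical):
    return apply_rules(to_intermediate(lexical))

print(generate("conocer+Verb+PresInd+1P+Sg"))  # prints "conozco"
```

The intermediate string here is `conoc^PresIndo`; only the `c` directly before the trigger is rewritten, so the first `c` of the stem is untouched.
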
Verb Lexicon
  • Coded in lexc
  • Model has 3 regular paths
  • 66 varieties of irregularity
    • e.g. poder ‘to be able to’

LEXICON Irreg43

0:^UE^VSoue^PRET1^FR ErV ;

[ o -> {ue} || _ Consonant^<4 [ %^UE ?* [ [%^PresInd | %^PresSubj] ?* [%^1PSg | %^2PSg | %^3PSg | %^3PPl] ] ] ]
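
To see what the o-to-ue rule accepts and rejects, it can be approximated with a Python regular expression. This is a sketch under assumptions: the intermediate strings are written by hand, the `emos^1PPl` ending and the `^1PPl`/`^2PPl` tags are hypothetical (only the tags named in the rule and in `Irreg43` come from the slide), and `Consonant` is assumed to be the set of Spanish consonant letters.

```python
import re

CONSONANT = "[bcdfghjklmnñpqrstvwxyz]"
TENSE = r"(?:\^PresInd|\^PresSubj)"
PERSON = r"(?:\^1PSg|\^2PSg|\^3PSg|\^3PPl)"

# "o" followed by fewer than 4 consonants, then ^UE, then a matching
# tense trigger and a matching person trigger somewhere to the right.
O_TO_UE = re.compile(rf"o(?={CONSONANT}{{0,3}}\^UE.*{TENSE}.*{PERSON})")

# Trigger symbols to delete afterwards (partly hypothetical, see above).
TRIGGERS = ["^UE", "^VSoue", "^PRET1", "^FR", "^PresInd", "^PresSubj",
            "^1PSg", "^2PSg", "^3PSg", "^1PPl", "^2PPl", "^3PPl"]

def surface(intermediate):
    changed = O_TO_UE.sub("ue", intermediate)
    for trigger in sorted(TRIGGERS, key=len, reverse=True):
        changed = changed.replace(trigger, "")
    return changed

print(surface("pod^UE^VSoue^PRET1^FR^PresIndo^1PSg"))     # prints "puedo"
print(surface("pod^UE^VSoue^PRET1^FR^PresIndemos^1PPl"))  # prints "podemos"
```

The second call shows the point of the person list in the right context: a first person plural form keeps its stem vowel.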

Noun Lexicon

LEXICON NounFem ! Feminine Nouns

!STEM !CONT. CLASS ! GLOSS

acción fIsNounEs ; ! action

LEXICON fIsNounEs ! feminine pluralised with 'es'

+Noun:0 fNounPluralES ;

LEXICON fNounPluralES

+Sg+Fem:0 # ;

+Pl+Fem:^NZ^NOes # ;

[z -> c || _ %^NZ]

[ó -> o || _ ?^<5 %^NO ]
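
The plural path and its two replace rules can be emulated as follows. This Python sketch is illustrative only: `pluralise` is a hypothetical name, the stem `luz` is an assumed extra entry that does not appear on the slide, and `?^<5` is read as "fewer than five symbols", with a trigger tag counted as one symbol.

```python
import re

# One symbol: either a trigger tag such as ^NZ or a single ordinary character.
SYMBOL = r"(?:\^[A-Z]+|.)"

def pluralise(stem):
    # Lower side of the path "+Noun:0" followed by "+Pl+Fem:^NZ^NOes".
    form = stem + "^NZ^NOes"
    # Emulates [ z -> c || _ %^NZ ]
    form = re.sub(r"z(?=\^NZ)", "c", form)
    # Emulates [ ó -> o || _ ?^<5 %^NO ]
    form = re.sub(rf"ó(?={SYMBOL}{{0,4}}\^NO)", "o", form)
    # Remove the triggers once the rules have fired.
    return form.replace("^NZ", "").replace("^NO", "")

print(pluralise("acción"))  # prints "acciones"
print(pluralise("luz"))     # prints "luces" (hypothetical lexicon entry)
```

The singular path `+Sg+Fem:0` adds nothing to the stem, so `acción` surfaces unchanged in the singular.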

Adjective Lexicon
  • Same process as noun lexicon
  • Uses the same replace rules
  • One exception for adverbs

LEXICON nIsAdjS

+Adj:0 nAdjPluralS ;

+Adj|+Adv:^AAOmente # ;

[o -> a || _ %^NAO %^AAO {mente}]

Other Transducers
  • Overgeneration Filter
    • llover ‘to rain’
  • Capitalisation
  • Trigger Remover
  • Execution script

~[ $[ {llov} ?* [ [%+1P | %+2P] [%+Sg | %+Pl] | [%+3P %+Pl] ] ] ]

[ a (->) A || .#. _ ]

[ %^IE -> 0 ]
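
The three expressions above can be mimicked in Python to show their effect on individual strings. This is a sketch, not the networks themselves: the function names are hypothetical, the analysis strings follow the tag order used in the conocer example, and only the `a`/`A` pair shown on the slide is handled by the capitalisation helper.

```python
import re

# Emulates ~[ $[ {llov} ?* [ [+1P | +2P] [+Sg | +Pl] | [+3P +Pl] ] ] ]:
# any analysis containing this pattern is rejected.
OVERGENERATED = re.compile(r"llov.*(?:(?:\+1P|\+2P)(?:\+Sg|\+Pl)|\+3P\+Pl)")

def allowed(analysis):
    return OVERGENERATED.search(analysis) is None

def capitalisation_variants(word):
    # Emulates the optional rule [ a (->) A || .#. _ ]:
    # both the original and the capitalised string are kept.
    if word.startswith("a"):
        return {word, "A" + word[1:]}
    return {word}

def remove_trigger(form):
    # Emulates [ %^IE -> 0 ]
    return form.replace("^IE", "")

print(allowed("llover+Verb+PresInd+3P+Sg"))  # prints True
print(allowed("llover+Verb+PresInd+1P+Sg"))  # prints False
```

Because the capitalisation rule is optional, the emulation returns a set of outputs rather than a single string.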

Testing
  • Accuracy
    • Maintaining integrity of existing rules
      • Projection
      • Subtraction
  • Well-formedness
    • Ensuring tag order
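
One way to read the projection and subtraction checks is as set operations on the language of the transducer. The slides do not spell out the procedure, so the following Python sketch is an assumed interpretation: each network is modelled as a finite set of (analysis, surface) pairs, and all names and sample pairs are hypothetical.

```python
def lower_projection(pairs):
    """Keep only the surface side of each (analysis, surface) pair."""
    return {surface for _analysis, surface in pairs}

def lost_forms(old_pairs, new_pairs):
    """Surface forms the old network produced that the new one no longer does."""
    return lower_projection(old_pairs) - lower_projection(new_pairs)

old = {("conocer+Verb+PresInd+1P+Sg", "conozco")}
new = old | {("acción+Noun+Pl+Fem", "acciones")}

print(lost_forms(old, new))  # prints set(): nothing was lost by the change
```

An empty difference after adding a rule would suggest the existing behaviour is intact; a non-empty one lists the forms to investigate.
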
Assessing Coverage
  • Aim – 80% on unrestricted text
  • Statistical predictions (Crystal 1997)
  • Corpus compilation and processing
    • Europarl, 3 corpora (http://people.csail.mit.edu/koehn/publications/europarl/)
  • Phase 1 – augmentation
  • Phase 2 – 81% coverage
  • Final assessment – 84.15% coverage
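
Token coverage, as used in the figures above, is the share of corpus tokens that receive at least one analysis. A minimal Python sketch of that calculation follows; the toy lexicon and the `analyse` function are hypothetical stand-ins for the real analyser, and the numbers are illustrative rather than results from the project.

```python
def token_coverage(tokens, analyse):
    """Percentage of tokens for which `analyse` returns at least one analysis."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if analyse(token))
    return 100.0 * covered / len(tokens)

# Hypothetical toy analyser backed by a dictionary.
TOY_LEXICON = {
    "conozco": ["conocer+Verb+PresInd+1P+Sg"],
    "acciones": ["acción+Noun+Pl+Fem"],
}

def analyse(token):
    return TOY_LEXICON.get(token, [])

tokens = ["conozco", "acciones", "xyz", "conozco"]
print(token_coverage(tokens, analyse))  # prints 75.0
```

Counting tokens rather than distinct word forms means frequent words weigh more heavily in the result.
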
Further Details
  • Generates approx. 44,000 unique morphological descriptions
  • Evaluation corpus – 1.26 analyses per input token on average
Possible improvements
  • Increase coverage
    • lexicon augmentation
  • Disambiguation using POS tagger
  • More derivational morphology
  • Deal with different dialects of Spanish
References
  • (Beesley & Karttunen 2003) Beesley, K. and Karttunen, L. Finite State Morphology. CSLI Publications, United States, 2003.
  • (Claret 2005) Los Verbos Castellanos Conjugados, Sexta Edición. Editorial Claret, Barcelona, 2005.
  • (Crystal 1997) Crystal, D. The Cambridge Encyclopedia of Language (2nd ed.). Cambridge University Press, 1997.
  • Europarl - Europarl Parallel Corpus, http://people.csail.mit.edu/koehn/publications/europarl/ (last accessed 19/05/2006).
  • (Kendris 1990) Kendris, C. Spanish Grammar. Barron’s, 1990.
  • (Mateo & Rojo Sastre 1997) Mateo, F. and Rojo Sastre, A.J. Collection Bescherelle - Les verbes espagnols. Hatier, 1997.
  • Real Academia Española – http://www.rae.es/ (last accessed 25/05/2006).

Conclusions

Demonstration


LEXICON ArVerbs

!STEM !CONT. CLASS !GLOSS

abord ArV ; !to approach

LEXICON ArV

ar+Verb:0 ArConj ;

LEXICON ArConj

!TAGS !CONT.CLASS

+PresInd:^PresInd ArPresInd ;

+PretInd:^PretInd ArPretInd ;

LEXICON ArPresInd ! Present Indicative

+1P+Sg:o^1PSg #;

+2P+Sg:as^2PSg #;

+3P+Sg:a^3PSg #;
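
The way continuation classes chain together can be illustrated by walking the lexicon above as a small graph. This Python sketch is an emulation, not lexc: the dictionary mirrors the entries shown on the slide, `ArPretInd` is left empty because its entries are not shown, and the function names are hypothetical.

```python
# Each lexicon maps to (upper side, lower side, continuation class) entries.
LEXICONS = {
    "ArVerbs": [("abord", "abord", "ArV")],
    "ArV": [("ar+Verb", "", "ArConj")],
    "ArConj": [("+PresInd", "^PresInd", "ArPresInd"),
               ("+PretInd", "^PretInd", "ArPretInd")],
    "ArPresInd": [("+1P+Sg", "o^1PSg", "#"),
                  ("+2P+Sg", "as^2PSg", "#"),
                  ("+3P+Sg", "a^3PSg", "#")],
    "ArPretInd": [],  # entries not shown on the slide
}

TRIGGERS = ("^PresInd", "^PretInd", "^1PSg", "^2PSg", "^3PSg")

def paths(lexicon="ArVerbs", upper="", lower=""):
    """Yield every complete (analysis, intermediate form) path to '#'."""
    if lexicon == "#":
        yield upper, lower
        return
    for up, low, continuation in LEXICONS[lexicon]:
        yield from paths(continuation, upper + up, lower + low)

def strip_triggers(form):
    for trigger in TRIGGERS:
        form = form.replace(trigger, "")
    return form

for analysis, intermediate in paths():
    print(analysis, "->", strip_triggers(intermediate))
```

For the single stem shown, the walk produces three analyses of abordar, with surface forms abordo, abordas and aborda once the triggers are stripped.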