A renewed Portuguese module for INTEX 4.3x

Cristina Mota

LabEL (CAUTL/IST) and Linguateca

Av. Rovisco Pais I

1049-001 Lisboa, Portugal

cristina@label.ist.utl.pt

6th Intex Workshop, Sofia, Bulgaria - May 28-30

Overview

The aim of this presentation is to show how the new features of Intex 4.3x helped in the representation and treatment of Portuguese-specific problems.

Two major issues will be discussed:

  • The analysis of diminutive, augmentative and superlative forms, in particular of those having accented base forms. For instance:
    • pá / pazinha (shovel / small shovel);
    • rápido / rapidíssimo (fast / very fast)
  • The analysis of modified verbal and clitic forms. Example:
    • Nós comprámos um livro (We bought a book);
    • Nós comprámo-lo (We bought it);
    • * Nós comprámos-o

Analysis of diminutive, augmentative and superlative forms

Nouns and adjectives in Portuguese vary in gender and number. Besides the gender and number morphemes, they also accept diminutive, augmentative and (adjectives only) superlative suffixes.

Dim.

coelho [rabbit] → coelhinho, coelhinha, coelhinhos, coelhinhas [little rabbit]

The diminutive suffix -inho (or -ito) is added to the base form and inflects in gender and number.

leão [lion] → leãozinho, leoazinha, leõezinhos, leoazinhas [little lion]

The diminutive suffix -zinho (or -zito) is added to the inflected base form and inflects in gender and number.

Aug.

carro [car] → carrão, carrões [big car]

Sup.

denso [dense] → densíssimo, densíssima, densíssimos, densíssimas [very dense]

Analysis of diminutive, augmentative and superlative forms
  • Representation by Inflectional Graphs
    • Prior to the new morphological parser, the only way of recognizing nouns and adjectives accepting grade variation was by introducing a code in the DELAS entries that allowed the generation of the corresponding diminutive, augmentative and superlative forms in the DELAF dictionary.

coelho [rabbit] → coelhinho, coelhinha, coelhinhos, coelhinhas

coelho,N001_dh001_dt001

coelhinho,coelho.N:ms

coelhinha,coelho.N:fs

coelhinhos,coelho.N:mp

coelhinhas,coelho.N:fp

Analysis of diminutive, augmentative and superlative forms
  • The Problematic cases
    • In Portuguese, words may have one of four accents: acute (á, é, í, ó, ú), grave (à), circumflex (â, ê, ô) and tilde (ã, õ). There are a few words with two accents: an acute accent and a tilde.
  • Whenever a noun or an adjective with an acute or circumflex accent has grade variation, the corresponding forms do not have the accent. For instance:

pá [shovel] → pazinha

recaída [relapse] → recaidazinha

côdea [crust] → codeazinha

dúvida [doubt] → duvidazinha

célula [cell] → celulazinha

lágrima [tear] → lagrimazinha

  • Even though all these diminutives are formed by adding the suffix -zinha, under this first approach the base forms would need different inflectional codes, which increases the number of inflectional graphs.
  • In order to keep the same code, these forms are generated with accents, and an AWK script then removes the accents to obtain the final DELAF.
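The accent-removal step can be illustrated with a short sketch. The original post-processing was an AWK script that is not shown in the slides; the Python below is only an approximation of the idea, and the function name `strip_base_accents` is hypothetical. It assumes that only the acute and circumflex accents of the base are dropped, while the tilde (as in leãozinho) and any accent inside the suffix are kept.

```python
import unicodedata

ACUTE = "\u0301"       # combining acute accent
CIRCUMFLEX = "\u0302"  # combining circumflex accent


def strip_base_accents(form: str, suffix: str) -> str:
    """Remove acute/circumflex accents from the part of `form` before `suffix`.

    Assumption: tilde and cedilla are preserved, and the suffix is left as is.
    Forms that do not end in `suffix` are returned unchanged.
    """
    if not form.endswith(suffix):
        return form
    base = form[: len(form) - len(suffix)]
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", base)
    kept = "".join(ch for ch in decomposed if ch not in (ACUTE, CIRCUMFLEX))
    # Recompose what is left (e.g. a + tilde back into a single character).
    return unicodedata.normalize("NFC", kept) + suffix


for generated in ("pázinha", "célulazinha", "côdeazinha", "aldeiazinha"):
    print(generated, "->", strip_base_accents(generated, "zinha"))
```

Working on the decomposed form keeps the rule independent of which vowel carries the accent, which is the property that lets all these nouns share one inflectional code.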
Analysis of diminutive, augmentative and superlative forms

  • Representation by Derivational Graphs
    • The new morphological parser of INTEX 4.3x makes it possible to represent the accent deletion process.

Analysis of diminutive forms in -zinha of accented words: celulazinha, pazinha, etc.

Analysis of diminutive forms in -zinha of non-accented words: aldeiazinha, aventurazinha, …
Analysis of diminutive, augmentative and superlative forms

Results of the Derivational Graphs

After applying the derivational graph in conjunction with the DELAF dictionary, the morphological parser recognizes both the diminutives created from non-accented words:

  • aldeiazinha,{aldeia,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • aventurazinha,{aventura,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}

as well as the diminutives created from accented words:

  • celulazinha,{célula,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • codeazinha,{côdea,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • duvidazinha,{dúvida,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • lagrimazinha,{lágrima,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • pazinha,{pá,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
  • recaidazinha,{recaída,.N306+dh306+dt306:fs}{zinha,zinha.SUF+Dim:fs}
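The recognition shown above can be approximated outside INTEX by an accent-insensitive lookup of the stem. The following is a minimal sketch, not the derivational graph itself: the toy lexicon, the function names and the returned structure are all hypothetical, and only the -zinha suffix with feminine singular bases is covered.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Drop acute and circumflex accents; keep tilde and cedilla."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ("\u0301", "\u0302"))
    return unicodedata.normalize("NFC", kept)


# Hypothetical toy lexicon of feminine singular nouns (a stand-in for the DELAF).
TOY_LEXICON = ["aldeia", "aventura", "célula", "côdea", "dúvida", "pá"]


def analyze_zinha(word: str, lexicon=TOY_LEXICON) -> list:
    """Return the lemmas whose accent-folded form matches the stem of `word`."""
    suffix = "zinha"
    if not word.endswith(suffix):
        return []
    stem = word[: -len(suffix)]
    return [lemma for lemma in lexicon if fold_accents(lemma) == stem]


for token in ("aldeiazinha", "celulazinha", "pazinha", "Laginha"):
    print(token, analyze_zinha(token))
```

A lookup this permissive has no lexical restrictions beyond the suffix and the stem match, so a broader version covering -inha would be open to the same kind of over-analysis as the misleading results listed next.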

Misleading Results

Laginha,{Laga,laga.N:fs}{inha,.SUF+Dim:fs} (Laginha is a proper name)

trocinhos,{trocos,troco.N:mp}{inhos,.SUF+Dim:mp} (the diminutive of trocos is troquinhos, not trocinhos)

Analysis of diminutive, augmentative and superlative forms

Solution A

Remove diminutives, augmentatives and superlatives from the DELAF. Since they can be homographs of other words, the derivational graphs will be very restrictive and used with normal priority.

Solution B

Keep diminutives, augmentatives and superlatives in the DELAF. The derivational graphs will be more flexible and designed so that they can easily help enlarge the DELAS. They will be used with low priority.

Verb-Clitic Analysis
  • When the clitic pronouns o (3ms), a (3fs), os (3mp), as (3fp) follow the verbal form, bound to it by a hyphen, they may undergo formal modifications, depending on the ending of the verbal form. Thus, if the ending is:
    • a vowel or an oral diphthong, the clitic forms do not undergo any modification: o, a, os, as;
    • a nasal diphthong, the clitic forms change to: no, na, nos, nas;
    • -r, -s or -z, the clitic forms change to: lo, la, los, las. In this context, the verbal forms are also modified, losing the final consonant. In forms ending in -r, the vowel preceding the -r receives an accent (acute or circumflex, depending on the thematic vowel of the verb).

No modification

Ele comprou um livro ontem [He bought a book yesterday]

Ele comprou-o ontem [He bought-it yesterday]

Clitic modification

Eles compraram um livro ontem [They bought a book yesterday]

Eles compraram-no ontem [They bought-it yesterday]

Verbal form and Clitic modification

Nós comprámos um livro ontem [We bought a book yesterday]

Nós comprámo(s)-lo ontem [We bought-it yesterday]
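The three cases above can be written as a small rule-based function. This is only a sketch of the rules as stated on this slide, not the inflectional graphs of the module: the function name and the list of nasal endings are assumptions, and forms not covered by the slide (for example monosyllabic forms in -z, or forms such as tens) are not handled.

```python
# Assumption: a verbal form ends in a nasal diphthong when it ends in one of these.
NASAL_ENDINGS = ("m", "ão", "õe", "ãe")

# Accent received by the thematic vowel when the final -r is dropped.
R_ACCENT = {"a": "á", "e": "ê"}


def attach_clitic(verb: str, clitic: str) -> str:
    """Attach o/a/os/as to a verbal form, applying the modifications described above."""
    if clitic not in ("o", "a", "os", "as"):
        raise ValueError(f"unsupported clitic: {clitic!r}")
    if verb.endswith(NASAL_ENDINGS):
        # Nasal diphthong: the clitic becomes no, na, nos, nas.
        return f"{verb}-n{clitic}"
    if verb.endswith(("r", "s", "z")):
        # The verb loses its final consonant; the clitic becomes lo, la, los, las.
        stem = verb[:-1]
        if verb.endswith("r") and stem and stem[-1] in R_ACCENT:
            stem = stem[:-1] + R_ACCENT[stem[-1]]
        return f"{stem}-l{clitic}"
    # Vowel or oral diphthong: no modification.
    return f"{verb}-{clitic}"


for form in ("comprou", "compraram", "comprámos", "comprar"):
    print(form, "+ o ->", attach_clitic(form, "o"))
```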

Verb-Clitic Analysis

Simple Present of the verb comprar (to buy)

compro,comprar.V:P1s

compras,comprar.V:P2s

compra,comprar.V:P2s:P2's:P3s:Y2s

compramos,comprar.V:P1p

compramo,comprar.V:P1p

comprais,comprar.V:P2p

comprai,comprar.V:P2p:Y2p

compram,comprar.V:P2'p:P3p

The entries containing inflectional information in bold correspond to verbal forms that are modified by the presence of clitics.

In the presence of the reflexive and dative pronouns nos (1p) and vos (2p), the first and second person plural verbal forms ending in -s are modified. The clitics themselves do not undergo modifications.

Verbal form modification

Nós vestimo(s)-nos [We dressed ourselves]

The modified verbal and clitic forms are described in the inflectional graphs and consequently are generated simultaneously with the non-modified forms when the DELAF is created.

However, the Intex 4.2x DELAF version did not contain information about the clitics that would make it possible to (i) distinguish the two forms and (ii) guarantee the correct combination of the verbal form with the clitic.

Verb-Clitic Analysis

In the new module, information about clitics was added to the verbal forms and to the accusative, dative and reflexive clitic forms. The possible clitic codes are:

i: the verbal form may occur without clitics

c: the clitic is not modified and does not modify the verbal form

o: clitic forms o, a, os, as

l: clitic forms lo, la, los, las

n: clitic forms no, na, nos, nas

q: clitic may modify verbal form (nos and vos)

This information is enclosed between square brackets in the inflectional features field:

compro,comprar.V:P1s[icqo]

compras,comprar.V:P2s[icq]

compra,comprar.V:P2s[l]:P4s[icqo]:P3s[icqo]:Y2s[icqo]

compramos,comprar.V:P1p[ic] (form occurs with clitic c or without clitic, i)

compramo,comprar.V:P1p[ql] (form occurs only with clitics q and l)

os,eu.PRO:4mp[o]:3mp[o]

los,eu.PRO:4mp[l]:3mp[l]

nos,eu.PRO:1p[q]:4mp[n]:3mp[n]

te,eu.PRO:2s[c]

Verb-Clitic Analysis

The introduction of the clitic codes makes it possible to disambiguate verb-clitic combinations.

Disambiguation grammar for removing incorrect verb-clitic combinations.
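The filtering performed by such a grammar can be sketched in a few lines. The code below is not the INTEX grammar: it parses DELAF lines in the bracketed format shown earlier and assumes, as one reading of the code list, that a verb reading and a clitic reading are compatible when they share at least one clitic code other than i. The function names are hypothetical.

```python
import re

# One inflectional tag, e.g. "P1p[ql]" or "P3s" (codes are optional).
TAG = re.compile(r"^(?P<features>[^\[\]]+)(?:\[(?P<codes>[a-z]+)\])?$")


def parse_entry(line: str) -> dict:
    """Parse 'form,lemma.CAT:tag[codes]:tag[codes]...' into a dictionary."""
    form, rest = line.split(",", 1)
    lemma_cat, *tags = rest.split(":")
    lemma, category = lemma_cat.split(".", 1)
    readings = []
    for tag in tags:
        match = TAG.match(tag)
        if match is None:
            raise ValueError(f"malformed tag: {tag!r}")
        readings.append((match.group("features"), set(match.group("codes") or "")))
    return {"form": form, "lemma": lemma, "category": category, "readings": readings}


def compatible_readings(verb_line: str, clitic_line: str) -> list:
    """List (verb features, clitic features) pairs that share a clitic code.

    Assumption: code 'i' (form may occur without clitic) never licenses a clitic.
    """
    verb = parse_entry(verb_line)
    clitic = parse_entry(clitic_line)
    return [
        (v_feat, c_feat)
        for v_feat, v_codes in verb["readings"]
        for c_feat, c_codes in clitic["readings"]
        if (v_codes - {"i"}) & c_codes
    ]


print(compatible_readings("compramo,comprar.V:P1p[ql]", "los,eu.PRO:4mp[l]:3mp[l]"))
print(compatible_readings("compramos,comprar.V:P1p[ic]", "os,eu.PRO:4mp[o]:3mp[o]"))
```

Under this reading, compramo combines with los, while compramos followed by os yields no compatible pair, which matches the ungrammatical example * Nós comprámos-o from the overview.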

Verb-Clitic Analysis

The clitic codes can also be used in syntactic transformations to obtain the correct forms of the verbs and clitics.

Analysis of the future with clitic included in a negative context; substitution by declarative context:

Eles não o comprarão → Eles comprá-lo-ão

Analysis of the future with clitic included in a declarative context; substitution by negative context:

Eles comprá-lo-ão → Eles não o comprarão
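The two surface forms involved in this transformation can be generated from the same ingredients. The sketch below is an illustration only and does not reproduce the INTEX transformation graphs: it assumes a regular verb whose future is built on the infinitive, a clitic among o, a, os, as, and a hypothetical function name `future_with_clitic`.

```python
# Accent received by the thematic vowel when the infinitive loses its -r.
THEMATIC_ACCENT = {"a": "á", "e": "ê"}


def future_with_clitic(infinitive: str, ending: str, clitic: str) -> tuple:
    """Return (declarative, negative) verb groups for a regular future form.

    Assumptions: the future stem is the infinitive (irregular futures are not
    handled) and the clitic is one of o, a, os, as.
    """
    if not infinitive.endswith("r"):
        raise ValueError("expected an infinitive ending in -r")
    stem = infinitive[:-1]
    if stem[-1] in THEMATIC_ACCENT:
        stem = stem[:-1] + THEMATIC_ACCENT[stem[-1]]
    # Declarative context: the clitic is placed between the stem and the ending.
    declarative = f"{stem}-l{clitic}-{ending}"
    # Negative context: the unmodified clitic precedes the full future form.
    negative = f"não {clitic} {infinitive}{ending}"
    return declarative, negative


declarative, negative = future_with_clitic("comprar", "ão", "o")
print("Eles", declarative)
print("Eles", negative)
```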

The Portuguese 4.3x module

http://label.ist.utl.pt/public-resources.html

DELAS / DELAF

Enhanced with clitic information

  • Inflectional Graphs
    • Nouns, Adjectives
    • Verbs
    • Pronouns
    • Determiners, Conjunctions, Prepositions, Adverbs
  • DELAC / DELACF
    • Nouns
    • Adverbs
    • Prepositions
    • Conjunctions
  • Derivational Graphs
    • Diminutive
    • Augmentative
    • Superlative
    • Other productive processes
  • Lexical graphs
    • Roman numerals
    • Cardinal numerals
    • Ordinal numerals

Acronym dictionaries (and corresponding description dictionary)

  • Local Grammars
    • Auxiliary Verb Tagging
    • Temporal Expressions
    • Numeric Expressions
  • Disambiguation Grammars
    • NP containing Adjectives
    • Verb-Clitic sequences
Productive Derivational Creation

The first steps towards a description of productive derivational processes are also being taken. The main goal is to analyze unknown words and to help enhance the DELAF.

Productive Derivational Creation
  • Remarks
  • Even though the graph seems very productive, it should be stressed that it is not meant to replace the inclusion in the DELAS of, for instance, nouns resulting from nominalizations.
  • If that were the case, the graph should be more restrictive:
    • <$verbo#ir.V+Nominalization:W> {ção,.SUF:fs}
  • and the verb entries should account for the possibility of the nominalization:
    • construir,V+Nominalization
  • In any case, it is important to relate the two entries (the verb and the noun) by adding the corresponding information to each of them:
    • construir,V_N=2ção
    • construção,N_V=3ir
  • The introduction of this type of information will be one of our major concerns.
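One plausible reading of the codes in these linked entries is that the number gives how many final characters to remove and the remaining string is appended, in the style of DELA inflection operators. The slides do not state this explicitly, so the sketch below is an assumption, and `apply_derivation_code` is a hypothetical helper.

```python
import re

# A code such as "2ção": a count of characters to delete, then a string to append.
CODE = re.compile(r"^(\d+)(.*)$")


def apply_derivation_code(word: str, code: str) -> str:
    """Apply a 'delete N characters, then append' code to `word`."""
    match = CODE.match(code)
    if match is None:
        raise ValueError(f"malformed code: {code!r}")
    count, appended = int(match.group(1)), match.group(2)
    if count > len(word):
        raise ValueError("code deletes more characters than the word has")
    return word[: len(word) - count] + appended


print(apply_derivation_code("construir", "2ção"))   # verb to noun, from V_N=2ção
print(apply_derivation_code("construção", "3ir"))   # noun to verb, from N_V=3ir
```

With this reading, each entry points at the other one, so the link between verb and noun can be followed in both directions.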