Recognising nominalisations
RECOGNISING NOMINALISATIONS

  • Supervisors: Dr. Alex Lascarides

    Dr. Mirella Lapata

  • (Andrew) Yuk On KONG

  • University of Edinburgh


DEFINITION

  • “Nominalisation refers to the process of forming a noun from some other word-class. (e.g. red+ness)

  • or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter ... from She answered the letter).

  • The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude) ...” (Crystal 1997)


  • Nominalisations (1st definition) from verbs only are considered here, e.g. "statement" from "state".

  • Problem: given a word, is it a noun, and if so, is it derived from a verb?

  • Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e. noun-forming suffixes are attached to verb bases).


EXCLUSIONS

  • Nominals, e.g. the poor, the wounded

  • Nominalisations not derived from verbs, e.g. redness

  • -ing forms, e.g. the making of the movie

  • Antidisestablish-ment-arian-ism


REGULAR?

  • Nominalise nominalisation

  • Interpret interpretation

  • Interrupt interruption

  • Associate association

  • delete deletion

  • break breakage

  • leak leakage


  • Confine confinement

  • Refine refinement (but define definition)

  • submit submission

  • admit admission (but also admittance)

  • remit remission; remittance; remit


VERB=NOUN

  • Debate Debate (not debation); debater

  • Pay pay

  • Love love

  • Boss boss

  • Stand stand

  • purchase purchase

  • Lie lie (“tell a lie”; cf. lie down)


VERB=NOUN (except stress)

  • transfer transfer

  • transport transport

  • import import

  • rebel rebel; (rebellion)


1 VERB, >1 NOUNS

  • Collect collection; collector

  • Interpret interpretation; interpreter

  • Cover cover; coverage

  • Conduct conduction; conductor

  • Depend dependant/dependent; dependence; dependency


SEMANTICS

  • Conduct conduction (conduct electricity/heat)

  • Conduct conduct (behave/organise)


WHEN TO USE WHICH SUFFIX

  • -tion/-sion

  • -er/-or

  • Debate debater

  • Talk talker

  • Collect collector

  • Conduct conductor


IRREGULAR NOMINALISATION

  • Choose choice

  • Succeed success; succession; successor

  • Decide decision

  • Sell sale


PSEUDO-NOMINALISATION

  • mote?? motion (mote is a noun: a very small piece of dust)

  • Depart Departure; Department???

  • Apart apartment????


WHY BOTHER?

  • The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks:

    • machine translation

    • information retrieval

    • automatic learning of machine-readable dictionaries

    • grammar induction


HOW?

  • Nominalisation is a productive morphological phenomenon:

  • Is it feasible to list all acceptable nominalised forms?

  • How should new words be handled?


TECHNIQUES NOT FOCUSING ON NOMINALISATIONS

  • build rules

  • machine-learning approaches to induce morphological structures using large corpora

  • knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001).


SCHONE AND JURAFSKY (2001)

  • Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants, combining:

    • Induced semantics—Latent Semantic Analysis (LSA)

    • Induced orthographic info

    • Induced syntactic info

    • Transitive information

    • Affix frequencies


GOAL OF THIS STUDY

  • The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.


EXPERIMENT 1 (baseline)

  • identify nouns using the tags in the corpus

  • identify potential nominalisations in the list of nouns by matching against a list of nominalisation suffixes

  • for each potential nominalisation, find the candidate verb (among the words tagged as verbs) that shares the greatest number of consecutive letters with it

  • accept a nominalisation-verb pair if more than 50% of the letters match, and discard all other pairs
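
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the suffix list is made up for the example, "letters in sequence" is read as the longest run of consecutive shared letters, and the percentage is taken relative to the length of the noun. The slides do not settle any of these choices.

```python
from difflib import SequenceMatcher

# Illustrative suffix list (assumption); the slides do not give the actual list.
SUFFIXES = ("tion", "sion", "ment", "age", "ance", "ence", "er", "or")


def shared_run(noun, verb):
    """Length of the longest run of consecutive letters found in both words."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    return matcher.find_longest_match(0, len(noun), 0, len(verb)).size


def baseline_pairs(tagged_words, threshold=0.5):
    """Map each suffix-bearing noun to its best-matching verb, if good enough.

    tagged_words is a list of (word, tag) tuples; tags starting with "NN"
    are treated as nouns and tags starting with "VV" as lexical verbs.
    """
    nouns = {w.lower() for w, tag in tagged_words if tag.startswith("NN")}
    verbs = sorted({w.lower() for w, tag in tagged_words if tag.startswith("VV")})
    pairs = {}
    for noun in sorted(nouns):
        if not verbs or not noun.endswith(SUFFIXES):
            continue
        best = max(verbs, key=lambda v: shared_run(noun, v))
        # Assumption: "% letters matched" is measured against the noun length.
        if shared_run(noun, best) / len(noun) > threshold:
            pairs[noun] = best
    return pairs


TOY_CORPUS = [
    ("statement", "NN1"), ("state", "VVB"),
    ("collection", "NN1"), ("collect", "VVB"),
    ("department", "NN1"), ("depart", "VVB"),
    ("apartment", "NN1"), ("dog", "NN1"),
]
print(baseline_pairs(TOY_CORPUS))
```

On this toy input the sketch accepts "department"/"depart" alongside the genuine pairs, which is exactly the pseudo-nominalisation problem raised earlier and one reason a plain letter-overlap baseline is only a starting point.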


EXPERIMENT 2

  • use a decision tree to build a model

  • possible features include:

    • letter similarity between verbs and nouns

    • suffix frequency

    • verb frequency

    • verb semantics

    • subject of noun

    • subject of verb
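
As a rough illustration of how the first three features might be computed for one candidate pair before being passed to a decision-tree learner, consider the sketch below. Everything in it is an assumption made for the example: the suffix list, the use of `difflib`'s similarity ratio as "letter similarity", and the toy frequency counts. The semantic and subject features are left out because they would need a parser and a semantic resource that the slides do not specify.

```python
from collections import Counter
from difflib import SequenceMatcher

# Illustrative suffix list (assumption); not taken from the slides.
SUFFIXES = ("tion", "sion", "ment", "age", "ance", "er", "or")


def suffix_of(noun):
    """Return the first listed suffix the noun ends with, or an empty string."""
    return next((s for s in SUFFIXES if noun.endswith(s)), "")


def pair_features(noun, verb, noun_counts, verb_counts):
    """Build a small feature dictionary for one (noun, verb) candidate pair."""
    suffix = suffix_of(noun)
    # Total corpus frequency of nouns that end in the same suffix.
    suffix_frequency = sum(
        count for word, count in noun_counts.items()
        if suffix and word.endswith(suffix)
    )
    return {
        "letter_similarity": SequenceMatcher(None, noun, verb, autojunk=False).ratio(),
        "suffix_frequency": suffix_frequency,
        "verb_frequency": verb_counts[verb],
    }


# Toy counts, invented for the example.
NOUN_COUNTS = Counter({"statement": 3, "agreement": 2, "collection": 4})
VERB_COUNTS = Counter({"state": 5, "collect": 1})
print(pair_features("statement", "state", NOUN_COUNTS, VERB_COUNTS))
```

Each dictionary would become one training row, labelled as a true or false nominalisation-verb pair.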


EVALUATION

  • Experiments will be based on the British National Corpus (BNC).

  • The obtained nominalisations will be evaluated against the CELEX morphological lexicon and manually annotated data.

  • Precision, recall and F-score
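
For reference, the three measures can be computed over sets of (nominalisation, verb) pairs as in the sketch below. It assumes the balanced F-score (F1); the slides do not say which weighting is intended, and the example pairs are invented.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for predicted pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall (assumption).
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: system output versus a small gold standard.
PREDICTED = {("statement", "state"), ("collection", "collect"), ("department", "depart")}
GOLD = {("statement", "state"), ("collection", "collect"),
        ("decision", "decide"), ("sale", "sell")}
print(precision_recall_f(PREDICTED, GOLD))
```

Here two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2 and the F-score is 4/7.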


BRITISH NATIONAL CORPUS

  • Over 100 million words

  • Corpus of modern English

  • Both spoken (10%) and written (90%)

  • Each word is automatically tagged by the CLAWS stochastic POS tagger

  • 65 different tags

  • encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)


    <item>
    <s n=086>
    <w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of <w NN2>prescriptions
    </item>
    <item>
    <s n=087>
    <w VVG>Daysitting <w CJC>and <w VVG>nightsitting
    </item>
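
A minimal sketch of pulling (word, tag) pairs out of markup like the excerpt above is given here. The regular expression covers only the simplified `<w TAG>word` pattern shown on the slide; the full BNC encoding has further elements and attributes that this does not handle, and the function name is made up for the example.

```python
import re

# Matches the simplified pattern "<w TAG>word" from the excerpt (assumption:
# tags consist of capitals, digits and hyphens, as in NN1 or NN1-VVG).
WORD_PATTERN = re.compile(r"<w ([A-Z0-9-]+)>([^<]+)")


def tagged_words(sgml):
    """Return a list of (word, tag) tuples found in a chunk of BNC-style SGML."""
    return [(word.strip(), tag) for tag, word in WORD_PATTERN.findall(sgml)]


SAMPLE = """<item>
<s n=086>
<w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of <w NN2>prescriptions
</item>"""

for word, tag in tagged_words(SAMPLE):
    print(tag, word)
```

Ambiguity tags such as NN1-VVG are kept whole, so a later step has to decide whether "Shopping" counts as a noun, a verb form, or both.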


CELEX

  • English, Dutch and German

  • Annotated by humans, using lemmata from two dictionaries of English

  • 52,446 lemmata and 160,594 wordforms

  • orthographic, phonological, morphological, syntactic and frequency information

  • morphological structure, e.g. ((celebrate),(ion))
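
To show how such a structure could yield a gold-standard base and suffix, here is a small sketch that reads the bracket notation exactly as written on the slide. Actual CELEX entries carry more annotation than this simplified form, so the pattern and the function name are assumptions for illustration only.

```python
import re

# Matches innermost bracketed parts such as "(celebrate)" or "(ion)".
PART_PATTERN = re.compile(r"\(([^(),]+)\)")


def parse_structure(structure):
    """Split a slide-style structure like "((celebrate),(ion))" into its parts."""
    return PART_PATTERN.findall(structure)


base, suffix = parse_structure("((celebrate),(ion))")
print(base, suffix)
```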


MILESTONES

  • 6/2002 Experiment 1—baseline

  • 7/2002 Experiment 2

  • 8/2002 Write-up

  • 9/2002 Finalise report

