A probabilistic approach to language structure



A probabilistic approach to language structure

Annarita Felici and Paul Pal

Royal Holloway, University of London

Helsinki 2-4 June 2008

[email protected] [email protected]


Outline

  • Field of investigation

  • Research goals

  • Data

  • Probabilistic analysis

  • Information Theory

  • Entropy results

QITL3


Field of investigation

  • Repetitive language structure in multilingual legal text

  • EU normative statements in translation

  • Languages of investigation

    • English, French, German and Italian

Field of investigation: legal norms

  • Deontic norms (from the Greek deon = duty).

     obligations, prohibitions, permissions and authorizations

  • Constitutive performatives

     The uttering of a performative is, or is part of, the doing of a certain kind of action or speech acts (Austin 1962)

    Uttering a sentence = doing things

Other norm types

  • Logical necessity

     necessary requirements or competences

  • Non-binding norms

     guidelines, correct procedure

Research goals

  • To evaluate the degree of prescriptive standardization in French, German and Italian with reference to English

  • To predict translation equivalents in French, German and Italian

under the conditions that:

  • English legal drafting is highly standardized

  • The EU and the main English drafting manuals recommend modal verbs for prescriptive norms (Coode 1843, Driedger 1976, Dickerson 1975, Thornton 1996)

  • Text types under investigation are repetitive and reusable

  • Text types under investigation can be more or less binding

Data

Multilingual parallel corpus

  • Origin: EU

  • Corpus size: 1,404,723 words

  • Text type: normative

  • Type of docs: Secondary Legislation (Regulations, Decisions, Directives, Recommendations)

  • Years: 2001-04

  • Languages: English, French, German, Italian

Probabilistic Analysis

Information Theory

To measure the number of linguistic alternatives available when translating a repetitive normative statement from English into French, German and Italian

= Quantifying information by reducing uncertainty

  • more alternatives = more uncertainty (high entropy)

  • fewer alternatives = more standardization, certainty (low entropy)

Probabilistic Variables

  • Categories of expressions

  • Linguistic forms

     English modals Entry point for parallel retrieval

     shall, must, may, can, should

Categories of expression

  • Constitutive norms and performatives

  • Logical necessity

  • Permissions and authorizations

  • Capability

  • Non-binding norms

Linguistic forms

  • Indicative (pres.)

  • Modal verbs (mv)

  • Verbal periphrasis (vp)

  • Lexicalized modal expressions (le)

  • Ellipses (0-correspondence)

Linguistic forms: linguistic equivalents used in constitutive and performative norms

Linguistic forms: linguistic equivalents used to convey permissions and authorizations

Information Theory

  • The information value or content h(p) depends on the probability of occurrence (p) of an event (Shannon 1949)

    h(p) = -log(p) = log(1/p)

    Entropy = degree of uncertainty

    (= shortage of information due to the large number of alternatives)
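
The relation h(p) = log(1/p) can be tried out with a minimal sketch. The slides do not state the base of the logarithm; base 2 (information measured in bits) is an assumption here, and the function name `information_content` is illustrative only.

```python
import math

def information_content(p: float) -> float:
    """Information value h(p) = log2(1/p), in bits (base 2 is an assumption)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log2(1.0 / p)

# A certain event (p = 1) carries no information; rarer events carry more.
for p in (1.0, 0.5, 0.2):
    print(f"p = {p:.2f} -> h(p) = {information_content(p):.3f} bits")
```

The less probable a translation choice is, the more information its occurrence carries.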

Probabilistic analysis

  • The frequency of occurrence (n_i) of each linguistic form is associated with a category

  • A probability variable (p_i) is derived from the estimated proportion of a particular linguistic form, p_i = n_i / n

Probabilistic analysis

  • In English

    p_1 = p_{mv→shall} = n_{shall} / n;   p_2 = p_{mv→must} = n_{must} / n;

    p_3 = p_{mv→should} = n_{should} / n;   p_4 = p_{mv→can} = n_{can} / n;

    p_5 = p_{mv→may} = n_{may} / n

  • In French, German and Italian

    p_1 = p_{indicative} + p_{mv} + p_{vp} + p_{me} + p_{ellipses};

    p_2 = p_{indicative} + p_{mv} + p_{vp} + p_{me} + p_{ellipses}

    and so on.
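
As a rough illustration of how such proportions are obtained from frequencies, the sketch below turns counts of linguistic forms into probabilities p_i = n_i / n. The counts are hypothetical placeholders, not figures from the corpus, and the names `counts` and `proportions` are illustrative only.

```python
# Hypothetical counts of target-language forms found for one English modal.
# These numbers are placeholders for illustration, not corpus results.
counts = {"indicative": 120, "mv": 60, "vp": 10, "me": 6, "ellipses": 4}

def proportions(form_counts: dict) -> dict:
    """Estimate p_i = n_i / n for every linguistic form."""
    n = sum(form_counts.values())
    if n == 0:
        raise ValueError("no occurrences to estimate from")
    return {form: n_i / n for form, n_i in form_counts.items()}

probs = proportions(counts)
for form, p in probs.items():
    print(f"{form:<10} p = {p:.3f}")
```

The estimated proportions of all forms sum to 1, which is what allows them to be treated as a probability distribution.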

Linguistic forms and frequencies of occurrence in the EU Regulation for the selected categories of 1) constitutive norms and 2) permissions and authorizations

Probabilistic approach

  • The sum of these probabilities produces different information values

  • The expected information content of a system is the sum of the information contents weighted by the probabilities for each possible outcome
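
The weighted sum described above is the Shannon entropy, H = Σ p_i · log(1/p_i). A minimal sketch follows, assuming base-2 logarithms and a hypothetical distribution over five linguistic forms; neither the base nor the numbers are taken from the slides.

```python
import math

def entropy(probs) -> float:
    """Expected information content H = sum of p_i * log2(1/p_i), in bits.

    Forms with p_i = 0 contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical distribution over five linguistic forms (illustration only).
skewed = [0.6, 0.3, 0.05, 0.03, 0.02]
print(f"H = {entropy(skewed):.3f} bits")
```

A distribution dominated by one form gives a low value, while a more even spread over the forms raises it.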

Entropy: extrema

  • Variations in the language-specific p(i) values of linguistic forms produce distribution profiles reflecting the characteristics of the corresponding language.

  • Mathematically it can be shown that

    If all the p(i) values are equal (equi-probable situation), the profile is a uniform distribution and results in maximum entropy.

    If a single p(i) equals 1 and the remaining p(i) values are zero, the entropy is minimum, i.e. zero (e.g. English).

    All other distributions lie between these two limits (e.g. French, German and Italian)
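
The two limits can be written out explicitly. The following is the standard result for n alternatives, using the convention 0 · log 0 = 0; the symbols n and k are introduced here for illustration.

```latex
\begin{align*}
H &= -\sum_{i=1}^{n} p_i \log p_i, \qquad \sum_{i=1}^{n} p_i = 1 \\
p_i = \tfrac{1}{n} \text{ for all } i \;&\Rightarrow\; H_{\max} = -\sum_{i=1}^{n} \tfrac{1}{n} \log \tfrac{1}{n} = \log n \\
p_k = 1,\; p_i = 0 \; (i \neq k) \;&\Rightarrow\; H_{\min} = -1 \cdot \log 1 = 0 \\
0 \;&\le\; H \;\le\; \log n
\end{align*}
```

With five linguistic forms, the upper limit is log 5, which is about 2.32 when the logarithm is taken to base 2.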

A concrete example

  • Regulation document in English, French, German and Italian + a fictitious language.

  • One category of expression: e.g. the constitutive norms.

  • 5 linguistic forms for this category.

  • Total number of modal verbs and alternatives: 2075.

Constitutive norm

Frequency of occurrences of expression modes in 4 real languages and one fictitious language

Histogram of 5 modes of expression

Comparison based on entropy

Computed entropy of the constitutive norm

EN H = 0 + H_mv + 0 + 0 + 0 = 0.405

FR H = H_ind + H_mv + H_vp + H_me + H_ellipses = 0.857

DE H = H_ind + H_mv + H_vp + H_me + H_ellipses = 1.08

IT H = H_ind + H_mv + H_vp + H_me + H_ellipses = 0.88

FI H = H_ind + H_mv + H_vp + H_me + H_ellipses = 2.32
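
The value reported for the fictitious language (FI) coincides with log2(5) ≈ 2.32, the maximum for five alternatives. The sketch below reproduces it under two assumptions that the transcript does not state: that logarithms are base 2, and that the fictitious language spreads the 2075 occurrences evenly over the five forms (415 each).

```python
import math

N_FORMS = 5
TOTAL = 2075  # total number of modal verbs and alternatives given in the slides

# Assumption: an even split of the total over the five forms (415 each).
counts = [TOTAL // N_FORMS] * N_FORMS
probs = [n_i / TOTAL for n_i in counts]

# Entropy in bits (base 2 is an assumption).
h_fi = sum(p * math.log2(1.0 / p) for p in probs)
print(f"H(FI) = {h_fi:.2f} bits")  # prints H(FI) = 2.32 bits
```

The per-form counts of the four real languages are not available in this transcript, so their entropy values are not recomputed here.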

Computed entropy of constitutive norms (English, French, German, Italian and fictitious)

Entropy results

  • In the EU Regulation according to the 5 categories of expression

    (1. Constitutive and performative norms, 2. Logical necessity, 3. Permissions and authorizations, 4. Capability, 5. Non-binding norms)

  • In the EU Secondary Legislation overall according to the 4 types of documents

    (Regulations, Decisions, Directives, Recommendations)

Entropy in the EU Regulation

Entropy results: EU Regulation

  • Logical necessity, permissions and authorizations, and capability (lower entropy)

    • quite standardized in the 4 languages = almost equivalent translations

  • Constitutive and performative norms (higher entropy)

    • translation is more difficult to predict

    • definitions, constitutive statements, obligations

    • FR: lower entropy than IT

    • DE: higher entropy (VP sein/haben…zu)

Entropy results: EU Regulation

  • Non-binding norms

    • a fair amount of variation among the 4 languages

    • FR/IT: higher entropy

    • DE: lower entropy ("should" is most likely translated with "sollen": Soll-Vorschriften)

Overall entropy in the 4 EU document types

Entropy results: EU Secondary Legislation

  • Regulations and Decisions (lower entropy)

    • Direct applicability of the norms = more precision and standardization

    • FR looks more standardized than IT and DE

  • Directives (higher entropy than Regulations and Decisions)

    • Binding only as to the result to be achieved

  • Recommendations (higher entropy)

    • Non-binding: more freedom

    • DE: sollen

Conclusions

  • Given certain conditions, it is possible to predict with some certainty the occurrence of a particular linguistic form

  • If applied to repetitive texts, entropy analysis can enhance research in language testing and evaluation, and in the development of automated translation tools

References

  • Austin, J. L. 1962. How to do things with words. Oxford: Oxford University Press.

  • Coode, G. 1843. Legislative Expressions. Appendix to the Report of the Poor Law Commissioners on Local Taxation. Published separately 1845, 2nd Ed. 1852.

  • Driedger, E. A. 1976. The Composition of legislation. Legislative forms and precedents (2nd Ed.). Ottawa: The Department of Justice.

  • Shannon, C. and W. Weaver. 1963 (1949). The mathematical theory of communication. Urbana: University of Illinois Press, USA.

  • Thornton, G. C. 1996. Legislative Drafting (4th Ed.). London: Butterworths.

  • http://publications.europa.eu/code/en/en-6000000.htm
