Lin6932 topics in computational linguistics
Download
1 / 41

LIN6932 Topics in Computational Linguistics - PowerPoint PPT Presentation


  • 134 Views
  • Uploaded on

LIN6932 Topics in Computational Linguistics. Lecture 7 Hana Filip. Overview. Admin Stuff: All have access to their lin6932 class account? Historical background Definition of formal grammars Chomsky Hierarchy Computational complexity of natural language. Historical Background.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'LIN6932 Topics in Computational Linguistics' - Gabriel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Lin6932 topics in computational linguistics

LIN6932 Topics in Computational Linguistics

Lecture 7

Hana Filip

LIN 6932


Overview
Overview

Admin Stuff: All have access to their lin6932 class account?

  • Historical background

  • Definition of formal grammars

  • Chomsky Hierarchy

  • Computational complexity of natural language

LIN 6932


Historical background
Historical Background

  • In 1949, Shannon and Weaver published The Mathematical Theory of

    Communication, showing that statistical approximations to English based on

    Markov processes could be used to encode English efficiently for

    transmission in noise. Tasks like machine translation could be solved by

    treating e.g. Russian as a noisy encoding of English.

  • In 1957, Chomsky published Syntactic Structures, showing that natural

    grammars could not be exactly captured by such methods. It seemed to follow

    that machine translation could not be modeled as a noisy channel (although

    the machines were too small to actually try any of this).

  • Chomsky made a point of being open to the idea that STATISTICS could guide

    grammatical processing.

  • Nevertheless, most work in computational linguistics switched to

    linguistically informed high-level SYMBOLIC representations of syntax and semantics,

    and small knowledge domains.

LIN 6932


Historical background1
Historical Background

Given the grammar of a language, one can study the use of the language

statistically in various ways; and the development of probabilistic models

for the use of language (as distinct from the syntactic structure of the

language) can be quite rewarding.

Chomsky, 1957:17, note 4

LIN 6932


Historical background2
Historical Background

  • Around 1988, the machines got big enough to try both techniques.

  • Surprisingly, low level statistical approximations such as Markov processes worked better than linguistically informed representations on almost all tasks, such as speech recognition, parsing free text, and MT itself.

  • Most work in computational linguistics switched to linguistically uninformed,

    low-level statistical approximations and machine learning.

  • Around 2000, the process of putting linguistics, statistical models, and

    machine learning, back together again.

LIN 6932


Historical background3
Historical Background

  • Chomsky’s 1957 book Syntactic Structures, together with certain more

    technical papers from around the same time, is one of the most important

    documents in linguistics and cognitive science.

  • The theory in detail has been completely superseded.

  • Nevertheless, surprisingly many of the formal devices that it includes recur,

    particularly in the most modern descendents, including: kernel sentences (aka

    lexicalized elementary trees or categories etc.); generalized or

    “double-based” transformations (aka Merge, combinatory rules,

    tree-adjunction, etc.; affix-hopping); the role of statistics

    in natural language processing.

LIN 6932


Historical background4
Historical Background

Affix Hopping as PF Merger and VP Ellipsis*

LIN 6932


Historical background5
Historical Background

One crucial ingredient of the theory was a hierarchy of language types, now known as CHOMSKY’S HIERARCHY, each type characterized by a class of rules that are sufficient to specify all languages of that type, an automaton which is sufficient to recognize whether a sentence is from a given language of that type, and a class of languages including all those of classes lower in the hierarchy as a proper subset.

Question: What is the expressive/generative power of natural languages?

Where are human languages located on this hierarchy?

LIN 6932


Formal grammar review
Formal Grammar (Review)

A formal grammar G is quadruple G = <N, T, S, R>

N a finite set of nonterminal symbols

T a finite set terminal symbols

S start symbol (S  N)

R rules (‘productions’)

  • Rules take the form ,

    “ rewrites as ”, where ,  are strings of symbols from the infinite set of strings (TN)* and  must contain at least one non-terminal symbol.

    Other conditions on the rules are imposed for particular classes of grammars.

LIN 6932


Chomsky hierarchy
Chomsky Hierarchy

Other conditions are imposed for particular classes of grammars.

The Chomsky hierarchy:

Types of grammars defined in terms of additional restrictions on the form of the rules:

Type 0: No restriction. Each rule is of the form   , and  ≠.

Type 1: Each rule is of the form A  , where A N and ≠.

Type 2: Each rule is of the form A  . ( may be .)

Type 3: Each rule is of the form A aB or A a.

Common names:

Type 0: Unrestricted rewriting systems (Turing Equivalent)

Type 1: Context-sensitive grammars.

Type 2: Context-free grammars.

Type 3: Right-linear, or regular, or finite state automata / finite state grammars.

LIN 6932



Chomsky hierarchy2
Chomsky Hierarchy

of classes (or families) of languages, originally defined by the form of the rules needed to generate the languages in those classes, but which can also be characterized at least in part by "dependencies" between elements that appear in the strings that make up the languages of those classes.

The smallest infinite class of languages in the Chomsky hierarchy is the class RL of regular languages. These are the languages that can be represented by regular expressions.

The next larger class of languages in the Chomsky hierarchy is the class CFL of context-free languages. Every regular language is also a context-free language, but the converse is not true.

LIN 6932


Chomsky chierarchy
Chomsky Chierarchy

TYPE O: UNRESTRICTED REWRITING SYSTEMS

no restrictions on the rules: L(G) = {w  T* | S * w}

a set of strings composed of terminal symbols derived from S

“*” is the reflexive and transitive closure of “”

“*” can be constructed as a set of operators each of which is

obtained by applying  successively 0 or more times.

  • Every recursively enumerable language can be described by a rewriting system: an algorithm that "enumerates" the strings of the language, I.e., this means that its output is simply a list of the members of L: w1, w2, w3, ... . If necessary, this algorithm may run forever.

  • Recursively enumerable languages are languages for which there is a decision procedure for determining for any arbitrary string that it is a well-defined string in the language, but not necessarily for determining for any arbitrary string not in the language that it is not.

  • Membership in a type-0 language is undecidable.

LIN 6932


Chomsky chierarchy1
Chomsky Chierarchy

CONTEXT-SENSITIVE GRAMMARS

  • Subclass of type-0 grammars

  • Restriction: all rules take the form

     A  

    length()  length() where AN, , ,   (TN)*,  ≠ 

    Membership in a context-sensitive language (CSL) is decidable.

    CLSs are languages for which there is a decision procedure for determining whether an arbitrary string does or does not belong to the language.

LIN 6932


Chomsky chierarchy2
Chomsky Chierarchy

  • Not all decidable languages are context-sensitive (but most are)

LIN 6932


Chomsky chierarchy3
Chomsky Chierarchy

CONTEXT-FREE GRAMMARS

  • Subclass of context-sensitive grammars

  • Restriction: rules take the form

    A  

    where AN,   (TN)+

  • Membership in context-free language (CFL) is decidable

LIN 6932


Chomsky chierarchy4
Chomsky Chierarchy

REGULAR GRAMMARS

  • Subclass of context-free grammars

  • Restriction: all rules take the form

    A  a or

    A  aB

    where A,B  N and a  T

    Membership is decidable.

    RGs are expressively equivalent to finite state automata, or Markov process.

    Automata viewed as either generators or acceptors.

    Grammars viewed as either generators or acceptors.

LIN 6932


Chomsky chierarchy5
Chomsky Chierarchy

REGULAR GRAMMARS

A  a or A  aB

where A,B  N and a  T

LIN 6932


Chomsky chierarchy6
Chomsky Chierarchy

Strong and Weak Equivalence

  • Two grammars are said to be “weakly equivalent” if they generate the same

    language or string set: cp. categorial grammars and phrase structure grammars.

  • Two grammars are “strongly equivalent” if they assign the same tree(s) to

    each string in the same language.

  • All grammars at a given level in the hierarchy have strongly equivalent

    grammars at higher levels, but not vice versa.

  • A grammar or class of grammars is said to be “strongly adequate” to the

    capture of a language or class of languages if it assigns the “right” trees to

    strings. The “right” tree is the one we need for semantic interpretation.

  • Weakly equivalent grammars which assign the “wrong” trees are said to be

    only “weakly adequate.”

LIN 6932


Chomsky chierarchy7
Chomsky Chierarchy

Categorial GrammarPhrase Structure Grammar

John loves Mary S

n <n, <n,t>> n NP VP

n <n,t> N V NP

t (semantic type of a sentence) John loves Mary

combinatorially transparent categories

LIN 6932


Nl and chomsky hierarchy
NL and Chomsky Hierarchy

Where are natural languages located?

Expressive power of human languages

  • The first three chapters of Chomsky 1957 show that human languages fall

    outside the lowest level of Regular/Finite State languages, and are at least at the level

    of Context Free Languages.

  • The proof requires a distinction between ideal linguistic capacity, now known

    as Competence and the Performance mechanism that actually processes

    sentences.

  • Competence allows sentences that are so long or convoluted that none of us

    will live long enough or have enough memory to process them. Performance

    cannot cope with them—but clearly this limitation is accidental, not a fact

    about English.

  • Syntactic Structures goes on to suggest that the level of human grammars is

    still higher in the hierarchy. It raises (but does not answer) the question of

    which level is just high enough to contain all human languages.

LIN 6932


Nl and chomsky hierarchy1
NL and Chomsky Hierarchy

Where are natural languages located?

Typical argument for the complexity of NL:

  • Find a recursive construction C in a natural language L (English)

  • Assume that the construction type in question is theoretically unbounded: i.e., in theory, speakers could go on producing ever longer instances of the construction.

  • Argue that the competence of speakers admits unlimited recursion (while the performance certainly poses an upper limit; competence vs. performance distinction!)

  • Reduce C to a formal expression of known complexity in language L’ via a homomorphism (a structure-preserving mapping)

  • Make a case that L must be at least as complex L’

  • Extrapolate from this one instance to all human languages: if there’s this one construction C in this one language that has this complexity, then the human language faculty must allow this in general.

LIN 6932


Nl and chomsky hierarchy2
NL and Chomsky Hierarchy

NL is not regular: Chomsky’s 1957 original argument

Structure of his argument:

Consider 3 hypothetical languages:

1. ab, aabb, aaabbb (anbn)

2. aa, bb, abba, baab, aaaa, bbbb, aabbaa, abbbba, … (palindromic)

3. aa, bb, abab, baba, aaaa, bbbb, aabaab, abbabb, aababaabab (copy language)

  • It can be shown that these are not regular languages

LIN 6932


Nl and chomsky hierarchy3
NL and Chomsky Hierarchy

The Pumping Lemma (a technique for proving that certain languages are not regular).

If L is an infinite finite automaton language (FAL) over alphabet A, then there are strings x,y,z A* such that y ≠ and xynz L for all n ≥ 0.

Why: machines for infinite languages must have loops. The string y in the lemma corresponds to a string accepted during a traversal of a loop.

Note that lemma does not say ‘iff’.

LIN 6932


Nl and chomsky hierarchy4
NL and Chomsky Hierarchy

The Pumping Lemma (a technique for proving that certain languages are not regular).

Example: L = {anbn | n ≥ 0 }

If L were a FAL, then, by Pumping Lemma, there would be x,y,z A* such that

y ≠ and xynz L for all n ≥ 0.

Assume that such x,y,z exist so the string xyz is in L. But by

definition of L it should be in the form anbn for some n. What’s y in this case? It can’t be empty, so it would consist of (1) some number of a’s, or (2) some number of b’s, or (3) some number of a’s followed by some number of b’s. But it is easy to see that in any of those cases the strings xyyz, xyyyz, etc. could not belong to L.

But the pumping theorem is not always useful for showing a language to be non-regular.

LIN 6932


Nl and chomsky hierarchy5
NL and Chomsky Hierarchy

NL is not regular - at least context-free power: Chomsky’s 1957 original argument:

LIN 6932


Nl and chomsky hierarchy6
NL and Chomsky Hierarchy

Therefore, Chomsky claims, English cannot be regular

“It is clear, then that in English we can find a sequence a+S1+b, where there is a dependency between a and b, and we can select as S1 another sequence c+S2+d, where there is a dependency between c and d … etc. A set of sentences that is constructed in this way … will have all of the mirror image properties of (2) which exclude (2) from the set of finite languages.” (Chomsky 1957, p.21)

Note: Chomsky writes “finite languages”, but he means “regular languages”.

LIN 6932


Nl and chomsky hierarchy7
NL and Chomsky Hierarchy

Chomsky’s argument: because English contains these constructions, which are not regular, English is not regular.

As stated, the argument is fallacious.

LIN 6932


Nl and chomsky hierarchy8
NL and Chomsky Hierarchy

How to state the observation correctly

LIN 6932


Nl and chomsky hierarchy9
NL and Chomsky Hierarchy

Similar point about center-embedding/nested dependencies

the cats that the dogchases miau

the dependency between the dog and chases nests within the dependency between the cats and miau

Assume the following homomorphism:

a = {the cat, the dog, the rat, …}

b = {chase, miau, bite, bark, …}

Then this is an instance of anbn

Chomsky’s argument:

  • Any useful syntactic analysis will relate the nouns to their corresponding verbs.

  • No FSA is capable of keeping track of center embeddings of arbitrary depth (which would be required since the grammatical subset of L is infinite). No FSA can provide a useful syntactic analysis for center-embedding.

    Therefore, since English has such constructions, English is non regular language.

LIN 6932



Nl and chomsky hierarchy11
NL and Chomsky Hierarchy

The language anbn corresponds to the context-free grammar

S  a S b

S  a b

It gives rise to the following tree for the string aaabbb

LIN 6932


Nl and chomsky hierarchy12
NL and Chomsky Hierarchy

Dissenting view 1:

  • all arguments to this effect use center-embedding/nested dependencies,cp.

    the cats that the dog chasesmiau

    the dependency between the dog and chases nests within the dependency between the cats and miau

  • humans are extremely bad at processing center-embedding

  • notion of competence that ignores this is dubious

  • natural languages are regular after all

LIN 6932


Nl and chomsky hierarchy13
NL and Chomsky Hierarchy

Dissenting view 2:

  • Any *finite* language is a regular language.

  • If you don't distinguish performance and competence, then English as a language certainly couldn't contain any sentence longer than the number of words a human being could utter in a lifetime. (This assumes human lifetimes are finite, but that seems uncontroversial.)

  • This may be a HUGE number, but it is definitely finite, and so without the distinction English is formally a finite language and therefore regular.

LIN 6932


Nl and chomsky hierarchy14
NL and Chomsky Hierarchy

Are natural languages context-free?

  • history of the question: Chomsky’s 1957 conjecture that natural

    languages are not context-free

  • In the 60’s and 70’s, many attempts to prove that NL is not context-free

  • Pullum and Gazdar 1982 (Generalized Phrase Structure Grammar):

    - all these attempts have failed- for all we know, natural languages (conceived as string sets) might be context-free

  • Huybregts 1984, Shieber 1985:

    - proof that Swiss German is not context-free, cross-serial dependencies

  • Culy 1985: proof that Bambara (a Northwestern Mande language

    spoken in Mali) ) is not context-free

LIN 6932


Nl and chomsky hierarchy15
NL and Chomsky Hierarchy

  • Nested and Crossing Dependencies

    Context-free languages -- unlike regular languages -- can have unbounded dependencies.

    However, these dependencies can only be nested, not crossing.

    Example:

    anbn has unlimited nested dependencies: context-free

    The copy language has unlimited crossing dependencies: not context-free

LIN 6932


Nl and chomsky hierarchy16
NL and Chomsky Hierarchy

  • Nested and Crossing Dependencies

    Bar-Hillel and Shamir (1960):

    • English contains copy-language (crossing dependencies)

    • Cannot be context-free

      John, Mary, David, ... are a widower, a widow, a widower, ..., respectively.

      Claim: the sentence is only grammatical under the condition that if the nth name is male (female) then the nth phrase after the copula is a widower (a widow)

LIN 6932


Nl and chomsky hierarchy17
NL and Chomsky Hierarchy

  • Nested and Crossing Dependencies

    suppose the claim is true, intersect English with regular language

    L1 =(Paul|Paula)+are(a widower|a widow)+respectively

    Result: Copy language L3 {ww|w(a|b)+}

    English  L1 = L2, homomorphism L2  L3

    John, David, Paul, …  aa widower  a

    Mary, Paula, Betty, …  ba widow  b

    are, respectively  

LIN 6932


Nl and chomsky hierarchy18
NL and Chomsky Hierarchy

  • Result: Copy language L3

    {ww|w(a|b)+}

    Copy language is not context-free

    Hence L2 is not

    Hence English is not

LIN 6932


Nl and chomsky hierarchy19
NL and Chomsky Hierarchy

Cross-serial dependencies in Dutch

Huybregt (1976)

  • Dutch has copy-language like structures

  • thus Dutch is not context-free

    dat Jan Marie Pieter Arabisch laat zien schrijven

    that jan marie pieter arabic let see write

    ‘that Jan let Marie see Pieter write Arabic’

LIN 6932


Nl and chomsky hierarchy20
NL and Chomsky Hierarchy

Counterargument

Crossing dependencies only concern argument linking, i.e., semantics

As far as plain strings are concerned, the relevant fragment of Dutch has the structure

NPnVn

which is context-free

LIN 6932


ad