
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 1 – Introduction)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

4th Jan, 2011

Persons involved
  • Faculty instructors: Dr. Pushpak Bhattacharyya (www.cse.iitb.ac.in/~pb)
  • TAs: Joydip Datta, Debarghya Majumdar {joydip,deb}@cse
  • Course home page (to be created)
    • www.cse.iitb.ac.in/~cs626-460-2011
Perspectivising NLP: Areas of AI and their inter-dependencies

  • Knowledge Representation
  • Search
  • Logic
  • Machine Learning
  • Planning
  • Expert Systems
  • NLP
  • Vision
  • Robotics

Books etc.
  • Main Text(s):
    • Natural Language Understanding: James Allen
    • Speech and NLP: Jurafsky and Martin
    • Foundations of Statistical NLP: Manning and Schutze
  • Other References:
    • NLP a Paninian Perspective: Bharati, Chaitanya and Sangal
    • Statistical NLP: Charniak
  • Journals
    • Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC
  • Conferences
    • ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML
Topics proposed to be covered
  • Shallow Processing
    • Part of Speech Tagging and Chunking using HMM, MEMM, CRF, and Rule Based Systems
    • EM Algorithm
  • Language Modeling
    • N-grams
    • Probabilistic CFGs
  • Basic Speech Processing
    • Phonology and Phonetics
    • Statistical Approach
    • Automatic Speech Recognition and Speech Synthesis
  • Deep Parsing
    • Classical Approaches: Top-Down, Bottom-Up and Hybrid Methods
    • Chart Parsing, Earley Parsing
    • Statistical Approach: Probabilistic Parsing, Tree Bank Corpora
Topics proposed to be covered (contd.)
  • Knowledge Representation and NLP
    • Predicate Calculus, Semantic Net, Frames, Conceptual Dependency, Universal Networking Language (UNL)
  • Lexical Semantics
    • Lexicons, Lexical Networks and Ontology
    • Word Sense Disambiguation
  • Applications
    • Machine Translation
    • IR
    • Summarization
    • Question Answering
Grading
  • Based on
    • Midsem
    • Endsem
    • Assignments
    • Paper-reading/Seminar

Except the first two, everything else is done in groups of 4. Weightages will be revealed soon.

What is NLP
  • Branch of AI
  • 2 Goals
    • Science Goal: Understand the way language operates
    • Engineering Goal: Build systems that analyse and generate language; reduce the man machine gap
The famous Turing Test: Language Based Interaction

Participants: a test conductor, a machine, and a human.

Can the test conductor find out which is the machine and which is the human?

Inspired Eliza
  • http://www.manifestation.com/neurotoys/eliza.php3
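
Eliza achieved the illusion of conversation with shallow pattern matching and canned responses rather than any understanding. A minimal sketch of the idea (the patterns and responses below are invented for illustration, not Weizenbaum's original script):

```python
import re
import random

# Illustrative (pattern, responses) pairs in the Eliza style;
# these are NOT Weizenbaum's original rules.
RULES = [
    (r"i need (.*)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"i am (.*)", ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r".*\bmother\b.*", ["Tell me more about your mother."]),
    (r".*", ["Please, go on.", "I see. Can you elaborate?"]),
]

def eliza_respond(utterance: str) -> str:
    """Return a canned, pattern-based response -- no real understanding."""
    text = utterance.lower().strip(" .!?")
    for pattern, responses in RULES:
        match = re.fullmatch(pattern, text)
        if match:
            return random.choice(responses).format(*match.groups())
    return "Please, go on."

print(eliza_respond("I need a vacation"))  # e.g. "Why do you need a vacation?"
```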
“What is it” question: NLP is concerned with Grounding

Ground the language into perceptual, motor and cognitive capacities.

Grounding

  • Examples: chair, computer

Two Views of NLP and the Associated Challenges

  • Classical View
  • Statistical/Machine Learning View

Stages of processing
  • Phonetics and phonology
  • Morphology
  • Lexical Analysis
  • Syntactic Analysis
  • Semantic Analysis
  • Pragmatics
  • Discourse
Phonetics
  • Processing of speech
  • Challenges
    • Homophones: bank (finance) vs. bank (river bank)
    • Near Homophones: maatraa vs. maatra (Hindi)
    • Word Boundary
      • aajaayenge (aa jaayenge (will come) or aaj aayenge (will come today))
      • I got [ua] plate
    • Phrase boundary
      • mtech1 students are especially exhorted to attend as such seminars are integral to one's post-graduate education
    • Disfluency: ah, um, ahem etc.
Morphology
  • Word formation rules from root words
  • Nouns: Plural (boy-boys); Gender marking (czar-czarina)
  • Verbs: Tense (stretch-stretched); Aspect (e.g. perfective sit-had sat); Modality (e.g. request khaanaa khaaiie)
  • A crucial first step in NLP
  • Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish
  • Languages poor in morphology: Chinese, English
  • Languages with rich morphology have the advantage of easier processing at the higher stages
  • A task of interest to computer science: Finite State Machines for Word Morphology
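
As a toy illustration of that finite-state view, here is a small rule cascade standing in for a plural-forming transducer; it covers only a few regular English patterns and is purely illustrative:

```python
def pluralize(noun: str) -> str:
    """Toy rules for a few regular English plural patterns."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"            # church -> churches
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"      # city -> cities
    return noun + "s"                 # boy -> boys

for w in ["boy", "church", "city", "day"]:
    print(w, "->", pluralize(w))
```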
Lexical Analysis
  • Essentially refers to dictionary access and obtaining the properties of the word

e.g., dog:

  • noun (lexical property)
  • take-’s’-in-plural (morph property)
  • animate (semantic property)
  • 4-legged (semantic property)
  • carnivore (semantic property)

Challenge: Lexical or word sense disambiguation
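
A minimal sketch of such a dictionary lookup for dog using WordNet through NLTK (assumes the nltk package and its wordnet data are installed):

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

for synset in wn.synsets("dog", pos=wn.NOUN)[:2]:
    print(synset.name(), "-", synset.definition())
    # hypernyms expose semantic properties such as 'canine' / 'carnivore'
    print("  hypernyms:", [h.name() for h in synset.hypernyms()])
```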

Lexical Disambiguation

First step: Part-of-Speech disambiguation

  • Dog as a noun (animal)
  • Dog as a verb (to pursue)

Sense Disambiguation

  • Dog (as animal)
  • Dog (as a very detestable person)

Needs word relationships in a context

  • The chair emphasised the need for adult education

Very common in day-to-day communication

Satellite Channel Ad: Watch what you want, when you want (two senses of watch)

e.g., Ground breaking ceremony/research
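
To make the first step concrete, a quick sketch with NLTK's off-the-shelf tokenizer and tagger (assumes nltk plus its punkt and tagger models are installed); the actual tags depend on the model, but ideally dog comes out as a noun in one sentence and a verb in the other:

```python
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

for sentence in ["The dog barked loudly.", "Reporters dog the minister everywhere."]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
# ideally: ('dog', 'NN') in the first sentence, ('dog', 'VBP') in the second
```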

Technological developments bring in new terms, additional meanings/nuances for existing terms
  • Justify as in justify the right margin (word processing context)
  • Xeroxed: a new verb
  • Digital Trace: a new expression
  • Communifaking: pretending to talk on mobile when you are actually not
  • Discomgooglation: anxiety/discomfort at not being able to access internet
  • Helicopter Parenting: over parenting
Syntax Processing Stage

Structure Detection, e.g., for “I like mangoes”:

(S (NP I) (VP (V like) (NP mangoes)))

Parsing Strategy
  • Driven by grammar
      • S-> NP VP
      • NP-> N | PRON
      • VP-> V NP | V PP
      • N-> Mangoes
      • PRON-> I
      • V-> like
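
A minimal sketch of this grammar with NLTK's chart parser (assumes nltk is installed; the V PP alternative is dropped here since no PP rules are given):

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N | PRON
VP -> V NP
N -> 'mangoes'
PRON -> 'I'
V -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I like mangoes".split()):
    print(tree)   # (S (NP (PRON I)) (VP (V like) (NP (N mangoes))))
```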
Challenges in Syntactic Processing: Structural Ambiguity
  • Scope

1. The old men and women were taken to safe locations

(old men and women) vs. ((old men) and women)

2. No smoking areas will allow hookahs inside

  • Preposition Phrase Attachment
      • I saw the boy with a telescope

(who has the telescope?)

      • I saw the mountain with a telescope

(world knowledge: mountain cannot be an instrument of seeing)

      • I saw the boy with the pony-tail

(world knowledge: pony-tail cannot be an instrument of seeing)

Such ambiguity is ubiquitous, e.g., the newspaper headline “20 years later, BMC pays father 20 lakhs for causing son’s death”

Structural Ambiguity…
  • Overheard
    • I did not know my PDA had a phone for 3 months
  • An actual sentence in the newspaper
    • The camera man shot the man with the gun when he was near Tendulkar
  • (P.G. Wodehouse, Ring in Jeeves) Jill had rubbed ointment on Mike the Irish Terrier, taken a look at the goldfish belonging to the cook, which had caused anxiety in the kitchen by refusing its ant’s eggs…
  • (Times of India, 26/2/08) Aid for kins of cops killed in terrorist attacks
Headache for Parsing: Garden Path sentences
  • Garden Pathing
    • The horse raced past the garden fell.
    • The old man the boat.
    • Twin Bomb Strike in Baghdad kill 25 (Times of India 05/09/07)
Semantic Analysis
  • Representation in terms of
      • Predicate calculus/Semantic Nets/Frames/Conceptual Dependencies and Scripts
  • John gave a book to Mary
      • Give action: Agent: John, Object: Book, Recipient: Mary
  • Challenge: ambiguity in semantic role labeling
    • (Eng) Visiting aunts can be a nuisance
    • (Hin) aapko mujhe mithaai khilaanii padegii (ambiguous in Marathi and Bengali too; not in Dravidian languages)
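
Returning to the John gave a book to Mary example above, one simple way to hold such a predicate-argument structure in code is a small frame-like record (the class and field names below are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class GiveEvent:
    """Frame for a giving event: give(Agent, Object, Recipient)."""
    agent: str
    obj: str
    recipient: str

event = GiveEvent(agent="John", obj="book", recipient="Mary")
print(event)   # GiveEvent(agent='John', obj='book', recipient='Mary')
```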
Pragmatics
  • Very hard problem
  • Model user intention
    • Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.
    • Boy (running upstairs and coming back panting): yes sir, they are there.
  • World knowledge
    • WHY INDIA NEEDS A SECOND OCTOBER (ToI, 2/10/07)
Discourse

Processing of a sequence of sentences

Mother to John: “John, go to school. It is open today. Should you bunk? Father will be very angry.”

  • Ambiguity of open
  • bunk what?
  • Why will the father be angry? Requires a complex chain of reasoning and application of world knowledge
  • Ambiguity of father: father as parent or father as headmaster

Complexity of Connected Text

John was returning from school dejected – today was the math test

He couldn’t control the class

Teacher shouldn’t have made him responsible

After all he is just a janitor

A look at Textual Humour
  • Teacher (angrily): did you miss the class yesterday? Student: not much.
  • A man coming back to his parked car sees the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill.
  • Son: mother, I broke the neighbour's lamp shade. Mother: then we have to give them a new one. Son: no need, aunty said the lamp shade is irreplaceable.
  • Ram: I got a Jaguar car for my unemployed youngest son. Shyam: That's a great exchange!
  • Shane Warne should bowl maiden overs, instead of bowling maidens over

Giving a flavour of what is done in NLP: Structure Disambiguation

Scope, Clause and Preposition/Postpositon

Structure Disambiguation is as critical as Sense Disambiguation
  • Scope (portion of text in the scope of a modifier)
    • Old men and women will be taken to safe locations
    • No smoking areas allow hookahs inside
  • Clause
    • I told the child that I liked that he came to the game on time
  • Preposition
    • I saw the boy with a telescope
Structure Disambiguation is as critical as Sense Disambiguation (contd.)
  • Semantic role
    • Visiting aunts can be a nuisance
    • Mujhe aapko mithaai khilaani padegii (“I have to give you sweets” or “You have to give me sweets”)
  • Postposition
    • unhone teji se bhaagte hue chor ko pakad liyaa (“he caught the thief that was running fast” or “he ran fast and caught the thief”)

All these ambiguities lead to the construction of multiple parse trees for each sentence and need semantic, pragmatic and discourse cues for disambiguation

Higher level knowledge needed for disambiguation

Semantics

I saw the boy with a pony tail (pony tail cannot be an instrument of seeing)

Pragmatics

((old men) and women) as opposed to (old men and women) in “Old men and women were taken to safe location”, since women, both young and old, were very likely taken to safe locations

Discourse:

No smoking areas allow hookahs inside, except the one in Hotel Grand.

No smoking areas allow hookahs inside, but not cigars.

Problem definition
  • 4-tuples of the form V N1 P N2
    • saw (V) boys (N1) with (P) telescopes (N2)
  • Attachment choice is between the matrix verb V and the object noun N1
Lexical Association Table (Hindle and Rooth, 1991 and 1993)
  • From a large corpus of parsed text
    • first find all noun phrase heads
    • then record the verb (if any) that precedes the head
    • and the preposition (if any) that follows it
    • as well as some other syntactic information about the sentence.
  • Extract attachment information from this table of co-occurrences
Example: lexical association
  • A table entry is considered a definite instance of the prepositional phrase attaching to the verb if:
    • the verb definitely licenses the prepositional phrase
  • E.g., from PropBank, the frame for absolve:
    • absolve.XX: NP-ARG0 NP-ARG2-of obj-ARG1
    • On Friday, the firms filed a suit *ICH*-1 against West Virginia in New York state court asking for [ARG0 a declaratory judgment] [rel absolving] [ARG1 them] of [ARG2-of liability].
Core steps

Seven different procedures for deciding whether a table entry is an instance of no attachment, sure noun attach, sure verb attach, or ambiguous attach

These procedures are able to extract frequency information, counting the number of times a particular verb or noun attaches with a particular preposition

Core steps (contd.)
  • These frequencies serve as the training data for the statistical model used to predict correct attachment
  • To disambiguate a sentence, compute the likelihood of the particular preposition given the particular verb and contrast with the likelihood of the preposition given the particular noun
    • i.e., compare P(with|saw) with P(with|telescope) as in I saw the boy with a telescope
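
A minimal count-based sketch of that comparison; the counts below are invented for illustration, and the real Hindle and Rooth model uses a likelihood-ratio test over counts harvested from parsed text:

```python
from collections import Counter

# Invented co-occurrence counts standing in for the lexical association table.
verb_prep_counts = Counter({("saw", "with"): 40})       # "with"-PP attached to the verb "saw"
noun_prep_counts = Counter({("telescope", "with"): 4})   # "with"-PP attached to the noun "telescope"
verb_counts = Counter({"saw": 120})
noun_counts = Counter({"telescope": 60})

def p_prep_given_verb(p, v):
    return verb_prep_counts[(v, p)] / verb_counts[v]

def p_prep_given_noun(p, n):
    return noun_prep_counts[(n, p)] / noun_counts[n]

v, n, p = "saw", "telescope", "with"
if p_prep_given_verb(p, v) >= p_prep_given_noun(p, n):
    print("attach the PP to the verb:", v)    # 0.33 vs 0.07 with these invented counts
else:
    print("attach the PP to the noun:", n)
```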
Critique

Limited by the number of relationships in the training corpora

Too large a parameter space

Model acquired during training is represented in a huge table of probabilities, precluding any straightforward analysis of its workings

Approach based on Transformation Based Error Driven Learning (Brill and Resnik, COLING 1994)
Example Transformations

Initial attachments by default are predominantly to N1.

Transformation rules with word classes

WordNet synsets and semantic classes are used.

Maximum Entropy Based Approach (Ratnaparkhi, Reynar, Roukos, 1994)

Use more features than (V N1) bigram and (N1 P) bigram

Apply Maximum Entropy Principle

Core formulation

We denote the partially parsed verb phrase, i.e., the verb phrase without the attachment decision, as a history h, and the conditional probability of an attachment as P(d|h), where d corresponds to a noun or verb attachment (0 or 1, respectively).

Features

Two types of binary-valued questions:

Questions about the presence of any n-gram of the four head words, e.g., a bigram may be V == ‘‘is’’, P == ‘‘of’’

Features comprised solely of questions on words are denoted as “word” features

Features (contd.)

Questions that involve the class membership of a head word

Binary hierarchy of classes derived by mutual information

Features (contd.)

Given a binary class hierarchy, we can associate a bit string with every word in the vocabulary.

Then, by querying the value of certain bit positions we can construct binary questions.

Features comprised solely of questions about class bits are denoted as “class” features, and features containing questions about both class bits and words are denoted as “mixed” features.
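
A sketch of how such binary questions could be generated for a head-word tuple; the feature-string format and the class bit-strings below are invented (in the paper the bit-strings come from a mutual-information word hierarchy):

```python
def word_features(v, n1, p, n2):
    """'Word' features: questions about n-grams of the four head words."""
    return [f"V={v}", f"N1={n1}", f"P={p}", f"N2={n2}",          # unigrams
            f"V={v},P={p}", f"N1={n1},P={p}", f"P={p},N2={n2}"]  # bigrams

# Invented bit-strings standing in for the mutual-information class hierarchy.
CLASS_BITS = {"saw": "0110", "telescope": "1010", "boy": "1001"}

def class_features(word, role):
    """'Class' features: questions about prefixes of a word's class bit-string."""
    bits = CLASS_BITS.get(word, "")
    return [f"{role}_class[:{i}]={bits[:i]}" for i in range(1, len(bits) + 1)]

print(word_features("saw", "boy", "with", "telescope"))
print(class_features("telescope", "N2"))
```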

Back-off model based approach (Collins and Brooks, 1995)

NP-attach:

(joined ((the board) (as a non executive director)))

VP-attach:

((joined (the board)) (as a non executive director))

Correspondingly,

NP-attach:

1 joined board as director

VP-attach:

0 joined board as director

Quintuple of (attachment A: 0/1, V, N1, P, N2), i.e., 5 random variables

Probabilistic formulation

We estimate p(A = 1 | V = v, N1 = n1, P = p, N2 = n2), or briefly p(1 | v, n1, p, n2).

If p(1 | v, n1, p, n2) ≥ 0.5, then the attachment is to the noun, else to the verb.

The Back-off estimate
  • Inspired by speech recognition
  • Prediction of the Nth word from previous (N-1) words

Data sparsity problem

f(w1, w2, w3, …, wn) will frequently be 0 for large values of n

Back-off estimate contd.

The cut-off frequencies (c1, c2, ...) are thresholds determining whether to back off or not at each level: counts lower than ci at stage i are deemed too low to give an accurate estimate, so in this case backing off continues.

Back-off for PP attachment

Note: the back off tuples always retain the preposition
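
A sketch of that back-off estimate (the count tables, cut-offs and example counts below are hypothetical); every back-off tuple keeps the preposition, and when the denominator count at a stage falls below its cut-off, the estimate backs off to the next stage:

```python
from collections import Counter

total = Counter()        # count of each tuple in the training data
noun_attach = Counter()  # how often that tuple occurred with noun attachment (A = 1)

def backoff_p_noun(v, n1, p, n2, cutoffs=(3, 3, 3)):
    """Estimate p(1 | v, n1, p, n2) with a Collins-and-Brooks-style back-off."""
    stages = [
        [(v, n1, p, n2)],                        # the full quadruple
        [(v, n1, p), (v, p, n2), (n1, p, n2)],   # triples, all retaining p
        [(v, p), (n1, p), (p, n2)],              # pairs, all retaining p
    ]
    for stage, cutoff in zip(stages, cutoffs):
        denom = sum(total[t] for t in stage)
        if denom >= cutoff:
            return sum(noun_attach[t] for t in stage) / denom
    # final back-off: the preposition alone (default 0.5 if never seen)
    return noun_attach[(p,)] / total[(p,)] if total[(p,)] else 0.5

# Hypothetical counts: the quadruple is too rare, so we back off to p("as") alone.
total[("joined", "board", "as", "director")] = 2
noun_attach[("joined", "board", "as", "director")] = 2
total[("as",)] = 50
noun_attach[("as",)] = 45

p_noun = backoff_p_noun("joined", "board", "as", "director")
print(p_noun, "-> noun attachment" if p_noun >= 0.5 else "-> verb attachment")
```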

Lower and upper bounds on performance

  • Lower bound: most frequent attachment (baseline)
  • Upper bound: human experts looking at the 4 head words only

Comparison with other systems

  • Maxent: Ratnaparkhi et al.
  • Transformation Learning: Brill et al.

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction: (Medimi Srinivas and Pushpak Bhattacharyya, IJCAI 2007)

Unsupervised approach (somewhat similar to Ratnaparkhi 1998): the training data is extracted from raw text

The unambiguous training data of the form V-P-N and N1-P-N2 teach the system how to resolve PP-attachment in the ambiguous test data of the form V-N1-P-N2

Refinement of the extracted training data, and use of N2 in the PP-attachment resolution process.

Flexible Unsupervised PP Attachment using WSD and Data Sparsity Reduction: (Medimi Srinivas and Pushpak Bhattacharyya, IJCAI 2007) (contd.)

PP-attachment is determined by the semantic property of lexical items in the context of preposition using WordNet

An iterative graph-based unsupervised approach is used for word sense disambiguation (similar to Mihalcea 2005)

Use of a Data Sparsity Reduction (DSR) process which uses lemmatization, synset replacement and a form of inferencing. The DSR process (DSRP) uses WordNet.

Flexible use of WSD and DSR processes for PP-Attachment

Experimental setup

Training Data:

Brown corpus (raw text). Corpus size is 6 MB; it consists of 51763 sentences, nearly 1,027,000 words.

Most frequent Prepositions in the syntactic context N1-P-N2: of, in, for, to, with, on, at, from, by

Most frequent Prepositions in the syntactic context V-P-N: in, to, by, with, on, for, from, at, of

Extracted unambiguous tuples: N1-P-N2: 54030 and V-P-N: 22362

Test Data:

Penn Treebank Wall Street Journal (WSJ) data extracted by Ratnaparkhi

It consists of V-N1-P-N2 tuples: 20801 (training), 4039 (development) and 3097 (test)

Experimental setup contd.

BaseLine:

The unsupervised approach by Ratnaparkhi, 1998 (Base-RP).

Preprocessing:

Upper case to lower case

Any four-digit number less than 2100 is treated as a year

Any other number or % sign is converted to num

Experiments are performed using DSRP: with different stages of DSRP

Experiments are performed using GuWSD and DSRP: with different senses

Data Sparsity Reduction: Inferencing

If V1-P-N1 and V2-P-N1 exist, as do V1-P-N2 and V2-P-N2, then if V3-P-Ni exists (i = 1, 2), we can infer the existence of V3-P-Nj (j ≠ i) with the frequency count of V3-P-Ni, which can be added to the corpus.

Example of DSR by inferencing

V1-P-N1: play in garden and V2-P-N1: sit in garden

V1-P-N2: play in house and V2-P-N2: sit in house

V3-P-N2: jump in house exists

Infer the existence of V3-P-N1: jump in garden
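
A small sketch of that inference over a table of unambiguous V-P-N counts; the tuples mirror the example above, and the function applies the rule one pass at a time:

```python
from collections import Counter
from itertools import combinations

# Unambiguous V-P-N counts mirroring the example above.
vpn = Counter({
    ("play", "in", "garden"): 3, ("sit", "in", "garden"): 5,
    ("play", "in", "house"): 2,  ("sit", "in", "house"): 4,
    ("jump", "in", "house"): 1,
})

def infer_once(vpn):
    """One pass of the DSR inference rule; returns the newly inferred tuples."""
    verbs = {v for (v, p, n) in vpn}
    preps = {p for (v, p, n) in vpn}
    nouns = {n for (v, p, n) in vpn}
    added = Counter()
    for p in preps:
        for n1, n2 in combinations(sorted(nouns), 2):
            # need two verbs seen with BOTH nouns under this preposition
            support = [v for v in verbs if (v, p, n1) in vpn and (v, p, n2) in vpn]
            if len(support) < 2:
                continue
            for v3 in verbs:
                for ni, nj in [(n1, n2), (n2, n1)]:
                    if (v3, p, ni) in vpn and (v3, p, nj) not in vpn:
                        added[(v3, p, nj)] += vpn[(v3, p, ni)]
    return added

print(infer_once(vpn))   # Counter({('jump', 'in', 'garden'): 1})
```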