CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 1 – Introduction)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
4th Jan, 2011
Persons involved
Faculty instructors: Dr. Pushpak Bhattacharyya (www.cse.iitb.ac.in/~pb)
Except for the first two, everything else is in groups of 4. Weightages will be revealed soon.
Can the test conductor find out which is the machine and which is the human?
Ground language in perceptual, motor and cognitive capacities.
Statistical/Machine Learning View
noun (lexical property)
take-’s’-in-plural (morph property)
animate (semantic property)
Challenge: Lexical or word sense disambiguation
First step: part-of-speech disambiguation
Needs word relationships in a context
Very common in day-to-day communication
Satellite Channel Ad: Watch what you want, when you want (two senses of watch)
e.g., ground-breaking ceremony/research
1. The old men and women were taken to safe locations
(old men and women) vs. ((old men) and women)
2. No smoking areas will allow hookas inside
I saw the boy with a telescope (who has the telescope?)
I saw the mountain with a telescope (world knowledge: a mountain cannot be an instrument of seeing)
I saw the boy with a pony tail (world knowledge: a pony tail cannot be an instrument of seeing)
Ubiquitous: newspaper headline “20 years later, BMC pays father 20 lakhs for causing son’s death”
Processing of a sequence of sentences
Mother to John:
John, go to school. It is open today. Should you bunk? Father will be very angry.
Ambiguity of open
Why will the father be angry?
Complex chain of reasoning and application of world knowledge
Ambiguity of father
father as parent
father as headmaster
John was returning from school dejected – today was the math test
He couldn’t control the class
Teacher shouldn’t have made him
After all, he is just a janitor
Scope, Clause and Preposition/Postposition
All these ambiguities lead to the construction of multiple parse trees for each sentence and need semantic, pragmatic and discourse cues for disambiguation
I saw the boy with a pony tail (pony tail cannot be an instrument of seeing)
((old men) and women) as opposed to (old (men and women)) in “Old men and women were taken to safe location”, since women, both young and old, were very likely taken to safe locations
No smoking areas allow hookas inside, except the one in Hotel Grand. (“no” quantifies over smoking areas)
No smoking areas allow hookas inside, but not cigars. (“no-smoking areas” read as a compound)
Seven different procedures for deciding whether a table entry is an instance of no attachment, sure noun attach, sure verb attach, or ambiguous attach
Able to extract frequency information, counting the number of times a particular verb or noun attaches with a particular preposition (see the sketch below)
Limited by the number of relationships in the training corpora
Too large a parameter space
Model acquired during training is represented in a huge table of probabilities, precluding any straightforward analysis of its workings
Attachments by default are to N1 (preferential noun attachment)
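A minimal sketch of the frequency-based attachment decision referenced above, assuming simple count tables over unambiguous cases; the table names and the tie-breaking rule are illustrative, not the procedure from the original work:

```python
from collections import Counter

# Illustrative sketch: count how often a verb or a noun is seen
# unambiguously attached to a preposition, then compare the two
# association strengths on an ambiguous V-N1-P case.
verb_prep = Counter()   # (verb, preposition) -> unambiguous attach count
noun_prep = Counter()   # (noun, preposition) -> unambiguous attach count

def record(head, prep, kind):
    """Record one unambiguous attachment; kind is 'V' or 'N'."""
    (verb_prep if kind == 'V' else noun_prep)[(head, prep)] += 1

def attach(v, n1, p):
    """Decide an ambiguous case by comparing raw attachment counts."""
    return 'noun' if noun_prep[(n1, p)] >= verb_prep[(v, p)] else 'verb'

record('sit', 'in', 'V')
record('garden', 'in', 'N'); record('garden', 'in', 'N')
print(attach('sit', 'garden', 'in'))   # -> 'noun'
```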
Use more features than (V N1) bigram and (N1 P) bigram
Apply Maximum Entropy Principle
the partially parsed verb phrase, i.e., the verb phrase without the attachment decision, as a history h, and
the conditional probability of an attachment as P(d|h),
where d ∈ {0, 1} and corresponds to a noun or verb attachment, respectively.
Two types of binary-valued questions:
Questions about the presence of any n-gram of the four head words, e.g., a bigram may be V == “is”, P == “of”
Features comprised solely of questions on words are denoted as “word” features
Questions that involve the class membership of a head word
Binary hierarchy of classes derived by mutual information
Given a binary class hierarchy,
we can associate a bit string with every word in the vocabulary
Then, by querying the value of certain bit positions we can construct binary questions about a word's class
Features comprised solely of questions about class bits are denoted as “class” features, and features containing questions about both class bits and words are denoted as “mixed” features.
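To make the feature machinery concrete, here is a hedged sketch of the three feature types (“word”, “class” and “mixed”) and the log-linear form of p(d|h) they feed into; the toy bit strings, weights and feature choices are assumptions for illustration, not the paper's actual model:

```python
import math

# Toy class codes standing in for the mutual-information bit hierarchy.
word_bits = {'joined': '1001', 'board': '1101', 'as': '0010', 'director': '0111'}

def word_feature(slot, word):
    """'Word' feature: fires when the head word in `slot` matches, e.g. P == 'of'."""
    return lambda h, d: int(h[slot] == word)

def class_feature(slot, pos, bit):
    """'Class' feature: fires when bit `pos` of the head word's class code is `bit`."""
    return lambda h, d: int(word_bits.get(h[slot], '?' * 8)[pos:pos + 1] == bit)

def mixed_feature(word_slot, word, class_slot, pos, bit):
    """'Mixed' feature: a word question combined with a class-bit question."""
    wf, cf = word_feature(word_slot, word), class_feature(class_slot, pos, bit)
    return lambda h, d: int(wf(h, d) and cf(h, d))

# p(d|h) in the usual log-linear form over weighted binary features;
# one weight vector per decision d plays the role of features conjoined with d.
features = [word_feature('P', 'as'), class_feature('N1', 0, '1')]
weights  = {0: [0.2, -0.1], 1: [0.7, 0.3]}   # invented weights

def p(d, h):
    score = {dd: math.exp(sum(w * f(h, dd)
                              for w, f in zip(weights[dd], features)))
             for dd in (0, 1)}
    return score[d] / (score[0] + score[1])

h = {'V': 'joined', 'N1': 'board', 'P': 'as', 'N2': 'director'}
print(round(p(1, h), 3))   # probability of a verb attachment (d = 1)
```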
(joined ((the board) (as a non executive director)))
((joined (the board)) (as a non executive director))
1 joined board as director
0 joined board as director
Quintuple of (attachment A: 0/1, V, N1, P, N2)
5 random variables
If the estimated probability of noun attachment is ≥ 0.5, then the attachment is to the noun, else to the verb
Data sparsity problem
f(w1, w2, w3, … wn) will frequently be 0 for large values of n
The cut-off frequencies (c1, c2, …) are thresholds determining whether to back off or not at each level:
counts lower than ci at stage i are deemed too low to give an accurate estimate, so in this case the model backs off to the next stage
Note: the back-off tuples always retain the preposition
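A minimal sketch of this backed-off estimation, assuming labelled quadruples and illustrative cut-offs; the 1 = noun-attachment convention and the final default are assumptions of the sketch:

```python
from collections import Counter

noun_c = Counter()   # sub-tuple -> count of noun attachments
all_c  = Counter()   # sub-tuple -> count of all attachments

def subtuples(v, n1, p, n2):
    # Every back-off tuple retains the preposition, as noted above.
    yield from [(v, n1, p, n2),
                (v, n1, p), (v, p, n2), (n1, p, n2),   # triples keep P
                (v, p), (n1, p), (p, n2),              # pairs keep P
                (p,)]

def train(examples):
    """examples: iterable of (d, v, n1, p, n2), d = 1 for noun attachment."""
    for d, v, n1, p, n2 in examples:
        for t in subtuples(v, n1, p, n2):
            all_c[t] += 1
            noun_c[t] += d

def p_noun(v, n1, p, n2, cutoffs=(1, 1, 1, 0)):
    stages = [[(v, n1, p, n2)],
              [(v, n1, p), (v, p, n2), (n1, p, n2)],
              [(v, p), (n1, p), (p, n2)],
              [(p,)]]
    for stage, c in zip(stages, cutoffs):
        total = sum(all_c[t] for t in stage)
        if total >= c and total > 0:    # enough evidence; else back off
            return sum(noun_c[t] for t in stage) / total
    return 1.0                          # default: noun attachment

train([(1, 'joined', 'board', 'as', 'director')])
print(p_noun('joined', 'board', 'as', 'director'))  # -> 1.0
# Decision rule: attach to the noun iff p_noun(...) >= 0.5.
```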
Looking at the 4 head words (V, N1, P, N2)
Ratnaparkhi et al.
Brill et al.
Unsupervised approach (somewhat similar to Ratnaparkhi, 1998): the training data is extracted from raw text
The unambiguous training data of the form V-P-N and N1-P-N2 teaches the system how to resolve PP-attachment in the ambiguous test data V-N1-P-N2
Refinement of the extracted training data, and use of N2 in the PP-attachment resolution process
PP-attachment is determined by the semantic properties of lexical items in the context of the preposition, using WordNet
An iterative graph-based unsupervised approach is used for word sense disambiguation (similar to Mihalcea, 2005)
Use of a Data Sparseness Reduction (DSR) process, which uses lemmatization, synset replacement and a form of inferencing; the DSR process (DSRP) uses WordNet (see the sketch after this list)
Flexible use of WSD and DSR processes for PP-Attachment
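As referenced above, a sketch of the lemmatization and synset-replacement steps of DSR, using NLTK's WordNet interface; taking the first synset is a simplification of this sketch (the actual process can instead use the sense chosen by the WSD step):

```python
from nltk.corpus import wordnet as wn        # needs nltk + wordnet data
from nltk.stem import WordNetLemmatizer

# Replacing a head word by a synset identifier merges counts for words
# that share that synset, which is the point of the step.
lemmatizer = WordNetLemmatizer()

def dsr_key(word, pos=wn.NOUN):
    lemma = lemmatizer.lemmatize(word.lower(), pos)
    synsets = wn.synsets(lemma, pos)
    return synsets[0].name() if synsets else lemma

print(dsr_key('houses'))              # e.g. 'house.n.01'
print(dsr_key('playing', wn.VERB))    # e.g. 'play.v.01'
```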
Brown corpus (raw text): 6 MB in size, consisting of 51,763 sentences and nearly 1,027,000 words.
Most frequent Prepositions in the syntactic context N1-P-N2: of, in, for, to, with, on, at, from, by
Most frequent Prepositions in the syntactic context V-P-N: in, to, by, with, on, for, from, at, of
Extracted unambiguous tuples: 54,030 N1-P-N2 and 22,362 V-P-N
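A hedged sketch of how such unambiguous tuples might be harvested from POS-tagged text; the adjacency checks are a strong simplification of whatever extraction heuristics were actually used:

```python
# A preposition directly following a verb yields a V-P-N tuple; one
# directly following a noun yields N1-P-N2.
def extract_unambiguous(tagged):            # tagged: list of (word, tag)
    tuples = []
    for i in range(1, len(tagged) - 1):
        w, t = tagged[i]
        if t != 'IN':                       # Penn Treebank preposition tag
            continue
        (pw, pt), (nw, nt) = tagged[i - 1], tagged[i + 1]
        if not nt.startswith('NN'):         # need a nominal object for P
            continue
        if pt.startswith('VB'):
            tuples.append(('V-P-N', pw, w, nw))
        elif pt.startswith('NN'):
            tuples.append(('N1-P-N2', pw, w, nw))
    return tuples

print(extract_unambiguous([('sit', 'VB'), ('in', 'IN'), ('garden', 'NN')]))
# -> [('V-P-N', 'sit', 'in', 'garden')]
```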
Penn Treebank Wall Street Journal (WSJ) data extracted by Ratnaparkhi
It consists of V-N1-P-N2 tuples: 20,801 (training), 4,039 (development) and 3,097 (test)
The unsupervised approach by Ratnaparkhi, 1998 (Base-RP).
Upper case is converted to lower case
Any four-digit number less than 2100 is treated as a year
Any other number or % sign is converted to num
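These normalization rules are simple enough to state directly in code; the token names 'year' and 'num' follow the description above, while the exact regexes are assumptions of this sketch:

```python
import re

# Lowercase everything, map four-digit numbers below 2100 to a year
# token, and any other number or '%' to a generic 'num' token.
def normalize(token):
    token = token.lower()
    if re.fullmatch(r'\d{4}', token) and int(token) < 2100:
        return 'year'
    if re.fullmatch(r'[\d.,]+%?|%', token):
        return 'num'
    return token

print([normalize(t) for t in ['Joined', '1988', '5.2%', 'board']])
# -> ['joined', 'year', 'num', 'board']
```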
Experiments are performed using DSRP, with different stages of DSRP
Experiments are performed using GuWSD and DSRP, with different senses
If V1-P-N1 and V2-P-N1 exist, as also do V1-P-N2 and V2-P-N2, and
V3-P-Ni exists (i = 1 or 2), then
we can infer the existence of V3-P-Nj (j ≠ i), with the frequency count of V3-P-Ni, which can be added to the corpus.
V1-P-N1: play in garden and V2-P-N1: sit in garden
V1-P-N2: play in house and V2-P-N2: sit in house
V3-P-N2: jump in house exists
Infer the existence of V3-P-N1: jump in garden
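A small sketch of this inference rule over a toy count table; the requirement of two shared verbs and the count transfer follow the rule as stated above, while the data is invented for illustration:

```python
from collections import Counter

# Toy counts matching the play/sit/jump example above.
counts = Counter({
    ('play', 'in', 'garden'): 4, ('sit', 'in', 'garden'): 3,
    ('play', 'in', 'house'): 2,  ('sit', 'in', 'house'): 5,
    ('jump', 'in', 'house'): 1,
})

def infer(counts):
    """For each V3-P-Ni, infer V3-P-Nj (j != i) when two other verbs are
    attested with BOTH nouns under P; carry over V3-P-Ni's count."""
    inferred = Counter()
    triples = list(counts)
    for (v3, p, ni), c in counts.items():
        for nj in {n for (_, q, n) in triples if q == p} - {ni}:
            sharers = {v for (v, q, n) in triples
                       if q == p and n in (ni, nj) and v != v3}
            both = {v for v in sharers
                    if (v, p, ni) in counts and (v, p, nj) in counts}
            if len(both) >= 2 and (v3, p, nj) not in counts:
                inferred[(v3, p, nj)] += c
    return inferred

print(infer(counts))   # -> Counter({('jump', 'in', 'garden'): 1})
```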