Lecture 5: Annotating Things & List Comprehensions
Methods in Computational Linguistics II, Queens College
Linguistic Annotation
Text only takes us so far.
People are reliable judges of linguistic behavior.
We can model with machines, but for “gold-standard” truth, we ask people to make judgments about linguistic qualities.
Part of Speech Tags
Syntactic parse trees
and many more
Techniques to process these.
Every corpus has its own format for linguistic annotation.
So we need to parse annotation formats.
<word ortho="The" pos="DET"></word>
<word ortho="Dog" pos="NN"></word>
<word ortho="is" pos="VB"></word>
<word ortho="fast" pos="JJ"></word>
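One way to read inline annotation like this is Python's standard xml.etree.ElementTree module. The sketch below assumes the attribute values use straight quotes and wraps the words in a <sentence> element, which is not on the slide but is needed because an XML document must have a single root.

```python
import xml.etree.ElementTree as ET

# The <sentence> wrapper is an assumption: XML requires exactly one root element.
annotated = """
<sentence>
  <word ortho="The" pos="DET"></word>
  <word ortho="Dog" pos="NN"></word>
  <word ortho="is" pos="VB"></word>
  <word ortho="fast" pos="JJ"></word>
</sentence>
""".strip()

root = ET.fromstring(annotated)

# Collect (word, tag) pairs from the attributes of every <word> element.
tagged = [(w.get("ortho"), w.get("pos")) for w in root.iter("word")]
print(tagged)
```

The result is a list of (word, tag) tuples, which is a convenient shape for the list comprehensions later in the lecture.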
The dog is fast.
1, 3, DET
5, 7, NN
16, 16, .
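Standoff annotation like this keeps the text and the labels separate and links them by character offsets. A minimal sketch, assuming the offsets are 1-based and inclusive (which is what the three rows shown suggest):

```python
text = "The dog is fast."

# (start, end, tag) rows copied from the slide; offsets assumed 1-based, inclusive.
standoff = [(1, 3, "DET"), (5, 7, "NN"), (16, 16, ".")]

# Python slices are 0-based and end-exclusive, so only the start shifts by one.
spans = [(text[start - 1:end], tag) for start, end, tag in standoff]
print(spans)  # [('The', 'DET'), ('dog', 'NN'), ('.', '.')]
```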
Task: Given a string of words, identify the parts of speech for each word.
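As a rough illustration of the task, here is a most-frequent-tag baseline: each word gets the tag it carried most often in some tagged training data. The training pairs, the function name tag_sentence, and the default tag for unseen words are all made up for this sketch; real taggers use far more data and context.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training pairs (not taken from a real corpus).
training = [
    ("the", "DET"), ("dog", "NN"), ("is", "VB"), ("fast", "JJ"),
    ("the", "DET"), ("fast", "NN"), ("is", "VB"), ("fast", "JJ"),
]

# Count how often each word appears with each tag.
counts = defaultdict(Counter)
for word, tag in training:
    counts[word][tag] += 1

def tag_sentence(sentence, default="NN"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [
        (w, counts[w.lower()].most_common(1)[0][0] if w.lower() in counts else default)
        for w in sentence.split()
    ]

print(tag_sentence("The dog is fast"))
```

Because "fast" is tagged JJ twice and NN once in the toy data, the baseline always picks JJ for it, whatever the surrounding words are.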
Surface level syntax.
Word Sense Disambiguation
Semantic Role labeling
Discourse, Topic, Sentence
Learn from Data.
Long term dependencies
Generate a parse tree.
Generate a Parse Tree from:
The surface form (words) of the text
Part of Speech Tags
S → VP
S → NP VP
NP → Det Nom
Nom → Noun
Nom → Adj Nom
VP → Verb Nom
Det → “A”, “The”
Noun → “I”, “John”, “Address”
Verb → “Gave”
Adj → “My”, “Blue”
Adv → “Quickly”
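To make the grammar concrete, the rules can be written as Python dictionaries and run through a small top-down parser with backtracking. This is only a sketch: the names RULES, LEXICON, expand, and parse are invented here, words are matched case-insensitively, and the approach works only because no rule is left-recursive.

```python
# Phrase-structure rules: each symbol maps to its alternative right-hand sides.
RULES = {
    "S": [["VP"], ["NP", "VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Noun"], ["Adj", "Nom"]],
    "VP": [["Verb", "Nom"]],
}

# Lexical rules: each part of speech maps to the words it can produce (lowercased).
LEXICON = {
    "Det": {"a", "the"},
    "Noun": {"i", "john", "address"},
    "Verb": {"gave"},
    "Adj": {"my", "blue"},
    "Adv": {"quickly"},
}

def expand(symbol, tokens, start):
    """Yield (tree, end) for every way `symbol` can cover tokens[start:end]."""
    if symbol in LEXICON:
        if start < len(tokens) and tokens[start].lower() in LEXICON[symbol]:
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in RULES[symbol]:
        # Each partial analysis is (children built so far, next token position).
        partial = [([], start)]
        for child in rhs:
            partial = [
                (children + [subtree], end)
                for children, pos in partial
                for subtree, end in expand(child, tokens, pos)
            ]
        for children, end in partial:
            yield (symbol, children), end

def parse(sentence):
    """Return every parse tree rooted at S that covers the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]

print(parse("The blue address gave my address"))
print(parse("John gave my address"))  # [] because NP must start with a Det
```

The second call returns an empty list: this grammar has no rule that lets a bare noun act as a noun phrase, so the sentence is rejected outright rather than partially analyzed.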
The grammar must be built by hand.
Can’t handle ungrammatical sentences.
Can’t resolve ambiguity.
sent = "That isn't the problem, Bob."
Compact way to process every item in a list.
[x for x in array]
Using the iterating variable, x, methods can be applied.
The value each call returns is stored in the resulting list.
[len(x) for x in array]
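A small runnable version of these two comprehensions. It assumes that array holds the whitespace-separated tokens of the example sentence; the slides do not say how array is built.

```python
sent = "That isn't the problem, Bob."
array = sent.split()  # assumption: simple whitespace tokenization

# Identity comprehension: builds a new list with the same items.
copy = [x for x in array]

# Apply a function to each item; the results form the new list.
lengths = [len(x) for x in array]

print(copy)     # ['That', "isn't", 'the', 'problem,', 'Bob.']
print(lengths)  # [4, 5, 3, 8, 4]
```

Punctuation stays attached to the tokens because split() only breaks on whitespace, which is why "problem," has length 8.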
Elements of the original list can be left out of the resulting list by adding an if condition.
[x for x in array if len(x) == 3]
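For comparison, here is the filtering comprehension next to the loop it replaces. The contents of array are an assumed example.

```python
array = ["That", "isn't", "the", "problem,", "Bob."]  # assumed example data

# Keep only the items whose length is exactly 3.
three = [x for x in array if len(x) == 3]

# The same thing written as an explicit loop.
three_loop = []
for x in array:
    if len(x) == 3:
        three_loop.append(x)

print(three)  # ['the']
```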
These can be combined to build up complicated lists.
[x.upper() for x in array if len(x) > 3 and x.startswith('t')]
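A quick run of the combined form on an assumed word list. Note that startswith is case-sensitive, so a capitalized "Then" is filtered out.

```python
array = ["that", "the", "thing", "is", "tiny", "Then"]  # assumed example data

# Filter first (longer than 3 characters and starting with lowercase 't'),
# then transform the survivors with upper().
result = [x.upper() for x in array if len(x) > 3 and x.startswith('t')]

print(result)  # ['THAT', 'THING', 'TINY']
```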
Lists can contain lists (or tuples).
[['a', 1], ['b', 2], ['d', 4]]
[('a', 1), ('b', 2), ('d', 4)]
[ [d, d*d] for d in array if d < 4]
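A runnable version of this comprehension, assuming array is a list of small integers. The tuple variant is shown alongside it, since either shape can serve as a row.

```python
array = [1, 2, 3, 4, 5]  # assumed example data

# Each element of the result is itself a two-item list: [d, d squared].
rows = [[d, d * d] for d in array if d < 4]

# Same rows as tuples, which cannot be modified after creation.
row_tuples = [(d, d * d) for d in array if d < 4]

print(rows)        # [[1, 1], [2, 4], [3, 9]]
print(row_tuples)  # [(1, 1), (2, 4), (3, 9)]
```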
This is another way we store tables.
Similar to nested dictionaries.
a = [[0, 1], [1, 0]]
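A short sketch of reading such a table. The first index picks the row and the second the column; the nested dictionary as_dict is a made-up equivalent included only for comparison.

```python
a = [[0, 1], [1, 0]]

# Row 0, column 1.
cell = a[0][1]

# A comprehension over the rows pulls out one column.
first_column = [row[0] for row in a]

# The same table as a nested dictionary keyed by row, then column.
as_dict = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}

print(cell)           # 1
print(first_column)   # [0, 1]
print(as_dict[0][1])  # 1
```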
Multiple lists can be processed in a single list comprehension. Each additional for clause nests inside the previous one, so every x is combined with every y.
[x*y for x in array1 for y in array2]
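With two for clauses the comprehension visits every (x, y) combination, like a nested loop. If the goal is instead to walk two lists in step, the built-in zip is the usual tool. The list contents below are assumed examples.

```python
array1 = [1, 2, 3]
array2 = [10, 20]

# Nested iteration: for each x, loop over every y (3 * 2 = 6 results).
products = [x * y for x in array1 for y in array2]
print(products)  # [10, 20, 20, 40, 30, 60]

# Pairwise iteration: zip matches items by position instead.
array3 = [10, 20, 30]
pairwise = [x * y for x, y in zip(array1, array3)]
print(pairwise)  # [10, 40, 90]
```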