Lecture 5: Annotating Things & List Comprehensions - PowerPoint PPT Presentation

Presentation Transcript
Lecture 5: Annotating Things& List Comprehensions

Methods in Computational Linguistics II

Queens College


Linguistic Annotation

Text only takes us so far.

People are reliable judges of linguistic behavior.

We can model with machines, but for “gold-standard” truth, we ask people to make judgments about linguistic qualities.


Example Linguistic Annotations

Sentence Boundaries

Part of Speech Tags

Phonetic Transcription

Syntactic parse trees

Speaker Identity

Semantic Role

Speech Act

Document Topic

Argument structure

Word Sense

…and many, many more.


We need…

Techniques to process these.

Every corpus has its own format for linguistic annotation.

So… we need to parse annotation formats.


The/DET Dog/NN is/VB fast/JJ ./.

<word ortho="The" pos="DET"></word>

<word ortho="Dog" pos="NN"></word>

<word ortho="is" pos="VB"></word>

<word ortho="fast" pos="JJ"></word>

The dog is fast.

1, 3, DET

5, 7, NN

9,10, VB

12,15, JJ

16, 16, .
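The slash-delimited format above can be parsed with a few lines of plain Python. A minimal sketch, not tied to any particular corpus toolkit:

```python
# Parse word/TAG pairs from a slash-delimited annotation line.
tagged = "The/DET Dog/NN is/VB fast/JJ ./."

# rsplit("/", 1) splits on the *last* slash, so the token "./." survives.
pairs = [tok.rsplit("/", 1) for tok in tagged.split()]

words = [w for w, t in pairs]
tags = [t for w, t in pairs]

print(pairs[0])   # ['The', 'DET']
print(tags)       # ['DET', 'NN', 'VB', 'JJ', '.']
```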


Constructing a linguistic corpus

  • Decisions that need to be made:

    • Why are you doing this?

    • What material will be collected?

    • How will it be collected?

      • Automatically?

      • Manually?

      • Found material vs. laboratory language?

    • What meta information will be stored?

    • What manual annotations are required?

      • How will each annotation be defined?

      • How many annotators will be used?

      • How will agreement be assessed?

      • How will disagreements be resolved?

    • How will the material be disseminated?

      • Is this covered by your IRB if the material is the result of a human subject protocol?


Part of Speech Tagging

Task: Given a string of words, identify the parts of speech for each word.


Part of Speech Tagging

Surface-level syntax.

A primary operation underlying later tasks:

Parsing

Word Sense Disambiguation

Semantic Role labeling

Segmentation

Discourse, Topic, Sentence


How is it done?

Learn from data.

Annotated data: supervised methods learn from human-labeled examples.

Unlabeled data: unsupervised methods must find structure without labels.
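One of the simplest ways to learn a tagger from annotated data is a most-frequent-tag baseline. A sketch with an invented toy training set (the words, tags, and the NN fallback here are illustrative assumptions, not from any real corpus):

```python
from collections import Counter, defaultdict

# Invented toy training data: (word, tag) pairs standing in for an annotated corpus.
train = [("the", "DET"), ("dog", "NN"), ("runs", "VB"),
         ("the", "DET"), ("run", "NN"), ("run", "VB"), ("run", "VB")]

# Count how often each word received each tag.
counts = defaultdict(Counter)
for word, t in train:
    counts[word][t] += 1

def tag(word):
    # Most frequent tag for known words; fall back to NN for unseen tokens.
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NN"

print(tag("run"))   # VB (tagged VB twice, NN once in training)
print(tag("cat"))   # NN (unseen token: default)
```

This baseline already illustrates the limitations on the next slide: unseen tokens get a guess, and uncommon interpretations of a word are never chosen.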



Limitations

Unseen tokens

Uncommon interpretations

Long term dependencies


Parsing

Generate a parse tree.


Parsing

Generate a Parse Tree from:

The surface form (words) of the text

Part of Speech Tokens




Context Free Grammars for Parsing

S → VP

S → NP VP

NP → Det Nom

Nom → Noun

Nom → Adj Nom

VP → Verb Nom

Det → "A", "The"

Noun → "I", "John", "Address"

Verb → "Gave"

Adj → "My", "Blue"

Adv → "Quickly"
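A grammar like this can be written down directly as a Python dict and checked with a naive top-down recognizer. A minimal sketch, not a full parser (and it would loop on left-recursive rules, which this grammar happens to avoid):

```python
# The grammar above as a dict; symbols not in the dict are literal words.
grammar = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Nom"]],
    "Nom":  [["Adj", "Nom"], ["Noun"]],
    "VP":   [["Verb", "Nom"]],
    "Det":  [["A"], ["The"]],
    "Noun": [["I"], ["John"], ["Address"]],
    "Verb": [["Gave"]],
    "Adj":  [["My"], ["Blue"]],
    "Adv":  [["Quickly"]],
}

def parse(symbol, words, i):
    """Return the set of positions where `symbol`, starting at `i`, can end."""
    if symbol not in grammar:                      # a word: match it directly
        return {i + 1} if i < len(words) and words[i] == symbol else set()
    ends = set()
    for production in grammar[symbol]:
        positions = {i}                            # expand the RHS left to right
        for sym in production:
            positions = {j for p in positions for j in parse(sym, words, p)}
        ends |= positions
    return ends

# A sentence parses iff some derivation of S consumes all of its words.
sentence = "The My Address Gave John".split()
print(parse("S", sentence, 0))   # {5}: S can cover all five words
```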


Limitations

The grammar must be built by hand.

Can’t handle ungrammatical sentences.

Can’t choose among multiple valid parses of an ambiguous sentence.


Probabilistic Parsing

  • Assign each transition a probability

  • Find the parse with the greatest “likelihood”

  • Build a table and count

    • How many times does each transition happen?
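The counting step can be sketched with a Counter over observed rule applications (the treebank sample below is invented for illustration):

```python
from collections import Counter

# Invented list of rule applications observed in some hand-parsed sentences.
observed = ["S -> NP VP", "NP -> Det Nom", "Nom -> Noun", "VP -> Verb Nom",
            "Nom -> Noun", "S -> NP VP", "NP -> Det Nom", "Nom -> Adj Nom",
            "Nom -> Noun", "VP -> Verb Nom"]

counts = Counter(observed)

def probability(rule):
    # Estimate P(rule) relative to all rules sharing the same left-hand side.
    lhs = rule.split(" -> ")[0]
    total = sum(n for r, n in counts.items() if r.startswith(lhs + " "))
    return counts[rule] / total

print(probability("Nom -> Noun"))     # 3 of the 4 Nom expansions: 0.75
print(probability("Nom -> Adj Nom"))  # 1 of the 4 Nom expansions: 0.25
```

A probabilistic parser then scores a whole parse tree by multiplying the probabilities of the rules it uses, and returns the highest-scoring tree.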


Segmentation

  • Sentence Segmentation

  • Topic Segmentation

  • Speaker Segmentation

  • Phrase Chunking

    • NP, VP, PP, SubClause, etc.


Split into words

sent = "That isn't the problem, Bob."

sent.split()

vs.

nltk.word_tokenize(sent)
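The difference matters for punctuation. A small comparison, using a rough regex stand-in for the tokenizer (nltk.word_tokenize goes further, e.g. splitting "isn't" into "is" + "n't", and requires NLTK's data files to be installed):

```python
import re

sent = "That isn't the problem, Bob."

# str.split breaks only on whitespace, so punctuation stays glued to words.
print(sent.split())
# ['That', "isn't", 'the', 'problem,', 'Bob.']

# A rough regex approximation of word tokenization: runs of word characters,
# or single punctuation marks.
tokens = re.findall(r"\w+|[^\w\s]", sent)
print(tokens)
# ['That', 'isn', "'", 't', 'the', 'problem', ',', 'Bob', '.']
```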


List Comprehensions

Compact way to process every item in a list.

[x for x in array]
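With a concrete list, the comprehension is shorthand for an explicit loop that appends each item:

```python
words = ["the", "dog", "is", "fast"]

# The explicit loop...
result = []
for x in words:
    result.append(x)

# ...builds the same list as the comprehension.
print(result == [x for x in words])   # True
```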


Methods

Methods and functions can be applied to the iteration variable, x.

Their return values are stored in the resulting list.

[len(x) for x in array]
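For example, assuming array holds word strings:

```python
array = ["the", "dog", "fast"]

# len() is applied to each string; the lengths fill the new list.
lengths = [len(x) for x in array]
print(lengths)   # [3, 3, 4]
```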


Conditionals

Elements of the original list can be omitted from the resulting list by adding a conditional statement.

[x for x in array if len(x) == 3]
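With a concrete list:

```python
words = ["the", "dog", "is", "fast", "cat"]

# Keep only the three-letter words; other elements are skipped entirely.
three_letter = [x for x in words if len(x) == 3]
print(three_letter)   # ['the', 'dog', 'cat']
```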


Building up

These can be combined to build up complicated lists.

[x.upper() for x in array if len(x) > 3 and x.startswith('t')]
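With a concrete list (the words below are invented for illustration):

```python
array = ["this", "that", "the", "token", "dog"]

# Filter first (length > 3 and starts with "t"), then transform with .upper().
result = [x.upper() for x in array if len(x) > 3 and x.startswith("t")]
print(result)   # ['THIS', 'THAT', 'TOKEN']
```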


Lists Containing Lists

Lists can contain lists

[[a, 1], [b, 2], [d, 4]]

...or tuples

[(a, 1), (b, 2), (d, 4)]

[ [d, d*d] for d in array if d < 4]
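Running the last comprehension on a concrete list of numbers:

```python
array = [1, 2, 3, 5]

# Each result element is itself a two-item list: the number and its square.
pairs = [[d, d * d] for d in array if d < 4]
print(pairs)   # [[1, 1], [2, 4], [3, 9]]
```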


Lists within lists are often called 2-d arrays

This is another way we store tables.

Similar to nested dictionaries.

a = [[0, 1], [1, 0]]

a[1][1]  # second row, second column: 0

a[0][0]  # first row, first column: 0


Using multiple lists

Multiple lists can be traversed in a single list comprehension; each for clause nests within the previous one, pairing every element of the first list with every element of the second.

[x*y for x in array1 for y in array2]
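With concrete lists, the nesting produces a cross product; for element-by-element pairing, zip is the usual tool:

```python
array1 = [1, 2, 3]
array2 = [10, 100]

# Two for-clauses nest: every x is paired with every y (a cross product).
product = [x * y for x in array1 for y in array2]
print(product)   # [10, 100, 20, 200, 30, 300]

# For element-by-element pairing, use zip instead.
pairwise = [x * y for x, y in zip(array1, array2)]
print(pairwise)  # [10, 200]
```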


Next Time

  • Word Similarity

    • Wordnet

  • Data structures

    • 2-d arrays.

    • Trees

    • Graphs

