
Presentation Transcript



Lecture 5: Annotating Things & List Comprehensions

Methods in Computational Linguistics II

Queens College



Linguistic Annotation

Text only takes us so far.

People are reliable judges of linguistic behavior.

We can model with machines, but for “gold-standard” truth, we ask people to make judgments about linguistic qualities.



Example Linguistic Annotations

Sentence Boundaries

Part of Speech Tags

Phonetic Transcription

Syntactic parse trees

Speaker Identity

Semantic Role

Speech Act

Document Topic

Argument structure

Word Sense

…and many, many more.



We need…

Techniques to process these.

Every corpus has its own format for linguistic annotation.

So… we need to parse annotation formats.


Example: the same sentence in three annotation formats

The/DET Dog/NN is/VB fast/JJ ./.

<word ortho="The" pos="DET"></word>
<word ortho="Dog" pos="NN"></word>
<word ortho="is" pos="VB"></word>
<word ortho="fast" pos="JJ"></word>

The dog is fast.

Character offsets (start, end) and tags:

1, 3, DET
5, 7, NN
9, 10, VB
12, 15, JJ
16, 16, .
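A minimal sketch of reading the slash-tagged line above into (word, tag) pairs; plain Python, assuming tags never themselves contain a slash:

tagged = "The/DET Dog/NN is/VB fast/JJ ./."
# split on whitespace, then split each token on its last "/" into word and tag
pairs = [tok.rsplit("/", 1) for tok in tagged.split()]
print(pairs)   # [['The', 'DET'], ['Dog', 'NN'], ['is', 'VB'], ['fast', 'JJ'], ['.', '.']]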



Constructing a linguistic corpus

  • Decisions that need to be made:

    • Why are you doing this?

    • What material will be collected?

    • How will it be collected?

      • Automatically?

      • Manually?

      • Found material vs. laboratory language?

    • What meta information will be stored?

    • What manual annotations are required?

      • How will each annotation be defined?

      • How many annotators will be used?

      • How will agreement be assessed?

      • How will disagreements be resolved?

    • How will the material be disseminated?

      • Is this covered by your IRB if the material is the result of a human subject protocol?



Part of Speech Tagging

Task: Given a string of words, identify the parts of speech for each word.
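As a quick illustration (assuming NLTK and its default tagger models are installed; the exact tags depend on the model):

import nltk

# tokenize the sentence, then tag each token with a part of speech
tokens = nltk.word_tokenize("The dog is fast.")
print(nltk.pos_tag(tokens))
# a list of (word, tag) pairs, e.g. [('The', 'DT'), ('dog', 'NN'), ...]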



Part of Speech Tagging

Surface-level syntax.

A primary operation that feeds downstream tasks:

Parsing

Word Sense Disambiguation

Semantic Role Labeling

Segmentation (discourse, topic, sentence)



How is it done?

Learn from data.

Annotated data: text paired with gold-standard labels.

Unlabeled data: raw text only.



Learn the association from Tag to Word
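A minimal sketch of one simple way to learn this association, a most-frequent-tag baseline; the toy data and names below are assumptions for illustration:

from collections import Counter, defaultdict

# toy annotated data; a real corpus supplies many tagged sentences
tagged_sentences = [[("The", "DET"), ("dog", "NN"), ("is", "VB"), ("fast", "JJ")]]

# count how often each word occurs with each tag
counts = defaultdict(Counter)
for sentence in tagged_sentences:
    for word, tag in sentence:
        counts[word][tag] += 1

def most_frequent_tag(word, default="NN"):
    # fall back to a default tag for unseen tokens (one of the limitations noted next)
    return counts[word].most_common(1)[0][0] if word in counts else default

print(most_frequent_tag("dog"))   # NN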



Limitations

Unseen tokens

Uncommon interpretations

Long-distance dependencies



Parsing

Generate a parse tree.



Parsing

Generate a Parse Tree from:

The surface form (words) of the text

Part of Speech Tokens



Parsing Styles





Context Free Grammars for Parsing

S → VP

S → NP VP

NP → Det Nom

Nom → Noun

Nom → Adj Nom

VP → Verb Nom

Det → “A”, “The”

Noun → “I”, “John”, “Address”

Verb → “Gave”

Adj → “My”, “Blue”

Adv → “Quickly”
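As a sketch, this grammar can be written in NLTK's CFG notation and run through a chart parser; the example sentence is an assumption chosen so the grammar can derive it:

import nltk

# the grammar above, in nltk.CFG notation
grammar = nltk.CFG.fromstring("""
    S -> VP
    S -> NP VP
    NP -> Det Nom
    Nom -> Noun
    Nom -> Adj Nom
    VP -> Verb Nom
    Det -> 'A' | 'The'
    Noun -> 'I' | 'John' | 'Address'
    Verb -> 'Gave'
    Adj -> 'My' | 'Blue'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["Gave", "My", "Address"]):
    print(tree)
# (S (VP (Verb Gave) (Nom (Adj My) (Nom (Noun Address)))))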



Limitations

The grammar must be built by hand.

Can’t handle ungrammatical sentences.

Can’t resolve ambiguity.



Probabilistic Parsing

  • Assign each transition a probability

  • Find the parse with the greatest “likelihood”

  • Build a table and count

    • How many times does each transition happen
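A minimal sketch of the same idea with NLTK's probabilistic CFG support; the probabilities here are invented for illustration rather than counted from data:

import nltk

# each rule carries a probability; rules sharing a left-hand side sum to 1.0
pgrammar = nltk.PCFG.fromstring("""
    S -> VP [0.3] | NP VP [0.7]
    NP -> Det Nom [1.0]
    VP -> Verb Nom [1.0]
    Nom -> Noun [0.6] | Adj Nom [0.4]
    Det -> 'The' [1.0]
    Noun -> 'Address' [1.0]
    Verb -> 'Gave' [1.0]
    Adj -> 'My' [1.0]
""")

# the Viterbi parser returns the single most probable parse
parser = nltk.ViterbiParser(pgrammar)
for tree in parser.parse(["Gave", "My", "Address"]):
    print(tree, tree.prob())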



Segmentation

  • Sentence Segmentation (see the sketch after this list)

  • Topic Segmentation

  • Speaker Segmentation

  • Phrase Chunking

    • NP, VP, PP, SubClause, etc.
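For sentence segmentation specifically, a readily available baseline (assuming NLTK's Punkt sentence tokenizer data are installed) is nltk.sent_tokenize:

import nltk

text = "Dr. Smith arrived. The dog is fast."
print(nltk.sent_tokenize(text))
# typically: ['Dr. Smith arrived.', 'The dog is fast.']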



Split into words

sent = "That isn't the problem, Bob."

sent.split()

vs.

nltk.word_tokenize(sent)
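The two behave differently on punctuation and contractions; a quick comparison, assuming import nltk and its tokenizer data:

import nltk

sent = "That isn't the problem, Bob."
print(sent.split())
# ['That', "isn't", 'the', 'problem,', 'Bob.']
print(nltk.word_tokenize(sent))
# ['That', 'is', "n't", 'the', 'problem', ',', 'Bob', '.']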



List Comprehensions

Compact way to process every item in a list.

[x for x in array]
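For example, with a small list of example words:

words = ["the", "dog", "is", "fast"]
[x for x in words]   # ['the', 'dog', 'is', 'fast']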



Methods

Functions and methods can be applied to the iterating variable, x.

Their return values are stored in the resulting list.

[len(x) for x in array]
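For example:

words = ["the", "dog", "is", "fast"]
[len(x) for x in words]   # [3, 3, 2, 4]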



Conditionals

Elements of the original list can be omitted from the resulting list by adding a conditional filter.

[x for x in array if len(x) == 3]
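For example, keeping only the three-letter words:

words = ["the", "dog", "is", "fast"]
[x for x in words if len(x) == 3]   # ['the', 'dog']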



Building up

These pieces can be combined to build up more complicated lists.

[x.upper() for x in array if len(x) > 3 and x.startswith('t')]
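For example:

words = ["the", "tiger", "is", "tame", "fast"]
[x.upper() for x in words if len(x) > 3 and x.startswith('t')]   # ['TIGER', 'TAME']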



Lists Containing Lists

Lists can contain lists

[["a", 1], ["b", 2], ["d", 4]]

...or tuples

[("a", 1), ("b", 2), ("d", 4)]

[[d, d*d] for d in array if d < 4]
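For example, with a list of numbers:

numbers = [1, 2, 3, 5, 8]
[[d, d*d] for d in numbers if d < 4]   # [[1, 1], [2, 4], [3, 9]]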



Lists within lists are often called 2-d arrays

This is another way we store tables.

Similar to nested dictionaries.

a = [[0, 1], [1, 0]]

a[1][1]   # 0 (row index 1, column index 1)

a[0][0]   # 0 (row index 0, column index 0)



Using multiple lists

Multiple lists can be combined in a single list comprehension; nested for clauses iterate over every pairing (the cross product) of the lists.

[x*y for x in array1 for y in array2]
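For example, every x is paired with every y:

xs = [1, 2, 3]
ys = [10, 100]
[x*y for x in xs for y in ys]   # [10, 100, 20, 200, 30, 300]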



Next Time

  • Word Similarity

    • Wordnet

  • Data structures

    • 2-d arrays.

    • Trees

    • Graphs

