Instructor: Smaranda Muresan, Columbia University, smara@ccls.columbia.edu

Course Introduction

Instructor: Smaranda Muresan

Columbia University

smara@ccls.columbia.edu

Natural Language Processing Applications

Information Extraction: identifying instances of facts (names/entities, relations, and events) in semi-structured or unstructured text, and converting them into structured representations (e.g., databases)

10TH DEGREE is a full service advertising agency specializing in direct and interactive marketing. Located in Irvine CA, 10TH DEGREE is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automative account. Experience in online marketing, automative and/or the advertising field is a plus. Assistant Account Manager Responsibilities Ensures smooth implementation of programs and initiatives Helps manage the delivery of projects and key client deliverables … Compensation: $50,000-$80,000
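To make the idea concrete, here is a minimal rule-based sketch (one of many possible IE approaches, not necessarily the one used in this course) that turns an abridged version of the ad above into a structured record, i.e., one database row. The field names and regular expressions are hypothetical and tailored to this single example; realistic extractors are usually learned from annotated data.

```python
import re

# Abridged copy of the job ad above (only the sentences the patterns need).
AD = ("10TH DEGREE is a full service advertising agency specializing in direct "
      "and interactive marketing. Located in Irvine CA, 10TH DEGREE is looking "
      "for an Assistant Account Manager to help manage and coordinate interactive "
      "marketing initiatives. Compensation: $50,000-$80,000")

# Hypothetical hand-written patterns, one capture group per text field.
PATTERNS = {
    "company": r"^(.+?) is a ",
    "location": r"Located in ([A-Z][a-z]+(?: [A-Z][a-z]+)* [A-Z]{2})",
    "job_title": r"looking for an? ([A-Z][A-Za-z ]+?) to ",
}
SALARY = r"Compensation: \$([\d,]+)-\$([\d,]+)"


def extract_record(text: str) -> dict:
    """Fill one flat record (a database row) from free text; unmatched fields stay None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    match = re.search(SALARY, text)
    if match:
        # Strip thousands separators so the range can be stored as integers.
        record["salary_min"], record["salary_max"] = (
            int(group.replace(",", "")) for group in match.groups()
        )
    else:
        record["salary_min"] = record["salary_max"] = None
    return record


print(extract_record(AD))
# → {'company': '10TH DEGREE', 'location': 'Irvine CA', 'job_title': 'Assistant Account Manager', 'salary_min': 50000, 'salary_max': 80000}
```

Each pattern fills one slot; a field stays None when its pattern does not fire, which is what a sparse database row would look like.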

Question Answering: IBM’s Watson
  • Won Jeopardy! on February 16, 2011

Bram Stoker


Watson has no discourse understanding

“Watson also tripped up on an ‘Olympic Oddities’ answer, but so imperceptibly that Alex Trebek didn’t notice at first, raising an important point of clarification. After Jennings responded incorrectly that Olympian gymnast George Eyser was ‘missing a hand’, Watson responded, ‘What is a leg?’”

http://www.wired.com/business/2011/02/watson-wrong-answer-trebek/

This Class

The journalist William Finnegan has said about his profession (New Yorker, July 2, 2012): “You fish for facts and instead pull up boatloads of speculation, some of it well informed, much of it trailing tangled agendas. You end up reporting not so much what happened as what people think or imagine or say happened.”

[Thanks Owen Rambow for this reference]

In this class we are interested in understanding communication through the eyes of the authors/speakers.


http://www.washingtonpost.com/blogs/erik-wemple/post/hurricane-sandy-nyse-not-flooded/2012/10/30/37532512-223d-11e2-ac85-e669876c6a24_blog.html

Syllabus Overview
  • http://www1.cs.columbia.edu/~smara/teaching/E6998/S14/
Outline
  • Instructor Introduction
    • Background, Research Interests
  • Student Introductions
  • Class Overview
    • Class organization
    • Website
    • Office Hours & TA
    • Topics covered in this class
    • Grading
Instructor Intro
  • Researcher at the Center for Computational Learning Systems

http://www1.cs.columbia.edu/~smara

Broad research interests: computational semantics, language in social media

Contrary meaning
  • Explicit: Conflicting statements/beliefs overtly expressed in text
  • Implicit: Sarcasm

User: I'm so happy I'm going back to the emergency room

User: Newspaper faces court over sleazing Facebook ? Facebook is so defenseless and innocent .

Prelim work on Sarcasm Detection

(Gonzalez, Muresan and Wacholder, 2011; Muresan et al., under review)

  • Can we automatically distinguish among sarcastic, positive and negative utterances?
  • Can we easily build a labeled corpus of naturally occurring sarcastic, positive and negative utterances?
How can we distinguish sarcastic, positive, and negative tweets?
  • Lexical Features
    • Pennebaker et al. (2007): LIWC lexicon (64 word categories grouped into four general classes):
      • Linguistic Processes (LP) (e.g., adverbs, pronouns),
      • Psychological Processes (PP) (e.g., positive and negative emotions),
      • Personal Concerns (PC) (e.g., work, achievement),
      • Spoken Categories (SC) (e.g., disfluencies);
    • WordNet Affect (WNA) (Strapparava and Valitutti, 2004)
    • A list of interjections (e.g., ah, oh, yeah) and punctuation marks (e.g., !, ?).
  • We merged all of the lists into a single dictionary.
  • The token overlap between the words in the combined dictionary and the words in the tweets was 85%.
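The merge and the coverage figure can be sketched as follows. The word lists are tiny hypothetical stand-ins (the real LIWC and WordNet-Affect lexicons are far larger, and LIWC also uses wildcard stems), and "token overlap" is assumed here to mean the share of tweet tokens found in the merged dictionary.

```python
from collections import defaultdict

# Hypothetical miniature stand-ins for the real lexicons (category -> words).
LIWC_LIKE = {
    "LP_pronoun": {"i", "you"},        # Linguistic Processes
    "PP_posemo": {"happy", "love"},    # Psychological Processes: positive emotion
    "PP_negemo": {"sad", "hate"},      # Psychological Processes: negative emotion
    "PC_work": {"work", "job"},        # Personal Concerns
    "SC_filler": {"like", "um"},       # Spoken Categories
}
WORDNET_AFFECT_LIKE = {"WNA_joy": {"happy", "glad"}, "WNA_sadness": {"sad", "gloomy"}}
INTERJECTIONS = {"interjection": {"ah", "oh", "yeah"}}
PUNCTUATION = {"punct_exclaim": {"!"}, "punct_question": {"?"}}


def merge_lexicons(*lexicons):
    """Merge category->words maps into one word->categories dictionary."""
    merged = defaultdict(set)
    for lexicon in lexicons:
        for category, words in lexicon.items():
            for word in words:
                merged[word].add(category)
    return dict(merged)


def token_coverage(tweets, dictionary):
    """Share of all (pre-tokenized, lowercased) tweet tokens found in the dictionary."""
    tokens = [token for tweet in tweets for token in tweet]
    if not tokens:
        return 0.0
    return sum(token in dictionary for token in tokens) / len(tokens)


DICTIONARY = merge_lexicons(LIWC_LIKE, WORDNET_AFFECT_LIKE, INTERJECTIONS, PUNCTUATION)
tweets = [["i", "love", "work", "!"], ["oh", "um", "sad", "today"]]
print(sorted(DICTIONARY["happy"]))         # → ['PP_posemo', 'WNA_joy']
print(token_coverage(tweets, DICTIONARY))  # → 0.875
```

Mapping each word to a set of categories keeps words that appear in several source lists (such as "happy") from being double-counted as separate entries.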
How can we distinguish sarcastic, positive, and negative tweets?
  • Pragmatic features
    • Emoticons (, )
    • ToUser (@john)
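The two pragmatic cues can be detected with simple patterns. This sketch assumes a tiny, hypothetical emoticon inventory and treats any @mention not preceded by a word character as a ToUser cue (so e-mail addresses are skipped).

```python
import re

# Hypothetical emoticon inventory; a real system would use a much larger list.
POSITIVE_EMOTICON = re.compile(r"[:;]-?[)D]")   # :) :-) ;) :D
NEGATIVE_EMOTICON = re.compile(r":-?\(")        # :( :-(
TO_USER = re.compile(r"(?<!\w)@\w+")            # @john, but not the @ inside an e-mail address


def pragmatic_features(tweet: str) -> dict:
    """Count the pragmatic cues in a raw (untokenized) tweet."""
    return {
        "pos_emoticon": len(POSITIVE_EMOTICON.findall(tweet)),
        "neg_emoticon": len(NEGATIVE_EMOTICON.findall(tweet)),
        "to_user": len(TO_USER.findall(tweet)),
    }


print(pragmatic_features("@john I'm so happy I'm going back to the emergency room :)"))
# → {'pos_emoticon': 1, 'neg_emoticon': 0, 'to_user': 1}
```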
Classification experiments
  • Several settings
    • S-N-P (sarcastic, negative, positive; 900 examples each; balanced datasets)
    • S-NS (sarcastic vs. non-sarcastic; NS contains 450 negative and 450 positive tweets)
    • S-N (900 examples each)
    • S-P (900 examples each)
  • 2 classifiers
    • Support vector machines (SVM)
    • Logistic regression (LogR)
  • Features used:
    • 1) unigrams;
    • 2) presence of the dictionary-based lexical factors and pragmatic factors (LIWC+_P);
    • 3) frequency of the dictionary-based lexical factors and pragmatic factors (LIWC+_F);
    • 4) combination of unigrams and presence features.
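The four feature settings differ only in how a tweet is mapped to features. A minimal sketch, assuming a small hypothetical word-to-category dictionary in which emoticon tokens are stored as extra (pragmatic) categories, and assuming that "frequency" means raw counts per category:

```python
from collections import Counter

# Hypothetical merged dictionary (word -> categories); emoticon tokens carry
# pragmatic categories so lexical and pragmatic factors share one mechanism.
DICTIONARY = {
    "happy": {"posemo"}, "love": {"posemo"}, "sad": {"negemo"},
    "oh": {"interjection"}, "!": {"exclamation"}, ":)": {"pos_emoticon"},
}
CATEGORIES = sorted({cat for cats in DICTIONARY.values() for cat in cats})


def unigrams(tokens):                  # setting 1: bag of words (counts)
    return Counter(f"w={tok}" for tok in tokens)


def _category_counts(tokens):
    counts = Counter()
    for tok in tokens:
        counts.update(DICTIONARY.get(tok, ()))
    return counts


def liwc_presence(tokens):             # setting 2: LIWC+_P, 1 if a category fires at all
    counts = _category_counts(tokens)
    return {f"cat={c}": int(counts[c] > 0) for c in CATEGORIES}


def liwc_frequency(tokens):            # setting 3: LIWC+_F, raw count per category
    counts = _category_counts(tokens)
    return {f"cat={c}": counts[c] for c in CATEGORIES}


def unigrams_plus_presence(tokens):    # setting 4: settings 1 and 2 combined
    features = dict(unigrams(tokens))
    features.update(liwc_presence(tokens))
    return features


tokens = ["oh", "i", "love", "love", "mondays", "!"]
print(liwc_presence(tokens))
# → {'cat=exclamation': 1, 'cat=interjection': 1, 'cat=negemo': 0, 'cat=pos_emoticon': 0, 'cat=posemo': 1}
print(liwc_frequency(tokens))
# → {'cat=exclamation': 1, 'cat=interjection': 1, 'cat=negemo': 0, 'cat=pos_emoticon': 0, 'cat=posemo': 2}
```

Each function returns a sparse feature map that could then be vectorized (for example with scikit-learn's DictVectorizer) and passed to an SVM or logistic regression classifier; that pipeline is an assumption for illustration, not a description of the original experimental setup.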
Human performance on the task
  • Two studies
    • 1) We asked 3 judges to classify 10% of our S-N-P dataset (90 randomly selected tweets per category). We also trained our SVM and LogR classifiers on the remaining 90% of the data.
    • 2) We asked another 3 judges to classify 10% of the S-NS dataset (90 per category; the NS category contained 45 positive and 45 negative tweets). We also trained SVM and LogR on the remaining 90% of the data.
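The split in both studies can be sketched as a per-category random holdout. The function name, the seed, and the placeholder tweets are hypothetical; holding out 10% of each original category (S, N, P) before merging N and P into NS reproduces the counts above: 90 sarcastic tweets, plus 45 negative and 45 positive tweets in NS.

```python
import random
from collections import Counter


def holdout_per_category(examples_by_label, fraction=0.10, seed=0):
    """Randomly hold out `fraction` of every category; return (held_out, train) lists of (example, label)."""
    rng = random.Random(seed)                  # fixed seed for a reproducible split
    held_out, train = [], []
    for label in sorted(examples_by_label):    # sorted so the split does not depend on dict order
        shuffled = list(examples_by_label[label])
        rng.shuffle(shuffled)
        k = round(len(shuffled) * fraction)
        held_out += [(ex, label) for ex in shuffled[:k]]
        train += [(ex, label) for ex in shuffled[k:]]
    return held_out, train


# Study 1: S-N-P with 900 placeholder tweets per category.
snp = {lab: [f"{lab}{i}" for i in range(900)] for lab in ("S", "N", "P")}
judged_1, train_1 = holdout_per_category(snp)
print(len(judged_1), len(train_1))             # → 270 2430

# Study 2: split S, N (450) and P (450) separately, then merge N and P into NS.
sns = {"S": snp["S"], "N": snp["N"][:450], "P": snp["P"][:450]}
judged_2, train_2 = holdout_per_category(sns)
to_ns = {"S": "S", "N": "NS", "P": "NS"}
judged_2 = [(ex, to_ns[lab]) for ex, lab in judged_2]
train_2 = [(ex, to_ns[lab]) for ex, lab in train_2]
print(dict(Counter(lab for _, lab in judged_2)))  # → {'NS': 90, 'S': 90}
```

Splitting before merging guarantees the 45/45 balance inside NS; sampling 90 tweets from the merged NS pool would only give that balance on average.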
Humans on S-N-P
  • Overall agreement of 50% was achieved among the three judges, with a Fleiss’ kappa of 0.4788 (p < .05). The average accuracy was 62.59%.
  • When we considered only the 135 of 270 tweets on which all three judges agreed, the accuracy on that set was 86.67% (this can be seen as an upper bound).
Humans on S-NS
  • Results showed an agreement of 71.67% among the three judges, with a Fleiss’ kappa of 0.5861 (p < .05). The average accuracy was 66.85%.
  • When we considered only the cases where all three judges agreed (129 out of 180), the accuracy on that set was 82.95% (this can be seen as an upper bound).
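The agreement figures in the two human studies are consistent with "agreement" meaning the share of tweets on which all three judges agreed: 135/270 = 50% and 129/180 ≈ 71.67%. The sketch below computes that rate and Fleiss' kappa, kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item agreement among judge pairs and P_e is the agreement expected by chance from the overall category proportions. The toy counts are hypothetical, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of judges who put item i in category j.

    Every item must be rated by the same number of judges (at least 2).
    """
    n_items, n_raters = len(counts), sum(counts[0])
    # Observed agreement: per-item share of agreeing judge pairs, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(column) for column in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def unanimous_rate(counts):
    """Share of items on which all judges chose the same category."""
    n_raters = sum(counts[0])
    return sum(max(row) == n_raters for row in counts) / len(counts)


# Hypothetical toy data: 4 tweets, 3 judges, 2 labels.
toy = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(unanimous_rate(toy))          # → 0.5
print(round(fleiss_kappa(toy), 4))  # → 0.3333
```

The toy data shows why the two numbers differ: half of the items are unanimous, but the split items still contribute partial pairwise agreement, and chance agreement (0.5 with two balanced labels) is subtracted out.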
Discussion
  • A hard task for both automatic classifiers and humans
  • Some judges reported specific difficulties:
    • Lack of context (e.g., world knowledge; context of conversation)
    • Brevity of messages

Other issues/observations: we will have a whole class on sarcasm detection.

Detecting Conflicting Information
  • Explicit Contrary Meaning

User1: A shooting has just occurred at the Occupy Oakland encampment.

User2: Shootings happen in Oakland all the time and it had nothing to do with the Occupy movement.

User1: This shooting does have something to do with the Occupy movement because many of the witness's are the Occupiers and it happened only a few yards away from the encampment.

User3: On Twitter, Occupy Oakland has said the shooting was "related to the occupation. Please keep this man in your thoughts."

Impact
  • Conflicting statements/beliefs can signal:
    • anomalies in events (e.g., different theories about the cause of an event),
    • anomalies in beliefs (change in beliefs),
    • deception/lying
    • misinformation
    • misconception
Recognizing Textual Entailment (RTE)
  • Given two text fragments, the Text (T) and the Hypothesis (H), predict whether a human reader would say:
    • That H is true, given T
    • That H contradicts T
    • That it can’t be determined whether or not H is true given T

T: John Smith, who was 65, resigned yesterday.

H: 65-year-old Mr. Smith left office.

T: UberSoft CEO Bill Jobs

H: Frank N. Furter is CEO of Ubersoft

Approach

Framed as a 2-way textual entailment problem (contradictory vs. non-contradictory).

Assume utterances are about the same topic/event

T: A case of indigenously acquired rabies infection has been confirmed.

H: No case of rabies was confirmed.

[Figure: three-stage pipeline applied to this T/H pair: 1. Linguistic analysis (dependency graphs of "A case of rabies infection" and "No case of rabies", with det, amod, and prep_of edges); 2. Graph alignment; 3. Contradiction features & classification, where feature values (0.00, 0.10, -0.75, 1.84, -2.00) yield a score that is compared with a tuned threshold to decide "contradicts" vs. "doesn't contradict". Also labeled: Event coreference.]
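A heavily simplified sketch of the last stage follows. It swaps the dependency-graph alignment for exact token matching and uses two hand-set features (a negation mismatch, and the share of H content words aligned to T); the cue list, the weights, and the threshold are all hypothetical stand-ins for learned weights and a tuned threshold.

```python
# Hypothetical negation cues, feature weights, and threshold (a real system learns
# the weights and tunes the threshold on development data).
NEGATION_CUES = {"no", "not", "never", "none", "n't"}
WEIGHTS = {"polarity_mismatch": 2.0, "alignment_coverage": 1.0}
THRESHOLD = 2.5


def align(t_tokens, h_tokens):
    """Toy alignment: link each H token to an identical T token (stands in for graph alignment)."""
    t_vocab = set(t_tokens)
    return [(tok, tok) for tok in h_tokens if tok in t_vocab]


def contradiction_features(t_tokens, h_tokens):
    h_content = [tok for tok in h_tokens if tok not in NEGATION_CUES]
    aligned = align(t_tokens, h_content)
    t_negated = any(tok in NEGATION_CUES for tok in t_tokens)
    h_negated = any(tok in NEGATION_CUES for tok in h_tokens)
    return {
        # 1.0 when exactly one of T and H is negated.
        "polarity_mismatch": float(t_negated != h_negated),
        # Share of H's non-negation tokens that align to T (a rough same-event signal).
        "alignment_coverage": len(aligned) / len(h_content) if h_content else 0.0,
    }


def contradicts(t_tokens, h_tokens):
    """Linear score over the features, compared with the threshold."""
    features = contradiction_features(t_tokens, h_tokens)
    score = sum(WEIGHTS[name] * value for name, value in features.items())
    return score > THRESHOLD


T = "a case of indigenously acquired rabies infection has been confirmed".split()
H1 = "no case of rabies was confirmed".split()
H2 = "a case of rabies was confirmed".split()
H3 = "no flights were cancelled".split()
print(contradicts(T, H1), contradicts(T, H2), contradicts(T, H3))  # → True False False
```

The third call shows why alignment matters: H3 is negated but shares no content with T, so it is not flagged, in line with the assumption that both utterances are about the same topic/event.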

Student Introduction
  • Your education: PhD/Master/Undergrad and year
  • Have you taken an NLP course?
  • Have you taken an ML course?
  • Are you doing, or have you done, research in NLP? If yes, briefly say in what area.
  • Any other info you want to share with the class?
Outline
  • Instructor Introduction
    • Background, Research Interests
  • Student Introductions
  • Class Overview
    • Class organization
    • Office Hours & TA
    • Website/details of topics covered in this class
    • Grading
Class organization (except first two lectures)
  • 50 min: discussion of research articles on the topic of the week, led by students (the topic is introduced the previous week)
    • There will be 2 papers per class for discussion
    • 25 min each (15 min presentation, 10 min discussion)
  • 5 min break
  • 30 min: in-depth lecture/open questions on the topic of the week
  • 25 min: introductory lecture on the topic of the following week (to facilitate paper discussion)
Office Hours
  • Instructor office hours:
    • Thursday 6:00-7:00 (after class) or by appointment if needed
  • TA: Arpit Gupta
  • TA office hours: TBA
Class Website
  • http://www1.cs.columbia.edu/~smara/teaching/E6998/S14/
  • Check the top of the page for announcements.
Extracting social/interactional meaning
  • Sentiment (Positive or negative)
    • Movies, products, or politics: is a text positive or negative?

“The movie was great”

    • How can we automatically detect sentiment (at the word level and at the text level)?

  • Emotion (sad, happy) and mood (depressed)
    • Detecting expression of emotion/mood in language
    • Applications:
      • Annoyance in talking to dialog systems
      • Uncertainty of students in tutoring
      • Detecting Trauma or Depression
  • Hedging & Beliefs
    • Committed Belief (CB): the writer/speaker (W/S) firmly believes p

“John will arrive at 6”

    • Non-committed Belief (NCB): W/S weakly believes p

“John may arrive at 6”

    • Reported Belief (RB): W/S is reporting someone else’s belief

“John said he would arrive at 6”

How can we automatically detect/tag beliefs?
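The two "how can we" questions on this slide can be given naive baselines: summing word-level polarities from a lexicon to get a text-level label, and tagging belief type from surface cues. Both lexicons below are hypothetical and tiny; they only illustrate the word-level vs. text-level distinction and the CB/NCB/RB categories, not a competitive method.

```python
import re

# Hypothetical word-level polarity lexicon: +1 positive, -1 negative.
POLARITY = {"great": 1, "love": 1, "happy": 1, "awful": -1, "hate": -1, "boring": -1}
# Hypothetical surface cues for the belief categories above.
REPORTING_CUES = {"said", "says", "claimed", "reported", "told"}
HEDGE_CUES = {"may", "might", "could", "perhaps", "possibly", "probably"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def word_polarity(word):
    """Word level: look the word up in the lexicon (0 if absent)."""
    return POLARITY.get(word, 0)


def text_sentiment(text):
    """Text level: sign of the summed word polarities."""
    score = sum(word_polarity(tok) for tok in tokenize(text))
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


def belief_tag(sentence):
    """Tag the writer's/speaker's belief in the main proposition as RB, NCB, or CB."""
    tokens = set(tokenize(sentence))
    if tokens & REPORTING_CUES:
        return "RB"    # reporting someone else's belief
    if tokens & HEDGE_CUES:
        return "NCB"   # hedged, weakly held belief
    return "CB"        # default: plain assertion


print(text_sentiment("The movie was great"))  # → positive
print([belief_tag(s) for s in ("John will arrive at 6", "John may arrive at 6",
                               "John said he would arrive at 6")])  # → ['CB', 'NCB', 'RB']
```

Such baselines break down on exactly the phenomena listed under Sarcasm: the lexicon labels “I love shopping on Black Friday” as positive.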

Extracting social/interactional meaning
  • Sarcasm
    • Expressing the contrary of one’s actual sentiments or beliefs

“I love shopping on Black Friday”

“A Shooting in Oakland? That NEVER happened”

  • Agreement/Disagreement
    • Agreement vs. disagreement with propositions (and people)
  • Perspective
    • An aggregate of a person’s beliefs and sentiments w.r.t. a topic/event/proposition
    • How can we detect perspective automatically?
  • Deception
    • Automatic ways to identify deceptive language
Extracting social/interactional meaning
  • Power
    • Different types of power: e.g., hierarchical, influence
    • Applications:
      • Find influential people in online communities, or those who want to become influential
      • Target ads to influential people in a community
  • Extracting Social Networks from text
    • Analyze online discussions to identify who the participants are and how they are related (beyond metadata)
    • Social network of characters from novels
  • Personality and Interpersonal Stance
    • Romantic interest, flirtation, friendliness
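As a starting point for both applications, a reply graph can be built from who responds to whom, and people can be ranked by how many others they interact with. Note that this sketch uses only reply metadata, which is exactly the baseline these slides aim to go beyond; the (author, replied_to) post format and the degree-based ranking are assumptions chosen for illustration, with the speakers taken from the Occupy Oakland exchange earlier in these slides and User3's reply target assumed.

```python
def build_reply_graph(posts):
    """posts: (author, replied_to) pairs, with replied_to None for a thread starter.

    Returns an undirected weighted graph: person -> {other person: number of replies between them}.
    """
    graph = {}
    for author, replied_to in posts:
        graph.setdefault(author, {})
        if replied_to is None or replied_to == author:
            continue
        graph.setdefault(replied_to, {})
        graph[author][replied_to] = graph[author].get(replied_to, 0) + 1
        graph[replied_to][author] = graph[replied_to].get(author, 0) + 1
    return graph


def most_connected(graph, k=1):
    """Rank people by degree (number of distinct interlocutors), ties broken by name."""
    return sorted(graph, key=lambda person: (-len(graph[person]), person))[:k]


# The Occupy Oakland exchange as a thread (User3's reply target is assumed).
posts = [("User1", None), ("User2", "User1"), ("User1", "User2"), ("User3", "User1")]
graph = build_reply_graph(posts)
print(graph["User1"])              # → {'User2': 2, 'User3': 1}
print(most_connected(graph, k=3))  # → ['User1', 'User2', 'User3']
```

Text-derived signals discussed in this class (agreement/disagreement, sentiment, power) could be attached to these edges as labels, which is where analysis goes beyond the metadata.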
Grading
  • Critical Discussion of one of the research articles (40% of grade)
    • Brief Presentation in class about the paper
    • Lead a critical discussion on key positive and negative aspects
    • Full list of papers posted by Tuesday, Jan 28, 11:59pm.
    • Students select their top 5 papers before class on Jan 30.
    • The TA/instructor assigns papers by Feb 1, 5pm, based on preference; conflicts are resolved first-come, first-served.
  • Project about a topic discussed in class (or related) (60%)
    • Computational Implementation
    • Can be individual or team of 2-3
    • Project Proposal (5th week of classes; receive feedback by week 6)
    • Literature review on the chosen topic (9th week of classes; receive feedback by week 10)
    • Final paper – conference/workshop format (8 pages) (last week of classes)
    • Final project presentation (last week of classes)
Next Class
  • Computational Models for learning semantic lexicons
  • 2 papers for reading/discussion
    • I will lead the discussion of one of the research articles to set up a model of what’s expected
    • The second paper will be a free discussion (unless there is a volunteer to present one of the papers)
Resources
  • ACL anthology
    • All the proceedings of main conferences in NLP as well as major journals.
    • http://aclweb.org/anthology/

(In recent years, authors have been encouraged to submit datasets and code.)

  • Linguistic Data Consortium
    • Annotated corpora
    • http://catalog.ldc.upenn.edu/

(If you are interested in access to some corpora for your project, ask the instructor; most likely we have them.)