The Problem

Our Solution

Possible Class Projects

Dispute Finder

Rob Ennals

robert.ennals@intel.com

Not everything on the web is true, balanced, and objective

The great thing about the web is that anyone can say whatever they want.

The bad thing is that they have.

Newspapers are Dying

The end of objective, fact-checked media sources that needed to appeal to an audience with a wide range of views?

False Consensus

People like to fit in, so they tend to believe whatever they think is the consensus.

Groups spend a lot of money trying to create false consensus by sending their message from multiple seemingly independent sources.
Dispute Finder

Encourage Skepticism by telling a user when information they encounter in their lives is disputed.

Reduce False Consensus by showing people evidence that other points of view are socially acceptable and well justified.
Highlight Disputed Claims on the Web

Show Other Points of View from sources you trust

How Dispute Finder Works

Client: Browser Extension

- Runs a textual entailment NLP algorithm, looking for known disputed claims.

- Low-compute JavaScript

- Sees the pages the user browses

Server: Web site with API

- Stores a user-editable set of disputed claims, paraphrases, etc.

- Arbitrary computation

- Can't see the user's browsing

Project Suggestions

Find disputes on the Web

Textual entailment of claims [Beth]

User-guided textual entailment

Duplicate detection for claims

Sentiment Analysis

Whatever else you think of...
Common Tools

Yahoo BOSS: Simple API access to Yahoo Search. A Python interface is available.

Amazon Mechanical Turk: Upload a CSV file with your data and pay users to label it.

Our Database of Claims and Paraphrases: Entered by users + Snopes + Politifact
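
As a rough illustration of the CSV step for Mechanical Turk, the sketch below writes snippet/claim pairs to CSV text. The column names `snippet` and `claim` and the helper `build_turk_csv` are assumptions made for this example; the slides do not specify a file layout.

```python
import csv
import io


def build_turk_csv(pairs):
    """Serialise (snippet, claim) pairs as CSV text with a header row.

    The column names are hypothetical; adapt them to your task template.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["snippet", "claim"])
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_turk_csv([
        ("cost jobs across our economy", "Cap and trade would cause job losses"),
    ]))
```

Fields that contain commas or quotes are quoted automatically by the `csv` module, which matters because web snippets often contain both.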
Task: Find Disputed Claims on the Web

Possible approach: Search BOSS for phrases like “falsely claimed that *”
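
A minimal sketch of the pattern-matching half of this approach, assuming the search snippets have already been retrieved (the BOSS call itself is not shown, since its details are not given here). The regular expression and the helper name `extract_claim` are illustrative choices, not part of Dispute Finder.

```python
import re
from typing import Optional

# Matches "falsely claim", "falsely claims" or "falsely claimed",
# optionally followed by "that", and captures the rest as the claim.
TRIGGER = re.compile(
    r"\bfalsely\s+claim(?:s|ed)?\s+(?:that\s+)?(?P<claim>.+)",
    re.IGNORECASE,
)


def extract_claim(snippet: str) -> Optional[str]:
    """Return the text after the trigger phrase, or None if it is absent."""
    match = TRIGGER.search(snippet)
    if match is None:
        return None
    # Trim surrounding whitespace and trailing sentence punctuation.
    return match.group("claim").strip().rstrip(".!?")


if __name__ == "__main__":
    print(extract_claim(
        "The report falsely claimed that there were 46 million "
        "Americans who lacked health insurance."
    ))
```

This only captures the raw text after the trigger. It does nothing about pronoun resolution, time context, or trimming extra qualifiers.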


Where it gets interesting:

Olbermann falsely claimed that Watters lied when he denied taping the meeting

- Are “Olbermann” and “Watters” ambiguous? Do we need a topic?

- Are we sure “he” resolves to “Watters”?

Obama Falsely Claims There Are 47 Million Uninsured Americans

- Do we need to know the time context?

Pleasanton Man Brice Carrington Who Falsely Claimed That He Was A Three-Time Oscar-Winner Pleads Guilty

- Can we resolve “he”? Does this claim matter?

Today in Jordan, he falsely claimed that the predominantly Sunni terrorist organization Al-Qaida was receiving training from predominantly Shia Iran

- We want “Al-Qaida was receiving training from Iran” - no extras

The report falsely claimed that there were 46 million Americans who lacked health insurance

- We already have this one! Can we detect duplicates? (See “Duplicate Detection for Claims”.)

Evaluation/Training: Ask Turk users

Task: Textual Entailment of Claims

Given a set of web pages and a set of claims, find phrases on the pages that entail the claims.

Examples for “Cap and trade would cause job losses”:

“<title>Cap and trade...</title> [new paragraph] which results in fewer jobs created or higher unemployment.”

- The subject may not be in the sentence itself

- Can we translate “higher unemployment” to “job losses”?

“a "cap and trade scheme" that "would suppress our economic recovery, cost jobs across our economy, ...”

- There is other text between subject and object

“The claim that cap and trade will create many ‘green collar’ jobs overlooks the massive job losses caused by draconian energy rationing policies”

- There is other text between subject and object

“... rejects the cap and trade bill... [start bullets] ... 1.2 - 1.8 million jobs lost”

- Big gap between subject and object.

- Needs stemming

Beth is working on this


The classifier must be simple enough to run in JavaScript inside a web browser. However, the training can be complex, since it runs on the server.

Possible approaches:

Bag of words + stemming + WordNet synonyms

Use an existing textual entailment tool to derive simpler rules.

Data Set:

Our database of claims, mined from users and web sites

Pages pulled from Yahoo BOSS

Data sets from the RTE task

Evaluation: Ask Turk users

Beth is working on this
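
One way to picture the "bag of words + stemming + synonyms" idea is the sketch below. It is written in Python for readability, but the scoring is plain set arithmetic, so it could be ported to browser JavaScript. The suffix-stripping `stem` is a crude stand-in for a real stemmer, and the `SYNONYMS` table is a hypothetical, hand-written substitute for a WordNet lookup. None of this is the project's actual classifier.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "our", "would", "will"}

# Hypothetical stand-in for WordNet: maps a stem to a canonical stem.
SYNONYMS = {"cost": "loss", "lost": "loss"}


def stem(word):
    """Crude suffix stripping; a real stemmer would replace this."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, drop stopwords, stem, then map synonyms to one form."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = (stem(w) for w in words if w not in STOPWORDS)
    return {SYNONYMS.get(s, s) for s in stems}


def coverage(claim, passage):
    """Fraction of the claim's terms that also appear in the passage."""
    claim_terms = terms(claim)
    if not claim_terms:
        return 0.0
    return len(claim_terms & terms(passage)) / len(claim_terms)


if __name__ == "__main__":
    claim = "Cap and trade would cause job losses"
    passage = ('a "cap and trade scheme" that "would suppress our '
               'economic recovery, cost jobs across our economy')
    print(coverage(claim, passage))
```

A passage would count as a candidate match when its coverage exceeds some threshold. Choosing that threshold, and learning better synonym rules, is the part that can run on the server.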

Task: User-Guided Textual Entailment

Ask the user questions that help us do accurate textual entailment.

What is the minimum set of easy questions that improve coverage the most?

Examples for “global warming does not exist”:

"Global warming is just another scam for the government to think they can control you"

- Does “scam” imply does not exist? Should we ask the user?

“Man-made global warming does not exist”

- Is “man-made” global warming the same as “global warming”?

- Should we ask the user?

“it does not mean that global warming does not exist”

- Are they disagreeing with the statement?

- Should we ask the user?

Possible Approach:

Use BOSS to get a large number of pages about the topic.

Use bag-of-words to cluster likely phrases into common patterns

Ask the user about a minimal example from each cluster
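
The last two steps could look roughly like the sketch below, assuming the candidate phrases have already been collected. The greedy single-pass clustering, the Jaccard threshold of 0.5, and the use of the shortest phrase as the "minimal example" are all assumptions made for illustration.

```python
import re


def bag(phrase):
    """Bag of words, reduced to a set of lowercase tokens."""
    return frozenset(re.findall(r"[a-z]+", phrase.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(phrases, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []
    for phrase in phrases:
        for members in clusters:
            if jaccard(bag(phrase), bag(members[0])) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


def questions(phrases, threshold=0.5):
    """Pick the shortest phrase of each cluster as the one to show the user."""
    return [min(members, key=len) for members in cluster(phrases, threshold)]


if __name__ == "__main__":
    print(questions([
        "global warming does not exist",
        "man-made global warming does not exist",
        "global warming is just another scam",
        "global warming is a scam",
    ]))
```

Note that this groups “man-made global warming” with “global warming”, which is exactly the kind of merge the slide suggests confirming with the user. A lower-risk variant would raise the threshold and ask more questions.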

Duplicate Detection for Claims

Given a huge set of claims mined from the web, how do we work out which ones are saying the same thing?

Like textual entailment, except we can run entirely on the server, and the data set is a set of claims, rather than the whole web.
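
As a baseline only, near-identical wordings can be caught with string similarity from Python's standard `difflib` module, as sketched below. The 0.85 threshold and the helper names are assumptions. This finds surface-level duplicates, not claims that say the same thing in different words, which is the harder problem this task is about.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(claim):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", claim.lower()))


def duplicate_pairs(claims, threshold=0.85):
    """Return index pairs whose normalised text is at least `threshold` similar."""
    normalised = [normalise(c) for c in claims]
    pairs = []
    for i, j in combinations(range(len(claims)), 2):
        ratio = SequenceMatcher(None, normalised[i], normalised[j]).ratio()
        if ratio >= threshold:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(duplicate_pairs([
        "Cap and trade will create jobs.",
        "cap and trade will create jobs",
        "Global warming does not exist",
    ]))
```

The pairwise loop is quadratic in the number of claims, which is workable for a few thousand claims on a server but would need blocking or indexing for a much larger set.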
Task: Sentiment Analysis

Some, but not all, claims are largely sentiments: “X is good” vs “X is bad”. Can we automatically infer contrasting sentiments about something?

Examples:

“cap and trade will ruin america” : negative

“cap and trade will create jobs” : positive

“the folly of cap and trade” : negative

“cap and trade is essential” : positive

Possible Method:

Pick a topic : e.g. “Cap and Trade”

Find pages that support it and pages that oppose it
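
To make the labelling concrete, here is a toy lexicon-based scorer. The word lists are hand-picked to cover the four example phrases and are purely hypothetical. In the method outlined above, such cue words would instead be learned from pages that support or oppose the topic.

```python
import re

# Hypothetical cue words chosen to fit the example phrases only.
POSITIVE = {"create", "essential"}
NEGATIVE = {"ruin", "folly"}


def sentiment(phrase):
    """Label a phrase by counting positive and negative cue words."""
    words = re.findall(r"[a-z]+", phrase.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for phrase in ("cap and trade will ruin america",
                   "cap and trade is essential"):
        print(phrase, ":", sentiment(phrase))
```

A word-counting approach like this ignores negation and context, so “will not create jobs” would still score as positive.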
What we can provide

A database of claims and paraphrases

- Currently ~2000 claims, user-entered + Snopes

- Politifact + others coming soon

Funding and support for Mechanical Turk tagging

Example web pages to analyse + help with Yahoo BOSS

~6k examples of tagged entailments from web snippets

- But: fairly low quality. Many are web snippets that repeat the same phrase.

An interesting problem space :-)