Extracting Local Understandings from User-Generated Reviews on City Guide Websites - PowerPoint PPT Presentation


Extracting Local Understandings from User-Generated Reviews on City Guide Websites

Andrea Moed

IS256 Applied Natural Language Processing

Professor Marti Hearst

December 6, 2006



Overview

  • Motivations

  • Corpus

  • Processing

  • Nickname discovery

  • Ongoing experiments

    • Attraction extraction

    • Review classification

  • Future work

Andrea Moed | IS256 ANLP



Motivations

  • Local knowledge of well-known places… for locals

    • “Nobody goes there anymore, it’s too crowded”

    • Major draws (views, dishes, people…)

    • Best times/seasons/modes of transport?

    • Places to combine in one excursion

  • “A good place for X” vs. a Great Good Place*

    • *Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999




Corpus

  • Yelp San Francisco

    • Social site organized around cities, launched 2004

    • Thousands of SF places, reviews and reviewers

    • Many categories are largely of local interest (e.g., Mass Media, Pets)

    • Some areas useful for visitors (Night Life, Shopping)

    • Writerly culture high structural and stylistic variation in the text

  • Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor

    • Destinations

    • Frequently reviewed places: 20+ reviews




Processing

  • Used Dappit to build page scrapers

  • Generated XML; parsed in Python

    • Place objects consisting of location info + reviews

    • Corpus collects place objects from various categories

  • Challenges of screen scraping

    • Tradeoff between scraping more places and scraping the places with the most reviews (finding the most-reviewed places requires an exhaustive search)

    • TripAdvisor proved too difficult to scrape

  • Analysis with Python and NLTK Lite
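The slides do not show the XML that the Dappit scrapers produced, so the following is only a minimal Python sketch under an assumed layout: a hypothetical `<places>` root with one `<place>` element per listing, holding `<name>`, `<address>` and repeated `<review>` children. It builds place objects (location info plus reviews) with the standard-library `xml.etree.ElementTree` and applies the 20+ review cut-off from the Corpus slide. The element names, the `Place` class, the helper functions and the sample data are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical scraper output; the real element names are not given in the slides.
SAMPLE_XML = """
<places>
  <place category="Local Flavor">
    <name>Example Park</name>
    <address>1 Example Ave, San Francisco, CA</address>
    <review>Great views on a clear day.</review>
    <review>Crowded on weekends, but worth it.</review>
  </place>
  <place category="Restaurants">
    <name>Example Cafe</name>
    <address>2 Example St, San Francisco, CA</address>
    <review>Good coffee.</review>
  </place>
</places>
"""


@dataclass
class Place:
    """A place object: location info plus the text of its reviews."""
    name: str
    category: str
    address: str
    reviews: List[str] = field(default_factory=list)


def parse_places(xml_text: str) -> List[Place]:
    """Turn one scraped XML document into a list of Place objects."""
    root = ET.fromstring(xml_text.strip())
    places = []
    for node in root.iter("place"):
        places.append(
            Place(
                name=(node.findtext("name") or "").strip(),
                category=node.get("category", ""),
                address=(node.findtext("address") or "").strip(),
                reviews=[(r.text or "").strip() for r in node.findall("review")],
            )
        )
    return places


def frequently_reviewed(places: List[Place], min_reviews: int = 20) -> List[Place]:
    """Keep only places with at least min_reviews reviews (20+ on the Corpus slide)."""
    return [p for p in places if len(p.reviews) >= min_reviews]


if __name__ == "__main__":
    corpus = parse_places(SAMPLE_XML)
    for place in corpus:
        print(place.category, "|", place.name, "|", len(place.reviews), "reviews")
```

With real scraper output only `parse_places` would need to change to match the actual element names; the review-count filter does not depend on the XML layout.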




Place Nickname Discovery

  • Goal: Discover alternate search terms to surface more diverse local results in web search

  • Method: Regular expression matching




Place Nickname Discovery

  • Steps

    • Counted frequency of Yelp-given place name in reviews of that place

    • Tokenized name on whitespace

    • Rule-based generation of candidate nicknames: acronym, subsets of tokens

    • Compared frequencies of given name and each nickname

    • Potentially useful nicknames are those that occur at least half as often as the given name
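The nickname steps above can be sketched in Python with the standard `re` module. This is an illustrative reconstruction, not the author's code: the helper names, the sample place "Blue Door Cafe" and its reviews are hypothetical. One assumption goes beyond the slides: full-name mentions are masked out before candidates are counted, since otherwise every token subset of a name would match at least as often as the name itself and pass the half-as-often threshold automatically.

```python
import re
from itertools import combinations
from typing import Dict, List

# Hypothetical reviews for a hypothetical place.
SAMPLE_REVIEWS = [
    "Blue Door Cafe has the best scones.",
    "I love Blue Door on a rainy morning.",
    "BDC is my go-to spot. Blue Door Cafe forever.",
    "Blue Door again, obviously.",
]


def phrase_pattern(phrase):
    """Case-insensitive, whole-word regex for a (possibly multi-word) phrase."""
    tokens = [re.escape(t) for t in phrase.split()]
    return re.compile(r"(?<!\w)" + r"\s+".join(tokens) + r"(?!\w)", re.IGNORECASE)


def count_mentions(phrase: str, reviews: List[str]) -> int:
    """Total number of matches of phrase across all reviews."""
    pattern = phrase_pattern(phrase)
    return sum(len(pattern.findall(text)) for text in reviews)


def candidate_nicknames(name: str) -> List[str]:
    """Rule-based candidates: the acronym plus every proper, order-preserving token subset."""
    tokens = name.split()  # tokenize the given name on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())
    for size in range(1, len(tokens)):
        for combo in combinations(tokens, size):
            candidates.add(" ".join(combo))
    return sorted(candidates)


def discover_nicknames(name: str, reviews: List[str], ratio: float = 0.5) -> Dict[str, int]:
    """Return candidates used at least `ratio` times as often as the given name."""
    name_count = count_mentions(name, reviews)
    if name_count == 0:
        return {}
    # Assumption: mask full-name mentions so "Blue Door" inside "Blue Door Cafe"
    # is not counted as a separate use of the shorter nickname.
    full_name = phrase_pattern(name)
    masked = [full_name.sub(" ", text) for text in reviews]
    found = {}
    for cand in candidate_nicknames(name):
        n = count_mentions(cand, masked)
        if n >= ratio * name_count:
            found[cand] = n
    return found


if __name__ == "__main__":
    print(discover_nicknames("Blue Door Cafe", SAMPLE_REVIEWS))
    # → {'BDC': 1, 'Blue': 2, 'Blue Door': 2, 'Door': 2}
```

Because matching is case-insensitive, a single generic token such as "Door" can also pick up ordinary uses of the word, which is one way common words in names can inflate nickname counts.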




Place Nickname Discovery

  • Results

    • From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each

    • 23 of 61 places appeared to have frequently used nicknames

    • BUT in 9 cases this was due to common words in names

    • First word most commonly used nickname in remaining cases

    • Hypothesis: Long tail of less predictable nicknames




Ongoing Work

  • Attraction extraction

    • TF/IDF calculation to find the concepts most widely associated with a place

    • Further text analysis to collect understandings of key concepts

      • Specificity

      • Sentiment

      • Temporality
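A minimal TF-IDF sketch in Python, assuming each place's reviews are concatenated into one document and the collection is every place in the corpus: a term scores highly for a place when it is frequent in that place's reviews but appears in the reviews of few other places. The tokenizer, the stopword list, the `top_terms` name and the sample reviews are hypothetical, and ranking single words is a simplification of "concepts".

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stopword list and sample reviews.
STOPWORDS = frozenset({"the", "and", "are", "is", "at", "but", "a", "of", "in"})

SAMPLE = {
    "Example Park": ["Great views and dogs everywhere.", "The views at sunset are great."],
    "Example Cafe": ["Great coffee, great pastries.", "The coffee is strong."],
    "Example Bar": ["Great cocktails.", "The cocktails are pricey but great."],
}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def top_terms(place_reviews: Dict[str, List[str]], k: int = 3) -> Dict[str, List[str]]:
    """Rank each place's terms by TF-IDF, treating all of a place's reviews as one document."""
    docs = {
        place: Counter(t for review in reviews for t in tokenize(review))
        for place, reviews in place_reviews.items()
    }
    n_docs = len(docs)
    # Document frequency: number of places whose reviews mention the term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    ranked = {}
    for place, counts in docs.items():
        total = sum(counts.values())
        scores = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
        # Drop terms shared by every place (idf = 0); break score ties alphabetically.
        best = sorted(
            (kv for kv in scores.items() if kv[1] > 0),
            key=lambda kv: (-kv[1], kv[0]),
        )
        ranked[place] = [term for term, _ in best[:k]]
    return ranked


if __name__ == "__main__":
    for place, terms in top_terms(SAMPLE).items():
        print(place, terms)
```

In this toy corpus "great" appears for every place, so its inverse document frequency is zero and it never ranks, while "views", "coffee" and "cocktails" surface as the distinguishing terms.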


Ongoing Work

  • Classification of reviews: recommendation vs. narrative

    • Recommendations help people “use” a city

    • Narrative is associated with memorable and unique locations

  • Features for classification

    • Verb tense distribution

    • Paragraph breaks

    • Opinion words at beginning and end (recommendation)

    • Memory and relationship words (narrative)
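The four classification features listed above could be extracted roughly as follows. This Python sketch assumes the review has already been POS-tagged with Penn Treebank tags (for example by `nltk.pos_tag` in current NLTK releases), that each paragraph sits on its own line, and that "beginning and end" means the first and last ten words. The opinion and memory/relationship word lists, the `review_features` name and the sample reviews are hypothetical, not the ones used in the project.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lexicons; the project's actual word lists are not given in the slides.
OPINION_WORDS = frozenset({"love", "great", "best", "worst", "recommend", "avoid", "must", "awesome"})
MEMORY_WORDS = frozenset({"remember", "remembered", "first", "ago", "used", "memories",
                          "dad", "mom", "friend", "friends", "we"})

# Finite-verb Penn Treebank tags mapped to a coarse tense label.
TENSE_OF_TAG = {"VBD": "past", "VBZ": "present", "VBP": "present", "MD": "modal"}


def review_features(text: str, tagged: List[Tuple[str, str]], edge: int = 10) -> Dict[str, float]:
    """Features for recommendation-vs-narrative classification of one review."""
    words = re.findall(r"[a-z']+", text.lower())
    # Opinion words are only counted near the start and end of the review.
    edges = words if len(words) <= 2 * edge else words[:edge] + words[-edge:]
    tenses = Counter(TENSE_OF_TAG[tag] for _, tag in tagged if tag in TENSE_OF_TAG)
    n_verbs = sum(tenses.values()) or 1  # avoid dividing by zero
    return {
        "past_ratio": tenses["past"] / n_verbs,
        "present_ratio": tenses["present"] / n_verbs,
        "modal_ratio": tenses["modal"] / n_verbs,
        # Assumes one paragraph per non-empty line.
        "paragraphs": sum(1 for line in text.splitlines() if line.strip()),
        "opinion_at_edges": sum(w in OPINION_WORDS for w in edges),
        "memory_words": sum(w in MEMORY_WORDS for w in words),
    }


# Hypothetical hand-tagged examples of each review type.
REC_TEXT = "Love this place. Best burritos in town. You must try the salsa."
REC_TAGGED = [("Love", "VBP"), ("this", "DT"), ("place", "NN"), ("Best", "JJS"),
              ("burritos", "NNS"), ("in", "IN"), ("town", "NN"), ("You", "PRP"),
              ("must", "MD"), ("try", "VB"), ("the", "DT"), ("salsa", "NN")]

NAR_TEXT = "I remember my first visit with my dad.\nWe ordered too much and laughed for hours."
NAR_TAGGED = [("I", "PRP"), ("remember", "VBP"), ("my", "PRP$"), ("first", "JJ"),
              ("visit", "NN"), ("with", "IN"), ("my", "PRP$"), ("dad", "NN"),
              ("We", "PRP"), ("ordered", "VBD"), ("too", "RB"), ("much", "JJ"),
              ("and", "CC"), ("laughed", "VBD"), ("for", "IN"), ("hours", "NNS")]

if __name__ == "__main__":
    print(review_features(REC_TEXT, REC_TAGGED))
    print(review_features(NAR_TEXT, NAR_TAGGED))
```

On these two samples the recommendation scores high on opinion words at its edges and on present/modal verbs, while the narrative scores high on past tense, memory words and paragraph count; a labeled training set would still be needed before such features can train a classifier.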




Future Work

  • Relating understandings of location features to external data (geocoding, weather)

  • Visualization of extracted concepts

  • Development of a training set for classification


