
Extracting Local Understandings from User-Generated Reviews on City Guide Websites

Andrea Moed

IS256 Applied Natural Language Processing

Professor Marti Hearst

December 6, 2006


Overview

  • Motivations

  • Corpus

  • Processing

  • Nickname discovery

  • Ongoing experiments

    • Attraction extraction

    • Review classification

  • Future work

Andrea Moed | IS256 ANLP


Motivations

  • Local knowledge of well-known places… for locals

    • “Nobody goes there anymore, it’s too crowded”

    • Major draws (views, dishes, people…)

    • Best times/seasons/modes of transport?

    • Places to combine in one excursion

  • “A good place for X” vs. a Great Good Place*

    • *Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 1999

Corpus

  • Yelp San Francisco

    • Social site organized around cities, launched 2004

    • Thousands of SF places, reviews and reviewers

    • Largely local interest (Mass Media, Pets)

    • Some areas useful for visitors (Night Life, Shopping)

    • Writerly culture high structural and stylistic variation in the text

  • Categories: Restaurants, Night Life, Shopping, Active Life, Local Flavor

    • Destinations

    • Frequently reviewed places: 20+ reviews

Processing

  • Used Dappit to build page scrapers

  • Generated XML; parsed in Python

    • Place objects consisting of location info + reviews

    • Corpus collects place objects from various categories

  • Challenges of screen scraping

    • Tradeoff between more places and places with most reviews (optimization requires exhaustive search)

    • TripAdvisor proved too difficult

  • Analysis with Python and NLTK Lite
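
The parsing step might look roughly like the sketch below. The XML layout (`<places>`, `<place>`, `<name>`, `<address>`, `<review>`) and the `Place` class are assumptions made for illustration; the slides do not show the scraper's actual output format. The 20-review threshold comes from the Corpus slide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical scraper output; element and attribute names are assumptions.
SAMPLE_XML = """
<places category="Local Flavor">
  <place>
    <name>Example Park</name>
    <address>123 Example St, San Francisco, CA</address>
    <review>Great views at sunset.</review>
    <review>Too crowded on weekends.</review>
  </place>
</places>
"""


@dataclass
class Place:
    """Location info plus the review texts collected for one place."""
    name: str
    address: str
    category: str
    reviews: list = field(default_factory=list)


def parse_places(xml_text):
    """Build Place objects from one category's scraped XML."""
    root = ET.fromstring(xml_text.strip())
    category = root.get("category", "")
    places = []
    for node in root.iter("place"):
        places.append(Place(
            name=node.findtext("name", default="").strip(),
            address=node.findtext("address", default="").strip(),
            category=category,
            reviews=[(r.text or "").strip() for r in node.findall("review")],
        ))
    return places


def frequently_reviewed(places, min_reviews=20):
    """Keep only places with at least `min_reviews` reviews."""
    return [p for p in places if len(p.reviews) >= min_reviews]


corpus = parse_places(SAMPLE_XML)
```

A full corpus would concatenate the lists returned by `parse_places` for each category file before filtering with `frequently_reviewed`.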

Place Nickname Discovery

  • Goal: Discover alternate search terms to surface more diverse local results in web search

  • Method: Regular expression matching

Place Nickname Discovery

  • Steps

    • Counted frequency of Yelp-given place name in reviews of that place

    • Tokenized name on whitespace

    • Rule-based generation of candidate nicknames: acronym, subsets of tokens

    • Compared frequencies of given name and each nickname

    • Potentially useful nicknames are those that occur at least half as often as the given name
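
The steps above can be sketched as follows. This is one possible reading, not the author's code: the function names are hypothetical, "subsets of tokens" is taken to mean order-preserving proper subsets, and mentions of the full name are stripped before counting candidates so that a nickname is not credited for appearing inside the given name. The slides do not say how that overlap was handled.

```python
import re
from itertools import combinations


def phrase_pattern(phrase):
    """Regex matching `phrase` as a whole phrase, ignoring case."""
    return re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)


def candidate_nicknames(name):
    """Rule-based candidates: the acronym plus proper subsets of the tokens."""
    tokens = name.split()  # tokenize on whitespace
    candidates = set()
    if len(tokens) > 1:
        candidates.add("".join(t[0] for t in tokens).upper())
    for size in range(1, len(tokens)):
        for combo in combinations(tokens, size):  # keeps original token order
            candidates.add(" ".join(combo))
    candidates.discard(name)
    return candidates


def useful_nicknames(name, reviews, ratio=0.5):
    """Return candidates occurring at least `ratio` times as often as `name`."""
    text = "\n".join(reviews)
    name_re = phrase_pattern(name)
    name_count = len(name_re.findall(text))
    # Assumption: remove full-name mentions before counting the candidates.
    remainder = name_re.sub(" ", text)
    result = {}
    for cand in candidate_nicknames(name):
        count = len(phrase_pattern(cand).findall(remainder))
        if count > 0 and count >= ratio * name_count:
            result[cand] = count
    return result


# Toy reviews written for this example, not taken from the corpus.
reviews = [
    "Golden Gate Park is huge.",
    "I love GGP on Sundays.",
    "GGP has the best picnic spots.",
    "Golden Gate Park gets foggy.",
]
nicknames = useful_nicknames("Golden Gate Park", reviews)
```

With these toy reviews, the given name and the acronym each occur twice, so only the acronym passes the one-half threshold.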

Place Nickname Discovery

  • Results

    • From 61 places (Restaurants, Active Life, Local Flavor), 38 reviews each

    • 23 of 61 places appeared to have frequently used nicknames

    • BUT in 9 cases this was due to common words in names

    • First word most commonly used nickname in remaining cases

    • Hypothesis: Long tail of less predictable nicknames

Ongoing Work

  • Attraction extraction

    • TF/IDF calculation to find the concepts most widely associated with a place

    • Further text analysis to collect understandings of key concepts

      • Specificity

      • Sentiment

      • Temporality
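
As a rough illustration of the TF/IDF step, the sketch below treats all reviews of one place as a single document and scores each term by relative frequency times log inverse document frequency. The tokenizer, the weighting variant, and the function names are assumptions; the slides do not specify which formulation was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf_by_place(reviews_by_place):
    """Score terms per place; each place's reviews form one document."""
    term_counts = {
        place: Counter(tokenize(" ".join(reviews)))
        for place, reviews in reviews_by_place.items()
    }
    n_docs = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each place counts once per term
    scores = {}
    for place, counts in term_counts.items():
        total = sum(counts.values()) or 1
        scores[place] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, place, k=5):
    """Highest-scoring terms for a place, ties broken alphabetically."""
    ranked = sorted(scores[place].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data written for this example, not taken from the corpus.
reviews_by_place = {
    "Place A": ["The view is great", "Great view of the bridge"],
    "Place B": ["The burrito is great", "Best burrito in the city"],
}
scores = tf_idf_by_place(reviews_by_place)
```

Terms that appear in every place's reviews (such as "great" here) receive a score of zero, which is what pushes place-specific terms like "view" or "burrito" to the top.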

Ongoing Work

  • Classification of reviews: recommendation vs. narrative

    • Recommendations help people “use” a city

    • Narrative is associated with memorable and unique locations

  • Features for classification

    • Verb tense distribution

    • Paragraph breaks

    • Opinion words at beginning and end (recommendation)

    • Memory and relationship words (narrative)
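
A minimal sketch of how the four listed features could be computed for a single review. The word lists and the regex-based tense heuristic are placeholders invented for illustration; a part-of-speech tagger would give a more reliable verb tense distribution, and no lexicon is given in the slides.

```python
import re

# Tiny illustrative lexicons; hypothetical, not from the slides.
OPINION_WORDS = {"great", "best", "worst", "love", "awful", "recommend", "avoid"}
MEMORY_WORDS = {"remember", "remembered", "childhood", "years", "ago",
                "friend", "mom", "dad", "boyfriend", "girlfriend"}

# Crude stand-ins for tense detection; a POS tagger would be more accurate.
PAST_RE = re.compile(r"\b(?:was|were|had|went|did|\w+ed)\b")
PRESENT_RE = re.compile(r"\b(?:is|are|has|have|go|do|does)\b")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_features(text, edge=10):
    """Compute simple features for recommendation vs. narrative reviews."""
    lowered = text.lower()
    tokens = tokenize(text)
    # Words at the beginning and end of the review (all of it if short).
    if len(tokens) > 2 * edge:
        edges = tokens[:edge] + tokens[-edge:]
    else:
        edges = tokens
    past = len(PAST_RE.findall(lowered))
    present = len(PRESENT_RE.findall(lowered))
    verbs = past + present
    return {
        "past_ratio": past / verbs if verbs else 0.0,
        "paragraph_breaks": len(re.findall(r"\n\s*\n", text.strip())),
        "edge_opinion_words": sum(t in OPINION_WORDS for t in edges),
        "memory_words": sum(t in MEMORY_WORDS for t in tokens),
    }


# Toy reviews written for this example.
narrative = ("I remember when my dad took me here years ago.\n\n"
             "We walked for hours and it was magical.")
recommendation = ("Best tacos in the Mission. Go early, the line is long. "
                  "I recommend the carnitas.")
narrative_features = review_features(narrative)
recommendation_features = review_features(recommendation)
```

The resulting dictionaries could serve as input to a classifier once a labeled training set exists, which the Future Work slide lists as still to be developed.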

Future Work

  • Relating understanding about location features to external data (geocoding, weather)

  • Visualization of extracted concepts

  • Development of a training set for classification
