Query Segmentation and Structured Annotation via NLP
Presentation Transcript

  1. Query Segmentation and Structured Annotation via NLP
     Rifat Reza Joye, Panagiotis Papadimitriou

  2. Problem
     • Caloricious.com:
       • Semantic search engine for food items
       • Free-text queries over structured data
     • Query: gluten free high protein bars
     • Data: Each food item is a database record with attributes name, brand, category, nutrients, allergens, ...
     • Query segmentation and structured annotation:
       gluten free → ALLERGEN, high protein → NUTRIENT, bars → CATEGORY

  3. 1st Approach: MEMM with Synthetic Training Data
     • Looks like an instance of NER (named-entity recognition)
     • Problem: No labeled queries to train the MEMM
     • Solution: Generate synthetic labeled queries
     • Query study of 100 queries
       • 96% of queries contain 1–3 segments
       • In 98% of queries, one of the segments refers to Name, Category, or Brand
     • Algorithm
       • Pick a food item at random
       • Pick 1–3 attributes and generate a query
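
The generation algorithm on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `FOODS` records, the `synth_query` function, and the choice to always include one of name, category, or brand and to shuffle the attribute order are all assumptions made for the example.

```python
import random

# Hypothetical food records; the field names follow the attributes on the
# Problem slide, the values are invented for illustration.
FOODS = [
    {"name": "peanut crunch bar", "brand": "acme", "category": "bars",
     "nutrient": "high protein", "allergen": "gluten free"},
    {"name": "vanilla oat drink", "brand": "farmhouse", "category": "drinks",
     "nutrient": "low fat", "allergen": "dairy free"},
]

# The query study found that almost every query mentions one of these,
# so this sketch (an assumption) always includes exactly one as an anchor.
ANCHORS = ["name", "category", "brand"]

def synth_query(rng, foods=FOODS):
    """Return (tokens, labels) for one synthetic labeled query."""
    item = rng.choice(foods)                 # pick a food item at random
    k = rng.randint(1, 3)                    # pick 1-3 attributes
    anchor = rng.choice(ANCHORS)
    others = [a for a in item if a != anchor]
    attrs = rng.sample(others, k - 1) + [anchor]
    rng.shuffle(attrs)                       # assumed: segment order is random
    tokens, labels = [], []
    for attr in attrs:
        words = item[attr].split()
        tokens.extend(words)
        labels.extend([attr.upper()] * len(words))  # one label per token
    return tokens, labels

tokens, labels = synth_query(random.Random(0))
print(list(zip(tokens, labels)))
```

Each call yields a token sequence with one attribute label per token, which is the shape of training data a sequence tagger such as an MEMM would need.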

  4. 2nd Approach: Segmentation & MaxEnt Classification
     Query Segmentation
     • Train a language model on the structured data text
     • Use the model to find segment probabilities
     • Find the ML (maximum-likelihood) segmentation through DP (dynamic programming)
     • Example: gluten free high protein bars → [gluten free] [high protein] [bars]
     Segment Annotation
     • Annotate each segment with an attribute using a MaxEnt classifier
     • Training: For each attribute, training examples come from the corresponding entries of the database products
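
A rough sketch of the two stages on this slide, under stated assumptions: the slides do not say which language model, smoothing, or features were used, so the n-gram phrase counts, the per-word penalty for unseen segments, the bag-of-words features, and all names here (`DATA`, `MAX_LEN`, `train_phrase_model`, `segment`, `train_maxent`, `classify`) are hypothetical stand-ins. MaxEnt is implemented as plain multinomial logistic regression trained by stochastic gradient ascent.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (attribute value, attribute) pairs standing in for database entries.
DATA = [
    ("gluten free", "ALLERGEN"), ("dairy free", "ALLERGEN"),
    ("high protein", "NUTRIENT"), ("low fat", "NUTRIENT"),
    ("bars", "CATEGORY"), ("protein powder", "CATEGORY"),
]
MAX_LEN = 3  # assumed cap on segment length, in words

def train_phrase_model(texts):
    """Count every word n-gram (n <= MAX_LEN) in the structured text."""
    counts = Counter()
    for text in texts:
        words = text.split()
        for n in range(1, MAX_LEN + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts, sum(counts.values())

def segment_logprob(seg, counts, total):
    """Log-probability of one segment; unseen segments pay an assumed per-word penalty."""
    c = counts.get(tuple(seg), 0)
    if c:
        return math.log(c / total)
    return len(seg) * math.log(0.1 / total)

def segment(words, counts, total):
    """Maximum-likelihood segmentation by dynamic programming over split points."""
    n = len(words)
    best = [0.0] + [-math.inf] * n   # best[i]: best log-prob of words[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            score = best[j] + segment_logprob(words[j:i], counts, total)
            if score > best[i]:
                best[i], back[i] = score, j
    segs, i = [], n
    while i > 0:                     # follow back-pointers from the end
        segs.append(" ".join(words[back[i]:i]))
        i = back[i]
    return segs[::-1]

def predict_proba(feats, weights):
    """Softmax over per-label sums of word-feature weights."""
    scores = {lab: sum(w[f] for f in feats) for lab, w in weights.items()}
    m = max(scores.values())
    exps = {lab: math.exp(s - m) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}

def train_maxent(examples, epochs=200, lr=0.5):
    """Fit a bag-of-words MaxEnt model by stochastic gradient ascent on log-likelihood."""
    weights = {y: defaultdict(float) for _, y in examples}
    for _ in range(epochs):
        for text, y in examples:
            feats = text.split()
            probs = predict_proba(feats, weights)
            for lab, w in weights.items():
                grad = (1.0 if lab == y else 0.0) - probs[lab]
                for f in feats:
                    w[f] += lr * grad
    return weights

def classify(text, weights):
    probs = predict_proba(text.split(), weights)
    return max(probs, key=probs.get)

counts, total = train_phrase_model(v for v, _ in DATA)
weights = train_maxent(DATA)
for seg in segment("gluten free high protein bars".split(), counts, total):
    print(seg, "->", classify(seg, weights))
```

Because every segment probability is below 1, a phrase that was seen as a whole in the data scores higher than the same words split apart, which is what pulls multi-word values such as "gluten free" into a single segment. Neither stage needs labeled queries: both are trained only on attribute values from the database.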

  5. Results

  6. Conclusions – Future Work
     • The combination of a language model, dynamic programming, and MaxEnt classification provides very good accuracy without labeled data
     • It would be interesting to compare with NER on a large labeled set
     • We also plan to compare with the state-of-the-art algorithm in the context of a research submission

  7. More Results…
     • Evangelos
     • March 12, 2011 @ 9.14am
     • 19.5 inches
     • 6lbs 11oz