Query Segmentation and Structured Annotation via NLP
Rifat Reza Joye, Panagiotis Papadimitriou
Problem
• Caloricious.com:
  • Semantic search engine for food items
  • Free-text queries over structured data
• Query: gluten free high protein bars
• Data: Each food item is a database record with attributes name, brand, category, nutrients, allergens, ..
• Query segmentation and structured annotation:
  • gluten free → ALLERGEN
  • high protein → NUTRIENT
  • bars → CATEGORY
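
The input record and the target output can be sketched as plain data types. This is only an illustration: the class names `FoodItem` and `AnnotatedSegment` and the sample values are assumptions, not the actual Caloricious.com schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FoodItem:
    """Hypothetical database record; field names follow the attributes on the slide."""
    name: str
    brand: str
    category: str
    nutrients: List[str] = field(default_factory=list)
    allergens: List[str] = field(default_factory=list)


@dataclass
class AnnotatedSegment:
    """One query segment together with the attribute it refers to."""
    text: str
    attribute: str


# Target annotation for the example query from the slide.
query = "gluten free high protein bars"
target = [
    AnnotatedSegment("gluten free", "ALLERGEN"),
    AnnotatedSegment("high protein", "NUTRIENT"),
    AnnotatedSegment("bars", "CATEGORY"),
]

# The segments, read in order, reproduce the original query.
print(" ".join(segment.text for segment in target) == query)
```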
1st Approach: MEMM with Synthetic Training Data
• Looks like an instance of named-entity recognition (NER)
• Problem: No labeled queries to train the MEMM
• Solution: Generate synthetic labeled queries
• Query study on 100 queries
  • 96% of queries contain 1–3 segments
  • In 98% of queries, one of the segments refers to Name, Category or Brand
• Algorithm
  • Pick a food item at random
  • Pick 1–3 attributes and generate a query
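
The generation algorithm above can be sketched roughly as follows. The slide does not say how attributes are chosen or ordered, so several details are assumptions: the sample records in `ITEMS` are hypothetical, each attribute is simplified to a single string, every query is forced to contain one of name, category or brand (the study reports this for 98% of queries, not all), and the segment order is random.

```python
import random

# Hypothetical records; real data would come from the food database.
ITEMS = [
    {"name": "peanut crunch bar", "brand": "acme", "category": "bars",
     "nutrient": "high protein", "allergen": "gluten free"},
    {"name": "oat squares", "brand": "farmhouse", "category": "cereal",
     "nutrient": "high fiber", "allergen": "nut free"},
]

CORE_ATTRIBUTES = ("name", "category", "brand")


def generate_query(items, rng):
    """Return (tokens, labels) for one synthetic query with 1-3 segments."""
    item = rng.choice(items)
    # Assumption: every record has at least one core attribute.
    core = rng.choice([a for a in CORE_ATTRIBUTES if a in item])
    others = [a for a in item if a != core]
    # Add 0-2 further attributes so the query has 1-3 segments in total.
    extra = rng.sample(others, rng.randint(0, min(2, len(others))))
    attributes = [core] + extra
    rng.shuffle(attributes)

    tokens, labels = [], []
    for attribute in attributes:
        for word in item[attribute].split():
            tokens.append(word)
            labels.append(attribute.upper())  # one label per token
    return tokens, labels


rng = random.Random(0)
for _ in range(3):
    print(generate_query(ITEMS, rng))
```

Each generated pair is a token sequence with one label per token, which is the shape a sequence tagger such as a MEMM would typically be trained on.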
2nd Approach: Segmentation & MaxEnt Classification
Query Segmentation
• Train a language model on the structured data text
• Use the model to find segment probabilities
• Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
• Example: gluten free high protein bars → gluten free | high protein | bars
Segment Annotation
• Annotate each segment with an attribute using a MaxEnt classifier
• Training: For each attribute, training examples come from the corresponding entries of database products
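
A minimal sketch of the segmentation step, under stated assumptions: the "language model" here is just relative phrase frequency over a tiny hypothetical list of attribute values (`ENTRIES`), unseen single words get a small floor probability, and unseen multi-word phrases are disallowed. The actual model used in the project is not described on the slide and may differ.

```python
import math
from collections import Counter

# Hypothetical attribute values standing in for the structured data text.
ENTRIES = [
    "gluten free", "gluten free", "high protein", "high protein", "high fiber",
    "protein bars", "bars", "bars", "granola bars", "low fat",
]


def train_phrase_counts(entries, max_len=3):
    """Count every word n-gram of up to max_len words in the entries."""
    counts = Counter()
    for entry in entries:
        words = entry.lower().split()
        for i in range(len(words)):
            for j in range(i + 1, min(i + max_len, len(words)) + 1):
                counts[" ".join(words[i:j])] += 1
    return counts


def segment_log_prob(words, counts, total):
    """Log probability of one candidate segment under the phrase counts."""
    phrase = " ".join(words)
    if counts[phrase] > 0:
        return math.log(counts[phrase] / total)
    if len(words) == 1:
        return math.log(0.1 / total)  # floor for out-of-vocabulary words
    return float("-inf")  # unseen multi-word phrases are not allowed


def segment(query, counts, max_len=3):
    """Return the highest-scoring segmentation of the query via DP."""
    words = query.lower().split()
    total = max(sum(counts.values()), 1)
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for words[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_log_prob(words[start:end], counts, total)
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow the back-pointers from the end of the query to recover segments.
    segments, end = [], n
    while end > 0:
        segments.append(" ".join(words[back[end]:end]))
        end = back[end]
    return segments[::-1]


counts = train_phrase_counts(ENTRIES)
print(segment("gluten free high protein bars", counts))
# → ['gluten free', 'high protein', 'bars']
```

Working in log space turns the product of segment probabilities into a sum, so the DP only needs the best score for each query prefix. Each resulting segment would then be passed to the attribute classifier.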
Conclusions – Future Work
• The combination of a language model, dynamic programming and MaxEnt classification provides very good accuracy without labeled data
• It would be interesting to compare with NER on a large labeled set
• We also plan to compare with the state-of-the-art algorithm in the context of a research submission
More Results…
• Evangelos
• March 12, 2011 @ 9.14am
• 19.5 inches
• 6lbs 11oz