
SMS-Based Web Search for Low-end Mobile Devices

SMS-Based Web Search for Low-end Mobile Devices. Eric Brewer University of California. Lakshmi Subramanian New York University. Jay Chen New York University. -------- XinMiao Wu 2011-05-11. Outline. What the authors address Introduction Related Work SMSFind Problems


Presentation Transcript


  1. SMS-Based Web Search for Low-end Mobile Devices • Eric Brewer, University of California • Lakshmi Subramanian, New York University • Jay Chen, New York University • XinMiao Wu, 2011-05-11

  2. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  3. Explanation • SMS • Short Messaging Service • Limited to 140 bytes per message • SMS-Based Web Search • Not via XHTML/WAP • Uses only the SMS service

  4. Conventional SMS-Based Web Search • [Diagram: (1) the user sends a short message to the SMS server; (2) the server invokes a search engine; (3) the search engine returns the top N search responses; (4) the server sends the responses back to the user as short messages.]

  5. What the authors address • [Diagram: (1) the user sends a short message to the SMS server (SMSFind); (2) the server invokes a search engine; (3) the search engine returns the top N search responses; (4) the server extracts the main content as a snippet of at most 140 bytes; (5) the snippet is returned to the user as a single short message.]

  6. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  7. Why meaningful? • Growth of the mobile phone market • Has motivated the design of new forms of mobile information services • Growth of Twitter and other social messaging networks • Short Messaging Service (SMS) based applications and services have become popular • Mobile devices in developing regions are still simple, low-cost devices • With limited processing and communication capabilities • Voice and SMS will likely remain the primary communication channels

  8. Why SMS-Based Search? • For any SMS-based web service, efficient SMS-based search is an essential building block. • Existing services: • Vertical (Google SMS and Yahoo! oneSearch) • Long tail (ChaCha, Just Dial): rely on human operators • None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics. • They rely on pre-defined topics, such as “define” or “movies” (e.g. Google SMS: “define boils”)

  9. Difficulties of SMS-Based Search • Responses are limited to 140 bytes • Search response time (10 seconds to several minutes) • Small form factor and low bandwidth (even with XHTML/WAP) • Long tail phenomenon • Users rarely have the luxury of refining a query over several rounds (vs. desktop search) • Queries are often ambiguous • Problem: How does a mobile user efficiently search the Web using one round of interaction where the search response is restricted to one SMS message? • SMSFind

  10. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  11. Related Works • Two surveys • First: A new mobile search model is needed for low-end mobile devices. • Second: SMS is expected to continue its growth as it is popular, cheap, reliable and private. • Two kinds of SMS search • Vertical: Google, Yahoo!, and Microsoft • Long tail: ChaCha and Just Dial • Automatic Text Summarization • The goal is different

  12. Related Works • The problem that SMSFind seeks to address is similar to: • Question/answering systems (developed by the Text REtrieval Conference, TREC) • But distinct in: • The query style: unstructured search style queries (simple natural language style) • The method: SMSFind is a snippet extraction and snippet ranking algorithm • The collection of documents being searched over

  13. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  14. Known Verticals vs Long Tail

  15. SMSFind Search Problem • Characterized as follows: Given <query, hint> and the top N search response pages, extract a text snippet as an appropriate search response to the query. Note that: • What is a snippet? • What is the hint?

  16. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  17. Disambiguate query • A common technique: • Use additional contextual information about the setting in which the search is being conducted. • Here we use an explicit hint instead. • Consider the query: <“Barack Obama wife”, “wife”>.

  18. <“Barack Obama wife”, “wife”> • Most search result pages will contain: • “Michelle” or “Michelle Obama” or “Michelle Robinson” or “Michelle Lavaughn Robinson” within the neighborhood of the word “wife” in the text of the page. • SMSFind searches the neighborhood of the word “wife” in every result page and looks for commonly occurring n-grams. • 1 ≤ n ≤ 5. For example, “Michelle Obama” is a 2-gram.
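
  The neighborhood search on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the regex tokenizer, and the window size of 10 tokens are assumptions (the slide only fixes 1 ≤ n ≤ 5).

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase words and digits, punctuation dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

def neighborhood_ngrams(tokens, hint, window=10, max_n=5):
    """Yield every n-gram (1 <= n <= max_n) within `window` tokens of the hint.

    `window` is a hypothetical parameter; the slides do not give its value.
    If the hint occurs several times, overlapping neighborhoods are counted
    once per occurrence.
    """
    for i, tok in enumerate(tokens):
        if tok != hint:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for start in range(lo, hi):
            for n in range(1, max_n + 1):
                if start + n > hi:
                    break
                yield " ".join(tokens[start:start + n])

def common_ngrams(pages, hint, window=10, max_n=5):
    # Count n-grams near the hint across all result pages.
    counts = Counter()
    for page in pages:
        counts.update(neighborhood_ngrams(tokenize(page), hint, window, max_n))
    return counts

pages = [
    "Barack Obama and his wife Michelle Obama attended",
    "His wife, Michelle Obama, is a lawyer",
]
counts = common_ngrams(pages, "wife")
```

  With these two toy pages, "michelle obama" is counted once per page, which is the kind of repeated n-gram the slide describes.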

  19. n-grams and snippets • Both represent contiguous sequences of words in a document • An n-gram is extremely short in length (1-5 words) • A text snippet is a sequence of words that can fit in a single SMS message • n-grams are used as an intermediate unit • Snippets are used for the final ranking
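
  To make "a sequence of words that can fit in a single SMS message" concrete, here is a small Python sketch that grows a snippet word by word until the 140-byte limit would be exceeded. It is an illustration under stated assumptions: the function names are hypothetical, and size is measured in UTF-8 bytes, which simplifies real SMS character encodings.

```python
SMS_LIMIT = 140  # bytes per message, as stated on the Explanation slide

def fits_in_sms(text, limit=SMS_LIMIT):
    # Simplification: measure UTF-8 bytes rather than a real SMS encoding.
    return len(text.encode("utf-8")) <= limit

def longest_snippet_from(words, start, limit=SMS_LIMIT):
    """Return the longest run of whole words beginning at `start` that fits."""
    end = start
    size = 0
    while end < len(words):
        # Each word after the first also costs one separating space.
        extra = len(words[end].encode("utf-8")) + (1 if end > start else 0)
        if size + extra > limit:
            break
        size += extra
        end += 1
    return " ".join(words[start:end])

words = ["word"] * 100
snippet = longest_snippet_from(words, 0)
```

  Sliding `start` across a page yields the candidate snippets that a snippet ranking step could then score.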

  20. SMSFind Algorithm • Consider a search query (Q, H) • Q is the search query containing the hint term(s) H. • Let P1, ..., PN represent the textual content of the top N search response pages to Q. • Three steps: • Neighborhood Extraction • N-gram Ranking • Snippet Ranking

  21. Neighborhood Extraction

  22. N-gram Ranking

  23. Basic rationale of n-gram ranking algorithm • Any n-gram which satisfies the following three properties is potentially related to the appropriate response: • 1. the n-gram appears very frequently around the hint. • 2. the n-gram appears very close to the hint. • 3. the n-gram is not a commonly used popular term or phrase. • As an example, the n-gram “Michelle Obama”.

  24. Three Metrics • Frequency – The number of times the n-gram occurs across all snippets. • Mean rank – The sum of the PageRanks of every page in which the n-gram occurs, divided by the n-gram’s raw frequency. • Minimum distance – The smallest distance between the n-gram and the hint.
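
  The three metrics can be computed in one pass over the n-gram occurrences. The Python sketch below follows the definitions on this slide literally; the input layout (tuples of n-gram, page id, distance to the hint) and the `page_rank` mapping are assumptions made for illustration, and no normalization is applied.

```python
from collections import defaultdict

def ngram_metrics(occurrences, page_rank):
    """Compute frequency, mean rank and minimum distance per n-gram.

    occurrences: iterable of (ngram, page_id, distance_to_hint) tuples
                 (hypothetical layout, one tuple per occurrence).
    page_rank:   dict mapping page_id to that page's rank value (assumed input).
    """
    freq = defaultdict(int)
    pages = defaultdict(set)
    min_dist = {}
    for ngram, page_id, dist in occurrences:
        freq[ngram] += 1
        pages[ngram].add(page_id)
        min_dist[ngram] = min(dist, min_dist.get(ngram, dist))
    return {
        g: {
            "freq": freq[g],
            # Sum of ranks over the distinct pages containing the n-gram,
            # divided by its raw frequency, as the slide defines it.
            "mean_rank": sum(page_rank[p] for p in pages[g]) / freq[g],
            "min_dist": min_dist[g],
        }
        for g in freq
    }

occurrences = [
    ("michelle obama", 0, 1),
    ("michelle obama", 0, 4),
    ("michelle obama", 1, 2),
    ("the", 1, 7),
]
metrics = ngram_metrics(occurrences, page_rank={0: 1.0, 1: 2.0})
```

  How the three values are normalized and combined into a final score is not recoverable from this transcript, so the sketch stops at the raw metrics.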

  25. Should return the response “rainn wilson” • Here, freq(s), meanrank(s) and mindist(s) are normalized scores of an n-gram s

  26. Snippet Ranking

  27. Hint Extraction from the Query • 45% of the queries began with the word “what”. • Over 80% of those queries are in standard forms (e.g. “what is”, “what was”, “what are”, “what do”, “what does”). • The “what is X” pattern. • For example, the hint of “what is a quote by ernest hemingway” is “quote” (“a” is a stop word).
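
  The “what is X” pattern can be sketched in Python as follows. This is a guess at the rule, inferred only from the single example on this slide: the hint is taken to be the first non-stop word after the question prefix. The stop-word list and function name are hypothetical.

```python
import re

# Small illustrative stop-word list (assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "of", "is"}

# Standard question forms listed on the slide.
WHAT_PATTERN = re.compile(r"^what\s+(?:is|was|are|do|does)\s+(.*)$", re.IGNORECASE)

def extract_hint(query):
    """Return the first non-stop word after a "what is"-style prefix, else None."""
    match = WHAT_PATTERN.match(query.strip())
    if not match:
        return None
    for word in match.group(1).lower().split():
        if word not in STOP_WORDS:
            return word
    return None

hint = extract_hint("what is a quote by ernest hemingway")
```

  Queries that do not match a standard form return `None` here, so the hint would have to be supplied explicitly, as in <“Barack Obama wife”, “wife”>.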

  28. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  29. IMPLEMENTATION • 600 lines of Python code • 1.8 GHz dual-core Intel PC • 2 GB of RAM • 2 Mbps broadband • A front-end • Set up an SMS short code with a local telco in Kenya

  30. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  31. EVALUATION • What is the query set? • What are the correct answers? • How is an answer judged correct or not? • What percentage of the queries are verticals? • Can the hint always be extracted correctly?

  32. Result • SMSFind returns correct answers for 57.3% of the queries. • Google SMS returns correct answers for only 9.5% of these queries.

  33. What do the snippet results actually look like?

  34. What is more interesting? • What if the vertical queries are removed? • What if only the highest-ranked n-grams returned are considered rather than the entire snippet? • Are n-grams necessary, or would ranking snippets alone perform just as well? • How important is the hint term?

  35. Summary of several results

  36. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  37. Difficult Types of Queries • Really ambiguous • Explanations • Enumerations • Analysis • Time sensitive • SMSFind cannot currently handle these kinds of queries.

  38. Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

  39. CONCLUSION • We have presented SMSFind, an automated SMS-based search response system. • SMSFind can work across arbitrary topics. • We find that a combination of simple Information Retrieval algorithms with existing search engines can provide reasonably accurate search responses for SMS queries. • SMSFind is able to answer 57.3% of the queries in our test set.

  40. Thank you!
