SMS-Based Web Search for Low-end Mobile Devices

SMS-Based Web Search for Low-end Mobile Devices • Eric Brewer (University of California) • Lakshmi Subramanian (New York University) • Jay Chen (New York University) • XinMiao Wu, 2011-05-11

Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion

Explanation • SMS • Short Message Service • Limited to 140 bytes per message • SMS-Based Web Search • Not via XHTML/WAP • Uses only the SMS service

Conventional SMS-Based Web Search • [Diagram: (1) the user sends a short message to the SMS server; (2) the server invokes the search engine; (3) the search engine returns the top N search responses; (4) the server returns the responses to the user as short messages]

What the authors address • [Diagram: (1) the user sends a short message to the SMS server (SMSFind); (2) the server invokes the search engine; (3) the search engine returns the top N search responses; (4) the server extracts the main content as a snippet of at most 140 bytes; (5) the server returns the snippet to the user in a single short message]

Why meaningful? • Growth of the mobile phone market • Has motivated the design of new forms of mobile information services • Growth of Twitter and other social messaging networks • Short Message Service (SMS) based applications and services have become popular • Mobile devices in developing regions are still simple low-cost devices • With limited processing and communication capabilities • Voice and SMS will likely remain the primary communication channels

Why SMS-Based Search? • For any SMS-based web service, efficient SMS-based search is an essential building block. • Existing services • Vertical (Google SMS and Yahoo! oneSearch) • Long tail (ChaCha, JustDial): need a human being to answer • None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics. • They rely on pre-defined topics, such as “define” or “movies” (e.g. Google SMS: “define boils”)

Difficulties of SMS-Based Search • 140-byte message limit • Search response time (10 secs to several mins) • Small form factor and low bandwidth (even with XHTML/WAP) • Long tail phenomenon • Users rarely have the luxury of more than one round of interaction (vs. desktop) • Ambiguous queries • Problem: How does a mobile user efficiently search the Web using one round of interaction where the search response is restricted to one SMS message? • SMSFind

Related Work • Two surveys • First: a new mobile search model is needed for low-end mobile devices. • Second: SMS is expected to continue its growth as it is popular, cheap, reliable and private. • Two kinds of SMS search • Vertical: Google, Yahoo!, and Microsoft • Long tail: ChaCha and Just Dial • Automatic Text Summarization • The goal is different

Related Work • The problem that SMSFind seeks to address is similar to: • Question answering systems (as developed for the Text REtrieval Conference, TREC) • But distinct in: • The queries: unstructured search-style queries (simple natural language style) • SMSFind is a snippet extraction and snippet ranking algorithm • The collection of documents being searched over

Known Verticals vs Long Tail

SMSFind Search Problem • Characterized as follows: given <query, hint> and the top N search response pages, extract a text snippet as an appropriate search response to the query. Note that: • What is a snippet? • What is the hint?

Disambiguate query • A common technique: • Use additional information about the context from which the search is being conducted. • Here we use an explicit hint. • Consider the query: <“Barack Obama wife”, “wife”>.

<“Barack Obama wife”, “wife”> • Most search result pages will contain: • “Michelle” or “Michelle Obama” or “Michelle Robinson” or “Michelle Lavaughn Robinson” within the neighborhood of the word “wife” in the text of the page. • SMSFind will search the neighborhood of the word “wife” in every result page and look for commonly occurring n-grams. • 1<=n<=5. For example, “Michelle Obama” is a 2−gram.
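The neighborhood search described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, and the window size `radius=5` are all assumptions made for the example. Overlapping neighborhoods are counted independently, which a real system would probably merge first.

```python
from collections import Counter

def neighborhoods(words, hint, radius):
    """Yield the words within `radius` positions of every occurrence of the hint."""
    for i, word in enumerate(words):
        if word == hint:
            yield words[max(0, i - radius): i + radius + 1]

def ngrams(words, max_n=5):
    """Yield every contiguous n-gram (as a tuple of words) with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def count_ngrams_near_hint(pages, hint, radius=5, max_n=5):
    """Count n-grams that occur in the neighborhood of the hint across all pages."""
    counts = Counter()
    for text in pages:
        words = text.lower().split()  # naive tokenizer (assumption)
        for window in neighborhoods(words, hint, radius):
            counts.update(ngrams(window, max_n))
    return counts

# Hypothetical page texts, invented for the example.
pages = [
    "barack obama and his wife michelle obama live in chicago",
    "the president's wife michelle obama was born michelle robinson",
]
counts = count_ngrams_near_hint(pages, "wife")
print(counts[("michelle", "obama")])
```

In this toy input the 2-gram “michelle obama” falls inside the neighborhood of “wife” on both pages, while “michelle robinson” falls just outside the window on the second page.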

n-grams and snippets • Both represent continuous sequences of words in a document • An n-gram is extremely short in length (1-5 words) • A text snippet is a sequence of words that can fit in a single SMS message • n-grams are used as an intermediate unit • Snippets are used for the final ranking

SMSFind Algorithm • Consider a search query (Q,H) • Q is the search query containing the hint term(s) H. • Let P1, . . . PN represent the textual content of the top N search response pages to Q. • Three steps: Neighborhood Extraction; N-gram Ranking; Snippet Ranking

Neighborhood Extraction

N-gram Ranking

Basic rationale of n-gram ranking algorithm • Any n-gram which satisfies the following three properties is potentially related to the appropriate response: • 1. the n-gram appears very frequently around the hint. • 2. the n-gram appears very close to the hint. • 3. the n-gram is not a commonly used popular term or phrase. • As an example, the n-gram “Michelle Obama”.

Three Metrics • Frequency: the number of times the n-gram occurs across all snippets. • Mean rank: the sum of the PageRanks of every page in which the n-gram occurs, divided by the n-gram’s raw frequency. • Minimum distance to the hint.
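A rough sketch of how the three metrics might be computed. Several things here are assumptions rather than details from the slides: each page is given as a list of tokens, "PageRank" is read as the page's position in the search results (1 = top result), and distance is measured in words between the n-gram and the nearest hint occurrence. How the three values are normalized and combined into one score is not shown on the slides, so it is left out.

```python
from collections import defaultdict

def ngram_metrics(pages, hint, max_n=5):
    """Return {ngram: (frequency, mean_rank, min_distance)}.

    pages: list of token lists, ordered by search rank (index 0 = rank 1).
    """
    freq = defaultdict(int)      # raw number of occurrences
    rank_sum = defaultdict(int)  # sum of ranks of the pages containing the n-gram
    min_dist = {}                # smallest word distance to any hint occurrence
    for rank, words in enumerate(pages, start=1):
        hint_positions = [i for i, w in enumerate(words) if w == hint]
        if not hint_positions:
            continue  # page never mentions the hint
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = tuple(words[i:i + n])
                freq[gram] += 1
                seen.add(gram)
                # Distance is 0 when the n-gram covers the hint itself.
                d = min(max(i - h, h - (i + n - 1), 0) for h in hint_positions)
                min_dist[gram] = min(d, min_dist.get(gram, d))
        for gram in seen:
            rank_sum[gram] += rank
    return {g: (freq[g], rank_sum[g] / freq[g], min_dist[g]) for g in freq}

# Hypothetical tokenized pages, invented for the example.
pages = [
    ["his", "wife", "michelle", "obama"],
    ["wife", "of", "obama", "michelle", "obama"],
]
metrics = ngram_metrics(pages, "wife")
print(metrics[("michelle", "obama")])
```

For “michelle obama” the sketch reports a frequency of 2, a mean rank of (1 + 2) / 2 = 1.5, and a minimum distance of 1 word from “wife”.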

Should return the response “rainn wilson” • Here, freq(s), meanrank(s) and mindist(s) are the normalized scores of an n-gram s

Snippet Ranking
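One plausible way to rank snippets, shown only as a sketch: enumerate word runs that fit in a single SMS and score each by the n-grams it contains. The scoring rule (sum of the scores of all scored n-grams present), the UTF-8 byte count, and every function name are assumptions for illustration; the slides do not give the actual ranking function.

```python
SMS_LIMIT = 140  # bytes per message, as stated in the slides

def candidate_snippets(words, limit=SMS_LIMIT):
    """For each start word, yield the longest run of words that fits in `limit` bytes."""
    for start in range(len(words)):
        end, size = start, 0
        while end < len(words):
            # One extra byte for the separating space after the first word.
            extra = len(words[end].encode("utf-8")) + (1 if end > start else 0)
            if size + extra > limit:
                break
            size += extra
            end += 1
        if end > start:
            yield " ".join(words[start:end])

def score_snippet(snippet, ngram_scores):
    """Sum the scores of every scored n-gram that appears word-aligned in the snippet."""
    padded = f" {snippet} "
    return sum(s for gram, s in ngram_scores.items() if f" {gram} " in padded)

def best_snippet(pages, ngram_scores, limit=SMS_LIMIT):
    """Return the highest-scoring candidate snippet over all pages ('' if none)."""
    candidates = (s for page in pages for s in candidate_snippets(page.split(), limit))
    return max(candidates, key=lambda s: score_snippet(s, ngram_scores), default="")

# Hypothetical n-gram scores and page text; a 20-byte limit keeps the demo readable.
scores = {"michelle obama": 2.0, "wife": 1.0, "barack": 0.5}
pages = ["barack obama married his wife michelle obama in 1992"]
result = best_snippet(pages, scores, limit=20)
print(result)
```

With the artificially small 20-byte limit, the winning snippet is “wife michelle obama”, the only candidate containing both the hint and the top-scoring n-gram.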

Hint Extraction from the Query • 45% of the queries began with the word “what”. • Over 80% of the queries are in standard forms (e.g. “what is”, “what was”, “what are”, “what do”, “what does”). • The “what is X” pattern. • Example: the hint of “what is a quote by ernest hemingway” is “quote” (“a” is a stop word).
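The “what is X” pattern can be illustrated with a small sketch. The rule used here (take the first non-stop word after the question prefix) is inferred from the single example on the slide, and the stop-word list is a tiny illustrative set; both are assumptions rather than the authors' actual rules.

```python
import re

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "by", "in", "to", "for"}

# Standard question prefixes listed on the slide.
PATTERN = re.compile(r"^what\s+(?:is|was|are|do|does)\s+(.*)$", re.IGNORECASE)

def extract_hint(query):
    """Return the first non-stop word after a 'what is/was/are/do/does' prefix, else None."""
    match = PATTERN.match(query.strip())
    if not match:
        return None
    for word in match.group(1).split():
        if word.lower() not in STOP_WORDS:
            return word.lower()
    return None

print(extract_hint("what is a quote by ernest hemingway"))
```

Queries that do not match one of the standard forms return `None` in this sketch, so a caller would need some other way to obtain the hint in that case.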

IMPLEMENTATION • 600 lines of Python code • 1.8 GHz Duo Core Intel PC • 2 GB of RAM • 2 Mbps broadband • A front-end • Set up an SMS short code with a local telco in Kenya

EVALUATION • What is the query set? • Where do the correct answers come from? • How is an answer judged correct or not? • What percentage of the queries are verticals? • Can the hint always be extracted correctly?

Result • SMSFind returns correct answers for 57.3% of the queries. • Google SMS returns correct answers for only 9.5% of these queries.

What do the snippet results actually look like?

What is more interesting? • What if the vertical queries are removed? • What if only the highest-ranked n-grams returned are considered rather than the entire snippet? • Are n-grams necessary, or would ranking snippets alone perform just as well? • How important is the hint term?

Summary of several results

Difficult Types of Queries • Really ambiguous • Explanations • Enumerations • Analysis • Time sensitive • SMSFind cannot handle these kinds of queries yet!

CONCLUSION • We have presented SMSFind, an automated SMS-based search response system. • SMSFind can work across arbitrary topics. • We find that a combination of simple Information Retrieval algorithms with existing search engines can provide reasonably accurate search responses for SMS queries. • SMSFind is able to answer 57.3% of the queries in our test set.

Thank you!
