WikiMania’08, July 18, 2008 Alexandria, Egypt Recent Developments at Yahoo! in Search & Mobile, and Future Challenges Research Usama Fayyad Chief Data Officer & Executive VP Yahoo! Inc. Usama_fayyad@yahoo.com
Overview • About Yahoo! and its business • Yahoo! Mobile Philosophy • OneSearch 2.0 • Challenges in Mobile Search • Some words about search advertising • Examples of Search Evolution at Yahoo! • Concrete examples of the changes that are relevant to Social Web • Concluding thoughts
Globally, Internet Users Number Over 1 Billion [Chart: Internet users in millions] Source: IDC, December 2003.
Yahoo! is the #1 Destination on the Web • More people visited Yahoo! in the past month than: • Use coupons • Vote • Recycle • Exercise regularly • Have children living at home • Wear sunscreen regularly • 73% of the U.S. Internet population uses Yahoo! – Over 500 million users per month globally! • Global network of content, commerce, media, search and access products • 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc. • 25 terabytes of data collected each day… and growing • Representing thousands of cataloged consumer behaviors • Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Yahoo! Data – A league of its own… GRAND CHALLENGE PROBLEMS OF DATA PROCESSING TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET Y! Data Challenge Exceeds others by 2 orders of magnitude
What About Yahoo! Mobile? • Fast-growing initiative that is one of the company’s priorities for the future • Great success in distribution • Signed deals with 29 carriers, making it accessible to 600 million subscribers who are now under contract • OneSearch is Yahoo!’s mobile search application, launched 13 months ago; OneSearch 2.0 has just launched • Marco Boerries, EVP of Mobile at Yahoo!: “No one has ever amassed that kind of distribution under that short period of time.”
Mobile Device Internet Penetration Will Eclipse the PC • 1 Billion people across the world use the Internet* • 3.3 Billion people across the world are mobile service subscribers (that’s half the global population)** * U.N. Telecommunications Agency, Sept 07 ** Informa, Nov 07
Yahoo!’s Global Mobile Reach: 16.9 Million Unique Users Per Month In The U.S. Alone • Unique users per month (millions): Yahoo! 16.9 • Google 12.1 • MSN 8.9 • AOL 8.6
Mobile Search Built for the Consumer: Mobile Search ≠ PC Search
The Mobile Use Case is Different Give me Answers, Entertainment, Images…
Y! oneSearch Changed the Game Answers Instead of Web Links. Relevant, Complete Results
Yahoo! Mobile Approach to Search • OneSearch is a special federated search engine • Analyzes the concept and intent of the query against a large collection of “vertical” backends • Web, News, Images, Finance, etc. • UGC such as Wikipedia and Yahoo! Answers • Aggregates results from the verticals and blends them, optimizing for both the user’s query and the device it came from • Goal is to minimize clicks by taking the user directly to results organized around tasks • Query sources: • Browsers: WAP/XHTML • Java app interface for Yahoo! Go • SMS text messaging for Yahoo! Mobile SMS
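The federated approach on this slide can be illustrated with a minimal Python sketch. It is not Yahoo!'s implementation: the `Result` type, the keyword cues in `INTENT_CUES`, the 2.0 boost, and the toy backends are all assumptions made up for illustration. The sketch fans a query out to several "vertical" backends, weights each vertical by a crude intent guess, and keeps only as many blended results as the device can show.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Result:
    vertical: str   # which backend produced the result
    title: str
    score: float    # relevance reported by the backend, assumed to be in [0, 1]


Backend = Callable[[str], List[Result]]

# Hypothetical keyword cues standing in for real concept/intent analysis.
INTENT_CUES = {
    "weather": ("weather", "forecast"),
    "finance": ("stock", "quote"),
    "news": ("news", "headlines"),
}


def intent_weights(query: str, backends: Dict[str, Backend]) -> Dict[str, float]:
    """Boost a vertical when the query contains one of its cue words."""
    words = set(query.lower().split())
    return {
        name: 2.0 if words & set(INTENT_CUES.get(name, ())) else 1.0
        for name in backends
    }


def federated_search(query: str, backends: Dict[str, Backend],
                     max_results: int) -> List[Result]:
    """Query every vertical, blend by weighted score, truncate for the device."""
    weights = intent_weights(query, backends)
    blended = []
    for name, backend in backends.items():
        for result in backend(query):
            blended.append((weights[name] * result.score, result))
    blended.sort(key=lambda pair: pair[0], reverse=True)
    return [result for _, result in blended[:max_results]]


# Toy backends (assumptions): each returns one canned result.
BACKENDS: Dict[str, Backend] = {
    "web": lambda q: [Result("web", f"Web page about {q}", 0.6)],
    "weather": lambda q: [Result("weather", f"Forecast for {q}", 0.5)],
    "news": lambda q: [Result("news", f"Headlines on {q}", 0.4)],
}

if __name__ == "__main__":
    # A small screen might only have room for two results.
    for r in federated_search("mendocino weather", BACKENDS, max_results=2):
        print(r.vertical, "-", r.title)
```

Here `max_results` stands in for the device-specific part of the blend. A real system would presumably also adapt layout and content packaging per device.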
Approach • Be as open as possible on interfaces • We fundamentally believe the mobile OS market will remain fragmented across platforms for quite a while • Windows Mobile only reached 30M users after more than 7 years of effort • Provide an environment that lets users program to one target platform and lets Yahoo! bear the effort of making it run on a wide range of devices • Focus on the highest-value apps for users today, those involving access to the online world (less on client apps) • Return results and not links
Yahoo! Mobile Products • Yahoo! onePlace • Yahoo! oneConnect • Yahoo! Home Page • Yahoo! Go 3.0 • Yahoo! oneSearch * M:Metrics, October 2008 ** All Yahoo! Mobile services are free. Check with your wireless carrier about data plan charges that may apply.
OneSearch 2.0 • OneSearch is being opened up to all publishers and content owners so they can write rich metadata that will be returned as part of results, rather than just a link • Similar to Yahoo!’s SearchMonkey service for the Internet • More about this later… • Three major new upgrades • Search Assist: the search box predicts what you are typing • Voice input: users can search by speaking into the device instead of typing (provided by Vlingo) • The search box will be integrated into the home screen of phones
Turning web search results into answers Unlocking the power of the semantic web Providing more relevant content OneSearch 2.0 Better answers
Easier, faster input Predictive text completion Contextual recommendations OneSearch 2.0 Easier input
Speak your search Search for anything Personalized to your voice OneSearch 2.0 Voice input
Persistent 1-click access Gateway to the Internet Supports text & voice OneSearch 2.0 Always there
Internet Use on Mobile vs. PC
Mobile Use • Today, we believe Internet use on the PC is about 10x that on mobile • Mobile is growing faster, in all regions • There are more than 3x as many mobile subscribers as Internet users globally • But most phones are not data-capable yet • The world today: • We are learning from the web, and attempting to figure out what makes sense for mobile users • Trying to work with smartphone users, as they represent the early adopters
Classical web search user needs • Informational (~25%) – want to learn about something • Navigational (~40%) – want to go to that page • Transactional (~35%) – want to do something (web-mediated) • Access a service • Downloads • Shop • Gray areas • Find a good hub • Exploratory search “see what’s there” • Example queries shown on the slide: Mendocino weather; Mars surface images; Nikon CoolPix; Low hemoglobin; United Airlines; Car rental Finland • Source: Broder 2002, A Taxonomy of Web Search
What About Mobile? • No good classification • Several studies that cover • Query frequency distribution • Words per query • Characters per query • Categorization by query type into traditional categories: • Adult and Entertainment, Autos, Consumer Goods, Finance, Government & Politics, Sports, Technology, Travel, etc. • Best-known studies by • Kamvar and Baluja (2006 and 2007) • Yi, Maghoul, and Pedersen (2008) • Good quantitative statistics, little qualitative, purpose-driven analysis (early days still)
What Do We Believe about Mobile Queries? • We believe the distribution differs from the query distribution for PC users • Bias towards shorter queries • Data contradicts this: 2.6 words per query, same number of characters as on PC • Difficulty of query entry is a significant hurdle • Much higher location-based activity • Much more task-oriented than exploration or research • Notifications add a whole new “push” dimension • Trigger alerts (stocks, news, auctions) • Location-based (geo-driven) • Event-based (calendar entries such as travel alerts, flight delays, etc.) • Can learn much more about user intent, and hence eventually more promising for advertising
Implications and Challenges • Task-orientation • Specialized content packaging • Locality inference from queries • Locality inference from device (LBS) • Minimize typing and round-trips: get results, not just links • Less room to display SERP + other accessories • Monetization strategies to fund this model still not decided • Advertising • Subscription to “premium services” • Revenue share on “leads” • Pay per usage of special high-value areas • In the meantime, the web and search are evolving…
Even Larger Challenges • Modeling Social Media and use of mobile in social settings on the go • Understanding UGC • Classifying, categorizing, organizing UGC and folksonomy • A different problem of search -- Semantics of content are critical, especially if we are to target • Intent • Task-orientation • Motion dimension (distance to target of search) • Push and notifications • Understanding the physical world (common sense): what is close? Business hours? Holidays? • Web Content growing, changing, diversifying, fragmenting • Truly leveraging the notification abilities and finding new everyday uses – far more versatile a space than PC • Long-term memory (state) for long-running tasks and queries
A Tale of Two Search Engines
Advertisements = Monetization (+$) • Algorithmic results = Audience (-$)
Algorithmic vs. Ad Search • Analogous to classical separation of editorial vs commercial content • Technical underpinnings: • Some commonalities (IR, ML) • Many differences (incentives, spam, mechanism design)
The two engines [Diagram: User, Search, Web spider, Indexer, The Web, Indexes, Ad indexes]
1995: The Yahoo! Directory • Apply human expertise and editorial to organize web sites • What worked • Practical, Navigable • Trustworthy, Authoritative • What didn’t • Scalability • Granularity • Etc.
1995: AltaVista (Inktomi, Lycos, etc.) • Automate the process of acquiring pages; use “information retrieval” techniques to return pages that contain a particular term • What worked • Scalable (query for “IBM” returns 40M pages) • Simple • Granular • What didn’t • Scalability a double-edged sword • Ranking and relevance poor • Not authoritative (spam, irrelevance, etc.)
c. 1999-2006: PageRank (Google, Yahoo) • Use topology (link structure) of the web to confer authority • What works • Relevance is greatly improved • Navigational query is born (query for “IBM” gets me to ibm.com) • What doesn’t • Homogeneity of results (no personalization) means no “subjective” queries – webmasters vote by proxy for everyone – and their answer is the only answer • System easily “gamed” by spammers – leads to arms race
Meanwhile, On the Money Front… • Sponsored search ranking: Goto.com (morphed into Overture.com, then Yahoo!) • Your search ranking depended on how much you paid • Auction for keywords: casino was expensive! • 1998+: Link-based ranking pioneered by Google • Blew away all early engines except Inktomi • Great user experience in search of a business model • Meanwhile Goto/Overture’s annual revenues were nearing $1 billion • Result: Google added sponsored search “ads” to the side, independent of search results • 2003: Yahoo follows suit, acquiring Overture (for paid placement) and Inktomi (for search) • The Monetization Mechanisms… Conversion of marketplace mechanisms in 2007
Search query Ad
Questions for the audience • Do you think an “average” user knows the difference between sponsored search links and algorithmic search results? • Do you think an “average” user knows there are sponsored links on the page? • Do you think a user knows where a sponsored link would navigate to upon a click?
How it works • Advertiser: “I want to bid $5 on canon camera” • Advertiser: “I want to bid $2 on cannon camera” • Bids go into the Ad Index of the sponsored search engine • Engine decides when/where to show this ad • Engine decides how much to charge advertiser on a click • A click takes the user to the advertiser’s landing page
Engine: Three sub-problems • 1. Retrieve ads matching query (IR) • 2. Order the ads (IR + Econ) • 3. Pricing on a click-through (Econ)
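The three sub-problems can be lined up as a tiny pipeline. The Python sketch below is illustrative only: the `Ad` type, the exact-match retrieval, the extra advertiser "B", and the `reserve` price are assumptions, and the slides do not say which pricing rule is used. Here a next-highest-bid (second-price style) rule is assumed for step 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ad:
    advertiser: str
    keyword: str
    bid: float  # maximum dollars the advertiser will pay per click


def retrieve(query: str, ad_index: List[Ad]) -> List[Ad]:
    """Sub-problem 1: exact keyword match (real engines match far more loosely)."""
    normalized = query.strip().lower()
    return [ad for ad in ad_index if ad.keyword == normalized]


def order(ads: List[Ad]) -> List[Ad]:
    """Sub-problem 2: order by bid, highest first."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def price_per_click(ranked: List[Ad], reserve: float = 0.05) -> List[Tuple[Ad, float]]:
    """Sub-problem 3 (assumed rule): each ad pays the bid of the ad ranked
    just below it; the last ad pays the reserve price."""
    priced = []
    for position, ad in enumerate(ranked):
        below = ranked[position + 1].bid if position + 1 < len(ranked) else reserve
        priced.append((ad, below))
    return priced


def run_engine(query: str, ad_index: List[Ad]) -> List[Tuple[Ad, float]]:
    return price_per_click(order(retrieve(query, ad_index)))


# The $5 and $2 bids echo the "How it works" slide; advertiser B is made up.
AD_INDEX = [
    Ad("A", "canon camera", 5.0),
    Ad("B", "canon camera", 3.0),
    Ad("C", "cannon camera", 2.0),
]

if __name__ == "__main__":
    for ad, cost in run_engine("Canon Camera", AD_INDEX):
        print(f"{ad.advertiser}: bid ${ad.bid:.2f}, pays ${cost:.2f} per click")
```

With exact matching, the misspelled "cannon camera" bid never competes for the correctly spelled query, which hints at why retrieval is an IR problem in its own right.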
2. Order the ads • Most generally, composite IR+Econ score … for today’s talk, focus on Econ • Original GoTo/Overture scheme: • Order by bid
Economic ordering • Bid and revenue ordering: two forms of ordering by an econ score • Does revenue ordering maximize revenue? • No – advertisers react to ordering scheme, by changing their bid behavior! • Lahaie+Pennock ACM EC 2007 • Family of schemes bridging Bid and Revenue ordering • Game-theoretic analysis Edelman, Ostrovsky, Schwarz 2006
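A small Python sketch can make the two orderings concrete. It scores each ad as bid × CTR^s, so that s = 0 gives pure bid ordering and s = 1 gives ordering by expected revenue per impression (bid × click-through rate). This parametrization is an assumption chosen for illustration and is not necessarily the exact family analyzed in the cited paper. The `Ad` fields and the numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ad:
    name: str
    bid: float  # dollars per click
    ctr: float  # estimated click-through rate, in (0, 1]


def rank(ads: List[Ad], s: float) -> List[Ad]:
    """Order ads by bid * ctr**s.

    s = 0.0 -> bid ordering (CTR ignored)
    s = 1.0 -> revenue ordering (expected revenue per impression)
    0 < s < 1 -> something in between
    """
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr ** s, reverse=True)


ADS = [
    Ad("X", bid=4.0, ctr=0.01),  # high bid, rarely clicked
    Ad("Y", bid=2.0, ctr=0.05),  # low bid, often clicked
    Ad("Z", bid=3.0, ctr=0.02),
]

if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        print(s, [ad.name for ad in rank(ADS, s)])
```

The sketch holds bids fixed, so it only shows how the ordering changes. The slide's point is that advertisers change their bids in response to the ordering scheme, which is why revenue ordering does not automatically maximize revenue.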
A new convergence • Monetization and economic value an intrinsic part of system design • Not an afterthought • Mistakes are costly! • Computing meets humanities like never before – sociology, economics, anthropology …
Example I want to book a vacation in Tuscany. Start Finish
Trends in task complexity • Dawn of search: • Navigational queries • Pockets of information • Today: • Increasing migration of content online • New forms of media only available online • Infrastructure for payments and reputation sufficient for many users
Things to notice • Long-running user goals • Search as hub: • start there • return for resource discovery and at task boundaries • traverse the web broadly to complete task • Web services integrated into task
Content Growth
Content trends [Ramakrishnan and Tomkins 2007]