Challenges in Commerce Search

Hugh E. Williams, Vice President, Experience, Search, and Platforms (@hughewilliams, [email protected])


eBay Today

  • 50+ petabytes of data in our Hadoop and Teradata clusters

  • 2+ billion page views each day

  • 75+ billion database calls each day

  • 250 million queries per day


The opportunity ahead is huge

  • $10 trillion

  • Online Commerce: $1 trillion

Source: Economist Intelligence Unit, Morgan Stanley

Note: Market sizes as of 2012, Compounded Annual Growth Rates from 2012 to 2015

Today’s Search

  • Turnaround contributor

  • Series of improvements

  • Ten year old technology

Conversion up 13%

Better Search


Simple Flows

Better Images




Improving Search from 2009 to 2012

  • User experience changes

    • Imagery

    • Reorganization

    • Optimization

    • Major page refresh

    • Speed

  • Search science

    • Query understanding and rewriting

    • Understanding user intent

    • Behavioral measurement

    • Substantial ranking improvements (particularly to Fixed Price ranking)

  • And all on a 10+ year old platform named Voyager

Query Understanding and Rewriting

  • Our search engine was literal

  • We’re on a journey to make it more intuitive

  • Idea: Mine our query-session data, look for patterns, and use these to map words in user queries to synonyms and structured data

[Diagram with the labels: User Query, Search Query, Query Rewrite, eBay Results]

How do buyers purchase the pilzlampe?

  • It turns out, they do one of a few things:

    • Type pilzlampe, and purchase

    • Type pilzlampe, … , pilzlampe, and purchase

    • Type pilzlampe, … , pilzlampen, and purchase

    • Type pilzlampen, … , pilzlampe, and purchase
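One way to surface patterns like these is to count which pairs of distinct queries appear together in sessions that end in a purchase. The following is a minimal sketch under stated assumptions, not eBay's actual pipeline: the session format (a list of queries plus a purchase flag), the function name `mine_equivalent_queries`, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_equivalent_queries(sessions, min_support=2):
    """Return query pairs that co-occur in at least `min_support` purchasing sessions.

    `sessions` is an iterable of (queries, purchased) tuples. This layout is an
    assumption made for illustration only.
    """
    pair_counts = Counter()
    for queries, purchased in sessions:
        if not purchased:
            continue  # only sessions that end in a purchase count as evidence
        distinct = sorted(set(queries))  # ignore repeats and ordering within a session
        for a, b in combinations(distinct, 2):
            pair_counts[(a, b)] += 1
    return {pair for pair, count in pair_counts.items() if count >= min_support}


# Toy sessions mirroring the behaviors listed above (illustrative data only).
sessions = [
    (["pilzlampe"], True),
    (["pilzlampe", "pilzlampe"], True),
    (["pilzlampe", "pilzlampen"], True),
    (["pilzlampen", "pilzlampe"], True),
    (["pilzlampe", "tischlampe"], False),
]
candidates = mine_equivalent_queries(sessions)
```

With this toy data, the only pair that clears the threshold is pilzlampe and pilzlampen. A real system would need far more evidence than raw co-occurrence before treating two queries as equivalent.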

How do buyers purchase the pilzlampe?

  • From our data mining:

    • We automatically discover that pilzlampe and pilzlampen are the same

    • We also discover that pilz and pilze are the same, and lampe and lampen are the same

  • From these patterns, we rewrite the user’s query pilzlampe as:

    pilzlampe OR “pilz lampe” OR “pilz lampen” OR pilzlampen OR “pilze lampe” OR pilzelampe OR “pilze lampen” OR pilzelampen
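The expansion can be sketched as a Cartesian product over the equivalent forms of each part of the query. This is an illustrative sketch only: it assumes the compound has already been split into its parts, that the quoted alternatives are space-separated phrases, and that `expand_query` and the `equivalents` table are hypothetical names rather than anything from eBay's system.

```python
from itertools import product


def expand_query(parts, equivalents):
    """Build an OR query from every combination of equivalent word forms.

    `parts` is the query already split into components (an assumption here);
    `equivalents` maps a component to its mined alternative forms.
    """
    options = [[p] + [v for v in equivalents.get(p, []) if v != p] for p in parts]
    alternatives = []
    for combo in product(*options):
        alternatives.append("".join(combo))  # compound form, e.g. pilzlampen
        if len(combo) > 1:
            alternatives.append('"' + " ".join(combo) + '"')  # phrase form
    return " OR ".join(alternatives)


equivalents = {"pilz": ["pilze"], "lampe": ["lampen"]}
rewritten = expand_query(["pilz", "lampe"], equivalents)
print(rewritten)
```

The sketch produces the same eight alternatives as the rewrite above, though not in the same order. The number of alternatives grows multiplicatively with each part, so a production rewriter would presumably cap or prune the expansion.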

Are Query Rewrites Easy?

  • Nothing is easy at scale

    • Incorrect strong signals:

      • CMU is not Central Michigan University

      • Mariners is not the same as Marines

    • Context matters

      • Correcting Seattle Marines to Seattle Mariners is (generally) right

      • Denver Nuggets is not Denver in the Jewelry & Watches category
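The role of context can be illustrated with a rewrite rule that fires only when a supporting term is present in the query and the category does not veto it. This is a hedged sketch: the rule format, the function `rewrite_with_context`, and the "Militaria" category are hypothetical and chosen only to mirror the Mariners and Marines example.

```python
# Hypothetical rule table: correct "marines" to "mariners" only alongside "seattle",
# and never inside categories where "marines" is likely the intended term.
RULES = {
    "marines": {
        "replacement": "mariners",
        "requires": "seattle",
        "blocked_categories": {"Militaria"},
    },
}


def rewrite_with_context(query, category, rules):
    """Apply token-level rewrites only when the query and category context allow it."""
    tokens = query.lower().split()
    rewritten = []
    for token in tokens:
        rule = rules.get(token)
        if (
            rule is not None
            and rule["requires"] in tokens
            and category not in rule["blocked_categories"]
        ):
            rewritten.append(rule["replacement"])
        else:
            rewritten.append(token)
    return " ".join(rewritten)


print(rewrite_with_context("Seattle Marines", None, RULES))
print(rewrite_with_context("US Marines", None, RULES))
print(rewrite_with_context("Seattle Marines patch", "Militaria", RULES))
```

Only the first call is rewritten. The second lacks the supporting term and the third is blocked by its category, so both keep "marines".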

Next Gen Search

An even bigger opportunity


Cassini: Reengineering eBay Search

How hard is it to ship a new search engine?

  • Voyager is used for much more than the obvious. It’s multi-tenant:

    • “Default Search” search (already migrated to Cassini in the US)

    • Completed, null and low (already migrated to Cassini worldwide)

    • Description search

    • Deterministic sorts

    • Query rewrite

    • Merchandizing

    • The Feed

    • Selling (for example, allowing sellers to create listings from similar items)

    • Category browsing

    • Motors and other verticals

    • Many fast “item lookup” scenarios for other teams

    • Many scenarios we don’t even know about…

What else is hard about eBay search?

  • eBay has over 400 million items listed in multiple languages

  • Our collection of items changes fast

  • You can find just about anything on eBay. We have to optimize for every type of item

  • Not everybody follows the same listing practices, or uses the same keywords or units

    • Examples include:

      • Units of measure: centimeter versus cm, gigabytes versus gb

      • Colors: Blue versus Aqua, Rojo is the same as Red

      • Synonyms: laptop and notebook, mobile phone and cell phone

      • Abbreviations: SGA means Stadium Giveaway

      • Spelling errors

  • Our goal is to help both buyers and sellers find items even when they use different ways of expressing the same things
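A simple way to picture this goal is to map both queries and listing titles onto one canonical vocabulary before matching. The sketch below is illustrative only: the `CANONICAL` table is built from the examples above, and the `normalize` function and its whole-word matching strategy are assumptions, not a description of eBay's system.

```python
import re

# Hypothetical canonical forms, taken from the examples in the list above.
CANONICAL = {
    "cm": "centimeter",
    "gb": "gigabytes",
    "rojo": "red",
    "notebook": "laptop",
    "cell phone": "mobile phone",
    "sga": "stadium giveaway",
}

# Try longer keys first so multi-word phrases win over their single words.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CANONICAL, key=len, reverse=True))
    + r")\b"
)


def normalize(text):
    """Lowercase the text and replace known variants with their canonical form."""
    return _PATTERN.sub(lambda match: CANONICAL[match.group(1)], text.lower())


print(normalize("Rojo notebook 500 GB"))
print(normalize("cell phone case"))
```

Applying the same function to the buyer's query and the seller's title lets "Rojo notebook" match a listing for a "red laptop". The sketch ignores context, so it would also rewrite a paper notebook to "laptop", and it does not handle units written without a space, such as "500gb".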

Technology Deep Dive: Infrastructure

  • What’s hard at eBay?

    • Multi-tenant system

    • Document additions and deletions

    • Document modifications

    • Index updates

    • Result caching

    • Data center automation

Technology Deep Dive: Ranking

  • What’s hard at eBay?

    • Mix of items: good ’til canceled multi quantity vs. single quantity

    • Gaps in catalog data

    • A very different problem: different ranking signals from Web search

    • The deterministic sort:

      • Recall versus precision

      • Consistency with best match

    • Spam

    • Result blending



of eBay multiscreen users

of GMV share