Data Cloud

Yury Lifshits

Yahoo! Research

http://yury.name

My Beliefs

The key challenge in web search is structured search

Part 1: What is structured search?

The key challenge in structured search is collecting data

Part 2: Data distribution & idea of Data Cloud

Part 3: Demo: numeric data distribution

The key challenge in collecting data is incentive design

Part 4: Economics of data distribution

Data

Data = data of entities + data of content

Structured data

Entity unit:

  • Identifier
  • Metadata:
    • Explicit key-value pairs
    • Relational properties
    • Evaluation

Semi-structured data

Content unit:

  • Body: text, video, audio, or image
  • Metadata:
    • Explicit key-value pairs
    • Relational properties
    • Evaluation
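
The two units above can be sketched as plain data types. This is only an illustration of the outline on the slide: the class names, field names, and sample values are assumptions, not part of the original talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Metadata:
    # Explicit key-value pairs, e.g. {"price_usd": 18}
    properties: Dict[str, Any] = field(default_factory=dict)
    # Relational properties as (relation, target identifier) pairs
    relations: List[Tuple[str, str]] = field(default_factory=list)
    # Evaluation, e.g. an average rating; None when nothing is known
    evaluation: Optional[float] = None


@dataclass
class EntityUnit:
    # Structured data: an identifier plus metadata, no body
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class ContentUnit:
    # Semi-structured data: a body (text here; could be a reference
    # to video, audio, or an image) plus the same kind of metadata
    body: str
    metadata: Metadata = field(default_factory=Metadata)


# Hypothetical sample values
concert = EntityUnit(
    identifier="event:example-concert",
    metadata=Metadata(
        properties={"city": "SF", "price_usd": 18},
        relations=[("performed_by", "artist:example-band")],
        evaluation=4.5,
    ),
)
review = ContentUnit(
    body="Great show last night.",
    metadata=Metadata(relations=[("about", concert.identifier)]),
)
```

Sharing one `Metadata` type reflects that both units list the same three metadata parts; only the identifier versus body differs.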
Structured Search

Factoid search

"what's the value of property X of object Y"

Entity hubs

  • Domain hubs

Structured object search

"all concerts this weekend in SF under $20 sorted by popularity"

  • Time focus
  • Ranking focus
  • Relations focus
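
A structured object query like the concert example can be read as a filter plus a ranking over entity records. The sketch below assumes a tiny in-memory list of events; the records, field names, and dates are hypothetical.

```python
from datetime import date

# Hypothetical event records (not real data)
events = [
    {"name": "A", "city": "SF", "date": date(2009, 6, 13), "price_usd": 15, "popularity": 120},
    {"name": "B", "city": "SF", "date": date(2009, 6, 13), "price_usd": 35, "popularity": 900},
    {"name": "C", "city": "SF", "date": date(2009, 6, 14), "price_usd": 10, "popularity": 300},
    {"name": "D", "city": "LA", "date": date(2009, 6, 13), "price_usd": 12, "popularity": 500},
    {"name": "E", "city": "SF", "date": date(2009, 6, 20), "price_usd": 5, "popularity": 50},
]


def search_concerts(records, city, start, end, max_price):
    """Filter by place, time window, and price, then rank by popularity."""
    hits = [
        r for r in records
        if r["city"] == city
        and start <= r["date"] <= end
        and r["price_usd"] < max_price
    ]
    return sorted(hits, key=lambda r: r["popularity"], reverse=True)


weekend = search_concerts(events, "SF", date(2009, 6, 13), date(2009, 6, 14), 20)
names = [r["name"] for r in weekend]
```

The time window corresponds to the "time focus" and the sort key to the "ranking focus"; a relations focus would need links between records, which this sketch leaves out.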

Structured content search

"all videos with Tom Brady"

"all comments and blog posts about Bing"

Yury’s Wishlist

Business-generated data

  • Products, services, news, wishlists, contact data

Reality stream, sensors

  • What has happened, and where

Expert knowledge

  • Glossary, issues, typical solutions, object databases, related objects graph

Events

  • Sport, concerts, education, corporate, community, private

Market graph & signals

  • Like, interested, use, following, want to buy; votes and ratings
Search as a Platform

[Diagram labels: Query analysis, Post analysis, Classic search, Structured Data, Web index, App 1, App 2, App 3, App 4]


Data Cloud

How to collect all structured data in one place?

Data Producers
  • People: forums, wiki, mail groups, blogs, social networks
  • Enterprises: product profiles, corporate news, professional content
  • Sensors: GPS modules, web cameras, traffic sensors, RFID
  • Transactional data
Data Distributors

A data distributor is any technical solution that accumulates, organizes, and provides access to structured and semi-structured data

Data publisher: the original distributor of some data

Data retailer: a consumer-facing distributor of some data
Data Consumers
  • Humans
    • Email
    • Aggregators: news, friend feeds, RSS readers
    • Search
    • Browsing / random walks
  • Intelligence projects
    • Recommendation systems
    • Trend mining
Data Cloud

Data Cloud is a centralized fully-functional data distribution service

Success metric for data cloud strategy = the total “value” of data on the cloud

To-Cloud Solutions
  • Extraction
    • DBpedia.org, “web tables”
  • Semantic markup, data APIs
    • Yahoo! SearchMonkey
  • Feeds
    • Yahoo! Shopping
    • Disqus.com, js-kit.com, Facebook Connect
  • Direct publishing
On-Cloud Solutions
  • Ontology maintenance
    • Freebase
  • Normalization, de-duplication, antispam
  • Named entity recognition, metadata inference, ranking
  • Data recycling (cross-references)
    • Amazon Public Data Sets
    • Viral license
  • Hosted search
    • Yahoo! BOSS
From-Cloud Solutions
  • Search, audience
    • Y! SearchMonkey, Google Base
  • Data API, dump access, update stream
  • Custom notifications
    • Gnip.com
  • Data cloud as a primary backend
  • Access control
    • Ad distribution. (AT&T and Yahoo! Local deal)
Demo:

webNumbr.com

Joint work with Paul Tarjan

webNumbr.com: Import
  • Crawl numbers from the web

URL + XPath + regex

  • Create “numbr pages”
  • Update their values every hour
  • Keep the history

Anyone can create a numbr

http://webnumbr.com/create
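
The "URL + XPath + regex" recipe can be sketched as follows. This is not webNumbr's actual code: the page content, the `numbr` dictionary layout, and the function names are assumptions, and the page is an inline string rather than a fetched URL. The standard-library `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so a real crawler would likely use a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Stand-in for a downloaded page (hypothetical content)
PAGE = """<html><body>
  <div id="stats"><span class="count">Members: 1,204 online</span></div>
</body></html>"""


def extract_number(page_source, xpath, pattern):
    """Locate a node with XPath, then pull a number out of its text with a regex."""
    root = ET.fromstring(page_source)
    node = root.find(xpath)
    if node is None:
        return None
    match = re.search(pattern, "".join(node.itertext()))
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group(1).replace(",", ""))


history = []  # (timestamp, value) pairs, i.e. "keep the history"


def update(numbr, page_source):
    """One update cycle; a scheduler would call this every hour."""
    value = extract_number(page_source, numbr["xpath"], numbr["regex"])
    if value is not None:
        history.append((datetime.now(timezone.utc), value))
    return value


# Hypothetical numbr definition
numbr = {
    "url": "http://example.com/stats",
    "xpath": ".//div[@id='stats']/span",
    "regex": r"([\d,]+)",
}
value = update(numbr, PAGE)
```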

webNumbr.com: Export
  • Embed code
  • Graphs
  • Search & browse
  • RSS
Economics of Data Distribution

Joint work with Ravi Kumar and Andrew Tomkins

Network Effect in Two-Sided Markets

Two-sided market = every product serves consumers of two types A and B

Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers and vice versa

Examples: operating systems, credit cards, e-commerce marketplaces

Reference: Two-sided network effects: A theory of information product design. G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van Alstyne
Basic model
  • Distributors D1, … Dk
  • Producer/consumer joins only one distributor
  • Initial shares (p1,c1) … (pk,ck)
  • New consumer selects a distributor with a probability proportional to pi
  • New producer selects a distributor with probability proportional to ci
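
The model above can be simulated in a few lines. The slides do not say in what order newcomers arrive, so this sketch assumes each newcomer is a consumer or a producer with probability 1/2; the function name and the starting counts are also illustrative.

```python
import random


def simulate(producers, consumers, steps, rng):
    """Run the basic model: each newcomer joins exactly one distributor.

    producers[i] and consumers[i] are the current counts at distributor i.
    """
    p = list(producers)
    c = list(consumers)
    k = len(p)
    for _ in range(steps):
        if rng.random() < 0.5:  # assumption: arrival type is a fair coin flip
            # New consumer: probability proportional to producer counts p_i
            i = rng.choices(range(k), weights=p)[0]
            c[i] += 1
        else:
            # New producer: probability proportional to consumer counts c_i
            i = rng.choices(range(k), weights=c)[0]
            p[i] += 1
    return p, c


p, c = simulate([1, 1, 1], [1, 1, 1], steps=1000, rng=random.Random(0))
producer_shares = [n / sum(p) for n in p]
consumer_shares = [n / sum(c) for n in c]
```

Running it with different seeds gives a feel for how the shares settle on different values from run to run.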
Basic model

[Figure: diagram with labels a1, a2, a3, a4, each appearing twice]


Market Shares Dynamics

Theorem 1

Market shares will stabilize

Theorem 2

With a super-linear preference rule, one of the distributors will tip

Theorem 3

With a sub-linear preference rule, market shares will flatten
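
The slides do not define the super-linear and sub-linear rules. One common reading, assumed here, is that a newcomer picks distributor i with probability proportional to n_i^alpha, where n_i is the relevant count: alpha > 1 is super-linear, alpha < 1 is sub-linear, and alpha = 1 is the basic model. The sketch shows how the exponent tilts the choice toward or away from the current leader.

```python
def choice_probabilities(counts, alpha):
    """Probability of picking each distributor under weight n_i ** alpha.

    The power-law form is an assumption; the slides only name the rule types.
    """
    weights = [n ** alpha for n in counts]
    total = sum(weights)
    return [w / total for w in weights]


counts = [3, 1]                                   # leader holds 75% of the side
linear = choice_probabilities(counts, 1.0)        # leader chosen at its share
super_linear = choice_probabilities(counts, 2.0)  # leader chosen above its share
sub_linear = choice_probabilities(counts, 0.5)    # leader chosen below its share
```

Under this reading, a leader that is chosen more often than its current share keeps growing it, which matches the intuition behind tipping; the opposite holds for the sub-linear case.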

External Factor

Preference rule with external factor:

ei+ci/(c1+…+ck)

Theorem 4

Market shares will stabilize at e1 : e2 : … : ek
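
A small numeric check of the rule with an external factor. It assumes the rule's value is used as a weight that is normalized into a probability, as in the basic model; the function name and the sample numbers are illustrative. When current shares are already in the ratio e1 : … : ek, the choice probabilities equal those shares, which is consistent with the stated limit (a sanity check, not a proof).

```python
def probabilities_with_external_factor(e, c):
    """Normalize the weights e_i + c_i / (c_1 + ... + c_k) into probabilities."""
    total_c = sum(c)
    weights = [e_i + c_i / total_c for e_i, c_i in zip(e, c)]
    total_w = sum(weights)
    return [w / total_w for w in weights]


e = [0.5, 0.3, 0.2]  # hypothetical external factors

# Shares far from the ratio of e: probabilities pull toward e
away = probabilities_with_external_factor(e, [10, 30, 60])

# Shares already in the ratio e1 : e2 : e3: probabilities equal the shares
fixed = probabilities_with_external_factor(e, [50, 30, 20])
```

With shares 10 : 30 : 60 the first distributor is chosen with probability 0.3, well above its 0.1 share, so its share drifts up toward the 0.5 implied by e.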
Coalition

Data Cloud

Coalitions

Theorem 5

If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors

Corollary

Coalitions are not monotone

Example: 5 : 4 : 1 : 1
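
The threshold in Theorem 5 is easy to check numerically. The helper below only tests the stated sufficient condition (every share below 1/sqrt(k)); it does not model profitability, and its name is an assumption. For the sizes 5 : 4 : 1 : 1 the largest share is 5/11, about 0.45, which is below 1/sqrt(4) = 0.5.

```python
import math


def satisfies_theorem5_condition(sizes):
    """True if every distributor's market share is below 1 / sqrt(k)."""
    k = len(sizes)
    total = sum(sizes)
    threshold = 1 / math.sqrt(k)
    return all(n / total < threshold for n in sizes)


example = satisfies_theorem5_condition([5, 4, 1, 1])    # largest share 5/11
dominated = satisfies_theorem5_condition([6, 1, 1, 1])  # largest share 6/9
```

A `False` result means only that the theorem does not apply; the slides do not say what happens in that case.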

Model Variations
  • Same-side network effect
  • Different p-to-c and c-to-p rules
  • Multi-homing (overlapping audiences)
  • n^2 vs. n log n revenue models
  • Mature market: newcomer rate = departing rate
  • Diverse market (many types of producers and consumers)
  • Newcoming and departing distributors
  • Directed coalitions
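
To see why the choice between n^2 and n log n matters for coalitions, one can compare what two distributors gain by pooling their audiences under each model. This sketch assumes revenue depends only on the number of participants n; that reading, and the sample sizes, are not from the slides.

```python
import math


def revenue_n_squared(n):
    return n ** 2


def revenue_n_log_n(n):
    return n * math.log(n)


def merge_gain(revenue, a, b):
    """Extra revenue from pooling audiences of size a and b."""
    return revenue(a + b) - revenue(a) - revenue(b)


gain_sq = merge_gain(revenue_n_squared, 100, 100)
gain_log = merge_gain(revenue_n_log_n, 100, 100)

# Gain relative to the combined revenue before merging
relative_sq = gain_sq / (2 * revenue_n_squared(100))
relative_log = gain_log / (2 * revenue_n_log_n(100))
```

Under n^2 the merged audience is worth twice the separate total, while under n log n the relative gain is much smaller, so the incentive to share data depends heavily on which model holds.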
Marketing
  • Data demand?
  • Data offerings?
  • Requirements for distribution technology?
Incentive design
  • Incentives for data sharing?
  • Centralized or distributed?
    • For profit or non-profit?
  • Data licensing and ownership?
  • Monetizing data cloud?
More Challenges

Prototyping:

  • Data marketplace: open data & data demand
  • Search plugins: related objects, glossaries, object timelines
  • Publishing tools for structured data
  • Data client: structured news, bookmarking, notifications

Tech design:

  • Access management
  • Namespace design

User interface:

  • Structured search UI
  • Discovery UI
Thanks!

Follow my research:

http://twitter.com/yurylifshits

http://yury.name/blog