Icde2009 keynotes summary

ICDE2009 Keynotes Summary. Shanghai, China, 3.29-4.2. Li Yukun. Outline: Keynotes: Search Computing (Stefano Ceri); Data Management in the Cloud (Raghu Ramakrishnan); Why Can't I Find My Data the Way I Find My Dinner? (David Carlson). Keynote 1: Search Computing, Stefano Ceri.

Presentation Transcript

ICDE2009 Keynotes Summary

Shanghai, China, 3.29-4.2

Li Yukun


Outline

  • Keynotes

    • Search Computing (Stefano Ceri)

    • Data Management in the Cloud (Raghu Ramakrishnan)

    • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)


Keynote 1

Search Computing

Stefano Ceri

Dipartimento di Elettronica e Informazione, Politecnico di Milano

Piazza L. Da Vinci 32, 20133 Milano, Italy

[email protected]


Motivation

  • “Who are the strongest European competitors on software ideas?

  • Who is the best doctor to cure insomnia in a nearby hospital?

  • Where can I attend an interesting conference in my field close to a sunny beach?”

This information is available on the Web, but no software system can accept such queries or compute the answers.


Core model for search computing

  • Conventional services

    • Are abstracted as systems producing sets of equal-weight answers;

  • Service computing

    • A cross-discipline that covers the science and technology of bridging the gap between Business Services and IT Services.

    • The goal of Services Computing is to enable IT services and computing technology to perform business services more efficiently and effectively.

  • Search services

    • Can be abstracted as systems producing ranked lists of answers.

  • Search computing

    • It is a new paradigm where ranking is the dominant factor for composing services.

    • A multi-domain query is answered by a constellation of cooperating search services, possibly dynamically selected.
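
The idea of composing services that return ranked lists can be sketched in a few lines of Python. This is only an illustration: the two services, their scores, the join on city, and the weighted-sum scoring are all assumptions made for the example, not part of the talk.

```python
from itertools import product

# Hypothetical ranked results from two search services: (item, city, score in [0, 1]).
conferences = [("DB Conf", "Nice", 0.9), ("Web Conf", "Oslo", 0.8), ("IR Conf", "Bari", 0.6)]
beaches = [("Sunny Bay", "Bari", 0.95), ("Blue Cove", "Nice", 0.7)]


def rank_join(left, right, weight=0.5, k=3):
    """Join two ranked lists on city and rank the combinations by a weighted score."""
    combos = [
        (l_item, r_item, round(weight * l_score + (1 - weight) * r_score, 3))
        for (l_item, l_city, l_score), (r_item, r_city, r_score) in product(left, right)
        if l_city == r_city
    ]
    # The combined score, not the order of arrival, decides the final ranking.
    return sorted(combos, key=lambda c: c[2], reverse=True)[:k]


top = rank_join(conferences, beaches)
```

Changing `weight` shifts the ranking toward one service or the other, which is the sense in which ranking drives the composition.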


CHAPTERS OF SEARCH COMPUTING

  • Theory for search computing

    • Select the best abstractions covering the concepts

    • Design basic operations on services and algorithms

    • Compute time and space complexity

  • Statistical models for search services

    • Build statistical estimators of the number and quality of the results

  • Optimization methods for search computing

  • Description abstractions for search services

    • Expose ranking-specific properties of search services

  • Language abstractions for search computing

    • Incorporate ranking aspects and strategies for dealing with rankings


CHAPTERS OF SEARCH COMPUTING

  • Human-computer interfaces

    • Expressing ranking preferences.

    • Light-weight user interaction

  • Semantics

    • Merging the results of heterogeneous search services

    • Semantic “join” of search services.

  • Higher-order ranking

    • “Ranking of rankings” is essential for selecting and prioritizing search services.

    • It can be a multi-level one.

  • Managing individual and social searching

    • Tailoring search strategies to user profiling or to past user interactions

    • Societal recommendation and evaluation

    • Thus, individual and societal aspects are key ingredients for search computing


CHAPTERS OF SEARCH COMPUTING

  • Search computing engineering

    • designing, assembling and deploying search computing software applications.

  • Economy of search computing

    • Suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on.

  • Security and privacy of search computing

    • control of how data is used.

    • For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.


PROJECT ORGANIZATION

  • Funded by the European Research Council in the framework of the IDEAS Advanced Grants;

  • It started on Nov. 1, 2008 and will last five years.


PROJECT ORGANIZATION

  • The project involves about 30 researchers at Politecnico

    • Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi.

  • SeCo has an advisory board

    • Edoardo Amaldi (Operations Research),

    • Fabio Casati (Service Computing),

    • Georg Gottlob (Theory),

    • Ioana Manolescu (Systems and Performance),

    • Roberto Verganti (Business Models),

    • Gerhard Weikum (Information Retrieval for the Web),

    • Jennifer Widom (Languages and Paradigms)


Seven teams

  • Concept team

  • Theory and methods

  • Service registration and management

  • Query processing

  • Interaction design

  • Tools and prototypes

  • Business models and technology watch


More information on SeCo is available on the project’s Web site:

  • http://home.dei.polimi.it/ceri/seco/index.html


Outline

  • Keynotes

    • Search Computing

      Stefano Ceri

    • Data Management in the Cloud

      Raghu Ramakrishnan

    • Why Can't I Find My Data the Way I Find My Dinner?

      David Carlson


Keynote 2: Data Management in the Cloud

Yahoo! Research

Raghu Ramakrishnan

Brian Cooper

Utkarsh Srivastava

Adam Silberstein

Nick Puz

Rodrigo Fonseca

CCDI

Chuck Neerdaels

P.P.S. Narayan

Kevin Athey

Toby Negrin

Plus Dev/QA teams


SCENARIOS

Pie-in-the-sky


Living in the Clouds

We want to start a new website, FredsList.com

Our site will provide listings of items for sale, jobs, etc.

As time goes on, we’ll add more features

illustrate how more cloud capabilities are used as needed

List of capabilities/components is illustrative, not exhaustive


Step 1: Listings

FredsList wants to store listings as (key, category, description)

FredsList.com application

DECLARE DATASET Listings AS
( ID String PRIMARY KEY,
  Category String,
  Description Text )

5523442, childcare, Nanny available in San Jose

1234323, transportation, For sale: one bicycle, barely used

215534, wanted, Looking for issue 1 of Superman comic book

[Diagram: Simple Web Service APIs over Database (Sherpa)]


Step 2: Search

FredsList’s customers quickly ask for keyword search

FredsList.com application

ALTER Listings
SET Description SEARCHABLE

“dvd’s”, “bicycle”, “nanny”

[Diagram: Simple Web Service APIs over Database (Sherpa), Search (Vespa), and Messaging (YMB)]
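
What making the description searchable buys the application can be illustrated with a tiny inverted index over the sample listings from Step 1. This is a minimal sketch in plain Python; it says nothing about how the search service is actually built, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Sample listings from the Step 1 slide: ID -> (category, description).
listings = {
    "5523442": ("childcare", "Nanny available in San Jose"),
    "1234323": ("transportation", "For sale: one bicycle, barely used"),
    "215534": ("wanted", "Looking for issue 1 of Superman comic book"),
}


def build_index(records):
    """Map each lowercase word in a description to the set of listing IDs containing it."""
    index = defaultdict(set)
    for listing_id, (_category, description) in records.items():
        for word in re.findall(r"[a-z0-9']+", description.lower()):
            index[word].add(listing_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of listings whose description contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(listings)
hits = search(index, "bicycle")
```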


Step 3: Photos

FredsList decides to add photos to listings

FredsList.com application

ALTER Listings
ADD Photo BLOB

[Diagram: Simple Web Service APIs over Storage (MObStor), Database (Sherpa), Search (Vespa), and Messaging (YMB); foreign key photo → listing]


Step 4: Data Analysis

FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.

FredsList.com application

ALTER Listings
MAKE ANALYZABLE

Hadoop program to generate fancy pages for listings

Hadoop program to geocode data

Pig query to analyze categories

[Diagram: Simple Web Service APIs over Storage (MObStor), Compute (Grid), Database (Sherpa), Search (Vespa), and Messaging (YMB); foreign key photo → listing; batch export]


Step 5: Performance

FredsList wants to reduce its data access latency

FredsList.com application

ALTER Listings
MAKE CACHEABLE

[Diagram: Simple Web Service APIs over Storage (MObStor), Compute (Grid), Database (Sherpa), Caching (memcached), Search (Vespa), and Messaging (YMB); foreign key photo → listing; batch export]
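
The effect of putting a cache in front of the database can be sketched as a read-through cache. In this illustration a plain dict stands in for memcached and another for the database; the class names and the invalidate-on-write policy are assumptions, not details from the talk.

```python
class ListingStore:
    """Stand-in for the database layer; counts reads so cache hits are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class CachedListings:
    """Read-through cache in front of the store; writes invalidate the cached entry."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def put(self, key, value):
        self.store.rows[key] = value
        self.cache.pop(key, None)


store = ListingStore({"215534": "Looking for issue 1 of Superman comic book"})
cached = CachedListings(store)
first = cached.get("215534")   # goes to the store
second = cached.get("215534")  # served from the cache
```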


EYES TO THE SKIES

Motherhood-and-Apple-Pie


Requirements for Cloud Services

Multitenant

A cloud service must support multiple, organizationally distant customers.

Elasticity

Tenants should be able to negotiate and receive resources/QoS on-demand.

Resource Sharing

Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient.

Horizontal scaling

It should be possible to add cloud capacity in small increments; this should be transparent to the tenants

Metering

A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.

Security

A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.

Availability

A cloud service should be highly available.

Operability

A cloud service should be easy to operate


Types of Cloud Services

Two kinds of cloud services:

Horizontal Cloud Services

Functionality enabling tenants to build applications or new services on top of the cloud

Functional Cloud Services

Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping

Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)


SHERPA

To Help You Scale Your Mountains of Data


The Sherpa Solution

The next generation global-scale record store

  • Record-orientation: Routing, data storage optimized for low-latency record access

  • Scale out: Add machines to scale throughput (while keeping latency low)

  • Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay

  • Consistency model: Reduce complexity of asynchrony for the application programmer

  • Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on demand scale and elasticity



QUERY PROCESSING


Accessing Data

[Figure: a “Get key k” request is forwarded to one of three storage units (SU), and the record for key k is returned, in four numbered steps.]


Bulk Read

[Figure: a request for keys {k1, k2, … kn} goes to a scatter/gather server (step 1), which issues Get k1, Get k2, Get k3 to the storage units (SU) (step 2).]
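
The scatter/gather pattern for a bulk read can be sketched as follows. This is a minimal illustration, not the real service: the storage units are plain dicts, the hash-based placement rule and all function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Three hypothetical storage units, each a plain dict.
NUM_UNITS = 3
storage_units = [dict() for _ in range(NUM_UNITS)]


def unit_for(key):
    """Pick a storage unit with a stable hash (an assumed placement rule)."""
    return crc32(key.encode("utf-8")) % NUM_UNITS


def put(key, value):
    storage_units[unit_for(key)][key] = value


def bulk_get(keys):
    """Scatter one batch per storage unit, then gather the results into one dict."""
    batches = {}
    for key in keys:
        batches.setdefault(unit_for(key), []).append(key)

    def fetch(unit_id, batch):
        unit = storage_units[unit_id]
        return {k: unit[k] for k in batch if k in unit}

    results = {}
    with ThreadPoolExecutor(max_workers=NUM_UNITS) as pool:
        for partial in pool.map(lambda item: fetch(*item), batches.items()):
            results.update(partial)
    return results


for k in ("k1", "k2", "k3"):
    put(k, f"record for {k}")
found = bulk_get(["k1", "k2", "k3", "missing"])
```

Batching per storage unit means each unit is contacted once per bulk read, rather than once per key.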


Range Queries in YDOT

Clustered, ordered retrieval of records

[Figure: the router holds an ordered table of key ranges, with boundaries at Canteloupe, Lime, and Strawberry, mapped to storage units 1, 3, 2, and 1 in that order. The range query “Grapefruit…Pear?” is split into “Grapefruit…Lime?” and “Lime…Pear?”. The storage units hold the ordered keys Apple, Avocado, Banana, Blueberry; Canteloupe, Grape, Kiwi, Lemon; Lime, Mango, Orange; Strawberry, Tomato, Watermelon.]
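
How a router might split a range query across storage units can be sketched with a sorted list of interval boundaries. The boundaries and owners below are read off the slide's figure as best as the transcript allows, so treat them as an assumption; `route_range` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

# Interval boundaries and owners as read off the figure (an assumption):
# keys < "Canteloupe" -> unit 1, ["Canteloupe", "Lime") -> unit 3,
# ["Lime", "Strawberry") -> unit 2, keys >= "Strawberry" -> unit 1.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
OWNERS = [1, 3, 2, 1]


def route_range(low, high):
    """Split the half-open key range [low, high) into per-storage-unit sub-ranges."""
    pieces = []
    first = bisect_right(BOUNDARIES, low)   # interval containing low
    last = bisect_left(BOUNDARIES, high)    # interval where the range ends
    start = low
    for i in range(first, last + 1):
        end = BOUNDARIES[i] if i < last else high
        if start < end:
            pieces.append((OWNERS[i], start, end))
        start = end
    return pieces


plan = route_range("Grapefruit", "Pear")
```

Here `plan` holds one `(storage unit, low, high)` tuple per sub-range, matching the slide's split of “Grapefruit…Pear?” at Lime.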


Updates

[Figure: a “Write key k” request passes in eight numbered steps through the routers, the storage units (SU), and the message brokers; a sequence number for key k is assigned and SUCCESS is returned.]




Consistency Model

Goal: make it easier for applications to reason about updates and cope with asynchrony

What happens to a record with primary key “Brian”?

[Figure: timeline of the record within generation 1: the record is inserted, updated seven times, and deleted, producing versions v. 1 to v. 8.]


Consistency Model

Read

[Figure: a Read against the timeline of versions v. 1 to v. 8 (generation 1), on which two versions are labeled stale and one is labeled current.]


Consistency Model

Read up-to-date

[Figure: a “Read up-to-date” against the timeline of versions v. 1 to v. 8 (generation 1), on which two versions are labeled stale and one is labeled current.]


Consistency Model

Read ≥ v.6

[Figure: a “Read ≥ v.6” against the timeline of versions v. 1 to v. 8 (generation 1), on which two versions are labeled stale and one is labeled current.]


Consistency Model

Write

[Figure: a Write against the timeline of versions v. 1 to v. 8 (generation 1), on which two versions are labeled stale and one is labeled current.]


Consistency Model

Write if = v.7

ERROR

[Figure: a “Write if = v.7” against the timeline of versions v. 1 to v. 8 (generation 1) returns ERROR; two versions are labeled stale and one is labeled current.]
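
The operations on these slides (read, read up-to-date, read ≥ a version, write, write if = a version) can be sketched against a single record's timeline. This is a toy model under stated assumptions: one master copy, one lagging replica, and hypothetical method names that are not the service's API.

```python
class VersionedRecord:
    """One record's timeline: a master copy plus one possibly stale replica."""

    def __init__(self, value):
        self.versions = [value]       # versions[i] holds version i + 1
        self.replica_version = 1      # version visible at a lagging replica

    @property
    def current_version(self):
        return len(self.versions)

    def read_any(self):
        """May return a stale version, but always one from the record's timeline."""
        return self.replica_version, self.versions[self.replica_version - 1]

    def read_latest(self):
        """Returns the current version."""
        return self.current_version, self.versions[-1]

    def read_at_least(self, min_version):
        """Returns the replica's copy if it is new enough, else the current version."""
        if self.replica_version >= min_version:
            return self.read_any()
        if min_version > self.current_version:
            raise ValueError("requested version does not exist yet")
        return self.read_latest()

    def write(self, value):
        """Unconditional write: appends the next version."""
        self.versions.append(value)
        return self.current_version

    def write_if_version(self, expected_version, value):
        """Conditional write: fails unless the record is still at expected_version."""
        if self.current_version != expected_version:
            raise RuntimeError(f"ERROR: record is at v.{self.current_version}")
        return self.write(value)

    def replicate(self):
        """Asynchronous propagation, modeled as an explicit catch-up step."""
        self.replica_version = self.current_version


record = VersionedRecord("v1 data")
for n in range(2, 9):
    record.write(f"v{n} data")  # master is now at v.8, replica still at v.1
```

With the record at v.8, `record.write_if_version(7, ...)` raises an error, mirroring the ERROR on the “Write if = v.7” slide.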


Index Maintenance

How to have lots of interesting indexes, without killing performance?

Solution: Asynchrony!

Indexes updated asynchronously when base table updated

Planned functionality
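
Asynchronous index maintenance can be sketched as a queue of pending index updates that is drained after the base-table write returns. The table, index, and queue below are plain Python stand-ins and every name is an assumption; the slide only states that indexes are updated asynchronously.

```python
from collections import deque

base_table = {}       # primary key -> category
category_index = {}   # category -> set of primary keys (secondary index)
pending = deque()     # queued index updates, applied later


def write(key, category):
    """Update the base table now; queue the index change instead of applying it."""
    old = base_table.get(key)
    base_table[key] = category
    pending.append((key, old, category))


def apply_pending():
    """Drain the queue in order, bringing the index up to date with the base table."""
    while pending:
        key, old, new = pending.popleft()
        if old is not None:
            category_index.get(old, set()).discard(key)
        category_index.setdefault(new, set()).add(key)


write("215534", "wanted")
write("215534", "comics")
stale_view = dict(category_index)  # the index has not caught up yet
apply_pending()
```

The write path never waits for the index, at the cost of a window in which index lookups lag behind the base table.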


SHERPA IN CONTEXT


MObStor

Yahoo!’s next-generation globally replicated, virtualized media object storage service

Better provisioning, easy migration, replication, better BCP, and performance

New features (Evergreen URLs, CDN integration, REST API, …)

The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.



Storage & Delivery Stack


The World Has Changed

Web applications need

Scalability!

Geographic distribution

High availability

Reliable storage

Web applications are unfit for

Complicated queries

Strong transactions


Web Data Management

  • Structured record storage (PNUTS)

    • CRUD

    • Point lookups and short scans

    • Index organized table and random I/Os

    • $ per latency

  • Large data analysis (Hadoop)

    • Scan oriented workloads

    • Focus on sequential disk I/O

    • $ per cpu cycle

  • Blob storage (SAN/NAS)

    • Object retrieval and streaming

    • Scalable file storage

    • $ per GB


Application Design Space

[Figure: systems placed on two axes, “Get a few things” versus “Scan everything” and “Records” versus “Files”: Sherpa, MObStor, YMDB, MySQL, Oracle, Filer, BigTable, Hadoop, Everest.]


Further Reading

Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)

Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan

PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)

Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni


Outline

  • Keynotes

    • Search Computing (Stefano Ceri)

    • Data Management in the Cloud (Raghu Ramakrishnan)

    • Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)


Keynote 3

  • Why Can’t I Find My Data the Way I Find My Dinner?

  • David Carlson

    • Director, International Polar Year International Programme Office

    • Cambridge, UK

    • [email protected]


International Polar Year (IPY)

  • One can find almost every discipline represented in the IPY projects, and funding has come from geophysical, biological and social agencies and programs.


IPY data

  • open access data policy

  • display and access of IPY data

  • We have component systems, within nations, disciplines, or existing data service centers, that provide access examples for portions of the IPY data set.

  • We have unprecedented bandwidth for real-time data transmission

  • But how can these data sets be accessed easily?


Enormous challenges

  • financial

  • social and technical barriers

  • this talk focuses on the latter.


Example

  • To understand and predict the health of migratory bird populations in the polar environment,

    • Need ornithological, toxicological, ecological, meteorological, hydrological, climatological, geomagnetic, and sociological data.

    • These data will cover a broad range of space and time scales, often in disparate (or at least inconsistent) space and time coordinate systems.


Problems

  • Data access

    • For a larger population of curious users, the specialized data services associated with subsets of the IPY data will not provide easy, friendly, or even accessible

  • Interfaces

    • No familiar interfaces will provide integrated discovery and browse services.

  • No long-term plan

    • On longer time scales, and even as data storage capabilities grow rapidly, most of the IPY data sets do not, at present, have acceptable long-term archive plans, even for passive storage without continued discovery services.


Research issues

  • smart search engines

  • pattern recognition

  • data mining tools

  • multi-gigabyte personal storage devices

  • Advanced animation capabilities

  • coupled with almost unlimited mobile bandwidth

  • offer many citizens expansive and amazing access to commercial, recreational, financial, and personal data and data services.

  • What changes in strategy, technology, funding and individual and collective behavior need to occur in the world of scientific data to allow me to browse, view and access IPY data on my iTouch?


  • Thanks

