
Databases Unplugged: Challenges in Ubiquitous Data Management

Michael Franklin

UC Berkeley



“Gazillions of Gizmos”

  • “In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998

  • You’ve heard it before…

    • Smartphones, PDAs, Smartcards, badges, wearables, lightswitches, toasters, …

    • Worldwide sales of Internet-enabled appliances projected to grow from 5.9M units in 1998 to 55.7M units in 2002. IDC via H&Q report

M. Franklin, 12/17/99



An Explosion in Scale

(Picture is by way of Randy Katz)

[Figure: computing eras placed against two axes, Distribution and Personalization, each marked from Less to More. Labels in the figure: Batch; RJE; Time Sharing; Many people per computer; WS/Server; PC + Network; One person per computer; Scaled down PCs, desktop metaphor; Information Appliances; Many computers per person.]


Technical Challenges

  • Disconnection/Weak Connection

    • Standard distributed database techniques break down.

  • Limited resources

    • Memory, CPU, Power, User Interface, Bandwidth

  • Movement/Location

    • Killer Mobile apps use current and future locations.

  • Scale

    • Number and diversity of devices.

  • Reliability - Palm Pilots don’t bounce.


But, is Mobile Data Mgmt Needed?

  • “Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Hambrecht and Quist, iWord, March 1999

  • “All these information appliances have internal data that "docks" with other data stores. Each gizmo is a candidate for database system technology, because most will store and manage some information.” Asilomar Report


Road Map

  • Motivation

  • Alternative scenarios for mobile Databases

  • Technical/Research challenges

  • Some solutions

    • Consistency

    • Data Dissemination

    • Data Recharging

  • Conclusions


How Will it Happen?

Alternatives

  • SQL engine on the device (largely standalone)

  • Extension of enterprise infrastructure

  • Data Collection (device to infrastructure)

  • Data Dissemination (infrastructure to device)

  • PIM-driven information assistant


SQL Engine on the Device

  • Reasonable for Palmtop — but probably not the toaster or light-switch…

  • Stand-alone with occasional synchronization.

  • Footprint versus functionality

    • Engine can be made surprisingly small (10-100s KB).

    • Sybase uses “take what you need” library approach

  • All major vendors are playing in this space:

    • Oracle Lite, Sybase SQL Anywhere, Informix/Cloudscape, DB2 for the Workpad, SQL Server for Windows CE

  • But, what is the killer app???


    Extension of Enterprise

    • Logical Progression?

      • Mainframe->Desktop->Palm

      • ERP-> Palm

    • Device becomes the endpoint of the enterprise infrastructure (queries and updates).

    • This is happening but must take into account fundamental limitations of the mobile platforms.

    • Again, examples exist, but the killer app has not yet emerged here.


    Data Collection Devices

    • Inventory Management/Tracking/Sensors/Census

    • Examples: Symbol technologies --- Palm with a bar code scanner; more futuristic: smart dust.

    • Asymmetric (device to server) data flow/usage dictates system architecture.

    • Many applications exist, but no clear need for full function DBMS on the device.

    • Server-side DB must handle datastreams


    Data Dissemination

    • Many Potential Apps

      • stock and sports tickers

      • traffic information systems

      • software distribution

      • news and/or entertainment delivery

    • Asymmetric (server to devices) data flow/usage dictates system architecture.

    • No clear need for full function DBMS on the device, but intelligent caching and filtering on device is crucial.


    Personal Information Management

    • PIM is the killer app for mobile devices.

    • So, use PIM to drive the data management architecture.

    • Example: IBM’s Active Calendar

      • Calendar provides semantic information on what information will be needed when (and where).

      • Use this information to pre-stage information from the fixed infrastructure.

    • This seems to be the most promising approach for driving device DB functionality.


    Research Issues

    • Transactions (not likely) and Consistency.

    • Distribution of function

      • how to split query functionality?

      • adaptive??

    • New Querying and Access Models

      • info filtering and dissemination

      • location centric/movement

      • triggers/pervasive (invasive?) computing

      • Evidence Accrual – killer app: dating game

    • Availability and Recovery


    Data Caching and Consistency

    • How to keep distributed data consistent?

    • Centralized algorithms require connectivity at specific times.

    • Alternative: Epidemic Algorithms (Peer-to-peer)

      • Conflict detection: timestamps, version vectors,…

      • Conflict Handling (update commitment):

        • Optimistic (resolution) - Manual except in limited domains,

        • Pessimistic (avoidance) - primary copy, write-all or voting-based.
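Version vectors are one of the conflict detection options listed above. A minimal sketch of how two replicas might compare them, assuming each replica keeps one update counter per replica id; the names `compare` and `merge` are illustrative and do not come from the talk.

```python
from typing import Dict

# Assumed representation: replica id -> number of updates seen from that replica.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent updates: neither copy has seen the other's
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, recorded once a conflict has been handled."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}
```

Detection is the cheap part. Whether a detected conflict is then resolved (optimistic) or was prevented in the first place (pessimistic) is the design choice the last two bullets describe.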


    Epidemic Protocol Illustration

    (Picture is by way of Ugur Cetintemel)

    Conflict?


    Deno - Cetintemel and Keleher

    Pessimistic, Asynchronous (epidemic), voting-based

    “Bounded” weighted-voting:

    • Each replica i is assigned a currency $c_i$ s.t. $0 \le c_i \le 1.0$

    • Total currency in the system is bounded, i.e., $\sum_i c_i = 1.0$

    • Currency can be re-distributed for optimization or planned disconnection.

      An update’s life:

    • Sites issue tentative updates

    • Updates and votes are propagated in a pair-wise fashion

    • Updates gather votes as they pass through sites

    • An update commits when it gathers a plurality of votes


    Decentralized Update Commitment

    • An update u wins an election with plurality

    • A site s maintains:

      • votes(u): the sum of votes u gained so far

      • unknown: the sum of votes unknown to s

        (i.e., $1.0 - \sum_{u} \mathrm{votes}(u)$, summed over all updates u)

    • u commits iff for all u’ <> u,

      votes(u) > votes(u') + unknown and

      votes(u) > unknown

      Issues: time to commit; abort rates
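The commit rule above can be checked locally from the votes a site has seen. A minimal sketch, assuming each site votes at most once per election and that votes arrive as (site, currency, update) tuples; the function names and the `eps` tolerance for floating-point sums are illustrative assumptions, not part of Deno.

```python
from typing import Dict, List, Optional, Tuple

Vote = Tuple[str, float, str]  # (site, currency, update voted for)


def tally(votes: List[Vote]) -> Tuple[Dict[str, float], float]:
    """Sum the currency behind each update and derive the unknown remainder."""
    totals: Dict[str, float] = {}
    for _site, currency, update in votes:
        totals[update] = totals.get(update, 0.0) + currency
    unknown = 1.0 - sum(totals.values())  # total currency is bounded at 1.0
    return totals, unknown


def committed(votes: List[Vote], eps: float = 1e-9) -> Optional[str]:
    """Return the update that has won the election so far, or None."""
    totals, unknown = tally(votes)
    for u, v in totals.items():
        rivals = [w for other, w in totals.items() if other != u]
        beats_rivals = all(v > w + unknown + eps for w in rivals)
        if beats_rivals and v > unknown + eps:
            return u
    return None


# Vote values taken from the slide's illustration.
seen = [("s1", 0.20, "u1"), ("s5", 0.20, "u1"), ("s6", 0.15, "u2")]
print(committed(seen))                          # not yet decided
print(committed(seen + [("s2", 0.15, "u1")]))   # u1 has a plurality
```

A site can therefore commit without contacting a majority, as long as no rival could still catch up with the currency it has not yet heard from.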

    [Figure: two example elections, in panels labelled s1 and Oi, shown as the (site, currency, update) votes known at a site as they arrive.

    Example 1: (s1, 0.20, u1), (s5, 0.20, u1), (s6, 0.15, u2), (s2, 0.15, u1). votes(u1) goes 0.20, 0.40, 0.40, 0.55; votes(u2) = 0.15; unknown goes 0.80, 0.60, 0.45, 0.30. u1 commits!

    Example 2: (s1, 0.20, u1), (s4, 0.20, u2), (s6, 0.25, u3), (s2, 0.25, u2). votes(u1) = 0.20; votes(u2) goes 0.20, 0.20, 0.45; votes(u3) = 0.25; unknown goes 0.80, 0.60, 0.35, 0.10. u2 commits!]


    Semantic Caching - Dar et al.

    • Idea: Maintain description of cache contents as a set of logical predicates rather than a list of items.

    • Potential advantages:

      • Less overhead with no need for static clustering (reduces bandwidth requirements).

      • Describe missing items with logical remainder query.

      • Application/Environment specific replacement functions --- e.g. considering direction and velocity.

    • Issues:

      • controlling complexity of cache descriptions

      • interacting with real database systems
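To make the remainder-query idea concrete, here is a minimal sketch restricted to range predicates on a single attribute. It assumes the cache description is a list of disjoint half-open intervals; `split_query` and the interval representation are illustrative assumptions, and real semantic caches handle far richer predicates.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open range [lo, hi) on one attribute


def split_query(cache: List[Interval], query: Interval) -> Tuple[List[Interval], List[Interval]]:
    """Split `query` into the parts answerable from the cache (probe)
    and the parts that must be fetched from the server (remainder)."""
    lo, hi = query
    probe: List[Interval] = []
    remainder: List[Interval] = []
    cursor = lo  # everything in [lo, cursor) has already been classified
    for c_lo, c_hi in sorted(cache):
        if c_hi <= cursor or c_lo >= hi:
            continue  # no overlap with the unclassified part of the query
        if c_lo > cursor:
            remainder.append((cursor, c_lo))  # gap before this cached region
        end = min(c_hi, hi)
        probe.append((max(c_lo, cursor), end))
        cursor = end
    if cursor < hi:
        remainder.append((cursor, hi))  # tail not covered by any cached region
    return probe, remainder


probe, remainder = split_query([(0, 10), (20, 30)], (5, 25))
print(probe, remainder)
```

Only the remainder intervals travel over the network, which is where the bandwidth saving on the slide comes from. The first listed issue shows up even here: every partially answered query can fragment the cache description further.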


    Dissemination-Based Info Sys (DBIS)

    1) Push vs. Pull is just one dimension along which to compare data delivery mechanisms.

    - We’ve identified three.

    2) Different mechanisms for data delivery can (and should) be applied at different points in the system.

    - Select components from toolkit.

    Franklin and Zdonik - Framework in OOPSLA 97,

    Toolkit description and demo in SIGMOD 99.


    DBIS Framework

    • An architecture that combines data delivery techniques for responsive client access.

    • 3 types of nodes:

      • Data sources

      • Clients

      • Information brokers (can add value)

    • Any data delivery mode can be used.

      • Network transparency

    • Possibly dynamic.


    Delivery Options

    • Pull
      • Aperiodic
        • Unicast: request/response
        • 1-to-n: request/response w/snoop
      • Periodic
        • Unicast: polling
        • 1-to-n: polling w/snoop
    • Push
      • Aperiodic
        • Unicast: Email lists
        • 1-to-n: publish/subscribe
      • Periodic
        • Unicast: Email list digests
        • 1-to-n: Broadcast disks, publish/subscribe


    Network Transparency

    [Figure: clients, brokers and sources connected by links.]

    The type of a link matters only to nodes on each end


    DBIS Example

    An example:

    [Figure: an example configuration with a DB server and three proxy caches. Links are labelled either "unicast pull" or "1-to-n push", and the mode used can vary dynamically.]


    DBIS Research Issues

    • Each data delivery mechanism has unique aspects

      • Broadcast Disks - scheduling, caching, prefetching, updates

      • On-demand Broadcast - scheduling, data staging

      • Publish/Subscribe - large-scale filtering, channelization

    • Security/Fault-tolerance/Reliability

    • End-to-End network design and control

    • Fundamental performance tradeoffs

    • Exploiting existing and emerging technologies


    “Data Recharging”

    • Mobile devices require 2 resources: power and data

      • It is impractical to be continuously connected to fixed sources of these.

    • Devices cope with disconnection using caching:

      • Power cached in rechargeable batteries

      • Data cached in hot-synched memory

    • Ideal: make recharging data as simple as power:

      • Anywhere (with adapters), anytime, flexible connection duration

    • Joint work w/ Mitch Cherniack and Stan Zdonik getting underway


    Data Recharging - Research Agenda

    • Profile Definition and Maintenance

    • Update Storage and Preparation

    • Efficient integration of "recharge" updates with existing cached data.

      • Recharge, Trickle Charge, Jump Start...

    • Consistency Guarantees

    • Global Data Staging

    • Approaches will be driven by (mostly PIM) applications.


    Conclusions

    • Lots of plausible/useful Mobile data architectures.

      • For many, the applications exist today

      • Each has its own set of fascinating research opportunities.

    • PIM is the killer app for mobile data access.

      • It can be used to drive the integration with enterprise and Internet data sources.

    • Successful MDA work lies at the intersection of communications and data management rather than exclusively in either camp.


    The Data Flood is Real

    Source: J. Porter, Disk/Trend, Inc.

    http://www.disktrend.com/pdf/portrpkg.pdf


    Disk Appetite, cont.

    • Greg Papadopoulos, CTO Sun:

      • Disk sales doubling every 9 months

    • Note: only counts the data we’re saving!

    • Translate:

      • Time to process all your data doubles every 18 months

      • MOORE’S LAW INVERTED!

        • (and Moore’s Law may run out in the next couple decades?)

    • Big challenge (opportunity?) for SW systems research

      • Traditional scalability research won’t help

        • “Ideal” linear scaleup is NOT NEARLY ENOUGH!
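The "Translate" step can be written out as a short derivation. It assumes that processing capacity $P$ doubles every 18 months (the usual reading of Moore's Law, which the slide does not spell out), that stored data $D$ tracks disk sales, and that the time $T$ to process everything is proportional to $D/P$; the symbols are introduced here for illustration.

```latex
\begin{align*}
D(t) &= D_0 \, 2^{t/9}, \qquad P(t) = P_0 \, 2^{t/18} \qquad (t \text{ in months}) \\
T(t) &= \frac{D(t)}{P(t)} = \frac{D_0}{P_0} \, 2^{\,t/9 - t/18} = \frac{D_0}{P_0} \, 2^{t/18}
\end{align*}
```

Under those assumptions $T$ doubles every 18 months, which is why even ideal linear scaleup on a fixed hardware budget falls behind.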


    Data Volume: Prognostications

    • Today

      • SwipeStream

        • E.g. Wal-Mart 24 Tb Data Warehouse

      • ClickStream

      • Web

        • Internet Archive: ?? Tb

      • Replicated OS/Apps

    • Tomorrow

      • Sensors Galore

      • DARPA/Berkeley “Smart Dust”

    • Note: the privacy issues only get more complex!

      • Both technically and ethically

    [Figure labels: temperature, light, humidity, pressure, accelerometer, magnetics]


    Explaining Disk Appetite

    • Areal density increases 60%/yr

    • Yet Mb/$ rises much faster!

    Source: J. Porter, Disk/Trend, Inc.

    http://www.disktrend.com/pdf/portrpkg.pdf


    Scenarios

    • Ubiquitous computing: more than clients

      • sensors and their data feeds are key

        • smart dust, biomedical (MEMS sensors)

        • each consumer good records (mis)use

          • disposable computing

        • video from surveillance cameras, broadcasts, etc.

    • Global Data Federation

      • all the data is online – what are we waiting for?

      • The plumbing is coming

        • XML/HTTP, etc. give lowest-common-denominator (LCD) communication

        • but how do you flow, summarize, query and analyze data robustly over many sources in the wide area?


    Dataflow in Volatile Environments

    • Federated query processors a reality

      • Cohera, IBM DataJoiner

      • No control over stats, performance, administration

    • Large Cluster Systems “Scaling Out”

      • No control over “system balance”

    • User “CONTROL” of running dataflows

      • Long-running dataflow apps are interactive

      • No control over user interaction

    • Sensor Nets: the next killer app

      • E.g. “Smart Dust”

      • No control over anything!

    • Telegraph

      • Dataflow Engine for these environments


    Data Flood: Main Features

    • What does it look like?

      • Never ends: interactivity required

        • Online, controllable algorithms for all tasks!

      • Big: data reduction/aggregation is key

      • Volatile: this scale of devices and nets will not behave nicely


    The Telegraph Dataflow Engine

    • Key technologies

      • Interactive Control

        • interactivity with early answers and examples

        • online aggregation for data reduction

      • Dataflow programming via paths/iterators

        • Elevate query processing frameworks out of DBMSs

        • Long tradition of static optimization here

          • Suggestive, but not sufficient for volatile environments

      • Continuously adaptive flow optimization

        • massively parallel, adaptive dataflow via Rivers and Eddies
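As an illustration of "early answers" through online aggregation, here is a minimal sketch of a running AVG with a confidence interval that tightens as more rows arrive. It assumes rows are delivered in random order and uses a plain normal approximation; `online_average` is an illustrative name, and this is not Telegraph's implementation.

```python
import math
from typing import Iterable, Iterator, Tuple


def online_average(values: Iterable[float], z: float = 1.96) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_mean, half_width) after every row.

    half_width is z times the standard error of the mean, so with the
    default z the interval is roughly a 95% confidence interval.
    """
    n, mean, m2 = 0, 0.0, 0.0  # m2 accumulates squared deviations (Welford's method)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        half_width = z * math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
        yield n, mean, half_width


for rows_seen, estimate, half_width in online_average([12.0, 15.0, 11.0, 14.0, 13.0]):
    print(rows_seen, round(estimate, 2), half_width)
```

Because the estimate is available after every row, a user can stop the query as soon as the interval is narrow enough, which is the kind of interactive control the first bullet describes.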


    OceanStore Context: Ubiquitous Computing

    • Computing everywhere:

      • Desktop, Laptop, Palmtop

      • Cars, Cellphones

      • Shoes? Clothing? Walls?

    • Connectivity everywhere:

      • Rapid growth of bandwidth in the interior of the net

      • Broadband to the home and office

      • Wireless technologies such as CDMA, satellite, laser

    • Rise of the thin-client metaphor:

      • Services provided by interior of network

      • Incredibly thin clients on the leaves

        • MEMS devices -- sensors + CPU + wireless net in $1\,\text{mm}^3$

    • Mobile society: people move and devices are disposable


    Questions about information:

    • Where is persistent information stored?

      • 20th-century tie between location and content outdated (we all survived the Feb 29th bug -- let’s move on!)

      • In world-scale system, locality is key

    • How is it protected?

      • Can disgruntled employee of ISP sell your secrets?

      • Can’t trust anyone (how paranoid are you?)

    • Can we make it indestructible?

      • Want our data to survive “the big one”!

      • Highly resistant to hackers (denial of service)

      • Wide-scale disaster recovery

    • Is it hard to manage?

      • Worst failures are human-related

      • Want automatic (introspective) diagnosis and repair


    First Observation: Want Utility Infrastructure

    • Mark Weiser from Xerox: Transparent computing is the ultimate goal

      • Computers should disappear into the background

    • In storage context:

      • Don’t want to worry about backup

      • Don’t want to worry about obsolescence

      • Need lots of resources to make data secure and highly available, BUT don’t want to own them

      • Outsourcing of storage already becoming popular

    • Pay monthly fee and your “data is out there”

      • Simple payment interface: one bill from one company


    Second Observation: Need wide-scale deployment

    • Many components with geographic separation

      • System not disabled by natural disasters

      • Can adapt to changes in demand and regional outages

      • Gain in stability through statistics

      • Difference between thermodynamics and mechanics: surprising stability of temperature and pressure given $10^{30}$ molecules with highly variable behavior!

    • Wide-scale use and sharing also requires wide-scale deployment

      • Bandwidth increasing rapidly, but latency bounded by speed of light

    • Handling many people with same system leads to economies of scale


    OceanStore: Everyone’s data, One big Utility

    “The data is just out there”

    • Separate information from location

      • Locality is only an optimization (an important one!)

      • Wide-scale coding and replication for durability

    • All information is globally identified

      • Unique identifiers are hashes over names & keys

      • Uniform location mechanism:

        • replaces: DNS, server location, data location

      • No centralized namespace required (e.g. like SDSI)
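A minimal sketch of "unique identifiers are hashes over names & keys". The hash function (SHA-1), the length-prefixed concatenation and the function name `guid` are all assumptions made for illustration; the slide does not specify how OceanStore actually combines the inputs.

```python
import hashlib


def guid(owner_public_key: bytes, name: str) -> str:
    """Derive a location-independent identifier from an owner key and a name.

    Assumed scheme: SHA-1 over a length-prefixed key followed by the UTF-8 name.
    """
    h = hashlib.sha1()
    # The length prefix keeps different (key, name) pairs from hashing the same bytes.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


# Hypothetical key bytes and object name, for illustration only.
print(guid(b"example-public-key-bytes", "/home/alice/notes.txt"))
```

Because anyone holding the key and the name can recompute the identifier, no central authority has to hand out names, which matches the last bullet.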


    Amusing back of the envelope calculation (courtesy Bill Bolotsky, Microsoft)

    • How many files in the OceanStore?

      • Assume $10^{10}$ people in world

      • Say 10,000 files/person (very conservative?)

      • So $10^{14}$ files in OceanStore!

      • If 1 gig files (not likely), get 1 mole of bytes!

        Truly impressive number of elements…

        … but small relative to physical constants
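The arithmetic behind the estimate, written out. It assumes "1 gig" means $10^{9}$ bytes and uses Avogadro's number $N_A$ as the size of a mole.

```latex
\begin{align*}
\text{files} &= 10^{10}\ \text{people} \times 10^{4}\ \text{files/person} = 10^{14} \\
\text{bytes} &= 10^{14}\ \text{files} \times 10^{9}\ \text{bytes/file} = 10^{23} \\
N_A &\approx 6.022 \times 10^{23} \quad\Rightarrow\quad \frac{\text{bytes}}{N_A} \approx 0.17
\end{align*}
```

So the total is within an order of magnitude of a mole, which is the sense in which the slide says "1 mole of bytes".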


    Basic Structure: Irregular Mesh of “Pools”


    Utility-based Infrastructure

    • Service provided by confederation of companies

      • Monthly fee paid to one service provider

      • Companies buy and sell capacity from each other

    [Figure: pools run by example providers, labelled Canadian OceanStore, Sprint, AT&T, Pac Bell and IBM.]


    Outline

    • Motivation

    • Properties of the OceanStore and Assumptions

    • Specific Technologies and approaches:

      • Conflict resolution on encrypted data

      • Replication and Deep archival storage

      • Naming and Data Location

      • Introspective computing for optimization and repair

      • Economic models

    • Conclusion


    Ubiquitous Devices ⇒ Ubiquitous Storage

    • Consumers of data move, change from one device to another, work in cafes, cars, airplanes, the office, etc.

    • Properties REQUIRED for OceanStore storage substrate:

      • Strong Security: data encrypted in the infrastructure; resistance to monitoring and denial of service attacks

      • Coherence: too much data for naïve users to keep coherent “by hand”

      • Automatic replica management and optimization: huge quantities of data cannot be managed manually

      • Simple and automatic recovery from disasters: probability of failure increases with size of system

      • Utility model: world-scale system requires cooperation across administrative boundaries


    State of the Art?

    • Widely deployed systems: NFS, AFS (/DFS)

      • Single “regions” of failure, caching only at endpoints

      • ClearText exposed at various levels of system

      • Compromised server all data on server compromised

    • Mobile computing community: Coda, Ficus, Bayou

      • Small scale, fixed coherence mechanism

      • Not optimized to take advantage of high-bandwidth connections between server components

      • ClearText also exposed at various levels of system

    • Web caching community: Inktomi, Akamai

      • Specialized, incremental solutions

      • Caching along client/server path, various bottlenecks

    • Database Community:

      • Interfaces not usable by legacy applications

      • ACID update semantics not always appropriate
