Web Data Management


Presentation Transcript


  1. Web Data Management Raghu Ramakrishnan

  2. QUIQ Lessons
     • Structured data management powers scalable collaboration environments
     • ASP (application service provider)
     • Multi-tenancy
     • Massively distributed
     • Fine-grained permissions, hierarchical ACLs
     • RDBMSs were a lousy fit
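The slide names fine-grained permissions with hierarchical ACLs in a multi-tenant setting but does not say how QUIQ implemented them. Below is one common way such a check can work, sketched under stated assumptions: resources form a tree named by paths, ACL entries are keyed by (tenant, path), and the rule is "nearest explicit entry wins, otherwise deny". The names `can_read` and `ancestors` and the sample tenants, principals, and paths are all hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical ACL store: (tenant, node path) -> {principal: allowed?}.
# Keying by tenant keeps one customer's entries from ever matching another's.
Acl = Dict[Tuple[str, str], Dict[str, bool]]


def ancestors(path: str) -> Iterator[str]:
    """Yield the node itself, then each ancestor, most specific first."""
    parts = [p for p in path.split("/") if p]
    for i in range(len(parts), -1, -1):
        yield "/" + "/".join(parts[:i])


def can_read(acl: Acl, tenant: str, path: str, principal: str) -> bool:
    """Nearest explicit entry wins; with no entry anywhere on the path, deny."""
    for node in ancestors(path):
        entry = acl.get((tenant, node), {})
        if principal in entry:
            return entry[principal]
    return False


acl: Acl = {
    ("acme", "/"): {"alice": True},                              # granted at the root
    ("acme", "/forums/private"): {"alice": False, "bob": True},  # overridden lower down
}
print(can_read(acl, "acme", "/forums/public/t1", "alice"))    # True: inherited from "/"
print(can_read(acl, "acme", "/forums/private/t2", "alice"))   # False: nearer entry denies
print(can_read(acl, "acme", "/forums/private/t2", "bob"))     # True
print(can_read(acl, "globex", "/forums/public/t1", "alice"))  # False: other tenant
```

Default deny means a missing entry never leaks access, and an override placed on a subtree applies to everything beneath it without touching the root's grant.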

  3. Cloud Computing: Computing as a Service
     (Diagram: packaged software contrasted with cloud computing, which splits as follows.)
     • CPU intensive
       • High-throughput, e.g., Condor
     • Data intensive
       • Analytic, e.g., SSDS, Hadoop
       • “Transactional” storage & serving, e.g., PNUTS, S3, SSDS, UDB

  4. Implications
     • Data management as a service
     • Scientists and others who’ve resisted (installing, maintaining, and) using DBMSs will find it much easier to reap the benefits
     • “Data centers” and “computing centers” will come into vogue again
     • Hosted back-ends and RAD tools will make Web application development accessible to all
     • The Web is becoming open, e.g., OpenSocial, OpenID
     • Ideas will be the most valuable currency, not the wherewithal to build complex systems
     • Paradigm shifts are possible in how we do research in many fields
     • Build applications that embed your algorithms and test them directly in the field: computer scientists can interact directly with users (ironically, this would still be a breakthrough of sorts after four decades!)
     • Many other disciplines (e.g., sociology, microeconomics) can design and conduct online experiments involving unprecedented numbers of participants

  5. PNUTS: DB in the Cloud
     (Figure: the same Parts table held as three geographic replicas; each copy contains these rows.)
        ID   StockNumber   Status
        A    42342         E
        B    42521         W
        C    66354         W
        D    12352         E
        E    75656         C
        F    15677         E
     CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … )
     • Parallel database
     • Geographic replication
     • Structured, flexible schema
     • Indexes and views
     • Hosted, managed infrastructure
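The slide lists "Indexes and views" among PNUTS features. As a rough illustration of what a secondary index buys, here is a single-node, in-memory sketch of an index on `Status` that is updated synchronously on every write, loaded with the rows from the figure. It assumes the figure's columns are ID, StockNumber, and Status; `PartsTable` and its methods are hypothetical names, and the sketch says nothing about how PNUTS itself maintains indexes across replicas.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple


class PartsTable:
    """Hypothetical in-memory Parts table keyed by ID, plus a secondary
    index from Status to the IDs that currently have that status."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, str]] = {}  # ID -> (StockNumber, Status)
        self.by_status: DefaultDict[str, Set[str]] = defaultdict(set)

    def upsert(self, part_id: str, stock_number: int, status: str) -> None:
        old = self.rows.get(part_id)
        if old is not None:
            # Drop the index entry for the old status before adding the new one.
            self.by_status[old[1]].discard(part_id)
        self.rows[part_id] = (stock_number, status)
        self.by_status[status].add(part_id)

    def delete(self, part_id: str) -> None:
        old = self.rows.pop(part_id, None)
        if old is not None:
            self.by_status[old[1]].discard(part_id)

    def ids_with_status(self, status: str) -> Set[str]:
        # Answered from the index, without scanning every row.
        return set(self.by_status.get(status, set()))


parts = PartsTable()
for row in [("A", 42342, "E"), ("B", 42521, "W"), ("C", 66354, "W"),
            ("D", 12352, "E"), ("E", 75656, "C"), ("F", 15677, "E")]:
    parts.upsert(*row)

print(sorted(parts.ids_with_status("E")))  # ['A', 'D', 'F']
parts.upsert("B", 42521, "E")              # B changes status from W to E
print(sorted(parts.ids_with_status("W")))  # ['C']
```

The cost of the faster lookup is extra work on every write, which is why where and when index maintenance happens matters so much once the table is replicated.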

  6. Basic Consistency Model
     Goal:
     • Make it easier for applications to reason about updates and cope with asynchrony: an alternative to “transactions” in an asynchronous world
     • What happens to a record with primary key “Brian”?
     Guarantees:
     • Every reader will always see some consistent, but possibly stale, version
     • Readers can request a more up-to-date version, but may pay extra latency
     • Special case: critical read (writers/readers see their own writes)
     • Writers can verify that the record is still at the version they expect
     (Figure: timeline of one record. Each generation begins when the record is inserted (v. 1), advances through updates (v. 2, v. 3, v. 4), and ends with a delete; Generations 1, 2, and 3 are shown along the time axis.)
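The guarantees on this slide can be modeled for a single record. This is a minimal sketch, not PNUTS code: it assumes one master copy that applies writes in order and one replica that replays them later, with asynchrony simulated by explicit `propagate` calls. A version is a hypothetical (generation, version) pair, so a re-insert after a delete starts a new generation at v. 1, as in the timeline figure. `TimelineRecord` and the method names (`read_any`, `read_critical`, `read_latest`, `test_and_set_write`) are illustrative labels for the slide's guarantees; their signatures and behavior here are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

VersionId = Tuple[int, int]  # hypothetical (generation, version within generation)


@dataclass(frozen=True)
class Version:
    vid: VersionId
    value: Optional[str]  # None marks a delete


class TimelineRecord:
    """Hypothetical single-record model: the master applies every write in
    order, and one replica replays the same writes later, in the same order."""

    def __init__(self) -> None:
        self.master: Optional[Version] = None
        self.replica: Optional[Version] = None
        self.in_flight: List[Version] = []  # applied at master, not yet at replica

    def _append(self, vid: VersionId, value: Optional[str]) -> VersionId:
        self.master = Version(vid, value)
        self.in_flight.append(self.master)
        return vid

    def _live(self) -> bool:
        return self.master is not None and self.master.value is not None

    def insert(self, value: str) -> VersionId:
        # A first insert, or a re-insert after a delete, starts a new generation at v. 1.
        if self._live():
            raise ValueError("record already exists")
        generation = 0 if self.master is None else self.master.vid[0]
        return self._append((generation + 1, 1), value)

    def _next(self, value: Optional[str]) -> VersionId:
        if not self._live():
            raise ValueError("no live record")
        generation, n = self.master.vid
        return self._append((generation, n + 1), value)

    def update(self, value: str) -> VersionId:
        return self._next(value)

    def delete(self) -> VersionId:
        return self._next(None)

    def propagate(self, count: int = 1) -> None:
        # Simulated asynchrony: the replica only ever moves forward along the timeline.
        for _ in range(min(count, len(self.in_flight))):
            self.replica = self.in_flight.pop(0)

    def read_any(self) -> Optional[Version]:
        # Cheapest read: a consistent but possibly stale version (or None).
        return self.replica

    def read_critical(self, required: VersionId) -> Optional[Version]:
        # Serve locally if the replica is new enough; otherwise pay latency at the master.
        if self.replica is not None and self.replica.vid >= required:
            return self.replica
        return self.master

    def read_latest(self) -> Optional[Version]:
        return self.master

    def test_and_set_write(self, expected: VersionId, value: str) -> bool:
        # Write only if the record is still live and at the version the writer expects.
        if not self._live() or self.master.vid != expected:
            return False
        self.update(value)
        return True


r = TimelineRecord()
r.insert("Brian: online")                  # (1, 1)
r.update("Brian: away")                    # (1, 2)
r.propagate()                              # replica reaches (1, 1) only
print(r.read_any().vid)                    # (1, 1): stale but consistent
print(r.read_critical((1, 2)).vid)         # (1, 2): served by the master
print(r.test_and_set_write((1, 1), "x"))   # False: the record has moved on
print(r.test_and_set_write((1, 2), "Brian: busy"))  # True, now at (1, 3)
r.delete()                                 # (1, 4) ends generation 1
r.insert("Brian: back")                    # (2, 1) starts generation 2
r.propagate(10)
print(r.read_any().vid)                    # (2, 1)
```

Because the replica only advances through the master's sequence of versions, `read_any` can return an old version but never one that skips backward or mixes two histories, which is the "consistent, but possibly stale" guarantee; `read_critical` and `test_and_set_write` trade extra latency or a possible retry for freshness.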

  7. Lots of Issues to Re-think
     • Massive distribution & replication
     • Asynchrony
     • Availability
     • Consistency
     • DBA to the world
     • Auto-tuning
     • Multi-tenancy
     • Access control (granularity, online IDs)
     • Encryption
     • App support
     • Caching

  8. Querying the Web
     • Search will become more semantic: best-effort match-making between
       • Query intent (NLP, query logs …)
       • Interpreted web content
     • Deep web has a lot of structured data
       • How we get a handle on it is an interesting problem
       • But this is only part of the problem … lots of data not here
     • Semantic web isn’t working
     • Site-wrapping doesn’t scale
     • Solutions?
       • Domain-wrapping
       • Mass collaboration
       • ??
