
Your PetDB

A Petabyte in Your Pocket David Maier Oregon Graduate Institute with help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker. Your PetDB. It’s 2015. For $300 a year, you can have a personal petabyte database (PetDB). You can talk to it from anywhere.



Presentation Transcript


  1. A Petabyte in Your Pocket • David Maier, Oregon Graduate Institute • With help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker

  2. Your PetDB • It’s 2015. • For $300 a year, you can have a personal petabyte database (PetDB). • You can talk to it from anywhere. • Organizes any kind of digital data. • Doesn’t lose structure, can restructure • Queryable • Handles streams • Organized by type, content, associations, multiple categorizations and groupings • Locate items by • How or where you encountered them • What you’ve done with them • Where you were when you accessed them

  3. What Would I Put in a Petabyte? • A lot. • Fill my office floor to ceiling with books ≈ 100 GB • What do I do with 10,000 times as much? • Many possibilities: • Contents of every book and magazine I read • Every web page I visit • All email I send or receive • Every TV program I watch • Every version of every piece of software I use • Maps of everywhere I go • Notes from every class or seminar I attend • All the telephone calls I make • My “Lifestream” (Freeman and Gelernter)
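The ratio on slide 3 can be checked with a line of arithmetic. This is only a sanity check, and it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the slide does not say which convention it uses.

```python
# Assumed decimal units; the slide does not state a convention.
GB = 10**9
PB = 10**15

office_of_books = 100 * GB            # slide's estimate for one office full of books
offices_per_petabyte = PB // office_of_books

print(offices_per_petabyte)
```

Under these assumptions a petabyte holds 10,000 such offices, which matches the slide's "10,000 times as much".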

  4. Streams and Restructuring • Can incorporate streamed data on the fly. • MD: Vital signs from patients in ICU • Factory supervisor: status, output rate of all machines; finished products; rejects • Can restructure data if desired. • Combined list of conferences in my area • Info sheets on autos I’m considering buying • Comparable salaries of faculty at my rank in similar departments

  5. Anything I Might Want to Refer Back to • Personally indexed for me. • Can be located in a thousand different ways. • What is the company in Massachusetts I read about in the article on factory tours when I was on the plane to the sales meeting in Atlanta last spring?

  6. Or Things I Might Want in the Future • Histories of news groups and mailing lists • Parts of the web I might want to browse, including past snapshots • Descriptions and prices for any item I might want to buy • Papers I’ve been meaning to read • Historical data on stocks I’m interested in • Functions as a personal web portal

  7. “Database” Not Completely Apt • Didn’t have to define a scheme for it • Doesn’t need to know the datatypes I want to store in advance • Doesn’t chop data into rows and columns • Unless I ask • Can query over information streams • Don’t need to write and run applications to add data • Anything I’ve touched is there • Or expressed an interest in • Not on a particular computer • Doesn’t have an “outside”

  8. My PetDB is Good to Me • I don’t move data between environments • I’m never on the “wrong” machine • Never go back to my office to grab a paper, never have the wrong folder at a meeting • Don’t worry a lot about filing systems–PetDB organizes itself by ways I like to look for information • Anticipates what data I’ll be using

  9. How to Do This? • On $300/year • Plan A: Pack my office floor to ceiling with disk drives. • About $1 million. • Plan B: Be clever. • Share • Stage • Reconstitute

  10. Share • Most of the information in my PetDB isn’t unique to me: magazine article, web page, stock quote. • Store one copy. • Information Paradox: What’s too expensive for one may be affordable for all. • [Figure labels: Others’ PetDBs, My PetDB]
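One way to picture "store one copy" is a content-addressed store, where identical items from different owners map to the same stored blob. The sketch below is illustrative only: the `SharedStore` class, its methods, and the use of SHA-256 digests as keys are assumptions, not something the slides describe.

```python
import hashlib


class SharedStore:
    """Hypothetical shared store: one physical copy per distinct content."""

    def __init__(self):
        self.blobs = {}  # digest -> content bytes (stored once)
        self.refs = {}   # owner -> set of digests that owner may read

    def put(self, owner, content):
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)          # keep the first copy only
        self.refs.setdefault(owner, set()).add(digest)  # record the owner's reference
        return digest

    def get(self, owner, digest):
        if digest not in self.refs.get(owner, set()):
            raise KeyError(f"{owner} holds no reference to {digest}")
        return self.blobs[digest]


store = SharedStore()
article = b"Magazine article on factory tours"
mine = store.put("me", article)
yours = store.put("someone else", article)        # same content, no second copy
store.put("me", b"My annotation on the article")  # unique content is stored separately
```

Here two owners reference the same article, yet the store holds only two blobs in total: the shared article and the one private annotation.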

  11. Stage • Not all data has to be at my current point of connection. • Mainly resides in shared and private servers on the Internet. • Staged to me on a series of data managers. • Access time depends on context, likely use • Current itinerary: 1 second • Upcoming trips: 5 seconds • Past trips: 30 seconds
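The three access times on slide 11 suggest a policy that maps the context of a data item to a target latency. A minimal sketch follows, assuming trips are identified by start and end dates; the function names, trip names, and dates are hypothetical, and only the 1, 5, and 30 second targets come from the slide.

```python
from datetime import date

# Target access times in seconds, taken from the slide's three examples.
TARGET_SECONDS = {"current": 1, "upcoming": 5, "past": 30}


def trip_status(start, end, today):
    """Classify a trip relative to today (hypothetical helper)."""
    if end < today:
        return "past"
    if start > today:
        return "upcoming"
    return "current"


def target_access_time(start, end, today):
    """Return the staging target, in seconds, for data tied to a trip."""
    return TARGET_SECONDS[trip_status(start, end, today)]


today = date(2015, 6, 15)
trips = {
    "Atlanta sales meeting": (date(2015, 4, 10), date(2015, 4, 12)),
    "Portland workshop": (date(2015, 6, 14), date(2015, 6, 16)),
    "Madison visit": (date(2015, 9, 1), date(2015, 9, 3)),
}
targets = {name: target_access_time(s, e, today) for name, (s, e) in trips.items()}
```

A stager could use such targets to decide how close to the user each item should be held; how the staging itself is done is outside this sketch.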

  12. Reconstitute • “If I found it once, PetDB can find it again” • Remember what procedure or search constructed or located data originally. • Use the same method to get it again. • Need to ensure base data is archived. • Plus a small amount of unique content • Stuff I’ve created • Foreground information that superimposes my personal perspective: selections, annotations, responses, manipulations, groupings
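The idea of remembering the procedure rather than the result can be sketched as a store of "recipes" that are re-run on demand against archived base data. Everything named here (`RecipeStore`, the keyword search, the sample archive) is a hypothetical stand-in; the slides do not specify how procedures are recorded.

```python
class RecipeStore:
    """Hypothetical store that keeps how an item was found, not the item itself."""

    def __init__(self, archive):
        self.archive = archive  # source name -> list of archived documents
        self.recipes = {}       # label -> (source name, keyword)

    def find(self, label, source, keyword):
        """Run a search and remember the procedure under a label."""
        self.recipes[label] = (source, keyword)
        return self._run(source, keyword)

    def reconstitute(self, label):
        """Re-run the remembered procedure against the archived base data."""
        source, keyword = self.recipes[label]
        return self._run(source, keyword)

    def _run(self, source, keyword):
        return [doc for doc in self.archive[source] if keyword.lower() in doc.lower()]


archive = {
    "airline magazine": [
        "Factory tours in Massachusetts",
        "Best beaches",
        "Factory outlet shopping",
    ]
}
store = RecipeStore(archive)
first = store.find("factory article", "airline magazine", "factory tours")
again = store.reconstitute("factory article")
```

The sketch only works because the archive still holds the base documents, which is the slide's point about ensuring base data is archived.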

  13. What Infrastructure Do I Need? • Net Data Managers • Network-centric vs. disk-centric • Data movement vs. data storage • Work on live streams as well as stored data • Deal with data of arbitrary types • Run queries over thousands of sites • Locate data by external contexts as well as internal content • Large-scale monitoring

  14. Net Data Managers (NDMs) • [Figure: “Data Management Space” grid with axes Query / No Query and Disk Centric / Network Centric, placing DBMS, File System, Web Servers, and NDMs.]

  15. Why Net Data Managers? • File systems won’t work • No queries, disk centric • Web Servers won’t work • No structural query, no combining of data • No support for optimization and execution of high-level queries spanning 1000s of sites • No support for triggers • In reality, nothing more than “page servers”

  16. Limitations of Current DBMSs • Schema-first • Load then query • Data in the box • Scale • Search by content, not by context

  17. Key Elements of NDM • Self-describing data (e.g., XML) • NetQueries • Algebraic basis • Stream-processing components • Oil refinery vs. book-order warehouse • Want to do for net-centric, data-intensive applications what relational DBs did for business data processing: • Reduce the coding effort to produce such applications, while improving performance, scalability and reliability.

  18. Codd’s Contribution • What’s the most important aspect of the relational model? • Calculus? • Algebra? • Equivalence? • My opinion: Observing that BDP programs only do about 6-7 different things: • scan files • remove fields • select records • remove duplicates • combine records • [aggregate records] • concatenate files • What are the building blocks of net data management?
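The handful of operations listed on slide 18 can be written as small functions over lists of records. The sketch below uses plain Python dictionaries as records; the function names and the sample data are illustrative assumptions, not notation from the talk.

```python
from collections import defaultdict


def select(rows, pred):
    """Select records: keep rows that satisfy a predicate."""
    return [r for r in rows if pred(r)]


def project(rows, fields):
    """Remove fields: keep only the named fields."""
    return [{f: r[f] for f in fields} for r in rows]


def distinct(rows):
    """Remove duplicates, preserving first-seen order."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def join(left, right, key):
    """Combine records that agree on a key field."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, group_field, sum_field):
    """Aggregate records: sum one field per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_field]] += r[sum_field]
    return dict(totals)


def concat(a, b):
    """Concatenate files."""
    return a + b


# Scanning a file corresponds to iterating over one of these lists.
orders = [{"cust": 1, "amount": 10}, {"cust": 2, "amount": 5}, {"cust": 1, "amount": 7}]
customers = [{"cust": 1, "city": "Portland"}, {"cust": 2, "city": "Madison"}]

joined = join(orders, customers, "cust")
totals = aggregate(joined, "city", "amount")
cities = distinct(project(joined, ["city"]))
large = select(orders, lambda r: r["amount"] > 6)
```

Composing these few functions already covers a typical report; the slide's closing question asks what the equivalent small set would be for net data management.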

  19. Without NDMs • [Figure: data sources connected to users. Labeled parts: Format Conversion, Alert Service, Profiles, Data Product Generation, Accumulator + Query Eng., Algorithm, Parameter File, Push Receiver, Browser. Legend: Generic Component, Custom Software.]

  20. With NDMs • [Figure: sources connected to users. Labeled parts: Format Conversion, Alert Service, Profiles, Data Product Generation, Accumulator + Query Eng., Algorithm, Parameter File, Push Receiver, Browser. Legend: Generic Component, Custom Software.]

  21. Kinds of Components • Stream-based query processors • Alerters • Accumulators • Remote monitoring/indexing • Semantic Routers • Replicators: lazy, eager, just-in-time • Semantic caches • Splitters • Access-mode adapters • Partial evaluators

  22. Alerting vs. Querying • Data centric (DBMS): stream of queries past a store of data • Net centric (Alerter): stream of data past a store of queries
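The contrast on slide 22 can be made concrete with two small pieces of code: a query run against stored data, and an alerter that holds stored queries and tests each arriving item against them. The names and the heart-rate thresholds below are illustrative assumptions; the vital-signs setting echoes the ICU example on slide 4.

```python
def run_query(store, query):
    """DBMS style: a query arrives and is evaluated against the stored data."""
    return [item for item in store if query(item)]


class Alerter:
    """Alerter style: queries are stored, and each arriving datum is tested against them."""

    def __init__(self):
        self.subscriptions = {}  # name -> predicate (the stored "queries")

    def subscribe(self, name, predicate):
        self.subscriptions[name] = predicate

    def on_data(self, item):
        """Return the names of all stored queries that the new item satisfies."""
        return [name for name, pred in self.subscriptions.items() if pred(item)]


# Stored data, queried after the fact.
readings = [
    {"patient": "A", "heart_rate": 72},
    {"patient": "B", "heart_rate": 131},
]
high = run_query(readings, lambda r: r["heart_rate"] > 120)

# Stored queries, applied to data as it streams past (thresholds are made up).
alerter = Alerter()
alerter.subscribe("high heart rate", lambda r: r["heart_rate"] > 120)
alerter.subscribe("low heart rate", lambda r: r["heart_rate"] < 50)
fired_normal = alerter.on_data({"patient": "A", "heart_rate": 72})
fired_high = alerter.on_data({"patient": "B", "heart_rate": 131})
```

The same predicate appears in both halves; what changes is which side is stored and which side moves.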

  23. Access Modes • [Figure: grid of four access modes (Post, Push, Poll, Pull) classified by who decides when data moves (Producer or Consumer) and who decides what data moves (Producer or Consumer).]

  24. Assembling Applications from Components • Akamai FreeFlow (see NASDAQ site) • Splitting + Replication + Merge + Adapters • [Figure: Browser, Merge, Base Server, and Field Servers, with Split, Replicate, Push, and Pull steps over Web Content, Text, and Graphics.]

  25. NIAGARA Project • Initial investigation of NDM based on XML • University of Wisconsin and OGI • Stream-oriented XML-QL evaluator • “Text-in-context” search • NiagaraCQ • Merge operator (and rest of algebra) • XML Firehose

  26. Use of NDM for PetDB • NetQueries encode procedures for reconstituting data • Monitoring sources of interest • Replication, splitting, push, accumulators, semantic routing for staging data • NetQuery to inform an archive server what to save • Archives, semantic caches express what they already hold with a NetQuery
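The last bullet on slide 26, a cache describing its own contents with a query, can be illustrated by a cache that records the range predicate it was filled with and answers any narrower request locally. This is a deliberately simplified sketch: a numeric range stands in for a NetQuery, and `RangeCache`, `fetch_from_archive`, and the price data are all hypothetical.

```python
class RangeCache:
    """Hypothetical semantic cache described by the range query it holds."""

    def __init__(self):
        self.held = None  # (lo, hi) range currently cached, or None if empty
        self.rows = {}    # day -> price

    def describe(self):
        """Express what the cache holds as a query (here, a day range)."""
        return self.held

    def covers(self, lo, hi):
        return self.held is not None and self.held[0] <= lo and hi <= self.held[1]

    def query(self, lo, hi, fetch):
        """Answer from the cache when the request is contained in what is held."""
        if self.covers(lo, hi):
            return {d: p for d, p in self.rows.items() if lo <= d <= hi}, "cache"
        self.rows = fetch(lo, hi)
        self.held = (lo, hi)
        return dict(self.rows), "remote"


# Made-up historical prices for days 1..30, standing in for a remote archive.
PRICES = {day: 100 + day for day in range(1, 31)}


def fetch_from_archive(lo, hi):
    return {d: p for d, p in PRICES.items() if lo <= d <= hi}


cache = RangeCache()
first, origin1 = cache.query(1, 10, fetch_from_archive)
second, origin2 = cache.query(3, 5, fetch_from_archive)
```

The first request goes to the archive; the second is contained in the held range and is served locally, which is the behavior the containment check is meant to show.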

  27. Building the PetDB System • [Figure: architecture diagram. Labels as extracted: Context Mgr., Stager, Petster, Task Analyzer, Profiler, Private Archive, PetDB, Replicate, Server, IP Server, Secure Local Cache, Back Quote, Data Kennel, WebSnap, Indexer, Stream Processor, Internet Monitor, Public Archives.]

  28. What Else is Needed? • Superimposed Information • Much of my unique content is an organizational overlay on base data • Small-footprint data managers • Presentation model of stream data • Authorization and Authentication • QoS control, content scaling • Intelligent prediction, learning • Secure staging areas
