
So What To Do Next?

Michael Stonebraker, Adjunct Professor, Massachusetts Institute of Technology (stonebraker@lcs.mit.edu)


Presentation Transcript


  1. So What To Do Next? Michael Stonebraker Adjunct Professor Massachusetts Institute of Technology (stonebraker@lcs.mit.edu)

  2. Where To Find Problems • State of affairs • Interesting industrial problems • Mike’s picks • My whine on XML • Grand challenges

  3. State of Affairs • IT failure rate • Software half-life • No knobs

  4. State of Affairs • ~50-75% of IT projects fail • if we built bridges, our profession would be fired • and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)

  5. What To Do? • We typically don’t teach this stuff • probably because we don’t (can’t) spend any time in industry to figure it out Action item: at the very least read a couple of Robert L. Glass’s books

  6. State of Affairs • Hardware “half-life” is 18 months • Software half-life is 18 years (or more)!

  7. What To Do? • Much higher level design environments • we are stuck at the general purpose programming level (conceivable benefit limited) • workflow and other higher level graphical notations probably a good idea

  8. What To Do? • Special-purpose languages are nice (why are report writers shunned?) • higher level versions of SQL and XQuery • See Informix Visionary for a cool example

  9. State of Affairs • Commercial products are way too hard to use • takes people in white lab coats to get them up and keep them up • Full employment act for DBAs forever

  10. What To Do? • “No knobs” • only buttons are “go” and “stop” • all tuning automatic • index selection is one of the minor ones (buffer pool size, partitioning, log buffer pool size, …) • Error reporting stinks

  11. Interesting Industrial Problems Should Focus Research • BBC • OZ entertainment • Cisco • Akamai • Fidelity My suggestion: NSF should require a letter of support from a CIO with each grant proposal.

  12. Interesting Problems -- BBC • Digitize 50 years of British television creativity • want to serve it up on demand • especially British soccer games • media is wearing out • Random access to 1 petabyte (or so) • by the unwashed 200 million on the Internet

  13. CNN Variation • On-line digital news editing by 300 news directors • who want to find Monica Lewinsky • and 30 seconds of footage on suffering in Bosnia

  14. What To Do? • Content outlives support for the content format • Automatic content indexing • cannot afford a librarian • Global scale distributed system • Staging and caching • high locality of reference

  15. What To Do? • Query model meets visualization systems • unwashed will not learn XQuery • Rights management • incredibly sticky issue in whole area

  16. Interesting Problem - OZ Entertainment • New theme park near Kansas City • “no lines” • no lost kids • virtual theme park as teaser

  17. What To Do? • Large scale GIS • update intensive! • Large scale triggering problem • alert me if there is a cancellation at X and I am within 300 yards
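The trigger on slide 17 ("alert me if there is a cancellation at X and I am within 300 yards") can be sketched as a distance test over current visitor positions. This is only an illustration under stated assumptions: the names `distance_m` and `visitors_to_alert`, the coordinates, and the linear scan are all hypothetical. A park-scale system with constantly moving visitors would need a spatial index and incremental trigger evaluation, which is exactly the hard part the slide points at.

```python
import math

YARD_M = 0.9144                 # metres per yard
EARTH_RADIUS_M = 6_371_000.0    # mean Earth radius, good enough for short distances


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def visitors_to_alert(event_pos, visitors, radius_yards=300):
    """Return the names of visitors whose last known position is within the radius."""
    limit_m = radius_yards * YARD_M
    return [name for name, pos in visitors.items()
            if distance_m(event_pos, pos) <= limit_m]


# Hypothetical data: a cancellation at one attraction and two visitors.
cancelled_ride = (39.000, -94.600)
visitors = {
    "near": (39.001, -94.600),   # roughly 111 m away
    "far": (39.010, -94.600),    # roughly 1.1 km away
}
alerts = visitors_to_alert(cancelled_ride, visitors)
```

Every position update can change the answer, so the cost is driven by the update rate rather than by the query rate.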

  18. Interesting Problem - Cisco Systems • Supply chain of 60K suppliers for custom goods • Want to query the transitive closure of this supply chain • can I make 10 more routers next week?
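The "transitive closure of this supply chain" on slide 18 is, in miniature, a graph reachability computation. The sketch below assumes the whole supplier graph fits in one in-memory dictionary, which is precisely what a 60K-supplier federation does not give you; the function name `all_suppliers` and the part names are made up for illustration.

```python
from collections import deque


def all_suppliers(direct, root):
    """Return every direct or indirect supplier reachable from `root`.

    `direct` maps a part or company to the list of its immediate suppliers.
    """
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for supplier in direct.get(node, ()):
            if supplier not in seen:      # also guards against cycles
                seen.add(supplier)
                queue.append(supplier)
    return seen


# Hypothetical supply chain for one product.
direct = {
    "router": ["chassis", "asic"],
    "asic": ["wafer"],
    "wafer": ["silicon"],
    "chassis": [],
}
closure = all_suppliers(direct, "router")
```

Answering "can I make 10 more routers next week?" would additionally require an inventory or capacity check at each node of the closure.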

  19. What To Do? • Huge federated system • central metadata a non-starter • no single DBA • global query optimizer a non-starter • Adapters for 1M (or so) legacy systems • how to write them semi-automatically?

  20. Interesting Problem - Akamai • Billing is 95/5 • 5 minute intervals • pay for bandwidth of 95th percentile • 300 Gbytes a day (compressed) of click stream data Biggest warehouses on the planet will soon be click stream data!
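The 95/5 billing rule on slide 20 can be sketched in a few lines: sample bandwidth every five minutes, sort the samples, and charge for the 95th-percentile value, so the top 5% of intervals are effectively free. The nearest-rank percentile definition and the name `billable_rate` are assumptions made for this example; the slides do not say which percentile method is used.

```python
def billable_rate(samples_mbps):
    """Return the 95th-percentile sample using the nearest-rank method."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical billing period: 95 quiet five-minute intervals and 5 bursts.
samples = [10.0] * 95 + [500.0] * 5
rate = billable_rate(samples)   # 10.0: the five bursts fall in the free top 5%
```

One more burst interval would push the billable rate to the burst level, which is why customers care about the shape of their traffic and not only its total volume.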

  21. Click Stream Data • Customers want to mine their click stream • And Akamai only has a portion of it • i.e. huge distributed database • Query is “tell me something interesting” • e.g. why are 95% of the shopping carts abandoned? • and not a pile of statistics

  22. Interesting Problem - Fidelity • Financial portal for high net worth individuals • must connect to several hundred Fidelity systems • Customers want to know fairly complex things • e.g. rank my money manager against all value managers for 1, 3 and 5 years

  23. What to Do? • Voice to NL to structured data • voice to NL works in focused verticals (weather, airline schedules) • but this is a pretty broad app • NL to structured data requires some work • put in the joins • look up vocabulary in the DBMS

  24. What to Do? • How to join unstructured data to structured data • tell me the news stories about all stocks which have increased in value more than 10% today
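The query on slide 24 joins a structured predicate (stocks up more than 10% today) with unstructured text (news stories). A minimal sketch follows, assuming stories mention stocks by upper-case ticker symbol; the tickers, quotes, stories, and function names are all hypothetical, and real entity matching (company names, aliases, ambiguity) is much harder than this string match.

```python
import re


def gainers(quotes, threshold=0.10):
    """Structured side: symbols whose gain since the open exceeds the threshold."""
    return {symbol for symbol, (open_price, last_price) in quotes.items()
            if (last_price - open_price) / open_price > threshold}


def stories_about(stories, symbols):
    """Unstructured side: keep stories that mention any of the given symbols."""
    hits = []
    for story in stories:
        words = set(re.findall(r"\b[A-Z]{2,}\b", story))
        mentioned = sorted(words & symbols)
        if mentioned:
            hits.append((story, mentioned))
    return hits


# Hypothetical data: symbol -> (opening price, latest price).
quotes = {"ACME": (100.0, 115.0), "INIT": (50.0, 51.0)}
stories = ["ACME beats forecasts", "Quiet day for INIT"]
result = stories_about(stories, gainers(quotes))
```

The structured predicate is evaluated first so that only a small set of symbols has to be matched against the text.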

  25. Mike’s Picks • Too much middleware • Akamai for structured data

  26. Interesting Problem - Middleware • Average enterprise has • one (or more) app servers • one (or more) EAI packages • one (or more) ETL packages • one (or more) portal products • one (or more) application packages • and maybe someday a federated DBMS

  27. All of these systems • Contain transformation engines • And often do function activation (app service) • And often have adapters to legacy systems Huge overlap in functionality!!

  28. What to Do? • Consolidate weaker paradigms under stronger ones • e.g. federated DBMS subsumes ETL • OR DBMS subsumes app service Middleware becomes DBMS-centric!

  29. Interesting Problem - Caching • Akamai et al. cache HTML • closer to the browser that wants it • Would be nice to cache structured data • need to cache application that uses the data • and the data

  30. What to Do? • Materialized views are a predefined solution • Nice to have a more dynamic one • Cache (query, answer) pairs?
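The "(query, answer) pairs" idea on slide 30 can be illustrated with a small cache keyed on the query text and its parameters. This is a sketch under assumptions, not a design from the talk: `QueryCache`, `run_query`, and the stand-in backend are hypothetical, only exact repeats of a query hit the cache, and invalidation is a blunt "clear everything", whereas a real system would have to decide which cached answers an update affects.

```python
class QueryCache:
    """Cache answers keyed by (query text, parameters)."""

    def __init__(self, run_query):
        self._run_query = run_query   # callable that actually reaches the DBMS
        self._answers = {}
        self.misses = 0

    def get(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self._answers:
            self.misses += 1
            self._answers[key] = self._run_query(sql, params)
        return self._answers[key]

    def invalidate(self):
        """Drop every cached answer, e.g. after an update to the base data."""
        self._answers.clear()


def fake_backend(sql, params):
    """Stand-in for a remote DBMS call; returns a single canned row."""
    return [("row for", params)]


cache = QueryCache(fake_backend)
first = cache.get("SELECT * FROM orders WHERE customer = ?", [42])
second = cache.get("SELECT * FROM orders WHERE customer = ?", [42])
```

Unlike a materialized view, nothing here is declared in advance: the cache fills with whatever queries the application actually issues.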

  31. History Lesson (Codd) • Putting semantics into data order is bad • restricts storage options • hidden meaning bad • Hierarchical representations for data are bad • rewrite the queries when representation changes (data independence) • Complexity is bad

  32. My Spin on XML (XML Schema) • As a storage format, XML is good for documents, not data • Codd’s thinking has not been repealed (order, hierarchy, complexity) • no binary format • inline tags are inefficient • SGML run amok…

  33. My Spin on XML • As an “on the wire” notation, XML is ok for data • but don’t try to move too much stuff • and don’t try to move it too fast • Remember why client-server put in binary movement!
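The size argument behind "remember why client-server put in binary movement" can be made concrete by encoding the same rows both ways. The sketch below is only an illustration: the two-column row layout, the element names, and the fixed 12-byte binary record are assumptions chosen for the example, not a format from the talk.

```python
import struct
import xml.etree.ElementTree as ET


def as_xml(rows):
    """Encode (id, price) rows as tagged XML text."""
    root = ET.Element("rows")
    for row_id, price in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "id").text = str(row_id)
        ET.SubElement(row, "price").text = repr(price)
    return ET.tostring(root)


def as_binary(rows):
    """Encode each row as a 4-byte integer plus an 8-byte float, little-endian."""
    return b"".join(struct.pack("<id", row_id, price) for row_id, price in rows)


rows = [(i, i * 1.5) for i in range(1000)]
xml_bytes = len(as_xml(rows))
binary_bytes = len(as_binary(rows))   # 12 bytes per row
```

The XML form repeats the tag names in every row and also has to be parsed back from text into numbers on the receiving side.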

  34. XQuery For Data • Won’t store data in XML • Necessary to design something that is easy to translate into SQL • Alternate syntax for OR SQL • which is much cleaner (// is a user defined function in Informix)

  35. XML Summary • Focus attention on XML Schema as a document description system, not a data description system • Focus XQuery on documents, not data W3C use cases do not do this!

  36. OR DBMS • XML is merely this year’s data type • Next year it will be WML or ... • OR is still not finished • query optimization • database design • physical storage layout

  37. Grand Challenge #1 • Preponderance of web accessible data is structured • much more than “facts and figures” • Construct a system to access “the rest of” the web

  38. What To Do • GUI problem (NL or Vis) • Query notation problem • Discovery problem • how do you “scrape” a structured data web site to figure out the meaning of its data? • Federation problem

  39. Grand Challenge #2 • Everything of material importance is geo-positioned (lojacked) • Construct the mother of all GIS systems • complete automation of supply chains • “where is my wife” (or the closest restroom)

  40. What To Do • Most of the issues in GC #1 • The mother of all triggering problems • The mother of all security/privacy problems
