SwissBox

G. Alonso, D. Kossmann, T. Roscoe. Systems Group, ETH Zurich. http://systems.ethz.ch


Presentation Transcript


  1. SwissBox • G. Alonso, D. Kossmann, T. Roscoe • Systems Group, ETH Zurich • http://systems.ethz.ch

  2. Agenda • What are we building? • Why are we building it?

  3. What is SwissBox? [Forrest Gump, Hollywood 1994]

  4. Inside SwissBox (Hardware) • N CPU cores (N = 100, 1000) • X GB of main memory (X = 10 × N) • NUMA • dedicated MM for each core • Network • heterogeneous (complex) • FPGAs • Some persistent storage • Disks or flash (maybe obsolete in future with PCM) • Think of a (commodity) rack or a multi-core machine

  5. Overview of Components

  6. Shared-disk Architecture • Client to Web Server: HTTP; XML, JSON, HTML • Web Server to App Server: FCGI, ...; XML, JSON, HTML • App Server to DB Server: SQL; records • DB Server to Storage: get/put; block

  7. Shared-disk Architecture • [Figure: two stacks side by side] • Traditional stack: Client; HTTP / XML, JSON, HTML; Web Server; FCGI, ... / XML, JSON, HTML; App Server; SQL / records; DB Server; get/put / block; Storage • New stack: Clients; XML, JSON, HTML; Workload Splitter; DB+App nodes; Predicates, Light Aggr.; Distributed Storage with Stores (e.g., S3) • [Brantner et al. 2008]

  8. ClockScan • [Figure labels: Queries + Upd.; records; data partition; results as {record, {query-ids}}] • [Unterbrunner et al. 2009]
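
The figure labels on the ClockScan slide suggest a scan that streams the records of one data partition past a batch of pending queries and emits each matching record together with the set of query ids it satisfies. A minimal Python sketch of that idea follows. It assumes equality predicates only, and the names `Query`, `build_query_index`, and `clock_scan`, as well as the sample records, are illustrative and not taken from the slides or the cited paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query with a single equality predicate: attr == value."""
    qid: int
    attr: str
    value: object


def build_query_index(queries):
    """Index the queries (not the data): attr -> value -> set of query ids."""
    index = defaultdict(lambda: defaultdict(set))
    for q in queries:
        index[q.attr][q.value].add(q.qid)
    return index


def clock_scan(partition, queries):
    """One pass over the partition, shared by all pending queries.

    Yields (record, query_ids) for every record that matches at least one query.
    """
    index = build_query_index(queries)
    for record in partition:
        matched = set()
        for attr, by_value in index.items():
            # Probe the query index with the record's value for this attribute.
            matched |= by_value.get(record.get(attr), set())
        if matched:
            yield record, matched


# Made-up sample data for illustration only.
partition = [
    {"flight": "LX38", "name": "Meier"},
    {"flight": "LX14", "name": "Keller"},
    {"flight": "LX38", "name": "Keller"},
]
queries = [Query(1, "flight", "LX38"), Query(2, "name", "Keller")]
results = list(clock_scan(partition, queries))
```

The cost of the pass depends on the partition size and the number of indexed attributes rather than on the number of queries, which is one plausible reading of why a scan can serve many queries at once.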

  9. SharedDB: Joins • Massively shared joins • same join pred. • diff. table pred. • (reassemble BO) • Same idea as ClockScan • "shared join scan" • additional join predicate on "query" • [Giannikis et al. 2011]
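
One way to read the bullets on the SharedDB slide: queries that share a join predicate but differ in their table predicates can be served by a single join, provided every tuple carries the set of queries it qualified for and a joined pair is kept only for queries present on both sides. The sketch below illustrates that reading with a plain hash join. The function name `shared_hash_join` and the sample rows are hypothetical, and this is not the operator from the cited paper.

```python
from collections import defaultdict


def shared_hash_join(left, right, key):
    """Join two tagged inputs once for all queries.

    Each input item is (row, query_ids), where query_ids is the set of
    queries whose table predicates the row satisfied. A joined pair is
    emitted only for queries present on both sides, which acts as the
    additional join predicate on the query id.
    """
    build = defaultdict(list)
    for row, qids in left:
        build[row[key]].append((row, qids))

    for r_row, r_qids in right:
        for l_row, l_qids in build.get(r_row[key], []):
            common = l_qids & r_qids
            if common:
                yield {**l_row, **r_row}, common


# Made-up inputs: rows already tagged by an upstream shared scan.
bookings = [
    ({"flight": "LX38", "name": "Meier"}, {1, 2}),
    ({"flight": "LX14", "name": "Keller"}, {2}),
]
flights = [
    ({"flight": "LX38", "dest": "SFO"}, {1}),
    ({"flight": "LX14", "dest": "BOS"}, {1}),
]
joined = list(shared_hash_join(bookings, flights, "flight"))
```

Here only the LX38 pair survives, and only for query 1, because the LX14 rows share a join key but no query id.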

  10. Overview of Components

  11. SwissBox Building Blocks • Barrelfish Multi-kernel Operating System • CPU driver for each core (Barrelfish) • Message passing (no shared memory!) • Designed for heterogeneous HW (e.g., NUMA) • ClockScan • Storage layer serves simple predicates + aggregates • Snapshot isolation within one partition • E-Cast Protocol • Paxos + consistent hashing • elasticity (online repartitioning), SI across partitions • SharedDB Operators • massively shared joins, sorts, group-bys... • custom processing (if sharing not worth it) • FPGAs • some special algos for in-network filtering / processing
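
The E-Cast bullet names Paxos plus consistent hashing as the basis for elasticity but gives no protocol details. The following is therefore only a generic consistent-hashing ring, sketched to show why online repartitioning can be cheap: when a node joins, keys move only to the new node. The class `HashRing`, the `vnodes` parameter, and the node and key names are hypothetical, and the sketch implements neither Paxos nor E-Cast.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hashing ring with virtual nodes (illustration only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def owner(self, key):
        """Return the node owning the first ring position clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        positions = [pos for pos, _ in self._points]
        i = bisect.bisect_right(positions, _hash(key)) % len(self._points)
        return self._points[i][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"booking-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add("node-d")  # scale out by one node
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Rebuilding the position list on every lookup keeps the sketch short; a real implementation would cache it.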

  12. Summary: Design Ideas • SwissBox is an appliance • enables optimization across layers • Exploit data / query duality • index queries rather than data • optimize with knowledge of queries and data • Radically simplified data flow architecture • No indexes, one query plan for a particular workload • Merge DB and application server layers • Save cost and improve predictability • Shape the workload • Force (almost) all operations into simple access patterns (scan) • Shared-disk architecture • Great for elasticity, fault tolerance (previous work on cloud) • Make use of capabilities of "storage layer" • Great for "inter-query" parallelism (not good for "intra-query" parallelism)

  13. Agenda • What are we building? • Why are we building it?

  14. Why are we doing this? • Because we can... • ... the proof is in the pudding • Interesting research artefact • re-address OS/DB co-design • study "battle of the bottlenecks" • Hardware trends • Hardware changes faster than systems software • NUMA, main-memory, heterogeneity • Challenging workloads and requirements • Predictable performance, data freshness guarantees

  15. Amadeus Workload • Passenger-Booking Database • ~600 GB of raw data (two years of bookings) • single table, denormalized • ~50 attributes: flight-no, name, date, ..., many flags • Query Workload • up to 4000 queries / second • latency guarantees: 2 seconds • today: only pre-canned queries allowed • Update Workload • avg. 600 updates per second (1 update per GB per sec) • peak of 12000 updates per second • data freshness guarantee: 2 seconds
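
The workload figures on this slide can be combined into a quick back-of-envelope calculation. The sketch below uses only numbers from the slides, plus two assumptions that are not from the talk: the data is spread evenly over N = 100 cores (the smaller N on the hardware slide), and each core must complete one full scan of its share within the 2-second latency bound. Queueing and result delivery are ignored.

```python
# Figures taken from the Amadeus Workload slide.
RAW_DATA_GB = 600
QUERIES_PER_S = 4000
LATENCY_S = 2
AVG_UPDATES_PER_S = 600
PEAK_UPDATES_PER_S = 12000

# Assumption (not from the talk): even partitioning over N = 100 cores.
CORES = 100

# Upper bound on queries active at once if each one uses the full latency budget.
queries_in_flight = QUERIES_PER_S * LATENCY_S

# Matches the slide's "1 update per GB per sec".
updates_per_gb_per_s = AVG_UPDATES_PER_S / RAW_DATA_GB

# How much the update peak exceeds the average.
peak_to_avg = PEAK_UPDATES_PER_S / AVG_UPDATES_PER_S

# Assumption: one full scan cycle per core must fit into the latency bound.
gb_per_core = RAW_DATA_GB / CORES
scan_gb_per_s_per_core = gb_per_core / LATENCY_S

print(f"queries in flight (upper bound): {queries_in_flight}")
print(f"updates per GB per second:       {updates_per_gb_per_s}")
print(f"peak / average update rate:      {peak_to_avg}")
print(f"data per core:                   {gb_per_core} GB")
print(f"required scan rate per core:     {scan_gb_per_s_per_core} GB/s")
```

Under these assumptions a single scan cycle would be shared by up to 8000 queries, which is the regime the shared-scan design targets.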

  16. Other Workloads • Logging Service (Amadeus, Credit Suisse) • Log entries from multiple apps and middleware • Maintenance of coarse-grained indexes (sessionId, ...) • Distributed debugging, support, auditing • Index look-ups + large scans • Twitter Times (http://www.twittertim.es) • Streams of events / microblog posts (700 / sec) • Maintain simple statistics incrementally (word counts) • Compile a personalized newspaper of posts • TPC-W style (Credit Suisse, SAP) • Complex queries + updates
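
For the Twitter Times workload, "maintain simple statistics incrementally" can be pictured as updating running counters per incoming post instead of recomputing over the whole stream. A minimal sketch, assuming whitespace tokenization and lowercase normalization; the class `WordStats` and the sample posts are made up for illustration.

```python
from collections import Counter


class WordStats:
    """Running word counts over a stream of posts (illustration only)."""

    def __init__(self):
        self.counts = Counter()
        self.posts = 0

    def add_post(self, text):
        # Incremental update: only the words of the new post are touched.
        self.counts.update(text.lower().split())
        self.posts += 1

    def top(self, k):
        return self.counts.most_common(k)


stats = WordStats()
for post in ["Zurich is sunny", "sunny day in Zurich", "rain in Bern"]:
    stats.add_post(post)
```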

  17. Related Work • Appliances • SAP Trex, Netezza, Oracle Exadata, ... • New Data Processing Architectures • All the previous papers of this session • IBM Blink, MonetDB X100, AsterData, ... • Eddies, data/query dualism, StagedDB, QPipe, ... • Nothing we do is really new

  18. Conclusion • Consensus on Starting Point • Great workloads, new app requirements • (predictability, elasticity, ...) • Technology moving faster than ever • (MM, multi-core, heterogeneity, cloud, ...) • Building blocks that feel right • (ClockScan, multi-kernel, ...) • No consensus (yet) on putting it together • How to compose predictability and elasticity? • "The journey is the destination"
