
IFLOW: Self-managing distributed information flows


Presentation Transcript


  1. IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others

  2. Overview • Motivation • Case study: inTransit • Architecture • Flow graph deployment/reconfiguration • Experiments • Other aspects of the system

  3. Motivation • Lots of data produced in lots of places • Examples: operational information systems, scientific collaborations, end-user systems, web traffic data

  4. Airline example • Event sources: flights arriving, flights departing, bags scanned, customers checking in, weather updates, catering updates, FAA updates • Displays and services fed by these streams: concourse display, gate display, baggage display, home user display; shop for flights, rebook missed connections, check seats

  5. Previous solutions • Tools for managing distributed updates • Pub/sub middlewares • Transaction Processing Facilities • In-house solutions • Times have changed • How to handle larger data volumes? • How to seamlessly incorporate new functionality? • How to effectively prioritize service? • How to avoid hand-tuning the system?

  6. Approach • Provide a self-managing distributed data flow graph • Example: flight data, weather data, and check-in data flow through operators (select ATL data, predict delays, correlate flights and reservations, generate customer messages) and out to a terminal or web display

  7. Approach • Deploy operators in a network overlay • Middleware should self-manage this deployment • Provide necessary performance, availability • Respond to business-level needs

  8. IFLOW
     AirlineFlowGraph {
       Sources        -> {FLIGHTS, WEATHER, COUNTERS}
       Sinks          -> {DISPLAY}
       Flow-Operators -> {JOIN-1, JOIN-2}
       Edges          -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
       Utility        -> [Customer-Priority, Low Bandwidth Utilization]
     }
     CollaborationFlowGraph {
       Sources        -> {Experiment}
       Sinks          -> {IPaq, X-Window, Immersadesk}
       Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
       Edges          -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
       Utility        -> [Low-Delay, Synchronized-Delivery]
     }
     • The slide shows both flows running on the IFLOW middleware: the airline flow joins FLIGHTS, WEATHER, and COUNTERS data for an overhead display, and the molecular dynamics collaboration flow computes coordinates, bonds, and radial distances from the experiment and delivers them to IPaq, X-Window, and ImmersaDesk clients [ICAC ’06]
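The specifications above use IFLOW's declarative flow-graph notation. Purely as an illustration (the class and field names below are hypothetical, not part of the IFLOW API), a minimal sketch of holding such a specification in memory as a graph structure might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class FlowGraph:
    """Hypothetical in-memory form of an IFLOW flow-graph spec (illustrative only)."""
    sources: set[str]
    sinks: set[str]
    operators: set[str]
    edges: set[tuple[str, str]]                        # (producer, consumer) pairs
    utility: list[str] = field(default_factory=list)   # utility goals, e.g. "Low-Delay"

    def downstream(self, node: str) -> set[str]:
        """Nodes that directly consume the output of `node`."""
        return {dst for src, dst in self.edges if src == node}

# The AirlineFlowGraph example from the slide, re-expressed in this sketch:
airline = FlowGraph(
    sources={"FLIGHTS", "WEATHER", "COUNTERS"},
    sinks={"DISPLAY"},
    operators={"JOIN-1", "JOIN-2"},
    edges={("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
           ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
           ("JOIN-2", "DISPLAY")},
    utility=["Customer-Priority", "Low Bandwidth Utilization"],
)
print(airline.downstream("JOIN-1"))  # {'JOIN-2'}
```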

  9. Case study • inTransit • Query processing over distributed event streams • Operators are streaming versions of relational operators

  10. Architecture • Three layers [ICDCS ’05]: • Application layer – accepts queries via a data-flow parser • Middleware layer – the IFLOW / inTransit distributed stream management infrastructure, including flow-graph control and PDS • Underlay layer – ECho pub-sub, Stones, and messaging

  11. Application layer • Applications specify data flow graphs • Can specify directly • Can use SQL-like declarative language
     STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
     FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
     WHEN N1.FLIGHTS.NUMBER='DL207'
       AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
       AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;
     • The slide shows the resulting join graph over sources N1, N2, and N7, with results delivered to N10

  12. Middleware layer • ECho – pub/sub event delivery • Event channels for data streams • Native operators • E-code for most operators • Library functions for special cases • Stones – operator containers • Queues and actions • (Diagram: a join operator consumes Channel 1 and Channel 2 and publishes to Channel 3)
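As a loose analogy for the channel/stone model (this is ordinary Python, not the ECho/Stones C API; all names here are invented), a sketch of an operator container holding a queue plus an action, wired between pub/sub channels:

```python
from collections import deque
from typing import Any, Callable

class Channel:
    """Toy pub/sub channel: subscribers are callables invoked per published event."""
    def __init__(self) -> None:
        self.subscribers: list[Callable[[Any], None]] = []
    def publish(self, event: Any) -> None:
        for sub in self.subscribers:
            sub(event)

class Stone:
    """Toy operator container: an input queue plus an action applied to each event."""
    def __init__(self, action: Callable[[Any], Any], out: Channel) -> None:
        self.queue: deque = deque()
        self.action = action
        self.out = out
    def enqueue(self, event: Any) -> None:
        self.queue.append(event)
    def run_once(self) -> None:
        while self.queue:
            result = self.action(self.queue.popleft())
            if result is not None:
                self.out.publish(result)

# Wire two input channels through a trivial operator into an output channel.
ch1, ch2, ch3 = Channel(), Channel(), Channel()
op = Stone(action=lambda e: e, out=ch3)   # identity action stands in for a real join
ch1.subscribers.append(op.enqueue)
ch2.subscribers.append(op.enqueue)
ch3.subscribers.append(print)             # sink: print delivered events
ch1.publish({"flight": "DL207"})
ch2.publish({"weather": "ATL"})
op.run_once()
```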

  13. Middleware layer • PDS – resource monitoring • Nodes update PDS with resource info (e.g. CPU load) • inTransit is notified when conditions change
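A minimal sketch of that interaction, assuming a hypothetical directory class (this is not the real PDS interface): nodes push readings, and a listener is called back only when a reading changes enough to matter.

```python
from typing import Callable

class ResourceDirectory:
    """Toy stand-in for PDS: stores per-node readings, fires callbacks on large changes."""
    def __init__(self, change_threshold: float = 0.2) -> None:
        self.readings: dict[str, float] = {}
        self.listeners: list[Callable[[str, float], None]] = []
        self.change_threshold = change_threshold

    def subscribe(self, listener: Callable[[str, float], None]) -> None:
        self.listeners.append(listener)

    def update(self, node: str, cpu_load: float) -> None:
        old = self.readings.get(node)
        self.readings[node] = cpu_load
        # Notify only when the load changed enough to matter (threshold is arbitrary here).
        if old is None or abs(cpu_load - old) >= self.change_threshold:
            for listener in self.listeners:
                listener(node, cpu_load)

pds = ResourceDirectory()
pds.subscribe(lambda node, load: print(f"reconfigure? {node} now at {load:.0%} CPU"))
pds.update("n7", 0.35)   # first report -> notify
pds.update("n7", 0.40)   # small change -> no notification
pds.update("n7", 0.90)   # big jump -> notify, may trigger reconfiguration
```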

  14. Flow graph deployment • Where to place operators?

  15. Flow graph deployment • Where to place operators? • Basic idea: cluster physical nodes

  16. Flow graph deployment • Partition flow graph among coordinators • Coordinators represent their cluster • Exhaustive search among coordinators • (Diagram: the join graph from slide 11 being assigned across candidate clusters for sources N1, N2, N7 and sink N10)

  17. Flow graph deployment • Coordinator deploys subgraph in its cluster • Uses exhaustive search to find the best deployment (a sketch of such a search follows)
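As a rough illustration of the exhaustive search described on slides 16-17, the sketch below enumerates every assignment of operators to nodes in a small cluster and keeps the one with the lowest estimated cost; the cost model and all names are hypothetical, not the inTransit implementation.

```python
from itertools import product

def best_placement(operators, nodes, edges, link_cost, node_load):
    """Exhaustively try every operator-to-node assignment and return the cheapest.

    operators : list of operator names
    nodes     : list of candidate physical nodes in the cluster
    edges     : list of (producer_op, consumer_op) pairs in the subgraph
    link_cost : dict mapping (node_a, node_b) -> network cost (e.g. latency)
    node_load : dict mapping node -> current load penalty
    """
    best, best_cost = None, float("inf")
    for assignment in product(nodes, repeat=len(operators)):
        placement = dict(zip(operators, assignment))
        cost = sum(link_cost[(placement[a], placement[b])] for a, b in edges)
        cost += sum(node_load[n] for n in placement.values())
        if cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost

# Tiny example: place two joins on three nodes (all numbers are made up).
ops = ["JOIN-1", "JOIN-2"]
nodes = ["n1", "n2", "n3"]
edges = [("JOIN-1", "JOIN-2")]
link_cost = {(a, b): (0 if a == b else 10) for a in nodes for b in nodes}
node_load = {"n1": 1, "n2": 5, "n3": 2}
print(best_placement(ops, nodes, edges, link_cost, node_load))
```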

  18. Flow graph reconfiguration • Resource or load changes trigger reconfiguration • Clusters reconfigure locally • Large changes require inter-cluster reconfiguration

  19. Hierarchical clusters • Coordinators themselves are clustered • Coordinators form a hierarchy • May need to move operators between clusters • Handled by moving up a level in the hierarchy

  20. What do we optimize? • Basic metrics • Bandwidth used • End-to-end delay • Autonomic metrics • Business value • Infrastructure cost [ICAC ’05]
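One way to combine these metrics is a net-utility function that rewards business value and charges for infrastructure cost. The concrete formula and constants below are illustrative assumptions only, not the utility function from the ICAC '05 paper.

```python
def net_utility(events_per_sec: float, delay_ms: float, bandwidth_mbps: float,
                value_per_event: float = 0.01, deadline_ms: float = 500.0,
                cost_per_mbps: float = 0.002) -> float:
    """Illustrative net utility: business value earned minus infrastructure cost.

    Events delivered within the deadline earn full value; late events earn nothing.
    Bandwidth is charged at a flat per-Mbps rate. All constants are made up.
    """
    value = events_per_sec * value_per_event if delay_ms <= deadline_ms else 0.0
    cost = bandwidth_mbps * cost_per_mbps
    return value - cost

# Compare two candidate deployments of the same flow graph.
print(net_utility(events_per_sec=1000, delay_ms=120, bandwidth_mbps=40))  # fast, pricier links
print(net_utility(events_per_sec=1000, delay_ms=700, bandwidth_mbps=10))  # cheap but misses deadline
```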

  21. Experiments • Simulations • GT-ITM transit/stub Internet topology (128 nodes) • NS-2 to capture trace of delay between nodes • Deployment simulator reacts to delay • OIS case study • Flight information from Delta airlines • Weather and news streams • Experiments on Emulab (13 nodes)

  22. Approximation penalty (flow graphs on simulator)

  23. Impact of reconfiguration (10-node flow graph on simulator)

  24. Impact of reconfiguration (2-node flow graph on Emulab, under network congestion and increased processor load)

  25. Different utility functions (simulator, 128-node network)

  26. Query planning • We can optimize the structure of the query graph • A different join order may enable a better mapping • But there are too many plan/deployment possibilities to consider • Use the hierarchy for planning • Plus: stream advertisements to locate sources and deployed operators • Planning algorithms: top-down, bottom-up [IPDPS ‘07]

  27. Planning algorithms • Top-down: start from the full query (A ⋈ B ⋈ C ⋈ D) at the top of the coordinator hierarchy and recursively split it into subplans such as A ⋈ B and C ⋈ D, which are handed to lower-level coordinators closer to sources A, B, C, and D

  28. Planning algorithms • Bottom-up: coordinators near the sources A, B, C, D form small partial plans first (e.g. A ⋈ B), and these are combined as they move up the hierarchy until the full plan A ⋈ B ⋈ C ⋈ D emerges (a rough sketch follows)
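The sketch below is a loose, hypothetical illustration of bottom-up combination: partial plans for individual sources are greedily merged into larger joins using a made-up cost estimate. It is not the IPDPS '07 algorithm, just a way to see the shape of the idea.

```python
def bottom_up_plan(sources, join_cost):
    """Greedily merge partial plans, cheapest pair first (illustrative only).

    sources   : list of source names, e.g. ["A", "B", "C", "D"]
    join_cost : function taking two plans (nested tuples) and returning a cost estimate
    """
    plans = [(s,) for s in sources]            # start with one trivial plan per source
    while len(plans) > 1:
        # Find the cheapest pair of partial plans to join next.
        i, j = min(((a, b) for a in range(len(plans)) for b in range(a + 1, len(plans))),
                   key=lambda ab: join_cost(plans[ab[0]], plans[ab[1]]))
        merged = (plans[i], "JOIN", plans[j])  # build a nested join tree
        plans = [p for k, p in enumerate(plans) if k not in (i, j)] + [merged]
    return plans[0]

# Made-up cost: prefer combining the smallest subtrees first.
def toy_cost(left, right):
    return str(left).count("JOIN") + str(right).count("JOIN")

print(bottom_up_plan(["A", "B", "C", "D"], toy_cost))
```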

  29. Query planning (100 queries, each over 5 sources, 64-node network)

  30. Availability management • Goal is to achieve both: • Performance • Reliability • These goals often conflict! • Spend scarce resources on throughput or availability? • Manage tradeoff using utility function

  31. Fault tolerance • Basic approach: passive standby • Log of messages can be replayed • Periodic “soft-checkpoint” from active to standby • Performance versus availability (fast recovery) • More soft-checkpoints = faster recovery, higher overhead • Choose a checkpoint frequency that maximizes utility (see the sketch below) • (Diagram: a failed join operator is replaced by its standby replica) [Middleware ’06]
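To make the tradeoff concrete, the sketch below scans candidate checkpoint intervals and picks the one maximizing a made-up utility that rewards fast recovery and penalizes checkpoint overhead. The model and constants are assumptions for illustration, not the Middleware '06 formulation.

```python
def pick_checkpoint_interval(candidates_sec, checkpoint_cost_sec=0.05,
                             availability_weight=10.0, throughput_weight=50.0):
    """Pick the checkpoint interval with the best (made-up) utility.

    Shorter intervals mean less log to replay on failure (faster recovery) but
    more time spent taking checkpoints (lower throughput). Constants are illustrative.
    """
    def utility(interval):
        recovery_sec = interval / 2.0               # on average, half an interval of work to replay
        overhead = checkpoint_cost_sec / interval   # fraction of time spent checkpointing
        return availability_weight / (1.0 + recovery_sec) - throughput_weight * overhead

    return max(candidates_sec, key=utility)

# With these made-up constants the optimum is an interior point, not an extreme.
print(pick_checkpoint_interval([0.1, 0.5, 1, 5, 10, 30]))
```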

  32. Proactive fault tolerance • Goal: predict system instability

  33. Proactive fault tolerance

  34. Mean time to recovery

  35. IFLOW beyond inTransit • The same self-managing information flow layer can underlie other systems besides inTransit: pub/sub services, science applications, and other complex infrastructure

  36. Related work • Stream data processing engines • STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc. • Borealis, TRAPP, Flux, TAG • Content-based pub/sub • Gryphon, ARMADA, Hermes • Overlay networks • P2P • Multicast (e.g. Bayeux) • Grid • Other overlay toolkits • P2, MACEDON, GridKit

  37. Conclusions • IFLOW is a general information flow middleware • Self-configuring and self-managing • Based on application-specified performance and utility • inTransit distributed event management infrastructure • Queries over streams of structured data • Resource-aware deployment of query graphs • IFLOW provides utility-driven deployment and reconfiguration • Overall goal • Provide useful abstractions for distributed information systems • Implementation of abstractions is self-managing • Key to scalability, manageability, flexibility

  38. For more information • http://www.brianfrankcooper.net • cooperb@yahoo-inc.com
