
CS 347: Parallel and Distributed Data Management Notes X: S4


Presentation Transcript


  1. CS 347: Parallel and Distributed Data Management, Notes X: S4. Hector Garcia-Molina, Lecture 9B

  2. Material based on: [citation image]

  3. S4 • Platform for processing unbounded data streams • general purpose • distributed • scalable • partially fault tolerant (whatever this means!)

  4. Data Stream • Terminology: event (data record), key, attribute • Question: can an event have duplicate attributes for the same key? (I think so...) • A stream is unbounded, generated by, say, user queries, purchase transactions, phone calls, sensor readings, ...
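For concreteness, here is one possible event representation (a hypothetical sketch, not the S4 API). Storing attributes as (name, value) pairs is what would permit the duplicate attribute names the slide asks about; a dict would not.

```python
# One possible event representation (hypothetical, not the S4 API).
from dataclasses import dataclass, field

@dataclass
class Event:
    stream: str                                       # e.g. "QuoteStream"
    key: tuple                                        # keying attribute, e.g. ("word", "the")
    attributes: list = field(default_factory=list)    # [(name, value), ...]; duplicates allowed

e = Event("QuoteStream", ("word", "the"), [("count", 1), ("count", 1)])
```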

  5. S4 Processing Workflow [diagram: a workflow of user-specified "processing units" applied to the data stream]

  6. Inside a Processing Unit [diagram: processing elements PE1, PE2, ..., PEn, one per key value (key = a, key = b, ..., key = z)]

  7. Example: • Stream of English quotes • Produce a sorted list of the top K most frequent words
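One plausible way to structure this example as PEs (my sketch, not the workflow from the paper): a WordCountPE keyed on each word, feeding a single TopKPE that merges the partial counts.

```python
# A minimal sketch (assumed structure, not the paper's code) of the top-K workflow.
import heapq

class WordCountPE:
    def __init__(self, word):
        self.word, self.count = word, 0
    def process(self):
        self.count += 1
        return (self.word, self.count)            # event sent downstream

class TopKPE:
    def __init__(self, k):
        self.k, self.counts = k, {}
    def process(self, word, count):
        self.counts[word] = count
        return heapq.nlargest(self.k, self.counts.items(), key=lambda kv: kv[1])

pes, topk = {}, TopKPE(3)
for quote in ["to be or not to be", "to err is human"]:
    for w in quote.lower().split():
        pe = pes.setdefault(w, WordCountPE(w))    # one PE per key value
        top = topk.process(*pe.process())
print(top)   # e.g. [('to', 3), ('be', 2), ('or', 1)]
```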

  8. [figure]

  9. Processing Nodes [diagram: events are routed to one of m processing nodes by hash(key); each node hosts the PEs for the key values (key = a, key = b, ...) that hash to it]

  10. Dynamic Creation of PEs • As a processing node sees new key attributes, it dynamically creates new PEs to handle them • Think of PEs as threads
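A minimal sketch of that behaviour (assumed, not S4's actual code): a processing node keeps a map from key value to PE and creates a PE the first time a key value is seen; CounterPE is a hypothetical PE type used only for illustration.

```python
# Assumed behaviour: hash(key) picks the processing node; inside a node, a PE for an
# unseen key value is created lazily, much like spawning a new thread per key.
class CounterPE:                          # hypothetical PE type, just counts events
    def __init__(self, key):
        self.key, self.count = key, 0
    def process(self, event):
        self.count += 1

class ProcessingNode:
    def __init__(self, pe_factory):
        self.pe_factory = pe_factory
        self.pes = {}                     # key value -> PE instance

    def route(self, key, event):
        if key not in self.pes:           # first event with this key value
            self.pes[key] = self.pe_factory(key)
        self.pes[key].process(event)

nodes = [ProcessingNode(CounterPE) for _ in range(4)]
def dispatch(key, event):
    nodes[hash(key) % len(nodes)].route(key, event)   # hash(key) chooses the node

dispatch(("word", "the"), {"count": 1})
```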

  11. Another View of a Processing Node [diagram]

  12. Failures • The communication layer detects node failures and provides failover to standby nodes • What happens to events in transit during a failure? (My guess: events are lost!)

  13. How do we do DB operations on top of S4? • Selects & projects are easy! • What about joins?

  14. What is a Stream Join? • For a true join, we would need to store all inputs forever! Not practical... • Instead, define a window join: at time t a new R tuple arrives; it joins only with the previous w S tuples [diagram: streams R and S]
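Stated a bit more precisely (my notation, not the paper's): an R tuple r arriving at time t_r joins only with the w S tuples that arrived most recently before t_r, and symmetrically for arriving S tuples.

```latex
% Window join with window size w (my notation): t_r is r's arrival time and
% last_w(S, t) is the set of w tuples of S that arrived most recently before time t.
R \bowtie_w S \;=\; \{\,(r,s) \mid r \in R,\ s \in \mathrm{last}_w(S, t_r)\,\}
           \;\cup\; \{\,(r,s) \mid s \in S,\ r \in \mathrm{last}_w(R, t_s)\,\}
```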

  15. One Idea for Window Join • the PE "key" is the join attribute • code for the PE: for each event e: if e.rel = R [ store e in Rset (keep last w); for s in Sset: output join(e, s) ] else ... (symmetric for S)

  16. One Idea for Window Join • the PE "key" is the join attribute • code for the PE: for each event e: if e.rel = R [ store e in Rset (keep last w); for s in Sset: output join(e, s) ] else ... (symmetric for S) • Is this right??? (it enforces the window on a per-key-value basis) • Maybe add sequence numbers to events to enforce the correct window?
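A minimal Python rendering of this PE (my sketch, assuming one PE instance per join-attribute value and, as the slide questions, a window of the last w tuples per relation enforced per key value rather than over the whole stream):

```python
# One PE instance per join-attribute value; keeps only the last w R and S tuples
# seen with this key value.
from collections import deque

class WindowJoinPE:
    def __init__(self, w):
        self.r_set = deque(maxlen=w)      # last w R tuples with this key value
        self.s_set = deque(maxlen=w)      # last w S tuples with this key value

    def process(self, event):
        rel, tup = event                  # event = ("R", tuple) or ("S", tuple)
        if rel == "R":
            self.r_set.append(tup)
            return [(tup, s) for s in self.s_set]   # join with stored S tuples
        else:
            self.s_set.append(tup)
            return [(r, tup) for r in self.r_set]   # join with stored R tuples

pe = WindowJoinPE(w=2)
pe.process(("S", ("c1", "x")))
print(pe.process(("R", ("c1", "y"))))     # [(('c1', 'y'), ('c1', 'x'))]
```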

  17. Another Idea for Window Join • all R & S events are given the same key ("key = fake"); say the join attribute is C • code for the PE: for each event e: if e.rel = R [ store e in Rset (keep last w); for s in Sset: if e.C = s.C then output join(e, s) ] else if e.rel = S ... (symmetric)

  18. Another Idea for Window Join • all R & S events are given the same key ("key = fake"); say the join attribute is C • code for the PE: for each event e: if e.rel = R [ store e in Rset (keep last w); for s in Sset: if e.C = s.C then output join(e, s) ] else if e.rel = S ... (symmetric) • the entire join is done in one PE; no parallelism

  19. Do You Have a Better Idea for Window Join?

  20. Final Comment: Managing the State of a PE • Who manages the state? S4: the user does; Muppet: the system does • Is the state persistent?

  21. CS 347: Parallel and Distributed Data Management, Notes X: Hyracks. Hector Garcia-Molina, Lecture 9B

  22. Hyracks • Generalization of map-reduce • Infrastructure for "big data" processing • Material here based on: [citation image], a paper that appeared in ICDE 2011

  23. A Hyracks data object: • Records are partitioned across N sites • A simple record schema is available (more than just key-value) [diagram: partitions at site 1, site 2, site 3]

  24. Operations [figure callouts: operator; distribution rule]

  25. Operations (parallel execution) [diagram: operator instances op 1, op 2, op 3 running in parallel]

  26. Example: Hyracks Specification

  27. Example: Hyracks Specification [figure callouts: initial partitions; input & output is a set of partitions; mapping to new partitions; 2 replicated partitions]

  28. Notes • Job specification can be done manually or automatically

  29. Example: Activity Node Graph

  30. Example: Activity Node Graph [figure callouts: activities; sequencing constraint]

  31. Example: Parallel Instantiation

  32. Example: Parallel Instantiation [figure callout: stage (starts after its input stages finish)]

  33. System Architecture

  34. Map-Reduce on Hyracks

  35. Library of Operators: • File readers/writers • Mappers • Sorters • Joiners (various types) • Aggregators • Can add more

  36. Library of Connectors: • N:M hash partitioner • N:M hash-partitioning merger (input sorted) • N:M range partitioner (with partition vector) • N:M replicator • 1:1 • Can add more!
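As an illustration of what the first connector does (a generic sketch, not Hyracks' actual connector API), an N:M hash partitioner routes each record from any of the N producer partitions to one of the M consumer partitions chosen by hashing a partitioning field:

```python
# Generic N:M hash partitioning (illustration only, not the Hyracks API).
def hash_partition(record, partition_field, m):
    """Return the consumer index in [0, m) for this record."""
    return hash(record[partition_field]) % m

# Example: route records produced at N sites to M = 3 consumers on the "userid" field.
consumers = [[] for _ in range(3)]
for record in [{"userid": 7, "q": "a"}, {"userid": 12, "q": "b"}, {"userid": 7, "q": "c"}]:
    consumers[hash_partition(record, "userid", 3)].append(record)
# All records with the same userid end up at the same consumer.
```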

  37. Hyracks Fault Tolerance: Work in progress?

  38. Hyracks Fault Tolerance: Work in progress? • The Hadoop/MR approach: save partial results to persistent storage ("save output to files"); after a failure, redo all the work needed to reconstruct the missing data

  39. Hyracks Fault Tolerance: Work in progress? • Can we do better? Maybe: each process retains its previous results until they are no longer needed? [diagram: output r1, ..., r8 of process Pi; the older results have "made their way" into the final result, the rest are the current output]

  40. CS 347: Parallel and Distributed Data Management, Notes X: Pregel. Hector Garcia-Molina, Lecture 9B

  41. Material based on: [citation image] • the paper appeared in SIGMOD 2010 • Note there is an open-source version of Pregel called Giraph

  42. Pregel • A computational model/infrastructure for processing large graphs • Prototypical example: PageRank [graph diagram: vertex x with in-neighbors a and b, plus vertices d, e, f] • PR[i+1, x] = f(PR[i, a]/n_a, PR[i, b]/n_b), where n_a is the out-degree of a

  43. Pregel • Synchronous computation in iterations (supersteps) • In one iteration, each node: gets messages from its neighbors, computes, sends data to its neighbors [same graph diagram: PR[i+1, x] = f(PR[i, a]/n_a, PR[i, b]/n_b)]
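To make the formula concrete, here is a minimal sketch (standard PageRank with an assumed damping factor and vertex count, not code from the paper) of what one vertex does in a superstep: combine the PR/out-degree values received from its in-neighbors, then send its new contribution along its out-edges.

```python
DAMPING, N = 0.85, 6   # assumed damping factor and total number of vertices

def pagerank_superstep(in_messages, out_edges):
    # in_messages: the values PR[i, a] / n_a received from each in-neighbour a
    new_value = (1 - DAMPING) / N + DAMPING * sum(in_messages)
    # send new_value / out-degree along every out-edge for the next superstep
    return new_value, [(t, new_value / len(out_edges)) for t in out_edges]

new_pr, msgs = pagerank_superstep([0.05, 0.10], out_edges=["d", "e"])
```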

  44. Pregel vs Map-Reduce/S4/Hyracks/... • In Map-Reduce, S4, Hyracks, ... the workflow is separate from the data • In Pregel, the data (the graph) drives the data flow

  45. Pregel Motivation • Many applications require graph processing • Map-Reduce and other workflow systems are not a good fit for graph processing • Need to run graph algorithms on many processors

  46. Example of Graph Computation

  47. Termination • After each iteration, each vertex votes to halt or not • If all vertices vote to halt, the computation terminates

  48. Vertex Compute (simplified) • Available data for iteration i: • InputMessages: { [from, value] } • OutputMessages: { [to, value] } • OutEdges: { [to, value] } • MyState: value • OutEdges and MyState are remembered for the next iteration

  49. Max Computation • change := false • for [f, w] in InputMessages do: if w > MyState.value then [ MyState.value := w; change := true ] • if (superstep = 1) OR change then: for [t, w] in OutEdges do add [t, MyState.value] to OutputMessages • else: vote to halt
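A runnable rendering of this slide's pseudocode (my framing of the vertex interface from slide 48, not Google's actual API): each vertex keeps the largest value it has seen, forwards it whenever it changes, and otherwise votes to halt.

```python
def max_compute(superstep, my_value, in_messages, out_edges):
    # in_messages: [(from, value), ...];  out_edges: [(to, edge_value), ...]
    change = False
    for _, w in in_messages:
        if w > my_value:
            my_value, change = w, True
    if superstep == 1 or change:
        return my_value, [(t, my_value) for t, _ in out_edges], False   # keep going
    return my_value, [], True                                           # vote to halt

state, out_msgs, halt = max_compute(1, 3, [("a", 6), ("b", 2)], [("c", None)])
# state == 6, out_msgs == [("c", 6)], halt == False
```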

  50. PageRank Example
