Storm is a distributed, fault-tolerant, real-time computation framework that can be used with any programming language. Where systems built on HDFS target batch processing, Storm targets real-time computation and integrates with a wide range of technologies. Computations are expressed as topologies built from spouts and bolts. Its core concepts are tuples, streams, stream groupings, and levels of parallelism.
STORM, by Chrystalla Tsoutsouki and Chrysovalantis Anastasiou
PROBLEM 1
• Distributed • Fault-tolerant • Any programming language • Distributed • Fault-tolerant • Any programming language • Integrates with all technologies • Real-time Computation • Works on HDFS • Batch Processing
STORM: Introduction to Storm Concepts
Tuples: an ordered list of elements, e.g. [“the”, 14], [“boy”, 2]
Streams: an unbounded sequence of tuples (Tuple, Tuple, Tuple, ...)
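To make the two definitions above concrete, here is a minimal sketch in plain Python, not the Storm API. It models a tuple as an ordered (word, count) pair and a stream as a generator that never ends. The names `WordCount` and `word_stream` are hypothetical and exist only for this illustration.

```python
from itertools import islice
from typing import Iterator, Tuple

# One tuple: an ordered list of values, here (word, count).
WordCount = Tuple[str, int]

def word_stream() -> Iterator[WordCount]:
    """Yield tuples forever, imitating an unbounded stream."""
    samples = [("the", 14), ("boy", 2)]  # sample values taken from the slide
    i = 0
    while True:
        yield samples[i % len(samples)]
        i += 1

# A consumer can only ever look at a finite prefix of the stream.
first_five = list(islice(word_stream(), 5))
print(first_five)
```

The generator never terminates on its own, which is the property that separates a stream from a batch: the consumer decides how much to read.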
Storm Topology • Spout: Source of streams • Bolt: Computation unit
Stream Grouping • Shuffle Grouping • Fields Grouping • All Grouping • Global Grouping • None Grouping • Direct Grouping • Local or Shuffle Grouping
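A stream grouping decides which task of the receiving bolt gets each tuple. The sketch below imitates four of the groupings listed above in plain Python. It is an illustration under stated assumptions, not Storm's implementation: `NUM_TASKS` and all function names are hypothetical, the shuffle variant simply picks a random task, and the fields variant uses a CRC32 hash as a stand-in for whatever hash Storm applies.

```python
import random
import zlib
from collections import defaultdict

NUM_TASKS = 3  # hypothetical number of tasks of the receiving bolt

def shuffle_grouping(tup, rng):
    """Pick a random task so that load is spread roughly evenly."""
    return rng.randrange(NUM_TASKS)

def fields_grouping(tup, field_index=0):
    """Pick the task from a stable hash of one field, so equal values meet."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % NUM_TASKS

def all_grouping(tup):
    """Replicate the tuple to every task."""
    return list(range(NUM_TASKS))

def global_grouping(tup):
    """Send every tuple to one single task (task 0 in this sketch)."""
    return 0

rng = random.Random(0)
words = [("the",), ("boy",), ("the",), ("dog",), ("the",)]
routing = defaultdict(set)
for w in words:
    routing[w[0]].add(fields_grouping(w))
print(dict(routing))  # each word maps to exactly one task
```

Fields grouping is what makes a word count correct when the counting bolt runs as several tasks: every occurrence of the same word reaches the same task, so no count is split.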
Parallelism • Level 1: different bolts can perform different computations (diagram: a Spout and two Bolts)
Parallelism • Level 2: Worker Processes, Executors (Threads), Tasks
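The three layers of Level 2 nest: a worker process hosts executors, and each executor (thread) runs one or more tasks. The sketch below shows that nesting with a simple round-robin assignment. The round-robin rule, the helper `spread`, and the sizes (4 tasks, 2 executors, 2 workers) are assumptions chosen for illustration. Storm's real scheduler is not modelled here.

```python
def spread(items, num_buckets):
    """Deal items into num_buckets lists round-robin."""
    buckets = [[] for _ in range(num_buckets)]
    for i, item in enumerate(items):
        buckets[i % num_buckets].append(item)
    return buckets

# Hypothetical bolt with 4 tasks, run by 2 executors spread over 2 workers.
tasks = [f"task-{i}" for i in range(4)]
executors = spread(tasks, 2)     # each executor (thread) runs some tasks
workers = spread(executors, 2)   # each worker process hosts some executors

for w, hosted in enumerate(workers):
    print(f"worker {w}: {hosted}")
```

With these numbers, each worker hosts one executor and each executor runs two tasks.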
Storm UI
Wordcount Topology: Random Sentence Spout, Sentence Split Bolt, Word Count Bolt
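The data flow of this topology can be imitated with three chained Python generators. This is a single-process sketch and not Storm code: the function names mirror the slide's component names, while the sample sentences, the `seed` parameter, and the tuple shapes are assumptions made for the example.

```python
import random
from collections import Counter

SENTENCES = [  # hypothetical sample sentences
    "the cow jumped over the moon",
    "an apple a day keeps the doctor away",
]

def random_sentence_spout(n, seed=0):
    """Spout: emit n one-field tuples, each holding a random sentence."""
    rng = random.Random(seed)
    for _ in range(n):
        yield (rng.choice(SENTENCES),)

def sentence_split_bolt(stream):
    """Bolt: emit one (word,) tuple per word of every incoming sentence."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def word_count_bolt(stream):
    """Bolt: keep a running count and emit (word, count) after each update."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])

final = {}
pipeline = word_count_bolt(sentence_split_bolt(random_sentence_spout(10)))
for word, count in pipeline:
    final[word] = count  # the last emitted count per word is the total
print(final)
```

In a real deployment the link into the counting bolt would typically use a fields grouping on the word, so that each word is counted by a single task.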
Example
Spitfire: Distributed AkNN (all k-nearest-neighbors) computation
Step 1 - Partitioning (figure: the space spans latitude -90.0 to 90.0 and longitude -180.0 to 180.0)
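One simple way to partition this latitude/longitude space is a uniform grid. The sketch below maps a point to a cell index. The uniform grid, the function name `cell_of`, and the 10-degree cell size are assumptions for illustration. The slide does not say how Spitfire chooses its cells.

```python
import math

def cell_of(lat, lon, cell_deg=10.0):
    """Map a (lat, lon) point to the (row, col) of a fixed-size grid cell."""
    rows = math.ceil(180.0 / cell_deg)
    cols = math.ceil(360.0 / cell_deg)
    # Shift into [0, 180] x [0, 360], then clamp the upper edge into the last cell.
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row, col

print(cell_of(-90.0, -180.0))  # bottom-left corner of the space
print(cell_of(90.0, 180.0))    # top-right corner of the space
```

Each cell can then be assigned to one node, so that points close to each other usually land on the same node.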
Step 3 - Refinement: each node performs a local kNN using internal geographic grouping and bulk processing
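The local kNN that each node runs can be sketched as a brute-force search over the points the node holds. This is a baseline under stated assumptions, not the grouped bulk method the slide refers to: the function `local_knn`, the sample `users`, and the use of planar Euclidean distance on coordinates are all hypothetical choices for this example.

```python
import heapq
import math

def local_knn(points, k):
    """Return, for each point id, the ids of its k nearest other points."""
    result = {}
    for pid, pos in points.items():
        # Distance from this point to every other point held by the node.
        others = ((math.dist(pos, q), qid) for qid, q in points.items() if qid != pid)
        result[pid] = [qid for _, qid in heapq.nsmallest(k, others)]
    return result

users = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 0.0), "d": (6.0, 0.0)}
neighbors = local_knn(users, k=1)
print(neighbors)
```

The brute-force version costs quadratic time in the number of local points, which is the cost that geographic grouping and bulk processing aim to reduce.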
Our approach – Step 1 (diagram components: UsersSpout, SyncBolt, Sort, CellBounds)
Our approach – Step 2 (diagram labels: Distribute ECs, Neighbors, ECands)
Our approach – Step 3: Local kNN