This presentation gives an overview of the Apache Samza project. It explains Samza's stream processing capabilities as well as its architecture, users, use cases, etc.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
What Is Apache Samza?
● An asynchronous computational framework
● For distributed sub-second stream processing
● Fault tolerance, isolation and stateful processing
● Open source / Apache 2.0 license
● Developed in Java and Scala
● Runs stand-alone or on YARN
Samza Use Cases
● Applications that require millisecond-to-second response times
– Streaming analytics
– DDoS attack detection
– Fraud detection
– Metric anomaly detection
– System notifications
– Performance monitoring
Samza Partitioned Stream
● Samza uses streams to process data
● Streams are collections of ordered, immutable messages
● Each message is a key-value pair
● Each stream is sharded into partitions
● This allows the architecture to scale
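The sharding idea above can be sketched in a few lines of plain Python. This is not Samza code: the partition count, the CRC32 hash and the helper names `partition_for` and `shard` are all assumptions made for illustration. The sketch only shows that messages sharing a key land in the same partition and keep their relative order there.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for the example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shard(messages, num_partitions: int = NUM_PARTITIONS):
    """Append each (key, value) message to its partition in arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# A small stream of key-value messages; each message is an immutable tuple.
stream = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
partitions = shard(stream)
```

Because the partition depends only on the key, adding partitions (and consumers for them) is what lets the processing scale out.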
Samza APIs
● High Level Streams API (Java)
– Stream based processing API
● Low Level Task API (Java)
– Message based processing API
● Table API
– Random access to data sources by key
● Testing Samza
– Samza's integration testing framework
● Samza SQL
– Stream processing via SQL and UDFs
● Apache Beam
– Samza provides a Beam runner for application execution
Samza Architecture
● Applications are broken down into tasks
● Each task consumes data from a stream partition
● Tasks are executed within containers
● A coordinator assigns tasks to containers
● Tasks checkpoint their last processed offset
● Each task has its own state store for state management
● Samza replicates changes to the local store into a separate stream
● This allows later recovery of local stores
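The checkpointing bullet can be illustrated with a minimal Python sketch. It does not use the Samza API: `CheckpointStore`, `Task`, the `checkpoint_every` interval and the `max_messages` crash simulation are hypothetical names introduced here. The point is only that a restarted task resumes from its last checkpointed offset, so messages processed after that checkpoint are seen again.

```python
class CheckpointStore:
    """Hypothetical durable store mapping task name -> next offset to read."""

    def __init__(self):
        self._offsets = {}

    def write(self, task_name, offset):
        self._offsets[task_name] = offset

    def read(self, task_name):
        # A task with no checkpoint starts at the beginning of its partition.
        return self._offsets.get(task_name, 0)


class Task:
    """Consumes one partition (a list here) and checkpoints periodically."""

    def __init__(self, name, partition, checkpoints, checkpoint_every=2):
        self.name = name
        self.partition = partition
        self.checkpoints = checkpoints
        self.checkpoint_every = checkpoint_every
        self.processed = []

    def run(self, max_messages=None):
        """Process from the last checkpoint; max_messages simulates a crash."""
        start = self.checkpoints.read(self.name)
        end = len(self.partition)
        if max_messages is not None:
            end = min(end, start + max_messages)
        for offset in range(start, end):
            self.processed.append(self.partition[offset])
            if (offset + 1) % self.checkpoint_every == 0:
                # Record the offset of the next message to read.
                self.checkpoints.write(self.name, offset + 1)


partition = ["a", "b", "c", "d", "e"]
checkpoints = CheckpointStore()

first = Task("task-0", partition, checkpoints)
first.run(max_messages=3)   # "crashes" after three messages

second = Task("task-0", partition, checkpoints)
second.run()                # restarts from the last checkpoint
```

In this sketch the first run checkpoints after "b", so the restarted task begins again at "c".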
Samza Architecture
● Task container coordination
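As a rough illustration of task-to-container coordination, the Python sketch below spreads tasks over containers round-robin. The round-robin policy, the `assign_tasks` helper and the task names are assumptions for the example and are not taken from Samza's coordinator.

```python
def assign_tasks(task_names, num_containers):
    """Assign tasks to container ids round-robin (assumed policy)."""
    assignment = {container: [] for container in range(num_containers)}
    for index, task in enumerate(sorted(task_names)):
        assignment[index % num_containers].append(task)
    return assignment


# One task per stream partition, spread over two containers.
tasks = [f"Partition {p}" for p in range(5)]
assignment = assign_tasks(tasks, 2)
```

Each task appears in exactly one container, so every partition has a single consumer at any time.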
Samza Architecture
● Fault tolerance of state
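Fault tolerance of state rests on replicating every local store change to a separate stream and replaying that stream after a failure. The Python sketch below models this with a plain list standing in for the replicated stream. The `ChangelogBackedStore` class, its methods and the use of `None` as a delete marker are hypothetical and only illustrate the replay idea.

```python
class ChangelogBackedStore:
    """Local key-value store that mirrors every change to a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # list standing in for a durable stream
        self.local = {}

    def put(self, key, value):
        self.local[key] = value
        self.changelog.append((key, value))

    def delete(self, key):
        self.local.pop(key, None)
        self.changelog.append((key, None))  # None marks a deletion (assumed)

    def get(self, key):
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state by replaying the changelog in order."""
        store = cls(changelog)
        for key, value in list(changelog):
            if value is None:
                store.local.pop(key, None)
            else:
                store.local[key] = value
        return store


changelog = []
store = ChangelogBackedStore(changelog)
store.put("clicks:user-1", 1)
store.put("clicks:user-1", 2)
store.put("clicks:user-2", 1)
store.delete("clicks:user-2")

# Simulate losing the container: rebuild state on a new one.
restored = ChangelogBackedStore.restore(changelog)
```

Replaying the changes in their original order is what makes the restored store match the lost one.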
Samza Architecture
● Incremental checkpointing
Samza Architecture
● State management
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology-based issues
– Big data integration