
Apache Beam

This presentation gives an overview of the Apache Beam project. It shows that Beam is a means of developing generic data pipelines in multiple languages using the provided SDKs. The pipelines execute on a range of supported runners/executors.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/






  1. What Is Apache Beam? ● A unified programming model ● To define and execute data processing pipelines ● For ETL, batch and stream processing ● Open source / Apache 2.0 license ● Written in Java, Python, Go ● Cross-platform support ● Pipelines are defined using the Beam SDKs

  2. How Does Beam Work? ● Use the provided SDKs to define pipelines ● In Java, Python, Go ● The Beam SDK is isolated in a Docker container ● Pipelines can then be executed by any of a supported group of runners ● The capability matrix defines – The relative capabilities of each runner – See beam.apache.org for the matrix
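The capability matrix mentioned above can be thought of as a table of runners against supported model features. The sketch below illustrates that idea in plain Python; the runner/feature entries are simplified examples for illustration, not the authoritative matrix (see beam.apache.org for the real one).

```python
# Toy model of a Beam-style capability matrix: which model features
# each runner claims to support. Entries are ILLUSTRATIVE ONLY, not
# the real matrix from beam.apache.org.

CAPABILITIES = {
    "DirectRunner": {"ParDo", "GroupByKey", "Flatten", "Combine"},
    "FlinkRunner":  {"ParDo", "GroupByKey", "Flatten", "Combine"},
    "SparkRunner":  {"ParDo", "GroupByKey", "Flatten"},
}

def supports(runner, feature):
    """Check whether a runner claims support for a model feature."""
    return feature in CAPABILITIES.get(runner, set())

print(supports("FlinkRunner", "GroupByKey"))  # True
print(supports("SparkRunner", "Combine"))     # False (in this toy matrix)
```

A lookup like this is essentially what you do by eye when reading the matrix: pick the runner column, then check the row for the feature your pipeline needs.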

  3. Beam Programming Guide ● A guide for users creating data pipelines ● Examples in Java, Python, Go ● Covers designing, creating and testing pipelines ● Provides multi-language functions for ● PCollections ● Windowing ● Transforms ● Triggers ● Pipeline I/O ● Metrics ● Schemas ● State and Timers ● Data encoding / type safety
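Of the features listed above, windowing is the least intuitive: it assigns each timestamped element to one or more time windows before grouping. The sketch below shows fixed (tumbling) windowing in plain Python; it is a conceptual illustration, not the real `apache_beam` windowing API.

```python
# Sketch of Beam-style fixed (tumbling) windowing in plain Python.
# Each timestamped element is assigned to the fixed-size window
# that contains its timestamp.

def fixed_window(timestamp, size):
    """Return the [start, end) window an event timestamp falls into."""
    start = timestamp - (timestamp % size)
    return (start, start + size)

# Events as (timestamp_seconds, value) pairs, grouped into 60 s windows
events = [(5, "a"), (42, "b"), (61, "c"), (130, "d")]
windows = {}
for ts, value in events:
    windows.setdefault(fixed_window(ts, 60), []).append(value)

print(windows)
# {(0, 60): ['a', 'b'], (60, 120): ['c'], (120, 180): ['d']}
```

In real Beam code the same grouping is requested declaratively (a window transform applied to a PCollection), and triggers then control when each window's results are emitted.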

  4. Beam Pipelines ● When designing pipelines consider – Where the data is stored – What the data looks like – What you want to do with the data – What your output data should look like – Where the output data should go ● Use PCollections and PTransforms to define pipelines
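The PCollection/PTransform model above boils down to: immutable collections of elements flow through a chain of transforms joined with the pipe (`|`) operator. The following is a minimal in-memory sketch of that idea; the class names mirror Beam's but this is not the real `apache_beam` API.

```python
# Minimal in-memory sketch of Beam's PCollection / PTransform idea.
# NOT the real apache_beam API -- just an illustration of how
# pipelines chain transforms with the "|" (pipe) operator.

class PCollection:
    """An immutable bag of elements flowing through the pipeline."""
    def __init__(self, elements):
        self.elements = list(elements)

    def __or__(self, transform):
        # "pcoll | transform" applies the transform, yielding a new PCollection
        return transform.expand(self)

class Map:
    """A PTransform-like step that applies fn to every element."""
    def __init__(self, fn):
        self.fn = fn
    def expand(self, pcoll):
        return PCollection(self.fn(e) for e in pcoll.elements)

class Filter:
    """A PTransform-like step that keeps elements where fn is True."""
    def __init__(self, fn):
        self.fn = fn
    def expand(self, pcoll):
        return PCollection(e for e in pcoll.elements if self.fn(e))

# Wire up a tiny pipeline: source -> transform -> filter
result = (PCollection(["apache", "beam", "pipeline"])
          | Map(str.upper)
          | Filter(lambda w: len(w) > 4))
print(result.elements)  # ['APACHE', 'PIPELINE']
```

In the real SDKs the same shape appears as a pipeline object piped through I/O and processing transforms, and a runner (Direct, Flink, Spark, Dataflow, etc.) executes the resulting graph rather than Python evaluating it eagerly.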

  5. Beam Example Pipelines

  6. Beam Example Pipelines

  7. Beam Runners ● Supported Beam runners include – Direct Runner (test and development) – Apache Apex – Apache Flink – Apache Gearpump – Apache Hadoop MapReduce – Apache Nemo – Apache Samza – Apache Spark – Google Cloud Dataflow – Hazelcast Jet – IBM Streams – JStorm

  8. Beam Capability Matrix – What Computed

  9. Beam Capability Matrix – Where Computed

  10. Beam Capability Matrix – When Computed

  11. Beam Capability Matrix – How Computed

  12. Available Books ● “Big Data Made Easy” – Apress, Jan 2015 ● “Mastering Apache Spark” – Packt, Oct 2015 ● “Complete Guide to Open Source Big Data Stack” – Apress, Jan 2018 ● Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ ● Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020

  13. Connect ● Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020 ● See my open source blog at open-source-systems.blogspot.com/ ● I am always interested in – New technology – Opportunities – Technology-based issues – Big data integration
