This presentation gives an overview of the Apache Beam project. It shows that Beam is a means of developing generic data pipelines in multiple languages using the provided SDKs. The pipelines execute on a range of supported runners (executors).

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
What Is Apache Beam?
● A unified programming model
● To define and execute data processing pipelines
● For ETL, batch and stream processing
● Open source / Apache 2.0 license
● SDKs in Java, Python and Go
● Cross-platform support
● Pipelines are defined using the Beam SDKs
How Does Beam Work?
● Use the provided SDKs to define pipelines
  – In Java, Python or Go
● The Beam SDK can be isolated in a Docker container
● A pipeline can be run by any supported runner
  – A supported group of runners executes the pipeline
● A capability matrix defines
  – The relative capabilities of the runners
  – See beam.apache.org for the matrix
Beam Programming Guide
● A guide for users creating data pipelines
● Examples in Java, Python and Go
● Shows how to design, create and test pipelines
● Covers multi-language functions for
  – PCollections
  – Windowing
  – Transforms
  – Triggers
  – Pipeline I/O
  – Metrics
  – Schemas
  – State and Timers
  – Data encoding / type safety
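Windowing is one of the guide's topics that is easy to misread, so here is a plain-Python illustration (no Beam dependency) of how fixed windows assign an element's timestamp to a window. It mirrors the rule used by Beam's `FixedWindows(size, offset)`: the window start is `timestamp - (timestamp - offset) % size`. The helpers `fixed_window` and `sum_per_window` are hypothetical and only model the idea; in a real Python pipeline you would apply `beam.WindowInto(beam.window.FixedWindows(60))` and let the runner do the grouping.

```python
from collections import defaultdict


def fixed_window(timestamp, size, offset=0):
    """Return the [start, end) fixed window that contains `timestamp` (seconds)."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Python's % returns a non-negative result for positive size,
    # so start never exceeds timestamp.
    start = timestamp - (timestamp - offset) % size
    return start, start + size


def sum_per_window(events, size):
    """Sum (timestamp, value) events per fixed window (hypothetical helper)."""
    totals = defaultdict(int)
    for ts, value in events:
        totals[fixed_window(ts, size)] += value
    return dict(totals)


print(fixed_window(125, 60))                           # (120, 180)
print(sum_per_window([(5, 1), (30, 2), (65, 4)], 60))  # {(0, 60): 3, (60, 120): 4}
```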
Beam Pipelines
● When designing pipelines, consider
  – Where the data is stored
  – What the data looks like
  – What you want to do with the data
  – What your output data looks like
  – Where the data should go
● Use PCollections and PTransforms to define pipelines
Beam Runners
● Supported Beam runners are
  – Direct Runner (test and development)
  – Apache Apex
  – Apache Flink
  – Apache Gearpump
  – Apache Hadoop MapReduce
  – Apache Nemo
  – Apache Samza
  – Apache Spark
  – Google Cloud Dataflow
  – Hazelcast Jet
  – IBM Streams
  – JStorm
Available Books
● See "Big Data Made Easy"
  – Apress Jan 2015
● See "Mastering Apache Spark"
  – Packt Oct 2015
● See "Complete Guide to Open Source Big Data Stack"
  – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration