1 / 13

Apache Airflow

This presentation gives an overview of the Apache Airflow project. It explains Apache Airflow in terms of its pipelines, tasks, integrations and UI.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/


Presentation Transcript


  1. What Is Apache Airflow? ● A workflow management platform ● Uses Python-based workflows ● Schedules by time or event ● Open source, Apache 2.0 license ● Written in Python ● Monitor workflows in the UI ● Has a wide range of integration options ● Originally developed at Airbnb

  2. What Is Apache Airflow? ● Uses SQLite as a back-end DB by default but can use – MySQL, Postgres, JDBC etc. ● Install extra packages using the pip command – Wide variety available, including – Many databases, cloud services – The Hadoop ecosystem – Security, web services, queues – Many more
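The extras mechanism mentioned above is driven from the command line; a sketch of what the install commands look like (the exact extra names vary across Airflow versions, so check the documentation for yours):

```shell
# Install Airflow together with optional back-end and provider packages
# via pip "extras"
pip install 'apache-airflow[mysql,postgres]'

# Hadoop-ecosystem and queueing extras follow the same pattern
pip install 'apache-airflow[hdfs,hive,rabbitmq]'
```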

  3. Airflow Pipelines ● These are Python-based workflows ● Are actually directed acyclic graphs (DAGs) ● Pipelines use Jinja templating ● Pipelines contain user-defined tasks ● Tasks can run on different workers at different times ● Jinja scripts can be embedded in tasks ● Comments can be added to tasks in varying formats ● Inter-task dependencies can be defined
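The points above can be sketched as a minimal DAG file. This assumes the Airflow 1.10-era import paths (in Airflow 2.x, BashOperator lives in airflow.operators.bash); the DAG id and commands are illustrative only, and the file needs an Airflow installation to run:

```python
# A minimal sketch of an Airflow pipeline (DAG) with Jinja templating
# and an inter-task dependency.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# The DAG is scheduled by time; it can also be triggered externally
dag = DAG(
    dag_id="example_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

# Jinja templating: {{ ds }} is replaced with the execution date at run time
extract = BashOperator(
    task_id="extract",
    bash_command="echo 'extracting data for {{ ds }}'",
    dag=dag,
)

load = BashOperator(
    task_id="load",
    bash_command="echo 'loading data for {{ ds }}'",
    dag=dag,
)

# Inter-task dependency: load runs only after extract succeeds
extract >> load
```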

  4. Airflow Pipelines

  5. Airflow Tasks ● Tasks have a lifecycle ● Tasks use operators to execute; the operator depends on the task type – For instance MySqlOperator ● Hooks are used to access external systems, e.g. databases ● Worker-specific queues can be used for tasks ● XCom allows tasks to exchange messages ● Pipelines or DAGs allow – Branching – SubDAGs – Service level agreements (SLAs) – Trigger rules
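XCom message passing between tasks can be sketched as below, assuming the classic PythonOperator API (airflow.operators.python_operator in Airflow 1.10; `provide_context` is no longer needed in 2.x). The task ids and payload are illustrative only:

```python
# Sketch of XCom: one task pushes a message, a downstream task pulls it.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id="xcom_example",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

def push_value(**context):
    # A returned value is pushed to XCom automatically
    return {"row_count": 42}

def pull_value(**context):
    # Pull the message pushed by the upstream "producer" task
    result = context["ti"].xcom_pull(task_ids="producer")
    print("received:", result)

producer = PythonOperator(
    task_id="producer",
    python_callable=push_value,
    provide_context=True,
    dag=dag,
)

consumer = PythonOperator(
    task_id="consumer",
    python_callable=pull_value,
    provide_context=True,
    dag=dag,
)

producer >> consumer
```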

  6. Airflow Task Stages ● Tasks have lifecycle stages

  7. Airflow Task Lifecycle

  8. Airflow UI ● The Airflow UI provides views – DAG, Tree, Graph, Variables, Gantt Chart – Task duration, Code view ● Select a task instance in any view to manage it ● Monitor and troubleshoot pipelines in views ● Monitor DAGs by owner, schedule, runtime etc. ● Use views to find pipeline problem areas ● Use views to find bottlenecks

  9. Airflow UI

  10. Airflow Integration ● Airflow integrates with – Azure: Microsoft Azure – AWS: Amazon Web Services – Databricks – GCP: Google Cloud Platform – Cloud Speech Translate Operators – Qubole ● Kubernetes – Run tasks as pods
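Running a task as a Kubernetes pod can be sketched with KubernetesPodOperator. The import path varies by Airflow version (the contrib path below is the 1.10 one; in 2.x it moved to the cncf.kubernetes provider package), and the image, namespace and command here are placeholders for illustration:

```python
# Hypothetical sketch: one Airflow task executed as a Kubernetes pod.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG(
    dag_id="k8s_example",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

# Each run launches a pod from the given image, runs the command,
# streams the pod logs back to Airflow, then the pod exits
task = KubernetesPodOperator(
    task_id="pod_task",
    name="pod-task",
    namespace="default",
    image="python:3.7-slim",
    cmds=["python", "-c"],
    arguments=["print('hello from a pod')"],
    get_logs=True,
    dag=dag,
)
```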

  11. Airflow Metrics ● Airflow can send metrics to StatsD – A network daemon that runs on Node.js – Listens for statistics, like counters, gauges, timers – Statistics sent over UDP or TCP ● Install metrics support using the pip command ● Specify which stats to record, e.g. – scheduler, executor, dagrun
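A sketch of enabling StatsD metrics: install the statsd extra, then turn it on in airflow.cfg. The section name varies by version (in Airflow 1.10 these keys live under [scheduler]; in 2.x they moved to [metrics]), and the host and prefix below are placeholder values:

```ini
; airflow.cfg fragment (Airflow 2.x layout; 1.10 uses [scheduler])
; first: pip install 'apache-airflow[statsd]'
[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
```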

  12. Available Books ● "Big Data Made Easy", Apress, Jan 2015 ● "Mastering Apache Spark", Packt, Oct 2015 ● "Complete Guide to Open Source Big Data Stack", Apress, Jan 2018 ● Find the author on Amazon: www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ ● Connect on LinkedIn: www.linkedin.com/in/mike-frampton-38563020

  13. Connect ● Feel free to connect on LinkedIn: www.linkedin.com/in/mike-frampton-38563020 ● See my open source blog at open-source-systems.blogspot.com/ ● I am always interested in – New technology – Opportunities – Technology-based issues – Big data integration
