This presentation gives an overview of the Apache NiFi project. I had intended to examine the Registry specifically, but found that there was more to say about NiFi itself. It also covers the Registry subproject, extensions, and a proposed registry for extensions.

Links for further information and connecting:
http://www.semtech-solutions.co.nz
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
What Is Apache NiFi?
● A data flow automation system from the Apache Software Foundation, commercially supported by Cloudera
● Written in Java
● Open source / Apache 2 License
● Cluster based and scalable
● Has a web-based user interface
● Highly extensible
● Offers data flow monitoring
How Does NiFi Work?
● NiFi runs in a JVM on each server in the cluster
● Uses ZooKeeper for configuration/coordination
  – One node acts as the Cluster Coordinator
  – One node acts as the Primary Node
● The JVM encapsulates
  – Web server
  – Processors / Extensions
  – Repositories for
    ● FlowFile / Content / Data Provenance
NiFi Architecture
● Web Server for monitoring and administration
● Flow Controller manages extensions and resources
● FlowFile processors 1 .. N
  – The actual data flow workers
  – Each processor performs one step of a NiFi data flow
● Extensions allow remote system connectivity
  – Can be user defined
● FlowFile Repo – tracks the state of FlowFiles currently in the flow
● Content Repo – holds the data in transit
● Provenance Repo – holds historic data flow information
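The relationship between a processor and the three repositories can be sketched in a few lines of Python. This is a toy in-memory model, not NiFi's API: the names `ToyNiFi`, `FlowFile`, `create` and `run_processor` are hypothetical, and real NiFi persists each repository to disk. It assumes a FlowFile is a set of attributes plus a pointer to its content, and that a processor writes changed content as a new entry rather than overwriting the old one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
import uuid


@dataclass
class FlowFile:
    """Attributes plus a pointer (claim) to content stored in the content repo."""
    attributes: Dict[str, str]
    content_claim: str
    flowfile_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyNiFi:
    """Hypothetical in-memory stand-in for the three repositories."""

    def __init__(self) -> None:
        self.content_repo: Dict[str, bytes] = {}        # data in transit
        self.flowfile_repo: Dict[str, FlowFile] = {}    # state of active FlowFiles
        self.provenance_repo: List[Tuple[str, str, str]] = []  # event history

    def _store(self, data: bytes) -> str:
        # Content is only ever added under a new claim, never edited in place.
        claim = f"claim-{len(self.content_repo)}"
        self.content_repo[claim] = data
        return claim

    def create(self, data: bytes, **attributes: str) -> FlowFile:
        ff = FlowFile(dict(attributes), self._store(data))
        self.flowfile_repo[ff.flowfile_id] = ff
        self.provenance_repo.append(("CREATE", ff.flowfile_id, ff.content_claim))
        return ff

    def run_processor(self, name: str, ff: FlowFile,
                      transform: Callable[[bytes], bytes]) -> FlowFile:
        new_data = transform(self.content_repo[ff.content_claim])
        ff.content_claim = self._store(new_data)   # repoint to the new content
        ff.attributes["last.processor"] = name
        self.provenance_repo.append(
            ("CONTENT_MODIFIED", ff.flowfile_id, ff.content_claim))
        return ff


nifi = ToyNiFi()
flow_file = nifi.create(b"hello", filename="a.txt")
nifi.run_processor("UpperCase", flow_file, bytes.upper)
```

After the run, the original bytes are still present under the first claim, which is what makes the historic view in the Provenance Repo possible.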
NiFi Flow Management
● Guaranteed data delivery
● Uses write-ahead logs and content repositories
● Queue buffering / back pressure
● Queue priority configuration
● Flow configuration (latency / throughput)
● UI based data flow builds
● UI based data flow monitoring
● UI based data provenance
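Queue buffering, back pressure and queue priority can be illustrated with a small bounded priority queue. This is a simplified sketch, not NiFi's implementation: the `Connection` class, its `object_threshold` parameter and the `priority` callback are hypothetical names. It assumes the upstream side stops handing over items once the queue reaches its threshold, and that a lower priority number is served first.

```python
import heapq
import itertools
from typing import Any, Callable, List, Optional, Tuple


class Connection:
    """Toy queue between two processors with a back pressure threshold."""

    def __init__(self, object_threshold: int,
                 priority: Callable[[Any], int] = lambda item: 0) -> None:
        self.object_threshold = object_threshold
        self.priority = priority
        self._heap: List[Tuple[int, int, Any]] = []
        # The counter breaks ties so equal-priority items leave in FIFO order.
        self._counter = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.object_threshold

    def offer(self, item: Any) -> bool:
        """Queue the item, or return False to signal back pressure."""
        if self.is_full():
            return False
        heapq.heappush(self._heap,
                       (self.priority(item), next(self._counter), item))
        return True

    def poll(self) -> Optional[Any]:
        """Return the highest-priority item, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


conn = Connection(object_threshold=2, priority=lambda ff: ff["priority"])
conn.offer({"name": "low", "priority": 5})
conn.offer({"name": "high", "priority": 1})
accepted = conn.offer({"name": "extra", "priority": 0})  # queue is full
first_out = conn.poll()
```

Here the third `offer` is refused because the threshold of two has been reached, and `poll` hands back the "high" item first even though it arrived second.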
NiFi Cluster
● NiFi can run in cluster mode, coordinated by ZooKeeper
● Each node works on a different set of data
● ZooKeeper
  – Elects a single Cluster Coordinator node
  – Handles node failover
● The Cluster Coordinator manages cluster membership
● ZooKeeper also elects one node as the Primary Node
NiFi Repository Storage
● All repository storage is pluggable
● Storage can be changed through user-defined development
● The default is file system storage, with
  – Multiple file system locations used
  – Multiple physical partitions used
  – RAID configurations to optimize I/O
● Archiving is available for the content repository
  – Deletion is automatic and configurable
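The automatic, configurable deletion of archived content can be pictured as a two-step purge: drop anything older than a retention period, then drop the oldest remaining items until disk usage falls under a limit. The sketch below is an assumption about the general shape of such a policy, not NiFi's actual code; `ArchivedClaim`, `purge_archive` and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArchivedClaim:
    name: str
    age_hours: float
    size_bytes: int


def purge_archive(archive: List[ArchivedClaim],
                  max_retention_hours: float,
                  disk_capacity_bytes: int,
                  live_bytes: int,
                  max_usage_fraction: float) -> List[ArchivedClaim]:
    """Return the archived claims that survive the purge, youngest first."""
    # Step 1: anything past the retention period is deleted outright.
    kept = [c for c in archive if c.age_hours <= max_retention_hours]

    # Step 2: while the disk is over its usage limit, delete the oldest claim.
    kept.sort(key=lambda c: c.age_hours)  # youngest first, oldest last
    used = live_bytes + sum(c.size_bytes for c in kept)
    while kept and used / disk_capacity_bytes > max_usage_fraction:
        used -= kept.pop().size_bytes
    return kept


archive = [
    ArchivedClaim("a", age_hours=1, size_bytes=30),
    ArchivedClaim("b", age_hours=5, size_bytes=30),
    ArchivedClaim("c", age_hours=20, size_bytes=30),
]
survivors = purge_archive(archive, max_retention_hours=12,
                          disk_capacity_bytes=100, live_bytes=20,
                          max_usage_fraction=0.5)
```

In this example claim "c" is removed for age, and claim "b" is removed to bring usage down to the 50% limit, leaving only "a".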
NiFi Extensions
● Extensions are stored in NiFi Archives (NARs)
● Extension points include
  – Processors, Controller Services, Reporting Tasks, Prioritizers, and Custom User Interfaces
● See these example NARs by Frank Sauer
  – For InfluxDB access
  – JSON transformation
  – https://github.com/fsauer65/NiFi-Extensions
What Is Apache NiFi Registry?
● A subproject of Apache NiFi
● For storage and management of shared resources
● Across one or more instances of NiFi and/or MiNiFi
● Offers version control for flows
● Define users, groups and policies for flows
● Support for Linux, Unix and Mac OS X
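Version control for flows can be pictured as an append-only history of flow snapshots that several NiFi instances read from and write to. The following is a toy in-memory sketch, not the Registry's API: `ToyFlowRegistry`, `commit` and `get` are hypothetical names, and grouping flows under a named bucket is an assumption made for the example.

```python
import copy
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class ToyFlowRegistry:
    """Hypothetical shared store of versioned flow snapshots."""

    def __init__(self) -> None:
        # (bucket, flow name) -> list of committed versions, oldest first
        self._history: Dict[Tuple[str, str], List[Dict[str, Any]]] = defaultdict(list)

    def commit(self, bucket: str, flow: str,
               snapshot: Dict[str, Any], comment: str = "") -> int:
        """Store a copy of the snapshot and return its new version number."""
        history = self._history[(bucket, flow)]
        history.append({
            "version": len(history) + 1,
            "snapshot": copy.deepcopy(snapshot),  # later edits must not leak in
            "comment": comment,
        })
        return len(history)

    def get(self, bucket: str, flow: str,
            version: Optional[int] = None) -> Dict[str, Any]:
        """Return a given version of a flow, or the latest if none is given."""
        history = self._history.get((bucket, flow), [])
        if not history:
            raise KeyError(f"no versions of {flow!r} in bucket {bucket!r}")
        entry = history[-1] if version is None else history[version - 1]
        return copy.deepcopy(entry["snapshot"])


registry = ToyFlowRegistry()
flow = {"processors": ["GetFile", "PutFile"]}
v1 = registry.commit("demo-bucket", "file-copy", flow, "initial flow")
flow["processors"].insert(1, "UpdateAttribute")
v2 = registry.commit("demo-bucket", "file-copy", flow, "add attribute step")
```

A second NiFi instance could then call `get` to import the latest version, or pass `version=1` to roll back to the first one.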
NiFi Extension Registry
● There was also an extension registry proposal in 2016
● Prototyped by Puspendu Banerjee
● Created on GitHub at
  – https://github.com/PuspenduBanerjee/nifi/tree/NIFI-ExtRegistry
● Seems like a good idea
● A central location for extensions
● But no update since 2016
  – For proposal or prototype
Available Books
● See “Big Data Made Easy”
  – Apress, Jan 2015
● See “Mastering Apache Spark”
  – Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration