
Apache Tajo

This presentation gives an overview of the Apache Tajo project. It explains Tajo architecture in relation to Hadoop/Hive and ETL.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

semtechs

Presentation Transcript


  1. What Is Apache Tajo? ● A data warehouse system ● Open source / Apache 2.0 license ● Stores data on HDFS and other storage systems ● For low-latency big data queries / ETL ● Supports SQL ● No release since May 2016

  2. Tajo Catalog Storage ● Tajo can store catalog information in – Apache Derby ( default ) – MySQL – MariaDB – In-memory – Hive Catalog / HiveMetaStore ● Derby is the default with storage under /tmp

  3. Tajo Data Storage ● Tajo can store data in the following locations – HDFS – Amazon S3 – OpenStack Swift – HBase – RDBMS ● It is also possible to register user-defined storage – Place the user-defined JAR file in tajo/extlib – Copy a modified conf/storage-site.json.template to conf/storage-site.json
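
The storage registration step above can be illustrated with a minimal Python sketch that generates a storage-site.json style document. It assumes the file holds a top-level "spaces" object that maps each tablespace name to a "uri"; the tablespace names and URIs below are hypothetical, and the exact keys should be checked against the conf/storage-site.json.template shipped with your Tajo version.

```python
import json


def build_storage_site(spaces):
    """Return a storage-site.json style document.

    `spaces` maps a tablespace name to its root URI.
    Assumption: the file uses a top-level "spaces" object with a "uri" per entry.
    """
    return {"spaces": {name: {"uri": uri} for name, uri in spaces.items()}}


# Hypothetical tablespace names and URIs, for illustration only.
config = build_storage_site({
    "warehouse": "hdfs://namenode:8020/tajo/warehouse",
    "archive": "s3://example-bucket/tajo/archive",
})

# Serialize with indentation so the result is easy to compare with the template.
text = json.dumps(config, indent=2)
print(text)
```

Generating the file from a dictionary keeps each tablespace definition in one place, which may help when the same settings have to be copied to several nodes.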

  4. Tajo Shell ● Tajo provides a shell for instance manipulation – Issue meta commands, e.g. \l (list databases) – Issue HDFS commands – Use \set to set session variables – Issue \admin administration commands – Issue commands interactively or in batch mode – Run as a background process
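
As a rough illustration of the batch usage mentioned above, the following Python sketch builds the argument list for a non-interactive shell run. It assumes the shell launcher is called `tsql` and accepts `-f` for a script file and `-c` for an inline statement; confirm both against the help output of your Tajo version. The helper name `tsql_batch_command` and the table `orders` are hypothetical.

```python
import shlex


def tsql_batch_command(script_path=None, statement=None, tsql="tsql"):
    """Build an argv list for a batch shell run.

    Assumption: the launcher takes `-f <file>` or `-c <statement>`.
    Exactly one of `script_path` or `statement` must be given.
    """
    if (script_path is None) == (statement is None):
        raise ValueError("pass exactly one of script_path or statement")
    if script_path is not None:
        return [tsql, "-f", script_path]
    return [tsql, "-c", statement]


# Hypothetical query against a hypothetical table.
cmd = tsql_batch_command(statement="select count(*) from orders")

# shlex.join quotes the statement so the line can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` without shell quoting concerns.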

  5. Tajo Cluster Architecture ● A Tajo cluster has – One or more TajoMaster servers – One or more TajoWorker servers ● TajoMaster coordinates TajoWorkers ● TajoWorkers carry out processing ● More TajoWorkers mean more processing capacity ● Capacity scales linearly
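
To make the linear scaling claim concrete, here is a small worked calculation in Python. It is an idealized model only: the data volume and per-worker scan rate are made-up figures, and real clusters lose some capacity to coordination, skew, and I/O limits.

```python
def estimated_runtime_seconds(total_gb, gb_per_worker_per_second, workers):
    """Estimate scan time assuming capacity grows linearly with worker count."""
    if workers < 1:
        raise ValueError("at least one worker is required")
    return total_gb / (gb_per_worker_per_second * workers)


# Hypothetical workload: 1200 GB scanned at 0.5 GB/s per TajoWorker.
for workers in (4, 8, 16):
    print(workers, estimated_runtime_seconds(1200, 0.5, workers))
```

Under this model, doubling the number of TajoWorkers halves the estimated runtime.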

  6. Tajo Cluster Architecture

  7. Tajo TajoMaster Architecture ● A TajoMaster process has a – QueryCoordinator ● Decides whether each query should be executed in a distributed way or executed immediately in the TajoMaster – ResourceTracker ● Manages membership of cluster nodes – Client Service Provider ● Routes client API calls to the proper QueryCoordinator or ResourceTracker

  8. Tajo TajoWorker Architecture ● A TajoWorker process has a – NodeResourceManager ● Manages the resources of the worker node – TaskManager ● Launches tasks to the TaskExecutor ● Uses a number of threads equal to the number of CPU cores – TaskExecutor ● Creates TaskContainers for the workload – NodeStatusUpdater ● Updates the current status when resources change
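
The idea of sizing a worker's thread pool to the number of CPU cores can be sketched in Python as follows. This is not Tajo's implementation (Tajo is written in Java); `run_tasks` is a hypothetical helper that only illustrates the sizing rule.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks):
    """Run callables on a pool with one thread per CPU core; keep input order."""
    cores = os.cpu_count() or 1  # cpu_count() may return None on some platforms
    with ThreadPoolExecutor(max_workers=cores) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


# Five trivial tasks; the default argument binds each n at definition time.
results = run_tasks([lambda n=n: n * n for n in range(5)])
print(results)
```

Matching the pool size to the core count is a common default for CPU-bound work, since extra threads would mostly add scheduling overhead.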

  9. Tajo Table Spaces ● Tajo supports tablespaces – Data may be stored in multiple locations – e.g. HDFS, S3, or HBase – It may be stored in multiple formats – e.g. CSV, Parquet, or ORC ● Tablespaces provide a way to – Easily handle data stored on different storage types – In various file formats

  10. Tajo Table Spaces

  11. Tajo Table Spaces ● Multiple tablespaces exist for a data source ● A tablespace contains multiple tables while a table has only one tablespace ● External tables don't have any tablespaces because they have their own storage information ● A database can contain tables of different tablespaces
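
The relationships listed above can be modeled with a short Python sketch. The class and field names are hypothetical and do not mirror Tajo's catalog classes; the sketch only encodes the stated rules: a table has at most one tablespace, an external table has none and carries its own location, and a database can mix tables from different tablespaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Table:
    name: str
    tablespace: Optional[str]       # None models an external table
    location: Optional[str] = None  # external tables carry their own storage info


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)

    def tablespaces(self) -> Set[str]:
        """Return the distinct tablespaces used by this database's tables."""
        return {t.tablespace for t in self.tables if t.tablespace is not None}


# Hypothetical database mixing two tablespaces and one external table.
db = Database("sales", [
    Table("orders", "hdfs_space"),
    Table("customers", "hdfs_space"),
    Table("clicks", "s3_space"),
    Table("legacy_export", None, location="hdfs://namenode:8020/exports/legacy"),
])
print(sorted(db.tablespaces()))
```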

  12. Tajo Table Spaces

  13. Available Books ● See “Big Data Made Easy” – Apress Jan 2015 ● See “Mastering Apache Spark” – Packt Oct 2015 ● See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018 ● Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ ● Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020

  14. Connect ● Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020 ● See my open source blog at – open-source-systems.blogspot.com/ ● I am always interested in – New technology – Opportunities – Technology based issues – Big data integration
