
Vertica to HDFS Capstone Project

Vertica to HDFS Capstone Project. Tharanga Gamaethige, Engineer, Data Management, Vertica. University of Pittsburgh, August 30th, 2013. Agenda: What is Vertica; Bridge from Vertica to HDFS; Success criteria; Benefits to you.





Presentation Transcript


  1. Vertica to HDFS Capstone Project • Tharanga Gamaethige, Engineer, Data Management, Vertica • University of Pittsburgh, August 30th, 2013

  2. Agenda • What is Vertica • Bridge from Vertica to HDFS • Success criteria • Benefits to you

  3. What Is Vertica • Founded in 2005 by database researcher Michael Stonebraker and a small group of engineers • Acquired by Hewlett-Packard in March 2011

  4. What Is Vertica (slide themes: Speed, Scale, Simplicity) • SQL database for real-time analytics • Runs on x86 hardware • MPP columnar architecture: scales to PBs! • Reduced footprint via advanced compression • Extensible analytics capabilities • Easy to set up and use • Elastic: grow/shrink as needed • Extensive ecosystem of analytic tools

  5. Bridge from Vertica to HDFS [Diagram labels: Vertica database cluster, HDFS cluster] • Use as a database-to-database export tool. • Export data from Vertica tables into external targets, e.g. HDFS. • Extensible to support different data formats, storage formats, and data targets.

  6. Bridge from Vertica to HDFS [Diagram labels: Vertica database cluster, HDFS cluster] • Data formats: pipe delimited, ORC file, etc. • Storage formats: Zip, TAR, etc. • Data targets: HDFS, file system, etc.

  7. Success criteria • A plugin that can read data from Vertica tables and export it into an external target, e.g. an HDFS cluster. • Design the plugin to be scalable enough to export terabytes of data. • Design the plugin to be extensible to support different data formats (pipe delimited, ORC files, etc.), storage formats (zip, tar, plain data, etc.), and data targets (HDFS, QFS, etc.).
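
The extensibility criterion on slide 7 could be approached by keeping the three axes (data format, storage format, data target) behind separate interfaces. The following is only a minimal sketch under stated assumptions: every class and function name here is hypothetical, it does not use the Vertica SDK or a real HDFS client, gzip stands in for the zip/tar storage formats, and an in-memory buffer stands in for HDFS. Rows are streamed through generators so that nothing requires holding the full table in memory.

```python
import io
import zlib
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Sequence

Row = Sequence[object]


class DataFormat(ABC):
    """Hypothetical interface: turns rows into encoded bytes."""

    @abstractmethod
    def encode(self, rows: Iterable[Row]) -> Iterator[bytes]: ...


class StorageFormat(ABC):
    """Hypothetical interface: wraps encoded bytes (plain, compressed, ...)."""

    @abstractmethod
    def pack(self, chunks: Iterable[bytes]) -> Iterator[bytes]: ...


class DataTarget(ABC):
    """Hypothetical interface: writes packed bytes to a destination."""

    @abstractmethod
    def write(self, chunks: Iterable[bytes]) -> int: ...


class PipeDelimited(DataFormat):
    # Naive encoder: does not escape '|' or newlines inside values.
    def encode(self, rows):
        for row in rows:
            yield ("|".join(str(v) for v in row) + "\n").encode("utf-8")


class Plain(StorageFormat):
    def pack(self, chunks):
        yield from chunks


class Gzip(StorageFormat):
    # Stand-in for the zip/tar options named on the slides.
    def pack(self, chunks):
        compressor = zlib.compressobj(wbits=31)  # wbits=31 selects gzip framing
        for chunk in chunks:
            out = compressor.compress(chunk)
            if out:
                yield out
        yield compressor.flush()


class InMemoryTarget(DataTarget):
    # Stand-in for an HDFS or file-system target.
    def __init__(self):
        self._buffer = io.BytesIO()

    def write(self, chunks):
        return sum(self._buffer.write(chunk) for chunk in chunks)

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def export(rows: Iterable[Row], fmt: DataFormat,
           storage: StorageFormat, target: DataTarget) -> int:
    """Stream rows through format -> storage -> target; return bytes written."""
    return target.write(storage.pack(fmt.encode(rows)))


if __name__ == "__main__":
    target = InMemoryTarget()
    written = export([(1, "a"), (2, "b")], PipeDelimited(), Plain(), target)
    print(written, target.getvalue())
```

With this split, supporting ORC files or a QFS target would mean adding one new class per axis, without touching `export` or the existing classes.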

  8. Benefits to you • Get hands-on experience in using Vertica and HDFS. • Learn to provide real-life design and implementation for extensibility, in the face of big data and distributed processing. • Recognition for being part of the open source community. • Potential recognition from Vertica's thousands of customers. • Most importantly, free espressos, T-shirts, and a coffee mug.

  9. Thanks! • Tharanga Gamaethige: tgamaethige@vertica.com • Sennott Square 5404
