
HiVertica Capstone Project

Stephen Walkauskas, Architect, Data Management, Vertica. University of Pittsburgh, January 11, 2013.


Presentation Transcript


  1. HiVertica Capstone Project. Stephen Walkauskas, Architect, Data Management, Vertica. University of Pittsburgh, January 11, 2013

  2. Contact info: Stephen Walkauskas, swalkauskas@vertica.com

  3. Vertica culture

  4. What Is Vertica: Speed, Scale, Simplicity
     • SQL database for real-time analytics
     • Runs on x86 hardware
     • MPP columnar architecture; scales to PBs!
     • Reduced footprint via advanced compression
     • Extensible analytics capabilities
     • Easy to set up and use
     • Elastic: grow/shrink as needed
     • Extensive ecosystem of analytic tools

  5. Map/Reduce

  6. -- HQL
     SELECT a.val1, a.val2, b.val, c.val
     FROM a
     JOIN b ON (a.key = b.key)
     LEFT OUTER JOIN c ON (a.key = c.key)
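The query on slide 6 uses only standard join syntax, so the same text is also valid SQL. The sketch below runs it unchanged through Python's built-in `sqlite3` module, which is only a stand-in here (neither Hive nor Vertica). The table contents and the helper name `run_slide_query` are made up for illustration.

```python
import sqlite3

# The query text from slide 6, unchanged.
SLIDE_QUERY = """
SELECT a.val1, a.val2, b.val, c.val
FROM a
JOIN b ON (a.key = b.key)
LEFT OUTER JOIN c ON (a.key = c.key)
"""

def run_slide_query():
    """Run the slide's query on a throwaway in-memory database (sample rows are invented)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (key INTEGER, val1 TEXT, val2 TEXT);
        CREATE TABLE b (key INTEGER, val TEXT);
        CREATE TABLE c (key INTEGER, val TEXT);
        INSERT INTO a VALUES (1, 'x1', 'y1'), (2, 'x2', 'y2'), (3, 'x3', 'y3');
        INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
        INSERT INTO c VALUES (1, 'c1');
    """)
    rows = conn.execute(SLIDE_QUERY).fetchall()
    conn.close()
    # Sort in Python so the result order is deterministic.
    return sorted(rows)

if __name__ == "__main__":
    for row in run_slide_query():
        print(row)
```

With these sample rows, key 3 is dropped by the inner join with `b`, while key 2 survives the left outer join with `c` and gets a NULL for `c.val`.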

  7. HiVertica

  8. HiVertica
     a) Write code to read Hive / HCatalog metadata and generate DDL that creates corresponding external tables (ETs) in a Vertica DB. Configure each ET with the files referenced by the corresponding Hive table.
     b) Vertica ships a connector that sources files from HDFS. With this connector, the ETs above can be used to query data in Hive (assuming the data is in a format Vertica can parse).
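The DDL-generation step on slide 8 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `HiveTable` class, the type mapping, the example URL, and the user name are all hypothetical, and a real tool would read the table description from Hive / HCatalog instead of hard-coding it. The `SOURCE Hdfs(url=..., username=...)` clause is assumed to match the HDFS connector syntax of the Vertica version in use and should be checked against its documentation.

```python
from dataclasses import dataclass

# Hypothetical Hive-to-Vertica type mapping; a real tool would cover more types.
HIVE_TO_VERTICA = {
    "tinyint": "INT", "smallint": "INT", "int": "INT", "bigint": "INT",
    "float": "FLOAT", "double": "FLOAT",
    "boolean": "BOOLEAN", "string": "VARCHAR(65000)", "timestamp": "TIMESTAMP",
}

@dataclass
class HiveTable:
    """Hypothetical stand-in for metadata read from Hive / HCatalog."""
    name: str
    columns: list            # list of (column_name, hive_type) pairs
    location: str            # URL of the table's files (assumed WebHDFS-style)
    delimiter: str = "\\001"  # Hive's default field delimiter (Ctrl-A), as an escape

def external_table_ddl(table, username):
    """Build a CREATE EXTERNAL TABLE statement for one Hive table."""
    cols = ",\n".join(
        f"    {col} {HIVE_TO_VERTICA[hive_type.lower()]}"
        for col, hive_type in table.columns
    )
    return (
        f"CREATE EXTERNAL TABLE {table.name} (\n{cols}\n) AS COPY\n"
        f"SOURCE Hdfs(url='{table.location}', username='{username}')\n"
        f"DELIMITER E'{table.delimiter}';"
    )

if __name__ == "__main__":
    a = HiveTable(
        name="a",
        columns=[("key", "int"), ("val1", "string"), ("val2", "string")],
        location="http://namenode.example:50070/webhdfs/v1/user/hive/warehouse/a/*",
    )
    print(external_table_ddl(a, username="hive_user"))
```

An unmapped Hive type raises `KeyError` here, which makes gaps in the mapping visible instead of silently emitting wrong DDL.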

  9. HiVertica
     c) Vertica supports User Defined Parsers (you can write your own CSV parser if you're so inclined). RCFile is commonly used to store data in Hive, so it would be useful to be able to parse that format in a Vertica UDParser.
     d) Find the place in Hive where HQL is compiled into M/R jobs; instead, rewrite the HQL as SQL and, leveraging the above features, send the query to Vertica. The two systems are not 100% compatible; we can tweak them to shrink the feature gap.
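To give a feel for what a parser for delimited text has to do, here is a small sketch of the core loop: split a buffer into complete records and hand back any trailing partial record so it can be prepended to the next buffer. This is plain Python for illustration only; it does not use the Vertica SDK, and the function name `parse_chunk` and its parameters are hypothetical. An RCFile parser would be considerably more involved, since that format is columnar and binary.

```python
def parse_chunk(buffer, field_delim=b"\x01", record_delim=b"\n"):
    """Split a byte buffer into rows of fields.

    Returns (rows, leftover), where leftover is the trailing bytes after the
    last record delimiter (an incomplete record, or b"" if none).
    """
    *complete, leftover = buffer.split(record_delim)
    rows = [record.split(field_delim) for record in complete]
    return rows, leftover

if __name__ == "__main__":
    # Two buffers as they might arrive from a file stream; the second record
    # of the first buffer is cut off mid-field.
    rows1, rest = parse_chunk(b"1\x01alpha\n2\x01be")
    rows2, rest = parse_chunk(rest + b"ta\n")
    print(rows1 + rows2, rest)
```

Carrying the leftover bytes forward is what lets the parser work on a stream without ever needing the whole file in memory.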

  10. Thanks!
