HadoopDB project


  1. HadoopDB project: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Anssi Salohalla

  2. Background
     • The amount of data that must be stored for analysis is exploding
     • Analytical performance, however, cannot be compromised as data volumes grow
     • Efficient high-end proprietary machines are expensive

  3. Parallel databases
     • Shared-nothing MPP architecture (a collection of independent machines, each with its own local hard disk and main memory, connected by a high-speed network)
     • Machines are cheaper, lower-end commodity hardware
     • Scale well only up to a point (tens of nodes)
     • Good performance
     • Poor fault tolerance
     • Problems in heterogeneous environments (machines must be equal in performance)
     • Good support for a flexible query interface

  4. MapReduce systems
     • Cheap
     • Scale well to thousands of nodes
     • Good support for heterogeneous environments
     • Good fault tolerance
     • Performance issues compared to parallel databases
     • Generally no support for SQL (with exceptions, e.g. Hive)
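To make the "no SQL" point concrete, here is a minimal single-process sketch of the MapReduce programming model in Python. It is not Hadoop code: the helper names (`map_phase`, `shuffle`, `reduce_phase`) and the sample records are assumptions made up for illustration. It shows that even a simple grouped sum has to be written by hand as a map function and a reduce function.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply map_fn to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Apply reduce_fn once per key to the list of values for that key."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical input: (source address, revenue) records.
visits = [("10.0.0.1", 5), ("10.0.0.2", 3), ("10.0.0.1", 7)]

def emit_revenue(record):
    """User-written map function: emit the revenue keyed by source."""
    source, revenue = record
    return [(source, revenue)]

def sum_revenue(key, values):
    """User-written reduce function: total the revenue for one source."""
    return sum(values)

totals = reduce_phase(shuffle(map_phase(visits, emit_revenue)), sum_revenue)
print(totals)
```

In SQL the same computation would be a single GROUP BY query, which is the kind of flexible query interface the parallel databases on the previous slide provide.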

  5. What is HadoopDB
     • Recent study by database researchers at Yale University
     • Hybrid architecture combining parallel databases and a MapReduce system
     • The idea is to combine the best qualities of both technologies
     • Multiple single-node databases are connected using Hadoop as the task coordinator and network communication layer
     • Queries are distributed across the nodes by the MapReduce framework, but as much work as possible is done inside each node's database
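The idea of doing as much work as possible inside each node's database can be sketched as follows. This is an illustration only, not HadoopDB code: in-memory SQLite databases stand in for the single-node databases, plain Python functions stand in for Hadoop's map and reduce tasks, and the table, column, and function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict

def make_node(rows):
    """Create one in-memory database holding one partition of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn

def map_task(conn):
    """Map task: push the aggregation into the node's database as SQL,
    so only one partial sum per source leaves the node."""
    query = "SELECT source, SUM(revenue) FROM visits GROUP BY source"
    return conn.execute(query).fetchall()

def reduce_task(partials):
    """Reduce task: merge the per-node partial sums into global totals."""
    totals = defaultdict(int)
    for source, subtotal in partials:
        totals[source] += subtotal
    return dict(totals)

# Two hypothetical nodes, each holding a different partition of the table.
nodes = [
    make_node([("a", 5), ("b", 3)]),
    make_node([("a", 7), ("c", 1)]),
]

# The coordinator runs one map task per node and feeds the results to reduce.
partials = [pair for conn in nodes for pair in map_task(conn)]
result = reduce_task(partials)
print(result)
```

The design point is that each database returns already aggregated rows, so the coordination layer only moves and merges small partial results instead of scanning raw records itself.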

  6. HadoopDB architecture. Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

  7. Desired properties of HadoopDB
     • Performance
     • Fault tolerance
     • Support for heterogeneous environments
     • Flexible query interface

  8. Systems benchmarked in the study
     • Hadoop
     • HadoopDB
     • Vertica
     • DBMS-X

  9. Benchmark tasks
     • Data loading
     • Grep task
     • Selection task
     • Aggregation task
     • Join task
     • UDF aggregation task
     • Fault tolerance and heterogeneous environment

  10. Results 1/2. Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

  11. Results 2/2. Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

  12. Conclusions
     • HadoopDB is close in performance to parallel databases
     • HadoopDB can operate in a truly heterogeneous environment and has the fault tolerance of Hadoop
     • Licensing costs are equal to Hadoop's
     • Better performance is expected in the future

  13. Further reading
     • HadoopDB Project. Web page: http://db.cs.yale.edu/hadoopdb/hadoopdb.html
     • Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
     • Hadoop Project. Hadoop Cluster Setup. Web page: http://hadoop.apache.org/core/docs/current/cluster_setup.html

  14. Questions?
