MapReduce VS Parallel DBMSs
Presenter: Ran Ding
Guideline
1. Introduction
2. Where the MR wins
3. DBMS “sweet spot” tests
4. Why the Parallel DBMS wins
5. Conclusion
Introduction: MR
An MR system can be considered a general-purpose parallel ETL system.
A DBMS may also perform the ETL.
Some complex analytics cannot be structured as single SQL aggregate queries.
MR is a good candidate for such applications.
MR systems are good at processing data as it is prepared for loading into a back-end system.
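The ETL-style use of MR described above can be sketched roughly as follows. This is a minimal single-process illustration, not Hadoop code: the function names `map_phase` and `reduce_phase`, the `user,amount` record layout, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Extract and transform: turn raw text lines into (key, value) pairs."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # drop malformed records during the transform step
        user, amount = parts
        try:
            yield user, float(amount)
        except ValueError:
            continue  # drop records whose amount is not numeric


def reduce_phase(pairs):
    """Aggregate the values for each key into one load-ready record."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical raw input, as it might arrive before loading.
raw_lines = ["alice,10.0", "bob,2.5", "garbage", "alice,5.0"]

# The result is what would be handed to the back-end system.
load_ready = reduce_phase(map_phase(raw_lines))
```

In a real MR system the map and reduce steps would run in parallel on many nodes; here they run in sequence only to show the shape of the computation.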
For semistructured data, a DBMS requires wide tables with many attributes.
Plus, MR-style systems can easily store and process such data.
A DBMS needs the programmer to write the schema first, then load the data.
MR just copies the data!
MR is basically open source and free.
Parallel DBMS: huge cost
DBMS “Sweet Spot” Test
Why the Parallel DBMS wins
1. Repetitive record parsing
2. Compression
3. Pipelining
4. Scheduling
5. Column-oriented storage
Repetitive record parsing
In MR, each Map and Reduce task must repeatedly parse and convert string fields into the appropriate type.
Records are parsed by DBMSs when the data is initially loaded.
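The difference in parsing cost can be illustrated with a small sketch. It assumes a hypothetical `key,value` text format and counts how often a record is parsed when three jobs run over the same data; the names `parse`, `mr_style_job`, and `counter` are made up for the example.

```python
# Hypothetical raw text records in "key,value" form.
raw_records = ["1,3.5", "2,4.0", "3,1.5"]

counter = {"parses": 0}


def parse(record):
    """Convert the string fields of one record into typed values."""
    counter["parses"] += 1
    key, value = record.split(",")
    return int(key), float(value)


def mr_style_job(records):
    """Each job starts from raw text, so it must parse every record again."""
    return sum(parse(r)[1] for r in records)


# MR style: three jobs, each re-parsing the raw input.
mr_results = [mr_style_job(raw_records) for _ in range(3)]
mr_parse_calls = counter["parses"]

# DBMS style: parse once at load time, then query the typed table.
counter["parses"] = 0
loaded_table = [parse(r) for r in raw_records]
dbms_results = [sum(value for _, value in loaded_table) for _ in range(3)]
dbms_parse_calls = counter["parses"]
```

Both approaches compute the same sums, but the MR-style version parses each record once per job, while the load-once version parses each record a single time.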
Compression
It is hard to say...
Commercial DBMSs may use carefully tuned compression algorithms
Pipelining
In a parallel DBMS, data is streamed from producer to consumer; the intermediate data is never written to disk.
In an MR system, the producer writes its results to local data structures, and consumers read from them.
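The two data-flow styles can be contrasted in a short sketch. Python generators stand in for the streaming (pipelined) case, and a temporary file stands in for materialized intermediate results; the function names and the toy workload (summing even squares) are assumptions for illustration only.

```python
import os
import tempfile


def producer(n):
    """Produce intermediate values one at a time."""
    for i in range(n):
        yield i * i


def pipelined_sum(n):
    """DBMS style: the consumer pulls values straight from the producer."""
    return sum(x for x in producer(n) if x % 2 == 0)


def materialized_sum(n):
    """MR style: the producer writes everything out, then the consumer reads it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.txt")
        with open(path, "w") as out:
            for x in producer(n):
                out.write(f"{x}\n")
        with open(path) as src:
            values = [int(line) for line in src]
    return sum(x for x in values if x % 2 == 0)
```

Both functions return the same answer. The materialized version pays for an extra write and read of the intermediate data, which is the cost the pipelined approach avoids; in exchange, stored intermediate results are available to be read again.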
Scheduling
In a parallel DBMS, every node knows what it should do.
In an MR system, work is scheduled on processing nodes one storage block at a time.
Column-oriented storage
A column-oriented store reads only the attributes necessary for solving the user query.
DBMS-X and Hadoop are both row stores.
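A rough sketch of why column orientation reads less data: the same small table is laid out once by row and once by column, and each query function reports how many field values it had to touch. The table contents, attribute names, and function names are hypothetical.

```python
# Hypothetical table with four attributes per record.
rows = [
    {"id": 1, "url": "a.example", "rank": 10, "duration": 3},
    {"id": 2, "url": "b.example", "rank": 20, "duration": 7},
    {"id": 3, "url": "c.example", "rank": 30, "duration": 5},
]

# The same data laid out column by column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def row_store_avg_rank(table):
    """Row layout: every attribute of every record is read."""
    fields_read = 0
    total = 0
    for row in table:
        fields_read += len(row)
        total += row["rank"]
    return total / len(table), fields_read


def column_store_avg_rank(cols):
    """Column layout: only the 'rank' column is read."""
    rank = cols["rank"]
    return sum(rank) / len(rank), len(rank)


row_avg, row_fields = row_store_avg_rank(rows)
col_avg, col_fields = column_store_avg_rank(columns)
```

The answers match, but the row layout touches all twelve field values while the column layout touches only the three that the query needs.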
What should MR learn from Parallel DBMS
MR advocates should learn from parallel DBMSs the technologies and techniques for efficient parallel query execution.
MR systems are powerful tools for ETL-style applications and for complex analytics. If the application is query-intensive, whether semistructured or rigidly structured, then a DBMS is probably the better choice.