Joins in Hadoop
Gang and Ronnie
Agenda:
Introduction of new types of joins
Experiment results
Join plan generator
Summary and future work
Problem at hand
Map join (fragment-duplicate join)
Fragment (large table)
Map tasks:
Duplicate (small table)
Fragment (large table)
There are 64 nodes in our cluster, and the distributed cache will copy the data at most that many times.
Out Of Memory Exception!
New Map Joins:
small table as: duplicate
large table as: fragment
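The fragment-duplicate idea can be sketched outside Hadoop in plain Python: each map task loads the whole small table into an in-memory hash table and streams its own fragment of the large table past it. The function name `map_join`, the tuple row layout, and the sample rows are illustrative assumptions, not taken from the slides; in a real job the small table would reach each node through the distributed cache.

```python
from collections import defaultdict

def map_join(fragment, duplicate, key_index=0):
    """Join one fragment of the large table against the full small table.

    Hypothetical helper: rows are tuples and the join key sits at
    key_index in both tables (an assumption made for this sketch).
    """
    # Build phase: hash the small (duplicated) table in memory.
    table = defaultdict(list)
    for row in duplicate:
        table[row[key_index]].append(row)

    # Probe phase: stream the large fragment once, emitting matches.
    for row in fragment:
        for match in table.get(row[key_index], ()):
            # Append the small-table columns, dropping its copy of the key.
            yield row + match[:key_index] + match[key_index + 1:]

small = [(1, "US"), (2, "DE")]
fragment = [(1, "alice"), (2, "bob"), (3, "carol"), (1, "dave")]
joined = list(map_join(fragment, small))
```

The build phase holds the entire small table in memory on every map task, which is where an out-of-memory failure would come from if the "small" table is not small enough.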
Problem? - Reading large table multiple times!
Problem? - Not really a Map job…
Problem? - Probing a hashtable on disk might take much time!
Problem? - Step 1 and 2 have overhead!
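To make the first problem concrete: one way to keep the in-memory hash table bounded is to load the small table one chunk at a time and rescan the large fragment once per chunk. The slides do not spell out the algorithm, so the sketch below is only an assumed illustration of that trade-off; `chunked_map_join`, `chunk_size`, and the scan counter are hypothetical names.

```python
from itertools import islice

def chunked_map_join(read_fragment, duplicate, chunk_size, key_index=0):
    """Memory-bounded join; returns (joined_rows, number_of_fragment_scans).

    read_fragment is a zero-argument callable that returns a fresh
    iterator over the large fragment, so the fragment can be re-read.
    """
    rows = iter(duplicate)
    joined, scans = [], 0
    while True:
        # Load only chunk_size rows of the small table at a time.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        table = {}
        for row in chunk:
            table.setdefault(row[key_index], []).append(row)
        # Every chunk costs one more full pass over the large fragment.
        scans += 1
        for row in read_fragment():
            for match in table.get(row[key_index], ()):
                joined.append(row + match[:key_index] + match[key_index + 1:])
    return joined, scans

small = [(1, "US"), (2, "DE"), (3, "FR")]
large = [(1, "alice"), (3, "carol"), (2, "bob")]
result, scans = chunked_map_join(lambda: iter(large), small, chunk_size=2)
```

With three small-table rows and a chunk size of two, the large fragment is read twice. The number of passes grows as the chunk size shrinks, which is the cost the first problem points at.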
Some results ignored
Default Map Join?
Reversed Map Join /
Default Reduce side Join