Handling Big Data
Howles
Credits to Sources on Final Slide
Handling Large Amounts of Data
Current technologies are to:
Parallelize: use multiple processors or threads. Can be a single machine, or a machine with multiple processors
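As a rough illustration of the "parallelize" idea, the sketch below splits a list into chunks, sums each chunk on a worker thread, and combines the partial results. The names `chunked` and `parallel_sum` are hypothetical and not from the slides. Note that in CPython, threads mainly help with I/O-bound work; for CPU-bound work a process pool would likely be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_sum(numbers, workers=4):
    """Sum each chunk on its own worker thread, then combine the partial sums."""
    chunks = chunked(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

print(parallel_sum(list(range(1000)), workers=4))  # prints 499500
```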
Producer:
while count == MAX   // buffer is full: wait
Put in buffer
Consumer:
while count == 0   // buffer is empty: wait
Remove from buffer
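The two loops above can be sketched in Python roughly as follows. This is a minimal polling version, assuming one producer and one consumer: a lock makes each check-and-update atomic, and a short sleep stands in for the busy wait. The capacity `MAX = 3` and the names `put`, `get`, and `run_demo` are assumptions chosen for illustration.

```python
import threading
import time
from collections import deque

MAX = 3                    # buffer capacity (value chosen for illustration)
buffer = deque()
lock = threading.Lock()    # makes each check-and-update step atomic

def put(item):
    while True:
        with lock:
            if len(buffer) < MAX:      # leave the "while count == MAX" loop
                buffer.append(item)    # put in buffer
                return
        time.sleep(0.001)              # buffer full: back off and retry

def get():
    while True:
        with lock:
            if buffer:                 # leave the "while count == 0" loop
                return buffer.popleft()  # remove from buffer
        time.sleep(0.001)              # buffer empty: back off and retry

def run_demo(n_items=10):
    consumed = []
    producer = threading.Thread(target=lambda: [put(i) for i in range(n_items)])
    consumer = threading.Thread(
        target=lambda: consumed.extend(get() for _ in range(n_items)))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return consumed

print(run_demo(5))  # prints [0, 1, 2, 3, 4]
```

Polling wastes CPU time while a thread waits, which is one motivation for blocking primitives such as semaphores and monitors.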
wait (s); // 1st
wait (q); // 3rd
Assume both semaphores initialized to 1
wait (q); // 2nd
wait (s); // 4th
signal (s);
Problems with Semaphores
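The numbered wait() calls above (1st to 4th, with both semaphores initialized to 1) can be replayed in Python as a sketch. It assumes two processes, here called P0 and P1 (hypothetical names): P0 waits on s then q, and P1 waits on q then s. A barrier forces the interleaving, and a timeout on the second wait is an addition that lets the demo report the deadlock instead of hanging.

```python
import threading

def run_interleaving(timeout=0.2):
    s = threading.Semaphore(1)
    q = threading.Semaphore(1)
    barrier = threading.Barrier(2)   # forces the 1st/2nd waits before the 3rd/4th
    acquired_second = {}

    def process(name, first, second):
        first.acquire()                       # 1st / 2nd wait: succeeds
        barrier.wait()                        # both now hold one semaphore
        ok = second.acquire(timeout=timeout)  # 3rd / 4th wait: cannot succeed
        acquired_second[name] = ok
        barrier.wait()                        # release nothing until both attempts end
        if ok:
            second.release()
        first.release()

    p0 = threading.Thread(target=process, args=("P0", s, q))
    p1 = threading.Thread(target=process, args=("P1", q, s))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    return acquired_second

print(run_interleaving())  # both entries are False: neither second wait succeeded
```

Without the timeout, both threads would block forever, each holding the semaphore the other needs.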
if (some condition)
call wait() on the monitor
call signal() on the monitor
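A monitor-style version of the bounded buffer might look like the sketch below, using Python's `threading.Condition` as the monitor lock plus wait queue. The class name `BufferMonitor` and the demo function are hypothetical. The condition is rechecked in a `while` loop rather than a single `if`, as the Python documentation recommends, because the condition may no longer hold by the time a waiting thread resumes.

```python
import threading
from collections import deque

class BufferMonitor:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()  # one lock plus a wait/notify queue

    def put(self, item):
        with self.cond:                    # enter the monitor
            while len(self.items) == self.capacity:
                self.cond.wait()           # releases the lock while waiting
            self.items.append(item)
            self.cond.notify_all()         # plays the role of signal()

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.popleft()
            self.cond.notify_all()
            return item

def run_monitor_demo(n_items=20, capacity=2):
    monitor = BufferMonitor(capacity)
    received = []
    producer = threading.Thread(
        target=lambda: [monitor.put(i) for i in range(n_items)])
    consumer = threading.Thread(
        target=lambda: received.extend(monitor.get() for _ in range(n_items)))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return received

print(run_monitor_demo(5, 2))  # prints [0, 1, 2, 3, 4]
```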
Deadlock occurs whenever a transaction T1 holds a lock on an item A and is requesting a lock on an item B, while a transaction T2 holds a lock on item B and is requesting a lock on item A.
Are T3 and T4 deadlocked here?
T1 is waiting for T2 to release lock on X
T2 is waiting for T1 to release lock on Y
Deadlock: a cycle in the wait-for graph
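The cycle check can be sketched as a depth-first search over a graph in which an edge from one transaction to another means "is waiting for". The function name `find_cycle` and the dictionary representation are assumptions made for illustration.

```python
def find_cycle(waits_for):
    """Return one cycle as a list of transactions, or None if there is none.

    waits_for maps each transaction to the transactions it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:                      # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

# T1 waits for T2 (lock on X) and T2 waits for T1 (lock on Y).
print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))  # prints ['T1', 'T2', 'T1']
```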
Pessimistic: assume deadlock will happen, and therefore use “preventive” measures: deadlock prevention.
Optimistic: assume deadlock will rarely occur, so wait until it happens and then try to fix it. This requires a mechanism to “detect” a deadlock: deadlock detection.
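As one possible illustration of the pessimistic approach, the sketch below prevents deadlock by always acquiring locks in a single global order, so a cycle of waiting can never form. The slides do not say which prevention scheme they have in mind, so lock ordering is an assumption here, and all names (`acquire_in_order`, `transaction`, `lock_x`, `lock_y`) are hypothetical.

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()

def acquire_in_order(*locks):
    """Acquire locks in one global order (here: by object id)."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()

def transaction(wanted, log, name):
    held = acquire_in_order(*wanted)   # requested order is ignored on purpose
    try:
        log.append(name)               # work done while holding both locks
    finally:
        release_all(held)

def run_prevention_demo():
    log = []
    # T1 asks for X then Y, T2 asks for Y then X: the deadlock-prone pattern.
    t1 = threading.Thread(target=transaction, args=((lock_x, lock_y), log, "T1"))
    t2 = threading.Thread(target=transaction, args=((lock_y, lock_x), log, "T2"))
    t1.start()
    t2.start()
    t1.join(timeout=2)
    t2.join(timeout=2)
    return sorted(log)

print(run_prevention_demo())  # prints ['T1', 'T2']
```

The optimistic approach would instead let the transactions lock in any order and rely on a detection step to find and break a cycle.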
Map step:
value = record  # the text (the title is ignored)
words = value.split()
for w in words:
    mr.emit_intermediate(w, 1)
Reduce step:
total = 0
for v in list_of_values:  # list_of_values is the argument provided to the function: the counts for each word (a list of ones in this case)
    total += v
What is the result?
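One way to check the result of the word-count map and reduce steps is to simulate them in memory. The sketch below assumes each record is a (title, text) pair and replaces the `mr` framework object with a plain callback and a dictionary that groups values by key; `mapper`, `reducer`, `run_word_count`, and the sample records are all hypothetical.

```python
from collections import defaultdict

def mapper(record, emit_intermediate):
    title, text = record               # assumption: record is (title, text)
    for w in text.split():
        emit_intermediate(w, 1)        # one (word, 1) pair per occurrence

def reducer(key, list_of_values):
    total = 0
    for v in list_of_values:           # a list of ones in this case
        total += v
    return key, total

def run_word_count(records):
    groups = defaultdict(list)         # shuffle step: word -> list of ones
    for record in records:
        mapper(record, lambda k, v: groups[k].append(v))
    return dict(reducer(k, vs) for k, vs in groups.items())

records = [("doc1", "big data is big"), ("doc2", "data is data")]
print(run_word_count(records))  # prints {'big': 2, 'data': 3, 'is': 2}
```

The result is one (word, total) pair per distinct word, where the total is the number of times that word appears across all records.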
• Chuck Lam’s “Hadoop in Action” book. Manning Publications, 2010.
• Aaron Kimball videos @Google, UWashington
• Doug Laney’s “3D Data Management: Controlling Data Volume, Velocity, and Variety”
• Alex Holmes’ “Hadoop in Practice” book.
• Jennifer Widom at Stanford