There Is No Big Data*
* Unless You Are Big Brother or Big Tech
Zachary G. Ives
University of Pennsylvania and Inc. (visiting for 2 more weeks)
We’ve All Heard the Story…
Google has multi-PB to EB of data
Facebook: 10 PB data warehouse (Parikh keynote)
Big Tech is solving scale itself, and leading the way.
They have real data, real workloads, real $$, real machines.
MapReduce, Pregel, F1, MillWheel, Puma, Presto, …
Rowstron+ 12: even Big Tech data isn’t always BIG
What about academia, science, or “medium tech” data?
Single-server-sized… But that is not all we want to look at: most of the data we want to process needs complementary data we don’t own!
284M edges, 53M entities
Many issues in “big data integration”
But: How do we convince “small data” owners they WANT to be BIG DATA?
Next step beyond provenance, responsibility, …
Credit, badges, h-index, $$$, …
“I am getting bigger and bigger”