This text highlights the potential negative impacts of Big Data analysis and how to mitigate them by democratizing data and analysis. It emphasizes the need for transparency, accessibility, and user-friendliness in Big Data processes to prevent misuse and harm. It examines the rise of Big Data in the context of its societal implications and urges responsible handling and understanding of data. The text also stresses the importance of value over volume in Big Data applications.
Warding off Evil of Big Data Jun Yang
39% of the experts agree… • Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want. And the advent of Big Data has a harmful impact because it serves the majority (at times inaccurately) while diminishing the minority and ignoring important outliers. Overall, the rise of Big Data is a big negative for society in nearly all respects. Source: 2012 Pew Research Center Report, http://pewinternet.org/Reports/2012/Future-of-Big-Data/Overview.aspx
Warding off evil—how? • “Democratize” data • Push transparency • Also make datasets easier to discover, share, cleanse, integrate… But that’s not enough… • Democratize data analysis
Make analysis easier • Say you are developing a big-data analytics platform for social scientists • Don’t force them to code in SQL or Java • Don’t force them to tune execution plans, fiddle with configuration parameters, or pick clusters • Provide just two knobs: time & money • Focus on user’s experience & cost • E.g., Google Duke Cumulon
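To make the "two knobs" idea concrete, here is a minimal sketch in Python of what such an interface might look like: the user states only a deadline and a budget, and the platform picks a deployment on their behalf. The `Plan` class, the `choose_plan` function, and the candidate numbers are all hypothetical illustrations, not how Cumulon or any real system works.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    """A hypothetical deployment option the platform has already costed."""
    name: str
    machines: int
    est_hours: float               # estimated running time
    price_per_machine_hour: float  # assumed flat rental price

    @property
    def cost(self) -> float:
        return self.machines * self.est_hours * self.price_per_machine_hour


def choose_plan(plans: Sequence[Plan], deadline_hours: float,
                budget: float) -> Optional[Plan]:
    """Return the cheapest plan that meets both knobs, or None if none does."""
    feasible = [p for p in plans
                if p.est_hours <= deadline_hours and p.cost <= budget]
    # Ties on cost are broken in favour of the faster plan.
    return min(feasible, key=lambda p: (p.cost, p.est_hours), default=None)


# Made-up candidates; a real platform would derive these from a cost model.
CANDIDATES = [
    Plan("small", machines=2, est_hours=10.0, price_per_machine_hour=0.5),
    Plan("medium", machines=8, est_hours=3.0, price_per_machine_hour=0.5),
    Plan("large", machines=32, est_hours=1.0, price_per_machine_hour=0.5),
]

best = choose_plan(CANDIDATES, deadline_hours=4, budget=20)
print(best.name if best else "no plan fits; relax time or money")
```

The point of the sketch is the signature: cluster size, configuration, and execution plan never appear among the arguments the user supplies.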
Make analysis understandable • Say you want to expose “lies, d---ed lies, and statistics” in politics, ads, and news • What datasets are useful in checking a claim? • Can you convert a vague claim to a query? • It may be “correct,” but is it “cherry-picking”? • How do you convince your audience? • There are in fact plenty of “core” database problems • E.g., Google Duke database computational journalism
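One way to probe whether a "correct" claim is cherry-picked is to treat it as a parameterized query and see how it fares when the parameters are nudged. The sketch below is only an illustration under assumed names and made-up data: `growth`, `claim_robustness`, and `SERIES` are hypothetical, and the claim is of the form "the figure grew by at least X between year A and year B."

```python
from typing import Dict


def growth(series: Dict[int, float], start: int, end: int) -> float:
    """Relative change of the series between two years."""
    return (series[end] - series[start]) / series[start]


def claim_robustness(series: Dict[int, float], start: int, end: int,
                     threshold: float, wiggle: int = 1) -> float:
    """Fraction of nearby (start, end) windows that still support the claim.

    Each endpoint is shifted by up to `wiggle` years in either direction;
    windows that leave the data or have start >= end are skipped.
    """
    supporting = total = 0
    for s in range(start - wiggle, start + wiggle + 1):
        for e in range(end - wiggle, end + wiggle + 1):
            if s in series and e in series and s < e:
                total += 1
                if growth(series, s, e) >= threshold:
                    supporting += 1
    return supporting / total if total else 0.0


# Made-up yearly figures, chosen so that one window looks unusually good.
SERIES = {2001: 100, 2002: 104, 2003: 90, 2004: 108,
          2005: 110, 2006: 109, 2007: 111}

# Claim: "grew by at least 15% from 2003 to 2004" (true for that window).
print(growth(SERIES, 2003, 2004) >= 0.15)
print(claim_robustness(SERIES, 2003, 2004, threshold=0.15))
```

A low fraction suggests the claim depends heavily on the chosen endpoints, which is the kind of evidence that could help convince an audience.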
So what’s “big” about big data? • Yeah there are the big volume, big velocity, big variety, and big variability • But ultimately it’s about big value • Not just to big companies and governments • But to us all • Ward off evil by democratizing data & analysis!