This article delves into the dynamics of relationships within crowdsourcing, examining the notions of vertices (persons) and edges (relationships). It analyzes sampling biases, demographic baselines, and behavioral economic factors in crowd behavior. The text also explores the challenges of hiding in plain sight, re-identification risks, and the implications of crowdsourcing for privacy. Furthermore, it discusses the impact of knowledge of observation on behavior and the complexities of using data to inform decision-making. The article concludes by reflecting on the intricacies of crowd dynamics and the need for informed consent in research.
3 is not a crowd, it’s an anecdote Jon Crowcroft, C2B(I)D’15 http://www.cl.cam.ac.uk/~jac22
We know vertex is person – what’s edge? • Relationship • Kin, friend, colleague, ship-in-the-night • External evidence (registry, dna, HR) • Co-lo • Shared air, drink/food, touch? • In same record (video/photo/ticket?) • Communicated • Messaged, liked/mention/comment etc • Wrote a letter…
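A minimal sketch of the vertex/edge framing above, assuming each edge between two people can carry more than one kind at once. The class name `PersonGraph`, the four kind labels, and the example people are illustrative assumptions, not something the slides define.

```python
from collections import defaultdict

# Edge kinds paraphrased from the slide; the exact labels are an assumption.
EDGE_KINDS = {"relationship", "co-location", "same-record", "communicated"}


class PersonGraph:
    """Undirected multigraph: vertices are people, each edge carries a kind."""

    def __init__(self):
        # person -> other person -> set of edge kinds linking them
        self._adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, a, b, kind):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self._adj[a][b].add(kind)
        self._adj[b][a].add(kind)

    def neighbours(self, person, kind=None):
        """People linked to `person`, optionally restricted to one edge kind."""
        return {
            other
            for other, kinds in self._adj[person].items()
            if kind is None or kind in kinds
        }


g = PersonGraph()
g.add_edge("alice", "bob", "relationship")
g.add_edge("alice", "bob", "communicated")
g.add_edge("alice", "carol", "co-location")
```

Which kind of edge you pick changes the graph you get: here `carol` is a neighbour of `alice` only if co-location counts as an edge.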
3 things • Sampling the Crowd? • Lost in the Crowd? • Same old Crowd?
Samples • Selection bias • Has an Interweb Gadget • WEIRD • Recruitment small-world reinforcement • Demographic baseline • Reward (and punishment) • Halo effects, Focussing Illusion • Other behavioural economic fails…
Ground Truth of Sample • Takes old fashioned social science • Diaries/Interviews etc, but beware • Representativeness Bias • Priors, sample size • Regression • Retrievability • Imaginability • exponentials, long range dependence etc • General Anchoring effects… http://psiexp.ss.uci.edu/research/teaching/Tversky_Kahneman_1974.pdf
Sample Skew • Sometimes, you’re not allowed … • FluPhone – H1N1 Epidemic • Self report symptoms • Phone app tracks location/encounters • Build spatial/temporal model of SIR…. • IRB ruled • No location (privacy risk) • No kids (informed consent “hard”) • Nulls most of the experiment… … …
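For readers unfamiliar with SIR, here is a minimal sketch of the classic well-mixed susceptible/infected/recovered model, stepped forward with simple Euler updates. It is not the FluPhone model: it has no spatial or encounter component, and the function name and the values of `beta`, `gamma` and the population size are assumptions chosen only for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Return one (S, I, R) tuple per day for a well-mixed SIR model.

    beta  - transmission rate per day (assumed value, not fitted to data)
    gamma - recovery rate per day (assumed value, not fitted to data)
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(s0=999, i0=1, r0=0, beta=0.5, gamma=0.2, days=100)
peak_infected = max(i for _, i, _ in history)
```

A model like this needs contact data to estimate `beta`, which is why removing location and excluding children takes so much out of the experiment.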
Hiding in Plain Sight • Or seeing the wood for the trees • Humans not good at this, but s/w v. good • Re-identification from power of 4 • External data sets • Yellowcab driver and celeb passengers • Massachusetts health records • Fb loan advice • Other stuff already out there… be aware • Humans are wired to infer stuff from 3 • But society used to be wired to hide… • Stuff that crowdsourcing may reveal • But humans are very bad at some stuff
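A minimal sketch of why a handful of attributes is enough for re-identification: count how many records are unique on a chosen set of quasi-identifiers. The records below are made up for illustration, and the column names (`zip`, `birth_year`, `sex`) and the helper `unique_fraction` are assumptions, not taken from any of the data sets named on the slide.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only one with their combination
    of values on the given quasi-identifier columns."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Entirely hypothetical "anonymised" records (no names, but still linkable).
records = [
    {"zip": "02138", "birth_year": 1950, "sex": "F", "diagnosis": "A"},
    {"zip": "02138", "birth_year": 1950, "sex": "M", "diagnosis": "B"},
    {"zip": "02139", "birth_year": 1950, "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birth_year": 1962, "sex": "F", "diagnosis": "D"},
]

by_zip = unique_fraction(records, ["zip"])
by_zip_year = unique_fraction(records, ["zip", "birth_year"])
by_all = unique_fraction(records, ["zip", "birth_year", "sex"])
```

In this toy data nobody is unique on `zip` alone, half are unique on `zip` plus `birth_year`, and everyone is unique once `sex` is added. Any external data set sharing those columns could then be joined to put names back on the records.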
Hiding in plain sight • Humans are bad at getting • Small World (6 degrees) • Exponentials (2x grains rice on next square) • Large deviations (black swan) • Makes informed consent nonsense • Violate principle of least astonishment • How to train the public? • Cf. Thinking, Fast and Slow… • Cooling off period & Examples…?
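The rice example can be worked through directly. This sketch assumes the usual telling of the story: a 64-square chessboard with one grain on the first square, doubling on each following square.

```python
SQUARES = 64  # standard chessboard, as assumed in the usual telling

# Grains on square k (0-indexed) is 2**k: 1, 2, 4, 8, ...
grains_per_square = [2**k for k in range(SQUARES)]

last_square = grains_per_square[-1]   # 2**63
total = sum(grains_per_square)        # 2**64 - 1

print(f"last square: {last_square:,}")
print(f"total:       {total:,}")
```

The final square alone holds one grain more than all the earlier squares combined, which is the part intuition tends to miss.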
Crowds from both sides now (looking at) • Reproducible, not repeatable • Science and Business want evidence • To make better decisions in future • So model from data has to have persistence • Hence, needs to be checked • Need to assume no observer bias • as well as sample problems discussed before • So Big Data isn’t necessarily big • It’s lots of small, representative, samples • Over time… … … and yet the world changes
Psychohistory • Knowledge you’re being observed can • Lead to change of behaviour • Well known in stock market • Also Google Flu search term #2 • Also in Crowd funding… • We built tool to “predict” campaign success • If investors all use that tool, I predict: • It won’t work as well • Oh, outcome was roughly what you expect
4 arguments for TV elim… • conditions us to accept someone else’s authority • facilitates consolidated power through the colonization of experience. • physically conditions us for authoritative rule • inherent biases of TV https://en.wikipedia.org/wiki/Four_Arguments_for_the_Elimination_of_Television
Conclusions, Discussion • 4 arguments (for the elimination of crowd*) - I hope you agree these are not independent: • You really can't summarize complex information • Nielsen->$ • Interweb->TV • Crowding • Live long and prosper