Georeferencing in the Social Sciences – Promise and Peril

Micah Altman

Harvard University

Archival Director, Henry A. Murray Research Archive

Associate Director, Harvard-MIT Data Center

Senior Research Scientist, Institute for Quantitative Social Sciences

E: [email protected]
http://maltman.hmdc.harvard.edu/


The Structural Challenges for Progress in Social Sciences

  • Pervasive Measurement Error

  • Scattered Data

  • Controlled Experiments not Available in Many Fields

  • Weak Theory



Georeferencing Can Make Measurements Far More Accurate

  • E.g., travel, time spent exercising, commutes, time at work, agriculture, distance to voting booth

Correlation between reported and actual distance to tax office. Source: [McKenzie and Sakho, 2007, as quoted in Gibson and McKenzie, 2007]

LA voting precincts relocated. Source: [Brady and Hui, 2006]
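Georeferencing replaces a self-reported quantity, such as distance to a tax office or a polling place, with one computed from coordinates. A minimal sketch, assuming GPS coordinates in decimal degrees and a spherical Earth (the haversine formula), so it yields straight-line distance rather than travel distance along roads. The function names and the survey rows are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument of asin slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between self-reported and GPS-measured distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical survey rows: household (lat, lon), office (lat, lon), self-reported km.
rows = [
    ((14.716, -17.467), (14.693, -17.447), 2.0),
    ((14.760, -17.390), (14.693, -17.447), 15.0),
    ((14.790, -17.300), (14.693, -17.447), 10.0),
]
measured = [haversine_km(*home, *office) for home, office, _ in rows]
reported = [r for _, _, r in rows]
print([round(d, 1) for d in measured], round(pearson_r(reported, measured), 2))
```

A low correlation between the two columns is the kind of signal the McKenzie and Sakho comparison reports; measured distances can then replace the noisy self-reports in analysis.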



Georeferencing Can Unify Data

  • Establishing comparability of most social science measurements is a major undertaking

  • Yet… most social science phenomena are unambiguously located in time and space

  • Complete georeferencing would link almost all datasets at a basic conceptual level

  • However, most social science data is not yet georeferenced … this is an engineering challenge

  • Once done, coincident concepts can be revealed …

Source: [Weeks, et al. 2007]
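Once two datasets carry coordinates, location itself becomes the join key. A minimal sketch, assuming both datasets hold point coordinates and that a regular latitude/longitude grid of `cell_deg` degrees stands in for the shared spatial unit; real linkage more often uses administrative polygons (census tracts, precincts) with point-in-polygon tests. The function names, records and cell size are hypothetical. Note that two nearby points on opposite sides of a cell edge will not be linked.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float, Any]  # (latitude, longitude, record payload)


def cell_key(lat: float, lon: float, cell_deg: float) -> Tuple[int, int]:
    """Index of the regular lat/lon grid cell (cell_deg degrees on a side) containing a point."""
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


def link_by_cell(left: List[Point], right: List[Point], cell_deg: float = 0.01) -> List[Tuple[Any, Any]]:
    """Pair every record in `left` with every record in `right` that falls in the same grid cell."""
    index: Dict[Tuple[int, int], List[Any]] = defaultdict(list)
    for lat, lon, rec in right:
        index[cell_key(lat, lon, cell_deg)].append(rec)
    pairs = []
    for lat, lon, rec in left:
        # .get avoids inserting empty lists for cells that appear only on the left
        for other in index.get(cell_key(lat, lon, cell_deg), []):
            pairs.append((rec, other))
    return pairs


survey = [(42.3736, -71.1097, "respondent_17")]
imagery = [(42.3731, -71.1094, "land_use:residential"), (40.0, -75.0, "land_use:farm")]
print(link_by_cell(survey, imagery))  # -> [('respondent_17', 'land_use:residential')]
```

The same mechanism that lets a survey respondent be matched to remotely sensed land use is what makes georeferenced data easy to link with outside sources.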



Can Georeferencing Fix Experiments and Theory?

  • Not in general … although visualizations may help

Source: [Altman & McDonald 2008]

Source: [J. Snow, 1855]

Source: [Calabrese et al., 2007; Real Time Rome Project, 2007]



Mountains of Unified, Accurate Data… What’s not to like?

  • “The increasing use of linked social-spatial data has created significant uncertainties about the ability to protect the confidentiality promised to research participants... At this time, however, no known technical strategy … adequately resolves conflicts among the objectives of data linkage, open access, data quality, and confidentiality protection across datasets and data uses” -- [Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council, 2007]



Can Privacy Problems be Fixed?

  • Maybe not; there are some challenging findings:

    • Large, sparse datasets can "leak" private information when correlated with external data, even when significantly sub-sampled, perturbed, or otherwise masked [Narayanan and Shmatikov 2008]

    • Repeated release of perturbation-masked geospatial point data leaks increasing amounts of information, and combining perturbation with aggregation masking does not help [Zimmerman and Pavlik 2008]

    • An attacker who can create seemingly innocuous relationships in a network can identify other relationships in the same network [Backstrom et al. 2007]

    • Pseudonymous communication can be linked through textual analysis [Novak et al. 2004]

    • k-anonymized data remain vulnerable if the sensitive values within a group are homogeneous, or if the attacker has enough background knowledge; l-diversity has been offered as a replacement [Machanavajjhala et al. 2007]

  • Additional anonymization challenges for geospatial data

    • Very fine-grained locations, versus the coarse geographic aggregation (state level or larger) required by HIPAA and used by large social science surveys

    • Attackers are very likely to have background knowledge

      • Easy to integrate with other datasets

      • Some data points may be directly observable

    • Sequences of locations even more challenging

      • May cross aggregation units

      • Repetitive, temporally correlated

      • Induce unique networks
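Zimmerman and Pavlik's finding on repeated releases can be illustrated with a deliberately simplified model: each release displaces the true point with fresh, independent Gaussian noise, and an attacker simply averages the releases. This is an assumption for illustration, not the masking scheme analyzed in the paper; `sigma` is taken to be in metres of a projected coordinate system, and the function names are hypothetical.

```python
import math
import random


def masked_release(x: float, y: float, sigma: float, rng: random.Random):
    """One release of a point masked by independent Gaussian displacement (sd sigma per axis)."""
    return x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)


def averaging_attack_error(sigma: float, n_releases: int, trials: int = 2000, seed: int = 0) -> float:
    """Mean distance between a true point and the average of n independently masked releases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # The true location sits at the origin; only the displacement matters.
        releases = [masked_release(0.0, 0.0, sigma, rng) for _ in range(n_releases)]
        mean_x = sum(px for px, _ in releases) / n_releases
        mean_y = sum(py for _, py in releases) / n_releases
        total += math.hypot(mean_x, mean_y)
    return total / trials


for n in (1, 4, 16, 64):
    # For Gaussian masks the expected error is about 1.25 * sigma / sqrt(n).
    print(n, round(averaging_attack_error(sigma=500.0, n_releases=n)))
```

Under this model the attacker's error shrinks in proportion to 1/√n, so each additional independently masked release of the same point narrows down its true location.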
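The homogeneity weakness of k-anonymity can be checked mechanically. A minimal sketch, assuming a released table of dict rows with named quasi-identifier columns: it computes the table's k (the smallest group of rows sharing the same quasi-identifier values) and its distinct l (the fewest distinct sensitive values in any such group). Machanavajjhala et al. also define stricter entropy and recursive variants of l-diversity that this sketch does not implement; the column names, rows and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

Row = Mapping[str, Hashable]


def equivalence_classes(rows: Sequence[Row], quasi_ids: Sequence[str], sensitive: str) -> Dict[Tuple, List[Hashable]]:
    """Group the sensitive values by each distinct combination of quasi-identifier values."""
    groups: Dict[Tuple, List[Hashable]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row[sensitive])
    return groups


def k_of(rows: Sequence[Row], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Size of the smallest equivalence class: the table is k-anonymous for every k up to this."""
    return min(len(vals) for vals in equivalence_classes(rows, quasi_ids, sensitive).values())


def distinct_l_of(rows: Sequence[Row], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Fewest distinct sensitive values in any class (the simplest, 'distinct' form of l-diversity)."""
    return min(len(set(vals)) for vals in equivalence_classes(rows, quasi_ids, sensitive).values())


# Hypothetical released table: generalized location and age band are the quasi-identifiers.
table = [
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "021", "age": "30-39", "diagnosis": "flu"},
    {"zip3": "022", "age": "40-49", "diagnosis": "flu"},
    {"zip3": "022", "age": "40-49", "diagnosis": "asthma"},
    {"zip3": "022", "age": "40-49", "diagnosis": "diabetes"},
]
qi = ["zip3", "age"]
print(k_of(table, qi, "diagnosis"), distinct_l_of(table, qi, "diagnosis"))  # -> 3 1
```

The table is 3-anonymous, yet anyone known to be a 30-39-year-old in ZIP prefix 021 is revealed to have the flu, because that group's sensitive values are homogeneous (l = 1).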



Managing Privacy Issues With Digital Libraries

  • Embedding all sensitive data access in a digital library can greatly improve subject privacy:

    • Authentication, vetting, and access control

    • Standardized license terms governing analysis (derived from metadata and data characteristics)

    • Models can be run on-line without access to raw data

    • Monitoring and auditing of data use

    • Limit the sequence of analyses a user can run, in some cases (for promising results, see [Dwork et al. 2006])
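Limiting a user's sequence of analyses can be made precise with a privacy budget. A minimal sketch, assuming the Laplace mechanism associated with Dwork et al. (noise scale = sensitivity / epsilon) and simple sequential composition, under which the epsilons of successive queries add up; the class and method names are hypothetical, it does not describe any existing digital-library system, and it is not hardened for production use.

```python
import random
from typing import Callable, Iterable, Optional


class BudgetedCountServer:
    """Hypothetical front end that answers counting queries with Laplace noise
    and refuses further queries once a user's total epsilon is spent."""

    def __init__(self, rows: Iterable, total_epsilon: float, seed: Optional[int] = None):
        self._rows = list(rows)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    def _laplace(self, scale: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        return self._rng.expovariate(1.0 / scale) - self._rng.expovariate(1.0 / scale)

    def noisy_count(self, predicate: Callable[[object], bool], epsilon: float) -> float:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise PermissionError("privacy budget exhausted for this user")
        self._remaining -= epsilon  # sequential composition: epsilons add up
        true_count = sum(1 for r in self._rows if predicate(r))
        sensitivity = 1.0  # adding or removing one person changes a count by at most 1
        return true_count + self._laplace(sensitivity / epsilon)

    @property
    def remaining_budget(self) -> float:
        return self._remaining


server = BudgetedCountServer(range(100), total_epsilon=1.0, seed=42)
print(server.noisy_count(lambda r: r < 50, epsilon=0.5))
print(server.noisy_count(lambda r: r % 2 == 0, epsilon=0.5))
try:
    server.noisy_count(lambda r: r > 90, epsilon=0.1)
except PermissionError as err:
    print("refused:", err)
```

Keeping the raw rows on the server and exposing only noisy answers matches the "models can be run on-line without access to raw data" point; the budget check is what enforces the limit on a user's sequence of analyses.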



Federated and Virtually Hosted Digital Libraries

http://dvn.iq.harvard.edu/



Summary

  • Georeferencing would (partially) solve big problems for social sciences: measurement error, data integration

  • Privacy is likely the fundamental challenge for social scientists using this data

  • The privacy problem may never be fully solved mathematically

  • Digital libraries can provide leverage for management of data privacy issues with social, legal and technical means



References

  • M. Altman and M.P. McDonald. 2008. "Better Automated Redistricting." Journal of Statistical Software, forthcoming.

  • L. Backstrom, C. Dwork, and J. Kleinberg. 2007. "Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography." Proceedings of the 16th International World Wide Web Conference.

  • H.E. Brady and I. Hui. 2006. "Is It Worth Going the Extra Mile to Improve Causal Inference?" Political Methodology Annual Meeting, Davis.

  • F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti. 2007. "Real-Time Urban Monitoring Using Cellular Phones: A Case-Study in Rome." Working Paper #1, SENSEable City Laboratory, MIT, Cambridge, MA. http://senseable.mit.edu/papers/ (see also the Real Time Rome Project, http://senseable.mit.edu/realtimerome/)

  • C. Dwork, F. McSherry, K. Nissim, and A. Smith. 2006. "Calibrating Noise to Sensitivity in Private Data Analysis." Proceedings of the 3rd IACR Theory of Cryptography Conference.

  • J. Gibson and D. McKenzie. 2007. "Using Global Positioning Systems in Household Surveys for Better Economics and Better Policy." The World Bank Research Observer 22(2): 217-241.

  • A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. 2007. "l-Diversity: Privacy Beyond k-Anonymity." ACM Transactions on Knowledge Discovery from Data 1(1): 1-52.

  • D. McKenzie and Y.S. Sakho. 2007. "Does It Pay Firms to Register for Taxes? The Impact of Formality on Firm Profitability." Washington, DC: World Bank.

  • A. Narayanan and V. Shmatikov. 2008. "Robust De-anonymization of Large Sparse Datasets." Proceedings of the 29th IEEE Symposium on Security and Privacy, forthcoming.

  • J. Novak, P. Raghavan, and A. Tomkins. 2004. "Anti-aliasing on the Web." Proceedings of the 13th International Conference on World Wide Web.

  • Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council. 2007. Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. National Academies Press.

  • J. Snow. 1855. On the Mode of Communication of Cholera. London.

  • J.R. Weeks, A. Hill, D. Stow, A. Getis, and D. Fugate. 2007. "Can We Spot a Neighborhood from the Air? Defining Neighborhood Structure in Accra, Ghana." GeoJournal 69(1-2): 9-22.

  • D.L. Zimmerman and C. Pavlik. 2008. "Quantifying the Effects of Mask Metadata, Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data." Geographical Analysis 40: 52-76.



Contact Information

http://maltman.hmdc.harvard.edu/

<[email protected]>
