Georeferencing in the Social Sciences – Promise and Peril
Micah Altman
Harvard University
Archival Director, Henry A. Murray Research Archive
Associate Director, Harvard-MIT Data Center
Senior Research Scientist, Institute for Quantitative Social Sciences


E: [email protected]
W: http://maltman.hmdc.harvard.edu/



The Structural Challenges for Progress in Social Sciences

  • Pervasive Measurement Error

  • Scattered Data

  • Controlled Experiments not Available in Many Fields

  • Weak Theory


Georeferencing Can Make Measurements Far More Accurate

  • E.g. travel, time spent exercising, commutes, time at work, agriculture, distance to voting booth

Correlation between reported and real distance to tax office. Source: [McKenzie and Sakho 2007, as quoted in Gibson and McKenzie 2007]

LA voting precincts relocated. Source: [Brady and Hui 2006]
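
As a rough illustration of measuring distance from coordinates instead of relying on self-reports, the sketch below computes a great-circle distance with the haversine formula. It treats the Earth as a sphere, which is an approximation. The coordinates, the reported distance, and the name `haversine_km` are made up for the example and do not come from the cited studies.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Hypothetical respondent: home and polling-place coordinates (made up).
home = (34.0522, -118.2437)
polling_place = (34.0622, -118.2437)
reported_km = 2.0  # hypothetical self-reported distance

measured_km = haversine_km(*home, *polling_place)
reporting_error_km = reported_km - measured_km
print(f"measured {measured_km:.2f} km, reporting error {reporting_error_km:+.2f} km")
```

Straight-line distance is only one possible measure; travel distance along a road network would need routing data that this sketch does not assume.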


Georeferencing Can Unify Data

  • Establishing comparability of most social science measurements is a major undertaking

  • Yet… most social science phenomena are unambiguously located in time and space

  • Complete georeferencing would link almost all datasets at a basic conceptual level

  • However, most social science data is not yet georeferenced … this is an engineering challenge

  • Once done, coincident concepts can be revealed …

Source: [Weeks, et al. 2007]
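
A minimal sketch of how location alone can link two datasets that share no identifier: each record is assigned to a grid cell, and records falling in the same cell are paired. The datasets, field names, cell size, and the helpers `cell_key` and `link_by_cell` are all hypothetical, chosen only to illustrate the idea.

```python
import math
from collections import defaultdict

def cell_key(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def link_by_cell(left, right, cell_deg=0.01):
    """Pair each record in `left` with every record in `right` in the same grid cell."""
    index = defaultdict(list)
    for rec in right:
        index[cell_key(rec["lat"], rec["lon"], cell_deg)].append(rec)
    pairs = []
    for rec in left:
        for match in index.get(cell_key(rec["lat"], rec["lon"], cell_deg), []):
            pairs.append((rec, match))
    return pairs

# Two made-up datasets with no common key other than location.
survey = [
    {"respondent": "r1", "lat": 5.6037, "lon": -0.1870},
    {"respondent": "r2", "lat": 5.6512, "lon": -0.2010},
]
imagery = [
    {"tile": "t1", "vegetation_index": 0.31, "lat": 5.6041, "lon": -0.1866},
    {"tile": "t2", "vegetation_index": 0.12, "lat": 5.7000, "lon": -0.1000},
]

linked = link_by_cell(survey, imagery)
for person, tile in linked:
    print(person["respondent"], "->", tile["tile"], tile["vegetation_index"])
```

A fixed grid is the simplest possible choice: two nearby points on opposite sides of a cell boundary are not linked. Real workflows would more likely use a GIS library with proper polygon or nearest-neighbor joins.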


Can Georeferencing Fix Experiments and Theory?

  • Not in general … although visualizations may help

Source: [Altman & McDonald 2008]

Source: [Snow 1855]

Source: [Calabrese, et al 2007; Real Time Rome Project 2007]


Mountains of Unified, Accurate Data… What’s not to like?

  • “The increasing use of linked social-spatial data has created significant uncertainties about the ability to protect the confidentiality promised to research participants... At this time, however, no known technical strategy … adequately resolves conflicts among the objectives of data linkage, open access, data quality, and confidentiality protection across datasets and data uses” -- [Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council, 2007]


Can Privacy Problems be Fixed?

  • Maybe not. Some challenging findings:

    • Large, sparse datasets can “leak” private information when correlated with external data, even when significantly sub-sampled, perturbed, etc. [Narayanan and Shmatikov 2008]

    • Repeated release of perturbation-masked geospatial point data leaks increasing amounts of information. Combining perturbation with aggregation masking does not help [Zimmerman and Pavlik 2008]

    • An attacker who can create seemingly innocuous relationships in an anonymized network can identify other relationships in the same network [Backstrom et al. 2007]

    • Pseudonymous communication can be linked through textual analysis [Novak et al. 2004]

    • K-anonymized data are still vulnerable if the sensitive values are homogeneous or the attacker has enough background knowledge. L-diversity is offered as a replacement [Machanavajjhala et al. 2007]

  • Additional anonymization challenges for geospatial data

    • Very fine-grained location, versus the multi-state aggregation mask required by HIPAA and by large social science surveys

    • Background knowledge very likely

      • Easy to integrate with other datasets

      • Some data points may be directly observable

    • Sequences of locations even more challenging

      • May cross aggregation units

      • Repetitive, temporally correlated

      • Induce unique networks
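
The k-anonymity weakness noted above can be seen in a few lines. The sketch below groups records by their quasi-identifiers and reports k (the smallest group size) and the simplest "distinct" form of l (the smallest number of different sensitive values in any group). The records, field names, and the helper `k_and_l` are hypothetical, and Machanavajjhala et al. define stronger variants of l-diversity that are not implemented here.

```python
from collections import defaultdict

def k_and_l(records, quasi_ids, sensitive):
    """Return (k, l) for a non-empty table.

    k: size of the smallest group of records sharing the same quasi-identifiers.
    l: smallest number of distinct sensitive values within any such group.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec[sensitive])
    k = min(len(values) for values in groups.values())
    l_distinct = min(len(set(values)) for values in groups.values())
    return k, l_distinct

# Made-up released table: coarse location and age band are the quasi-identifiers.
released = [
    {"zip3": "021", "age_band": "30-39", "diagnosis": "flu"},
    {"zip3": "021", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip3": "021", "age_band": "40-49", "diagnosis": "diabetes"},
    {"zip3": "021", "age_band": "40-49", "diagnosis": "diabetes"},
]

k, l_distinct = k_and_l(released, ["zip3", "age_band"], "diagnosis")
print(f"k = {k}, distinct l = {l_distinct}")
```

Here every group has two members, so the table is 2-anonymous, yet the second group is homogeneous: anyone known to be in it is revealed to have the same diagnosis, which is the failure l-diversity is meant to catch.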


Managing Privacy Issues With Digital Libraries

  • Embedding all sensitive data access in a digital library can greatly improve subject privacy:

    • Authentication, vetting, and access control

    • Standardized license terms governing analysis (derived from metadata and data characteristics)

    • Models can be run on-line without access to raw data

    • Monitoring and auditing of data use

    • Limit the sequence of analyses a user may run, in some cases (for promising results, see [Dwork et al. 2006])
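
One way to read the last two points together is a server that answers queries with calibrated noise and stops answering once a per-user budget is used up. The sketch below uses Laplace noise scaled to the query's sensitivity, in the spirit of [Dwork et al. 2006], and charges each query against a total epsilon (assuming simple additive composition). The class `BudgetedCounter`, its parameters, and the data are hypothetical; this is not a description of any existing digital library's implementation.

```python
import random

class BudgetedCounter:
    """Answers counting queries with Laplace noise until a privacy budget is spent."""

    def __init__(self, rows, total_epsilon, seed=None):
        self._rows = list(rows)
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining + 1e-12:
            raise PermissionError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for row in self._rows if predicate(row))
        # A count changes by at most 1 when one person is added or removed
        # (sensitivity 1), so the Laplace scale is 1 / epsilon.
        # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return true_count + noise

# Made-up data held by the library; the user never sees the raw rows.
rows = [{"commute_km": d} for d in (1.2, 4.8, 9.5, 15.0, 22.3)]
library = BudgetedCounter(rows, total_epsilon=1.0, seed=0)

answers = [
    library.noisy_count(lambda r: r["commute_km"] > 5, epsilon=0.5),
    library.noisy_count(lambda r: r["commute_km"] > 10, epsilon=0.5),
]
try:
    library.noisy_count(lambda r: r["commute_km"] > 20, epsilon=0.5)
    refused = False
except PermissionError:
    refused = True
print(answers, "third query refused:", refused)
```

Smaller epsilon values give noisier answers but allow more queries before the budget runs out, which is the trade-off between data quality and confidentiality that the library has to manage.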


Federated and Virtually Hosted Digital Libraries

http://dvn.iq.harvard.edu/


Summary

  • Georeferencing would (partially) solve big problems for social sciences: measurement error, data integration

  • Privacy is likely the fundamental challenge for social scientists using this data

  • Privacy problem may never be fully solved mathematically

  • Digital libraries can provide leverage for management of data privacy issues with social, legal and technical means


References

  • M. Altman, M.P. McDonald, 2008. "Better Automated Redistricting", Journal of Statistical Software, forthcoming.

  • L. Backstrom, C. Dwork, J. Kleinberg, 2007. "Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography", Proceedings of the 16th International World Wide Web Conference.

  • H.E. Brady, I. Hui, 2006. "Is It Worth Going the Extra Mile to Improve Causal Inference?", Political Methodology Annual Meeting, Davis.

  • F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, C. Ratti, 2007. "Real-Time Urban Monitoring Using Cellular Phones: a Case-Study in Rome", Working paper #1, SENSEable City Laboratory, MIT, Boston, http://senseable.mit.edu/papers/ [also see the Real Time Rome Project, http://senseable.mit.edu/realtimerome/]

  • C. Dwork, F. McSherry, K. Nissim, A. Smith, 2006. "Calibrating Noise to Sensitivity in Private Data Analysis", Proceedings of the 3rd IACR Theory of Cryptography Conference.

  • J. Gibson, D. McKenzie, 2007. "Using Global Positioning Systems in Household Surveys for Better Economics and Better Policy", The World Bank Research Observer 22(2): 217-241.

  • A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, 2007. "l-Diversity: Privacy Beyond k-Anonymity", ACM Transactions on Knowledge Discovery from Data 1(1): 1-52.

  • D. McKenzie, Y.S. Sakho, 2007. "Does It Pay Firms to Register for Taxes? The Impact of Formality on Firm Profitability", World Bank, Washington, D.C.

  • A. Narayanan, V. Shmatikov, 2008. "Robust De-anonymization of Large Sparse Datasets", Proceedings of the 29th IEEE Symposium on Security and Privacy, forthcoming.

  • J. Novak, P. Raghavan, A. Tomkins, 2004. "Anti-aliasing on the Web", Proceedings of the 13th International Conference on World Wide Web.

  • Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council, 2007. Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. National Academies Press.

  • J. Snow, 1855. On the Mode of Communication of Cholera. London.

  • J.R. Weeks, A. Hill, D. Stow, A. Getis, D. Fugate, 2007. "Can we spot a neighborhood from the air? Defining neighborhood structure in Accra, Ghana", GeoJournal 69(1-2): 9-22.

  • D.L. Zimmerman, C. Pavlik, 2008. "Quantifying the Effects of Mask Metadata, Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data", Geographical Analysis 40: 52-76.


Contact Information

http://maltman.hmdc.harvard.edu/

<[email protected]>
