Introduction


**Sensitive Data in a Wired World**
**Negative Representations of Data**
Stephanie Forrest
Dept. of Computer Science, Univ. of New Mexico, Albuquerque, NM
http://cs.unm.edu/~forrest
forrest@cs.unm.edu

**Introduction**
• Goal: Develop new approaches to data security and privacy that incorporate design principles from living systems:
  • Survivability and evolvability
  • Autonomy
  • Robustness, adaptation and self-repair
  • Diversity
• Extends earlier work on computational properties of the immune system:
  • Intrusion detection
  • Automated response
  • Collaborative information filtering

**Project Overview**
• Immunology and data:
  • Negative representations of information
• Epidemiology and the Internet:
  • Social networks matter
  • The real world is not always scale-free
• The social utility of privacy:
  • Why is privacy an important value in democratic societies?
  • Evolutionary perspective

**Collaborations**
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
• Fernando Esponda's post-doc next year.

**How the Immune System Distributes Detection**
• Many small detectors match nonself (negative detection).
• Each detector matches multiple patterns (generalization).
• Advantages of distributed negative detection:
  • Localized (no communication costs)
  • Scalable and tunable
  • Robust (no single point of failure)
  • Private

**Applications to Computing**
• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress)
• A positive set DB is a set of fixed-length strings.
• A negative set NDB represents all the strings not in DB.
• Intuition: If an adversary obtains a string from NDB, little information about DB is revealed.
Example:
• U = all possible four-character strings (over the 26 lowercase letters)
• DB = {juan, eric, dave}
• U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
• There are 26^4 − 3 = 456973 strings in U-DB.

**Results**
• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
• YES: There is an algorithm that creates an NDB of size polynomial in |DB|.
  • Strategy: Compress information using a "don't care" symbol. Other representations?
• What properties does the representation have?
  • Membership queries are tractable (linear time even without indexing).
  • Other queries and information leakage are future work.
  • Inferring information from a subset of NDB (next slide).
  • Inferring DB from NDB is NP-hard (note: this is not a cryptographic scheme):
    • Currently investigating instance difficulty.
    • Algorithms for increasing instance difficulty.
  • On-line insert/delete algorithms preserve problem difficulty.
• Collaborations with R. Wright, M. de Mare, and C. Moore.

**What information is revealed by queries? (without assuming irreversibility)**
• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
  • Assume NDB (or DB) is partitioned into n subsets.
  • For the query "Is x in DB?", what do I learn about x if x is not in my subset?
  • Must consult all n subsets of NDB to conclude that x is in DB.
  • Must consult the subsets only until x is found (on average n/2).
  • Assumes that we care more about DB than U-DB.
[Figure: Probability and information content as the membership of strings is revealed. DB contains 10% of all possible L-length strings (formulas).]

**Private Set Intersection**
• Determine which records are in the intersection of several databases, i.e.
  • DB1 ∩ DB2 ∩ … ∩ DBn
  • = U − (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Each party i may compute the intersection as
  • DBi − (NDB1 ∪ NDB2 ∪ … ∪ NDBn)
• Party i learns only the intersection of all the sets,
  • and not the cardinalities of the other sets.

**Results cont.**
• How might these properties be useful?
  • Protect data from insider attacks
  • Computing set intersections
  • Surveys involving sensitive information
  • Anonymous digital credentials
  • Fingerprint databases
  • Other ideas?
• Prototype implementations:
  • Perl, C
  • http://esa.ackleyshack.com/ndb
  • See demo

**Computer Epidemiology**
Justin Balthrop, Mark Newman, Matt Williamson
• Information spreads over networks of social contacts between computers:
  • Email address books.
  • URL links.
• Network topology affects the rate and extent of spreading:
  • Epidemiological models, and the epidemic threshold.
• Controlling spread on scale-free networks:
  • Random vaccination is ineffective (e.g., anti-virus software).
  • Targeted vaccination of high-connectivity nodes.
  • Control degree distribution in time rather than space.
Science 304:527-529 (2004)

**The Social Utility of Privacy**
Robert Axelrod and Ryan Gerety
• Typical framing:
  • Privacy values should remain as is (e.g., Lessig).
  • Individual rights vs. the state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy (or in not defending it), independent of the civil liberties argument:
  • To promote innovation in changing environments.
  • To cope with distortions (e.g., overconfidence of middle managers).
  • To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
  • From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
  • Exploratory modeling based on simple games.

**Next Steps: Negative Representations**
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
  • Select, join, etc.
• Instance difficulty:
  • Hiding given satisfying assignments in a SAT formula
• Approximate representations
• Other representations?
• More realistic implementations
• Negative data mining:
  • Is it easier or harder to find certain instances in NDB?
• Imprecise representations:
  • Partial matching and queries
  • Learning algorithms

**People**
Stephanie Forrest, Fernando Esponda, Paul Helman, Elena Ackley

**Publications**
• F. Esponda, S. Forrest, and P. Helman. "Negative representations of information." International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1, pp. 357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).
• H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
• F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS), Best Paper Award (2004).

**Probabilities**

**Generating Hard-to-Reverse Negative Databases**
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative databases.
• The Morph operator may be used to search for hard instances.
H. Jia, C. Moore and B. Selman. "From spin glasses to hard satisfiable formulas." SAT 2004.

**Effect of the Morph operation**
• The Morph operation takes as input a negative database NDB and outputs NDB' that represents the same set U-DB.
• The plot shows how the complexity of a database changes after applying the Morph operator.
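The arithmetic on the Example slide can be checked directly, assuming U is built from the 26 lowercase letters. The function names below are illustrative only and are not taken from the project's Perl or C prototypes. Explicit enumeration of U-DB is feasible only for toy parameters, which is the motivation for a compressed NDB.

```python
from itertools import product


def negative_set_size(alphabet_size: int, length: int, db: set[str]) -> int:
    """|U-DB|, where U holds every string of `length` symbols over the alphabet."""
    return alphabet_size ** length - len(db)


def negative_set(alphabet: str, length: int, db: set[str]) -> list[str]:
    """Enumerate U-DB explicitly (only feasible for tiny alphabets and lengths)."""
    universe = ("".join(p) for p in product(alphabet, repeat=length))
    return [s for s in universe if s not in db]


DB = {"juan", "eric", "dave"}
print(negative_set_size(26, 4, DB))   # 456973
print(negative_set("ab", 2, {"ab"}))  # ['aa', 'ba', 'bb']
```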
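The Results slide says an NDB of size polynomial in |DB| can be built by compressing with a "don't care" symbol, and that membership queries take linear time. One simple construction of that kind is sketched below: a prefix-based scheme written for illustration, not presented as the randomized algorithm mentioned later in the deck. The `*` symbol and all function names are assumptions. Note that this particular construction is easy to reverse (its specified prefixes expose the DB prefix tree), so it shows the compression and the query, not the NP-hard inversion.

```python
from string import ascii_lowercase

DONT_CARE = "*"  # assumed don't-care symbol


def build_ndb(db: set[str], alphabet: str, length: int) -> list[str]:
    """Build records over alphabet + '*' that match exactly the strings not in db.

    For each prefix p of some DB string and each symbol c such that p + c is
    not itself a DB prefix, emit p + c followed by don't-cares. A string outside
    DB leaves the DB prefix tree at a unique first position, so it matches
    exactly one record; a DB string never leaves the tree and matches none.
    """
    prefixes = {""} | {s[:i] for s in db for i in range(1, length + 1)}
    ndb = []
    for i in range(length):
        for p in sorted(q for q in prefixes if len(q) == i):
            for c in alphabet:
                if p + c not in prefixes:
                    ndb.append(p + c + DONT_CARE * (length - i - 1))
    return ndb


def matches(record: str, x: str) -> bool:
    """A record matches x if every specified (non-'*') position agrees."""
    return all(r == DONT_CARE or r == c for r, c in zip(record, x))


def in_db(ndb: list[str], x: str) -> bool:
    """x is in DB iff no NDB record matches it: one linear scan, no index."""
    return not any(matches(r, x) for r in ndb)


ndb = build_ndb({"010", "111"}, "01", 3)
print(ndb)                                   # ['00*', '10*', '011', '110']
print(in_db(ndb, "010"), in_db(ndb, "011"))  # True False

names = build_ndb({"juan", "eric", "dave"}, ascii_lowercase, 4)
print(len(names))  # 248 records standing in for 456973 strings
```

The record count grows with string length, |DB| and alphabet size rather than with |U-DB|, which is the point of the "don't care" compression.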
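The NP-hardness bullet on the Results slide, and the "hiding given satisfying assignments in a SAT formula" item under Next Steps, rest on a correspondence that is easy to see for a binary alphabet: each NDB record forbids the strings it matches, so it can be read as one CNF clause, and DB is exactly the set of satisfying assignments. The sketch below only illustrates that correspondence (DIMACS-style integer literals and the function names are assumptions); the hardness claim itself is the slides', and the brute-force recovery shown takes 2^length steps.

```python
from itertools import product


def record_to_clause(record: str) -> list[int]:
    """Turn a binary NDB record into a CNF clause (DIMACS-style literals).

    Variable i + 1 is true when bit i is '1'. The clause says "x differs from
    the record at some specified position", i.e. x is NOT matched by it.
    """
    clause = []
    for i, bit in enumerate(record):
        if bit == "0":
            clause.append(i + 1)     # differ from '0': variable must be 1
        elif bit == "1":
            clause.append(-(i + 1))  # differ from '1': variable must be 0
    return clause


def satisfies(assignment: str, clause: list[int]) -> bool:
    """True if at least one literal of the clause holds under the assignment."""
    return any((assignment[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)


def recover_db_by_brute_force(ndb: list[str], length: int) -> set[str]:
    """DB = satisfying assignments of the CNF; this search takes 2**length steps."""
    cnf = [record_to_clause(r) for r in ndb]
    candidates = ("".join(bits) for bits in product("01", repeat=length))
    return {x for x in candidates if all(satisfies(x, c) for c in cnf)}


ndb = ["00*", "10*", "011", "110"]
print([record_to_clause(r) for r in ndb])  # [[1, 2], [-1, 2], [1, -2, -3], [-1, -2, 3]]
print(sorted(recover_db_by_brute_force(ndb, 3)))  # ['010', '111']
```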
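The query-cost bullets on the "What information is revealed by queries?" slide can be made concrete by assuming one NDB is split across n holders who are consulted in a fixed order; the split and the function names are illustrative. A matching record anywhere proves x is not in DB and stops the search, while concluding that x is in DB requires all n subsets. Under the extra assumption that the deciding subset is equally likely to be any of the n, the mean cost is (n + 1)/2, which is the slide's "on average n/2" for large n.

```python
from fractions import Fraction

DONT_CARE = "*"  # assumed don't-care symbol


def matches(record: str, x: str) -> bool:
    """A record matches x if every specified (non-'*') position agrees."""
    return all(r == DONT_CARE or r == c for r, c in zip(record, x))


def query_partitioned_ndb(subsets: list[list[str]], x: str) -> tuple[bool, int]:
    """Ask the holders of NDB in turn whether any of their records match x.

    Returns (x_in_db, subsets_consulted). A match ends the search early;
    concluding that x IS in DB needs every subset.
    """
    for consulted, subset in enumerate(subsets, start=1):
        if any(matches(r, x) for r in subset):
            return False, consulted
    return True, len(subsets)


def expected_consultations(n: int) -> Fraction:
    """Mean number of subsets consulted when the answer lies in one of n
    subsets chosen uniformly at random: (1 + 2 + ... + n) / n."""
    return Fraction(sum(range(1, n + 1)), n)


subsets = [["00*"], ["10*"], ["011"], ["110"]]  # one NDB split four ways
print(query_partitioned_ndb(subsets, "010"))     # (True, 4)
print(query_partitioned_ndb(subsets, "000"))     # (False, 1)
print(expected_consultations(10))                # 11/2
```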
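The set algebra on the Private Set Intersection slide can be checked on small data: party i keeps each of its own strings that no party's NDB matches, which is DBi − (NDB1 ∪ … ∪ NDBn). This sketch reuses a hypothetical prefix-based NDB builder so the block runs on its own. That builder is easy to reverse, so this code demonstrates only the intersection logic, not the privacy property (hiding the other sets' cardinalities), which depends on hard-to-reverse NDBs.

```python
DONT_CARE = "*"  # assumed don't-care symbol


def build_ndb(db: set[str], alphabet: str, length: int) -> list[str]:
    """Illustrative prefix-based NDB: records matching exactly U-DB."""
    prefixes = {""} | {s[:i] for s in db for i in range(1, length + 1)}
    ndb = []
    for i in range(length):
        for p in sorted(q for q in prefixes if len(q) == i):
            for c in alphabet:
                if p + c not in prefixes:
                    ndb.append(p + c + DONT_CARE * (length - i - 1))
    return ndb


def matches(record: str, x: str) -> bool:
    """A record matches x if every specified (non-'*') position agrees."""
    return all(r == DONT_CARE or r == c for r, c in zip(record, x))


def intersect_with_ndbs(my_db: set[str], ndbs: list[list[str]]) -> set[str]:
    """Party i's view: keep its own strings that no party's NDB matches,
    i.e. DB_i - (NDB_1 | ... | NDB_n)."""
    return {x for x in my_db
            if not any(matches(r, x) for ndb in ndbs for r in ndb)}


alphabet, length = "01", 3
dbs = [{"001", "010", "111"}, {"001", "100", "111"}, {"001", "110", "111"}]
ndbs = [build_ndb(db, alphabet, length) for db in dbs]
print(sorted(intersect_with_ndbs(dbs[0], ndbs)))  # ['001', '111']
```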
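The Computer Epidemiology slide contrasts random and targeted vaccination on scale-free networks. A rough way to see why uses the heterogeneous mean-field estimate of the SIS epidemic threshold, <k>/<k^2>: that formula is a standard network-epidemiology result, not something stated on the slide. The sketch assumes a Barabási-Albert-style preferential-attachment graph as the scale-free network and an arbitrary 5% vaccination budget. Removing hubs cuts <k^2> sharply and so raises the threshold, while random removal leaves it nearly unchanged.

```python
import random


def degrees(n: int, edges: list[tuple[int, int]], removed=frozenset()) -> dict[int, int]:
    """Degree of every node still present after `removed` nodes are vaccinated."""
    deg = {v: 0 for v in range(n) if v not in removed}
    for u, v in edges:
        if u not in removed and v not in removed:
            deg[u] += 1
            deg[v] += 1
    return deg


def epidemic_threshold(n: int, edges: list[tuple[int, int]], removed=frozenset()) -> float:
    """Heterogeneous mean-field SIS threshold <k> / <k^2> on the remaining nodes."""
    ks = list(degrees(n, edges, removed).values())
    mean_k = sum(ks) / len(ks)
    mean_k2 = sum(k * k for k in ks) / len(ks)
    return mean_k / mean_k2


def preferential_attachment(n: int, m: int, rng: random.Random) -> list[tuple[int, int]]:
    """Barabasi-Albert-style graph: each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    edges = [(u, v) for v in range(m + 1) for u in range(v)]  # seed clique
    pool = [x for e in edges for x in e]  # node listed once per edge endpoint
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            edges.append((t, new))
            pool.extend((t, new))
    return edges


rng = random.Random(1)
n = 2000
edges = preferential_attachment(n, 2, rng)
deg = degrees(n, edges)
budget = n // 20  # vaccinate 5% of the nodes (arbitrary choice)
random_pick = frozenset(rng.sample(range(n), budget))
hubs = frozenset(sorted(deg, key=deg.get, reverse=True)[:budget])
print(round(epidemic_threshold(n, edges), 3))               # no vaccination
print(round(epidemic_threshold(n, edges, random_pick), 3))  # random vaccination
print(round(epidemic_threshold(n, edges, hubs), 3))         # targeted vaccination
```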
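The slides do not define the Morph operator; they state only its invariant, that NDB' must represent the same set U-DB as NDB. The sketch below checks that invariant by brute force on a tiny example. `expand_dont_care` is a hypothetical, deliberately trivial set-preserving rewrite used as a stand-in: it is not the Morph operator and does nothing to make an instance harder.

```python
from itertools import product

DONT_CARE = "*"  # assumed don't-care symbol


def matches(record: str, x: str) -> bool:
    """A record matches x if every specified (non-'*') position agrees."""
    return all(r == DONT_CARE or r == c for r, c in zip(record, x))


def represented_set(ndb: list[str], alphabet: str, length: int) -> set[str]:
    """Brute force: every string matched by some record, i.e. U-DB."""
    universe = ("".join(p) for p in product(alphabet, repeat=length))
    return {x for x in universe if any(matches(r, x) for r in ndb)}


def expand_dont_care(ndb: list[str], index: int, position: int, alphabet: str) -> list[str]:
    """Hypothetical set-preserving rewrite: replace the '*' at `position` in
    record `index` with one copy of that record per alphabet symbol."""
    record = ndb[index]
    if record[position] != DONT_CARE:
        raise ValueError("position must hold a don't-care symbol")
    copies = [record[:position] + c + record[position + 1:] for c in alphabet]
    return ndb[:index] + copies + ndb[index + 1:]


ndb = ["00*", "10*", "011", "110"]
ndb2 = expand_dont_care(ndb, 0, 2, "01")
print(ndb2)  # ['000', '001', '10*', '011', '110']
print(represented_set(ndb, "01", 3) == represented_set(ndb2, "01", 3))  # True
```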