Order Preserving Encryption for Numeric Data
Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu
IBM Almaden Research Center


Outline

  • Motivation and Introduction

  • OPES encryption

  • Modeling the distribution

  • Experimental evaluation


Motivation

  • Encryption is rapidly becoming a requirement in a myriad of business settings (e.g., health care, financial, retail, government), driven by legislation (e.g., SB 1386, HIPAA)

  • Encrypting databases unleashes a host of problems:

    • Performance slowdown

    • Incompatibility with standard database features

      • E.g. comparison predicates and the use of indexes

    • Changes to applications for encryption

      • Encryption functions now appear in queries


Order Preserving Encryption Function

If E is an order preserving encryption function, p1 and p2 are two plaintext values, c1 = E(p1), and c2 = E(p2), then

if (p1 < p2) then (c1 < c2)
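The definition can be checked mechanically on a finite set of plaintexts. The helper below, `is_order_preserving`, is a hypothetical illustration and not part of OPES; because strict `<` is transitive, it only needs to compare adjacent values after sorting. The affine toy map is order preserving but offers no security at all.

```python
from typing import Callable, Iterable


def is_order_preserving(E: Callable[[float], float], plaintexts: Iterable[float]) -> bool:
    """Return True if p1 < p2 implies E(p1) < E(p2) for every pair of distinct plaintexts."""
    xs = sorted(set(plaintexts))
    # Comparing adjacent pairs is enough: strict '<' is transitive.
    return all(E(a) < E(b) for a, b in zip(xs, xs[1:]))


# Toy maps for illustration only; neither is an OPES key.
print(is_order_preserving(lambda p: 5 * p + 3, range(-10, 11)))  # True
print(is_order_preserving(lambda p: p * p, range(-10, 11)))      # False: -10 < -9 but 100 > 81
```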


Threat Model

  • The storage system used by the DBMS is untrusted, i.e. vulnerable to compromise

  • The DBMS software is trusted

  • Ciphertext only attack

    • The adversary has access to all (but only) encrypted values

  • Guard against percentile exposure

    • An adversary should not be able to get even an estimate of true values


Design Goals

  • Query results from OPES will be sound (no false positives) and complete (no false negatives)

  • Comparison operations will be performed without decrypting the operands

  • Standard database indexes can be used over encrypted data

  • Tolerate updates


Integration of Encryption and Query Processing

  • Users have a plaintext view of an encrypted database

  • Queries: plaintext queries are translated by a translation layer into equivalent queries over encrypted data

    • Select name from Emp where sal > 100000

    • is translated into: Select decrypt(“xsxx”) from “cwlxss” where “xescs” > OPESencrypt(100000)

  • DBMS: comparison operators are directly applied over encrypted columns

  • Tables are encrypted using standard as well as order preserving encryption; the DBMS stores the encrypted data and metadata

  • We hereafter strictly focus on the OPES algorithms
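The translation above can be simulated in a few lines to show why the rewritten query returns exactly the same rows: only the constant is encrypted, and the comparison runs directly on ciphertexts. Everything here is a hypothetical stand-in: `ope_encrypt` is a toy strictly increasing map in place of OPESencrypt, and `std_encrypt`/`std_decrypt` use base64 as a placeholder for standard encryption (base64 is not encryption).

```python
import base64


def ope_encrypt(p: int) -> int:
    """Hypothetical stand-in for OPESencrypt: strictly increasing, NOT secure."""
    return 3 * p + 17


def std_encrypt(s: str) -> str:
    """Placeholder for standard (non-order-preserving) encryption; base64 is NOT encryption."""
    return base64.b64encode(s.encode()).decode()


def std_decrypt(s: str) -> str:
    return base64.b64decode(s).decode()


emp = [("alice", 120000), ("bob", 90000), ("carol", 100000), ("dave", 150000)]

# What the DBMS stores: name under standard encryption, sal under order preserving encryption.
enc_emp = [(std_encrypt(name), ope_encrypt(sal)) for name, sal in emp]


def plaintext_query(threshold: int) -> list[str]:
    # Select name from Emp where sal > threshold
    return [name for name, sal in emp if sal > threshold]


def translated_query(threshold: int) -> list[str]:
    # Select decrypt(name) from encrypted Emp where encrypted sal > OPESencrypt(threshold)
    c = ope_encrypt(threshold)                                 # encrypt the constant once
    matches = [ename for ename, esal in enc_emp if esal > c]  # compare ciphertexts only
    return [std_decrypt(ename) for ename in matches]          # decrypt only the result column


print(plaintext_query(100000))   # ['alice', 'dave']
print(translated_query(100000))  # ['alice', 'dave']
```

Any strictly increasing map would give the same result here, which is exactly the property the translation layer relies on.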


Outline

  • Motivation and Introduction

  • OPES encryption

  • Modeling the distribution

  • Experimental evaluation


Approach

  • Plaintext data has unknown distribution

  • User selects the target (ciphertext) distribution

  • Ciphertext values exhibit the target distribution


Effect of OPES Encryption on Plaintext Distributions

[Figure: original, target, and encrypted distributions for two cases. Input: Gaussian, Target: Zipf; Input: Uniform, Target: Zipf]


OPES Key Generation

[Figure: a sample of source values from the plaintext distribution and a sample of target values from the ciphertext distribution are the inputs to OPES key generation, which outputs the OPES key]


OPES Keys

[Figure: the OPES key comprises two mappings, source to uniform (Source → Uniform) and target to uniform (Target → Uniform)]


Two Step Encryption

  • Source (plaintext) to uniform

  • Uniform to target (ciphertext)


OPES Encryption

[Figure: Encrypt maps Source → Uniform (Step I) → Target (Step II); Decrypt applies the two steps in reverse, Target → Uniform → Source]
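The two-step structure can be sketched as a composition of two monotone maps: the source model flattens plaintexts to [0, 1], and the inverse of the target model spreads [0, 1] onto the target distribution; decryption runs both steps backwards. This is a minimal sketch under stated assumptions: `build_knots`, `interp`, and `ToyTwoStepOPE` are hypothetical names, and plain piecewise-linear interpolation over sorted sample points replaces the MDL bucketing and linear-spline model the slides describe. A Pareto sample stands in for the Zipf target.

```python
import bisect
import random


def build_knots(sample):
    """Sorted distinct sample values paired with evenly spaced CDF levels in [0, 1]."""
    xs = sorted(set(sample))
    if len(xs) < 2:
        raise ValueError("need at least two distinct sample values")
    us = [k / (len(xs) - 1) for k in range(len(xs))]
    return xs, us


def interp(x, xs, ys):
    """Piecewise-linear map through strictly increasing knots xs -> increasing ys."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the modeled range [{xs[0]}, {xs[-1]}]")
    k = bisect.bisect_right(xs, x) - 1
    if k == len(xs) - 1:                    # x is exactly the last knot
        return ys[-1]
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    y = ys[k] + t * (ys[k + 1] - ys[k])
    return min(max(y, ys[k]), ys[k + 1])    # guard against rounding past the segment


class ToyTwoStepOPE:
    """Hypothetical two-step order preserving map built from a source and a target sample."""

    def __init__(self, source_sample, target_sample):
        self.src_x, self.src_u = build_knots(source_sample)  # Step I model
        self.tgt_x, self.tgt_u = build_knots(target_sample)  # Step II model

    def encrypt(self, p):
        u = interp(p, self.src_x, self.src_u)     # Step I: source -> uniform
        return interp(u, self.tgt_u, self.tgt_x)  # Step II: uniform -> target

    def decrypt(self, c):
        u = interp(c, self.tgt_x, self.tgt_u)     # undo Step II: target -> uniform
        return interp(u, self.src_u, self.src_x)  # undo Step I: uniform -> source


rng = random.Random(7)
source = [rng.gauss(50_000, 10_000) for _ in range(1_000)]  # plaintext sample
target = [rng.paretovariate(1.5) for _ in range(1_000)]     # heavy-tailed target sample
ope = ToyTwoStepOPE(source, target)
p = sorted(source)[500]
c = ope.encrypt(p)
print(p, c, ope.decrypt(c))
```

Unlike OPES, this toy raises an error for values outside the sampled range; handling those is one of the things the bucketed model in the following slides addresses.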


Outline

  • Motivation and Introduction

  • OPES encryption

  • Modeling the distribution

  • Experimental evaluation


Modeling the Distribution

  • Histograms

    • Equi-depth, equi-width, wavelets

      • The number of buckets required is unreasonably large

      • Overfitting the model

  • Parametric

    • Poor estimation for irregular distributions

  • Hybrid [Konig and Weikum 99]

    • Query result size estimation

    • Approach

      • Partition the data into buckets

      • Model the distribution within a bucket as a spline

      • Fixed number of buckets


Our Approach

  • Hybrid [Konig and Weikum 99]

    • Partition the data into buckets

    • Model the distribution within each bucket as a linear spline

  • The number of buckets is not fixed

  • We use MDL (Minimum Description Length) to determine the number of bucket boundaries


MDL

  • The best model for encoding data minimizes the sum of the cost of

    • Describing the model

    • Describing data in terms of the model


Model Costs

  • Data Cost

    • Using a mapping M from [pl,ph) to [fl,fh), the cost of encoding pi is

      • C(pi) = log|fi − E(i)|, where fi = M(pi) and E(i) is its expected value

      • DC(pl,ph) = C(pl) + C(pl+1) + … + C(ph-1)

  • Incremental Model Cost

    • Fixed cost for each additional bucket

      • Boundary value

      • Boundary parameters

        • Slope

        • Scale factor


Computing Boundaries

  • Growth phase

    • [pl,ph) with h-l-1 sorted points {pl+1,pl+2,…,ph-1}

      • Compute spline for [pl,ph)

      • Compute [fl,fh) using the spline

    • Find further split point ps with fs having the maximum deviation from the expected value

  • Prune phase

    • LB(pl,ph)=DC(pl,ph)-DC(pl,ps)-DC(ps,ph)-IMC

    • GB(pl,ph)=LB(pl,ph)+GB(pl,ps)+GB(ps,ph)

    • if (GB > 0), the split at ps is retained
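The model costs and the growth/prune phases can be sketched together as one recursive pass: grow by splitting at the point that deviates most from its expected value, then keep a split only if its global benefit is positive. This is a simplified sketch, not the authors' implementation: it assumes a flat (uniform) model inside each bucket instead of a linear spline, uses log2(1 + |deviation|) so that a zero deviation costs nothing, treats the global benefit of a pruned subtree as 0, and uses a hypothetical IMC value of 8 bits. The names `expected`, `data_cost`, `split_point`, `build`, and `mdl_boundaries` are illustrative.

```python
import math

IMC = 8.0  # hypothetical incremental model cost (bits) for one more bucket


def expected(p, l, h, i):
    """E(i): where p[i] would sit if the points of [p[l], p[h]) were evenly spread."""
    return p[l] + (i - l) * (p[h] - p[l]) / (h - l)


def data_cost(p, l, h):
    """DC(p_l, p_h) = C(p_l) + ... + C(p_{h-1}) with C(p_i) = log2(1 + |p_i - E(i)|)."""
    return sum(math.log2(1.0 + abs(p[i] - expected(p, l, h, i))) for i in range(l, h))


def split_point(p, l, h):
    """Growth phase: the interior index whose value deviates most from its expected value."""
    return max(range(l + 1, h), key=lambda i: abs(p[i] - expected(p, l, h, i)))


def build(p, l, h):
    """Grow and prune bucket [p[l], p[h]); return (global benefit, retained split indices)."""
    if h - l < 2:                       # no interior point to split on
        return 0.0, []
    s = split_point(p, l, h)
    gb_left, left = build(p, l, s)
    gb_right, right = build(p, s, h)
    lb = data_cost(p, l, h) - data_cost(p, l, s) - data_cost(p, s, h) - IMC
    gb = lb + gb_left + gb_right
    if gb > 0:                          # the split at s is retained
        return gb, left + [s] + right
    return 0.0, []                      # prune: [p[l], p[h]) stays a single bucket


def mdl_boundaries(points):
    """Bucket boundary values chosen for a list of plaintext values."""
    p = sorted(points)
    return [p[i] for i in build(p, 0, len(p) - 1)[1]]


evenly = [float(i) for i in range(200)]
two_clusters = [i * 0.1 for i in range(100)] + [1000 + i * 0.1 for i in range(100)]
print(mdl_boundaries(evenly))        # evenly spaced data needs no extra bucket
print(mdl_boundaries(two_clusters))  # boundaries appear at the gap between the clusters
```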


Scaling

The number of values in a bucket may be disproportionate to the size of the bucket

[Figure: values in source buckets b − 1, b, and b + 1 and their images in the uniform space]


Updates

  • The scale factor ensures that each distinct plaintext value maps to a distinct ciphertext value

  • Encrypted values need not be recomputed unless the distribution of plaintext values changes


Quality of Encryption

  • Kolmogorov-Smirnov (KS) statistical test

    • Can we disprove, to a certain required level of significance, the null hypothesis that two data sets are drawn from the same distribution function?

    • If not, then the ciphertext distribution cannot be distinguished from the specified target distribution
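The two-sample KS test compares the empirical CDFs of the ciphertexts and of a target sample and rejects the null hypothesis when their largest vertical gap exceeds a critical value. The sketch below is a self-contained illustration with hypothetical names (`ks_statistic`, `indistinguishable`); it uses the standard large-sample critical value c(α)·sqrt((n + m)/(nm)), which is an approximation. In practice a library routine such as SciPy's `scipy.stats.ks_2samp` would typically be used instead.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Both ECDFs are right-continuous step functions, so the largest gap
    # is attained at one of the sample points.
    return max(abs(bisect.bisect_right(a, x) / n - bisect.bisect_right(b, x) / m)
               for x in a + b)


def indistinguishable(a, b, alpha=0.05):
    """True if H0 'a and b come from the same distribution' is NOT rejected at level alpha.

    Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2), about 1.358 for alpha = 0.05.
    """
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return ks_statistic(a, b) <= c_alpha * math.sqrt((n + m) / (n * m))


grid = [i / 200 for i in range(200)]
shifted = [(i + 0.5) / 200 for i in range(200)]
print(ks_statistic(grid, shifted), indistinguishable(grid, shifted))
print(indistinguishable(grid, [x + 5 for x in grid]))
```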


Duplicates

  • Assumptions

    • A large number of duplicates may leak information about the distribution of values

  • Alternatively,

    • Map duplicates to distinct values

    • If f = M(p) and f’ = M(p+1), each occurrence of p is mapped to a value in [f, f’)

    • Equality expressed as a range

    • Equi-joins can no longer be expressed

      • However, many numeric attributes (e.g., salary) may rarely be used in joins
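Mapping duplicates to distinct values and expressing equality as a range can be sketched as follows, assuming integer plaintexts so that p + 1 is the next plaintext value. `M` is a hypothetical toy mapping (not an OPES key) with a gap of 10 between consecutive plaintexts; `encrypt_spread` and `equality_as_range` are illustrative names. The sketch also shows why equi-joins break: equal plaintexts usually receive different ciphertexts.

```python
import random


def M(p: int) -> int:
    """Hypothetical toy mapping (not an OPES key): strictly increasing, M(p+1) - M(p) = 10."""
    return 10 * p + 100


def encrypt_spread(p: int, rng: random.Random) -> int:
    """Map one occurrence of p to a random integer in [M(p), M(p+1))."""
    return rng.randrange(M(p), M(p + 1))


def equality_as_range(p: int):
    """Rewrite the predicate 'col = p' as 'M(p) <= col < M(p+1)' over ciphertexts."""
    lo, hi = M(p), M(p + 1)
    return lambda c: lo <= c < hi


rng = random.Random(42)
salaries = [50, 50, 50, 70, 70, 90]  # integer plaintexts with duplicates
cipher = [encrypt_spread(p, rng) for p in salaries]
matches = [c for c in cipher if equality_as_range(50)(c)]
print(cipher)
print(len(matches))  # 3
```

Order is still preserved across distinct plaintexts, because every ciphertext of p lies below M(p + 1), the smallest ciphertext of any larger value.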


Outline

  • Motivation and Introduction

  • OPES encryption

  • Modeling the distribution

  • Experimental evaluation


Experimental Evaluation

  • Percentile exposure

  • Updatability

  • Key size

  • Time overhead


Datasets

  • Census

    • UCI KDD archive, PUMS census data (30,000 records)

  • Gaussian

  • Zipf

  • Uniform

Default: Source = Gaussian, Target = Zipf


Percentile Exposure


Time to Build the Model


Insertion Overhead


Cost of Additional Insertion


Retrieval Overhead


Retrieval Time


Related Work

  • Polynomial functions

    • Ignore the distribution of plaintext/ciphertext values

  • Database as a service

    • Requires post processing of query results

  • Privacy homomorphisms

    • Comparison operations not investigated

  • Keyword searches on encrypted data

    • Designed for keyword retrieval

    • Range queries not supported

  • Smartcard-based schemes

    • Infeasible for large ranges

  • Order-preserving hashing

    • Protecting the hash values from cryptanalysis is not a concern, nor is deciphering plaintext values from hash values

    • Designed for static collections


Closing Remarks

  • Ensuring safety without impeding the flow of information is a hard problem

  • Current choices

    • Plaintext database

    • Encrypted databases with loss of functionality or performance

  • Our approach focused on the trade-off between security and efficiency

  • We developed an algorithm that can easily be integrated with current database systems


Backup


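Encode

$$\mathrm{Encode}(p) = z\,(s p^2 + p), \qquad p \in [0, p_h),\quad s = \frac{q}{2r},\quad z > 0$$

The distribution has density function $qp + r$; integrating it gives $\int_0^p (qx + r)\,dx = r\,(s p^2 + p)$, so Encode is the cumulative distribution within the bucket up to a constant factor that is absorbed into z.

p is the source (target) value

s is the quadratic coefficient

z is the scale factor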


Decode

$$\mathrm{Decode}(f) = \frac{-z \pm \sqrt{z^2 + 4zsf}}{2zs}, \qquad f \in [0, f_h),\quad s = \frac{q}{2r},\quad z > 0$$

f is the flattened value

s is the quadratic coefficient

z is the scale factor
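Decode inverts Encode by solving the quadratic $zsp^2 + zp - f = 0$. The sketch below is an illustration, not the authors' code: it picks the root on the increasing branch of Encode and rewrites $(-z + \sqrt{z^2 + 4zsf})/(2zs)$ in the algebraically equal form $2f/(z + \sqrt{z^2 + 4zsf})$, which avoids cancellation and also covers $s = 0$ (where Decode reduces to $f/z$). It assumes bucket-relative values $p \in [0, w)$ with $1 + 2sp > 0$ throughout the bucket, so Encode is increasing there.

```python
import math


def encode(p: float, s: float, z: float) -> float:
    """Encode(p) = z(s p^2 + p) for a bucket-relative value p."""
    return z * (s * p * p + p)


def decode(f: float, s: float, z: float) -> float:
    """Root of z s p^2 + z p - f = 0 on the increasing branch of encode.

    (-z + sqrt(z^2 + 4 z s f)) / (2 z s) is rewritten as 2 f / (z + sqrt(z^2 + 4 z s f)):
    the same value, without cancellation, and valid for s == 0 as well (giving f / z).
    """
    return 2.0 * f / (z + math.sqrt(z * z + 4.0 * z * s * f))


s, z = 0.05, 3.0          # illustrative quadratic coefficient and scale factor
f = encode(7, s, z)
print(f, decode(f, s, z))
```

Since $z^2 + 4zs\,z(sp^2 + p) = z^2(1 + 2sp)^2$, the square root equals $z(1 + 2sp)$ whenever $1 + 2sp > 0$, and the formula returns exactly $p$.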


Order Preserving Encryption

Compute distinct attribute values in ascending order; the ciphertext is the index value

  • Effectively hides the distribution of plaintext values

  • The key size is proportional to the number of distinct attribute values

  • Any updates require recomputing the key and ciphertext values
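This index-based scheme can be sketched directly, which makes both drawbacks visible: the key is the full sorted list of distinct values, and inserting a new value shifts the indexes of every larger value. The function names are hypothetical and the example values are made up for illustration.

```python
import bisect


def build_index_key(values):
    """The key is the list of distinct plaintext values in ascending order."""
    return sorted(set(values))


def encrypt(key, p):
    """Ciphertext = position of p in the key; a value missing from the key needs a new key."""
    i = bisect.bisect_left(key, p)
    if i == len(key) or key[i] != p:
        raise KeyError(f"{p} is not in the key; recompute the key and all ciphertexts")
    return i


def decrypt(key, c):
    return key[c]


key = build_index_key([40, 10, 30, 10])
print(key, [encrypt(key, v) for v in (10, 30, 40)])  # [10, 30, 40] [0, 1, 2]

# Inserting 20 shifts the positions of 30 and 40, so their ciphertexts change.
new_key = build_index_key(key + [20])
print([encrypt(new_key, v) for v in (10, 30, 40)])  # [0, 2, 3]
```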


Target Distribution Requirement

  • Why isn’t the source-to-uniform transformation sufficient for order preserving encryption?

  • It is, but

    • The target distribution may cause an adversary to make incorrect assumptions about the source distribution

    • The organization of the source distribution cannot be inferred from the target


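Quadratic Coefficient

[Figure: sorted values v of a bucket with two index ranges, [i1, j1) and [i2, j2), and positions b1 and b2 marked]

$$q = \frac{\dfrac{j_2 - i_2}{v_{j_2} - v_{i_2}} - \dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}{v_{b_2} - v_{b_1}}, \qquad s = \frac{q}{2 \cdot \dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}$$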


Scale Factor Constraints

$$\forall p \in [0, w):\; M(p+1) - M(p) \ge 2$$

Ensures that there is a distinct mapped value for each input value

$$w_f = Kn$$

The width of a bucket in the mapped space is a function of the number of elements n in the bucket

K is the minimum width needed across buckets


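Scale Factor

The scale factor stretches short buckets to the width of the largest bucket and further enlarges each bucket by a factor equal to the number of elements it contains

$$z = \frac{Kn}{s w^2 + w}$$

$$K = \max_{i = 1, \dots, m} \; x_i \,(s_i w_i^2 + w_i), \qquad x_i = \begin{cases} 2, & s_i \ge 0 \\[4pt] \dfrac{2}{1 + s_i (2 w_i - 1)}, & s_i < 0 \end{cases}$$

Since $M(p+1) - M(p) = z(1 + s(2p + 1))$, the gap is smallest at $p = 0$ when $s \ge 0$ and at $p = w - 1$ when $s < 0$; x is a value of z large enough to keep that gap at least 2 (for $s \ge 0$, $2 \ge 2/(1 + s)$). Because $n \ge 1$, the resulting $z = Kn/(sw^2 + w)$ is at least x in every bucket.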
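The formulas for x, K, and z can be checked numerically: every bucket should end up with consecutive gaps of at least 2 and a mapped width of exactly Kn. The bucket parameters below are made up for illustration, and `min_scale`, `scale_factors`, and `encode` are hypothetical helper names; `encode` is the per-bucket mapping M(p) = z(sp² + p).

```python
def min_scale(s: float, w: int) -> float:
    """x: the z needed so that M(p+1) - M(p) = z(1 + s(2p + 1)) >= 2 on [0, w)."""
    return 2.0 if s >= 0 else 2.0 / (1.0 + s * (2 * w - 1))


def scale_factors(buckets):
    """buckets: list of (s, w, n). Returns K and the per-bucket scale factors z."""
    K = max(min_scale(s, w) * (s * w * w + w) for s, w, n in buckets)
    zs = [K * n / (s * w * w + w) for s, w, n in buckets]
    return K, zs


def encode(p, s, z):
    """Per-bucket mapping M(p) = z(s p^2 + p)."""
    return z * (s * p * p + p)


# Hypothetical buckets: (quadratic coefficient s, width w, number of elements n).
buckets = [(0.02, 10, 5), (-0.03, 8, 12), (0.0, 20, 1)]
K, zs = scale_factors(buckets)
for (s, w, n), z in zip(buckets, zs):
    gaps = [encode(p + 1, s, z) - encode(p, s, z) for p in range(w)]
    print(f"s={s:+.2f} w={w} n={n}: z={z:.3f}, "
          f"min gap={min(gaps):.3f}, mapped width={encode(w, s, z):.1f}")
```

The last bucket is the one that determines K here, and with n = 1 its minimum gap is exactly 2, the tight case of the constraint.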


Slope

The values within a single bucket may be unevenly distributed

[Figure: values within buckets b − 1 and b]

