Privacy-Preserving Support Vector Machines via Random Kernels

The 2008 International Conference on Data Mining

Olvi Mangasarian, UW Madison & UCSD La Jolla

Edward Wild, UW Madison


Horizontally Partitioned Data

[Figure: the m × n data matrix A, with rows 1…m as examples and columns 1…n as features, split horizontally into row blocks A1, A2, A3.]
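For concreteness, a minimal numpy sketch of this layout (the sizes and variable names are ours):

    import numpy as np

    m, n = 9, 4                    # m examples (rows), n features (columns)
    A = np.random.rand(m, n)       # the full data matrix A

    # Horizontal partition: each entity holds a block of rows (examples),
    # and every block carries all n features.
    A1, A2, A3 = np.vsplit(A, 3)

    # Stacking the blocks recovers A.
    assert np.allclose(np.vstack([A1, A2, A3]), A)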

Problem Statement
  • Entities with related data wish to learn a classifier based on all data
  • The entities are unwilling to reveal their data to each other
  • If each entity holds a different set of examples with all features, then the data is said to be horizontally partitioned
  • Our approach: privacy-preserving support vector machine (PPSVM) using random kernels
    • Provides accurate classification
    • Does not reveal private information
Outline
  • Support vector machines (SVMs)
  • Reduced and random kernel SVMs
  • Privacy-preserving SVM for horizontally partitioned data
  • Summary
Support Vector Machines

[Figure: + and − training points in the plane, separated by the nonlinear surface K(x′, A′)u = γ and bracketed by the bounding surfaces K(x′, A′)u = γ + 1 and K(x′, A′)u = γ − 1.]

Linear kernel: (K(A, B))_ij = (AB)_ij = A_i B_·j = K(A_i, B_·j)

Gaussian kernel with parameter μ: (K(A, B))_ij = exp(−μ‖A_i′ − B_·j‖²)
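As an illustration, both kernels can be computed with numpy as follows (a sketch; the function names, sizes, and the value μ = 0.5 are ours, and B is stored row-wise so that K(A, B′) = A Bᵀ):

    import numpy as np

    def linear_kernel(A, B):
        # (K(A, B'))_ij = A_i . B_j  -- just the matrix product A B'
        return A @ B.T

    def gaussian_kernel(A, B, mu):
        # (K(A, B'))_ij = exp(-mu * ||A_i - B_j||^2)
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-mu * sq)

    A = np.random.rand(5, 3)              # 5 examples with 3 features
    B = np.random.rand(2, 3)              # 2 rows with the same 3 features
    print(linear_kernel(A, B))            # a 5 x 2 kernel matrix
    print(gaussian_kernel(A, B, mu=0.5))  # a 5 x 2 kernel matrix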

  • x ∈ Rⁿ
  • SVM defined by parameters u and threshold γ of the nonlinear separating surface K(x′, A′)u = γ
  • A contains all data points
    • {+…+} ⊂ A+
    • {−…−} ⊂ A−
  • e is a vector of ones
  • Constraints: K(A+, A′)u ≥ eγ + e and K(A−, A′)u ≤ eγ − e
  • Slack variable y ≥ 0 allows points to lie on the wrong side of the bounding surfaces K(x′, A′)u = γ + 1 and K(x′, A′)u = γ − 1
  • Minimize e′y (the hinge loss, i.e. the plus function max{·, 0}) to fit the data
  • Minimize e′s (which equals ‖u‖₁ at the solution) to reduce overfitting

Support Vector Machine → Reduced Support Vector Machine → Random Reduced Support Vector Machine

  • L&M, 2001: replace the kernel matrix K(A, A′) with K(A, Ā′), where Ā′ consists of a randomly selected subset of the rows of A (the reduced SVM)
  • M&T, 2006: replace the kernel matrix K(A, A′) with K(A, B′), where B′ is a completely random matrix (the random reduced SVM)
  • The random kernel K(A, B′) is the key to a simple and accurate privacy-preserving SVM
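A sketch of this substitution for the linear kernel (the sizes are ours; the 10%-of-rows choice for B follows the next slide):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 200, 30
    A = rng.random((m, n))          # m examples, n features

    # Full linear kernel K(A, A'): an m x m matrix.
    K_full = A @ A.T

    # Random kernel: replace A' with B', where B is a completely random
    # matrix with the same number of columns as A and only 10% as many rows.
    B = rng.standard_normal((m // 10, n))
    K_rand = A @ B.T                # m x (m/10): far smaller than m x m

    print(K_full.shape, K_rand.shape)   # (200, 200) (200, 20)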

Error of Random Kernels Is Comparable to Full Kernels: Linear Kernels

B is a random matrix with the same number of columns as A and either 10% as many rows as A or one fewer row than it has columns.

[Figure: scatter of random kernel AB′ error vs. full kernel AA′ error, one point per dataset for 7 datasets from the UCI repository; the points cluster along the diagonal of equal error for random and full kernels.]

Error of Random Kernels Is Comparable to Full Kernels: Gaussian Kernels

[Figure: the same comparison for Gaussian kernels: random kernel K(A, B′) error vs. full kernel K(A, A′) error, with points again clustered along the diagonal of equal error.]

Horizontally Partitioned Data: Each Entity Holds Different Examples with the Same Features

[Figure: the data matrix A split into row blocks A1, A2, A3, one block per entity.]

Privacy-Preserving SVMs for Horizontally Partitioned Data via Random Kernels
  • Each of q entities privately owns a block of data A1, …, Aq that it is unwilling to share with the other q − 1 entities
  • The entities all agree on the same random basis matrix B, and each entity j computes and distributes K(Aj, B′) to all entities
  • K(A, B′) is then the vertical stack of the shared blocks K(A1, B′), …, K(Aq, B′)
  • Aj cannot be recovered uniquely from K(Aj, B′)
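Putting the protocol together for the linear random kernel (a sketch; the entity count, sizes, and names are ours; only the products AjB′ are ever shared):

    import numpy as np

    rng = np.random.default_rng(1)
    q, n = 3, 30                           # q entities, n features

    # Private row blocks A1, ..., Aq -- these never leave their owners.
    blocks = [rng.random((25, n)) for _ in range(q)]

    # Step 1: all entities agree on the same random basis matrix B
    # (same number of columns as A, here 10% as many rows as A).
    m = sum(len(Aj) for Aj in blocks)
    B = rng.standard_normal((m // 10, n))

    # Step 2: each entity j computes and distributes only K(Aj, B') = Aj B'.
    shared = [Aj @ B.T for Aj in blocks]

    # Step 3: stacking the shared blocks gives K(A, B'), on which any
    # entity can train the SVM -- without ever seeing another entity's Aj.
    K = np.vstack(shared)
    print(K.shape)   # (75, 7)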
Privacy Preservation: Infinite Number of Solutions for Ai Given AiB′

  • Feng & Zhang, 2007: every square submatrix of a random matrix has full rank
  • Given Pi = AiB′, consider solving for row r of Ai, 1 ≤ r ≤ mi, from the equation BAir′ = Pir, with Air′ ∈ Rⁿ
  • B has n̄ < n rows, and every square submatrix of the random matrix B is nonsingular, so each choice of n̄ of the n columns of B determines a distinct solution: there are at least (n choose n̄) solutions Air′
  • Thus there are at least (n choose n̄)^mi solutions Ai to the equation BAi′ = Pi
  • If each entity has 20 points in R³⁰ and B has 29 rows, that is at least (30 choose 29)²⁰ = 30²⁰ solutions
  • Furthermore, each of the infinite number of matrices in the affine hull of these solutions is also a solution

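This can be checked numerically: since B has fewer rows than columns, BAir′ = Pir determines Air′ only up to B's null space. A small numpy sketch (sizes follow the 29 × 30 example above):

    import numpy as np

    rng = np.random.default_rng(2)
    n_bar, n = 29, 30                # B is wide: fewer rows than columns
    B = rng.standard_normal((n_bar, n))

    x_true = rng.random(n)           # a private row Air' of Ai
    p = B @ x_true                   # Pir: what the other entities can see

    # One particular solution of B x = p ...
    x0, *_ = np.linalg.lstsq(B, p, rcond=None)
    # ... plus any vector in B's null space is another exact solution.
    _, _, Vt = np.linalg.svd(B)
    x1 = x0 + 5.0 * Vt[-1]           # Vt[-1] spans the 1-dim null space

    print(np.allclose(B @ x0, p), np.allclose(B @ x1, p))  # True True
    print(np.allclose(x0, x1))                             # False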

Results for PPSVM on Horizontally Partitioned Data
  • Compare classifiers that share examples with classifiers that do not
    • Seven datasets from the UCI repository
  • Simulate a situation in which each entity has only a subset of about 25 examples
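A sketch of this comparison on synthetic data, using scikit-learn's LinearSVC on the shared random-kernel features as a stand-in for the paper's 1-norm SVM (all names, sizes, and the synthetic data are ours):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(3)
    q, n = 4, 30                                  # 4 entities, 30 features

    def make_entity(m=25):
        # Synthetic stand-in for one entity's ~25 labelled examples.
        X = rng.standard_normal((m, n))
        y = np.where(X[:, 0] + 0.5 * rng.standard_normal(m) > 0, 1, -1)
        return X, y

    train = [make_entity() for _ in range(q)]
    X_test, y_test = make_entity(500)

    B = rng.standard_normal((10, n))              # agreed random matrix B
    phi = lambda X: X @ B.T                       # shared features K(X, B')

    # Without sharing: entity 0 trains only on its own 25 examples.
    X0, y0 = train[0]
    solo = LinearSVC().fit(phi(X0), y0)

    # With sharing: train on the stacked blocks K(Aj, B') from all entities.
    X_all = np.vstack([phi(X) for X, _ in train])
    y_all = np.concatenate([y for _, y in train])
    both = LinearSVC().fit(X_all, y_all)

    print("error without sharing:", 1 - solo.score(phi(X_test), y_test))
    print("error with sharing:   ", 1 - both.score(phi(X_test), y_test))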
Error Rate of Sharing Data Is Better than Not Sharing: Linear Kernels

[Figure: scatter of error rate with sharing vs. error rate without sharing, one point per dataset for the 7 UCI datasets; the points fall below the diagonal, so sharing yields lower error.]

Error Rate of Sharing Data Is Better than Not Sharing: Gaussian Kernels

[Figure: the same comparison with Gaussian kernels: error with sharing vs. error without sharing, with points below the diagonal.]

Summary
  • Privacy-preserving SVM for horizontally partitioned data
    • Based on the random kernel K(A, B′)
    • Learns a classifier using all the data, but without revealing privately held data
    • Classification accuracy is better than an SVM without sharing, and comparable to an SVM where all data is shared
  • Related work
    • A similar approach for vertically partitioned data, to appear in ACM TKDD
    • Liu et al., 2006: properties of multiplicative data perturbation based on random projection
    • Yu et al., 2006: secure computation of K(A, A′)
Questions
  • Websites with links to papers and talks:

http://www.cs.wisc.edu/~olvi

http://www.cs.wisc.edu/~wildt