Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Kernel-Based Contrast Functions for Sufficient Dimension Reduction

Michael I. Jordan

Departments of Statistics and EECS

University of California, Berkeley

Joint work with Kenji Fukumizu and Francis Bach


Outline

  • Introduction – dimension reduction and conditional independence

  • Conditional covariance operators on RKHS

  • Kernel Dimensionality Reduction for regression

  • Manifold KDR

  • Summary


Sufficient Dimension Reduction

  • Regression setting: observe (X,Y) pairs, where the covariate X is high-dimensional

  • Find a (hopefully small) subspace S of the covariate space that retains the information pertinent to the response Y

  • Semiparametric formulation: treat the conditional distribution p(Y | X) nonparametrically, and estimate the parameter S


Perspectives

  • Classically the covariate vector X has been treated as ancillary in regression

  • The sufficient dimension reduction (SDR) literature has aimed at making use of the randomness in X (in settings where this is reasonable)

  • This has generally been achieved via inverse regression

    • at the cost of introducing strong assumptions on the distribution of the covariate X

  • We’ll make use of the randomness in X without employing inverse regression



Dimension Reduction for Regression

  • Regression:

    Y : response variable, X = (X1, ..., Xm) : m-dimensional covariate

  • Goal: Find the central subspace, the smallest subspace S of the covariate space such that Y is conditionally independent of X given the projection of X onto S (stated formally below)
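In the notation used on the later slides, with B an m × d matrix whose columns span S, this defining condition is the standard SDR formulation

    Y ⫫ X | BᵀX

and the central subspace is the intersection of all subspaces S for which this conditional independence holds.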


[Figure: toy regression example. Y is plotted against X1 and against X2; Y varies with X1 only, so the central subspace is the X1 axis.]


Some Existing Methods

  • Sliced Inverse Regression (SIR, Li 1991)

    • PCA of E[X|Y], estimated by slicing the range of Y (a minimal sketch of SIR follows this list)

    • Elliptic assumption on the distribution of X

  • Principal Hessian Directions (pHd, Li 1992)

    • Average Hessian is used

    • If X is Gaussian, the eigenvectors of the average Hessian give the central subspace

    • Gaussian assumption on X; Y must be one-dimensional

  • Projection pursuit approach (e.g., Friedman et al. 1981)

    • Additive model E[Y|X] = g1(b1ᵀX) + ... + gd(bdᵀX) is used

  • Canonical Correlation Analysis (CCA) / Partial Least Squares (PLS)

    • Linear assumption on the regression

  • Contour Regression (Li, Zha & Chiaromonte, 2004)

    • Elliptic assumption on the distribution of X
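To make the first of these methods concrete, here is a minimal sketch of SIR in Python (not from the original talk; the function name, the equal-size slicing scheme, and the Cholesky-based standardization are illustrative choices):

    import numpy as np

    def sir_directions(X, Y, n_slices=10, n_components=2):
        """Minimal Sliced Inverse Regression (Li, 1991) sketch.
        X: (N, m) covariate matrix, Y: length-N response (1-d array)."""
        N, m = X.shape
        # Standardize (whiten) the covariates
        mu = X.mean(axis=0)
        L = np.linalg.cholesky(np.cov(X, rowvar=False))
        Z = np.linalg.solve(L, (X - mu).T).T              # whitened covariates
        # Slice the ordered response into groups of roughly equal size
        slices = np.array_split(np.argsort(Y), n_slices)
        # Weighted covariance of the within-slice means of Z
        M = np.zeros((m, m))
        for idx in slices:
            zbar = Z[idx].mean(axis=0)
            M += (len(idx) / N) * np.outer(zbar, zbar)
        # Top eigenvectors of M, mapped back to the original coordinates
        eigvals, eigvecs = np.linalg.eigh(M)
        top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
        B = np.linalg.solve(L.T, top)                      # undo the whitening
        return B / np.linalg.norm(B, axis=0)

The elliptic (often in practice Gaussian) assumption on X enters through the standardization step: the slice means of the whitened covariates are guaranteed to lie in the whitened central subspace only under that assumption.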



Dimension Reduction and Conditional Independence

  • (U, V) = (BᵀX, CᵀX)

    where C is an m × (m−d) matrix whose columns are orthogonal to those of B

  • B gives the projector onto the central subspace

  • Our approach: Characterize conditional independence

  • Conditional independence: find B such that Y ⫫ X | BᵀX, i.e., Y and V are conditionally independent given U


    Outline

    • Introduction – dimension reduction and conditional independence

    • Conditional covariance operators on RKHS

    • Kernel Dimensionality Reduction for regression

    • Manifold KDR

    • Summary


    Reproducing Kernel Hilbert Spaces

    • “Kernel methods”

      • RKHS’s have generally been used to provide basis expansions for regression and classification (e.g., support vector machine)

      • Kernelization: map data into the RKHS and apply linear or second-order methods in the RKHS

      • But RKHS’s can also be used to characterize independence and conditional independence

    [Figure: feature maps ΦX : ΩX → HX and ΦY : ΩY → HY embed X and Y into the RKHSs HX and HY as ΦX(X) and ΦY(Y).]


    Positive Definite Kernels and RKHS

    • Positive definite kernel (p.d. kernel)

      k is positive definite if k(x, y) = k(y, x) and, for any n and any x1, ..., xn in Ω,

      the n × n matrix (k(xi, xj))ij (Gram matrix) is positive semidefinite.

      • Example: Gaussian RBF kernel  k(x, y) = exp( −‖x − y‖² / (2σ²) )

    • Reproducing kernel Hilbert space (RKHS)

      k : p.d. kernel on Ω

      H : reproducing kernel Hilbert space (RKHS)

      1) k( · , x) ∈ H for all x ∈ Ω

      2) span{ k( · , x) : x ∈ Ω } is dense in H.

      3) ⟨ f, k( · , x) ⟩H = f(x) for all f ∈ H and x ∈ Ω (reproducing property)
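As a quick numerical illustration of these definitions (not part of the original slides; the helper name is illustrative), the following Python snippet builds the Gram matrix of the Gaussian RBF kernel and checks that it is positive semidefinite. The same helper is reused in the sketches further below.

    import numpy as np

    def rbf_gram(X, Y=None, sigma=1.0):
        """Gram matrix of the Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
        X: (N, d) array, Y: (M, d) array (defaults to X)."""
        Y = X if Y is None else Y
        sq_dists = (np.sum(X**2, axis=1)[:, None]
                    + np.sum(Y**2, axis=1)[None, :]
                    - 2.0 * X @ Y.T)
        return np.exp(-sq_dists / (2.0 * sigma**2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    G = rbf_gram(X)
    print(np.linalg.eigvalsh(G).min() >= -1e-10)   # True: the Gram matrix is (numerically) PSD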



    • Functional data

      Data X1, …, XN are mapped to ΦX(X1), …, ΦX(XN) : functional data in the RKHS

    • Why RKHS?

      • By the reproducing property, computing the inner product in the RKHS is easy:

          ⟨ ΦX(Xi), ΦX(Xj) ⟩HX = kX(Xi, Xj)

      • The computational cost essentially depends on the sample size rather than on the dimensionality, which is advantageous for high-dimensional data with small sample size.


    Covariance Operators on RKHS

    • X, Y : random variables on ΩX and ΩY, resp.

    • Prepare RKHSs (HX, kX) and (HY, kY) defined on ΩX and ΩY, resp.

    • Define random variables on the RKHSs HX and HY by

        ΦX(X) = kX( · , X),   ΦY(Y) = kY( · , Y)

    • Define the cross-covariance operator ΣYX




    Covariance Operators on RKHS

    • Definition

      ΣYX is an operator from HX to HY such that

        ⟨ g, ΣYX f ⟩HY = E[ f(X) g(Y) ] − E[ f(X) ] E[ g(Y) ]   ( = Cov[ f(X), g(Y) ] )

      for all f ∈ HX and g ∈ HY.

    cf. Euclidean case:

    VYX = E[YXᵀ] − E[Y]E[X]ᵀ : covariance matrix


    Characterization of Independence

    • Independence and cross-covariance operators

      If the RKHSs are “rich enough”:

        ΣYX = O    ⟺    X and Y are independent

      • “⟸” is always true; “⟹” requires an assumption on the kernel (universality), e.g., Gaussian RBF kernels are universal

      • Equivalently: Cov[ f(X), g(Y) ] = 0 for all f ∈ HX and g ∈ HY (i.e., ΦX(X) and ΦY(Y) are uncorrelated)

      • cf. for Gaussian variables, uncorrelated ⟺ independent
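A simple way to see this characterization at work numerically (again an illustration, not from the original slides) is to compute the squared Hilbert–Schmidt norm of the empirical cross-covariance operator from centered Gram matrices; it is close to zero for independent samples and clearly positive for dependent ones. The sketch below reuses the rbf_gram helper defined earlier.

    import numpy as np

    def hs_norm_sq(X, Y, sigma_x=1.0, sigma_y=1.0):
        """Squared Hilbert-Schmidt norm of the empirical cross-covariance operator,
        (1/N^2) * trace(Gx @ Gy), with Gx, Gy centered Gram matrices.
        Assumes rbf_gram from the earlier sketch is in scope."""
        N = X.shape[0]
        H = np.eye(N) - np.ones((N, N)) / N        # centering matrix
        Gx = H @ rbf_gram(X, sigma=sigma_x) @ H
        Gy = H @ rbf_gram(Y, sigma=sigma_y) @ H
        return np.trace(Gx @ Gy) / N**2

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 1))
    Y_indep = rng.normal(size=(200, 1))                         # independent of X
    Y_dep = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))     # a function of X plus noise
    print(hs_norm_sq(X, Y_indep), hs_norm_sq(X, Y_dep))         # near zero vs. clearly larger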



    • Independence and characteristic functions

      Random variables X and Y are independent

        ⟺  E[ e^{i(ωᵀX + ηᵀY)} ] = E[ e^{iωᵀX} ] E[ e^{iηᵀY} ]   for all ω and η

    • RKHS characterization

      Random variables X and Y are independent

        ⟺  E[ f(X) g(Y) ] = E[ f(X) ] E[ g(Y) ]   for all f ∈ HX and g ∈ HY

      • The RKHS approach is a generalization of the characteristic-function approach: the functions f ∈ HX and g ∈ HY work as test functions in place of e^{iωᵀx} and e^{iηᵀy}

    Rkhs and conditional independence l.jpg
    RKHS and Conditional Independence

    • Conditional covariance operator

      X and Y are random vectors. HX, HY : RKHSs with kernels kX, kY, resp.

      Def.  ΣYY|X := ΣYY − ΣYX ΣXX⁻¹ ΣXY   : conditional covariance operator

      cf.  For Gaussian vectors, VYY − VYX VXX⁻¹ VXY is the conditional covariance matrix Var[Y | X]

      • Under a universality assumption on the kernel,

          ⟨ g, ΣYY|X g ⟩HY = E[ Var[ g(Y) | X ] ]   for all g ∈ HY

      • Monotonicity of conditional covariance operators

        X = (U, V) : random vectors

          ΣYY|U ≥ ΣYY|X   ( “≥” in the sense of self-adjoint operators )



    Theorem

    • X = (U, V) and Y are random vectors.

    • HX, HU, HY : RKHSs with Gaussian kernels kX, kU, kY, resp. Then

        ΣYY|U ≥ ΣYY|X ,   and   ΣYY|U = ΣYY|X   ⟺   Y ⫫ X | U

    This theorem provides a new methodology for solving the sufficient dimension reduction problem.


    Outline

    • Introduction – dimension reduction and conditional independence

    • Conditional covariance operators on RKHS

    • Kernel Dimensionality Reduction for regression

    • Manifold KDR

    • Summary


    Kernel Dimension Reduction

    • Use a universal kernel for BᵀX and Y

    • KDR objective function:

        minimize ΣYY|BᵀX over B with BᵀB = Id

      ( “≥” : the partial order of self-adjoint operators; in practice the trace Tr[ ΣYY|BᵀX ] is minimized )

      which is an optimization over the Stiefel manifold { B ∈ R^{m×d} : BᵀB = Id }
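One simple way to respect the orthonormality constraint during optimization (a sketch only, not the optimizer used in the original work) is a projected-gradient step followed by a QR retraction back onto the Stiefel manifold:

    import numpy as np

    def stiefel_step(B, grad, step=0.1):
        """One projected-gradient step for minimizing an objective over the Stiefel
        manifold {B : B^T B = I}: take a Euclidean gradient step, then retract
        back onto the manifold with a QR decomposition."""
        Q, R = np.linalg.qr(B - step * grad)
        signs = np.sign(np.diag(R))
        signs[signs == 0] = 1.0        # guard against a zero diagonal entry
        return Q * signs               # fix the QR sign ambiguity so the step varies continuously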



    Estimator

    • Empirical cross-covariance operator Σ̂YX

      gives the empirical covariance:

        ⟨ g, Σ̂YX f ⟩ = (1/N) Σi f(Xi) g(Yi) − [ (1/N) Σi f(Xi) ] [ (1/N) Σi g(Yi) ]

    • Empirical conditional covariance operator

        Σ̂YY|U = Σ̂YY − Σ̂YU ( Σ̂UU + εN I )⁻¹ Σ̂UY

    εN : regularization coefficient



    In matrix form, the empirical objective is

        Tr[ GY ( GU + N εN IN )⁻¹ ]

    where GY, GU : centered Gram matrices, GY = H KY H, GU = H KU H, with
    H = IN − (1/N) 1N 1Nᵀ, (KY)ij = kY(Yi, Yj), (KU)ij = kU(BᵀXi, BᵀXj)
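The centered-Gram-matrix form suggests the following minimal sketch of the empirical KDR objective (the kernel widths, the regularization value, and the function name are illustrative; rbf_gram is the helper sketched earlier):

    import numpy as np

    def kdr_objective(X, Y, B, sigma_u=1.0, sigma_y=1.0, eps=1e-3):
        """Empirical KDR objective Tr[Gy (Gu + N*eps*I)^{-1}] with centered Gram matrices.
        X: (N, m) covariates, Y: (N, p) responses, B: (m, d) projection matrix.
        Assumes rbf_gram from the earlier sketch is in scope."""
        N = X.shape[0]
        H = np.eye(N) - np.ones((N, N)) / N              # centering matrix
        Gu = H @ rbf_gram(X @ B, sigma=sigma_u) @ H      # centered Gram matrix of U = B^T X
        Gy = H @ rbf_gram(Y, sigma=sigma_y) @ H          # centered Gram matrix of Y
        return np.trace(np.linalg.solve(Gu + N * eps * np.eye(N), Gy))

Minimization over B can then be carried out with any gradient scheme on the Stiefel manifold, for instance numerical gradients of this function combined with the stiefel_step retraction sketched above.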


    Experiments with KDR

    • Wine data

      Data: 13-dimensional, 178 samples, 3 classes; projection onto 2 dimensions

    [Figure: 2-dimensional projections of the wine data obtained by KDR (s = 30), Partial Least Squares, CCA, and Sliced Inverse Regression.]


    Consistency of KDR

    Theorem

    Suppose kd is bounded and continuous, and the regularization coefficient εN → 0 at a suitable rate.

    Let S0 be the set of optimal parameters, i.e., the minimizers of Tr[ ΣYY|BᵀX ] over the Stiefel manifold.

    Then, under some conditions, for any open set U ⊇ S0,

        Pr( B̂(N) ∈ U ) → 1   as N → ∞


    Lemma

    Suppose kd is bounded and continuous, and the regularization coefficient εN → 0 at a suitable rate.

    Then, under some conditions, the empirical objective converges to the population objective Tr[ ΣYY|BᵀX ], uniformly over B on the Stiefel manifold,

    in probability.


    Conclusions

    • Introduction – dimension reduction and conditional independence

    • Conditional covariance operators on RKHS

    • Kernel Dimensionality Reduction for regression

    • Technical report available at www.cs.berkeley.edu/~jordan/papers/kdr.pdf

