
Randomized Variable Elimination

David J. Stracuzzi

Paul E. Utgoff


Agenda

  • Background

  • Filter and wrapper methods

  • Randomized Variable Elimination

  • Cost Function

  • RVE algorithm when r is known (RVE)

  • RVE algorithm when r is not known (RVErS)

  • Results

  • Questions

Variable Selection Problem

  • Choosing the relevant attributes from a set of attributes.

  • Producing, from a large set of input variables, the subset that best predicts the target function.

  • A forward selection algorithm starts with an empty set and searches for variables to add.

  • A backward selection algorithm starts with the entire set of variables and repeatedly removes irrelevant variable(s).

  • In some cases, a forward selection algorithm also removes variables in order to recover from earlier poor selections.

  • Caruana and Freitag (1994) experimented with greedy search methods and found that allowing the search to both add and remove variables outperforms simple forward and backward searches.

  • Filter and wrapper methods for variable selection.
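The two greedy search directions on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the presentation: `score` is a hypothetical caller-supplied function that rates a subset (higher is better), and `toy_score` with its `RELEVANT` set is invented for the demo.

```python
def forward_selection(variables, score):
    """Start from the empty set and greedily add the best variable."""
    selected = []
    best = score(selected)
    improved = True
    while improved:
        improved = False
        for v in variables:
            if v in selected:
                continue
            s = score(selected + [v])
            if s > best:                      # remember the best addition so far
                best, best_v, improved = s, v, True
        if improved:
            selected.append(best_v)
    return selected


def backward_elimination(variables, score):
    """Start from the full set and greedily drop the least useful variable."""
    selected = list(variables)
    best = score(selected)
    while selected:
        # Score every subset obtained by removing exactly one variable.
        candidates = [(score([u for u in selected if u != v]), v) for v in selected]
        s, v = max(candidates)
        if s < best:                          # every removal hurts: stop
            break
        best = s
        selected.remove(v)
    return selected


RELEVANT = {"a", "b"}                         # assumption: toy ground truth


def toy_score(subset):
    # Reward relevant variables, lightly penalise subset size.
    return len(RELEVANT & set(subset)) - 0.01 * len(subset)


print(forward_selection(["a", "b", "c", "d"], toy_score))
print(backward_elimination(["a", "b", "c", "d"], toy_score))
```

Both searches evaluate one candidate change at a time, which is why they can need many evaluations when the variable set is large.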

Filter methods

  • Use statistical measures to evaluate the quality of variable subsets.

  • Subsets of variables are evaluated with respect to a specific quality measure.

  • Statistical evaluation of variables requires very little computation compared to running the learning algorithm.

  • FOCUS (Almuallim and Dietterich, 1991) searches for the smallest subset that completely discriminates between the target classes.

  • Relief (Kira and Rendell, 1992) ranks variables according to a distance-based measure.

  • In filter methods, variables are evaluated independently, not in the context of the learning problem.
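A very simple filter, for illustration only: rank each variable by the absolute Pearson correlation between its column and the target. This is neither FOCUS nor Relief; it is a stand-in chosen because it fits in a few lines, and the column names and data are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0   # constant column -> 0


def rank_by_correlation(columns, target):
    """Return variable names, most strongly correlated with the target first."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


columns = {                    # assumption: toy data, one list per variable
    "x0": [0, 1, 0, 1],        # identical to the target
    "x1": [1, 1, 0, 0],        # uncorrelated with the target
    "x2": [5, 5, 5, 5],        # constant
}
target = [0, 1, 0, 1]
print(rank_by_correlation(columns, target))
```

No learning algorithm is run here, which is what makes the evaluation cheap.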

Wrapper methods

  • Use the performance of the learning algorithm to evaluate the quality of a subset of input variables.

  • The learning algorithm is executed on the candidate variable set, and the resulting hypothesis is then tested for accuracy.

  • Advantage: Since wrapper methods evaluate variables in the context of the learning problem, they tend to outperform filter methods.

  • Disadvantage: The cost of repeatedly executing the learning algorithm can become problematic.

  • John, Kohavi, and Pfleger (1994) coined the term “wrapper”, but the technique was used before that (Devijver and Kittler, 1982).
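A wrapper evaluation can be sketched as "train on the candidate subset, then measure accuracy on held-out data". The learner below is a 1-nearest-neighbour classifier picked only because it is short; the function names and the tiny data set are assumptions made for this example.

```python
def nearest_neighbor_predict(train_X, train_y, x, subset):
    """Predict the label of x using only the feature indices in `subset`."""
    def dist(a, b):
        return sum((a[j] - b[j]) ** 2 for j in subset)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def wrapper_accuracy(subset, train_X, train_y, test_X, test_y):
    """Quality of a variable subset = held-out accuracy of the learner."""
    correct = sum(
        nearest_neighbor_predict(train_X, train_y, x, subset) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Toy data: feature 0 equals the label, feature 1 is misleading noise.
train_X = [[0, 9], [1, 0], [0, 1], [1, 8]]
train_y = [0, 1, 0, 1]
test_X = [[0, 0], [1, 9]]
test_y = [0, 1]

for subset in ([0], [1]):
    print(subset, wrapper_accuracy(subset, train_X, train_y, test_X, test_y))
```

Every candidate subset costs one full run of the learner, which is the disadvantage the slide points out.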

Randomized Variable Elimination

  • Falls under the category of wrapper methods.

  • First, a hypothesis is produced for the entire set of ‘n’ variables.

  • A subset is formed by randomly selecting ‘k’ variables.

  • A hypothesis is then produced for the remaining (n-k) variables.

  • The accuracies of the two hypotheses are compared.

  • Removal of any relevant variable should cause an immediate decline in performance.

  • Uses a cost function to balance the expected number of successive failures against the cost of running the learning algorithm several times.

Probability of selecting ‘k’ variables

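  • The probability of successfully selecting ‘k’ irrelevant variables at random (without replacement) is given by

    $$p^{+}(n, r, k) = \prod_{i=0}^{k-1} \frac{n - r - i}{n - i}$$

    n … number of remaining variables

    r … number of relevant variables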
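As a quick check of this probability, the sketch below multiplies the per-draw chances of picking an irrelevant variable when k variables are drawn without replacement from n, of which r are relevant. The function name `p_success` is an assumption of this example; exact fractions are used so the result can be compared against the equivalent binomial-coefficient form.

```python
from fractions import Fraction
from math import comb


def p_success(n: int, r: int, k: int) -> Fraction:
    """Chance that k variables drawn from n (r relevant) are all irrelevant."""
    if k > n - r:
        return Fraction(0)          # not enough irrelevant variables to draw
    p = Fraction(1)
    for i in range(k):
        # i irrelevant variables are already gone from both pools.
        p *= Fraction(n - r - i, n - i)
    return p


# Example: 100 variables, 10 relevant, different step sizes k.
for k in (1, 5, 20):
    print(k, float(p_success(100, 10, k)))
```

The probability falls quickly as k grows, which is why the step size has to be chosen with care.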

Expected number of failures

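  • The expected number of consecutive failures before a success at selecting k irrelevant variables is given by

    $$E^{-}(n, r, k) = \frac{1 - p^{+}(n, r, k)}{p^{+}(n, r, k)}$$

    where p⁺(n, r, k) is the probability that a random selection of k variables contains no relevant variable.

  • This is the expected number of consecutive trials in which at least one of the r relevant variables is randomly selected along with irrelevant variables.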
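If each trial succeeds independently with probability p, the number of failures before the first success is geometric with mean (1 - p) / p. A minimal sketch under that assumption, with hypothetical function names and k assumed to be at most n:

```python
from fractions import Fraction
from math import inf


def p_success(n, r, k):
    """Chance that k variables drawn from n (r relevant) are all irrelevant."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - r - i, n - i)
    return p


def expected_failures(n, r, k):
    """Mean number of failed selections before the first successful one."""
    p = p_success(n, r, k)
    if p == 0:
        return inf                  # success is impossible for this k
    return (1 - p) / p


# Example: 100 variables, 10 relevant, different step sizes k.
for k in (1, 5, 20):
    print(k, float(expected_failures(100, 10, k)))
```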

Cost of removing k variables

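  • The expected cost of successfully removing k variables from n remaining given r relevant variables is given by

    $$I(n, r, k) = E^{-}(n, r, k) \cdot M(L, n - k) + M(L, n - k) = M(L, n - k)\,\bigl(E^{-}(n, r, k) + 1\bigr)$$

    where M(L, n) represents an upper bound on the cost of running algorithm ‘L’ on n inputs and E⁻(n, r, k) is the expected number of consecutive failures.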
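The sketch below assumes that every trial, failed or successful, costs one run of L on the n-k remaining inputs, so the expected cost is that run cost times (expected failures + 1). `run_cost` stands in for M(L, ·), and the linear cost model is a placeholder chosen for the demo, not a claim about any real learner.

```python
from fractions import Fraction
from math import inf


def p_success(n, r, k):
    """Chance that k variables drawn from n (r relevant) are all irrelevant."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - r - i, n - i)
    return p


def removal_cost(n, r, k, run_cost):
    """Expected cost of removing k variables in one successful step."""
    p = p_success(n, r, k)
    if p == 0:
        return inf                            # the step can never succeed
    expected_failures = (1 - p) / p
    return run_cost(n - k) * (expected_failures + 1)


def linear_cost(m):
    # Assumption: a hypothetical learner whose cost grows linearly with inputs.
    return m


# Example: 100 variables, 10 relevant, different step sizes k.
for k in (1, 5, 20):
    print(k, float(removal_cost(100, 10, k, linear_cost)))
```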

Optimal cost of removing irrelevant variables

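  • The optimal cost of removing irrelevant variables from n remaining and r relevant is given by

    $$I_{sum}(n, r) = \min_{k} \bigl( I(n, r, k) + I_{sum}(n - k, r) \bigr)$$

    where I(n, r, k) is the expected cost of removing k variables in one step.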

Optimal value for ‘k’

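  • The optimal value is computed as

    $$k_{opt}(n, r) = \arg\min_{k} \bigl( I(n, r, k) + I_{sum}(n - k, r) \bigr)$$

  • It is the value of k for which the expected cost of removing the remaining irrelevant variables is lowest.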

Algorithm for computing k and cost values

  • Given: L, N, r

  • Isum[r…N] ← 0
    kopt[r+1…N] ← 0
    for i ← r+1 to N do
        bestCost ← ∞
        for k ← 1 to i-r do
            temp ← I(i,r,k) + Isum[i-k]
            if (temp < bestCost) then
                bestCost ← temp
                bestK ← k
        Isum[i] ← bestCost
        kopt[i] ← bestK
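A direct Python rendering of this table computation, as a sketch. Two assumptions are made here: the one-step cost is taken to be one run of L on the i-k remaining inputs divided by the success probability, and `run_cost` is a hypothetical stand-in for M(L, ·). The base case Isum[r] = 0 is set explicitly because the inner loop reads it when k = i-r.

```python
from math import inf, prod


def step_cost(n, r, k, run_cost):
    """Expected cost I(n, r, k) of removing k of n variables, r relevant."""
    p = prod((n - r - i) / (n - i) for i in range(k))   # success probability
    return run_cost(n - k) / p


def compute_tables(N, r, run_cost):
    """Bottom-up tables of optimal total cost and optimal step size."""
    i_sum = {r: 0.0}            # nothing left to remove once only r remain
    k_opt = {}
    for i in range(r + 1, N + 1):
        best_cost, best_k = inf, 0
        for k in range(1, i - r + 1):
            temp = step_cost(i, r, k, run_cost) + i_sum[i - k]
            if temp < best_cost:
                best_cost, best_k = temp, k
        i_sum[i] = best_cost
        k_opt[i] = best_k
    return i_sum, k_opt


# Example with a placeholder linear cost model.
i_sum, k_opt = compute_tables(N=100, r=10, run_cost=lambda m: m)
print(k_opt[100], round(i_sum[100], 1))
```

The double loop is quadratic in N, and it runs once before any learning takes place.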

Randomized Variable Elimination (RVE) when r is known

  • Given: L, n, r, tolerance

  • Compute tables for Isum(i,r) and kopt(i,r)
    h ← hypothesis produced by L on ‘n’ inputs

  • while n > r do
        k ← kopt(n,r)
        select k variables at random and remove them
        h’ ← hypothesis produced by L on n-k inputs
        if e(h’) – e(h) ≤ tolerance then
            n ← n-k
            h ← h’
        else
            replace the selected k variables
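The loop above can be tried end to end with a toy learner. Everything specific in this sketch is an assumption made for illustration: `error_of` plays the role of "run L and measure e(h)", `run_cost` stands in for M(L, ·), the one-step cost is one run divided by the success probability, and the toy problem has three known relevant variables.

```python
import random
from math import inf, prod


def build_k_opt(N, r, run_cost):
    """Table of step sizes: k_opt[n] minimises expected cost for n remaining."""
    i_sum, k_opt = {r: 0.0}, {}
    for i in range(r + 1, N + 1):
        best_cost, best_k = inf, 0
        for k in range(1, i - r + 1):
            p = prod((i - r - j) / (i - j) for j in range(k))
            cost = run_cost(i - k) / p + i_sum[i - k]
            if cost < best_cost:
                best_cost, best_k = cost, k
        i_sum[i], k_opt[i] = best_cost, best_k
    return k_opt


def rve(variables, r, error_of, run_cost, tolerance=0.0, seed=0):
    """Randomly drop k variables at a time; keep the drop if error holds."""
    rng = random.Random(seed)
    current = list(variables)
    k_opt = build_k_opt(len(current), r, run_cost)
    error = error_of(current)                      # e(h) on all n inputs
    while len(current) > r:
        k = k_opt[len(current)]
        removed = set(rng.sample(current, k))
        candidate = [v for v in current if v not in removed]
        candidate_error = error_of(candidate)      # e(h') on n-k inputs
        if candidate_error - error <= tolerance:
            current, error = candidate, candidate_error
        # Otherwise `current` is left untouched, i.e. the k variables
        # are put back and another random selection is tried.
    return current


RELEVANT = {0, 1, 2}                               # assumption: toy ground truth


def toy_error(subset):
    # Error grows as soon as any relevant variable is missing.
    return len(RELEVANT - set(subset)) / len(RELEVANT)


kept = rve(range(20), r=3, error_of=toy_error, run_cost=lambda m: m)
print(sorted(kept))
```

With zero tolerance and this toy error, a selection that contains a relevant variable is always rejected, so the loop can only stop once exactly the relevant variables remain.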

RVE example

  • Plot of the expected cost of running RVE, Isum(N, r = 10), along with the cost of removing inputs individually and the estimated number of updates M(L,n).

  • L is a learning algorithm that learns a Boolean function using a perceptron unit.

Randomized Variable Elimination including a search for ‘r’ (RVErS)

  • Given: L, c1, c2, n, rmax, rmin, tolerance

  • Compute tables Isum(i,r) and kopt(i,r) for rmin ≤ r ≤ rmax
    r ← (rmax + rmin) / 2
    success, fail ← 0
    h ← hypothesis produced by L on ‘n’ inputs

  • repeat
        k ← kopt(n,r)
        select k variables at random and remove them
        h’ ← hypothesis produced by L on (n-k) inputs
        if e(h’) – e(h) ≤ tolerance then
            n ← n – k
            h ← h’
            success ← success + 1
            fail ← 0
        else
            replace the selected k variables
            fail ← fail + 1
            success ← 0

RVErS (contd…)

        if n ≤ rmin then
            r, rmax, rmin ← n
        else if fail ≥ c1 E⁻(n,r,k) then
            rmin ← r
            r ← (rmax + rmin) / 2
            success, fail ← 0
        else if success ≥ c2 (r – E⁻(n,r,k)) then
            rmax ← r
            r ← (rmax + rmin) / 2
            success, fail ← 0
    until rmin < rmax and fail ≤ c1 E⁻(n,r,k)
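The three-way adjustment of the estimate r on this slide is a binary search over [rmin, rmax] driven by the failure and success counters. The sketch below isolates just that update as a pure function. Integer division for the midpoint, the function names, and the example values of c1 and c2 are assumptions of this sketch; the expected-failure count is taken as (1 - p) / p for success probability p.

```python
from math import prod


def expected_failures(n, r, k):
    """Mean failures before k of n variables (r relevant) are all irrelevant."""
    if not 0 < k <= n - r:
        raise ValueError("k must satisfy 0 < k <= n - r")
    p = prod((n - r - i) / (n - i) for i in range(k))
    return (1 - p) / p


def update_r(n, r, r_min, r_max, success, fail, k, c1, c2):
    """One pass of the r-adjustment step; returns the updated state."""
    if n <= r_min:
        return n, n, n, success, fail          # r, rmax, rmin <- n
    e_minus = expected_failures(n, r, k)
    if fail >= c1 * e_minus:
        # Too many failures in a row: r was probably underestimated.
        r_min = r
        r = (r_max + r_min) // 2
        success = fail = 0
    elif success >= c2 * (r - e_minus):
        # A long run of successes: r was probably overestimated.
        r_max = r
        r = (r_max + r_min) // 2
        success = fail = 0
    return r, r_min, r_max, success, fail


# Three failures in a row with n=100, r=50, k=1 raise the lower bound.
print(update_r(100, 50, 0, 100, success=0, fail=3, k=1, c1=3, c2=0.1))
```

The returned tuple is ordered (r, rmin, rmax, success, fail) so the caller can feed it straight back into the next iteration.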

Results

Variable Selection results using naïve Bayes and C4.5 algorithms

My implementation

  • Integrate with Weka

  • Extend the NaiveBayes and J48 algorithms

  • Obtain results for some of the UCI datasets used by the authors

  • Compare results with those reported by the authors

  • Work in progress

RECAP

Questions

References

  • H. Almuallim and T. G. Dietterich. Learning with many irrelevant features. In Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, 1991. MIT Press.

  • R. Caruana and D. Freitag. Greedy attribute selection. In Machine Learning: Proceedings of the Eleventh International Conference, New Brunswick, NJ, 1994. Morgan Kaufmann.

  • K. Kira and L. Rendell. A practical approach to feature selection. In D. Sleeman and P. Edwards, editors, Machine Learning: Proceedings of the Ninth International Conference, San Mateo, CA, 1992. Morgan Kaufmann.

References (contd…)

  • G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Machine Learning: Proceedings of the Eleventh International Conference, pages 121-129, New Brunswick, NJ, 1994. Morgan Kaufmann.

  • P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice Hall International, 1982.