Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks
    Presentation Transcript
    1. Dimensionality Reduction in Unsupervised Learning of Conditional Gaussian Networks Authors: Peña, J.M., Lozano, J.A., Larrañaga, P., and Inza, I. In IEEE Trans. on PAMI, 23(6), 2001. Summarized by Kyu-Baek Hwang

    2. Abstract
    • Feature selection for unsupervised learning of conditional Gaussian networks
    • Unsupervised learning for Bayesian networks?
    • Which features are good for the learning task?
      • Assessment of the relevance of each feature to the learning process
      • How to determine the threshold for cutting off irrelevant features?
    • Accelerates learning while still obtaining reasonable models
    • Two artificial datasets
    • Two benchmark datasets from the UCI repository

    3. Unsupervised Learning for Conditional Gaussian Networks
    • Data clustering → learning a probabilistic graphical model from unlabeled data
    • Cluster membership → a hidden variable
    • Conditional Gaussian networks (CGNs)
      • The cluster variable is an ancestor of all the other variables.
      • The joint probability distribution over all the other variables given the cluster membership is multivariate Gaussian.
    • Feature selection in classification → feature selection in clustering
    • Eventually, all the features are considered in order to describe the domain.

    4. Conditional Gaussian Distribution
    • Data clustering: X = (Y, C) = (Y1, …, Yn, C)
    • Conditional Gaussian distribution
      • The pdf of Y given C = c is f(y | c) = N(y; μ(c), Σ(c)), whenever p(c) = p(C = c) > 0, with Σ(c) positive definite.
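To make the definition concrete, here is a minimal sketch (my own illustration, not from the paper) of evaluating the conditional Gaussian density f(y | c); the cluster prior, means, and covariances are made-up values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-cluster, 3-attribute conditional Gaussian model (made-up parameters).
p_c = np.array([0.4, 0.6])                      # p(C = c), must be > 0 for the pdf to be defined
mu = [np.zeros(3), np.array([1.0, 2.0, -1.0])]  # per-cluster mean vectors mu(c)
sigma = [np.eye(3), np.diag([1.0, 0.5, 2.0])]   # per-cluster covariances Sigma(c), positive definite

def cg_pdf(y, c):
    """f(y | C = c) = N(y; mu(c), Sigma(c)), defined whenever p(c) > 0."""
    assert p_c[c] > 0
    return multivariate_normal(mean=mu[c], cov=sigma[c]).pdf(y)

y = np.array([0.5, 1.0, -0.2])
print(cg_pdf(y, 0), cg_pdf(y, 1))
```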

    5. Conditional Gaussian Networks
    • Factorization of the conditional Gaussian distribution
    • The conditional independencies among all the variables are encoded by the network structure s.
    • Local probability distributions
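As an illustration of the factorization (a toy sketch under an assumed structure C → Y1 → Y2 and made-up parameters, not taken from the paper), the product of the local linear-Gaussian densities equals the multivariate Gaussian implied by the structure:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Toy CGN for one cluster c: structure C -> Y1 -> Y2 (made-up parameters).
m1, v1 = 0.0, 1.0            # Y1 | c       ~ N(m1, v1)
b0, b1, v2 = 0.5, 2.0, 0.25  # Y2 | y1, c   ~ N(b0 + b1*y1, v2)   (local linear-Gaussian)

def joint_via_factorization(y1, y2):
    # Product of the local densities given the cluster membership.
    return norm(m1, np.sqrt(v1)).pdf(y1) * norm(b0 + b1 * y1, np.sqrt(v2)).pdf(y2)

# The same joint written as a single bivariate Gaussian implied by the structure.
mean = [m1, b0 + b1 * m1]
cov = [[v1, b1 * v1], [b1 * v1, b1**2 * v1 + v2]]
joint_mvn = multivariate_normal(mean, cov)

print(joint_via_factorization(0.3, 1.2), joint_mvn.pdf([0.3, 1.2]))  # equal up to rounding
```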

    6. An Example of CGNs (figure of a network with cluster variable C)

    7. Learning CGNs from Data
    • An incomplete dataset d
    • Structural EM algorithm

    8. Structural EM Algorithm
    • Expected score (formula omitted in this transcript)
    • Relaxed version (formula omitted in this transcript)
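The expected-score formulas were images in the original slides and are not reproduced here. As a loose simplification (not the authors' implementation), the parametric EM step inside structural EM, for a fixed naive structure in which every attribute depends only on C, reduces to soft cluster assignments followed by re-estimation of the per-cluster Gaussians:

```python
import numpy as np

def em_naive_cg(Y, k, n_iter=50, seed=0):
    """EM for a CGN with naive structure (each Y_i depends only on C): a diagonal Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    pi = np.full(k, 1.0 / k)                      # p(C = c)
    mu = Y[rng.choice(n, k, replace=False)]       # per-cluster means
    var = np.tile(Y.var(axis=0), (k, 1))          # per-cluster, per-attribute variances
    for _ in range(n_iter):
        # E-step: responsibilities p(C = c | y) under the current parameters.
        logp = np.stack([
            np.log(pi[c]) - 0.5 * np.sum(np.log(2 * np.pi * var[c]) + (Y - mu[c])**2 / var[c], axis=1)
            for c in range(k)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the expected complete data.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ Y) / nk[:, None]
        var = (resp.T @ Y**2) / nk[:, None] - mu**2 + 1e-6
    return pi, mu, var, resp

Y = np.vstack([np.random.randn(200, 3), np.random.randn(200, 3) + 3.0])
pi, mu, var, resp = em_naive_cg(Y, k=2)
print(pi, mu)
```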

    9. Scoring Metrics for the Structural Search
    • The log marginal likelihood of the expected complete data

    10. Feature Selection
    • Large databases: many instances, many attributes → dimensionality reduction required
    • Select features based on some criterion.
      • The criterion differs with the purpose of learning: learning speed, accurate predictions, and the comprehensibility of the learned models.
    • Non-exhaustive search (2^n possible subsets)
      • Sequential selection (forward or backward; a generic sketch follows this slide)
      • Evolutionary, population-based, randomized search based on EDAs
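For illustration of sequential selection only: a generic greedy forward-selection loop. The scoring function here (silhouette of a k-means clustering) is a stand-in chosen for the sketch, not the criterion used in the paper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def forward_selection(Y, k,
                      score=lambda Z, k: silhouette_score(Z, KMeans(k, n_init=10).fit_predict(Z))):
    """Greedy forward selection: repeatedly add the attribute that most improves the score."""
    remaining, selected, best = list(range(Y.shape[1])), [], -np.inf
    while remaining:
        scored = [(score(Y[:, selected + [j]], k), j) for j in remaining]
        s, j = max(scored)
        if s <= best:          # stop when no attribute improves the score
            break
        best = s
        selected.append(j)
        remaining.remove(j)
    return selected, best

Y = np.vstack([np.random.randn(150, 4), np.random.randn(150, 4) + 2.5])
print(forward_selection(Y, k=2))
```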

    11. Wrapper and Filter
    • Wrapper
      • Feature subsets tailored to the performance function of the learning process
      • e.g., predictive accuracy on the test data set
    • Filter
      • Based on the intrinsic properties of the data set
      • e.g., correlation between the class label and each attribute → supervised learning
    • Two problems in unsupervised learning
      • Absence of the class label → a different criterion is needed for feature selection
      • No standard accepted performance task → multiple predictive accuracy or class prediction

    12. Feature Selection in Learning CGNs
    • Data analysis (clustering) → description, not prediction
    • All the features are necessary for the description.
    • CGN learning with many features is a time-consuming task.
    • Preprocessing: feature selection
    • Learning CGNs
    • Postprocessing: addition of the remaining features as conditionally independent given the cluster membership
    • The goal → how to measure relevance
      • Fast learning time
      • Accuracy → log likelihood of the test data

    13. Relevance
    • Features that exhibit low correlation with the rest of the features can be considered irrelevant for the learning process.
      • They are conditionally independent given the cluster membership.
    • A first attempt in the continuous domain

    14. Relevance Measure
    • The relevance measure (formula omitted in this transcript)
    • Null hypothesis (edge exclusion test): the edge between Yi and Yj can be excluded, i.e., their partial correlation given the rest of the variables is zero.
    • r²ij|rest: the squared sample partial correlation of Yi and Yj given the rest, rij|rest = -ŵij / √(ŵii ŵjj), where the ŵij are the maximum likelihood estimates (MLEs) of the elements of the inverse covariance matrix.
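A sketch of the quantities on this slide, using the standard graphical-Gaussian formulas: the sample partial correlation rij|rest comes from the MLE of the precision (inverse covariance) matrix, and -N·log(1 - r²ij|rest) is the usual likelihood ratio edge exclusion statistic. The function and variable names are mine:

```python
import numpy as np

def partial_correlations(Y):
    """r_ij|rest from the MLE precision matrix W: r_ij|rest = -w_ij / sqrt(w_ii * w_jj)."""
    S = np.cov(Y, rowvar=False, bias=True)   # MLE of the covariance matrix
    W = np.linalg.inv(S)                     # MLE of the inverse covariance matrix
    D = np.sqrt(np.outer(np.diag(W), np.diag(W)))
    R = -W / D
    np.fill_diagonal(R, 1.0)
    return R

def edge_exclusion_stats(Y):
    """Likelihood ratio statistics T_ij = -N * log(1 - r_ij|rest^2) for every pair (i, j)."""
    n = Y.shape[0]
    R = partial_correlations(Y)
    return -n * np.log(1.0 - R**2 + np.eye(R.shape[0]))  # the added identity keeps the diagonal finite

Y = np.random.randn(500, 5)
print(edge_exclusion_stats(Y).round(2))
```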

    15. Graphical Gaussian Models (1/2)

    16. Graphical Gaussian Models (2/2)

    17. Relevance Threshold
    • Distribution of the test statistic (equation omitted in this transcript)
    • G(x): the pdf of a χ²₁ random variable
    • A 5 percent test
    • The resolution of the above equation → a numerical optimization problem
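The paper's exact null-distribution equation is not reproduced in this transcript. As a rough stand-in, treating the edge exclusion statistic as approximately χ²₁ under the null gives a 5 percent threshold near 3.84; the root-finding call below only illustrates the generic numerical approach to solving such a threshold equation:

```python
from scipy.stats import chi2
from scipy.optimize import brentq

# Rough stand-in: treat the edge exclusion statistic as chi-squared with 1 d.o.f. under the null.
alpha = 0.05
print(chi2.ppf(1 - alpha, df=1))                      # ~3.84, closed-form percentile

# The paper instead solves its refined threshold equation numerically;
# the same pattern with a root finder, here applied to the chi^2_1 tail equation:
threshold = brentq(lambda t: chi2.sf(t, df=1) - alpha, 1e-6, 50.0)
print(threshold)                                      # ~3.84 again
```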

    18. Learning Scheme

    19. Experimental Settings
    • Model specifications
      • Tree augmented Naïve Bayes (TANB) models
      • Predictive attributes may have, at most, one other predictive attribute as a parent.
    • An example (figure; a small structural sketch follows this slide)
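A small sketch (hypothetical representation, not the authors' code) of the TANB constraint that every predictive attribute has the cluster variable C as a parent and at most one other predictive attribute as a parent:

```python
# Parents of each predictive attribute in a hypothetical 4-attribute TANB structure.
# "C" is the cluster variable; besides C, each attribute has at most one attribute parent.
tanb_parents = {
    "Y1": ["C"],
    "Y2": ["C", "Y1"],
    "Y3": ["C", "Y1"],
    "Y4": ["C", "Y2"],
}

def is_valid_tanb(parents):
    """Check that every attribute has C as a parent and at most one other attribute as a parent."""
    for node, pa in parents.items():
        attr_parents = [p for p in pa if p != "C"]
        if "C" not in pa or len(attr_parents) > 1:
            return False
    return True

print(is_valid_tanb(tanb_parents))  # True
```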

    20. Data Sets
    • Synthetic data sets (4000 training : 1000 test cases)
      • A TANB model with 25 (15:14[-1, 1]) attributes, (0, 4, 8), 1; C: uniform, (0, 1)
      • A TANB model with 30 (15:14[-1, 1]) attributes, (0, 4, 8), 2; C: uniform, (0, 5)
    • Waveform (artificial data, 4000 training : 1000 test cases)
      • 3 clusters, 40 attributes; the last 19 are noise attributes
    • Pima
      • 768 cases (700 training : 68 test)
      • 8 attributes

    21. Performance Criteria
    • The log marginal likelihood of the training data
    • The multiple predictive accuracy
      • A probabilistic approach to the standard multiple predictive accuracy
    • Runtime
    • 10 independent runs for the synthetic data sets and the waveform data
    • 50 independent runs for the Pima data
    • On a Pentium 366 machine

    22. Relevance Ranking

    23. Likelihood Plots for Synthetic Data

    24. Likelihood Plots for Real Data

    25. Runtime

    26. Automatic Dimensionality Reduction

    27. Conclusions and Future Work
    • Relevance assessment for feature selection in unsupervised learning in the continuous domain
    • Reasonable learning performance
    • Extension to the categorical domain
    • The redundant feature problem
    • Relaxation of the model structure
    • More realistic data sets