Instance Construction via Likelihood-Based Data Squashing
Instance Construction via Likelihood-Based Data Squashing. Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers). Summarized by Jinsan Yang, SNU Biointelligence Lab.

Presentation Transcript


Instance Construction via Likelihood-Based Data Squashing

Madigan D., et al.

(Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers)

Summarized by: Jinsan Yang, SNU Biointelligence Lab


  • Abstract

    • Data Compression Method: Squashing

    • LDS: Likelihood based data squashing

  • Keywords

    Instance Construction, Data Squashing


Outline

  • Introduction

  • The LDS Algorithm

  • Evaluation: Logistic Regression

  • Evaluation: Neural Networks

  • Iterative LDS

  • Discussion


Introduction

  • Massive data examples

    • Large-scale retailing

    • Telecommunications

    • Astronomy

    • Computational biology

    • Internet logging

  • Some computational challenges

    • Multiple passes over the data are needed

    • Data access from disk is 10^5 to 10^6 times slower than from main memory

    • Current solution: scaling up existing algorithms

    • Here: scaling down the data

  • Data squashing: 750000 → 8443 records (DuMouchel et al., 1999)

    • Outperforms a random sample of size 7543 by a factor of 500 in MSE


LDS Algorithm

  • Motivation: Bayes' rule

    • Given three data points d1, d2, d3, estimate the parameter θ from their likelihoods

    • Cluster data points by likelihood profile
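The motivation can be written out as a short derivation. This is a sketch only: it assumes the three data points are conditionally independent given the parameter, and the symbols θ (the parameter) and d* (a pseudo data point standing in for d1 and d2) are notation chosen here, not taken from the slide.

```latex
\begin{align*}
\Pr(\theta \mid d_1, d_2, d_3)
  &\propto \Pr(\theta)\,\Pr(d_1 \mid \theta)\,\Pr(d_2 \mid \theta)\,\Pr(d_3 \mid \theta) \\
\intertext{If $\Pr(d_1 \mid \theta) \approx \Pr(d_2 \mid \theta) \approx \Pr(d^{*} \mid \theta)$ over the values of $\theta$ of interest, then}
\Pr(\theta \mid d_1, d_2, d_3)
  &\approx c \,\Pr(\theta)\,\Pr(d^{*} \mid \theta)^{2}\,\Pr(d_3 \mid \theta)
\end{align*}
```

Here c is a normalizing constant. Under this reading, points with similar likelihood profiles can be replaced by one pseudo point carrying a weight equal to the number of points it replaces.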


LDS Algorithm

  • Details of LDS Algorithm

    • [Select] Values of θ by a central composite design

(Figure: central composite design for 3 factors)
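A central composite design for k factors combines the 2^k corners of a cube, 2k axial points, and a center point. The sketch below generates those points in coded units. The function name `central_composite_design` and the default axial distance (the rotatable choice, (2^k)^(1/4)) are assumptions made for illustration; the slide does not say which axial distance is used, nor how coded units are mapped to actual parameter values.

```python
from itertools import product


def central_composite_design(k, alpha=None):
    """Return central composite design points for k factors, in coded units."""
    if alpha is None:
        # Assumption: rotatable design, axial distance (2^k)^(1/4)
        alpha = (2 ** k) ** 0.25
    # 2^k factorial (corner) points at +/-1 on every factor
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    # 2k axial points: +/-alpha on one factor, 0 on the others
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    # One center point
    center = [tuple([0.0] * k)]
    return corners + axial + center


design = central_composite_design(3)
print(len(design))  # 8 corners + 6 axial + 1 center = 15
```

For three factors this gives 15 candidate parameter settings at which to evaluate likelihoods.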


LDS Algorithm

  • [Profile] Evaluate the likelihood profile of each data point at the selected values of θ

  • [Cluster] Cluster the mother data in a single pass

    • Select n’ random samples as initial cluster centers

    • Assign each remaining data point to its nearest cluster center

  • [Construct] Construct the pseudo data: one pseudo data point per cluster, taken as the cluster center
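The Profile, Cluster, and Construct steps can be sketched for logistic regression as follows. This is a minimal illustration, not the authors' implementation. Several choices are assumptions: the function names (`log_likelihood`, `likelihood_profile`, `squash`), the use of squared Euclidean distance between log-likelihood profiles, fixed cluster centers during the pass, and a pseudo point defined as the cluster mean of (x, y) with a weight equal to the cluster size. The list `thetas` stands in for the output of the Select step and is hand-picked here.

```python
import math
import random


def log_likelihood(theta, x, y):
    """Log-likelihood of one record (x, y) under logistic regression."""
    z = sum(t * xi for t, xi in zip(theta, x))
    # y*z - log(1 + exp(z)), with the log term computed stably
    return y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))


def likelihood_profile(record, thetas):
    """[Profile] Log-likelihood of one record at every selected theta."""
    x, y = record
    return [log_likelihood(theta, x, y) for theta in thetas]


def squash(data, thetas, n_clusters, seed=0):
    """[Cluster] and [Construct]: one pass, one pseudo point per cluster."""
    rng = random.Random(seed)
    center_ids = set(rng.sample(range(len(data)), n_clusters))
    clusters = []
    for i in sorted(center_ids):
        x, y = data[i]
        clusters.append({
            "profile": likelihood_profile(data[i], thetas),
            "sum_x": list(x), "sum_y": float(y), "n": 1,
        })
    for i, record in enumerate(data):
        if i in center_ids:
            continue
        p = likelihood_profile(record, thetas)
        # Nearest center by squared distance between profiles (assumption)
        nearest = min(
            clusters,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c["profile"])),
        )
        x, y = record
        nearest["sum_x"] = [s + xi for s, xi in zip(nearest["sum_x"], x)]
        nearest["sum_y"] += y
        nearest["n"] += 1
    # Assumption: pseudo point = cluster mean, weight = cluster size
    return [
        ([s / c["n"] for s in c["sum_x"]], c["sum_y"] / c["n"], c["n"])
        for c in clusters
    ]


# Synthetic mother data: intercept plus one covariate
rng = random.Random(1)
data = []
for _ in range(200):
    x = (1.0, rng.gauss(0, 1))
    prob = 1.0 / (1.0 + math.exp(-(0.5 + 1.0 * x[1])))
    data.append((x, 1 if rng.random() < prob else 0))

# Hand-picked stand-in for the [Select] step
thetas = [(0.5, 1.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (0.5, 1.5)]
pseudo = squash(data, thetas, n_clusters=20)
print(len(pseudo), sum(w for _, _, w in pseudo))  # 20 200
```

Each returned tuple is (mean x, mean y, weight), so a weighted fit on the 20 pseudo points would stand in for a fit on all 200 records.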


Evaluation: Logistic Regression

  • Small-scale simulations:

  • Initial estimate of θ

  • Plot: log(error ratio)

  • Three methods of initial parameter estimation

  • 100 data points / 48 squashed data points


Evaluation: Logistic Regression

  • Medium scale: 100000 data points; baseline: 1% simple random sample


Evaluation: Logistic Regression

  • Large scale: 744963 data points; baseline: 1% simple random sample


Evaluation: Neural Networks

  • Feed-forward network: two input nodes, one hidden layer with 3 units, single binary output

  • Mother data: 10000; squashed data: 1000; repetitions: 30; test data: 1000 from the same network

  • Comparison of P(whole) - P(reduced)
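One way to read the comparison is as the difference, at each test point, between the probability predicted by a network trained on the whole data and the probability predicted by a network trained on the reduced data. The sketch below computes and summarizes such differences. The function name `prediction_differences`, the summary statistics, and the example probabilities are all illustrative assumptions, not results from the presentation.

```python
from statistics import mean


def prediction_differences(p_whole, p_reduced):
    """Per-test-point differences P(whole) - P(reduced)."""
    if len(p_whole) != len(p_reduced):
        raise ValueError("prediction lists must have the same length")
    return [pw - pr for pw, pr in zip(p_whole, p_reduced)]


# Made-up predicted probabilities on three test points (illustration only)
p_whole = [0.90, 0.20, 0.60]
p_reduced = [0.80, 0.25, 0.60]

diffs = prediction_differences(p_whole, p_reduced)
mean_abs_diff = mean(abs(d) for d in diffs)
max_abs_diff = max(abs(d) for d in diffs)
print(diffs, mean_abs_diff, max_abs_diff)
```

Differences concentrated near zero would indicate that the reduced data reproduces the predictions of the whole-data model.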



Iterative LDS

  • When the initial estimate of θ is not accurate:

    1. Set the initial estimate of θ from a simple random sample

    2. Squash by LDS

    3. Estimate θ from the squashed data

    4. Go to 2.
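The loop can be sketched as a small driver that alternates squashing and estimation. This is a hedged outline: `iterative_lds`, its `squash` and `fit` callables, and the fixed iteration count are assumptions, since the slide gives no stopping rule. The toy `toy_squash` and `toy_fit` below only exercise the loop on a one-dimensional mean; they are not the LDS squashing step or a logistic regression fit.

```python
def iterative_lds(data, initial_theta, squash, fit, n_iterations=3):
    """Alternate squashing around the current estimate and re-estimating."""
    theta = initial_theta            # step 1: e.g. from a simple random sample
    history = [theta]
    for _ in range(n_iterations):    # assumption: fixed number of rounds
        pseudo = squash(data, theta)     # step 2: squash using current theta
        theta = fit(pseudo)              # step 3: estimate from pseudo data
        history.append(theta)            # step 4: go back to step 2
    return theta, history


def toy_squash(data, theta):
    """Toy stand-in: split at theta, return (group mean, group size) pairs."""
    groups = [[x for x in data if x < theta], [x for x in data if x >= theta]]
    return [(sum(g) / len(g), len(g)) for g in groups if g]


def toy_fit(pseudo):
    """Toy stand-in: weighted mean of the pseudo points."""
    total_weight = sum(w for _, w in pseudo)
    return sum(value * w for value, w in pseudo) / total_weight


data = [1.0, 2.0, 3.0, 4.0, 10.0]
theta, history = iterative_lds(data, initial_theta=2.5,
                               squash=toy_squash, fit=toy_fit)
print(theta, history)
```

In practice a convergence check on successive estimates could replace the fixed iteration count.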

