
Towards Efficient Learning of Neural Network Ensembles from Arbitrarily Large Datasets

Kang Peng, Zoran Obradovic and Slobodan Vucetic

Center for Information Science and Technology, Temple University

303 Wachman Hall, 1805 N Broad St, Philadelphia, PA 19122, USA


Agenda

  • Introduction

  • Motivation

  • Related Work

  • Proposed Work

  • Experimental Evaluation

    • Datasets

    • Experimental Setup

    • Results

  • Conclusions


Introduction

  • More and more very large datasets are becoming available

    • Geosciences

    • Bioinformatics

    • Intrusion detection

    • Credit fraud detection

  • Learning from arbitrarily large datasets is one of the next-generation data mining challenges


The MISR Data – a Real-Life Example

MISR (Multi-angle Imaging SpectroRadiometer) was launched into orbit in December 1999 aboard the Terra satellite to study the ecology and climate of Earth

  • 9 cameras from different angles

  • 4 spectral bands at each angle

  • Global coverage every 9 days

  • Average data rate

    3.3 megabits per second

    3.5 terabytes per year




Feed-Forward Neural Networks

[Figure: a feed-forward neural network with inputs x1–x3 plus a bias unit, a hidden layer with its own bias, and an output layer producing the output y; the connections between layers carry the weights.]

  • The feed-forward neural network (NN) is a powerful machine learning / data mining technique

  • Universal approximator – applicable to both classification and regression problems

  • Learning – weight adjustment (e.g. back-propagation); a minimal example follows
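As a hedged illustration of this building block (not part of the original slides), the sketch below trains a single-hidden-layer feed-forward network with back-propagation using scikit-learn; the synthetic data and all parameter values are placeholders.

```python
# Minimal sketch of a feed-forward NN trained by back-propagation.
# Assumptions: scikit-learn is available; the data and parameters are toy values.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 21)                  # 1000 examples, 21 continuous attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary target

# One hidden layer with H = 5 neurons; weights are adjusted by gradient descent.
nn = MLPClassifier(hidden_layer_sizes=(5,), max_iter=500, random_state=0)
nn.fit(X, y)
print("training accuracy:", nn.score(X, y))
```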


Motivation

  • Learning a single NN from an arbitrarily large dataset could be difficult due to

    • The unknown intrinsic complexity of the learning task

      • Difficult to determine appropriate NN architecture

      • Difficult to determine how much data is necessary for sufficient learning

    • The computational constraints

  • On the other hand, learning an ensemble of NNs would be advantageous if

    • Each component NN needs only a small portion of data

    • Its accuracy is comparable to that of a single NN trained on all data


Motivation

  • Need: To learn an ensemble of optimal accuracy with the least computational effort, one still has to decide

    • Model complexity

      • The number (E) of component NNs

      • The number (H) of hidden neurons for component NNs

    • Training sample sizes (N) for component NNs

  • Open problem: No efficient algorithm exists to find an exact solution (i.e. the optimal combination of E, H and N), even if the component NNs are required to have the same H and N

  • Proposed: An iterative procedure that

    • Learns near-optimal NN ensembles with reasonable computational effort

    • Adapts to the intrinsic complexity of the underlying learning task




NN Architecture Selection

  • Trial-and-error (manual) procedure

    • Training one model with each architecture

    • Trying as many architectures as possible and selecting the one with highest accuracy

    • Ineffective and inefficient for large datasets

  • Constructive learning

    • Starting with a small network and gradually adding neurons as needed

    • Examples

      • The tiling algorithm

      • The upstart algorithm

      • The cascade-correlation algorithm

    • Suitable for small datasets


NN Architecture Selection

  • Network pruning

    • Training a larger-than-necessary NN and then pruning redundant neurons/weights

    • Examples

      • Optimal Brain Damage

      • Optimal Brain Surgeon

    • Suitable for small and medium datasets

  • Evolutionary algorithms

    • Population-based stochastic search algorithms

    • More efficient in searching NN architecture space

    • Applicable to learning-rule selection as well as network training (weight adjustment)

    • Inefficient for large datasets


Progressive Sampling

[Figure: learning curve of accuracy vs. sample size; accuracy levels off at a sample size nmin much smaller than the full dataset size nall.]

  • Goal: to achieve near-optimal accuracy with significantly less data than using the whole dataset

  • Originally proposed for decision tree learning

  • It builds a series of models with increasingly larger samples until accuracy no longer improves

  • The sample sizes follow a sample schedule

    S = {n1, n2, …, nk}

    • where ni is sample size for the i-th model

  • A geometric sampling schedule is efficient in determining nmin (a small sketch follows this list)

    ni = n0 · a^i,

    where the constant n0 is a positive integer and a > 1

  • Progressive sampling may not be suitable for NN learning

    • The learning algorithm should be able to adjust model complexity as samples grow larger – this is not the case for the back-propagation algorithm
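A small sketch of the geometric schedule described above; the constants n0 and a and the dataset size are illustrative values, not those used in the paper.

```python
# Geometric sampling schedule n_i = n0 * a**i (illustrative values).
def geometric_schedule(n0=100, a=2.0, n_all=1_000_000):
    """Yield sample sizes n0 * a**i until the full dataset size is reached."""
    i = 0
    while True:
        n = int(n0 * a ** i)
        if n >= n_all:
            yield n_all
            return
        yield n
        i += 1

print(list(geometric_schedule(n0=100, a=2.0, n_all=10_000)))
# [100, 200, 400, 800, 1600, 3200, 6400, 10000]
```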




An Iterative Procedure for Learning NN Ensembles from Arbitrarily Large Datasets

The idea: Building a series of NNs such that

  • Each NN is trained on a sample much smaller than the whole dataset

  • The sample sizes for individual NNs are increased as needed

  • The numbers of hidden neurons for individual NNs increase as needed

  • The final predictor is the best one of all possible ensembles constructed from the trained NNs


The Proposed Iterative Procedure

  1. Initialize H (a) and N (b) to small values (e.g. 1 and 40)

  2. Draw a sample S of size N from dataset D

  3. Train a NN with H hidden neurons on sample S

  4. If accuracy (c) has significantly improved, go back to step 2

  5. Otherwise, if converged OR resource limits (d) reached, identify the best ensemble as the final predictor and stop

  6. Otherwise, increase H or N and go back to step 2

a) H – number of hidden neurons

b) N – training sample size

c) The best accuracy of all possible ensembles built from the trained NNs, estimated on an independent set

d) Could be main memory (maximal sample size) or cumulative execution time
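The sketch below ties the steps above together, under stated assumptions: scikit-learn MLPs as component NNs, a toy dataset, an "ensemble" that simply averages the predicted probabilities of all NNs trained so far (the procedure instead evaluates the best of all possible ensembles of the trained NNs), a binomial standard error for the confidence intervals, an assumed convergence test, and illustrative parameter values. It is not the authors' implementation.

```python
# Hedged sketch of the iterative procedure (assumptions listed above).
import numpy as np
from sklearn.neural_network import MLPClassifier

def ensemble_accuracy(nets, X, y):
    """Accuracy of the average-probability ensemble of the given networks."""
    proba = np.mean([nn.predict_proba(X) for nn in nets], axis=0)
    return float(np.mean(proba.argmax(axis=1) == y))

def ci(acc, n, z=1.645):
    """90% confidence interval for an accuracy estimated on n points."""
    se = np.sqrt(acc * (1 - acc) / n)          # assumption: SE of a proportion
    return acc - z * se, acc + z * se

rng = np.random.RandomState(0)
X = rng.randn(20_000, 21)
y = (X[:, :3].sum(axis=1) > 0).astype(int)
X_tr, y_tr = X[:15_000], y[:15_000]            # D_TR (assumed in random order)
X_vs, y_vs = X[15_000:], y[15_000:]            # D_VS

H, N = 1, 40                                   # initial hidden neurons / sample size
I_H, F_A, N_max, C = 4, 1.5, 5_000, 0.0025     # illustrative parameters
nets, accs, pos, grew = [], [], 0, None        # grew: what was increased last time

for it in range(30):
    idx = np.arange(pos, pos + N) % len(X_tr)  # sequential sample with rewind
    pos = (pos + N) % len(X_tr)
    nn = MLPClassifier(hidden_layer_sizes=(H,), max_iter=300, random_state=it)
    nn.fit(X_tr[idx], y_tr[idx])
    nets.append(nn)
    accs.append(ensemble_accuracy(nets, X_vs, y_vs))

    if len(accs) < 2:
        continue
    improved = (accs[-1] > accs[-2] and
                ci(accs[-1], len(y_vs))[0] > ci(accs[-2], len(y_vs))[1])
    if improved:
        grew = None                            # keep H and N, just add another NN
        continue
    # assumed convergence / resource check
    if N >= N_max or (len(accs) >= 5 and max(accs[-5:]) - min(accs[-5:]) < C):
        break
    # grow N first; if N was grown last time without improvement, grow H instead
    if grew == 'N':
        H, grew = H + I_H, 'H'
    else:
        N, grew = int(N * F_A), 'N'

print(f"ensemble of {len(nets)} NNs (H={H}, N={N}), "
      f"validation accuracy {accs[-1]:.3f}")
```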


The Use of Dataset D

  • Dataset D is divided into 3 disjoint subsets

    • DTR – for training NNs

    • DVS – for accuracy estimation during learning

    • DTS – for accuracy estimation of the final predictor

  • To draw a sample of size N from DTR

    • Assumption - data points are stored in random order

    • Sequentially take N data points

    • Rewind if the end of dataset is encountered
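A sketch of this sampling scheme, assuming the pre-shuffled training set fits in a NumPy array; for a truly arbitrarily large dataset the rows would be streamed from disk instead.

```python
import numpy as np

def draw_sample(D_tr, pos, N):
    """Take N consecutive rows starting at pos, rewinding at the end of D_TR.
    Assumes the rows are already stored in random order."""
    idx = np.arange(pos, pos + N) % len(D_tr)   # wrap around to the beginning
    return D_tr[idx], (pos + N) % len(D_tr)     # the sample and the new position

D_tr = np.arange(10).reshape(10, 1)             # toy "dataset" of 10 rows
sample, pos = draw_sample(D_tr, pos=7, N=5)
print(sample.ravel(), pos)                      # [7 8 9 0 1] 2
```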


Accuracy Estimation during Learning

  • Accuracy ACCi (for i-th iteration) is estimated on the independent subset DVS as accuracy of the best possible ensemble from i trained NNs

  • To determine whether ACCi is significantly higher than ACCi-1, test the condition ACCi > ACCi-1 AND CIi-1 ∩ CIi = ∅

    Here,

    ACCi is the accuracy for the i-th iteration,

    CIi is the 90% confidence interval for ACCi,

    calculated as ACCi ± 1.645·SE(ACCi),

    where SE(ACCi) is the standard error of ACCi
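A sketch of this significance test; 1.645 is the z value for the 90% interval as on the slide, while the example accuracy and SE values are made up.

```python
def conf_interval(acc, se, z=1.645):
    """90% confidence interval ACC_i ± 1.645 * SE(ACC_i)."""
    return acc - z * se, acc + z * se

def significantly_improved(acc_prev, se_prev, acc_cur, se_cur):
    """True if ACC_i > ACC_{i-1} and the two 90% intervals do not overlap."""
    lo_prev, hi_prev = conf_interval(acc_prev, se_prev)
    lo_cur, hi_cur = conf_interval(acc_cur, se_cur)
    return acc_cur > acc_prev and (lo_cur > hi_prev or hi_cur < lo_prev)

# toy numbers: accuracy rose from 0.80 to 0.83, SE = 0.004 on |D_VS| = 10,000
print(significantly_improved(0.80, 0.004, 0.83, 0.004))   # True
```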


Accuracy Standard Error Estimation

  • For classification problems

  • For regression problems

    • Draw 1000 bootstrap samples from DVS

    • Calculate R2 on each bootstrap sample

    • SE(ACCi) = standard deviation of these R2 values
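A sketch of the bootstrap estimate for the regression case (for classification, the analytic standard error of a proportion is the usual choice, which is an assumption here); the toy targets and predictions are illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

def bootstrap_se_r2(y_true, y_pred, n_boot=1000, seed=0):
    """SE(ACC_i) as the std. dev. of R^2 over bootstrap resamples of D_VS."""
    rng = np.random.RandomState(seed)
    n = len(y_true)
    r2s = [r2_score(y_true[idx], y_pred[idx])
           for idx in (rng.randint(0, n, size=n) for _ in range(n_boot))]
    return float(np.std(r2s))

rng = np.random.RandomState(1)
y_true = rng.randn(5000)                       # toy targets
y_pred = y_true + 0.3 * rng.randn(5000)        # toy ensemble predictions
print("SE of R^2:", bootstrap_se_r2(y_true, y_pred))
```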


Adjusting Model Complexity and Sample Size

  • If ACCi is NOT significantly higher than ACCi-1

    • If ACCi-1 is NOT significantly higher than ACCi-2

      • If N was already increased in the (i-1)-th iteration,

        then increase H by a pre-defined amount IH (IH is a positive integer)

      • If H was already increased in the (i-1)-th iteration,

        then multiply N by a pre-defined factor FA (FA > 1)

    • If ACCi-1 is significantly higher than ACCi-2

      (i.e. neither H nor N was increased in the (i-1)-th iteration),

      then multiply N by a pre-defined factor FA (FA > 1)
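The branching above can be written as a small state machine; `improved_i` and `improved_prev` stand for the significance test of the previous slide, `last_change` records what (if anything) was increased in the previous iteration, and the default IH and FA values are the ones used later in the experiments.

```python
def adjust(H, N, improved_i, improved_prev, last_change, I_H=4, F_A=1.5):
    """Return the (H, N, last_change) to use in the next iteration."""
    if improved_i:                      # accuracy still improving: change nothing
        return H, N, None
    if improved_prev:                   # nothing was increased last iteration
        return H, int(N * F_A), 'N'     # grow the sample size
    if last_change == 'N':              # sample already grown: add hidden neurons
        return H + I_H, N, 'H'
    return H, int(N * F_A), 'N'         # neurons already added: grow the sample

print(adjust(H=5, N=400, improved_i=False, improved_prev=True, last_change=None))
# (5, 600, 'N')
```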


Convergence Detection

  • In each (i-th) iteration, a convergence condition is tested on the recent accuracies ACCk,

    where C is a small positive constant and

    k ranges from i-4 to i
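One plausible form of this test, sketched below purely as an assumption, is that the procedure has converged when the accuracies of the last five iterations (k = i-4 .. i) differ by less than C; the default C matches the value used in the experiments.

```python
def converged(accs, C=0.0025, window=5):
    """Assumed test: True if ACC_k for k = i-4 .. i all lie within a band of width C."""
    if len(accs) < window:
        return False
    recent = accs[-window:]
    return max(recent) - min(recent) < C

print(converged([0.800, 0.845, 0.846, 0.8465, 0.8461, 0.8458]))   # True
```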




The Waveform Dataset

  • Synthetic classification problem

  • From UCI Machine Learning Repository

  • 3 classes of waveforms

  • 21 continuous attributes

  • Originally reported accuracy of 86.8%

    • with an Optimal Bayes classifier

  • 100,000 examples were generated for each class

    • |DTR| = 80,000, |DVS| = 10,000, |DTS| = 10,000


The Covertype Dataset

  • Real-life classification problem

  • From UCI Machine Learning Repository

  • 7 classes of forest cover types

  • 44 binary and 10 continuous attributes

    • 40 binary attributes (for soil type) were transformed into 7 continuous attributes

  • Originally reported accuracy of 70%

    • obtained using a neural network classifier

  • 581,012 examples

    • |DTR| = 561,012, |DVS| = 10,000, |DTS| = 10,000


The MISR Dataset

  • Real-life regression problem

  • From NASA

  • 1 continuous target

    • retrieved aerosol optical depth

  • 36 continuous attributes

    • constructed from raw MISR data

  • 45,449 examples

    • Retrieved over land for the 48 contiguous United States during a 15-day period of summer 2002

    • |DTR| = 35,449, |DVS| = 5,000, |DTS| = 5,000


Experimental Setup

  • The procedure was repeated 50 times on each dataset

    • Stopped when convergence was reached or the sample size exceeded a pre-defined upper limit Nmax = 20,000

    • Parameters IH = 4, FA = 1.5 and C = 0.0025 were selected based on preliminary experiments on the Waveform dataset

  • For comparison purposes, “simple” NN ensembles with known parameters were also built

    • Trained and tested on DTR and DTS, respectively

    • Ensemble size (E) ∈ {1, 5, 10}

    • Number of hidden neurons (H) ∈ {1, 5, 10, 20, 40, 80}

    • Sample size (N) ∈ {200, 400, 800, 1600, …, 204800}


Evaluation Criteria

  • Prediction accuracy

    • Classification – percentage of correct classifications

    • Regression – percentage of variance in the target variable explained by the regression model (coefficient of determination R2)

  • Computational learning cost

    • Ensembles learnt with the proposed procedure: Σ(i=1..E) Hi·Ni

      where E is the ensemble size, Hi is the number of hidden neurons of the i-th NN, and Ni is the training sample size of the i-th NN

    • “Simple” ensembles: H·N·E (since Hi = H and Ni = N for all i = 1..E)

  • Scatter plot

    • prediction accuracy vs. computational learning cost


Results Summary

  • For the Waveform and MISR datasets

    • The resulting ensembles were comparable to the optimal solution in terms of accuracy and computational effort

  • For the Covertype dataset

    • The resulting ensembles were slightly inferior to the optimal solution in terms of accuracy, but required nearly an order of magnitude less computational effort

      The optimal solution refers to the optimal combination of (E, H, N), assuming identical component NNs (same H and N)


Results – Waveform

[Scatter plot: prediction accuracy vs. computational learning cost (Σ(i=1..E) Hi·Ni for the proposed procedure, H·N·E for the “simple” ensembles).]


Results – Covertype

[Scatter plot: prediction accuracy vs. computational learning cost (Σ(i=1..E) Hi·Ni or H·N·E).]


Results – MISR

[Scatter plot: prediction accuracy vs. computational learning cost (Σ(i=1..E) Hi·Ni or H·N·E).]


Summary of the Resulting Ensembles




Conclusions

A cost-effective iterative procedure was proposed to learn NN ensembles from arbitrarily large datasets

  • It can learn ensembles of near-optimal accuracy with moderate computational effort

  • It adapts to the inherent complexity of the dataset

  • It differs from progressive sampling in that it

    • Automatically adjusts model complexity

    • Utilizes previously built models to guide the learning process


Thanks!