
Data Mining in Finance

Andreas S. Weigend

Leonard N. Stern School of Business, New York University

Nonlinear Models

8 February 1999

RiskTeam/ Zürich, 6 July 1998 Andreas S. Weigend, Data Mining Group, Information Systems Department, Stern School of Business, NYU

The seven steps of model building
  • 1. Task
    • Predict distribution of portfolio returns, understand structure in yield curves, find profitable time scales, discover trade styles, …
  • 2. Data
    • Which data to use, and how to code/ preprocess/ represent them
  • 3. Architecture
  • 4. Objective/ Cost function (in-sample)
  • 5. Search/ Optimization/ Estimation
  • 6. Evaluation
  • 7. Analysis and Interpretation


How to make predictions?
    • “Pattern” = Input + Output Pair
  • Keep all data
    • Nearest neighbor lookup
    • Local constant model
    • Local linear model
  • Throw away data, only keep model
    • Global linear model
    • Global nonlinear model
      • Neural network with hidden units
        • Sigmoids or hyperbolic tangents (tanh)
      • Radial basis functions
  • Keep only a few representative data points
      • Support vector machines


Training data: Inputs and corresponding outputs

[Figure: training patterns plotted as points in 3D, with axes input1, input2, and output]


What is the prediction for a new input?

[Figure: the same plot with a new input marked in the input1–input2 plane; its output is unknown]


Nearest neighbor
  • Use output value of nearest neighbor in input space as prediction

[Figure: the nearest neighbor of the new input is highlighted; its output value is taken as the prediction]
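The lookup described above can be sketched in a few lines. The training patterns and function below are illustrative, not from the original presentation:

```python
import numpy as np

# Hypothetical training "patterns": (input1, input2) pairs with known outputs.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0.0, 1.0, 1.0, 2.0])

def nearest_neighbor_predict(x_new, X, y):
    """Return the output of the training point closest to x_new in input space."""
    distances = np.linalg.norm(X - x_new, axis=1)  # Euclidean distance to each pattern
    return y[np.argmin(distances)]

# The new input (0.9, 0.1) lies closest to the pattern (1.0, 0.0).
prediction = nearest_neighbor_predict(np.array([0.9, 0.1]), X_train, y_train)
```

Note that all data must be kept around at prediction time; nothing is summarized into a model.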


Local constant model
  • Use average of the outputs of nearby points in input space

[Figure: the outputs of the points near the new input are averaged to give the prediction]
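Averaging over the neighborhood smooths out noise in any single pattern. A minimal sketch, with illustrative data and an illustrative choice of k:

```python
import numpy as np

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([0.0, 1.0, 1.0, 2.0])

def local_constant_predict(x_new, X, y, k=2):
    """Average the outputs of the k nearest training points in input space."""
    distances = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(distances)[:k]  # indices of the k closest patterns
    return y[nearest].mean()

# The two nearest patterns to (0.9, 0.2) are (1.0, 0.0) and (1.0, 1.0).
prediction = local_constant_predict(np.array([0.9, 0.2]), X_train, y_train)
```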


Local linear model
  • Find best-fitting plane (linear model) through nearby points in input space

[Figure: a plane is fitted through the points near the new input; the prediction is read off the plane]
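Fitting the local plane is an ordinary least-squares problem on the neighborhood. A sketch under illustrative assumptions (the toy data here lie exactly on the plane output = input1 + input2, so the local fit recovers it):

```python
import numpy as np

# Toy data lying exactly on the plane output = input1 + input2.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                    [2.0, 0.0], [0.0, 2.0]])
y_train = np.array([0.0, 1.0, 1.0, 2.0, 2.0, 2.0])

def local_linear_predict(x_new, X, y, k=4):
    """Least-squares plane through the k nearest points, evaluated at x_new."""
    idx = np.argsort(np.linalg.norm(X - x_new, axis=1))[:k]
    A = np.hstack([X[idx], np.ones((k, 1))])  # design matrix with intercept column
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return float(np.append(x_new, 1.0) @ coef)

prediction = local_linear_predict(np.array([0.5, 0.5]), X_train, y_train)
```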


Nonlinear regression surface
  • Minimize “energy” stored in the “springs”

[Figure: a smooth nonlinear surface fitted through the training points, with axes input1, input2, and output]


Throw away the data… just keep the surface!

[Figure: the fitted surface shown on its own, with the training points removed]


Modeling – an iterative process

  • Step 1: Task/ Problem definition
  • Step 2: Data and Representation
  • Step 3: Architecture
  • Step 4: Objective/ Cost function (in-sample)
  • Step 5: Search/ Optimization/ Estimation
  • Step 6: Evaluation (out-of-sample)
  • Step 7: Analysis and Interpretation


Modeling issues

  • Step 1: Task and Problem definition
  • Step 2: Data and Representation
  • Step 3: Architecture
    • What are the “primitives” that make up the surface?
  • Step 4: Objective/ Cost function (in-sample)
    • How flexible should the surface be?
      • Too rigid a model: a stiff board (global linear model)
      • Too flexible a model: cellophane passing through every point
      • Penalize overly flexible models (regularization)
  • Step 5: Search/ Optimization/ Estimation
    • How do we find the surface?
  • Step 6: Evaluation (out-of-sample)
  • Step 7: Analysis and Interpretation


Step 3: Architecture – Example of neural networks
  • Project the input vector x onto a weight vector w
    • w * x
  • This projection is then nonlinearly “squashed” to give a hidden unit activation
    • h = tanh (w * x)
  • Usually, a constant c in the argument allows shifting the location of the squashing
    • h = tanh (w * x + c)
  • There are several such hidden units, each responding to a different projection of the input vector
  • Their activations are combined with weights v to form the output (another constant b can be added)
    • output = v * h + b
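The recipe above can be written out directly as a forward pass through a one-hidden-layer network. The sizes and random weights below are illustrative, not from the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 3, 4                  # illustrative sizes
W = rng.normal(size=(n_hidden, n_in))  # one weight vector w per hidden unit
c = rng.normal(size=n_hidden)          # constants shifting each squashing function
v = rng.normal(size=n_hidden)          # output weights
b = 0.1                                # output constant

def forward(x):
    h = np.tanh(W @ x + c)  # h = tanh(w * x + c): squashed projections of the input
    return v @ h + b        # output = v * h + b

y = forward(np.array([0.5, -1.0, 2.0]))
```

Because each tanh activation stays between -1 and 1, the output can never be further than the sum of |v| from the constant b.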


Neural networks compared to standard statistics
  • Comparison between neural nets and standard statistics
    • Complexity
      • Statistics: Fix order of interactions
      • Neural nets: Fix number of features
    • Estimation
      • Statistics: Find exact solution
      • Neural nets: Focus on path
  • Dimensionality
    • Number of inputs: Curse of dimensionality
      • Points far away in input space
    • Number of parameters: Blessing of dimensionality
      • Many hidden units make it easier to find a good local minimum
      • But need to control for model complexity


Step 4: Cost function
  • Key problem:
    • Want to be good on new data...
    • ...but we only have data from the past
  • Always
    • observation y = f(input) + noise
  • Assume
    • Large, sudden variations in the output are due to noise
    • Small, systematic variations are signal, expressed as f(input)
  • Flexible models
      • Good news: can fit any signal
      • Bad news: can also fit any noise
  • Requires modeling decisions:
    • Assumptions about model complexity
      • Weight decay, weight elimination, smoothness
    • Assumptions about noise: error model or noise model
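One common way to encode both assumptions at once, a squared-error noise model plus a weight-decay penalty on model complexity, can be sketched as below; the penalty strength lam is an illustrative choice, not a value from the presentation:

```python
import numpy as np

def cost(y, y_hat, weights, lam=0.01):
    """In-sample cost: squared-error noise model plus weight-decay complexity penalty."""
    mse = np.mean((y - y_hat) ** 2)      # assumes Gaussian noise on the output
    decay = lam * np.sum(weights ** 2)   # discourages overly flexible models
    return mse + decay
```

With a perfect fit and zero weights the cost is zero; larger weights raise the cost even when the fit to the data is unchanged.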


Step 5: Determining the parameters
  • Search with gradient descent: iterative
    • Vice to virtue: the path taken through weight space matters
    • Guide network through solution space
      • Hints
      • Weight pruning
      • Early stopping
      • Weight-elimination
      • Pseudo-data
      • Add noise
  • Alternative approaches:
    • Model to match the local noise level of the data
      • Local error bars
      • Gated experts architecture with adaptive variances
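Early stopping, one of the guidance techniques listed above, can be sketched on a toy one-parameter problem: descend the gradient of the training error, but keep the weight that did best on held-out data and stop once validation error stops improving. The data, learning rate, and patience below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: y = 2*x + noise, split into training and validation halves.
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 0.1 * rng.normal(size=200)
x_tr, y_tr, x_va, y_va = x[:100], y[:100], x[100:], y[100:]

w, lr = 0.0, 0.1
best_w, best_val, bad, patience = 0.0, np.inf, 0, 20

for step in range(2000):
    grad = np.mean(2.0 * (w * x_tr - y_tr) * x_tr)  # gradient of training MSE
    w -= lr * grad                                   # gradient descent step
    val = np.mean((w * x_va - y_va) ** 2)            # out-of-sample check
    if val < best_val:
        best_w, best_val, bad = w, val, 0            # remember the best weight so far
    else:
        bad += 1
        if bad >= patience:                          # early stopping
            break
```

The returned best_w is the weight at the validation minimum, not necessarily the weight that fits the training set best.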
