
Chapter 7 Neural Networks in Data Mining



Presentation Transcript


  1. Chapter 7: Neural Networks in Data Mining • Automatic Model Building (Machine Learning) • Artificial Intelligence

  2. Contents • Describes neural networks as used in data mining • Reviews real applications of each model • Shows the application of models to larger data sets

  3. High-Growth Product • Neural network models usually outperform other methods when the data contain complicated (nonlinear) relationships. • Used for classifying data • target customers • bank loan approval • hiring • stock purchase • DATA MINING • Used for prediction

  4. Neural Network • Neural networks are the most widely used method in data mining. • The idea of neural networks was derived from how neurons operate in the brain. • Real neurons are connected to each other; they accept electrical charges across synapses and pass the charge on to neighboring neurons. • An artificial neural network (ANN) is usually arranged in at least three layers (that is, with at least one hidden layer) and has a defined, constant structure that can reflect complex nonlinear relationships.

  5. Network • [Figure: network diagram with an input layer, hidden layers, and an output layer; the output nodes are labeled Good and Bad]

  6. Neural Network • For classification neural network models, the output layer has one node for each classification category (for example, true or false). • Each node is connected by an arc to nodes in the next layer. These arcs have weights, which are multiplied by the values of the incoming nodes and summed. • Middle layer node values are the sum of incoming node values multiplied by the arc weights. • ANNs learn through feedback loops. Output is compared to target values, and the difference between attained and target output is fed back to the system to adjust the weights on the arcs. • Measure fit • Fine-tune around best fit
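
The weighted sums and the feedback step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular package: the sigmoid activation, the learning rate of 0.5, all weight values, and the function names (`node_value`, `forward`, `adjust_output_weights`) are assumptions made for the example, and only the arcs into the output layer are adjusted.

```python
import math

def sigmoid(x):
    # Assumed activation function; the slide does not name one.
    return 1.0 / (1.0 + math.exp(-x))

def node_value(inputs, weights):
    """Sum of incoming node values multiplied by the arc weights."""
    return sum(v * w for v, w in zip(inputs, weights))

def forward(inputs, hidden_weights, output_weights):
    """Propagate the inputs through one hidden layer to the output layer."""
    hidden = [sigmoid(node_value(inputs, w)) for w in hidden_weights]
    output = [sigmoid(node_value(hidden, w)) for w in output_weights]
    return hidden, output

def adjust_output_weights(hidden, output, target, output_weights, rate=0.5):
    """Feed the output error back to adjust the arcs into the output layer."""
    new_weights = []
    for out, tgt, weights in zip(output, target, output_weights):
        delta = (tgt - out) * out * (1.0 - out)  # error scaled by the sigmoid slope
        new_weights.append([w + rate * delta * h for w, h in zip(weights, hidden)])
    return new_weights

# Illustrative values only: two inputs, two hidden nodes, one output node per category.
inputs = [1.0, 0.0]
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
output_weights = [[0.2, -0.1], [-0.3, 0.6]]
target = [1.0, 0.0]

hidden, output = forward(inputs, hidden_weights, output_weights)
error_before = sum((t - o) ** 2 for t, o in zip(target, output))

output_weights = adjust_output_weights(hidden, output, target, output_weights)
_, output = forward(inputs, hidden_weights, output_weights)
error_after = sum((t - o) ** 2 for t, o in zip(target, output))
```

After one feedback step the squared difference between attained and target output is smaller than before, which is the "measure fit, then fine-tune" loop in miniature.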

  7. Neural Network • ANNs can apply learned experience to new cases for decisions, classifications, and forecasts. • ANN modeling should consider: • Input variable selection and manipulation • Selection of learning parameters, such as the number of hidden layers, learning rate, momentum, activation function… • About 95% of business applications were reported to use a multilayered feedforward neural network with the backpropagation learning rule. • Supervised learning • Each element in each layer is connected to all elements of the next layer.
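
To make the roles of the learning rate and momentum parameters concrete, here is a small sketch of a momentum-style weight update applied to a single weight. The toy error function `weight ** 2`, the parameter values, and the name `momentum_step` are assumptions chosen for illustration; real packages may define the update slightly differently.

```python
def momentum_step(weight, gradient, velocity, learning_rate=0.1, momentum=0.5):
    """Return the updated (weight, velocity) after one step.

    The velocity carries over a fraction (momentum) of the previous change,
    and the learning rate scales how far the current gradient moves the weight.
    """
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

weight, velocity = 1.0, 0.0
for _ in range(100):
    gradient = 2.0 * weight  # slope of the toy error function weight ** 2
    weight, velocity = momentum_step(weight, gradient, velocity)
```

With these settings the weight settles very close to 0, the minimum of the toy error function. A larger learning rate or momentum makes each step bigger, at the risk of overshooting.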

  8. Neural Network • Multilayered feedforward neural networks are analogous to regression and discriminant analysis in dealing with cases where training data is available. • A self-organizing map (SOM) is analogous to clustering techniques, used when no training data with known outcomes is available. • The goal is to group data so as to maximize the similarity of patterns within clusters while minimizing the similarity to patterns in other clusters. • Kohonen SOMs were developed to detect strong features of large data sets.
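
The clustering behavior of a self-organizing map can be illustrated with a deliberately tiny sketch: a one-dimensional map with only two units, trained on six made-up points. The data, the decay schedule, the Gaussian neighborhood, and the choice to start the units from the first and last samples are all assumptions for this example rather than details from the slides.

```python
import math

def nearest_unit(units, x):
    """Index of the unit whose weight vector is closest to sample x."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))

def train_som(data, epochs=20, rate=0.5, radius=0.7):
    # Two units on a line, started from the first and last samples (a simplification).
    units = [list(data[0]), list(data[-1])]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = rate * decay                  # learning rate shrinks over time
        sigma = max(radius * decay, 0.1)   # neighborhood width shrinks over time
        for x in data:
            winner = nearest_unit(units, x)
            for i, unit in enumerate(units):
                # The winning unit moves most; its neighbor on the map moves less.
                influence = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d in range(len(unit)):
                    unit[d] += lr * influence * (x[d] - unit[d])
    return units

# Made-up, unlabeled samples forming two loose groups.
samples = [(0.1, 0.1), (0.15, 0.1), (0.1, 0.2),
           (0.9, 0.9), (0.85, 0.95), (0.9, 0.8)]
units = train_som(samples)
labels = [nearest_unit(units, x) for x in samples]
```

No target values are supplied at any point. The map assigns the first three samples to one unit and the last three to the other purely from the similarity of the patterns.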

  9. Neural Network Testing • Usually train on part of the available data • The package tries weights until it successfully categorizes a selected proportion of the training data • When trained, test the model on part of the data • If the given proportion is successfully categorized, it quits • If not, it works some more to get a better fit • The “model” is internal to the package • The model can be applied to new data
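
The train-until-a-proportion-is-correct loop can be sketched with the simplest possible network, a single-node perceptron. This is a stand-in chosen for brevity, not the multilayer model a data mining package would build; the data, the learning rate, and the function names (`predict`, `accuracy`, `train_until`) are hypothetical.

```python
def predict(weights, bias, x):
    """Classify x as 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0

def accuracy(weights, bias, rows):
    """Proportion of rows that are categorized correctly."""
    return sum(predict(weights, bias, x) == y for x, y in rows) / len(rows)

def train_until(rows, target=0.95, rate=0.1, max_epochs=1000):
    """Adjust weights until the target proportion of training rows is correct."""
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, rows) >= target:
            break  # the selected proportion is categorized correctly, so quit
        for x, y in rows:
            error = y - predict(weights, bias, x)
            weights = [w + rate * error * v for w, v in zip(weights, x)]
            bias += rate * error
    return weights, bias

# Made-up rows of (inputs, category).
train_rows = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0),
              ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1)]
test_rows = [((0.2, 0.3), 0), ((0.85, 0.8), 1)]

weights, bias = train_until(train_rows, target=1.0)
train_acc = accuracy(weights, bias, train_rows)
test_acc = accuracy(weights, bias, test_rows)
```

The held-out `test_rows` play the role of the test portion of the data: they are never used to adjust the weights, only to check how well the trained model categorizes unseen cases.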

  10. Neural Network Process • Collect data • Separate into training and test sets • Transform data to appropriate units • Categorical data works better, but is not necessary • Select, train, and test the network • Can set the number of hidden layers • Can set the number of nodes per layer • A number of algorithmic options • Apply (need to use the system on which the model was built)
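
The "separate into training and test sets" step might look like the following sketch. The 50/50 split, the fixed seed, and the name `split_train_test` are assumptions for illustration; the integers stand in for real records.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(200))  # placeholder for 200 collected observations
train, test = split_train_test(records, test_fraction=0.5)
```

Shuffling before splitting avoids a test set made up only of the most recently collected records, and every record lands in exactly one of the two sets.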

  11. Loan Applications • Loan decisions are repetitive and time consuming, and every attempt should be made to reach a decision that is fair to the applicant while reducing the risk of default to the lender. • Data collection: sex, marital status, number of dependent children, occupation, … • Separating data: learning data (at least 100 sets) and testing data (100 sets) • Transform the inputs: ANN requires numeric data. See page 125.

  12. Loan Applications • Select, train, and test the network: • The number of middle layer nodes, transfer function, learning algorithms. • Too many hidden layer nodes result in the ANN memorizing the input data without learning a generalizable pattern for the accurate analysis of new data. Too few nodes require more training time and result in less accurate models. • Repeat steps 1 through 4 until the prescribed tolerance is reached.

  13. Neural Nets to Predict Bankruptcy • Wilson & Sharda (1994) • Monitor firm financial performance • Useful to identify internal problems, investment evaluation, auditing • Predict bankruptcy: multivariate discriminant analysis of financial ratios (develop formula of weights over independent variables) • Neural network: inputs were 5 financial ratios; data from Moody’s Industrial Manuals (129 firms, 1975-1982; 65 went bankrupt) • Tested against discriminant analysis • Neural network significantly better

  14. Ranking Neural Network • Wilson (1994) • Decision problem: ranking • candidates for position, computer systems, etc. • INPUT: manager’s ranking of alternatives • Real decision: hire 2 salespeople from 15 applicants • Each applicant scored by manager • Neural network took scores, rank ordered • best fit to manager of alternatives compared (AHP)

  15. Application results

  16. Application results

  17. Application results

  18. Exercise • For data coding, refer to page 117. • Age: <20 → 0; 20~50 → (age-20)/30; >50 → 1.0 • State: CA → 1.0; Rest → 0 • Degree: Cert → 0; UG → 0.5; Rest → 1.0 • Major: IS → 1.0; Csci, Engr Sci → 0.9; BusAd → 0.7; Other → 0.5; None → 0 • Experience: Max Years/5 • Minimal 2 • Adequate 3
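
The coding rules on this slide can be written as small Python functions, which may help when preparing the exercise data. This is a sketch under assumptions: the function names and the string spellings of the categories are hypothetical, "Csci" and "Engr Sci" are treated as two separate major labels, and the Experience rule is left out because its exact form is not clear from the slide.

```python
def encode_age(age):
    """Age: under 20 -> 0, 20 to 50 -> (age - 20) / 30, over 50 -> 1.0."""
    if age < 20:
        return 0.0
    if age > 50:
        return 1.0
    return (age - 20) / 30

def encode_state(state):
    """State: CA -> 1.0, any other state -> 0."""
    return 1.0 if state == "CA" else 0.0

def encode_degree(degree):
    """Degree: Cert -> 0, UG -> 0.5, anything else -> 1.0."""
    return {"Cert": 0.0, "UG": 0.5}.get(degree, 1.0)

# Assumed label spellings; majors not listed here fall under "Other" (0.5).
MAJOR_CODES = {"IS": 1.0, "Csci": 0.9, "Engr Sci": 0.9, "BusAd": 0.7, "None": 0.0}

def encode_major(major):
    """Major: look up the listed codes, defaulting to the 'Other' value."""
    return MAJOR_CODES.get(major, 0.5)

def encode_applicant(age, state, degree, major):
    """Turn one applicant record into a list of numeric network inputs."""
    return [encode_age(age), encode_state(state),
            encode_degree(degree), encode_major(major)]

example = encode_applicant(35, "CA", "UG", "BusAd")
```

Every coded input falls between 0 and 1, so no single variable dominates the weighted sums simply because it is measured on a larger scale.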
