Data Mining CS 541

Data Mining CS 541. By Dan Stalloch. An Overview of the Uses of Data Mining. Association – which items could be linked together in some way. Patterns – sequential and time series; shows us how often certain things occur. Classification – shows us how data is grouped.


Presentation Transcript


  1. Data Mining CS 541, by Dan Stalloch

  2. An Overview of the Uses of Data Mining
     • Association – which items could be linked together in some way
     • Patterns – sequential and time series; shows us how often certain things occur
     • Classification – shows us how data is grouped

  3. Why Data Mining is Useful
     • Prediction – the detection of a stable occurrence within the data that may continue into the future
     • Identification – what can be found out from system usage, or what might be present in a thing
     • Classification – how the data could be grouped
     • Optimization – finding ways to make better use of resources

  4. Data Mining Algorithms
     • Apriori – finds frequent (large) itemsets level by level
     • Sampling – finds frequent itemsets from a small sample of the database
     • Frequent-Pattern (FP) Tree and FP-Growth – an improvement on Apriori that avoids generating candidate itemsets
     • Partition – splits the database into partitions so that an Apriori-style search can be run efficiently on each
     • Decision Tree Induction – constructing a decision tree from a training data set
     • k-Means – groups the data into k clusters
     • And others
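As an illustration of the first algorithm on this slide, here is a minimal sketch of the Apriori idea: count itemsets level by level and only build larger candidates from itemsets that are already frequent. The function name `apriori`, the `min_support` parameter (an absolute count), and the shopping baskets are assumptions made up for this example; they do not come from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count of transactions (an assumption of this
    sketch; textbooks often use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

This version rescans every transaction at each level, which is fine for a toy example but is exactly the cost that the Sampling, Partition, and FP-Growth variants try to reduce.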

  5. Areas that Use Data Mining to Enhance Performance
     • Marketing – analyzing customer behavior
     • Finance – keeping track of credit and fraud
     • Manufacturing – optimizing use of resources
     • Health Care – checking patterns for useful information

  6. An Example of a Database We May Wish to Mine, and Why
     • http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
     • This is a car database from a repository of databases that UCI makes available to everyone
     • When mining a database, it is essential to ask what we would like to be able to predict from it; in this instance we would like to know which cars have decent mpg
     • We might also be able to predict which companies are likely to stay in business

  7. How Can We Predict Information from Mining a Database?
     • We must create or use programs that show us either a 2-D or a 3-D contingency table, that is, a table counting how often each combination of attribute values occurs
     • See http://www.autonlab.org/tutorials/dtree18.pdf
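A 2-D contingency table can be sketched in a few lines. The example below is only an illustration: the helper name `contingency_table`, the attribute names, and the five rows are hypothetical and are not taken from the auto-mpg file.

```python
from collections import Counter


def contingency_table(rows, attr_x, attr_y):
    """Count how often each (attr_x value, attr_y value) pair occurs."""
    return Counter((row[attr_x], row[attr_y]) for row in rows)


# Hypothetical records, invented for illustration only.
rows = [
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
]

table = contingency_table(rows, "cylinders", "mpg")
for (cylinders, mpg), count in sorted(table.items()):
    print(f"cylinders={cylinders} mpg={mpg}: {count}")
```

A 3-D table follows the same pattern with a three-element key, for example `(row[a], row[b], row[c])`.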

  8. How Do We Know What Information is Worth Mining?
     • We use a formula to decide which attributes have the highest information gain with respect to what we would like to know. The formula is:
     • IG(Y|X) = H(Y) - H(Y | X)
     • where H(X) is the entropy of X, and H(Y | X) is the conditional entropy of Y given X
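The formula on this slide can be checked numerically. The sketch below assumes discrete attributes and base-2 logarithms (entropy in bits); the function names and the small sample are hypothetical, chosen only to show the calculation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(X) in bits for a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y | X) for paired lists of discrete values."""
    total = len(ys)
    # Group the Y values by the X value they occur with.
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # H(Y | X) is the entropy of each group weighted by how common the group is.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(ys) - conditional


# Hypothetical sample: X is the number of cylinders, Y is the mpg label.
cylinders = [4, 4, 4, 8, 8]
mpg = ["good", "good", "bad", "bad", "bad"]

ig = information_gain(cylinders, mpg)
print(f"H(mpg) = {entropy(mpg):.3f} bits, IG(mpg | cylinders) = {ig:.3f} bits")
```

To choose what is worth mining, one would compute this value for each candidate attribute X and prefer the attribute with the largest gain.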

  9. References
     • http://www.autonlab.org/tutorials/dtree18.pdf
     • http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
     • http://www.autonlab.org/tutorials/infogain11.pdf
     • Chapter 28 of Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe
     • Pictures from Andrew W. Moore's slides
