OVERVIEW DATA MINING - PowerPoint PPT Presentation

Presentation Transcript


  2. Outline Of the Presentation • Motivation & Introduction • Data Mining Algorithms • Teaching Plan DATA MINING VESIT M.VIJAYALAKSHMI

  3. Why Data Mining? Commercial Viewpoint • Lots of data is being collected and warehoused • Web data, e-commerce • purchases at department/grocery stores • Bank/Credit Card transactions • Computers have become cheaper and more powerful • Competitive Pressure is strong • Provide better, customized services for an edge (e.g. in Customer Relationship Management)

  4. Typical Decision Making • Given a database of 100,000 names, which persons are the least likely to default on their credit cards? • Which of my customers are likely to be the most loyal? • Which claims in insurance are potential frauds? • Who may not pay back loans? • Who are consistent players to bid for in IPL? • Who can be potential customers for a new toy? Data Mining helps extract such information.

  5. Why Mine Data? Scientific Viewpoint • Data collected and stored at enormous speeds (GB/hour) • remote sensors on a satellite • telescopes scanning the skies • microarrays generating gene expression data • scientific simulations generating terabytes of data • Traditional techniques infeasible for raw data • Data mining may help scientists • in classifying and segmenting data • in Hypothesis Formation

  6. Mining Large Data Sets - Motivation There is often information “hidden” in the data that is not readily evident. Human analysts may take weeks to discover useful information.

  7. Data Mining works with Warehouse Data • Data Warehousing provides the Enterprise with a memory • Data Mining provides the Enterprise with intelligence

  8. What Is Data Mining? • Data mining (knowledge discovery in databases): • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases • Alternative names and their “inside stories”: • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc. • What is not data mining? • (Deductive) query processing. • Expert systems or small ML/statistical programs

  9. Potential Applications • Market analysis and management • target marketing, CRM, market basket analysis, cross selling, market segmentation • Risk analysis and management • Forecasting, customer retention, quality control, competitive analysis • Fraud detection and management • Text mining (news group, email, documents) and Web analysis. • Intelligent query answering

  10. Other Applications • Game statistics to gain competitive advantage • Astronomy: JPL and the Palomar Observatory discovered 22 quasars with the help of data mining • IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.

  11. What makes data mining possible? • Advances in the following areas are making data mining deployable: • data warehousing • better and more data (i.e., operational, behavioral, and demographic) • the emergence of easily deployed data mining tools and • the advent of new data mining techniques. -- Gartner Group

  12. What is Not Data Mining • Database • Find all credit applicants with last name of Smith. • Identify customers who have purchased more than $10,000 in the last month. • Find all customers who have purchased milk. • Data Mining • Find all credit applicants who are poor credit risks. (classification) • Identify customers with similar buying habits. (clustering) • Find all items which are frequently purchased with milk. (association rules)

  13. Data Mining: On What Kind of Data? • Relational databases • Data warehouses • Transactional databases • Advanced DB and information repositories • Object-oriented and object-relational databases • Spatial databases • Time-series data and temporal data • Text databases and multimedia databases • Heterogeneous and legacy databases • WWW

  14. Data Mining Models And Tasks

  15. Are All the “Discovered” Patterns Interesting? • A data mining system/query may generate thousands of patterns; not all of them are interesting. • Interestingness measures: • A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm • Objective vs. subjective interestingness measures: • Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. • Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, etc.
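The two objective measures named on this slide, support and confidence, can be illustrated with a minimal sketch. The baskets below are hypothetical and invented for illustration, and the helper names `support` and `confidence` are assumptions, not part of the presentation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread"},
]

rule_support = support(baskets, {"milk", "bread"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"milk"}, {"bread"})  # 2 of the 3 milk baskets
```

A rule such as milk => bread would typically be reported only if both values clear user-chosen thresholds.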

  16. Can We Find All and Only Interesting Patterns? • Find all the interesting patterns: Completeness • Association vs. classification vs. clustering • Search for only interesting patterns: • First generate all the patterns and then filter out the uninteresting ones. • Generate only the interesting patterns

  17. Data Mining vs. KDD • Knowledge Discovery in Databases (KDD): process of finding useful information and patterns in data. • Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process.

  18. KDD Process • Selection: Obtain data from various sources. • Preprocessing: Cleanse data. • Transformation: Convert to common format. Transform to new format. • Data Mining: Obtain desired results. • Interpretation/Evaluation: Present results to user in meaningful manner.

  19. Data Mining and Business Intelligence (Figure: layered pyramid; potential to support business decisions increases toward the top.) Layers, top to bottom: • Making Decisions • Data Presentation: Visualization Techniques • Data Mining: Information Discovery • Data Exploration: Statistical Analysis, Querying and Reporting • Data Warehouses / Data Marts: OLAP, MDA • Data Sources: Paper, Files, Information Providers, Database Systems, OLTP. Roles shown alongside, top to bottom: End User, Business Analyst, Data Analyst, DBA

  20. Data Mining Development • Similarity Measures • Hierarchical Clustering • IR Systems • Imprecise Queries • Textual Data • Web Search Engines • Relational Data Model • SQL • Association Rule Algorithms • Data Warehousing • Scalability Techniques • Bayes Theorem • Regression Analysis • EM Algorithm • K-Means Clustering • Time Series Analysis • Algorithm Design Techniques • Algorithm Analysis • Data Structures • Neural Networks • Decision Tree Algorithms

  21. Data Mining Issues • Human Interaction • Overfitting • Outliers • Interpretation • Visualization • Large Datasets • High Dimensionality • Multimedia Data • Missing Data • Irrelevant Data • Noisy Data • Changing Data • Integration • Application

  22. Social Implications of DM • Privacy • Profiling • Unauthorized use

  23. Data Mining Metrics • Usefulness • Return on Investment (ROI) • Accuracy • Space/Time

  24. Data Mining Algorithms • Classification • Clustering • Association Mining • Web Mining

  25. Data Mining Tasks • Prediction Methods • Use some variables to predict unknown or future values of other variables. • Description Methods • Find human-interpretable patterns that describe the data.

  26. Data Mining Algorithms • Classification [Predictive] • Clustering [Descriptive] • Association Rule Discovery [Descriptive] • Sequential Pattern Discovery [Descriptive] • Regression [Predictive] • Deviation Detection [Predictive]


  28. Classification • Given old data about customers and payments, predict new applicant’s loan eligibility. (Figure: records of previous customers, with attributes Age, Salary, Profession, Location and Customer type, are fed to a classifier; the resulting decision tree, with tests such as Salary > 5 K and Prof. = Exec, labels a new applicant’s data as good or bad.)

  29. Classification Problem • Given a database D={t1,t2,…,tn} and a set of classes C={C1,…,Cm}, the Classification Problem is to define a mapping f: D → C where each ti is assigned to one class. • Actually divides D into equivalence classes. • Prediction is similar, but may be viewed as having an infinite number of classes.

  30. Supervised vs. Unsupervised Learning • Supervised learning (classification) • Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations • New data is classified based on the training set • Unsupervised learning (clustering) • The class labels of the training data are unknown • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

  31. Overview of Naive Bayes • The goal of Naive Bayes is to work out whether a new example is in a class given that it has a certain combination of attribute values. We work out the likelihood of the example being in each class given the evidence (its attribute values), and take the highest likelihood as the classification. • Bayes Rule: P[H|E] = P[E|H].P[H] / P[E], where E is the event (evidence) that has occurred and H is the hypothesis • P[H] is called the prior probability (of the hypothesis). P[H|E] is called the posterior probability (of the hypothesis given the evidence)

  32. Worked Example 1 Take the following training data, from bank loan applicants:
      ApplicantID  City   Children  Income  Status
      1            Delhi  Many      Medium  DEFAULTS
      2            Delhi  Many      Low     DEFAULTS
      3            Delhi  Few       Medium  PAYS
      4            Delhi  Few       High    PAYS
      • P[City=Delhi | Status = DEFAULTS] = 2/2 = 1 • P[City=Delhi | Status = PAYS] = 2/2 = 1 • P[Children=Many | Status = DEFAULTS] = 2/2 = 1 • P[Children=Few | Status = DEFAULTS] = 0/2 = 0 • etc.

  33. Worked Example 1 Summarizing, the conditional probability of every attribute value given each class is computed in the same way, and the prior probabilities are P[Status = DEFAULTS] = 2/4 = 0.5 and P[Status = PAYS] = 2/4 = 0.5. For instance, the probability of Income=Medium given that the applicant DEFAULTS = the number of applicants with Income=Medium who DEFAULT divided by the number of applicants who DEFAULT = 1/2 = 0.5

  34. Worked Example 1 Now, assume a new example is presented where City=Delhi, Children=Many, and Income=Medium.
      First, we estimate the likelihood that the example is a defaulter, given its attribute values: P[H1|E] = P[E|H1].P[H1] (the denominator P[E] is omitted because it is the same for every class)
      P[Status = DEFAULTS | Delhi,Many,Medium] = P[Delhi|DEFAULTS] x P[Many|DEFAULTS] x P[Medium|DEFAULTS] x P[DEFAULTS] = 1 x 1 x 0.5 x 0.5 = 0.25
      Then we estimate the likelihood that the example is a payer, given its attributes: P[H2|E] = P[E|H2].P[H2] (denominator omitted as before)
      P[Status = PAYS | Delhi,Many,Medium] = P[Delhi|PAYS] x P[Many|PAYS] x P[Medium|PAYS] x P[PAYS] = 1 x 0 x 0.5 x 0.5 = 0
      As the conditional likelihood of being a defaulter is higher (because 0.25 > 0), we conclude that the new example is a defaulter.

  35. Worked Example 1 Now, assume a new example is presented where City=Delhi, Children=Many, and Income=High.
      First, we estimate the likelihood that the example is a defaulter, given its attribute values:
      P[Status = DEFAULTS | Delhi,Many,High] = P[Delhi|DEFAULTS] x P[Many|DEFAULTS] x P[High|DEFAULTS] x P[DEFAULTS] = 1 x 1 x 0 x 0.5 = 0
      Then we estimate the likelihood that the example is a payer, given its attributes:
      P[Status = PAYS | Delhi,Many,High] = P[Delhi|PAYS] x P[Many|PAYS] x P[High|PAYS] x P[PAYS] = 1 x 0 x 0.5 x 0.5 = 0
      As the conditional likelihood of being a defaulter is the same as that for being a payer, we can come to no conclusion for this example.
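The arithmetic of Worked Example 1 can be reproduced with a short sketch. The training rows are taken from the slides; the function names `score` and `classify`, and the choice to return `None` on a tie, are assumptions made for illustration.

```python
# Training data from the worked example: (City, Children, Income, Status).
training = [
    ("Delhi", "Many", "Medium", "DEFAULTS"),
    ("Delhi", "Many", "Low", "DEFAULTS"),
    ("Delhi", "Few", "Medium", "PAYS"),
    ("Delhi", "Few", "High", "PAYS"),
]


def score(example, label, rows):
    """Unnormalised P[label | example]: prior times per-attribute conditionals."""
    in_class = [r for r in rows if r[-1] == label]
    result = len(in_class) / len(rows)  # prior P[label]
    for i, value in enumerate(example):
        matches = sum(1 for r in in_class if r[i] == value)
        result *= matches / len(in_class)  # P[attribute i = value | label]
    return result


def classify(example, rows):
    """Return (winning label, or None on a tie) and the score of every label."""
    labels = sorted({r[-1] for r in rows})
    scores = {label: score(example, label, rows) for label in labels}
    best = max(scores.values())
    winners = [label for label, s in scores.items() if s == best]
    return (winners[0] if len(winners) == 1 else None), scores


first = classify(("Delhi", "Many", "Medium"), training)
second = classify(("Delhi", "Many", "High"), training)
```

Here `first` selects DEFAULTS with a score of 0.25 against 0, while `second` yields a tie at 0 for both classes, matching the two cases on the slides.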

  36. Weaknesses • Naive Bayes assumes that variables are equally important and that they are independent, which is often not the case in practice. • Naive Bayes is damaged by the inclusion of redundant (strongly dependent) attributes. • Sparse data: If some attribute values are not present in the data, then a zero probability for P[E|H] might exist. This would lead P[H|E] to be zero no matter how high P[E|H] is for other attribute values. Small positive values which estimate the so-called ‘prior probabilities’ are often used to correct this.
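One common way to apply such small positive values is add-one (Laplace) smoothing. The slide does not name a specific method, so the sketch below is an assumption about how the correction might look; the function name `smoothed_conditional` and the parameters `alpha` and `n_values` are hypothetical.

```python
def smoothed_conditional(rows, attr_index, value, label, n_values, alpha=1.0):
    """P[attribute = value | label] with add-alpha (Laplace) smoothing.

    n_values is the number of distinct values the attribute can take.
    """
    in_class = [r for r in rows if r[-1] == label]
    matches = sum(1 for r in in_class if r[attr_index] == value)
    return (matches + alpha) / (len(in_class) + alpha * n_values)


# Training data from Worked Example 1: (City, Children, Income, Status).
training = [
    ("Delhi", "Many", "Medium", "DEFAULTS"),
    ("Delhi", "Many", "Low", "DEFAULTS"),
    ("Delhi", "Few", "Medium", "PAYS"),
    ("Delhi", "Few", "High", "PAYS"),
]

# Children has two possible values (Many, Few); it is index 1 in each row.
p_many_pays = smoothed_conditional(training, 1, "Many", "PAYS", n_values=2)
# Income has three possible values (Low, Medium, High); it is index 2.
p_high_defaults = smoothed_conditional(training, 2, "High", "DEFAULTS", n_values=3)
```

With smoothing, P[Many|PAYS] becomes (0 + 1) / (2 + 2) instead of 0, so a single unseen attribute value no longer forces the whole product to zero.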

  37. Classification Using Decision Trees • Partitioning based: Divide search space into rectangular regions. • Tuple placed into class based on the region within which it falls. • DT approaches differ in how the tree is built: DT Induction • Internal nodes associated with attribute and arcs with values for that attribute. • Algorithms: ID3, C4.5, CART

  38. DT Issues • Choosing Splitting Attributes • Ordering of Splitting Attributes • Splits • Tree Structure • Stopping Criteria • Training Data • Pruning

  39. DECISION TREES • An internal node represents a test on an attribute. • A branch represents an outcome of the test, e.g., Color=red. • A leaf node represents a class label or class label distribution. • At each node, one attribute is chosen to split training examples into distinct classes as much as possible • A new case is classified by following a matching path to a leaf node.


  41. Example (Figure: decision tree with root test Outlook and branches sunny, overcast and rain; further tests on humidity (high, normal) and windy (false, true); leaves labeled P or N.)

  42. Building Decision Tree • Top-down tree construction • At start, all training examples are at the root. • Partition the examples recursively by choosing one attribute each time. • Bottom-up tree pruning • Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases. • Use of decision tree: Classifying an unknown sample • Test the attribute values of the sample against the decision tree
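Classifying an unknown sample against a finished tree can be sketched as a simple walk from the root to a leaf. The tree below is hypothetical: it is written in the style of the weather example, but its leaf assignments and the tuple/dict representation are assumptions made for illustration.

```python
# Hypothetical tree: internal nodes are (attribute, {value: subtree}) pairs
# and leaves are the class labels "P" or "N". Leaf assignments are assumed.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "N", "normal": "P"}),
    "overcast": "P",
    "rain": ("windy", {"true": "N", "false": "P"}),
})


def classify(node, sample):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[sample[attribute]]
    return node


label = classify(tree, {"outlook": "sunny", "humidity": "normal", "windy": "false"})
```

Only the attributes on the matching path are consulted, so a sample with outlook "overcast" is labeled without looking at humidity or windy at all.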

  43. Algorithm for Decision Tree Induction • Basic algorithm (a greedy algorithm) • Tree is constructed in a top-down recursive divide-and-conquer manner • At start, all the training examples are at the root • Attributes are categorical • Examples are partitioned recursively based on selected attributes • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) • Conditions for stopping partitioning • All samples for a given node belong to the same class • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf • There are no samples left
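The greedy procedure and its stopping conditions can be sketched as follows. This is a simplified ID3-style illustration under stated assumptions, not the exact algorithm from any particular tool: rows are dicts with a "class" key, the toy data is invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Information (in bits) needed to identify the class of one example."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r["class"] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["class"] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def build_tree(rows, attributes):
    """Greedy top-down induction over categorical attributes."""
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1:      # all samples belong to the same class
        return labels[0]
    if not attributes:             # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Only observed values are branched on, so no subset is ever empty.
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)


# Hypothetical toy data, invented for illustration.
data = [
    {"outlook": "sunny", "windy": "false", "class": "N"},
    {"outlook": "sunny", "windy": "true", "class": "N"},
    {"outlook": "overcast", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "false", "class": "P"},
    {"outlook": "rain", "windy": "true", "class": "N"},
]
tree = build_tree(data, ["outlook", "windy"])
```

On this toy data the root test is outlook, because it yields a larger information gain than windy; the sunny and overcast branches are already pure, and only the rain branch needs a further test on windy.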

  44. Choosing the Splitting Attribute • At each node, available attributes are evaluated on the basis of separating the classes of the training examples. A Goodness function is used for this purpose. • Typical goodness functions: • information gain (ID3/C4.5) • information gain ratio • gini index
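Two of the goodness functions listed here, the gini index and the information gain ratio, can be sketched in a few lines using their usual textbook definitions. The label lists are hypothetical and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits of the class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(parent_labels, partitions):
    """Information gain divided by the split information of the partition."""
    total = len(parent_labels)
    after = sum(len(p) / total * entropy(p) for p in partitions)
    gain = entropy(parent_labels) - after
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in partitions)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical node with two examples of each class, split into pure halves.
parent = ["P", "P", "N", "N"]
pure_split = [["P", "P"], ["N", "N"]]

g = gini(parent)
r = gain_ratio(parent, pure_split)
```

The gini index is 0 for a pure node and grows as classes mix, while the gain ratio divides the gain by the split information so that attributes with many distinct values are not favored automatically.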

  45. Which attribute to select?

  46. A criterion for attribute selection • Which is the best attribute? • The one which will result in the smallest tree • Heuristic: choose the attribute that produces the “purest” nodes • Popular impurity criterion: information gain • Information gain increases with the average purity of the subsets that an attribute produces • Strategy: choose attribute that results in greatest information gain

  47. Information Gain (ID3/C4.5) • Select the attribute with the highest information gain • Assume there are two classes, P and N • Let the set of examples S contain p elements of class P and n elements of class N • The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as $I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$

  48. Information Gain in Decision Tree Induction • Assume that using attribute A a set S will be partitioned into sets {S1, S2, …, Sv} • If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is $E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$ • The encoding information that would be gained by branching on A is $Gain(A) = I(p, n) - E(A)$

  49. Example: attribute “Outlook” • “Outlook” = “Sunny”: • “Outlook” = “Overcast”: • “Outlook” = “Rainy”: • Expected information for attribute: Note: the 0 log 0 term that arises for a pure subset is normally not defined; it is taken to be 0.

  50. Computing the information gain • Information gain: information before splitting – information after splitting • Information gain for attributes from weather data:
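The computation for the attribute "Outlook" can be sketched as below. The per-branch class counts are not given in this transcript; the sketch assumes the counts of the widely used 14-row weather dataset (sunny 2 yes / 3 no, overcast 4 / 0, rainy 3 / 2), and the helper name `info` is hypothetical.

```python
import math


def info(counts):
    """Entropy in bits of a class distribution given as counts; 0 log 0 is taken as 0."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Assumed (yes, no) class counts per branch from the standard weather dataset.
branches = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}

total = sum(sum(c) for c in branches.values())
yes = sum(c[0] for c in branches.values())
no = sum(c[1] for c in branches.values())

before = info([yes, no])                                          # information before splitting
after = sum(sum(c) / total * info(c) for c in branches.values())  # expected information after
gain_outlook = before - after

print(f"before={before:.3f} after={after:.3f} gain={gain_outlook:.3f}")
```

Skipping zero counts in `info` is how the sketch handles the otherwise undefined 0 log 0 term for the pure "overcast" branch.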