Data Mining Technique

Data Mining Techniques • Data mining techniques are the methods used to analyse a set (or subset) of data. • Four classes of task are involved in data mining: • Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include nearest neighbor, the naive Bayes classifier, and neural networks.
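As a rough illustration of classification into predefined groups, here is a minimal nearest-neighbor sketch in Python. The two features (number of links, number of exclamation marks) and the training emails are invented for this example and are not from the slides.

```python
import math

# Hypothetical training data: (links, exclamation marks) -> predefined group.
train = [
    ((0, 0), "legitimate"),
    ((1, 0), "legitimate"),
    ((8, 5), "spam"),
    ((6, 7), "spam"),
]

def nearest_neighbor_classify(train, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

print(nearest_neighbor_classify(train, (7, 6)))  # closest to the spam examples
print(nearest_neighbor_classify(train, (0, 1)))  # closest to the legitimate ones
```

A practical classifier would usually vote over several neighbors and scale the features first; this sketch uses a single neighbor to keep the idea visible.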

Data Mining Techniques • Clustering - Like classification, but the groups are not predefined, so the algorithm tries to group similar items together. • Regression - Attempts to find a function that models the data with the least error. Methods range from least-squares fitting to genetic programming.
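To make "a function which models the data with the least error" concrete, the sketch below fits a straight line by ordinary least squares. This is one possible regression method chosen for brevity, not the genetic programming approach; the function name `fit_line` and the data points are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Here "least error" means the smallest sum of squared vertical distances between the points and the line.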

Data Mining Techniques • Association rule learning (mining) - Searches for relationships between variables. For example, a supermarket might gather data on what each customer buys. Using association rule learning, the supermarket can work out which products are frequently bought together, which is useful for marketing purposes. This is sometimes referred to as "market basket analysis".

Data Mining Techniques • The ultimate goal of data mining is prediction. • Predictive data mining is the most common type of data mining and the one with the most direct business applications. • The process of data mining consists of three stages: • Initial exploration • Model building or pattern identification, with validation/verification • Deployment (i.e., the application of the model to new data in order to generate predictions).

Applications • These techniques can be applied in companies with a strong consumer focus: retail, financial, communication, and marketing organisations. • They enable companies to determine relationships among "internal" factors such as price, product positioning, or staff skills. • They also help determine relationships with "external" factors such as economic indicators, competition, and customer demographics.

Applications • They also enable companies to determine the impact of these factors on sales, customer satisfaction, and corporate profits. • Finally, they enable companies to "drill down" into summary information to view detailed transactional data.

Applications • With data mining, a retailer could use point-of-sale (PoS) records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

Applications • For example, Blockbuster Entertainment in the USA mines its video rental history database to recommend rentals to individual customers. • American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

Applications • The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. • The “Advanced Scout” software analyzes the movements of players to help coaches orchestrate plays and strategies.

Applications • For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! “Advanced Scout” not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

How does Data Mining Work? • Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. • Several types of analytical software are available, and four types of relationships are generally sought:

How does Data Mining Work? • Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials. • Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

How does Data Mining Work? • Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining. • Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

How does Data Mining Work? • Data mining consists of five major elements: • Extract, transform, and load transaction data onto the data warehouse system • Store and manage the data in a multidimensional database system. • Provide data access to business analysts and information technology professionals. • Analyze the data by application software. • Present the data in a useful format, such as a graph or table.

Association Rule Mining

Association Rule Mining • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories • Frequent pattern: A pattern (set of items, sequence, etc.) that occurs frequently in a database

Motivations For Association Mining • Motivation: Finding regularities in data • What products were often purchased together? • Beer and nappies! • What are the subsequent purchases after buying a PC? • What kinds of DNA are sensitive to this new drug? • Can we automatically classify web documents?

Motivations For Association Mining (cont…) • Broad applications • Basket data analysis, cross-marketing, catalog design, sale campaign analysis • Web log (click stream) analysis, DNA sequence analysis, etc.

Market Basket Analysis • Market basket analysis is a typical example of frequent itemset mining • Customers' buying habits can be inferred by finding associations between the different items that customers place in their "shopping baskets" • This information can be used to develop marketing strategies

Market Basket Analysis (cont…)

Association Rule Basic Concepts • Let I be a set of items {I1, I2, I3, …, Im} • Let D be a database of transactions where each transaction T is a set of items such that T ⊆ I • So, if A is a set of items, a transaction T is said to contain A if and only if A ⊆ T • An association rule is an implication A ⇒ B where A ⊂ I, B ⊂ I, and A ∩ B = ∅

Association Rule Support & Confidence • We say that an association rule A ⇒ B holds in the transaction set D with support, s, and confidence, c • The support of the association rule A ⇒ B is given as the percentage of transactions in D that contain both A and B (that is, every item in A ∪ B) • So, the support can be considered the probability P(A ∪ B), the probability that a transaction contains all the items of A and of B

Association Rule Support & Confidence (cont…) • The confidence of the association rule is given as the percentage of transactions in D containing A that also contain B • So, the confidence can be considered the conditional probability P(B|A) • Association rules that satisfy minimum support and confidence values are said to be strong
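The two definitions can be checked on a small example. The Python sketch below is only an illustration: the five transactions and the function names `support` and `confidence` are made up for this purpose. Support is the fraction of transactions containing every item of A ∪ B, and confidence is the fraction of transactions containing A that also contain B.

```python
# Hypothetical transaction database D (each transaction is a set of items).
transactions = [
    {"bread", "milk"},
    {"bread", "nappies", "beer", "eggs"},
    {"milk", "nappies", "beer", "cola"},
    {"bread", "milk", "nappies", "beer"},
    {"bread", "milk", "nappies", "cola"},
]

def count(itemset, transactions):
    """Number of transactions T with itemset ⊆ T."""
    return sum(1 for t in transactions if itemset <= t)

def support(a, b, transactions):
    """support(A ⇒ B): fraction of transactions containing A ∪ B."""
    return count(a | b, transactions) / len(transactions)

def confidence(a, b, transactions):
    """confidence(A ⇒ B): count(A ∪ B) / count(A), i.e. P(B|A)."""
    return count(a | b, transactions) / count(a, transactions)

a, b = {"nappies"}, {"beer"}
print(support(a, b, transactions))     # 3 of 5 transactions -> 0.6
print(confidence(a, b, transactions))  # 3 of the 4 with nappies -> 0.75
```

Confidence is computed from raw counts rather than from two support fractions, which gives the same ratio while avoiding a floating-point rounding step.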

Itemsets & Frequent Itemsets • An itemset is a set of items • A k-itemset is an itemset that contains k items • The occurrence frequency of an itemset is the number of transactions that contain the itemset • This is also known more simply as the frequency, support count, or count • An itemset is said to be frequent if its support count satisfies a minimum support count threshold • The set of frequent k-itemsets is denoted Lk
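As a sketch of these definitions, the Python below counts every k-itemset that appears in a made-up transaction list and keeps those whose support count meets a threshold, giving Lk for one value of k. The names `frequent_k_itemsets` and `min_count` are hypothetical, and the brute-force counting is chosen for clarity rather than efficiency.

```python
from collections import Counter
from itertools import combinations

def frequent_k_itemsets(transactions, k, min_count):
    """Return {k-itemset: support count} for itemsets with count >= min_count."""
    counts = Counter()
    for t in transactions:
        # Every k-subset of a transaction is a k-itemset contained in it.
        for combo in combinations(sorted(t), k):
            counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}

# Hypothetical transactions, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "nappies", "beer", "eggs"},
    {"milk", "nappies", "beer", "cola"},
    {"bread", "milk", "nappies", "beer"},
    {"bread", "milk", "nappies", "cola"},
]

L2 = frequent_k_itemsets(baskets, k=2, min_count=3)
for itemset, c in L2.items():
    print(sorted(itemset), c)
```

With a minimum support count of 3, four 2-itemsets qualify here, each appearing in exactly three baskets.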

Summary of Support and Confidence • Support for an itemset A in a transactional database D is defined as count(A) / |D|. • For an association rule A ⇒ B, we can calculate support(A ⇒ B) = support(A ∪ B) and confidence(A ⇒ B) = support(A ∪ B) / support(A).

Summary of Support and Confidence • The strength of an association rule is often measured in terms of the support and confidence metrics. • Support determines how frequently a rule is satisfied in the entire data set and is defined as the fraction of all transactions that contain A ∪ B. • Confidence determines how frequently items in B appear in transactions that contain A

Summary of Support and Confidence • Support (S) and confidence (C) can also be related to joint probabilities and conditional probabilities as follows: • support(A ⇒ B) = P(A ∪ B). • confidence(A ⇒ B) = P(B|A).

Support & Confidence Again • Support and confidence values can be calculated as follows:

Mining Association Rules: An Example

Association Rule Mining • So, in general, association rule mining can be reduced to the following two steps: • Find all frequent itemsets • Each of these itemsets occurs at least as frequently as a minimum support count • Generate strong association rules from the frequent itemsets • These rules satisfy the minimum support and confidence measures
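The two steps can be sketched end to end in Python. This is a brute-force illustration under stated assumptions, not an efficient algorithm such as Apriori: it counts every subset of every transaction, so it only suits tiny examples. The function name `mine_rules`, the thresholds, and the transactions are all made up for the example.

```python
from collections import Counter
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Return strong rules as (A, B, support, confidence) tuples."""
    n = len(transactions)
    # Step 1: count every itemset, then keep the frequent ones.
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    # Step 2: split each frequent itemset into A ⇒ B and test confidence.
    rules = []
    for itemset, c in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                a = frozenset(lhs)
                conf = c / counts[a]  # count(A ∪ B) / count(A)
                if conf >= min_confidence:
                    rules.append((a, itemset - a, c / n, conf))
    return rules

# Hypothetical transactions, invented for illustration.
db = [
    {"bread", "milk"},
    {"bread", "nappies", "beer", "eggs"},
    {"milk", "nappies", "beer", "cola"},
    {"bread", "milk", "nappies", "beer"},
    {"bread", "milk", "nappies", "cola"},
]

for a, b, s, c in mine_rules(db, min_support=0.6, min_confidence=0.8):
    print(sorted(a), "=>", sorted(b), "support", s, "confidence", c)
```

On this data only beer ⇒ nappies passes both thresholds (support 0.6, confidence 1.0). Lowering the confidence threshold to 0.75 admits the reverse rule and several others.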
