- 70 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Association Rules' - kellie-hinton

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Association Rules

Presented by: Anilkumar Panicker

What is Data Mining??

- Search for valuable information in large volumes of data.
- A step in knowledge discovery in databases.
- It enables companies to focus on customer satisfaction, corporate profits, and determining the impact of various parameters on the sales.

Association Rule

- Association rules are used to show the relationships between data items.
- Association rules detect common usage of data items.
- E.g. The purchasing of one product when another product is purchased represents an association rule.

Example 1

- Grocery store.
- Association rules have most direct application in the retail businesses.
- Association rules used to assist in marketing, advertising, floor placements and inventory control.

From the transaction history several association rules can be derived.

- E.g. 100% of the time that PeanutButter is purchased, so is bread.
- 33% of the time PeanutButter is purchased, Jelly is also purchased.

Example 2

- A Telephone Company.
- A telephone company must ensure that all calls are completed and in acceptable period of time.
- In this environment, a potential data mining problem would be to predict a failure of a node.
- This can be done by finding association rules of the type XFailure.

If these types of rules occur with a high confidence, Failures can be predicted.

- Even though the support might be low because the X condition does not frequently occur.

Association rule

- Given a set of items I = {I1,I2,….Im} and a database of transactions D = {t1,t2,….tm} where ti = { Ii1,Ii2,….Iik} and IiJ € I , an association rule is an implication of the form X Y where X,Y C I are sets of items called itemsets and X∩Y =ø.

Support (s):

The support (s) for an association rule

XY is the percentage of transactions in the database that contain X U Y.

E.g. If bread along with peanutbutter occurs in 60% of the total transactions, then the support for breadpeanutbutter is 60%

Confidence or Strength (α):

The confidence or strength (α) for an association rule XY is the ratio of the number of transactions that contain X U Y to the number of transactions that contain X.

Eg.if support for breadpeanutbutter is 60% and bread occurs in 80% of total transactions then confidence for breadpeanutbutter is 75%.

Selecting Association rules

- The selection of association rules is based on Support and Confidence.
- Confidence measures the strength of the rule, Whereas support measures how often it should occur in the database.
- Typically large confidence values and a smaller support are used.
- Rules that satisfy both minimum support and minimum confidence are called strong rules.

Association Rule Problem

- Given a set of Items I = {I1,I2,….Im} and a database of transactions D = {t1,t2,….tn} where ti = { Ii1,Ii2,….Iik} and IiJ € I . The association rule problem is to identify all association rules XY with a minimum support and confidence. These values (s,α) are given as input to the problem.

Large Itemsets

- A Large Itemset / frequent Itemset is an itemset whose number of occurrences is above a threshold, s (Support)
- Finding large Itemsets generally is quite easy but very costly.
- The naive approach would be to count all itemsets that appear in any transaction.
- Given a set of items of size m, there are 2m subsets. Ignoring the empty set we are still left with 2m – 1 subsets.

For e.g. In the retail store example if have set of items of size 5, i.e the store sells 5 products. Then the possible number of itemsets is 25 – 1 = 31.

- If the 5 products sold are bread,peanutbutter,milk,beer and jelly.

then the 31 possible itemsets are

Bread

- Peanutbutter
- Milk
- Beer
- Jelly
- Bread,peanutbutter
- Bread,milk
- Bread,beer
- Bread,jelly
- Peanutbutter,milk
- Peanutbutter,beer
- Peanutbutter,jelly
- Milk,beer
- Milk,jelly
- Beer, jelly
- Bread,peanutbutter,milk
- Bread, Peanutbutter, beer and so on.

For m = 30 the number of potential itemsets become 1073741823.

- The challenge in solving an association problem is hence to efficiently determining all large itemsets.
- Most association rule algorithms are based on smart ways to reduce the number of itemsets to be counted.

Large Itemsets

- The most common approach to finding association rules is to breakup the problem into two parts
- Finding large Itemsets and
- Generating rules from these itemsets.

Subset of any large itemset is also large.

- Once the large Itemsets have been found, we know that any interesting association rule, XY ,must have X U Y in this set of frequent itemsets.
- When all large itemsets are found, generating the association rules is straightforward.

Apriori Algorithm

- Apriori algorithm is the most well known association rule algorithm.
- Apriori algorithm is used to efficiently discover large itemsets.
- Apriori algorithm uses the property that any subset of a large itemset must be large.
- Inputs: Itemsets, Database of transactions, support and the output is large itemsets.

References

- Data Mining by Margaret Dunham.
- Wikipedia

Q & A

…… Thanks..

Download Presentation

Connecting to Server..