Association rules


Presented by: Anilkumar Panicker


What is Data Mining?

  • Search for valuable information in large volumes of data.

  • A step in knowledge discovery in databases.

  • It helps companies improve customer satisfaction and corporate profits, and determine the impact of various parameters on sales.


Association Rule

  • Association rules are used to show the relationships between data items.

  • Association rules detect common usage of data items.

  • E.g. The purchasing of one product when another product is purchased represents an association rule.


Example 1

  • Grocery store.

  • Association rules have their most direct application in retail businesses.

  • Association rules are used to assist in marketing, advertising, floor placement, and inventory control.



Example 2

  • A Telephone Company.

  • A telephone company must ensure that all calls are completed within an acceptable period of time.

  • In this environment, a potential data mining problem would be to predict the failure of a node.

  • This can be done by finding association rules of the type $X \Rightarrow \text{Failure}$.



Association Rule

  • Given a set of items $I = \{I_1, I_2, \ldots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$, an association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ are sets of items called itemsets and $X \cap Y = \emptyset$.


  • Support (s):

    The support (s) for an association rule $X \Rightarrow Y$ is the percentage of transactions in the database that contain $X \cup Y$.

    E.g. if bread along with peanutbutter occurs in 60% of the total transactions, then the support for $\text{bread} \Rightarrow \text{peanutbutter}$ is 60%.


  • Confidence or Strength (α):

    The confidence or strength (α) for an association rule $X \Rightarrow Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain $X$.

    E.g. if the support for $\text{bread} \Rightarrow \text{peanutbutter}$ is 60% and bread occurs in 80% of the total transactions, then the confidence for $\text{bread} \Rightarrow \text{peanutbutter}$ is 60% / 80% = 75%.
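
The two measures can be sketched in a few lines of Python. This is only an illustration: the five-transaction database below is hypothetical, chosen so that the numbers match the bread and peanutbutter example (60% support, 80% of transactions with bread, 75% confidence), and the helper names `count`, `support`, and `confidence` are not from the slides.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy database (an assumption for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "jelly", "peanutbutter"}),
    frozenset({"bread", "peanutbutter"}),
    frozenset({"bread", "milk", "peanutbutter"}),
    frozenset({"beer", "bread"}),
    frozenset({"beer", "milk"}),
]


def count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """For the rule X => Y: count(X union Y) divided by count(X)."""
    x_set, y_set = frozenset(x), frozenset(y)
    x_count = count(x_set, transactions)
    if x_count == 0:
        raise ValueError("antecedent X never occurs in the database")
    return count(x_set | y_set, transactions) / x_count


s = support({"bread", "peanutbutter"}, TRANSACTIONS)
alpha = confidence({"bread"}, {"peanutbutter"}, TRANSACTIONS)
print(f"support = {s:.0%}, confidence = {alpha:.0%}")
# support = 60%, confidence = 75%
```

Confidence is computed from raw counts rather than from the two support fractions, which gives the same ratio while avoiding floating-point rounding.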


Selecting Association Rules

  • The selection of association rules is based on Support and Confidence.

  • Confidence measures the strength of the rule, whereas support measures how often it occurs in the database.

  • Typically, a large minimum confidence and a smaller minimum support are used.

  • Rules that satisfy both minimum support and minimum confidence are called strong rules.
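
As a minimal sketch of this selection step, the snippet below keeps only the rules that meet both thresholds. The `Rule` type, the `strong_rules` helper, and the thresholds (30% support, 70% confidence) are assumptions for illustration. The first rule uses the 60% / 75% figures from the bread and peanutbutter example; the other two rules and their numbers are hypothetical.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset
    consequent: frozenset
    support: float      # fraction of transactions containing X union Y
    confidence: float   # count(X union Y) / count(X)


def strong_rules(rules: List[Rule], min_support: float,
                 min_confidence: float) -> List[Rule]:
    """Keep the rules that satisfy BOTH minimum support and minimum confidence."""
    return [r for r in rules
            if r.support >= min_support and r.confidence >= min_confidence]


candidates = [
    Rule(frozenset({"bread"}), frozenset({"peanutbutter"}), 0.60, 0.75),
    Rule(frozenset({"jelly"}), frozenset({"bread"}), 0.20, 1.00),  # hypothetical
    Rule(frozenset({"beer"}), frozenset({"milk"}), 0.20, 0.50),    # hypothetical
]

for rule in strong_rules(candidates, min_support=0.30, min_confidence=0.70):
    print(sorted(rule.antecedent), "=>", sorted(rule.consequent))
# ['bread'] => ['peanutbutter']
```

The second candidate shows why both thresholds matter: a rule can have perfect confidence and still be rejected because it occurs too rarely.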


Association Rule Problem

  • Given a set of items $I = \{I_1, I_2, \ldots, I_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$, where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$, the association rule problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence. These values $(s, \alpha)$ are given as input to the problem.


Large Itemsets

  • A large itemset (or frequent itemset) is an itemset whose number of occurrences is above a threshold, s (the support).

  • Finding large itemsets is conceptually simple but computationally very costly.

  • The naive approach would be to count all itemsets that appear in any transaction.

  • Given a set of items of size m, there are $2^m$ subsets. Ignoring the empty set, we are still left with $2^m - 1$ subsets.



  • E.g. for a set of items of size 5, i.e. the store sells 5 products, the possible number of itemsets is $2^5 - 1 = 31$:

  • Bread

  • Peanutbutter

  • Milk

  • Beer

  • Jelly

  • Bread,peanutbutter

  • Bread,milk

  • Bread,beer

  • Bread,jelly

  • Peanutbutter,milk

  • Peanutbutter,beer

  • Peanutbutter,jelly

  • Milk,beer

  • Milk,jelly

  • Beer,jelly

  • Bread,peanutbutter,milk

  • Bread,peanutbutter,beer, and so on.


  • For m = 30, the number of potential itemsets becomes $2^{30} - 1 = 1073741823$.

  • The challenge in solving an association rule problem is therefore to determine all large itemsets efficiently.

  • Most association rule algorithms are based on smart ways to reduce the number of itemsets to be counted.
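
The growth in the number of candidate itemsets can be checked with a short sketch. The function names are illustrative only; the product list is the five-product store from the example above.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def count_nonempty_itemsets(m: int) -> int:
    """Number of non-empty subsets of a set with m items: 2^m - 1."""
    return 2 ** m - 1


def enumerate_itemsets(items: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty itemset, smallest itemsets first (the naive approach)."""
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        yield from combinations(ordered, size)


products = ["bread", "peanutbutter", "milk", "beer", "jelly"]
all_itemsets = list(enumerate_itemsets(products))

print(len(all_itemsets))            # 31
print(count_nonempty_itemsets(30))  # 1073741823
```

Enumerating explicitly is fine for 5 products, but for 30 products the same loop would have to visit more than a billion itemsets, which is why the count must be reduced.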


Large Itemsets

  • The most common approach to finding association rules is to break up the problem into two parts:

  • Finding large itemsets, and

  • Generating rules from these itemsets.


  • Any subset of a large itemset is also large.

  • Once the large itemsets have been found, we know that any interesting association rule, $X \Rightarrow Y$, must have $X \cup Y$ in this set of frequent itemsets.

  • When all large itemsets are found, generating the association rules is straightforward.
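
A minimal sketch of the second part, rule generation, is shown below. It assumes the large itemsets are already known together with their occurrence counts; the counts in `LARGE` are hypothetical (out of five transactions, consistent with the 60% and 80% figures used earlier), and `generate_rules` is an illustrative name, not a routine from the slides.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical large itemsets with their occurrence counts (5 transactions).
LARGE: Dict[FrozenSet[str], int] = {
    frozenset({"beer"}): 2,
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 2,
    frozenset({"peanutbutter"}): 3,
    frozenset({"bread", "peanutbutter"}): 3,
}


def generate_rules(large: Dict[FrozenSet[str], int],
                   min_confidence: float
                   ) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Split each large itemset into X => Y and keep the confident rules."""
    rules = []
    for itemset, itemset_count in large.items():
        # Every non-empty proper subset X of the itemset is a possible antecedent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                y = itemset - x
                # X is a subset of a large itemset, so it is large and has a count.
                conf = itemset_count / large[x]
                if conf >= min_confidence:
                    rules.append((x, y, conf))
    return rules


for x, y, conf in generate_rules(LARGE, min_confidence=0.8):
    print(sorted(x), "=>", sorted(y), f"(confidence {conf:.0%})")
# ['peanutbutter'] => ['bread'] (confidence 100%)
```

No further pass over the database is needed here: because every subset of a large itemset is itself large, all the counts required for the confidence ratio are already in the table.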


Apriori Algorithm

  • The Apriori algorithm is the best-known association rule algorithm.

  • The Apriori algorithm is used to efficiently discover large itemsets.

  • The Apriori algorithm uses the property that any subset of a large itemset must be large.

  • Inputs: the itemsets, the database of transactions, and the minimum support. Output: the large itemsets.
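
The level-wise idea can be sketched as follows. This is a simplified illustration rather than a definitive implementation: it derives the items from the transactions instead of taking them as a separate input, and the toy database and the 30% minimum support are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every large itemset mapped to its support (as a fraction)."""
    db: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(db)

    def keep_large(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], float]:
        # Scan the database once and keep candidates that reach min_support.
        large = {}
        for c in candidates:
            s = sum(1 for t in db if c <= t) / n
            if s >= min_support:
                large[c] = s
        return large

    # Level 1: candidates are the single items.
    current = keep_large({frozenset({item}) for t in db for item in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: merge pairs of large (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop a candidate if any (k-1)-subset is not large.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev
                             for sub in combinations(c, k - 1))}
        current = keep_large(candidates)
        result.update(current)
        k += 1
    return result


# Hypothetical toy database (an assumption for illustration only).
toy_db = [
    {"bread", "jelly", "peanutbutter"},
    {"bread", "peanutbutter"},
    {"bread", "milk", "peanutbutter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
large_itemsets = apriori(toy_db, min_support=0.3)
for itemset in sorted(large_itemsets, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), f"{large_itemsets[itemset]:.0%}")
```

On this toy data only 6 two-item candidates are counted instead of all 10 possible pairs, because jelly is not large on its own and is therefore never extended.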







Q & A

Thanks.

