
Michael R. Wick and Paul J. Wagner Department of Computer Science


Connecting Discrete Structures to the “Real World”
Using Market Basket Analysis (and Gray Codes) to Integrate and Motivate Topics in Discrete Structures

Michael R. Wick and Paul J. Wagner

Department of Computer Science

University of Wisconsin - Eau Claire

Eau Claire, WI 54701

- Introduction
- Our Discrete Structures Course
- Application: Market Basket Analysis
- The Apriori Algorithm
- Set Theory
- Dynamic Programming
- Algorithm Analysis

- Application: Binary Reflected Gray Codes
- Applications
- Recursion
- Algorithm Analysis
- Divide-and-Conquer
- Dynamic Programming

- Summary
- Contact Information

- Perceived disconnect with Discrete Structures
- Rest of curriculum
- Application to “real world”

- Particularly problematic in applied programs
- We claim this course for our own
- Replaced similar course in Mathematics
- Retained rigor
- Infused applications and algorithmics

- Topics
- Logic
- Expert Systems, Algorithm Correctness Proof

- Proof Techniques
- Recursion
- Gray Codes
- Divide and Conquer
- Dynamic Programming

- Sets & Relations
- Market-basket Analysis
- compareTo and equals implementations

- Functions
- Algorithm Analysis

- Combinatorics/Probability
- Expert Systems

- Matrices
- Graphics/Transmission Errors

- Graphs and Trees
- Shortest Path, Iterative Deepening, Huffman Coding

- Sets are a powerful way to describe the application
- Market Basket Analysis: the use of association techniques to find groups of items that tend to occur together in transactions
- frequent item sets
- sets of items that occur above some minimum threshold (called the minimum support)
- example: {a,b,c,d} occurs 12 times (min. support == 10)

- association rules
- a,b,c → d iff support({a,b,c,d}) / support({a,b,c}) ≥ r (called the minimum confidence)
- a,b → c,d iff support({a,b,c,d}) / support({a,b}) ≥ r
- how many such rules are there?

- Suggestive Sell
- When the client selects the antecedent items suggest that they select the consequent items
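
The support and confidence definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the four-basket data set are assumptions made for the example, not part of the original slides.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy data: four baskets over the items a, b, c, d.
baskets = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c", "d"},
]

abc_to_d = confidence({"a", "b", "c"}, {"d"}, baskets)   # 2 / 3
ab_to_cd = confidence({"a", "b"}, {"c", "d"}, baskets)   # 2 / 4
```

With a minimum confidence of r = 0.6, the rule a,b,c → d would be kept and a,b → c,d would be rejected on this toy data.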

- Apriori Algorithm (1997)
- Principles
- Every subset of a frequent item set must be frequent
- Every frequent item set of cardinality n+1 must have at least two frequent item sets of cardinality n as subsets
- The intersection of these two subsets must have a cardinality of n-1
- We can build every possible frequent item set of size n+1 from the union of frequent item sets of size n.

- Apriori Algorithm (1997)
- Example: minSupport = 2
I = {Table Saw, Router, Kreg Jig, Sander, Drill Press}

T = { {Table Saw, Router, Drill Press},
      {Router, Sander},
      {Router, Kreg Jig},
      {Table Saw, Router, Sander},
      {Table Saw, Kreg Jig},
      {Router, Kreg Jig},
      {Table Saw, Kreg Jig},
      {Table Saw, Router, Kreg Jig, Drill Press},
      {Table Saw, Router, Kreg Jig} }

L1 = { {T}, {R}, {K}, {S}, {D} }

L2 = { {R,T}, {K,T}, {D,T}, {K,R}, {R,S}, {D,R} }

L3 = { {K,R,T}, {D,R,T} }

L4 = ∅

Rules = ????
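
The level-wise construction in this example can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the function names are hypothetical, items are abbreviated to single letters (T, R, K, S, D) as in the slide, and no attempt is made at the efficiency tricks a production Apriori implementation would use.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return [L1, L2, ...], the frequent item sets grouped by size."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    levels = []
    k = 1
    while level:
        levels.append(level)
        k += 1
        # Join step: two (k-1)-sets that differ in exactly one item each.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        level = {c for c in candidates
                 if all(frozenset(s) in levels[-1]
                        for s in combinations(c, k - 1))
                 and support(c, transactions) >= min_support}
    return levels


# The nine transactions from the slide, abbreviated to first letters.
baskets = [
    {"T", "R", "D"}, {"R", "S"}, {"R", "K"},
    {"T", "R", "S"}, {"T", "K"}, {"R", "K"},
    {"T", "K"}, {"T", "R", "K", "D"}, {"T", "R", "K"},
]

levels = apriori(baskets, min_support=2)
```

Here `levels[2]` holds the two item sets {K,R,T} and {D,R,T}, and the loop stops because the only size-4 candidate, {D,K,R,T}, has the infrequent subset {D,K,R}.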

- Apriori Algorithm (1997)
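Let $I = \{a, b, c, \ldots\}$ be the set of all items in the domain.

Let $T = \{\, S \mid S \subseteq I \,\}$ be a bag of all transaction records of item sets.

Let $\mathrm{support}(S) = |\{\, A \mid A \in T \land S \subseteq A \,\}|$.

Let $L_1 = \{\, \{a\} \mid a \in I \land \mathrm{support}(\{a\}) \ge \mathit{minSupport} \,\}$.

$\forall k\,(k > 1 \land L_{k-1} \ne \emptyset)$, let

$$
\begin{aligned}
L_k = \{\, S_i \cup S_j \mid{} & (S_i \in L_{k-1}) \land (S_j \in L_{k-1}) \\
& \land (|S_i - S_j| = 1) \land (|S_j - S_i| = 1) \\
& \land (\forall S\,[((S \subset S_i \cup S_j) \land (|S| = k-1)) \rightarrow S \in L_{k-1}]) \\
& \land (\mathrm{support}(S_i \cup S_j) \ge \mathit{minSupport}) \,\}
\end{aligned}
$$

The set of all frequent item sets is given by

$$L = \bigcup_k L_k$$

and the set of all association rules is given by

$$
\begin{aligned}
R = \{\, A \rightarrow C \mid{} & (F \in L) \land (A \subset F) \land (C = F - A) \land (A \ne \emptyset) \land (C \ne \emptyset) \\
& \land \mathrm{support}(F) / \mathrm{support}(A) \ge \mathit{minConfidence} \,\}
\end{aligned}
$$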

- Dynamic Programming Approach
- Want proof of principle of optimality and overlapping subproblems
- Principle of Optimality
- The optimal solution to Lk includes the optimal solution of Lk-1
- Proof by contradiction

- Overlapping Subproblems
- Lemma: every subset of a frequent item set is a frequent item set
- Proof by contradiction

- Rule Generation Algorithm
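Let $L = \bigcup_k L_k$.

Let $T = \{\, S \mid S \subseteq I \,\}$ be the set of all transactions.

Let $\langle A, C \rangle$ be an association rule with antecedent $A$ and consequent $C$.

Let

$$
\mathrm{confid}(\langle A, C \rangle) = \frac{|\{\, B \mid B \in T \land (A \cup C) \subseteq B \,\}|}{|\{\, B \mid B \in T \land A \subseteq B \,\}|}
$$

Let

$$
R_1 = \{\, \langle F - \{a\}, \{a\} \rangle \mid F \in L \land a \in F \land \mathrm{confid}(\langle F - \{a\}, \{a\} \rangle) \ge \mathit{min\_confid} \,\}
$$

and

$$
\begin{aligned}
\forall k\,[\,(k > 1) \land (R_{k-1} \ne \emptyset) \rightarrow{} \\
R_k = \{\, \langle A, C_i \cup C_j \rangle \mid{} & (\langle A, C_i \rangle \in R_{k-1}) \land (\langle A, C_j \rangle \in R_{k-1}) \\
& \land (|C_i - C_j| = 1 \land |C_j - C_i| = 1) \\
& \land (\forall S\,[((S \subset C_i \cup C_j) \land (|S| = k-1)) \rightarrow \langle A, S \rangle \in R_{k-1}]) \\
& \land (\mathrm{confid}(\langle A, C_i \cup C_j \rangle) \ge \mathit{min\_confid}) \,\}\,]
\end{aligned}
$$

then

$$R = \bigcup_k R_k$$

is the set of all confident association rules.

Given as a homework problem on sets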
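
As a cross-check for the rule definitions above, the following Python sketch enumerates rules by brute force: it tries every non-empty proper subset of each frequent item set as an antecedent and keeps the rules that meet the confidence threshold. This is deliberately not the level-wise R_k construction from the slide; it is a simpler reference that can be used to validate one. The function names and the threshold of 1.0 are assumptions for illustration; the transactions and frequent item sets are those of the woodworking example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confident_rules(frequent_sets, transactions, min_confid):
    """Brute force: test every split of each frequent set into A -> C."""
    rules = set()
    for f in frequent_sets:
        for size in range(1, len(f)):          # non-empty, proper subsets
            for antecedent in combinations(sorted(f), size):
                a = frozenset(antecedent)
                c = f - a
                confid = support(f, transactions) / support(a, transactions)
                if confid >= min_confid:
                    rules.add((a, c))
    return rules


baskets = [
    {"T", "R", "D"}, {"R", "S"}, {"R", "K"},
    {"T", "R", "S"}, {"T", "K"}, {"R", "K"},
    {"T", "K"}, {"T", "R", "K", "D"}, {"T", "R", "K"},
]
# Frequent item sets of size 2 and 3 from the example (minSupport = 2).
frequent = [frozenset(s)
            for s in ["RT", "KT", "DT", "KR", "RS", "DR", "KRT", "DRT"]]

rules = confident_rules(frequent, baskets, min_confid=1.0)
```

On this data, every rule that survives a threshold of 1.0 has the Drill Press or the Sander in its antecedent, for example D → R,T: both transactions containing a Drill Press also contain a Router and a Table Saw.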

- Formal Definition:
- A binary reflected Gray code is a one-to-one function mapping the integers 0 ≤ i ≤ 2^n − 1 to n-bit binary numbers so that every two consecutive binary numbers differ in exactly one bit.

- Origin
- Used by Emile Baudot in telegraphy in 1878.
- Used by Frank Gray in a 1953 patent for a pulse-code modulation tube
- Prevented large noise spikes when vacuum tube counters incremented

- Example:

- Appears in a curiously large number of applications
- Towers of Hanoi
- Robotic Arm Angle measurement
- Hamiltonian Circuits
- …

Visual Representation

- Why is it called “Binary Reflected”?
- Binary is obvious
- Strings are drawn from alphabet of 0s and 1s

- Reflected is less obvious
- Each half of the code sequence is built from a reflected copy of the other half

- A Simple Recursive Definition
- Let G(k,n) represent the kth code in the n-bit binary reflected Gray code sequence
- Computed in Θ(n) time (for n bits)
- For single Gray code value, this is optimal
- Typically, however, we want the entire code sequence
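
The recursive formula itself did not survive in this transcript, so the following Python sketch uses one standard recursive formulation of the binary reflected Gray code; it may differ in detail from the definition on the slide. The name `G` follows the slide; the string representation and the empty-string base case are assumptions.

```python
def G(k, n):
    """Return the k-th code (0-based) of the n-bit reflected Gray code."""
    if n == 0:
        return ""                      # the 0-bit code is the empty string
    half = 1 << (n - 1)                # 2^(n-1) codes in each half
    if k < half:
        # First half: the (n-1)-bit sequence prefixed with 0.
        return "0" + G(k, n - 1)
    # Second half: the (n-1)-bit sequence in reflected order, prefixed with 1.
    return "1" + G(2 * half - 1 - k, n - 1)


three_bit = [G(k, 3) for k in range(8)]
# three_bit == ['000', '001', '011', '010', '110', '111', '101', '100']
```

Each call recurses once per bit, which gives the n levels of recursion behind the Θ(n) bound (treating each string prepend as constant time). Generating the list `three_bit` this way is the naive approach, since every code is recomputed from scratch.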

- A Naïve Implementation
- To generate the entire sequence, call G(i,n) with i going from 0 to k-1.
- A priori Analysis
- Each invocation of G requires Θ(n) time
- G is invoked k times
- k is equal to 2^n
- Therefore, Θ(n·2^n) time and Θ(2^n) space
- Optimal is Θ(2^n) time and space

- What is the source of the inefficiency?
- Repeated work.

- A Dynamic Programming Approach

- Naïve Dynamic Programming Implementation
- Requirement
- We must generate and store the entire (n-1)-bit Gray code sequence prior to starting the n-bit Gray code sequence

- Approach
- Use two-dimensional matrix to store previously calculated Gray code sequences
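
A minimal Python sketch of this table-based approach follows, assuming string codes and a list of lists as the two-dimensional matrix; the function name `gray_table` is hypothetical. Row i holds the complete i-bit sequence, and each row is built only from the row before it.

```python
def gray_table(n):
    """Build every Gray code sequence from 0 bits up to n bits."""
    table = [[""]]                                     # row 0: the 0-bit sequence
    for bits in range(1, n + 1):
        prev = table[bits - 1]
        row = ["0" + code for code in prev]            # first half: copy with leading 0
        row += ["1" + code for code in reversed(prev)] # second half: reflection with leading 1
        table.append(row)
    return table


table = gray_table(3)
# table[3] == ['000', '001', '011', '010', '110', '111', '101', '100']
```

The table keeps 1 + 2 + 4 + ... + 2^n = 2^(n+1) − 1 codes in total, even though only the last row is the requested sequence.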

- Analysis
- Time: Θ(2^(n+1))
- Space: Θ(2^(n+1))

- Notice the classic time/space trade-off
- Naïve Iterative
- Time: Θ(n·2^n)
- Space: Θ(2^n)

- Naïve Dynamic Programming
- Time: Θ(2^(n+1))
- Space: Θ(2^(n+1))

- What are the sources of the remaining inefficiencies?
- Time: Spends too much time copying values
- 2nd half of n-bit sequence is copy (plus “0”) of 1st half

- Space: Only require previous Gray code sequence, not all previous sequences

Time/Space trade-off is just a rule of thumb

- Improved Approach
- Use integers rather than strings to represent codes
- Binary representation of integer is equivalent to the string version
- Requires only 1 bit per bit of code.

- Reuse the (n-1)-bit sequence directly as the first half of the n-bit sequence
- The most-significant bit is already correct, since the integer representation implicitly has leading zeros.
- To set the leading one of the second half, just add 2^(n-1)

- Analysis
- Produces and stores 2^n codes
- Time and Space: Θ(2^n)
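
The improved approach can be sketched in Python as follows, under the assumption that a single growing list of integers stands in for the storage described above; the function name `gray_sequence` is hypothetical.

```python
def gray_sequence(n):
    """Return the n-bit reflected Gray code sequence as a list of integers."""
    seq = [0]                              # the 0-bit sequence
    for bits in range(1, n + 1):
        high = 1 << (bits - 1)             # 2^(bits-1): the new leading one
        # The existing list already serves as the first half (leading zero
        # is implicit); append the reflection with the leading one added.
        seq += [high + code for code in reversed(seq)]
    return seq


codes = gray_sequence(3)
# codes == [0, 1, 3, 2, 6, 7, 5, 4]
```

Each code is written exactly once and nothing is copied, so the work is proportional to the 2^n codes produced. Formatting an entry with `format(code, "03b")` recovers the string form, for example `format(6, "03b")` gives `'110'`.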

- Revised Discrete Structures Course
- Explicit connection to curriculum
- Infusion of “real-world” applications
- Applications allow infusion of
- Dynamic Programming
- Divide-and-Conquer
- Set Theory
- Algorithm Analysis
- Recursion
- Proof Techniques
- Logic

Michael R. Wick ([email protected])

Paul J. Wagner ([email protected])

Department of Computer Science

University of Wisconsin – Eau Claire

Eau Claire, WI 54701

www.cs.uwec.edu