Frequent Word Combinations Mining and Indexing on HBase. Hemanth Gokavarapu, Santhosh Kumar Saminathan. Introduction. Many projects use HBase to store large amounts of data for distributed computation.
Frequent Word Combinations Mining and Indexing on HBase
Hemanth Gokavarapu
Santhosh Kumar Saminathan
What is Data Mining?
How does data mining work?
What technological infrastructure is needed?
Two critical technological drivers answer this question.
If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support.
If the minimum confidence is 50%, then the only two rules generated from this 2-itemset that have confidence greater than 50% are:
Shoes → Jacket: Support = 50%, Confidence = 66%
Jacket → Shoes: Support = 50%, Confidence = 100%
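The support and confidence figures above can be reproduced with a few lines of Python. This is only a sketch: the transaction table from the slide is not part of this text, so the TRANSACTIONS list below is a hypothetical database chosen to give the same numbers, and the function names support and confidence are illustrative.

```python
# Hypothetical transaction database (an assumption, not taken from the slides);
# chosen so that the resulting values match the figures quoted above.
TRANSACTIONS = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent union consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


if __name__ == "__main__":
    for lhs, rhs in [({"Shoes"}, {"Jacket"}), ({"Jacket"}, {"Shoes"})]:
        s = support(lhs | rhs, TRANSACTIONS)
        c = confidence(lhs, rhs, TRANSACTIONS)
        print(f"{sorted(lhs)} -> {sorted(rhs)}: support={s:.2f}, confidence={c:.2f}")
```

With this assumed data, Shoes → Jacket has confidence 2/3 (the 66% quoted above, truncated) because Shoes appears in three transactions but only two of them also contain Jacket, while every transaction containing Jacket also contains Shoes.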
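[Figure: worked example with min support = 50%. Database D is scanned to build candidate set C1 and frequent set L1; C2 is generated from L1 and D is scanned again to obtain L2; C3 is generated from L2 and a further scan of D yields L3.]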
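The level-wise scheme in the figure (candidates C1, C2, C3 and frequent sets L1, L2, L3, with one scan of D per level) can be sketched as follows. This is a minimal in-memory illustration, not the authors' HBase implementation; the function name frequent_itemsets and its parameters are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def scan(candidates):
        # One pass over the database ("Scan D") counts every candidate,
        # then keeps only those that reach the minimum support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # C1 is every single item; L1 is the frequent subset of C1.
    level = scan({frozenset([item]) for t in transactions for item in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # C_k: unions of two frequent (k-1)-itemsets that have exactly k items
        # and whose (k-1)-subsets are all frequent (the pruning step).
        candidates = {
            a | b
            for a in prev
            for b in prev
            if len(a | b) == k
            and all(frozenset(s) in prev for s in combinations(a | b, k - 1))
        }
        level = scan(candidates)  # L_k
        result.update(level)
        k += 1
    return result
```

The loop stops as soon as a level produces no frequent itemsets, so the number of database scans grows with the size of the largest frequent itemset.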
Advantages:
Uses the large itemset property
Easily parallelized
Easy to implement
Disadvantages:
Assumes the transaction database is memory resident
Requires many database scans
What is HBase?
[Flowchart: Start → Find Frequent Items → Find Candidate Itemsets → Find Frequent Items → Set Null? If No, return to Find Candidate Itemsets; if Yes, Generate Association Rules.]
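The final step of the flowchart, generating association rules once no further frequent itemsets are found, might look like the sketch below. It assumes the supports of all frequent itemsets have already been computed and are passed in as a dictionary; the function name generate_rules and the sample support values are hypothetical.

```python
from itertools import combinations


def generate_rules(supports, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence) tuples.

    `supports` maps each frequent itemset (a frozenset) to its support and
    must also contain every non-empty subset of each itemset it holds.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / supports[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical supports, consistent with the Shoes/Jacket example.
    supports = {
        frozenset({"Shoes"}): 0.75,
        frozenset({"Jacket"}): 0.5,
        frozenset({"Shoes", "Jacket"}): 0.5,
    }
    for lhs, rhs, sup, conf in generate_rules(supports, 0.5):
        print(f"{sorted(lhs)} -> {sorted(rhs)}: support={sup:.2f}, confidence={conf:.2f}")
```

Because every subset of a frequent itemset is itself frequent, the antecedent's support is always available in the dictionary, so no additional database scan is needed at this stage.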