Explore data mining techniques like cluster analysis, neural networks, and association rules to optimize e-business processes and discover valuable insights. Learn about Apriori Algorithm, smart web search agents, and intelligent tools for efficient data analysis and modeling.
Data Mining Techniques • Cluster Analysis • Induction • Neural Networks • OLAP • Data Visualization
Association Rule • An association rule is a rule that implies certain association relationships (such as “occur together” or “one implies the other”) among a set of objects in a database. • Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items. • The intuitive meaning of such a rule is that transactions in the database that contain X tend to contain Y.
Support • The support of an itemset S is the percentage of the transactions in T, the set of all transactions, that contain S. • If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) * 100%, where |U| and |T| are the numbers of elements in U and T, respectively.
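The support formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `support` and the five sample transactions are assumptions chosen for the example.

```python
def support(itemset, transactions):
    """Percentage of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # U: the transactions that contain all items in S
    matching = [t for t in transactions if itemset <= set(t)]
    return 100.0 * len(matching) / len(transactions)


# Hypothetical transaction database T (made up for illustration)
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

print(support({"bread", "butter"}, T))  # 60.0
print(support({"cheese"}, T))           # 40.0
```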
Confidence • The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X). • The confidence of the rule X => Y represents the percentage of transactions containing the items in X that also contain the items in Y.
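A small sketch of the confidence calculation follows. The names `count` and `confidence` and the sample transactions are assumptions for illustration. Because |T| cancels in support(X ∪ Y) / support(X), the sketch divides raw transaction counts, which gives the same value.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y, as a percentage."""
    x, y = frozenset(x), frozenset(y)
    n_x = count(x, transactions)
    if n_x == 0:
        raise ValueError("X does not occur in any transaction")
    # support(X ∪ Y) / support(X); the common factor |T| cancels
    return 100.0 * count(x | y, transactions) / n_x


# Hypothetical transactions (made up for illustration)
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

print(confidence({"bread"}, {"butter"}, T))  # 75.0
print(confidence({"butter"}, {"bread"}, T))  # 100.0
```

Note that confidence is not symmetric: in this sample every butter buyer also bought bread, but not every bread buyer bought butter.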
Example: Association Rule • In a store we might have I={cheese,ham,bread,butter,salt,coke} • A transaction could look like: t={bread,butter} for a customer who bought bread and butter. • An association rule would be like the following: bread=>butter with support 60% and confidence 80%, meaning that 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
Apriori Algorithm • Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets. • Use the frequent itemsets to generate the desired rules.
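The second step, turning frequent itemsets into rules, might look like the sketch below. It assumes the frequent itemsets are already available as a dictionary from itemset to support count; the function name `generate_rules`, the `min_conf` threshold, and the sample counts are all assumptions for illustration.

```python
from itertools import combinations


def generate_rules(freq, min_conf):
    """Return (X, Y, confidence) for every rule X => Y with confidence >= min_conf.

    `freq` maps frozenset -> support count and must contain every non-empty
    subset of each itemset it lists (true for Apriori output).
    """
    rules = []
    for itemset, n in freq.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset X as the left-hand side
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                conf = n / freq[x]  # count(X ∪ Y) / count(X)
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


# Hypothetical support counts (made up for illustration)
freq = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}

for x, y, conf in generate_rules(freq, min_conf=0.8):
    print(sorted(x), "=>", sorted(y), conf)  # ['butter'] => ['bread'] 1.0
```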
Apriori Algorithm (cont’d) Pass 1: 1. Generate the candidate itemsets in C1. 2. Save the frequent itemsets in L1. Pass k: 1. Generate the candidate itemsets in Ck from the frequent itemsets in Lk-1: a. Join Lk-1 with Lk-1, as follows: insert into Ck select p.item1, p.item2, . . . , p.itemk-1, q.itemk-1 from Lk-1 p, Lk-1 q where p.item1 = q.item1, . . . , p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
Apriori Algorithm (cont’d) b. Generate all (k-1)-subsets of each candidate itemset in Ck. c. Prune every candidate itemset from Ck for which some (k-1)-subset is not in the frequent itemsets Lk-1. 2. Scan the transaction database to determine the support of each candidate itemset in Ck. 3. Save the frequent itemsets in Lk.
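The passes above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' implementation: itemsets are represented as sorted tuples, `min_support` is a fraction between 0 and 1, and the names `apriori_gen`, `apriori`, and the sample transactions are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build the candidate set C_k from L_{k-1} (itemsets are sorted tuples)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: first k-2 items equal, last item of p < last item of q
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune step: every (k-1)-subset of c must be in L_{k-1}
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)
    # Pass 1: every single item is a candidate
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while candidates:
        # Scan the database to count each candidate in C_k
        counts = {c: sum(1 for t in transactions if t.issuperset(c))
                  for c in candidates}
        # Keep the frequent ones as L_k
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        candidates = apriori_gen(level.keys(), k)
    return frequent


# Hypothetical transactions (made up for illustration)
T = [
    {"bread", "cheese", "ham"},
    {"butter", "cheese", "coke"},
    {"bread", "butter", "cheese", "coke"},
    {"butter", "coke"},
]

result = apriori(T, min_support=0.5)
print(result[("butter", "cheese", "coke")])  # 2
```

In this sample, pass 2 discards (bread, butter) and (bread, coke), so the only 3-item candidate the join produces is (butter, cheese, coke), and it survives pruning because all three of its 2-item subsets are frequent.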
Smart Web Search Agents • Data Search Engines >> Information Search Agents - Traditional searching on the Web is done using one of the following three: - Directories (Yahoo, Lycos, etc.) - Search Engines (AltaVista, NorthernLight, etc.) - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.) All of these involve keyword searches. Drawbacks: not easily personalized; too many results (although many give relevancy factors)
- local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!) - local cache information base (containing mined information and discovered knowledge for efficient personal use) - domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries)
Intelligent Tools for E-Business • Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems • Learning Algorithms, Heuristic Searching • Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery • Prediction & Time Series Analysis • Information Retrieval, Intelligent User Interface • Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
Enhancing E-Business Process Through Data Mining • Quality of discovered knowledge • Having the right data • Having appropriate data mining tools!!! • Traditional Data Mining Tools • Simple query and reporting • Visualization-driven data exploration tools, OLAP • Discovery process is user-driven
Intelligent Data Mining Tools • Automate the process of discovering patterns/knowledge in data • Require hypothesis, exploration • Derive business knowledge (patterns) from data • Combine business knowledge of users with results of discovery algorithms
Intelligent Information Agents • The Data Mining Problem: • Clustering/ Classification • Association • Sequencing • Viewed as an Optimization Problem • Tools: Genetic Algorithms
Fuzzy Rule Discovery • Rule discovery: the discovery of associations between business events, i.e. which items are purchased together • To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge • A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query • The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
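As a rough illustration of a fuzzy query, the sketch below models the term "tall" as a membership function that returns a degree between 0 and 1 rather than a yes/no answer. The linear ramp, its breakpoints (160 cm and 190 cm), the 0.5 threshold, and the sample records are all assumptions chosen for the example.

```python
def tall(height_cm, low=160.0, high=190.0):
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    A linear ramp between two assumed breakpoints.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_query(records, membership, key, threshold=0.5):
    """Return (record, degree) pairs with degree >= threshold, best match first."""
    scored = [(r, membership(r[key])) for r in records]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical records (made up for illustration)
people = [
    {"name": "Ann", "height": 175.0},
    {"name": "Bob", "height": 190.0},
    {"name": "Cy", "height": 165.0},
]

for person, degree in fuzzy_query(people, tall, "height"):
    print(person["name"], degree)
# Bob 1.0
# Ann 0.5
```

Unlike a crisp query such as height > 180, the result is ranked by how well each record fits the linguistic term, and the threshold controls how loosely "tall" is interpreted.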