
Data Mining Techniques


lovelessj


Presentation Transcript


  1. Data Mining Techniques • Cluster Analysis • Induction • Neural Networks • OLAP • Data Visualization

  2. Association Rule • An association rule is a rule that implies certain association relationships among a set of objects in a database (such as “occur together” or “one implies the other”). • Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items. • The intuitive meaning of such a rule is that transactions in the database that contain X tend to contain Y.

  3. Support • The support of an itemset S is the percentage of the transactions in the transaction set T that contain S. • If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) * 100%, where |U| and |T| are the numbers of elements in U and T, respectively.

  4. Confidence • The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X). • The confidence of the rule X => Y is the percentage of transactions containing the items in X that also contain the items in Y.

  5. Example: Association Rule • In a store we might have I = {cheese, ham, bread, butter, salt, coke}. • A transaction could look like t = {bread, butter} for a customer who bought bread and butter. • An association rule could look like bread => butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
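The support and confidence definitions can be checked on a small example. The sketch below is illustrative only: the five transactions in `TRANSACTIONS` are made-up toy data (not from the slides), and the function names `support` and `confidence` are chosen here for convenience.

```python
from typing import Iterable, List, FrozenSet

# Hypothetical toy transactions over the store's item set I.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "cheese"}),
    frozenset({"bread", "butter", "coke"}),
    frozenset({"bread", "ham"}),
    frozenset({"cheese", "coke"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Percentage of transactions that contain every item in `itemset`."""
    s = frozenset(itemset)
    hits = sum(1 for t in transactions if s <= t)  # |U|
    return 100.0 * hits / len(transactions)        # (|U| / |T|) * 100%


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule X => Y, as a percentage."""
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return 100.0 * support(set(x) | set(y), transactions) / support_x


if __name__ == "__main__":
    print("support(bread, butter):", support({"bread", "butter"}, TRANSACTIONS))
    print("confidence(bread => butter):",
          confidence({"bread"}, {"butter"}, TRANSACTIONS))
```

On this toy data, three of five transactions contain both bread and butter and four contain bread, so the rule bread => butter has support 60% and confidence 75%.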

  6. Apriori Algorithm • Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets. • Use the frequent itemsets to generate the desired rules.

  7. Apriori Algorithm (cont’d) Pass 1 • Generate the candidate itemsets in C_1 • Save the frequent itemsets in L_1 Pass k • Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1} • Join L_{k-1} with L_{k-1}, as follows: insert into C_k select p.item_1, p.item_2, . . . , p.item_{k-1}, q.item_{k-1} from L_{k-1} p, L_{k-1} q where p.item_1 = q.item_1, . . . , p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}

  8. Apriori Algorithm (cont’d) Pass k, candidate generation (continued) • Generate all (k-1)-subsets of each candidate itemset in C_k • Prune from C_k every candidate itemset that has some (k-1)-subset not in the frequent itemsets L_{k-1} Pass k, remaining steps • Scan the transaction database to determine the support of each candidate itemset in C_k • Save the frequent itemsets in L_k
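The passes described on the Apriori slides can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function names `apriori` and `generate_rules`, the use of fractions (0 to 1) for the thresholds, and the toy data in the usage section are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support fraction} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Scan the database and keep candidates meeting min_support.
        found = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                found[c] = sup
        return found

    # Pass 1: candidate 1-itemsets C_1, frequent 1-itemsets L_1.
    current = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for i, p in enumerate(prev):
            for q in prev[i + 1:]:
                # Join: first k-2 items equal, last item of p < last item of q.
                if p[:-1] != q[:-1]:
                    continue
                cand = frozenset(p + (q[-1],))
                # Prune: every (k-1)-subset must itself be frequent.
                if all(frozenset(s) in current
                       for s in combinations(cand, k - 1)):
                    candidates.add(cand)
        current = keep_frequent(candidates)  # L_k
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (X, Y, support, confidence) for rules X => Y."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the slides.
    data = [{"bread", "butter"}, {"bread", "butter", "cheese"},
            {"bread", "butter", "coke"}, {"bread", "ham"}, {"cheese", "coke"}]
    freq = apriori(data, min_support=0.5)
    for x, y, sup, conf in generate_rules(freq, min_confidence=0.7):
        print(sorted(x), "=>", sorted(y), f"support={sup:.2f} confidence={conf:.2f}")
```

Sorting each itemset into a tuple makes the join condition a simple prefix comparison, which mirrors the `where` clause of the join step.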

  9. Smart Web Search Agents • Data Search Engines >> Information Search Agents - Traditional searching on the Web is done using one of the following three: - Directories (Yahoo, Lycos, etc.) - Search Engines (AltaVista, NorthernLight, etc.) - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.) All of these involve keyword searches. Drawbacks: not easily personalized; too many results (although many give relevancy factors)

  10. - local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!) - local cache information base (containing mined information and discovered knowledge for efficient personal use) - domain-based agents (e.g. Job Search; Sports-NBA Stats, Bibliography-Digital Libraries)

  11. Intelligent Tools for E-Business • Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems • Learning Algorithms, Heuristic Searching • Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery • Prediction & Time Series Analysis • Information Retrieval, Intelligent User Interface • Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems

  12. Enhancing E-Business Process Through Data Mining • Quality of discovered knowledge • Having right data • Having appropriate data mining tools!!! • Traditional Data Mining Tools • Simple query and reporting • Visualization driven data exploration tools, OLAP • Discovery process is user driven

  13. Intelligent Data Mining Tools • Automate the process of discovering patterns/knowledge in data • Require hypothesis, exploration • Derive business knowledge (patterns) from data • Combine business knowledge of users with results of discovery algorithms

  14. Intelligent Information Agents • The Data Mining Problem: • Clustering/ Classification • Association • Sequencing • Viewed as an Optimization Problem • Tools: Genetic Algorithms

  15. Fuzzy Rule Discovery • Rule discovery: the discovery of associations between business events, i.e. which items are purchased together • To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge • A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query • The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segmentation in the data
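One common way to make a fuzzy term such as "tall" usable in a query is a membership function that maps a value to a degree between 0 and 1. The sketch below is only an illustration of that idea: the linear ramp shape, the 160 cm and 190 cm breakpoints, the 0.5 cut-off, and the names `ramp`, `tall`, and `fuzzy_query` are all assumptions made here, not details from the slides.

```python
def ramp(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    """Hypothetical definition of the linguistic term 'tall'."""
    return ramp(height_cm, 160.0, 190.0)


def fuzzy_query(records, field, membership, threshold=0.5):
    """Return (degree, record) pairs with degree >= threshold, best first."""
    scored = [(membership(r[field]), r) for r in records]
    matches = [pair for pair in scored if pair[0] >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up records for illustration.
    people = [
        {"name": "A", "height": 150},
        {"name": "B", "height": 175},
        {"name": "C", "height": 195},
    ]
    for degree, person in fuzzy_query(people, "height", tall):
        print(person["name"], degree)
```

Unlike a crisp filter such as `height > 180`, the query returns a degree of match for each record, so results can be ranked rather than simply accepted or rejected.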
