Data Mining Concepts and Research Trends

KISS-SIGDB Tutorial 1998
Data Mining Concepts and Research Trends
Do-Heon LEE
Database Laboratory, Dept. of Computer Science, Chonnam National University
1998. 5. 21.

Table of Contents
• Definition and Motivation of Data Mining
• Classification of Data Mining Techniques
• Mining Association Rules
• Attribute Dependencies
• Database Summarization
• Data Mining Projects
• DBMiner/GeoMiner/WebMiner
• MineSet
• Data Mining and Data Warehousing
• References

Definition of Data Mining
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from a large volume of actual data.
• implicit : beyond databases and catalogs
• previously unknown : excludes well-known knowledge
• potentially useful information : application-dependent usefulness
• large volume : performance perspective
• actual data : missing, erroneous data
Some counterexamples
• The 3rd attribute of table ‘EMP’ is ‘SALARY’.
  • Explicit information in the DB catalog
• Most college students have graduated from high school.
  • Well-known information, common sense

Motivation of Data Mining Research
• Growing reliance on database systems
• Database = operational data collection + useful resource reflecting domain characteristics
• Fast advance of database system technology
• Increasing volume of data stored in databases
• Mining databases for useful knowledge that can be exploited in decision making

Comparison with Machine Learning
  Data Mining                       Machine Learning
  Dynamic data                      Static data
  Erroneous data                    Error-free data
  Uncertain data                    Exact data
  Missing data                      No missing data
  Coexistence of irrelevant data    Only relevant data
  Immense size                      Moderate size
  Structured data                   Flat collection of data
• Data mining is an actual application of machine learning methodologies.

Classification of Data Mining Techniques
• On knowledge types to be discovered
  • Characterization : generalized description of data characteristics
  • Classification : description of discriminating characteristics
  • Clustering : grouping data having common properties
  • Association : co-occurrence relationships among multiple events
  • Trend analysis : characterizing evolution trends of temporal data
  • Pattern analysis : finding specified patterns in large DBs
  • Types of mining targets evolve continuously according to emerging application demands. ( cf. SQL evolution )
• On database types to be mined
  • relational, transactional, object-oriented, temporal, multi-media, etc.
• On techniques adopted
  • statistics, symbolic learning, neural networks, visualization, etc.

Association Rules : Definition and Applications
• QUEST project at IBM Almaden Research Center
• Association rules ( among items )
  • Given a collection of transactions, each of which is { item-1, ..., item-n }, an association rule has the form
      { item-11, item-12, ... , item-1m } --> { item-21, item-22, ... , item-2k }
      ( antecedent items --> consequence items )
  • The existence of an item (or items) implies the existence of other item(s) in the same transaction.
• In a POS (Point-Of-Sales) data set,
    10/15/13:01 { coke, bread, hamburger }
    10/15/14:21 { coke, hamburger, juice }
    10/15/14:25 { milk, sandwich, juice }
    10/15/15:13 { sandwich, milk, juice, bread }
    10/15/16:31 { hamburger, juice, coke }
    .....
  association rules such as
    { hamburger } --> { coke }
    { sandwich, juice } --> { milk }
  support decision making for shelf layout design, direct mailing, etc.
• Customer usage patterns in public communication services
• Fault co-occurrence analysis in complex systems

Association Rules : Usefulness Measures
• Two measures for identifying useful association rules
  • support : statistical significance - the fraction of transactions containing all items of the rule
  • confidence : rule strength - the ratio of transactions containing both the antecedent and the consequence items to transactions containing the antecedent items
  Transaction                        hamburger  coke  both
  { coke, bread, hamburger }             o        o     o
  { coke, hamburger, juice }             o        o     o
  { milk, sandwich, juice }              x        x     x
  { sandwich, milk, juice, bread }       x        x     x
  { hamburger, juice, coke }             o        o     o
  { coke, bread, hamburger }             o        o     o
  { coke, hamburger, juice }             o        o     o
  { hamburger, juice }                   o        x     x
  { milk, hamburger, sweater }           o        x     x
  { coke, milk, juice }                  x        o     x
  Count                                  7        6     5
For an association rule { coke } --> { hamburger },
  support : 5 out of 10 = 50 %
  confidence : 5 out of 6 = 83 %
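The two measures can be checked directly against the ten transactions on this slide. The following is a minimal sketch; the function names `support` and `confidence` are illustrative choices, not part of any system mentioned in the tutorial.

```python
# The ten transactions listed on the slide.
transactions = [
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"},
    {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"},
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"},
    {"coke", "milk", "juice"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


def confidence(antecedent, consequent, db):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)


print(support({"coke", "hamburger"}, transactions))                 # 0.5
print(round(confidence({"coke"}, {"hamburger"}, transactions), 2))  # 0.83
```

Sets are used for transactions because only item membership matters here; item order and quantities are ignored.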

Association Rules : Mining Procedures
The first phase : finding frequent item-sets ( high support )
: the threshold value for support is given as 40 %
  { coke, bread, hamburger }
  { coke, hamburger, juice }
  { milk, sandwich, juice }
  { sandwich, milk, juice, bread }
  { hamburger, juice, coke }
  { coke, bread, hamburger }
  { coke, hamburger, juice }
  { hamburger, juice }
  { milk, hamburger, sweater }
  { coke, milk, juice }
  { coke, juice }
  { coke, sweater }
Occurrence counts ( with 12 transactions, 40 % support requires at least 5 occurrences )
  {coke} : 8   {bread} : 3   {hamburger} : 7   {juice} : 8   {milk} : 4   {sandwich} : 2   {sweater} : 2
  {coke, hamburger} : 5   {coke, juice} : 5   {hamburger, juice} : 4
  {coke, hamburger, juice} : 3
Algorithms for the first phase
• Blind search : 2^N candidates
• AIS : basic algorithm
• SETM : sort-merge algorithm
• Apriori : tree-structured candidate sets
• AprioriTid : temporary table generation
• Partition : partitioned mining
• DHP : hash-based algorithm
The second phase : finding strong associations ( high confidence )
: the threshold value for confidence is given as 70 %
  {coke} --> {hamburger} : 5 out of 8 = 62.5 %
  {hamburger} --> {coke} : 5 out of 7 = 71 %
  {coke} --> {juice} : 5 out of 8 = 62.5 %
  {juice} --> {coke} : 5 out of 8 = 62.5 %
Only {hamburger} --> {coke} reaches the 70 % threshold.
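The two phases can be sketched on the twelve transactions of this slide. This is a simplified level-wise search in the spirit of Apriori, written under stated assumptions: it counts candidates by a plain scan instead of the tree-structured candidate sets named above, and the names `frequent_itemsets` and `strong_rules` are hypothetical.

```python
from itertools import combinations

transactions = [
    {"coke", "bread", "hamburger"}, {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"}, {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"}, {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"}, {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"}, {"coke", "milk", "juice"},
    {"coke", "juice"}, {"coke", "sweater"},
]
MIN_SUPPORT = 0.40      # threshold from the slide
MIN_CONFIDENCE = 0.70   # threshold from the slide


def frequent_itemsets(db, min_support):
    """Phase 1: size-k candidates are built only from frequent size-(k-1) sets."""
    n = len(db)
    level = [frozenset([item]) for item in sorted(set().union(*db))]
    result = {}
    k = 1
    while level:
        # Count each candidate with one scan and keep the frequent ones.
        kept = {}
        for cand in level:
            count = sum(cand <= t for t in db)
            if count / n >= min_support:
                kept[cand] = count
        result.update(kept)
        k += 1
        # Join frequent sets into size-k candidates, then prune any candidate
        # that has an infrequent (k-1)-subset.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return result


def strong_rules(freq, min_confidence):
    """Phase 2: split each frequent itemset into antecedent --> consequent."""
    rules = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / freq[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules


freq = frequent_itemsets(transactions, MIN_SUPPORT)
rules = strong_rules(freq, MIN_CONFIDENCE)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "-->", sorted(rhs), round(conf, 2))
```

The pruning step is what separates this from the blind search over 2^N candidates: a three-item candidate is never counted if one of its two-item subsets is infrequent.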

Sequential Patterns
Transactions ordered by customer ( CID ) and time
  CID  Time      Items
  1    95/06/25  30
  1    95/06/30  90
  2    95/06/10  10,20
  2    95/06/15  30
  2    95/06/20  40,60,70
  3    95/06/25  30,50,70
  4    95/06/25  30
  4    95/06/30  40,70
  4    95/07/25  90
  5    95/06/12  90
Customer sequences
  CID  Sequence
  1    <(30) (90)>
  2    <(10,20) (30) (40,60,70)>
  3    <(30,50,70)>
  4    <(30) (40,70) (90)>
  5    <(90)>
Maximal sequential patterns with support > 25%
  <(30) (90)>
  <(30) (40,70)>
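Support for a sequential pattern counts customers, not transactions: a customer supports a pattern if the pattern's item-sets appear, in order, inside distinct transactions of that customer's sequence. A minimal sketch of this containment test on the five customer sequences of this slide follows; the helper names `contains` and `seq_support` are illustrative, and only the support check is shown, not the mining of maximal patterns.

```python
# Customer sequences from the slide; each element is one transaction's item-set.
sequences = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def contains(sequence, pattern):
    """True if each item-set of `pattern` is a subset of a later and later
    element of `sequence` (greedy left-to-right matching)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a strictly later transaction
    return True


def seq_support(pattern, db):
    """Fraction of customers whose sequence contains `pattern`."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)


print(seq_support([{30}, {90}], sequences))      # 0.4 (customers 1 and 4)
print(seq_support([{30}, {40, 70}], sequences))  # 0.4 (customers 2 and 4)
```

Both patterns are supported by 2 of 5 customers, which is above the 25% threshold used on the slide.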

Telecommunication Network Diagnosis
[Figure: network of node-A to node-I with alarms (C, 123), (E, 256) and (F, 678) marked; time = 30 min]
• “Co-occurrence of a 123 alarm in C and a 256 alarm in E implies a 678 alarm in F within 30 minutes.”

Attribute Dependencies
• Given attributes A1, A2, ..., Am
• f(A1, A2, ..., Am, a set of constants) ==> g(A1, A2, ..., Am, a set of constants)
  where f and g are arbitrary (boolean) functions.
  e.g. if (A1 = c1 and A2 = c2) then (A3 = c3 and A4 = c4)
• The problem is intractable in general because the number of possible functions and constants is potentially infinite.
• Thus, several constraints are imposed to make it tractable in actual domains.
  e.g. LHS is a conjunction of simple predicates and RHS is an assertion of classification --> Classification problem

Classification
• Symbolic classification rules (e.g. decision trees)
  • The most well-studied area among inductive learning problems.
  Training data
    A1  A2  C
    a   d   1
    a   e   2
    b   f   3
    b   g   3
  Decision tree
    A1 = a : test A2 ( d --> 1, e --> 2 )
    A1 = b : 3
• Neural network approach
  • Weight values in edges --> symbolic description of classification rules
  • Still far from a practical solution <-- too costly learning time
  • Suitable for single-learning/multiple-runs problems
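The decision tree on this slide can be written down as a small nested structure and applied to the four training rows. This sketch only encodes and evaluates the tree as drawn; it does not induce the tree from data, and the tuple/dict representation is an assumption made for illustration.

```python
# Training table from the slide: (A1, A2, C).
rows = [("a", "d", 1), ("a", "e", 2), ("b", "f", 3), ("b", "g", 3)]

# Inner nodes are (attribute, branches) tuples; leaves are class labels.
# The tree splits on A1 first; only the A1 = a branch needs to test A2.
tree = ("A1", {
    "a": ("A2", {"d": 1, "e": 2}),
    "b": 3,
})


def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[record[attribute]]
    return node


predictions = [classify(tree, {"A1": a1, "A2": a2}) for a1, a2, _ in rows]
print(predictions)  # [1, 2, 3, 3]
```

Each root-to-leaf path reads as a symbolic rule, for example "A1 = a and A2 = d --> C = 1".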

Bottom-Up Summarization
• DBLEARN project at J. Han's Lab., Simon Fraser Univ., Canada
• Domain knowledge ( concept hierarchies )
  • Major : art ( music, painting, ... ), science ( math, physics, computing, ... )
  • Birth_Place : Korea ( Chunnam ( Kwangju, Sunchon, ... ), Kyungbuk, ... ), Foreign
  • GPA : excellent [4.0-3.5], good (3.5,3.0], bad (3.0,0.0]
• Original table
    Name  Major       Birth_Place  GPA  vote
    Lee   music       Kwangju      3.4  1
    Kim   physics     Sunchon      3.9  1
    Yoon  math        Mokpo        3.7  1
    Park  painting    Yeosu        3.4  1
    Choi  computing   Taegu        3.8  1
    Hong  statistics  Suwon        3.2  1
• After attribute-oriented substitution
    Major    Birth_Place  GPA        vote
    art      Chunnam      good       1
    science  Chunnam      excellent  1
    science  Chunnam      excellent  1
    art      Chunnam      good       1
    science  Kyungbuk     excellent  1
    science  Kyonggi      good       1
• After merging redundant records
    Major    Birth_Place  GPA        vote
    art      Chunnam      good       2
    science  Chunnam      excellent  2
    science  Kyungbuk     excellent  1
    science  Kyonggi      good       1
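The two steps on this slide, attribute-oriented substitution followed by merging of redundant records, can be sketched in a few lines. This is not the DBLEARN implementation: the dictionaries encode one level of the concept hierarchies as implied by the generalized table, the GPA boundaries follow the intervals on the slide, and all names are illustrative.

```python
from collections import Counter

# One generalization step per attribute, read off the slide's tables.
MAJOR = {"music": "art", "painting": "art", "physics": "science",
         "math": "science", "computing": "science", "statistics": "science"}
PLACE = {"Kwangju": "Chunnam", "Sunchon": "Chunnam", "Mokpo": "Chunnam",
         "Yeosu": "Chunnam", "Taegu": "Kyungbuk", "Suwon": "Kyonggi"}


def gpa_level(gpa):
    """Map a numeric GPA to the slide's ranges (3.5 and up is excellent)."""
    if gpa >= 3.5:
        return "excellent"
    if gpa >= 3.0:
        return "good"
    return "bad"


students = [
    ("Lee", "music", "Kwangju", 3.4),
    ("Kim", "physics", "Sunchon", 3.9),
    ("Yoon", "math", "Mokpo", 3.7),
    ("Park", "painting", "Yeosu", 3.4),
    ("Choi", "computing", "Taegu", 3.8),
    ("Hong", "statistics", "Suwon", 3.2),
]


def summarize(records):
    """Drop the Name attribute, substitute each value by its parent concept,
    and merge identical generalized records by adding up their votes."""
    votes = Counter()
    for _name, major, place, gpa in records:
        votes[(MAJOR[major], PLACE[place], gpa_level(gpa))] += 1
    return votes


summary = summarize(students)
for record, vote in summary.items():
    print(record, vote)
```

The vote column keeps track of how many original records each generalized record stands for, so the summary still carries quantitative information after merging.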

Top-Down Summarization
• CLEVER system at DB Lab., KAIST
• Table to be summarized ( user's selection ) with attributes PROGRAM ( vi, emacs, word, gcc, tetris ) and USER ( John, Tom, Lee, Park, Yang )
• Fuzzy set hierarchies
  • PROG_01 : w --> engineering, game, ... ; engineering --> editor, ...
  • USR_01 : w --> developer, marketer, ... ; developer --> programmer, ...
• Hypotheses ( tSD = 0.4 )
    < w, w >                      1.000
    < engineering, w >            0.833
    < w, developer >              0.800
    < w, marketer >               0.411
    < engineering, developer >    0.700
    < w, programmer >             0.589
    < editor, developer >         0.489
    < engineering, programmer >   0.522
    < editor, programmer >        0.456

Data Mining Projects
• QUEST : IBM Almaden Research Center
  • a common set of operations in a unified framework
  • classification, association, etc.
• KDW (Knowledge Discovery Workbench) : GTE Laboratory Inc.
  • focus on architectural issues of data mining systems
  • clustering, classification, summarization, deviation detection, etc.
• IMACS (Intelligent Market Analysis and Classification System) : AT&T Bell Lab
  • focus on human interaction in data mining
  • data archaeology
• CoverStory : Information Resources Incorporated
  • summarization on supermarket scanner data
• DBMiner/GeoMiner/WebMiner : Simon Fraser Univ.
• MineSet : Silicon Graphics Inc.

DBMiner
• DBMiner Research Group in Simon Fraser Univ., Canada
• DMQL : a SQL-like Data Mining Query Language
• Data structures : generalized relations, multi-dimensional data cube
[Figure: DBMiner architecture with Graphical User Interface, Discovery Modules, SQL Server, DB Data and Concept Hierarchy]

DBMiner (Cont’d)
• Functions
  • Characterizer : the general characteristics of a set of user-specified data
    • attribute-oriented induction
    • e.g. Cold(x) => headache(x) and cough(x)
    • e.g. Fever(x) => headache(x) and low-leucocyte-count(x)
  • Discriminator : features that distinguish the target class from contrasting classes
    • e.g. Low-leucocyte-count(x) => Fever(x)
  • Classifier : generalization-based decision tree induction
  • Association rule finder : multi-level association rules
  • Meta-rule guided miner : confines the search to specific forms of rules
    • e.g. Meta-rule : major(s : student, x) and p(s, y) => GPA(s, z)
  • Predictor : predicts the possible values for missing data, after factor analysis
    • e.g. An employee’s potential salary can be predicted based on the salary distribution of similar employees in the company
  • Data evolution evaluator
    • e.g. Growth patterns of certain stocks
  • Deviation evaluator
    • e.g. A set of stocks whose growth patterns deviate from the major trend

GeoMiner/WebMiner
• GeoMiner with GMQL (Geo-Mining Query Language)
  • An extension of DBMiner for spatial data mining
  • Modules
    • Geo-characterizer
      • e.g. Given spatial hierarchies of Western Canada, discover general weather patterns according to region partitions
    • Geo-comparator (= discriminator)
      • e.g. The differences in weather patterns between British Columbia and Alberta
    • Geo-associator
• WebMiner with WebQL
  • It finds resources on the Internet related to a specific topic
    • e.g. What is the most popular document about data mining in terms of number of accesses?
  • cf. Web traversal pattern discovery (by Chen, Park and Yu, 1996)
    • e.g. If a user visits h1 => h2 => h5 then he/she is apt to visit h8 => h11

MineSet
• Developed by Silicon Graphics Inc.
• Combines intelligent data mining algorithms and multidimensional data visualization techniques
• Association rule generator/rule visualizer
• Classification tools
  • MLC++ based classification modules
    • Decision tree inducer
    • Option tree inducer
    • Evidence classifier inducer
    • Decision table inducer
  • Tree/evidence visualizers
• Map visualizer : spatial data analysis
• Clustering module
• Regression tree inducer : predicts unknown values

Rule Visualizer of MineSet Cited from the Silicon Graphics Inc. Home Pages

Decision Tree Visualizer of MineSet Cited from the Silicon Graphics Inc. Home Pages

Map Visualizer of MineSet Cited from the Silicon Graphics Inc. Home Pages

Two Perspectives on Data Mining
• AI practitioner’s perspective
  • Extensions of machine learning technology
  • Focus on sophisticated measures and theories rather than efficiency improvement
• DB practitioner’s perspective
  • Application of machine learning paradigms to massive and actual data management problems
• A suggestion as a DB practitioner
  • First step : Blindly search possible knowledge ==> “Data Mining”
    • There is no guru who could guide the search directions.
    • No available heuristics : Rather ignore heuristics for unknown patterns.
  • Second step : Validate the discovered rough knowledge in detail

Data Mining and Data Warehousing
[Figure: operational data ( process-oriented ) in Relational DB-1, Relational DB-2, Object-oriented DB-1, Object-oriented DB-2, Legacy DB-1 and File system-1; a data warehouse builder/manager with metadata; the data warehouse and Data mart-1 to Data mart-5 holding data for decision support ( subject-oriented ); data mining]

Research Issues
• Looking for useful mining targets
  • Associations, characteristic rules, classification, clustering
  • Functional dependency, regression trees
  • Similar sequential patterns/time series
• Variations of association rules
  • Alternatives for simple support and confidence measures
  • Generalized/multilevel association rules
  • Performance enhancement for association rule discovery
• System implementation issues
  • Identify core functions (e.g. a tightly-coupled architecture [MEO98], MLC++)
  • Elicit common DBMS requirements for various data mining tasks
  • Integration with relational databases and/or multi-dimensional databases
  • Data/knowledge visualization
  • Extended query language or extended CLI : e.g. DMQL
• And so on ...

References
[Data Mining General]
• [FRW91] W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge Discovery in Databases : An Overview”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. J. Frawley Ed., AAAI Press, 1991, pp. 1-27
• [AGR93a] R. Agrawal, T. Imielinski and A. Swami, “Database Mining : A Performance Perspective”, IEEE Trans. on Knowledge and Data Engineering, Vol. 5, No. 6, 1993, pp. 914-925
• [MAT93] C. J. Matheus, P. Chan and G. Piatetsky-Shapiro, “Systems for Knowledge Discovery in Databases”, IEEE TKDE, Vol. 5, No. 6, 1993, pp. 903-913
• [HOL94a] M. Holsheimer and A. Siebes, “Data Mining : The Search for Knowledge in Databases”, Report CS-R9406, ISSN 0169-118X, CWI (Centrum voor Wiskunde en Informatica), The Netherlands, 1994
[Association Rules]
• [AGR93b] R. Agrawal, T. Imielinski and A. Swami, “Mining Associations between Sets of Items in Massive Databases”, Proc. ACM SIGMOD, Washington D.C., May 1993
• [AGR94] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases”, Proc. VLDB, Santiago, Sep. 1994, pp. 487-499
• [KLE94] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. Verkamo, “Finding Interesting Rules from Large Sets of Discovered Association Rules”, Proc. CIKM, Gaithersburg, Nov. 1994, pp. 401-407

References (Cont’d)
• [HOT95] M. Houtsma and A. Swami, “Set-Oriented Mining for Association Rules in Relational Databases”, Proc. ICDE, Taipei, Mar. 1995, pp. 25-33
• [SAV95] A. Savasere, E. Omiecinski and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 432-444
• [SRI95] R. Srikant and R. Agrawal, “Mining Generalized Association Rules”, Proc. VLDB, Zurich, Sep. 1995, pp. 407-419
• [HAN95] J. Han and Y. Fu, “Discovery of Multiple-level Association Rules from Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 420-431
• [PAR95a] J. -S. Park, M. -S. Chen and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules”, Proc. SIGMOD, 1995, pp. 175-186
• [PAR95b] J. -S. Park, M. -S. Chen and P. S. Yu, “Efficient Parallel Data Mining for Association Rules”, Proc. CIKM, 1995
• [SRI96] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 1-12
• [FUK96] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, “Data Mining Using Two-Dimensional Optimized Association Rules : Scheme, Algorithms, and Visualization”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 13-23
• [CHE96] D. Cheung, J. Han, V. Ng and C. Wong, “Maintenance of Discovered Association Rules in Large Databases : An Incremental Updating Technique”, Proc. ICDE, New Orleans, Feb. 1996, pp. 106-114

References (Cont’d)
• [BRI97a] S. Brin, R. Motwani, J. Ullman and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data”, Proc. SIGMOD, 1997, pp. 255-264
• [BRI97b] S. Brin, R. Motwani and C. Silverstein, “Beyond Market Baskets : Generalizing Association Rules to Correlations”, Proc. SIGMOD, 1997, pp. 265-276
• [HAN97] E. H. Han, G. Karypis and V. Kumar, “Scalable Parallel Data Mining for Association Rules”, Proc. SIGMOD, 1997, pp. 277-288
• [AGG98] C. C. Aggarwal and P. S. Yu, “Online Generation of Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 402-411
• [OZD98] B. Özden, S. Ramaswamy and A. Silberschatz, “Cyclic Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 412-423
• [LIN98] J. -L. Lin and M. H. Dunham, “Mining Association Rules : Anti-Skew Algorithms”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 486-493
• [SAV98] A. Savasere, E. Omiecinski and S. Navathe, “Mining for Strong Negative Associations in a Large Database of Customer Transactions”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 494-502
• [RAS98] R. Rastogi and K. Shim, “Mining Optimized Association Rules with Categorical and Numeric Attributes”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 503-513

References (Cont’d)
[Characterization]
• [HAN91] Y. Cai, N. Cercone and J. Han, “Attribute-Oriented Induction in Relational Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 213-228
• [HAN92a] J. Han, Y. Cai and N. Cercone, “Knowledge Discovery in Databases : An Attribute-Oriented Approach”, Proc. VLDB, 1992, pp. 547-559
• [HAN92b] J. Han, Y. Cai, N. Cercone and Y. Huang, “DBLEARN : A Knowledge Discovery System for Large Databases”, Proc. CIKM, 1992, pp. 473-481
• [HAN93] J. Han, Y. Cai and N. Cercone, “Data-Driven Discovery of Quantitative Rules in Relational Databases”, IEEE TKDE, Vol. 5, No. 1, Feb. 1993, pp. 29-40
• [LEE94] D. -H. Lee and M. H. Kim, “Discovering Database Summaries through Refinements of Fuzzy Hypotheses”, Proc. ICDE, Houston, Feb. 1994, pp. 223-230
• [LEE97] D. -H. Lee and M. H. Kim, “Database Summarization Using Fuzzy ISA Hierarchies”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 27, No. 4, Aug. 1997, pp. 671-680

References (Cont’d)
[Sequential Patterns]
• [AGR93c] R. Agrawal, C. Faloutsos and A. Swami, “Efficient Similarity Search in Sequence Databases”, Proc. the 4th Int’l Conf. on Foundations of Data Organization and Algorithms, Chicago, Oct. 1993
• [FAL94] C. Faloutsos, M. Ranganathan and Y. Manolopoulos, “Fast Subsequence Matching in Time-Series Databases”, Proc. SIGMOD, Minneapolis, May 1994, pp. 419-429
• [AGR95a] R. Agrawal and R. Srikant, “Mining Sequential Patterns”, Proc. ICDE, Taipei, Mar. 1995, pp. 3-14
• [AGR95b] R. Agrawal, K. Lin, H. Sawhney and K. Shim, “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 490-501
• [AGR95c] R. Agrawal, G. Psaila, E. Wimmers and M. Zait, “Querying Shapes of Histories”, Proc. VLDB, Zurich, Sep. 1995, pp. 502-514
• [HAT96] K. Hatonen, M. Klemettinen, H. Mannila, P. Ronkainen and H. Toivonen, “Knowledge Discovery from Telecommunication Network Alarm Databases”, Proc. ICDE, New Orleans, Feb. 1996, pp. 115-123
• [SHA96] H. Shatkay and S. Zdonik, “Approximate Queries and Representations for Large Data Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 536-545
• [LI96] C. Li, P. Yu and V. Castelli, “HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 546-555
• [CHE96] M. -S. Chen, J. S. Park and P. S. Yu, “Data Mining for Path Traversal Patterns in a Web Environment”, Proc. ICDCS, 1996, pp. 385-392
• [SHA97] J. Shafer and R. Agrawal, “Parallel Algorithms for High-Dimensional Proximity Joins”, Proc. VLDB, 1997, pp. 176-185

References (Cont’d)
[Classification/Clustering]
• [QUI89] J. Quinlan and R. Rivest, “Inferring Decision Trees Using the Minimum Description Length Principle”, Information and Computation, Vol. 80, 1989, pp. 227-248
• [YAS91] R. Yasdi, “Learning Classification Rules from Database in the Context of Knowledge Acquisition and Representation”, IEEE TKDE, Vol. 3, No. 3, Sep. 1991, pp. 293-306
• [CHA91] K. Chan and A. Wong, “A Statistical Technique for Extracting Classificatory Knowledge from Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 107-123
• [UTH91] R. Uthurusamy, U. Fayyad and S. Spangler, “Learning Useful Rules from Inconclusive Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 141-157
• [ZIA91] W. Ziarko, “The Discovery, Analysis and Representation of Data Dependencies in Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 195-209
• [PIA91] G. Piatetsky-Shapiro, “Discovery, Analysis and Presentation of Strong Rules”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 229-248
• [MAN91] M. Manago and Y. Kodratoff, “Induction of Decision Trees from Complex Structured Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 289-306

References (Cont’d)
• [SMY92] P. Smyth and R. Goodman, “An Information Theoretic Approach to Rule Induction from Databases”, IEEE TKDE, Vol. 4, No. 4, Aug. 1992, pp. 301-316
• [WAN92] L. Wang and J. Mendel, “Generating Fuzzy Rules by Learning from Examples”, IEEE TSMC, Vol. 22, No. 6, Nov. 1992, pp. 1414-1427
• [AGR92] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer and A. Swami, “An Interval Classifier for Database Mining Applications”, Proc. VLDB, Vancouver, Aug. 1992, pp. 207-216
• [LU95] H. Lu, R. Setiono and H. Liu, “NeuroRule : A Connectionist Approach to Data Mining”, Proc. VLDB, Zurich, Sep. 1995, pp. 478-489
• [HON91] J. Hong and C. Mao, “Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 177-194
• [NG94] R. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining”, Proc. VLDB, 1994, pp. 144-155
• [XU98] X. Xu, M. Ester, H. -P. Kriegel and J. Sander, “A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 324-333

References (Cont’d)
[System Implementations]
• [SEL96] P. Selfridge, D. Srivastava and L. Wilson, “IDEA : Interactive Data Exploration and Analysis”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 24-34
• [MEO98] R. Meo, G. Psaila and S. Ceri, “A Tightly-Coupled Architecture for Data Mining”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 316-323
• [HAN96] J. Han et al., “DBMiner : A System for Mining Knowledge in Large Relational Databases”, Proc. KDD, 1996
• [HAN97] J. Han et al., “GEOMiner : A System Prototype for Spatial Data Mining”, Proc. SIGMOD, 1997
• [HAN98] “WebMiner : A Resource and Knowledge Discovery System for the Internet”, http://db.cs.sfu.ca/WebMiner/
• [KOH96] R. Kohavi et al., “Data Mining Using MLC++ : A Machine Learning Library in C++”, Proc. Tools with AI, 1996, pp. 234-245
• [HAL98] C. Hall ed., “MineSet 2.0 for Data Mining and Multidimensional Data Analysis”, http://www.cgi.com/Products/software/MineSet/DMStrategies/index.html
