Tutorial on Data Mining
Workshop of the Indian Database Research Community
Sunita Sarawagi, School of IT, IIT Bombay

Data mining: the process of semi-automatically analyzing large databases to find interesting and useful patterns.
Salary > 5 L
Prof. = Exec
New applicant’s data
Goal: Predict class Ci = f(x1, x2, ..., xn)
Gen_Tree (node, data)
    Make node a leaf? If yes, stop.
    Find best attribute and best split on attribute
    Partition data on split condition
    For each child j of node: Gen_Tree (node_j, data_j)
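The recursive procedure above can be sketched in Python. This is a minimal illustration, not the tutorial's own implementation: it assumes numeric attributes, binary splits of the form "value <= threshold", and the Gini index as the split criterion. The function names, the `max_depth` stopping rule, and the toy applicant data (salary in lakhs, an executive flag) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (attribute index, threshold) with the lowest weighted Gini,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for a in range(len(rows[0])):
        for t in sorted({r[a] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (a, t), score
    return best

def gen_tree(rows, labels, depth=0, max_depth=5):
    """Grow a tree: leaf test, best split, partition, recurse on children."""
    split = None
    if len(set(labels)) > 1 and depth < max_depth:
        split = best_split(rows, labels)
    if split is None:
        # Make the node a leaf labeled with the majority class.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    a, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return {
        "attr": a,
        "thr": t,
        "left": gen_tree([r for r, _ in left], [y for _, y in left],
                         depth + 1, max_depth),
        "right": gen_tree([r for r, _ in right], [y for _, y in right],
                          depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow split conditions from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

# Hypothetical training data: (salary in lakhs, is_exec) -> decision.
rows = [(3, 0), (4, 1), (6, 0), (8, 1), (2, 0), (7, 0), (2, 1)]
labels = ["reject", "accept", "accept", "accept", "reject", "accept", "accept"]
tree = gen_tree(rows, labels)
```

Calling `predict(tree, (9, 0))` then classifies a new applicant by walking the learned split conditions. The exhaustive threshold search is quadratic in the number of rows and is only meant to make the four steps of the procedure visible.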
r =1, k=2
rid A1 A2 A3 C
rid to L/R hash in memory.
A2 C rid
A3 C rid
A1 C rid
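The fragments above (a table with columns rid, A1, A2, A3, C, per-attribute lists of value, class and rid, and an in-memory hash from rid to L/R) suggest splitting with presorted attribute lists. The sketch below is one reading of that idea, under the assumption that the split attribute's list is scanned once to fill the hash and every other list is then partitioned by looking up each rid. The record values and function names are made up for illustration.

```python
# Hypothetical records: rid -> (A1, A2, A3, C).
records = {
    0: (5, 20, 1, "yes"),
    1: (2, 35, 0, "no"),
    2: (9, 10, 1, "yes"),
    3: (4, 40, 0, "no"),
}

def build_attribute_lists(records, n_attrs):
    """One list per attribute holding (value, class, rid), sorted by value."""
    return [
        sorted((rec[a], rec[-1], rid) for rid, rec in records.items())
        for a in range(n_attrs)
    ]

def split_lists(attr_lists, split_attr, threshold):
    """Partition every attribute list using a rid -> 'L'/'R' hash."""
    # Scan only the splitting attribute's list to fill the hash.
    side = {
        rid: ("L" if value <= threshold else "R")
        for value, _, rid in attr_lists[split_attr]
    }
    # Each list is filtered in order, so both halves stay sorted.
    left = [[e for e in lst if side[e[2]] == "L"] for lst in attr_lists]
    right = [[e for e in lst if side[e[2]] == "R"] for lst in attr_lists]
    return side, left, right

lists = build_attribute_lists(records, 3)
side, left, right = split_lists(lists, split_attr=0, threshold=4)
```

Because each list is filtered in its existing order, the child lists remain sorted and no re-sorting is needed at deeper levels of the tree. Only the hash has to fit in memory, not the attribute lists themselves.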
More information: http://www.stat.wisc.edu/~limt/treeprogs.html
Basic NN unit
A more typical NN
Conclusion: Use neural nets only if decision trees or nearest-neighbor methods fail.
[Figure: probability tables; column headers garbled in extraction, row values 0.1, 0.2, 0.3, 0.4 and 0.3, 0.2, 0.1, 0.5]
Variable e is independent of d given b.
EM algorithm: K Gaussian mixtures
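As a rough illustration of EM for a mixture of K Gaussians, the sketch below fits a one-dimensional mixture in plain Python. It assumes the caller supplies the initial means (K is their count), unit initial variances and equal initial weights. The function names, the `min_var` floor, and the sample data are hypothetical and not taken from the tutorial.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, init_means, n_iter=50, min_var=1e-6):
    """Fit a K-component 1-D Gaussian mixture with EM; K = len(init_means)."""
    k, n = len(init_means), len(xs)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapse to zero
    return weights, means, variances

# Hypothetical data drawn around two centers, with K = 2.
xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = em_gmm_1d(xs, init_means=[0.0, 6.0])
```

On this toy data the two estimated means settle near the two clusters of points. EM only finds a local optimum, so results on real data depend on the initial means.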
Tea, rice, bread
Find correlated events:
Identify complex operations with specific OLAP needs in mind (what does an analyst need?) rather than looking at mining operations and choosing what fits