Data Mining
By: Thai Hoa Nguyen Pham
Outline: Define Data Mining; Classification; Association; Clustering.
Define Data Mining
Data mining, also known as KDD (Knowledge Discovery in Databases), is the semiautomatic process of analyzing data to find useful patterns. Why semiautomatic?
Because preprocessing of the data before mining and postprocessing of the results afterward are done manually.
∀ person P, P.degree = masters and P.income > 75,000 => P.credit = excellent
∀ person P, P.degree = bachelors and P.income < 50,000 => P.credit = bad
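The two rules above can be read as a tiny classifier. The following is a minimal Python sketch that assumes a hypothetical Person record with degree and income fields (the names are illustrative, not from any library). It applies the rules in order and returns no prediction when neither rule fires:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record type; field names mirror the rule notation above.
    degree: str
    income: float


def classify_credit(p: Person) -> Optional[str]:
    """Apply the two example classification rules; return None if neither fires."""
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    if p.degree == "bachelors" and p.income < 50_000:
        return "bad"
    return None  # the example rules say nothing about any other case


print(classify_credit(Person("masters", 90_000)))    # excellent
print(classify_credit(Person("bachelors", 40_000)))  # bad
print(classify_credit(Person("phd", 60_000)))        # None
```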
Beer => Diapers
milk => screwdrivers
Support: an association that occurs in a higher percentage of all purchases deserves more attention than one that occurs in a lower percentage.
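The following minimal sketch shows one way to compute support. It assumes each purchase is represented as a Python set of item names. The support function and the baskets below are hypothetical, and the data is made up for illustration:

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing every item of antecedent and consequent."""
    items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


# Made-up baskets for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"milk", "screwdrivers"},
    {"bread", "eggs"},
]
print(support(baskets, {"beer"}, {"diapers"}))       # 0.4
print(support(baskets, {"milk"}, {"screwdrivers"}))  # 0.2
```

With these toy baskets, beer => diapers has support 0.4 and milk => screwdrivers has support 0.2, so the first rule would deserve more attention.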
bread => milk
Confidence: for example, if the rule bread => milk had a confidence of 50 percent, it would mean that 50 percent of the purchases that include bread also include milk. The other 50 percent of bread purchases do not include milk, which leaves room for other items bought with the bread.
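Confidence can be sketched the same way. This example again assumes purchases are sets of item names, and the confidence function and baskets are hypothetical. Only purchases that contain the antecedent count in the denominator:

```python
def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    with_a = [t for t in transactions if a <= set(t)]
    if not with_a:
        return 0.0  # arbitrary convention when the antecedent never occurs
    return sum(1 for t in with_a if both <= set(t)) / len(with_a)


# Made-up baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.5
```

Four of the toy baskets contain bread, and two of those also contain milk. The sketch therefore returns 0.5, which is the 50 percent case described above. Returning 0.0 when no purchase contains the antecedent is a choice made here, not a standard.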
An example of agglomerative clustering: the separate elements of a set (a through f) are merged into larger clusters at each internal node, until the final merge produces the single cluster “abcdef”.
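The merging process can be sketched in Python. This sketch assumes single linkage, where the distance between two clusters is the distance between their closest pair of members. It also uses hypothetical 1-D positions for elements a through f. The original figure specifies neither a distance measure nor positions, so both are illustrative choices:

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on labelled 1-D points.

    Starts with every element in its own cluster and repeatedly merges the
    two closest clusters, recording each merge, until one cluster remains.
    """
    clusters = [[label] for label in points]
    merges = []

    def dist(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(abs(points[x] - points[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = sorted(clusters[i] + clusters[j])
        merges.append("".join(merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical positions for the six elements.
points = {"a": 1.0, "b": 1.5, "c": 5.0, "d": 5.4, "e": 9.0, "f": 9.2}
for m in agglomerative(points):
    print(m)
```

With these positions the merges come out as ef, cd, ab, abcd and finally abcdef.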
This example shows what happens when a user searches for “Japan”. The points closer to the center of the circle contain more information about Japan. We can think of the points as websites or research articles.
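The following rough sketch illustrates the idea that closeness to the center means relevance. It assumes each result has hypothetical 2-D coordinates and that the query “Japan” sits at the center. Results are ranked by Euclidean distance using math.dist, which requires Python 3.8 or later:

```python
import math

# Hypothetical 2-D positions of search results; the query sits at the center (0, 0).
results = {
    "article_1": (0.5, 0.2),
    "site_2": (3.0, 4.0),
    "article_3": (1.0, 1.0),
}
center = (0.0, 0.0)


def rank_by_closeness(results, center):
    """Return result names ordered from closest to farthest from the center."""
    return sorted(results, key=lambda name: math.dist(results[name], center))


print(rank_by_closeness(results, center))  # ['article_1', 'article_3', 'site_2']
```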
This example can be interpreted in several ways. For instance, the map could depict poverty levels, or which states grow more apples.