28.11.2007

Data mining Demo 1. 28.11.2007. Introduction. This slide set contains: very easy, guided introductory material for MATLAB, and homework (Task 1 and Task 2 in the last two slides).

gili

Presentation Transcript


  1. Data mining Demo 1 28.11.2007

  2. Introduction • This slide set contains: • Very easy, guided introductory material for MATLAB • Homework (Task 1 and Task 2 in the last two slides) • If you are already an expert in MATLAB, you can skip the introduction and start working on the homework • By returning the homework (a short report on the completed tasks) you can earn credit toward the final exam (3 x 2 points in total; the exam maximum is 4 x 6 = 24 points) • The reports for the tasks of Demo 1 must be returned no later than 10.12.2007 • The reports can be returned by e-mail (sami.ayramo@jyu.fi), to the office room (Ag C416.2), or to the mailbox (located two meters from the office door) • Some additional code that may be useful for the homework can be found at http://users.jyu.fi/~samiayr/DM/demot

  3. Requirements • Basic computer skills (e.g., starting applications; opening, closing and saving files; cutting and pasting text; directory structures, …) • Know how to use a text editor, such as Windows Notepad, to write MATLAB programs (MATLAB also has its own built-in text editor, which you can use) • Basic algebra and trigonometry • Knowledge of basic linear algebra (e.g., concepts such as matrix, vector and inverse) is also very helpful • While following this introduction, keep MATLAB running in a separate window and try out and experiment with the examples • This introduction is extracted, modified and condensed from the one available at the Mathworks Student Center: • http://www.mathworks.com/academia/student_center/tutorials/

  4. Facts about MATLAB • MATLAB is a computer program • for solving the sorts of mathematical problems frequently encountered in, for example, data mining, data analysis, statistics, simulation, engineering and mathematical modelling • MATLAB's built-in features make it easy to solve a wide variety of numerical problems • from the very basic, such as a system of 2 equations with 2 unknowns: • X + 2Y = 24 • 12X - 5Y = 10 • to the more complex, such as factoring polynomials, fitting curves to data points, making calculations with matrices, performing signal processing operations such as Fourier transforms, and building and training neural networks. • MATLAB can be used to plot many different kinds of graphs, enabling the visualization of complex mathematical functions and laboratory data • (Figure: three example images created with MATLAB plotting functions; images taken from www.mathworks.com)
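As a sketch of the "very basic" example above, the 2-by-2 system can be written as a coefficient matrix and solved with MATLAB's backslash operator. The variable names A, rhs and sol are chosen here for illustration only.

```matlab
% Coefficient matrix: one row per equation, one column per unknown (X, Y)
A = [1 2; 12 -5];
% Right-hand sides of the two equations
rhs = [24; 10];
% The backslash operator solves A * sol = rhs
sol = A \ rhs;
X = sol(1)   % 140/29, about 4.8276
Y = sol(2)   % 278/29, about 9.5862
```

Substituting back gives 140/29 + 2 * 278/29 = 24 and 12 * 140/29 - 5 * 278/29 = 10, so both equations hold.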

  5. Starting MATLAB • You can start MATLAB by double-clicking on the MATLAB icon • The MATLAB Desktop will then pop up

  6. Entering commands in MATLAB • “>>” is the command prompt • Type a command at the command prompt, and MATLAB executes it and prints out the result • Ex1: Enter the simple MATLAB command date to see how it works • Ex2: Try also the clc command (clear command window) • Ex3: To exit MATLAB, simply enter quit at the MATLAB command prompt • To get a good feel for the kinds of things you can use MATLAB for, try the many demos that are provided, all accessible from the demo window that opens when you type demo at the command prompt

  7. Getting help • MATLAB has an extensive built-in help system, containing detailed documentation and help information on all MATLAB commands and functions • To obtain help on a given function there are three main functions: help, helpwin (short for help window) and doc (short for documentation) • help and helpwin give you the same information, but helpwin shows it in a separate window; the doc command opens an HTML page with much more information • Ex4: Find help on the date function using the different functions • Another source of help is the MATLAB help browser • You can open the MATLAB help browser by • typing helpbrowser at the MATLAB command prompt • clicking on the help button “?” • selecting Start->MATLAB->Help from the MATLAB desktop • Large numbers of tutorials and documents can also be found at www.mathworks.com

  8. Working with variables • Variables are a fundamental concept in MATLAB and are used all the time • In its simplest mode of use, MATLAB can be used just like a pocket calculator • MATLAB supports all the basic arithmetic operations: +, -, *, /, ^, etc.; you can group and order operations by enclosing them in parentheses • Ex5: Try the following calculator-like operations with MATLAB by typing • 4 + 10 • 5 * 10 + 6 • (6 + 6) / 3 • 9^2 • What is ans? In short, ans is short for "answer" and is used in MATLAB as the default variable name when none is specified • Ex6: Check the value of ans by typing ans • Ex7: Try to change the value of ans by typing ans + 6 • You can also define and use your own variables • Ex8: Create three variables, check the value of the first one, and calculate the average. For instance, enter the following commands • a = 10 • b = 20 • c = 30 • a • the_average = (a + b + c) / 3
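A minimal script version of Ex5 to Ex8, showing how ans is overwritten by each unassigned expression (the names first_result and second_result are illustrative):

```matlab
% An expression without an assignment stores its result in ans
4 + 10;
first_result = ans        % 14
(6 + 6) / 3;              % ans is now 4
ans + 6;                  % uses the previous ans and overwrites it with 10
second_result = ans       % 10
% Your own variables
a = 10;
b = 20;
c = 30;
the_average = (a + b + c) / 3   % 20
```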

  9. Working with variables • If you have defined a lot of different variables, you probably can't remember all their names, so it is useful to get a list of all the variables currently defined. Typing whos at the command prompt lists the names (together with the sizes and classes) of all variables that are currently defined. • Ex9: Try the following sequence of commands: • clear • a = 5 • b = 6 • whos • Typing clear at the command prompt removes all variables and values stored up to that point. • Ex10: For example, continue from the above example: • whos • clear • whos
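The same idea can be checked from a script. This sketch uses who with an output argument (which returns the variable names as a cell array) and exist(name, 'var'); it also shows that clear followed by a variable name removes only that variable.

```matlab
a = 5;
b = 6;
% With an output argument, who returns the variable names as a cell array
names = who;
disp(names)
% exist(name, 'var') returns 1 if the variable is defined, 0 otherwise
a_before = exist('a', 'var')   % 1
clear a                        % removes only a; b is kept
a_after = exist('a', 'var')    % 0
```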

  10. Working with variables • If a command is followed by a semicolon (;), MATLAB evaluates the expression and stores the result internally, but does not print the result • In a MATLAB session you are usually interested only in some final result, which is calculated by combining many temporary, intermediate variables; appending a semicolon to the expressions that assign values to these intermediate variables keeps their results from being printed • Ex11: Compare the following expressions: • a = 4 + 5 • b = 5 + 6;
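A small sketch of the pattern described above: intermediate results end with a semicolon, and only the final line (here the illustrative variable total) is printed.

```matlab
% Intermediate results: the trailing semicolons suppress printing
a = 4 + 5;
b = 5 + 6;
% Final result: no semicolon, so MATLAB prints the value of total
total = a + b
```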

  11. Working with variables • In MATLAB, there are some specific rules for what you can name your variables • Use only letters (A-Z and a-z), digits, and the underscore character ("_") in your variable names, and start each name with a letter • You cannot have any spaces in your variable names • For example, "this is a variable" is not allowed as a variable name, but "this_is_a_variable" is fine • MATLAB is case sensitive • For example, "A_VaRIAbLe", "a_variable", "A_VARIABLE", and "A_variablE" would all be considered distinct variables in MATLAB • Using single quotes, you can also assign pieces of text to variables • Ex12: For example, try: • some_text = 'This is some text assigned to a variable!'; • some_text • Be careful not to mix up variables that have text values with variables that have numeric values in equations: text is stored as character codes, so arithmetic on it does not raise an error but silently produces numbers
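The naming rules can be tested with the built-in isvarname function, and the last point can be seen by adding a number to a piece of text. The variable names below are illustrative.

```matlab
% isvarname returns true (1) only for valid variable names
valid1 = isvarname('this_is_a_variable')   % 1
valid2 = isvarname('this is a variable')   % 0: spaces are not allowed
valid3 = isvarname('2nd_value')            % 0: a name must start with a letter
% MATLAB is case sensitive: these are two different variables
a_variable = 1;
A_VARIABLE = 2;
% Text in single quotes is a character array; arithmetic uses the character codes
some_text = 'abc';
codes = some_text + 1     % [98 99 100], no error is raised
```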

  12. MATLAB – the matrix laboratory • Three fundamental concepts in MATLAB, and in linear algebra, are: • A scalar is simply a fancy word for a number (a single value) • A vector is an ordered list of numbers (one-dimensional) • In MATLAB a vector can be represented as a row vector or a column vector • A matrix is a rectangular array of numbers (multi-dimensional) • In MATLAB, a two-dimensional matrix is defined by its number of rows and columns • Both scalars and vectors can be considered special types of matrices • A scalar is a matrix with a row and column dimension of one (1-by-1 matrix) • A vector is a one-dimensional matrix: one row and n columns, or n rows and one column • All calculations in MATLAB are done with matrices, hence the name MATrix LABoratory.
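The size function makes the "everything is a matrix" idea visible: it returns the number of rows and columns of any value, including a scalar. A sketch with illustrative variable names:

```matlab
my_scalar = 3.1415;                 % a single number
my_row = [1, 5, 7];                 % row vector
my_column = [1; 5; 7];              % column vector
my_matrix = [8, 12, 19; 7, 3, 2];   % two rows, three columns
size(my_scalar)    % 1 1: a scalar is a 1-by-1 matrix
size(my_row)       % 1 3
size(my_column)    % 3 1
size(my_matrix)    % 2 3
```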

  13. MATLAB – the matrix laboratory • In MATLAB, matrices are defined inside a pair of square brackets ([]) • A comma (,) separates the elements within a row, and a semicolon (;) separates the rows • Note: you can also use a space instead of a comma to separate elements, and a carriage return (the Enter key) instead of a semicolon to start a new row • Ex13: Try the following examples to see how a scalar and row and column vectors can be created • my_scalar = 3.1415 • my_vector1 = [1, 5, 7] • my_vector2 = [1; 5; 7]
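A quick check of the separator rules: the sketch builds the same row and the same matrix in two ways and compares them with isequal (variable names are illustrative).

```matlab
% Commas and spaces both separate the elements within a row
v_commas = [1, 5, 7];
v_spaces = [1 5 7];
same_row = isequal(v_commas, v_spaces)            % 1
% Semicolons and line breaks both start a new row
M_semicolons = [1 2; 3 4];
M_newlines = [1 2
              3 4];
same_matrix = isequal(M_semicolons, M_newlines)   % 1
```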

  14. MATLAB – the matrix laboratory • What about a two-dimensional matrix? • Ex14: Create a 4-by-3 matrix called my_matrix with the numbers 8, 12, and 19 in the first row, 7, 3, 2 in the second row, 12, 4, 23 in the third row, and 8, 1, 1 in the fourth row by typing the following command: • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1] • You can also combine different vectors and matrices to define a new matrix • Remember that the result must be a valid rectangular matrix: every row must have the same number of columns • Ex15: Construct a matrix from row vectors by typing the following lines • row_vector1 = [1 2 3] • row_vector2 = [3 2 1] • matrix_from_row_vec = [row_vector1 ; row_vector2] • Ex16: Construct a matrix from column vectors by typing the following lines • column_vector1 = [1;3] • column_vector2 = [2;8] • matrix_from_col_vec = [column_vector1 column_vector2] • Ex17: Construct a 4-by-6 matrix by placing two copies of a 4-by-3 matrix side by side, typing the following lines • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1] • combined_matrix = [my_matrix, my_matrix]
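The sketch below checks the sizes produced by Ex15 and Ex17, adds a vertical combination (stacked_matrix, an illustrative name), and shows what happens when the pieces do not form a rectangle: MATLAB raises an error, which is caught here so the script keeps running.

```matlab
row_vector1 = [1 2 3];
row_vector2 = [3 2 1];
matrix_from_row_vec = [row_vector1; row_vector2];   % 2-by-3
my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1];
combined_matrix = [my_matrix, my_matrix];           % side by side: 4-by-6
stacked_matrix = [my_matrix; my_matrix];            % on top of each other: 8-by-3
% A 1-by-3 row on top of a 1-by-2 row is not rectangular, so this fails
try
    bad = [row_vector1; [1 2]];
catch err
    disp(err.message)
end
```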

  15. Indexing vectors and matrices • Once a vector or a matrix is created, you may need to extract only a subset of the data; this is done through indexing. • In a row vector the leftmost element has index one. • In a column vector the topmost element has index one. • Ex17: Create the vectors “my_vector1” and “my_vector2” and try indexing into their values • my_vector1 = [1 5 7] • my_vector2 = [1; 5; 7] • my_vector1(1) • my_vector1(2) • my_vector1(3) • my_vector2(1) • my_vector2(2) • my_vector2(3) • The process is much the same for a two-dimensional matrix. The only difference is that you have to specify both the row and column indices. • Ex18: Access the value 4 in my_matrix • my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1] • my_matrix(3,2) • Note: The row number comes first, followed by the column number.
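A sketch of Ex17 and Ex18 with the results stored in illustrative variables, including a second matrix element to show that the order (row, column) matters:

```matlab
my_vector1 = [1 5 7];      % row vector
my_vector2 = [1; 5; 7];    % column vector
first_of_row = my_vector1(1)     % 1
second_of_col = my_vector2(2)    % 5
my_matrix = [8, 12, 19; 7, 3, 2; 12, 4, 23; 8, 1, 1];
value = my_matrix(3, 2)          % row 3, column 2: 4
swapped = my_matrix(2, 3)        % row 2, column 3: 2
```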

  16. Indexing vectors and matrices • You can also extract a submatrix by listing the rows and columns you want, either as contiguous ranges (e.g., 2:4) or as lists of indices (e.g., [1:3 5:6]). • Ex19: Try the following examples: • mat = [1 3 2 3 5 6 5; 7 4 8 1 2 3 4; 3 2 8 4 7 3 2; 3 2 3 4 1 4 2] • mat(2:4,4:7) • mat(1:2,[1:3 5:6]) • You can change an element of a matrix by assigning a new value to it • Ex20: Try to change the value of an element with the following commands: • mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2] • mat(2,2) = 999
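The results of Ex19 and Ex20, worked out and stored in illustrative variables block1 and block2 so they can be compared with what MATLAB prints:

```matlab
mat = [1 3 2 3 5 6 5; 7 4 8 1 2 3 4; 3 2 8 4 7 3 2; 3 2 3 4 1 4 2];
% Rows 2 to 4, columns 4 to 7
block1 = mat(2:4, 4:7)           % [1 2 3 4; 4 7 3 2; 4 1 4 2]
% Rows 1 and 2; columns 1 to 3 and 5 to 6
block2 = mat(1:2, [1:3 5:6])     % [1 3 2 5 6; 7 4 8 2 3]
% Assigning to an indexed element changes only that element
mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2];
mat(2, 2) = 999;
mat
```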

  17. Element-by-element operations • “Element-by-element” operations are performed on two vectors or matrices of the same size and produce a result of the same size • For example, element-by-element multiplication of the two vectors [1 2 3] and [4 5 6] gives [4 10 18]. • The element-by-element operators in MATLAB are as follows: • element-by-element multiplication: ".*" • element-by-element division: "./" • element-by-element addition: "+" • element-by-element subtraction: "-" • element-by-element exponentiation: ".^" • Ex21: Try the following operations (which of these work?) • a=[1 2 3] • b=[4 5 6] • c=[6; 7; 8] • d=[6; 7; 8] • a.*b • a.*c • c.*d • c.^d • Note: since MATLAB R2016b, a.*c no longer fails; the 1-by-3 and 3-by-1 operands are expanded to a 3-by-3 result (implicit expansion)
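The operations of Ex21 with their results. The mismatched case a .* c is wrapped in try/catch so the sketch runs on both older versions (which report a dimension error) and newer ones (which expand the operands). The result names are illustrative; cd_prod avoids shadowing the built-in cd command.

```matlab
a = [1 2 3];
b = [4 5 6];
c = [6; 7; 8];
d = [6; 7; 8];
ab = a .* b        % [4 10 18]
cd_prod = c .* d   % [36; 49; 64]
cd_pow = c .^ d    % [46656; 823543; 16777216]
% a is 1-by-3 and c is 3-by-1: the sizes differ
try
    ac = a .* c;
    disp(size(ac))     % newer versions: 3-by-3 by implicit expansion
catch err
    disp(err.message)  % older versions: dimension mismatch error
end
```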

  18. Element-by-element operations • Element-by-element operators can also be used with a scalar and a vector together: • Ex22: Try the following operation: • a = [1 2 3 4 5 6] • b = a .* 2 • You can similarly use ".^", "+", and "-" with a vector and a scalar. • Ex23: Try some examples: • c = a .^ 2 • d = a + 2 • e = a - 2 • The element-by-element multiplication, division and exponentiation operators are prefixed with ".", while the element-by-element addition and subtraction operators are not, because MATLAB also has matrix multiplication, division, and exponentiation operators (denoted by "*", "/" and "^") that are not element-by-element; addition and subtraction have no separate matrix form
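Ex22 and Ex23 with their expected results, plus a small square matrix M (an illustrative addition) showing why the "." matters: .^ squares each element, while ^ computes the matrix product M*M.

```matlab
a = [1 2 3 4 5 6];
b = a .* 2    % [2 4 6 8 10 12]
c = a .^ 2    % [1 4 9 16 25 36]
d = a + 2     % [3 4 5 6 7 8]
e = a - 2     % [-1 0 1 2 3 4]
% For a square matrix, element power and matrix power differ
M = [1 2; 3 4];
element_square = M .^ 2   % [1 4; 9 16]
matrix_square = M ^ 2     % M * M = [7 10; 15 22]
```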

  19. Matrix operations • Element-by-element operations compute things element by element, whereas matrix operations perform matrix-based computation. • For example, multiplying a row vector by a column vector of the same length with "*" computes their dot product: it first multiplies the corresponding elements (i.e., elements in the same position) of the two vectors, as element-by-element multiplication does, and then adds up all these products to get a single number. (In general, each element of a matrix product is the dot product of a row of the first matrix and a column of the second.) • Ex24: Try the following matrix multiplication: • a = [1 2 3] • b = [4 ; 5 ; 6] • a * b • To get the answer 32, MATLAB first multiplies the corresponding elements of the two vectors: "1*4 = 4", "2*5 = 10", and "3*6 = 18". Then it adds these products together: "4 + 10 + 18 = 32". • The length of a vector and the size of a matrix can be found with the length and size functions: • Ex25: Try the following examples: • a = [1 2 3] • length(a) • mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2] • size(mat)
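The sketch below confirms that a * b equals the sum of the element-by-element products, and shows that reversing the order (column times row) gives a 3-by-3 matrix instead of a number. The result names are illustrative.

```matlab
a = [1 2 3];           % 1-by-3 row vector
b = [4; 5; 6];         % 3-by-1 column vector
inner = a * b          % 1*4 + 2*5 + 3*6 = 32
same = sum(a .* b')    % element-by-element products, then summed: 32
outer = b * a          % 3-by-1 times 1-by-3 gives a 3-by-3 matrix
n = length(a)          % 3
mat = [1 3 2; 2 3 4; 7 3 2; 1 4 2];
dims = size(mat)       % [4 3]
```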

  20. Plotting • The most basic plotting command in MATLAB is the plot command. When called with two same-sized vectors X and Y, plot makes a two-dimensional line plot of the points in X against the corresponding points in Y. In other words, it draws points at (X(1),Y(1)), (X(2),Y(2)), (X(3),Y(3)), etc., and then connects these points with lines. • Ex26: Try a very simple example to illustrate what the plot command does: • simple_x_points = [1 2 3 4 5] • simple_y_points = [25 0 20 5 15] • plot(simple_x_points, simple_y_points); • The order of the vectors in the plot command is important: the first vector gives the x-coordinates and the second the y-coordinates • Ex27: Try the reversed order for the previous simple example: • plot(simple_y_points, simple_x_points);

  21. Plotting • To add text to a plot, keep the figure window open (i.e., type the commands in the MATLAB command window while the figure window is still open). • The xlabel/ylabel command adds a text label describing the x-axis/y-axis, and the title command adds a title to your plot. Typing "grid on" at the command prompt adds grid lines to the open figure window (typing "grid off" removes them). • Ex28: Try these commands on the previous plot: • simple_x_points = [1 2 3 4 5] • simple_y_points = [25 0 20 5 15] • plot(simple_x_points, simple_y_points); • xlabel('this is text describing the x-axis'); • ylabel('this is text describing the y-axis'); • title('this is text giving a title for the graph'); • grid on;

  22. Plotting a parabola • Ex29: Let's look at a more practical example of plotting. First you need to create a vector of regularly spaced points and a vector of function values at those points for some function. Do this for the function "y = x^2" (i.e., a parabola) for x values between -5 and 5 and with regular spacing of .1: • x_points = [-5 : .1 : 5]; • y_points = x_points .^ 2; • % Then plot the x_points against the y_points, and get the familiar plot of a parabola • plot(x_points,y_points); • xlabel('x-axis'); ylabel('y-axis'); title('A Parabola'); • grid on • Note: The result is very smooth: you can't really see any of the individual line segments like you could for the simple example previously. That is because the points are so close together (at regular spacings of .1) --- MATLAB is still drawing line segments between the points, but your eye just can't see them because they are so small, and so the result seems to be a smooth curve.
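The colon expression start : step : stop used in Ex29 produces the regularly spaced points; linspace(start, stop, n), which asks for a number of points instead of a step, gives the same points up to floating-point rounding. A sketch without the plotting step (the names n_points, x_lin and max_diff are illustrative):

```matlab
x_points = -5 : 0.1 : 5;          % from -5 to 5 in steps of 0.1
n_points = numel(x_points)        % 101 points
first_last = x_points([1 end])    % [-5 5]
% linspace takes the number of points instead of the step size
x_lin = linspace(-5, 5, 101);
max_diff = max(abs(x_points - x_lin))   % only a tiny rounding difference
y_points = x_points .^ 2;         % function values for the parabola
```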

  23. Multiple plots • Using the hold command, you can add multiple plots to the same figure window, for example to compare them. (Normally, when you type a plot command, the contents of the current figure window are erased and replaced by the new plot.) • If you type "hold on" at the command prompt, all line plots created after that are superimposed in the same figure window and axes. Likewise, the command "hold off" stops this behavior and reverts to the default (i.e., a new plot replaces the previous plot). • Ex30: Try the following example of how to plot several different exponential functions in the same axes (you need to define the points on the x-axis only once): • x_points = [-10 : .05 : 10]; • plot(x_points, exp(x_points)); • grid on • hold on • plot(x_points, exp(.95 .* x_points)); • plot(x_points, exp(.85 .* x_points)); • plot(x_points, exp(.75 .* x_points)); • xlabel('x-axis'); ylabel('y-axis'); • title('Comparing Exponential Functions');

  24. Subplots • To have multiple plots in the same window, but each in a separate part of the window (i.e., each with its own axes), use the subplot command. If you type subplot(M,N,P) at the command prompt, MATLAB divides the plot window into a grid of rectangles (M rows and N columns) and places the result of the next "plot" command in the Pth rectangle, where the first rectangle is in the upper left. • Ex31: Try this example, which plots a line, a parabola, an exponential, and the absolute value function in four rectangles of the same figure window (the first variable is called line_values rather than line, because line is also the name of a built-in MATLAB function) • x_points = [-10 : .05 : 10]; • line_values = 5 .* x_points; • parabola = x_points .^ 2; • exponential = exp(x_points); • absolute_value = abs(x_points); • subplot(2,2,1);plot(x_points,line_values); • title('Here is the line'); • subplot(2,2,2);plot(x_points,parabola); • title('Here is the parabola'); • subplot(2,2,3);plot(x_points,exponential); • title('Here is the exponential'); • subplot(2,2,4);plot(x_points,absolute_value); • title('Here is the absolute value');
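subplot numbers its rectangles row by row, starting at the upper left. This sketch computes, without drawing anything, which P belongs to each grid position for the 2-by-2 layout of Ex31 (P_grid is an illustrative name).

```matlab
M = 2;                 % rows of rectangles
N = 2;                 % columns of rectangles
P_grid = zeros(M, N);  % P_grid(row, col) holds the subplot number P
for row = 1:M
    for col = 1:N
        % Rectangles are numbered along each row, starting at the upper left
        P_grid(row, col) = (row - 1) * N + col;
    end
end
P_grid     % [1 2; 3 4]
```

So subplot(2,2,3) in Ex31 is the lower-left rectangle and subplot(2,2,4) the lower-right one.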

  25. Line Plots in Three Dimensions • This introduction covers two kinds of three-dimensional plots you can make in MATLAB: 1) three-dimensional line plots and 2) surface mesh plots. • Three-dimensional line plots are analogous to the two-dimensional line plots created with the plot command. The only differences are that the command has a "3" added to it, plot3, and that it requires an extra input, Z, for the third dimension. • Ex32: A simple example of using the plot3 command (you can also use hold and subplot with plot3 in the same way): • X = [10 20 30 40]; • Y = [10 20 30 40]; • Z = [0 210 70 500]; • plot3(X,Y,Z); grid on; • xlabel('x-axis'); ylabel('y-axis'); zlabel('z-axis'); • title('Pretty simple');

  26. Three-Dimensional Surface Mesh Plots • The mesh and meshgrid commands can be used to create surface mesh plots, which show the surface of three-dimensional functions such as "z = x^2 + y^2" • The procedure is: • Generate a grid of points in the xy-plane using the meshgrid command • Evaluate the three-dimensional function at these points • Create the surface plot with the mesh command • Ex33: Try to generate the grid and the surface mesh plot: • x_points = [-10 : 1 : 10]; • y_points = [-10 : 4 : 10]; • [X, Y] = meshgrid(x_points,y_points); • Z = X.^2 + Y.^2; • mesh(X,Y,Z); • xlabel('x-axis'); • ylabel('y-axis'); • zlabel('z-axis');
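To see what meshgrid actually returns, this sketch uses a tiny grid first and then checks the grid size of Ex33 (XX, YY and grid_size are illustrative names, and no plot is drawn):

```matlab
% Small grid: 3 x-values and 2 y-values
[X, Y] = meshgrid(1:3, [10 20]);
% X repeats the x-values down the rows:      [1 2 3; 1 2 3]
% Y repeats the y-values across the columns: [10 10 10; 20 20 20]
Z = X.^2 + Y.^2;   % the function evaluated at every (x, y) grid point
% Grid from Ex33
x_points = -10 : 1 : 10;   % 21 values
y_points = -10 : 4 : 10;   % -10 -6 -2 2 6 10: 6 values
[XX, YY] = meshgrid(x_points, y_points);
grid_size = size(XX)       % [6 21]: one row per y-value, one column per x-value
```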

  27. MATLAB scripts • A MATLAB script is an ASCII text file that contains a sequence of MATLAB commands • The commands contained in a script file can be run, in order, in the MATLAB command window simply by typing the name of the file (without the ".m" suffix) at the command prompt • Any text editor, such as Microsoft Windows Notepad, or word processor, such as Microsoft Word, can be used to create scripts, but the scripts must always be saved as plain text documents (i.e., in the "Save As" dialogue box, choose "Text Document" or its equivalent for "Save as type:"). • It is easiest to create scripts using MATLAB's built-in text editor, which automatically saves files as ASCII text files for you. • When naming script files, you need to append the suffix ".m" to the filename, for example "my_script.m". • For this reason, scripts in MATLAB are also called "M-files", and the ".m" suffix tells MATLAB that the file is associated with MATLAB.

  28. Creating a MATLAB script • Ex34: Create a simple script that calculates the average of five numbers stored in variables. Start by typing “edit average_script.m” at the command prompt. Then add the following contents of the script file "average_script.m" in MATLAB's built-in text editor: • % a simple MATLAB m-file to calculate the average of 5 numbers. • % first define variables for the 5 numbers: • a = 5; • b = 10; • c = 15; • d = 20; • e = 25; • % now calculate the average of these and print it out: • five_number_average = (a + b + c + d + e) / 5; • five_number_average • NOTE! Save the above script for later use! • The text in green (i.e., the lines starting with %) is comments; every comment line must start with %.
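As an alternative sketch (not a replacement for Ex34), the same average can be computed by collecting the five numbers into one vector and using the built-in mean function; the variable name numbers is illustrative.

```matlab
% The five numbers from average_script.m, stored in one vector
numbers = [5 10 15 20 25];
% mean adds the elements and divides by their count
five_number_average = mean(numbers)   % 15
```

Storing the data in a vector scales better than one variable per number, which matters once you work with real data sets such as the one in the exercises.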

  29. Running MATLAB script • If you saved the above script "average_script.m" into the present working directory, then it can be run simply by typing average_script at the MATLAB command prompt. • Ex35: Try to run it using the following sequence of commands in the command prompt: • clear • whos • pwd • dir • average_script • whos

  30. Saving variables 1 • The save command can be used to save all or only some of your variables into a MATLAB data file called a MAT-file • If you want to choose the name of the file yourself, type save followed by the filename you want to use. • MATLAB then saves all currently defined variables in a file with the name you chose followed by the suffix ".mat" (for example, if you chose the name my_variables, MATLAB would save them as "my_variables.mat" in your present working directory). • Before saving, you should change your present working directory to one of your own directories (such as a directory on your floppy diskette), or specify the complete path to where you want MATLAB to save your variables (for example "a:\my_variables\my_vars"). • Ex36: Try this example of using save: • clear • who • cd c:\my_variables %(replace this with your own folder) • pwd % present working directory • a = 10; • b = 20; • c = 30; • d = sqrt((a + b + c)/pi) • who • save my_chosen_filename %(replace this with your own filename) • dir • clear • who
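save also has a function form, save(filename), which is convenient when the full path is built from pieces. This sketch assumes the system's temporary folder (tempdir) is writable; the file name is the same placeholder as in Ex36 and should be replaced with your own.

```matlab
a = 10;
b = 20;
c = 30;
d = sqrt((a + b + c) / pi);
% Hypothetical location: the system temporary folder plus the placeholder name
filename = fullfile(tempdir, 'my_chosen_filename.mat');
save(filename)                           % saves every variable in the workspace
file_exists = exist(filename, 'file')    % 2 means a file with this name exists
```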

  31. Saving variables 2 • The above use of the save command saved all the variables defined in the MATLAB workspace. If you want to save only some of your variables, list the variables you want to save after save and the filename. • Ex37: Try to save only the variables a and c: • clear • who • a = 10; • b = 20; • c = 30; • who • pwd • save some_of_my_variables a c %(replace this with your own filename) • dir • clear • who
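In function form, the variable names are passed as strings: save('some_of_my_variables', 'a', 'c') is equivalent to the command in Ex37. A sketch that writes to the temporary folder (an assumed writable location) and reloads the file to confirm that b was not saved:

```matlab
a = 10;
b = 20;
c = 30;
% Hypothetical location for the placeholder file name from Ex37
filename = fullfile(tempdir, 'some_of_my_variables.mat');
save(filename, 'a', 'c')     % function form of: save some_of_my_variables a c
clear a b c
load(filename)               % restores a and c; b was never saved
```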

  32. Loading variables 1 • The load command is used to load saved variables back in later so that you can use them again. Typing load followed by a filename (without the ".mat" suffix) searches the MATLAB path (see "Working with Files, Directories and Paths" below) for the file "filename.mat" and loads all the variables saved in that file (for example, typing load my_vars makes MATLAB search for "my_vars.mat" and load the variables saved in it). • Ex38: Try this example of loading variables back into MATLAB • clear • who • cd c:\my_variables %(replace this with your own folder) • dir • load my_chosen_filename %(replace this with your own filename) • who • a • clear • who • load some_of_my_variables %(replace this with your own filename) • who • c
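load can also be called with an output argument. It then returns a struct with one field per saved variable instead of creating the variables directly, which avoids silently overwriting variables that already exist. A sketch using a hypothetical file in the temporary folder:

```matlab
a = 10;
c = 30;
filename = fullfile(tempdir, 'load_demo.mat');   % hypothetical file name
save(filename, 'a', 'c');
clear a c
% Without an output argument, load puts the saved variables back into the workspace
load(filename)
restored_a = a                     % 10
% With an output argument, load returns a struct with one field per saved variable
S = load(filename);
S_fields = sort(fieldnames(S))     % {'a'; 'c'}
```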

  33. Loading variables 2 • You can also choose to load in only some of the variables that are saved in a MATLAB data file (MAT-file). To load only some of the variables saved in a file back into MATLAB, just type the names of the variables you want loaded back in after typing load and the filename (without ".mat") at the command prompt. • Ex39: Assuming that variables "a", "ans", "b", "c", and "d" are all saved in a file, you can use the load command to load only "a" and "c" back in: • who • dir • whos -file my_chosen_filename %(replace this with your own filename) • load my_chosen_filename a c %(replace this with your own filename) • who • a
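In function form, the variable names to load are passed as strings after the file name. A sketch of Ex39 that first creates its own file (hypothetical name, temporary folder) so it runs on its own:

```matlab
a = 10;
b = 20;
c = 30;
filename = fullfile(tempdir, 'partial_load_demo.mat');   % hypothetical file name
save(filename, 'a', 'b', 'c');
clear a b c
load(filename, 'a', 'c')   % function form of: load partial_load_demo a c
```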

  34. Working with Files, Directories and Paths • In general, files are managed, organized, and accessed in MATLAB in the same way as in Microsoft Windows, that is, in a hierarchical file system. • How does MATLAB find files? • MATLAB always looks inside your present working directory first (type pwd at the MATLAB command prompt to see your present working directory) • If the file is not located in the present working directory, MATLAB also searches the other directories stored in the MATLAB path (the present working directory can also be thought of as part of the MATLAB path) • Ex40: To print out the current MATLAB path, type matlabpath or path at the command prompt • If you want to store your MATLAB files in a directory that is not on the MATLAB path, add the complete path of your directory to the MATLAB path. • Ex41: There are two ways to append your own paths to the MATLAB path: • use the addpath command: type addpath followed by the complete path to your directory, for example: • addpath a:\my_stuff\letters • matlabpath • use the path tool of MATLAB: type pathtool at the command prompt, or select File->Set Path…
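A sketch of Ex41 in function form. It assumes a hypothetical folder my_stuff inside the system temporary folder (created if missing), checks that the folder appears in the search path, and removes it again with rmpath. Changes made this way last only for the current MATLAB session.

```matlab
% Hypothetical folder for your own files, created in the temporary folder
my_dir = fullfile(tempdir, 'my_stuff');
if ~exist(my_dir, 'dir')
    mkdir(my_dir);
end
addpath(my_dir);                 % adds my_dir to the front of the MATLAB path
p = path;                        % the whole search path as one string
on_path = ~isempty(strfind(p, 'my_stuff'))   % 1 once the folder has been added
rmpath(my_dir);                  % removes it from the path again
```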

  35. Useful functions • pwd - present working directory • dir, or ls - List directory • what - List MATLAB-specific files in directory • cd - Change current working directory • path, or matlabpath - List the MATLAB search path • addpath - Add directory to search path • pathtool - Invoke the path tool interface • help general - List of general MATLAB commands

  36. Exercises… • Download the well-known Iris data to your working directory from http://www.ics.uci.edu/~mlearn/MLSummary.html • Import the data into MATLAB by choosing from the menu: File->Import data->… • Perform some exploratory data mining on the Iris data set. • Make a global summary of the Iris data (for example, compute the mean, median, variance and range of the variables) • Explore the data by plotting 2-dimensional scatter plots for each pair of variables (e.g., plotmatrix) • Find the two most correlated variables in the data (corrcoef) • Plot a histogram (using, for example, 10 bins) for each variable (hist) • Compute attribute means and medians for each class • Compute the variance of all the variables (var) • Compute the covariance matrix of the whole data (cov) • Construct histograms of 10 bins for each Iris variable • Make 2-dimensional scatter plots for each pair of variables in the Iris data. Use different markers with different colors for different classes. • Use the help commands and the documentation at http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html!
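The summary and correlation steps can be sketched with the functions named in the list. The matrix X below is a small hypothetical stand-in (rows are observations, columns are variables), not the Iris data; replace it with the imported measurements. Each function works column by column, giving one value per variable.

```matlab
% Hypothetical stand-in data (NOT Iris): 5 observations of 3 variables
X = [1 2 10; 2 4 9; 3 6 7; 4 8 8; 5 10 6];
col_mean   = mean(X)             % one value per column
col_median = median(X)
col_var    = var(X)
col_range  = max(X) - min(X)
R = corrcoef(X);                 % correlation matrix of the columns
R_off = abs(R - diag(diag(R)));  % ignore the diagonal, which is always 1
[max_r, idx] = max(R_off(:));    % strongest absolute correlation
[var_i, var_j] = ind2sub(size(R), idx)   % the two most correlated variables
C = cov(X)                       % covariance matrix of the whole data
```

Taking the absolute value treats strong negative correlation as "correlating" too; drop abs if only positive correlation is of interest.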

  37. Task 1 • Load the Iris data set from the UCI repository http://www.ics.uci.edu/~mlearn/MLSummary.html • A short description of the data can be found in the lecture slides, Tan et al. Chapter 3: Exploring data, slide number 4. • Assume that you do not know the class names, the number of classes, etc. of the Iris data set. All you know is that you have some data about flowers and the attribute names. Then, ”without prior assumptions and knowledge”, analyse the data set using the available (or self-implemented) exploratory and summarizing MATLAB tools. Document and explain everything you can learn from the data by exploring it. The documentation should contain figures and interpretations of useful and interesting visual views (different plots, colors, histograms,... see the techniques in the lecture slides Tan et al. Chapter 3: Exploring data). For example: • Can you determine the number of classes (assumed to be unknown) by exploring? How? • Behavior of attributes (correlations, scatter (variance/MAD/covariance), ranges,… ). What kind of preprocessing might be needed? Are there redundant attributes? Outliers? And so on. • …other findings? • Explain what you can learn about the data (which represents three flower types). Describe your findings carefully and compare your results with the known class labels of the flowers. Did you find the classes in the data?
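Task 1 mentions MAD and allows self-implemented tools. One possible sketch computes the median absolute deviation by hand and uses it for a robust outlier check on a hypothetical attribute (not Iris data). The scale factor 1.4826 makes MAD comparable to the standard deviation for normally distributed data, and the cut-off 3 is a common rule of thumb, not something the task prescribes.

```matlab
% Hypothetical attribute values with one suspicious value (NOT Iris data)
x = [2 3 3 4 4 4 5 5 6 30];
med = median(x);
mad_value = median(abs(x - med));      % median absolute deviation (self-implemented)
robust_z = (x - med) / (1.4826 * mad_value);
outlier_idx = find(abs(robust_z) > 3)  % index of the value 30
% Histogram bin counts with 10 bins, as suggested in the exercises
[counts, centers] = hist(x, 10);
```

Unlike the standard deviation, the median and MAD are barely affected by the value 30, which is why they are useful for spotting it.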

  38. Task 2 • Load the synthetic cluster data set from the file clusterdata1.data. The data set contains a set of generated 7-dimensional clusters. Try to find the best possible prototypes for the clusters using the MATLAB implementation dckmeans.m. Before clustering, explore and preprocess the data set (see Task 1) to find all possible information for the clustering step (for example, the data may contain errors, noise or redundancies; moreover, you must determine the number of clusters, and so on). You may also modify the dckmeans code (e.g., replace the sample mean estimate with a more robust one such as the median). Remember that K-means is a local search method: the result depends on the initial prototypes, so try to find good ones. You can also use the PCA code in exploration and/or clustering. • Document and explain all the steps and all the significant facts you can learn from the data by exploring, summarization, visualization, clustering etc. The documentation should contain plots, histograms, etc. with interpretations. If you do any preprocessing, transformation or scaling of the data, report and explain it carefully. The most important thing is to document the final clustering results (prototypes and cluster labels), which are your refinement of the data set. • Remember to report not only the findings but also how you proceeded (your mining process)! Make frequent use of the help commands and the documentation at http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html!
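The idea of replacing the sample mean with the median can be illustrated with a minimal, self-contained clustering loop. This is NOT dckmeans.m (its interface is not shown here); it is a generic sketch on hypothetical 2-D toy data with hand-picked initial prototypes. With use_median = true it runs k-medians (city-block distance, median update); with false it runs plain k-means (squared Euclidean distance, mean update).

```matlab
% Hypothetical 2-D toy data with two well-separated groups (NOT clusterdata1.data)
data = [0 0; 0 1; 1 0; 1 1; 10 10; 10 11; 11 10; 11 11];
k = 2;
use_median = true;             % true: k-medians, false: plain k-means
prototypes = data([1 5], :);   % hand-picked initial prototypes
n = size(data, 1);
labels = zeros(n, 1);
for iter = 1:100
    % Assignment step: each point goes to its nearest prototype
    dist = zeros(n, k);
    for j = 1:k
        delta = data - repmat(prototypes(j, :), n, 1);
        if use_median
            dist(:, j) = sum(abs(delta), 2);   % city-block (L1) distance
        else
            dist(:, j) = sum(delta .^ 2, 2);   % squared Euclidean distance
        end
    end
    [~, new_labels] = min(dist, [], 2);
    if isequal(new_labels, labels)
        break;                 % assignments no longer change: converged
    end
    labels = new_labels;
    % Update step: move each prototype to the median (or mean) of its points
    for j = 1:k
        members = data(labels == j, :);
        if isempty(members)
            continue;          % keep the old prototype for an empty cluster
        end
        if use_median
            prototypes(j, :) = median(members, 1);
        else
            prototypes(j, :) = mean(members, 1);
        end
    end
end
labels
prototypes
```

As the task notes, the result depends on the initial prototypes; in practice you would run such a loop from several initializations and compare the results.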
