
Administrative Lab sections meeting Thurs/Friday "Working with datasets, part 2"


wayde

Presentation Transcript


  1. CS 109 C/C++ Programming for Engineers with MATLAB
     • Administrative
       • Lab sections meeting Thurs/Friday
         • "Working with datasets, part 2"
       • Final project handout now available
         • Part 1: submit dataset
         • Part 2: submit final project
     • Topic for Today?
       • working with datasets…
     CS 109 -- 23 April 2014

  2. Reading files in MATLAB:
     • Does file contain just numbers? load()
     • Does file contain mixed data (strings, numbers)? dataset()
       • e.g. spreadsheet-like data with names and values
     • Reading an Excel spreadsheet? xlsread()
     • Reading some other file format?
       • image files? imread()
       • other format? google to see if MATLAB supports…
       • use low-level "C" functions: fopen(), fscanf(), fclose()

     >> imread('cake.ppm');
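The low-level route mentioned in the last bullet can be sketched as follows. This is a minimal illustration, not part of the original slides: the temporary file and its contents are made up so that the example is self-contained, and it compares `fscanf()` with `load()` on the same numbers-only file.

```matlab
% Write a small numbers-only file so the sketch is self-contained.
fname = [tempname '.txt'];            % hypothetical temporary file name
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d\n', [100 98 100; 60 59 61]');   % two rows of three values
fclose(fid);

% Low-level "C"-style read: open, scan, close.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
values = fscanf(fid, '%f');           % all numbers, as one column vector
fclose(fid);

% High-level read of the same file: load() returns a 2x3 matrix.
matrix = load(fname);

delete(fname);                        % remove the temporary file
```

Note that `fscanf()` with a single `'%f'` format flattens the file into one column, while `load()` preserves the row and column layout.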

  3. dataset() function
     • can read files with different types
     • can read files with differing numbers of values per line

     data.txt:
       Name, Ex1, Ex2, Ex3        (header row)
       Tejas, 100, 98, 100        (data rows)
       Venky, 88, 82              (missing data)
       Hong, 100, 100, 100
       Kaiser, 60, 59, 61
       . .

     >> dataset('File', 'data.txt', 'Delimiter', ',')

     Columns of the result: Name Ex1 Ex2 Ex3

  4. Datasets are not matrices…
     • dataset() function yields a “dataset”, not a matrix
     • Some advantages, e.g. use column names! But beware ( )…
     • Using ( ) with a dataset yields another dataset

     data.txt:
       Name, Ex1, Ex2, Ex3
       Tejas, 100, 98, 100
       Venky, 88, 82
       Hong, 100, 100, 100
       Kaiser, 60, 59, 61
       . .

     >> data = dataset('File', 'data.txt', 'Delimiter', ',');
     >> exam1 = data(:, 2);     % exam1 is column 2, but still a dataset
     >> mean(exam1)
     Error: undefined function 'sum' for input arguments of type 'dataset'
     >> whos('exam1')
     … MATLAB reports exam1 is a dataset …

     >> exam1 = data.Ex1        % column name yields data as a vector
     >> whos('exam1')
     … MATLAB reports exam1 is an 8x1 double (i.e. column vector) …
     >> mean(exam1)
     ans = 88.1250
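The difference between ( ) indexing and column-name access can be checked directly. The following is a small sketch rather than the lecture's own code: it assumes the Statistics Toolbox is installed (it provides `dataset()`), and it writes a hypothetical four-row file, `data_sketch.txt`, with no missing values so that the result is easy to verify by hand.

```matlab
% Create a small comma-delimited file (hypothetical name and contents).
fid = fopen('data_sketch.txt', 'w');
fprintf(fid, 'Name,Ex1,Ex2,Ex3\n');
fprintf(fid, 'Tejas,100,98,100\n');
fprintf(fid, 'Venky,88,82,90\n');
fprintf(fid, 'Hong,100,100,100\n');
fprintf(fid, 'Kaiser,60,59,61\n');
fclose(fid);

data = dataset('File', 'data_sketch.txt', 'Delimiter', ',');

col = data(:, 2);              % ( ) indexing: the result is still a dataset
vec = data.Ex1;                % column name: the result is a plain double vector

colIsDataset = isa(col, 'dataset');
vecIsDouble  = isa(vec, 'double');
avg = mean(vec);               % works because vec is numeric

delete('data_sketch.txt');     % remove the temporary file
```

Only `vec` can be passed to numeric functions such as `mean()`; `col` would first need to be converted, for example with `double(col)`.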

  5. What if you need data from multiple columns? Subset?

     data.txt:
       Name, Ex1, Ex2, Ex3
       Tejas, 100, 98, 100
       Venky, 88, 82
       Hong, 100, 100, 100
       Kaiser, 60, 59, 61
       . .

     >> data = dataset('File', 'data.txt', 'Delimiter', ',');
     >> exams = data(:, 2:4);     % ==> dataset
     >> exams = double(exams);    % ==> matrix
     >> mean(exams)
     ans = 88.1250 89.5000 NaN

     Option 1: replace the missing values with 0
     >> where = isnan(exams);     % logical index of NaN locations
     >> exams(where) = 0;         % set every NaN to 0
     >> mean(exams)
     ans = 88.1250 89.5000 80.1250

     Option 2: ignore the missing values (applied to the matrix before the NaNs were set to 0)
     >> nanmean(exams)
     ans = 88.1250 89.5000 91.5714
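The two treatments of missing values give different averages, which a small matrix makes easy to see. This sketch uses only the four rows visible on the slide (not the full eight-row file) and computes the NaN-skipping mean by hand, so it does not depend on `nanmean()`, which comes from the Statistics Toolbox. The variable names are illustrative.

```matlab
% Exam scores for the four rows shown on the slide; NaN marks the missing score.
exams = [100  98 100;
          88  82 NaN;
         100 100 100;
          60  59  61];

plainMean = mean(exams);              % third column is NaN

% Treatment 1: count a missing score as 0.
zeroed = exams;
zeroed(isnan(zeroed)) = 0;
zeroMean = mean(zeroed);

% Treatment 2: skip missing scores (what nanmean does), computed by hand.
valid = ~isnan(exams);                % logical mask of real scores
skipMean = sum(zeroed, 1) ./ sum(valid, 1);
```

Counting a missing score as 0 pulls the average down, while skipping it averages only the scores that exist. Which one is appropriate depends on what the missing value means.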

  6. Searching datasets works like matrices…
     • Example:
       • Output names of students who have failed an exam…
     • Use { } when accessing a single element of a dataset…

     >> data = dataset('File', 'data.txt', 'Delimiter', ',');
     >> where = data.Ex1 < 60 | data.Ex2 < 60 | data.Ex3 < 60;
     >> students = data(where, 'Name');     % copy from Name column
     >> [rows, cols] = size(students);      % how many rows matched?
     >> for i = 1:rows
          fprintf('This student failed an exam: %s\n', students{i, 1});
        end
     This student failed an exam: Kaiser
     This student failed an exam: Das

     Column of students: Name
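Since the search "works like matrices", the same logical-indexing idea can be tried without `dataset()` at all. The sketch below is an assumption-labeled variant, not the slide's code: it keeps the names in a cell array and the scores in a matrix (only the four rows visible on the slides), and uses `any()` in place of the chained `|` comparisons.

```matlab
% Names and scores for the four rows shown on the slides (NaN = missing score).
names = {'Tejas'; 'Venky'; 'Hong'; 'Kaiser'};
exams = [100  98 100;
          88  82 NaN;
         100 100 100;
          60  59  61];

% True for each row with at least one score below 60.
% Comparisons with NaN are false, so a missing score is not counted as a failure.
where = any(exams < 60, 2);

failed = names(where);                % ( ) keeps a cell array of matching names

for i = 1:numel(failed)
    % { } extracts the string stored in a single cell.
    fprintf('This student failed an exam: %s\n', failed{i});
end
```

The ( ) versus { } distinction mirrors the one for datasets: ( ) returns a smaller container of the same kind, while { } returns the element itself.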

  7. In-class exercise…
     • Output name of state with largest total rainfall in 2013?
     • Hint #1: google about sum() function, it can sum columns or rows…

     states.txt:
       id,name
       1,Alabama
       2,Arizona
       3,Arkansas
       . . .
       50,Wyoming

     rain2013.txt:
       id,jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec
       1,3.18,2.57,2.22,7.95,6.47,3.12,2.19,2.52,1.93,5.69,2.94,1.54
       2,2.16,1.39,2.17,2.63,4.32,1.07,3.78,6.06,1.61,3.21,1.04,2.09
       3,0.71,3.46,2.32,5.73,5.32,7.41,5.44,3.94,3.89,2.24,3.65,2.51
       . . .
       50,3.88,0.60,4.72,3.87,2.26,5.75,2.12,1.36,3.20,2.31,1.28,2.04

     >> rainfall = dataset('File', 'rain2013.txt', 'Delimiter', ',');
     >> states = dataset('File', 'states.txt', 'Delimiter', ',');
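To illustrate the hint about `sum()`, here is a sketch on a plain matrix holding only the first three rows shown above. It is not the full 50-state solution, and the variable names are illustrative. The second argument of `sum()` selects the direction: 1 sums down each column, 2 sums across each row.

```matlab
% First three states and their monthly rainfall, copied from the slide.
names = {'Alabama'; 'Arizona'; 'Arkansas'};
rain = [3.18 2.57 2.22 7.95 6.47 3.12 2.19 2.52 1.93 5.69 2.94 1.54;
        2.16 1.39 2.17 2.63 4.32 1.07 3.78 6.06 1.61 3.21 1.04 2.09;
        0.71 3.46 2.32 5.73 5.32 7.41 5.44 3.94 3.89 2.24 3.65 2.51];

monthlyTotals = sum(rain, 1);         % 1x12: one total per month (column sums)
yearlyTotals  = sum(rain, 2);         % 3x1: one total per state (row sums)

% max() also returns the position of the largest value.
[maxTotal, idx] = max(yearlyTotals);
fprintf('Largest total among these rows: %s (%.2f)\n', names{idx}, maxTotal);
```

With the real files, the month columns would first have to be extracted from the dataset and converted with `double()`, and the row index would be used to look up the name in `states`.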
