Lab 4


MBAC 611

Portions of this lab are based on data & notes from Hadley Wickham (http://stat405.had.co.nz/)


Lab Preparation

Create a lab4 folder in your private network folder.

Download the data.zip file from Moodle and save it to your lab4 folder.

From within Windows navigate to your lab4 folder.

Right-click the data.zip file and select “Extract All”. This should create a data folder that contains the datasets for this lab.


Data Cleaning

Frequently the data sets you receive will not be in an ideal format for analysis.

You may need to reformat the data and fix errors.

In this lab we will look at some methods to find and fix problems in your dataset.


Setting Your Working Directory

Start Mathematica.

Use the SetDirectory[] function to set the working directory to the data folder in your lab4 folder.
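A minimal sketch of this step follows. The path is hypothetical: it assumes the lab4 folder sits directly under your home directory, so substitute the actual path to your private network folder.

```mathematica
(* Hypothetical location; replace with the real path to your lab4 data folder *)
dataDir = FileNameJoin[{$HomeDirectory, "lab4", "data"}];

(* Change directory only if the folder actually exists *)
If[DirectoryQ[dataDir],
  SetDirectory[dataDir],
  Print["Folder not found: ", dataDir]]

(* Directory[] reports the current working directory *)
Directory[]
```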

To make sure you are in the right folder, use the FileNames[] function, which lists the files in the current working directory. You should see the following:


Examining the Files

Execute the following function:

FilePrint["test1.csv"]

This function displays the content of the specified file (test1.csv). You should see the following:


This file is a standard CSV file. Note that the first line contains the header.

According to the header there are five fields.

This means we expect each record (row) to contain five elements - each separated by a comma.

However, we should verify that the data indeed conforms to this expected structure.


Import Options

Mathematica allows you to customize how files are imported.

One of these options allows you to automatically remove the header when importing data.

Execute the following function:

Import["test1.csv", "HeaderLines" ->1]


You should see the following output:

The "HeaderLines" ->1 option tells Mathematica to ignore the first line as it contains header information – not data.

The number following the -> indicates the number of rows that should be skipped when reading in the file.
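To see the effect of this option without depending on the lab files, the sketch below uses ImportString on a small made-up CSV string (the data and the variable names csvText, withHeader, and dataOnly are illustrative assumptions, not part of the lab). Comparing the two lengths should show that the header row is dropped when "HeaderLines" is set.

```mathematica
(* Made-up sample data standing in for a small CSV file *)
csvText = "id,name,score\n1,Ann,90\n2,Bob,85";

(* Import every line, header row included *)
withHeader = ImportString[csvText, "CSV"];

(* Skip the first line so that only the data rows are imported *)
dataOnly = ImportString[csvText, "CSV", "HeaderLines" -> 1];

(* Compare the number of rows in each result *)
{Length[withHeader], Length[dataOnly]}
```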

Assignment #1

Assign the result of the previous Import function to a variable named test1.


Each record is stored in its own list.

All imported records are members of one big list – a list we just assigned to a variable named test1. Therefore the number of elements in test1 indicates the number of records.

As you will recall, we can determine the number of elements in a list using the Length function.

Execute the following function: Length[test1].

The result should be 10.


Checking Records

We should check that every record has the correct number of attributes (field values).

In this case the correct number is five.

Let's check the first record. Enter the following expression: Length[test1[[1]]]

The result should be 5.
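The same two checks can be tried on a tiny made-up list (the name records and its contents are illustrative assumptions), which makes the difference between the outer list and an individual record easy to see:

```mathematica
(* Made-up records: three records with two fields each *)
records = {{"a", 1}, {"b", 2}, {"c", 3}};

Length[records]        (* number of records: 3 *)
Length[records[[1]]]   (* number of fields in the first record: 2 *)
records[[1]]           (* the first record itself: {"a", 1} *)
```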


In the expression test1[[1]], test1 is the list name and [[1]] selects the first element of the list.

We would like to do this check for each record.

One way to do this is using the Do function we used in our previous lab.

As you may recall the Do function has the following syntax:

Do[body, {i, min_value, max_value}]
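As a quick reminder of how this syntax behaves, here is a minimal sketch that is unrelated to the lab data:

```mathematica
(* Evaluate the body once for each i from 1 to 3 *)
Do[Print[i^2], {i, 1, 3}]
(* prints 1, 4 and 9 on separate lines *)
```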


Defining a Function

We will create our own function that will display an error message if a given row doesn’t have a specified number of attributes.

Enter the following expression:

rowCheck[x_]:=If[Length[test1[[x]]]!=5,Print["Record ",x]]


This defines a new function named rowCheck that takes one argument, written as x_ in the definition.

It uses the If function to test the Length of the specified record. If the record length is not equal to (!=) 5, it displays (Print) the number of the record (x).

We can try our new function on the first row of test1.

Enter the following expression:

rowCheck[1]

You won’t see any output as the first record’s length is five.

The same is true for every other record in this dataset, since each one has five fields.
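To see what the function does when a record is too short, the sketch below applies the same idea to made-up data. The names sample and sampleCheck are hypothetical and chosen so that they do not overwrite test1 or rowCheck.

```mathematica
(* Made-up data: the second record has only four fields *)
sample = {{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}};

(* Same idea as rowCheck, but reading from sample instead of test1 *)
sampleCheck[x_] := If[Length[sample[[x]]] != 5, Print["Record ", x]]

sampleCheck[1]  (* no output: record 1 has five fields *)
sampleCheck[2]  (* prints Record 2 *)
```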


Assignment #2

Scroll back to where you defined the function and change the number 5 to 4 and re-evaluate the function definition (press shift-enter after making the change).

You should see the following:

Now execute the following expression:

rowCheck[1]

You should see the following output:

This matches the arguments of our Print function, so our function appears to be working.


Do Loop

We would like to execute the rowCheck function against every record in test1.

We can accomplish this with a Do Loop.

Assignment #3

Undo the change we made to the rowCheck function (change the 4 back to 5). Remember to re-evaluate the function.


  • Enter the following expression:

  • Do[rowCheck[i],{i,1,Length[test1]}]

  • You won’t see any output from this function execution because all rows have five attributes.

  • Assignment #4

  • Import the CSV file named test2.csv.

  • Assign the records list to the variable named test2.

  • Make sure there are 10 records in the file.

  • Modify the rowCheck function to check elements of test2.

  • Using the Do function, check whether all records contain five attributes. Records 3 and 6 should be reported because they do not.

  • Display the third and sixth records (hint: use test2[[3]] to view the third record).


DAT Files

Sometimes you have a file that has an unusual field or record delimiter. In this case we may simply refer to the file as a “dat” file – a generic file with tabular data.

Mathematica allows you to customize how this type of file is read.

More info on this type of file and its options can be found at the following URL: http://reference.wolfram.com/mathematica/ref/format/Table.html
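As a rough illustration of customizing the delimiter, the sketch below imports a made-up string that uses a semicolon between fields. The sample text and the variable name text are assumptions for illustration only; "Table" is the generic tabular format described at the URL above.

```mathematica
(* Made-up sample text that uses ";" as the field delimiter *)
text = "name;qty\nbolt;12\nnut;30";

(* Read it as generic tabular data, splitting each line on ";" *)
ImportString[text, "Table", "FieldSeparators" -> ";"]
```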


We will take a look at a “.dat” file that uses the “|” (vertical bar) as a field separator.

Enter the following expression:

FilePrint["test3.dat"]

You should see the following output:

The first record contains the header.

Note the | separator between fields.


Importing DAT Files

Enter the following expression:

Import["test3.dat","FieldSeparators"->"|"]

The following output should appear:


  • Assignment #5

  • Modify the previous Import statement such that the header line is ignored (not imported into the data list).

  • Assign the resulting list to the variable test3.

  • Assignment #6

  • Import the file test4.csv such that the header line is ignored (not imported into the data list).

  • Assign the resulting list to the variable test4.

  • Rewrite the rowCheck function such that the record number will be displayed if the 3rd field of the record specified by x_ is greater than 20.

  • Using the Do function and the above rowCheck function, check that the third field of every record is not greater than 20. The record number of any record violating this rule should be displayed.

  • The output should be


Submit Lab 4

Save your lab as a notebook file. Remember to save the file to your lab4 folder in your private network folder.

Submit the notebook to the lab4 submission link in Moodle.

