Download
mbac 311 lab 1 n.
Skip this Video
Loading SlideShow in 5 Seconds..
MBAC 311 Lab 1 PowerPoint Presentation
Download Presentation
MBAC 311 Lab 1

MBAC 311 Lab 1

126 Views Download Presentation
Download Presentation

MBAC 311 Lab 1

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. MBAC 311Lab 1 NYC MTA Data File

  2. Lab Preparation In this class you will need to save files to your private network folder (directory). Every time you login to a computer on campus you can access this folder. The name of the folder is your username (e.g. smithj01.student). Double click on the icon and then double click on your private network folder. Within this folder create a folder named mbac311. This is where you will store your class files.

  3. Data Files In this lab we will look at data stored in a text file – sometimes called a “flat-file”. Text files are regular files that only contain letters, number, punctuation and white space (space, tab, and newline) Text files are considered “portable” as they can be read by many different software packages and computers.

  4. Start your favorite web browser • In the browser enter the following URL: http://nycopendata.socrata.com/ • The above URL specifies a web site that stores data for the NYC Open Data Project. • Many NYC government agencies place their data on this so site so that it may be analyzed or used in the creation of software • Software is another name for a computer program or application (app)

  5. Scroll down the web page and click on the MTA Data Hyperlink

  6. This page contains some general visitor information about the web page. Click the HTML Button on the right-hand side of the web page.

  7. The types of data the MTA publishes is listed on this page. Click on the Fare Data hyperlink.

  8. Brief description of the data accessible from this page. Click on the Fare Type Description hyperlink.

  9. Field abbreviations used in used in the file This field (column) TCMC is an abbreviation for TRANSIT CHECK METROCARD

  10. Click on the web brower’s back button. You should now see a web page similar to the following:

  11. Viewing The Data File Click on the Saturday, June 30, 2012 hyperlink

  12. Comma Separated Values (CSV) File Field Names The start of the R005 record. Fields values are separated by commas. Records are separated by newlines (Enter key)

  13. Click on the web brower’s back button. You should now see a web page similar to the following:

  14. Saving The Data Files • Create a new folder (directory) named lab1 within your mbac311 folder. • We will save the Saturday, June 30, 2012 data to the lab1 folder. Right-Click on the Saturday, June 30, 2012 link. Select Save link as… from the pop-up menu.

  15. Default file name is fares_120630. Notice that the number is in year, month, day format (2012, 06, 30) Browser assumes the data is a CSV file that can be read by Excel (and many other programs). Make sure to specify lab1 as your save folder. Then click

  16. Excel 2010 • Select the fares_120630 file then select the button Start Excel 2010 Click the tab. Click the option. In the dialog box select your lab1 folder. Make sure is specified.

  17. CSV Files In Excel When reading CSV files Excel uses the following rules: Records are separated by a newline (Enter key on the keyboard) All fields are separated by a common delimiter – typically a comma or tab character (Tab key on the keyboard).

  18. “Fares_120630” Data File • Excel has now imported this data into Excel. • CSV format is not the native format that Excel uses to store a workbook. • Before we make any changes lets save this workbook in Excel’s native file format (xlsx format). • Click the tab. • Click the option.

  19. In the top part of the “Save As” dialog box select your lab1 folder Name the File Name tolab1_excel Change the Save as type to Then click the button We have now saved the file in Excel native format (the file name will have a .xlsxextension).

  20. Excel In this class we will not be using Excel for our data analysis. Excel was not meant to store and analyze very large data sets (at least a million records). However, lets use Excel to examine this small data set. Notice that the first two rows are really just comments about the data.

  21. Header The third row is frequently referred to as the header. The header names the data that is stored in each column. The header really identifies all the field names. Each row contains information about a particular subway station. A row that contains data is usually referred to as a record. Each column is called a field.

  22. The name of this field is 30-D UNL The name of this field is REMOTE Each row is a record. Each column is a field.

  23. Questions Start MS Word Save the empty MS Word file to your lab1 folder under the name lab1_ques. Switch to the Excel spreadsheet. As you will recall, you can move around the spreadsheet using the arrow keys. Answer the questions on the next slide. Place your answers in the lab1_ques file. Number each answer.

  24. Question 1) How many records are in the file? Hint: remember the comments, etc. Question 2) How many fields are in the file? Hint: 26 letters in the alphabet. Question 3) What is the name of the fourth field? Question 4) Look at the sixth record. What is the station field value for this record? Remember to save the WORD file after answering the questions.

  25. Notepad Close Excel. Start the Notepad program (located under Accessories). Notepad is a text editor. It allows you to read(open) and write(save) text files. It is NOT meant to produce professional looking documents – one would use MS Word or similar software for such documents.

  26. Examining The Data Files In Notepad select Open from the File pull-down menu. Navigate to your private lab1 folder Select in the lower right-hand selection box.

  27. Select the file then click . The CSV file should now be displayed in Notepad. You can use the scroll bars at the bottom and sides of the Notepad window to examine the data. Note that each record(row) is on one line by itself. Each field (column) value separated from the next via a comma. Notice that the station names (second field) contain extra spaces. This extra space is known as white space and is usually ignored by data analysis programs. White space is sometimes used to make data easier to read.

  28. Excel Files In Notepad select New from the File pull-down menu. This will close the CSV file. Select Open from the File pull-down menu. If necessary navigate to your lab1 folder. Select in the lower right-hand selection box.

  29. Select the file then click . As you will recall lab1_excel was created when we used EXCEL to save our data. The contents of this file look very different from the CSV file yet they essentially hold the same data. This file is unreadable. This file is called a binary file. It contains symbols that one does not see in a regular text file.

  30. Binary Files Binary files are meant to be read by only software that understands how to interpret the file. This is an example of a proprietary file format. The designers of the software (EXCEL) created a unique way of storing data that may not be known to other software. Proprietary file formats are sometimes created to allow data to be store efficiently or securely. However, binary files are not portable. They typically cannot be easily transported to different types of computers.

  31. Lab Submission Save your lab1_ques MS Word file. Close MS Word. Submit your lab1_ques to Moodle.