COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS
COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS Jehan-François Pâris jfparis@uh.edu
Module Overview • We will learn how to read, create and modify files • Pay special attention to pickled files • They are very easy to use!
The file system • Provides long-term storage of information • Stores data in stable storage (disk) • Cannot be RAM because: • Dynamic RAM loses its contents when powered off • Static RAM is too expensive • System crashes can corrupt the contents of main memory
Overall organization • Data managed by the file system are grouped in user-defined data sets called files • The file system must provide a mechanism for naming these data • Each file system has its own set of conventions • All modern operating systems use a hierarchical directory structure
Windows solution • Each device and each disk partition is identified by a letter • A: and B: were used by the floppy drives • C: is the first disk partition of the hard drive • If the hard drive has no other disk partition, D: denotes the DVD drive • Each device and each disk partition has its own hierarchy of folders
Windows solution • [Figure: separate folder hierarchies for C: (with Windows, Users and Program Files), the second disk D: and the flash drive F:]
UNIX/LINUX organization • Each device and disk partition has its own directory tree • Disk partitions are glued together through the mount operation to form a single tree • The typical user does not know on which partition her files are stored
UNIX/LINUX organization • [Figure: a root partition with root directory / and entries usr and bin, and another partition that the magic mount attaches at usr] • Second partition can be accessed as /usr
Mac OS organization • Similar to Windows • Disk partitions are not merged • Represented by separate icons on the desktop
Accessing a file (I) • Your Python programs are stored in a folder AKA directory • On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python • All files in that directory can be directly accessed through their names • "myfile.txt"
Accessing a file (II) • Files in subdirectories can be accessed by specifying first the subdirectory • Windows style: • "test\\sample.txt" • Note the double backslash • Linux/Unix/Mac OS X style: • "test/sample.txt" • Generally works on Windows too
Why the double backslash? • The backslash is an escape character in Python • Combines with its successor to represent non-printable characters • '\n' represents a newline • '\t' represents a tab • Must use '\\' to represent a plain backslash
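The escape rules above can be checked directly in the interpreter. The following is a minimal sketch; raw strings (the r'...' form) are not covered on the slides and are shown here only as an alternative way to write a plain backslash.

```python
# Each escape sequence below is a single character, even though it is typed as two.
newline = '\n'
tab = '\t'
backslash = '\\'

print(len(newline), len(tab), len(backslash))  # 1 1 1

# A raw string (r'...') turns escape processing off, so one backslash is enough.
raw_path = r'test\sample.txt'
escaped_path = 'test\\sample.txt'
print(raw_path == escaped_path)  # True
```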
Accessing a file (III) • For other files, must use full pathname • Windows Style: • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
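One way to avoid typing separators by hand is the standard pathlib module, which the slides do not use. The sketch below relies on its "pure" path classes, which only manipulate text and never touch the disk; the full pathname in it is a made-up example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never access the file system, so this behaves
# the same way on any operating system.
win_path = PureWindowsPath("test") / "sample.txt"
posix_path = PurePosixPath("test") / "sample.txt"

print(win_path)    # test\sample.txt
print(posix_path)  # test/sample.txt

# A full pathname is absolute; a name relative to the current folder is not.
full = PureWindowsPath("C:\\Users\\someone\\Documents\\myfile.txt")  # hypothetical path
print(full.is_absolute())      # True
print(win_path.is_absolute())  # False
```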
Accessing file contents • Two step process: • First we open the file • Then we access its contents • Read • Write • When we are done, we close the file.
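The open, access, close sequence can be sketched as follows. The file name is hypothetical and lives in a scratch folder so that no real file is touched. The with statement is an addition to the slides: it closes the file automatically when its block ends.

```python
import os
import tempfile

# Work in a scratch folder so the example does not touch any real file.
folder = tempfile.mkdtemp()
name = os.path.join(folder, "myfile.txt")  # hypothetical file name

fh = open(name, "w")       # open the file
fh.write("hello\n")        # access its contents (write)
fh.close()                 # close it when done

# Same sequence with a with statement: the close happens automatically.
with open(name) as fh:
    contents = fh.read()   # access its contents (read)

print(repr(contents))      # 'hello\n'
print(fh.closed)           # True
```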
What happens at open() time? • The system verifies • That you are an authorized user • That you have the right permission • Read permission • Write permission • Execute permission exists but does not apply here • It then returns a file handle (file descriptor)
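When one of these checks fails, open() raises an exception instead of returning a file handle. Here is a minimal sketch of how a program might react, assuming a file name that does not exist (the name is made up):

```python
import os
import tempfile

folder = tempfile.mkdtemp()
missing = os.path.join(folder, "no_such_file.txt")  # hypothetical name

try:
    fh = open(missing, "r")
except FileNotFoundError:
    outcome = "file does not exist"
except PermissionError:
    outcome = "permission denied"
else:
    outcome = "opened"
    fh.close()

print(outcome)  # file does not exist
```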
The file handle • Gives the user • Direct access to the file • No directory lookups • Authority to execute the file operations whose permissions have been requested
Python open() • open(name, mode = 'r', buffering = -1) where • name is the name of the file • mode is the permission requested • Default is 'r' for read only • buffering specifies the buffer size • Use the system default value (code -1)
The modes • Can request • 'r' for read-only • 'w' for write-only • Always overwrites an existing file • 'a' for append • Writes at the end • 'r+' or 'a+' for updating (read + write/append)
Examples
    f1 = open("myfile.txt")  # same as f1 = open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")
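The difference between the 'w' and 'a' modes is easiest to see by writing to the same file several times. This is a small sketch on a hypothetical scratch file:

```python
import os
import tempfile

name = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical scratch file

with open(name, "w") as fh:   # 'w' creates the file
    fh.write("first\n")

with open(name, "w") as fh:   # 'w' again: the previous contents are discarded
    fh.write("second\n")

with open(name, "a") as fh:   # 'a': new data goes after the existing contents
    fh.write("third\n")

with open(name, "r") as fh:
    contents = fh.read()

print(contents, end='')       # prints "second" then "third"; "first" is gone
```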
Reading a file • Three ways: • Global reads • Line by line • Pickled files
Global reads • fh.read() • Returns whole contents of file specified by file handle fh • File contents are stored in a single string that might be very large
Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # not required
Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
• Exact contents of file 'test\sample.txt'
Line-by-line reads
    for line in fh:  # do not forget the colon
        # anything you want
    fh.close()  # not required
Example
    f3 = open("test/sample.txt", "r")
    for line in f3:  # do not forget the colon
        print(line)
    f3.close()  # not required
Output
    To be or not to be that is the question
    Now is the winter of our discontent
• With one or more extra blank lines
Why? • Each line ends with an end-of-line marker • print(…) adds an extra end-of-line
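One way to see both end-of-line markers is to keep the lines as read and to tell print not to add its own. The sketch below uses io.StringIO as an in-memory stand-in for the open file, and the end='' argument of print, which the slides do not use.

```python
import io

# io.StringIO stands in for an open text file so the sketch is self-contained.
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

lines = []
for line in fh:
    lines.append(line)       # each line keeps its own end-of-line marker, if any
    print(line, end='')      # end='' stops print from adding a second one
print()                      # finish the last line, which had no end-of-line

print(repr(lines[0][-1]))    # '\n'
print(repr(lines[1][-1]))    # 't'
```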
Trying to remove blank lines
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:  # do not forget the colon
        print(line[:-1])  # remove last char
    f5.close()  # not required
    print('-----------------------------------------------------')
The output
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our disconten
    -----------------------------------------------------
• The last line did not end with an EOL!
A smarter solution (I) • Only remove the last character if it is an EOL
    if line[-1] == '\n':
        print(line[:-1])
    else:
        print(line)
A smarter solution (II)
    print('----------------------------------------------------')
    fh = open("test/sample.txt", "r")
    for line in fh:  # do not forget the colon
        if line[-1] == '\n':
            print(line[:-1])  # remove last char
        else:
            print(line)
    print('-----------------------------------------------------')
    fh.close()  # not required
It works!
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our discontent
    -----------------------------------------------------
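A shorter alternative, not used on the slides, is the string method rstrip('\n'), which removes a trailing newline only when one is present. A minimal sketch, again with io.StringIO standing in for the open file:

```python
import io

fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

# rstrip('\n') leaves a line without a trailing newline untouched.
cleaned = [line.rstrip('\n') for line in fh]
for line in cleaned:
    print(line)
```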
Making sense of file contents • Most files contain more than one data item per line
    COSC 713-743-3350
    UHPD 713-743-3333
• Must split lines • mystring.split(sepchar) where sepchar is a separation character • returns a list of items
Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
• Not what we wanted!
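The split on commas fails because it cannot tell that the comma inside 'Baker, Andy' belongs to the name. One possible remedy, not shown on the slides, is the standard csv module. The sketch below assumes that single quotes delimit a field and that the blank after each comma should be dropped.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# quotechar="'" tells the reader that single quotes delimit a field;
# skipinitialspace=True drops the blank that follows each comma.
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)  # ['1', 'Baker, Andy', '83', '89', '85']
```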
Example
    # how2split.py
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5:
        words = line.split()
        for xxx in words:
            print(xxx)
    f5.close()  # not required
    print('-----------------------------------------------------')
Output
    ----------------------------------------------------
    To
    be
    …
    of
    our
    discontent
    -----------------------------------------------------
Other separators (I) • Commas • CSV Excel format • Values are separated by commas • Strings are stored without quotes • Unless they contain a comma • "Doe, Jane", freshman, 90, 90 • Quotes within strings are doubled
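Python's standard csv module follows these conventions by default. The sketch below writes two rows to an in-memory buffer to show the quoting; the second row is made up for illustration.

```python
import csv
import io

out = io.StringIO()          # in-memory stand-in for an output file
writer = csv.writer(out)     # the default dialect is the Excel one

writer.writerow(["Doe, Jane", "freshman", 90, 90])
writer.writerow(['Aguirre, "Lalo" Eduardo', "senior", 85, 85])  # made-up row

rows = out.getvalue().splitlines()
for row in rows:
    print(row)
# "Doe, Jane",freshman,90,90
# "Aguirre, ""Lalo"" Eduardo",senior,85,85
```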
Other separators (II) • Tabs ('\t') • Advantages: • Your fields will appear nicely aligned • Spaces, commas, … are not an issue • Disadvantage: • You do not see them • They look like spaces
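A quick way to make tabs visible while debugging is repr(), which displays them as \t. A small sketch with a made-up line:

```python
line = "Alice\t90\t90"       # made-up tab-separated line

print(line)                  # the tabs are displayed as blank space
print(repr(line))            # 'Alice\t90\t90'

tab_count = line.count('\t')
print(tab_count)             # 2
```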
Why it is important • When you must pick your file format, you should decide how the data inside the file will be used: • People will read them • Other programs will use them • Will be used by people and machines
An exercise • Converting our output to CSV format • Replacing tabs by commas • Easy • Will use string replace function
First attempt
    fh_in = open('grades.txt', 'r')  # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')
The output
    Alice 90 90 90 90 90
    Bob 85 85 85 85 85
    Carol 75 75 75 75 75
becomes
    Alice,90,90,90,90,90
    Bob,85,85,85,85,85
    Carol,75,75,75,75,75
Dealing with commas (I) • Work line by line • For each line • split input into fields using TAB as separator • store fields into a list
    Alice 90 90 90 90 90
becomes
    ['Alice', '90', '90', '90', '90', '90']
Dealing with commas (II) • Put within double quotes any entry containing one or more commas • Output list entries separated by commas
    ['"Baker, Alice"', 90, 90, 90, 90, 90]
becomes
    "Baker, Alice",90,90,90,90,90
Dealing with commas (III) • Our troubles are not over: • Must store somewhere all lines until we are done • Store them in a list
Dealing with double quotes • Before wrapping items that contain commas in double quotes, replace all double quotes with pairs of double quotes
    'Aguirre, "Lalo" Eduardo'
becomes
    'Aguirre, ""Lalo"" Eduardo'
then
    '"Aguirre, ""Lalo"" Eduardo"'
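The two rules can be sketched as a small function. The name csv_quote is a hypothetical helper, not part of the course program.

```python
def csv_quote(item):
    """Double the double quotes, then wrap the item if it contains a comma."""
    item = item.replace('"', '""')   # double every double quote first
    if ',' in item:                  # then wrap items that contain a comma
        item = '"' + item + '"'
    return item

print(csv_quote('Aguirre, "Lalo" Eduardo'))  # "Aguirre, ""Lalo"" Eduardo"
print(csv_quote('90'))                       # 90
```

The order matters: wrapping first would cause the enclosing quotes to be doubled as well.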
General organization (I) • linelist = [ ] • for line in file • itemlist = line.split(…) • linestring = '' # empty string • for each item in itemlist • remove any trailing newline • double all double quotes • if item contains comma, wrap • add to linestring
General organization (II) • for line in file • … • for each item in itemlist • double all double quotes • if item contains comma, wrap • add to linestring • append linestring to linelist
General organization (III) • for line in file • … • remove last comma of linestring • add newline at end of linestring • append linestring to linelist • for linestring in linelist • write linestring into output file
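The outline can be condensed into a short sketch that works on lines held in memory. It is not the course program: to_csv_line is a hypothetical helper, the input lines are made up, and ','.join replaces the "add a comma, then remove the last one" step of the outline.

```python
def to_csv_line(line):
    """Turn one tab-separated line into one CSV line (hypothetical helper)."""
    items = line.rstrip('\n').split('\t')    # remove any trailing newline, then split
    quoted = []
    for item in items:
        item = item.replace('"', '""')       # double all double quotes
        if ',' in item:                      # wrap items containing a comma
            item = '"' + item + '"'
        quoted.append(item)
    # join puts commas only between items, so there is no last comma to remove
    return ','.join(quoted) + '\n'

sample = ["Baker, Alice\t90\t90\n", "Bob\t85\t85"]   # made-up input lines
linelist = [to_csv_line(line) for line in sample]
print(''.join(linelist), end='')
```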
The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt', 'r')  # input file
    linelist = []  # global data structure
    for line in fh:  # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist))  # just for debugging
        linestring = ''  # start afresh