npp_ch1_2

Filehandles:
• connections to the outside world: disk file, hardware device, local process, remote process, “bit bucket”
• STDIN, STDOUT, STDERR are opened automatically
• the command below redirects the STDOUT filehandle inside myprog.pl from the terminal to the bit bucket (/dev/null)
    $ ./myprog.pl > /dev/null

finding out about a process and its filehandles:
    [root@joyous fd]# cd /proc
    [root@joyous proc]# ls
    1 11230 1791 2346 27177 33 4611 execdomains mounts
    1052 11236 1792 2395 27180 3353 5 fb mtrr
    1 1240 1820 2410 …
    [root@joyous proc]# cd 2410
    [root@joyous 2410]# ls -F
    attr/ cmdline environ fd/ maps mounts stat status wchan
    auxv cwd@ exe@ loginuid mem root statm task/
    [root@joyous 2410]# cat cmdline
    xinetd-stayalive-pidfile/var/run/xinetd.pid
    [root@joyous 2410]# cd fd
    [root@joyous fd]# ls -la
    total 6
    dr-x------ 2 root root 0 Jan 8 13:25 .
    dr-xr-xr-x 3 root root 0 Dec 27 15:22 ..
    lr-x------ 1 root root 64 Jan 8 13:26 0 -> /dev/null
    lr-x------ 1 root root 64 Jan 8 13:26 1 -> /dev/null
    lr-x------ 1 root root 64 Jan 8 13:26 2 -> /dev/null
    lr-x------ 1 root root 64 Jan 8 13:26 3 -> pipe:[5522]
    l-wx------ 1 root root 64 Jan 8 13:26 4 -> pipe:[5522]
    lrwx------ 1 root root 64 Jan 8 13:26 7 -> socket:[5524]
• cmdline holds the process's argument vector (char** argv)
• fd/ lists the process's file handles

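Examples of STDIN and STDOUT:
    $input = <STDIN>;
    chomp($input);
    print STDOUT "You said, \"$input\"\n";
    chomp($input = <>);
    print "Again you said, \"$input\"\n";
• a line read in perl keeps its trailing \newline character; chomp() removes it (if it exists)
• STDIN is the default for <>
• in chomp($input = <>), = returns an lvalue and not, as in C, the value of the expression on the right-hand side of the operator
• print is used rather than printf here because the user's input is interpolated into the string; printf would treat any % in it as a format directive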

redirection:
    $ myprog.pl < data_in.txt > data_out.txt
• The internal filehandles, STDIN and STDOUT, are disconnected from the keyboard and terminal and attached to data_in.txt and data_out.txt respectively. STDERR is untouched.
    $ myprog.pl < data_in.txt > data_out.txt 2>&1
• In this case STDERR is also redirected to the same file as STDOUT (file descriptor 2 follows file descriptor 1).

IO operations 1:
    $line = <FILEHANDLE>;  # scalar context
    @lines = <FILEHANDLE>; # list context
• <> is context sensitive. This means the function “knows” what kind of variable is waiting for its return value and returns the correct type of value.
    while ( <> ) {
        printf "found a gnu\n" if /GNU/i;  # /i: ignore case; comparison made against $_
    }
• when <> is the only thing in a while condition, each input line is assigned to $_
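The context rules above can be tried without a real file. The following is a minimal sketch that assumes an in-memory filehandle (opened on a reference to a string) as a stand-in for FILEHANDLE; the sample text and variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text; an in-memory filehandle stands in for a real file.
my $text = "first line\nsecond GNU line\nthird gnu line\n";

# Scalar context: each call returns one line.
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my $line = <$fh>;
chomp($line);

# List context: returns all remaining lines at once.
my @rest = <$fh>;
close($fh);

# while (<$fh>) assigns each line to $_, which the pattern match tests.
open($fh, '<', \$text) or die "cannot reopen in-memory handle: $!";
my $gnu_count = 0;
while (<$fh>) {
    $gnu_count++ if /GNU/i;    # /i ignores case
}
close($fh);

print "first: $line; remaining: ", scalar(@rest), "; gnu lines: $gnu_count\n";
```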

IO operations 2:
    $bytes = read(FILEHANDLE, $buffer, $length [, $offset]);
    $bytes = sysread(FILEHANDLE, $buffer, $length [, $offset]);
• Both read a maximum of $length bytes from FILEHANDLE into $buffer. If $offset is given, the data is placed at that position inside $buffer instead of at its start; it does not skip bytes in the input.
• The difference is that read() blocks until exactly $length bytes are read or end-of-file is encountered; sysread() returns immediately after reading at least one byte. So it blocks only if no input at all is available.
• $bytes = actual number of bytes read; 0 if end-of-file and no data; undef if an error occurred
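As an illustration of the return values and of the optional $offset argument, here is a small sketch that reads a scratch file in 4-byte chunks and appends each chunk to the end of the buffer. The scratch file (created with File::Temp) and the chunk size are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to read back.
my ($out, $path) = tempfile(UNLINK => 1);
print $out "abcdefghij";
close($out) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot open $path: $!";
binmode($in);

my $buffer = '';
my $total  = 0;
while (1) {
    # $offset = length($buffer): append the chunk instead of overwriting.
    my $bytes = read($in, $buffer, 4, length($buffer));
    die "read error: $!" unless defined $bytes;   # undef means error
    last if $bytes == 0;                          # 0 means end-of-file
    $total += $bytes;
}
close($in);

print "read $total bytes: $buffer\n";
```

Testing for undef before testing for 0 matters, because both values are false.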

IO operations 3:
    $result = print FILEHANDLE $data1, $data2 …;  # no comma after FILEHANDLE
    $result = print $data1, $data2 …;             # filehandle name optional; default STDOUT
    $result = printf $format, $data1, $data2 …;
• print blocks until all data is written; it writes to a buffer
• read the printf man page (man -S 3 printf) to understand formatting
    $bytes = syswrite(FILEHANDLE, $data [, $length [, $offset]]);
• writes $length bytes from $data, starting at position $offset within $data, to FILEHANDLE; returns the number of bytes actually written; no buffering
• syswrite() writes what it can and returns, so loop in case it fails to write everything in a single operation:
    $bytes = 0;
    while ( $bytes < $length ) {
        $len = syswrite(FILEHANDLE, $data, $length - $bytes, $bytes);
        die "Error: $!" unless defined($len);  # undef means the write failed
        $bytes += $len;
    }
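The partial-write loop can be wrapped in a helper. This is only a sketch: write_all() is a hypothetical name, not a built-in, and the scratch file stands in for whatever handle is being written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_all() is a hypothetical helper: it keeps calling syswrite()
# until every byte of $data has been written or an error occurs.
sub write_all {
    my ($fh, $data) = @_;
    my $length  = length($data);
    my $written = 0;
    while ($written < $length) {
        # Write the unwritten tail of $data, starting at offset $written.
        my $len = syswrite($fh, $data, $length - $written, $written);
        return unless defined $len;   # error: caller should inspect $!
        $written += $len;
    }
    return $written;
}

my ($fh, $path) = tempfile(UNLINK => 1);
my $count = write_all($fh, "Hello, world!\n");
die "write failed: $!" unless defined $count;
close($fh) or die "close failed: $!";

print "wrote $count bytes\n";
```

On a regular file a single syswrite() normally writes everything; the loop matters mostly for pipes and sockets.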

IO operations 4:
    $previous = select FILEHANDLE;
• select() changes the default output filehandle for print() and returns the previous default
    open(FILEOUT, ">myfile");
    print "Hello, world!\n";       # prints to STDOUT
    $previous = select FILEOUT;
    print "Hello, world!\n";       # prints to FILEOUT
    select $previous;              # makes STDOUT default again
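A runnable variant of the same idea, sketched with a lexical filehandle and a scratch file from File::Temp (both are assumptions of the example, not part of the slide):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);

print "to STDOUT\n";            # default output handle is STDOUT
my $previous = select($out);    # $out becomes the default; old default saved
print "to the file\n";          # no handle named, so this goes to $out
select($previous);              # restore the old default
print "back on STDOUT\n";
close($out) or die "close failed: $!";

# Read the scratch file back to see what landed in it.
open(my $in, '<', $path) or die "cannot open $path: $!";
my @lines = <$in>;
close($in);
```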

End of File:
• What does EOF mean when reading from a filehandle?
    file: end-of-file
    keyboard: ^D (Unix), ^Z (Windows)
    socket: other end closes the socket
• How we see the EOF condition depends on whether we are reading one line at a time or in a byte stream.

EOF Byte Stream:
• read() and sysread() return 0 on EOF, undef on error. However, you should always test for failure first:
    while (1) {
        my $bytes = read(STDIN, $buffer, 100);
        if ( ! defined($bytes) ) {
            printf "Error: $!\n";  # $! contains error message
            exit 1;
        }
        # die "Error: $!" unless defined($bytes);  # same thing
        last unless $bytes > 0;    # last is the same as break in C++ and Java
    }

EOF line-at-a-time:
• In line mode, using <>, EOF and error look the same; both return undef
    $! = 0;                                  # clear $! before series of reads
    while ( defined(my $line = <STDIN>) ) {
        $data .= $line;
    }
    die "Error: $!" if $!;                   # check if any reads really failed
• you can drop the explicit defined(): when a while condition is just an assignment from <>, perl applies the defined() test automatically
    $! = 0;
    while ( my $line = <STDIN> ) {
        $data .= $line;
    }
    if ( $! ) {   # still need to check $!: the loop stops on EOF as well as on error
        printf "Error: $!\n";
        exit 1;
    }
• $! is never undefined, so clear it with $! = 0 and test whether it is true (non-zero) rather than defined($!)

eof() function: $eof = eof(FILEHANDLE); • eof() will return true if the next read to FILEHANDLE will return end-of-file.

End-of-Line (\newline, \n) Anarchy:
    Windows: EOL == CRLF \015\012
    Unix:    EOL == LF   \012
    Mac (classic Mac OS): EOL == CR \015
    network: EOL == CRLF \015\012
• In perl, $/ is the input record separator; by default it is \newline (\n)
• <> reads until it finds $/
• changing the value of $/ changes the behaviour of <>
• chomp() removes a trailing copy of $/, if present
• \n is the “logical” \newline character; the bytes it stands for depend on the platform
• network protocols expect CRLF: for example, when moving lines of text between two Linux servers you should still write each \newline as two characters.

more \n
• You can’t define
    $/ = "\r\n";
  since on Windows this would define a 3-character \newline symbol. You need to define
    $/ = "\015\012";
• The Socket and IO::Socket modules define $CRLF and CRLF() as exported globals that return the right thing.
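To see how $/ and chomp() cooperate on CRLF-terminated input, here is a small sketch. It assumes an in-memory filehandle in place of a socket, and the two protocol-style lines are made-up sample data; the lexical $crlf is defined locally rather than imported from Socket.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $crlf = "\015\012";                       # explicit bytes, not "\r\n"
my $wire = "HELO example$crlf" . "QUIT$crlf"; # sample data as seen "on the wire"

open(my $fh, '<', \$wire) or die "cannot open in-memory handle: $!";

my @commands;
{
    local $/ = $crlf;              # <> now reads up to and including CRLF
    while (my $line = <$fh>) {
        chomp($line);              # chomp removes the current $/, i.e. both bytes
        push @commands, $line;
    }
}
close($fh);

print join("|", @commands), "\n";
```

Using local confines the change of $/ to the enclosing block, so other reads in the program keep the default separator.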

more on CRLF
• Text mode: automatic conversion
• Binary mode: automatic conversion could corrupt data
    binmode(FILEHANDLE); # disables character translation
[Diagram: unix (LF) -> network (CRLF) -> unix (LF)]

Opening files:
    open(FH, "< myFile");  # opens myFile to read
    open(FH, "> myFile");  # truncates myFile and opens it to write
    open(FH, ">> myFile"); # opens myFile to write without truncation (append)
    open(FH, "+> myFile"); # truncates myFile and opens it for read/write
    open(FH, "+< myFile"); # opens myFile for read/write; no truncation
• FH becomes the filehandle of the opened file; open() returns true/false
    $result = open(…);
    open(…) or die "file failed to open"; # check $! for specific error
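The difference between truncating and appending modes can be checked with a short sketch. It assumes a scratch directory from File::Temp and uses the three-argument form of open() with lexical handles, which keeps the mode separate from the filename; the mode strings mean the same as in the two-argument calls on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/myFile";

# ">" creates (or truncates) the file.
open(my $fh, '>', $file) or die "cannot write $file: $!";
print $fh "one\n";
close($fh) or die "close failed: $!";

# ">>" appends without truncating.
open($fh, '>>', $file) or die "cannot append to $file: $!";
print $fh "two\n";
close($fh) or die "close failed: $!";

# "<" opens for reading.
open($fh, '<', $file) or die "cannot read $file: $!";
my @lines = <$fh>;
close($fh);

# Opening with ">" again throws the old content away.
open($fh, '>', $file) or die "cannot truncate $file: $!";
close($fh) or die "close failed: $!";
my $is_empty = -z $file ? 1 : 0;

print scalar(@lines), " lines before truncation; empty afterwards: $is_empty\n";
```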

Closing files: close(FILEHANDLE);

Do you believe in magic?
    #!/usr/bin/perl
    # redirect.pl
    printf "Redirecting STDOUT\n";           # prints to STDOUT
    open(SAVEOUT, ">&STDOUT");               # duplicates STDOUT filehandle to write
    open(STDOUT,">myfile.dat");              # close STDOUT connection to terminal; “redirects” it to myfile.dat
    printf STDOUT "STDOUT is redirected\n";  # prints to STDOUT; ie myfile.dat
    system("date");
    open(STDOUT, ">&SAVEOUT");               # close new STDOUT connection to myfile.dat; “redirects” it to old STDOUT
    printf "STDOUT restored\n";              # writes to STDOUT; ie terminal
• running it:
    $ ./redirect.pl
    Redirecting STDOUT
    STDOUT restored
    $ cat myfile.dat
    STDOUT is redirected
    Mon Jan 9 21:23:45 EST 2006

Alternatives to open():
    $result = sysopen(FILEHANDLE, $filename, $mode); # $result == (true/false); $! gives the reason for failure
Modes available to sysopen() (the constants come from the Fcntl module):
    O_RDONLY    Open read only
    O_WRONLY    Open write only
    O_RDWR      Open read/write
    O_CREAT     Create file if it doesn’t exist
    O_EXCL      (O_EXCL | O_CREAT) creates file if it doesn’t exist but fails if file already exists
    O_TRUNC     If file exists, truncate to zero length
    O_APPEND    Open in “append” mode
    O_NOCTTY    If file is a terminal, it won’t become the process’s controlling terminal
    O_NONBLOCK  Open file in nonblocking mode
    O_SYNC      Open file in synchronous mode; all writes block until physical write takes place
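One common use of the O_CREAT | O_EXCL combination is to create a file only if nobody else has created it first. The sketch below assumes a scratch directory and a hypothetical file name (app.lock); it also passes the optional fourth argument to sysopen(), the permissions for a newly created file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $lock = "$dir/app.lock";      # hypothetical file name for the example

# First attempt: the file does not exist yet, so it is created.
my $first = sysopen(my $fh, $lock, O_WRONLY | O_CREAT | O_EXCL, 0644);
die "unexpected failure: $!" unless $first;
close($fh) or die "close failed: $!";

# Second attempt: the file now exists, so O_EXCL makes sysopen() fail.
my $second = sysopen(my $fh2, $lock, O_WRONLY | O_CREAT | O_EXCL, 0644);
my $already_exists = (!$second && $! == EEXIST) ? 1 : 0;

print "second open refused because file exists: $already_exists\n";
```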

Buffering and Blocking:
• print() to a filehandle usually involves intermediate buffers while syswrite() does not.
• processing speed and IO speed are mismatched; hence buffered IO
• buffering decouples IO calls from IO activity
[Diagram: program memory -> print() -> buffer (fast); buffer -> OS write -> disk (slow)]

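How buffering works:
[Diagram: a buffer between the program and the device; data from the program enters at the write ptr, data to the device leaves at the read ptr]
• before the write: . . . X X X X X X X X X
• after printf "Hello\n"; the write ptr has advanced: . . . H e l l o \n X X X
• the opsys slowly writes to the device and the read ptr advances: . . . X X X l o \n X X X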

Things to think about: • What is the buffer discipline? • How much “free space” is to be found in the buffer? • Under what circumstances does printf “block”? • Prepare a similar diagram for input and answer the same questions.

Standard IO Buffering • multiple buffers at different layers (write to disk): • disk hardware buffer • IDE controller driver • file system driver (OS) • C library (stdio) • only stdio buffers until it has enough; the other layers try to get rid of any data they receive asap

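stdio buffering problem (write):
[Diagram: program -> stdio -> file / tcp -> IDE / IP -> hw / ether]
• program: print and printf go through stdio; syswrite() does not
• stdio: buffered write; waits until the buffer is full before writing; $| = 1; turns off buffering
• lower layers (file / tcp, IDE / IP, hw / ether): each has a write buffer but does unbuffered writes; they write asap
Q: Why use a buffer if the write is unbuffered? A: makes layer functionality asynchronous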

code snippets for turning off write buffering:
    my $prev_handle = select(FH); # makes FH new default handle and saves the old default handle
    $| = 1;                       # sets special variable $| to true so writes for default handle (currently FH) are not buffered
    select($prev_handle);         # makes old handle the default again
or
    use IO::Handle;
    FH->autoflush(1);             # OO syntax for turning off buffering

stdio buffering problem (read):
[Diagram: hw / ether -> IDE / IP -> file / tcp -> stdio -> program]
• read: buffered read through stdio; waits for exactly the right amount of data; stdio blocks until the exact number of bytes is available or EOF
• sysread(): unbuffered read; reads whatever is available; unfortunately, it reads at least 1 byte so it can “block”
• How does <STDIN> work in this picture?

sysread() blocking problem:
• sysread() blocks if no data is available. If non-blocking behaviour is required you must either
    • use a separate read thread, or
    • use select() or poll() to determine if data is available
• TCP is not record-oriented so data structures need to be built; sysread() is ideal.
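The select() approach can be sketched with the standard IO::Select module, which wraps the four-argument select() call. The example assumes a Unix-like system (select() on pipes is not portable to Windows) and uses a pipe inside one process as a stand-in for a socket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# A pipe stands in for a network connection in this sketch.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $sel = IO::Select->new($reader);

# Nothing has been written yet: with a timeout of 0, can_read()
# returns immediately with an empty list instead of blocking.
my @ready_before = $sel->can_read(0);

defined(syswrite($writer, "ping")) or die "syswrite failed: $!";

# Now data is waiting, so the handle is reported as readable
# and sysread() will not block.
my @ready_after = $sel->can_read(1);
my $data = '';
if (@ready_after) {
    my $bytes = sysread($reader, $data, 100);
    die "sysread failed: $!" unless defined $bytes;
}

print "ready before: ", scalar(@ready_before),
      "; ready after: ", scalar(@ready_after), "; data: $data\n";
```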

Filehandles:
• internal names for external entities
• names for filehandles are “unadorned”; not scalars
• filehandles are stuck in a single package; to move references to filehandles from package to package we need to turn them into a typeglob
    $fh = *MY_FH;   # $fh is a typeglob
    $fh = \*MY_FH;  # $fh is a typeglob reference
    # passing a typeglob (reference) to a routine
    &hello_world($fh);      # ok, even if hello_world() is from another package
    &hello_world(\*MY_FH);  # the author’s favourite style

Filehandles and typeglobs:
• filehandles and filehandle typeglobs are interchangeable.
    my $fh = &get_fh();
    sub get_fh() {
        open(FOO, ">foo.txt") or die "foo: $!";  # opened for writing, since we print to it below
        return \*FOO;
    }
    …
    printf $fh "hello, world!\n";  # could use a filehandle here too
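A short sketch of passing a bareword filehandle to a subroutine, both as a typeglob and as a typeglob reference. The subroutine name write_greeting() and the LOG handle are hypothetical, and the scratch file comes from File::Temp.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_greeting() is a hypothetical routine; it accepts a typeglob,
# a typeglob reference, or any other filehandle-like value.
sub write_greeting {
    my ($fh) = @_;
    print $fh "hello, world!\n";
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp) or die "close failed: $!";

open(LOG, '>', $path) or die "cannot write $path: $!";
write_greeting(\*LOG);   # typeglob reference
write_greeting(*LOG);    # plain typeglob works too
close(LOG) or die "close failed: $!";

open(my $in, '<', $path) or die "cannot read $path: $!";
my @lines = <$in>;
close($in);

print "lines written: ", scalar(@lines), "\n";
```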

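fileno():
    $X = fileno(FH);
• fileno() returns a file descriptor (0, 1, 2, …) if FH is a valid filehandle; undef otherwise
    die "not a filehandle" unless defined fileno($fh);
• call the built-in as fileno(FH), not &fileno(FH): the & form looks for a user-defined subroutine of that name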

Detecting errors:
• IO functions return a false value (usually undef) on failure
• $! contains specific error info; string or number depending on context
    use Errno qw(EACCES ENOENT);  # constant tags imported explicitly from Errno package; qw splits string into list of words
    my $result = open(FH, ">/etc/passwd");
    if (!$result) {  # something went wrong
        if ($! == EACCES) {               # numeric context
            warn "no permissions";
        } elsif ($! == ENOENT) {
            warn "file or directory not found";
        } else {
            warn "some other error: $!";  # string context
        }
    }
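The two faces of $! can be captured side by side. This sketch provokes a harmless failure by trying to create a file inside a directory that does not exist; the path is made up for the example and lives under a scratch directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(ENOENT);
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my $missing = "$dir/no-such-dir/file.txt";   # parent directory does not exist

my ($errno, $message) = (0, '');
if (!open(my $fh, '>', $missing)) {
    # Save both views of $! right away, before another call changes it.
    $errno   = $! + 0;    # numeric context: the errno value
    $message = "$!";      # string context: the error text
}

my $is_enoent = ($errno == ENOENT) ? 1 : 0;
print "errno $errno means: $message\n";
```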

OO Syntax:
• two OO extensions: IO::Handle and IO::File
• creating references:
    $a = "hi there";                                       $a_ref = \$a;
    @b = ('this', 'is', 'an', 'array');                    $b_ref = \@b;
    %c = ( first_name => 'Fred', last_name => 'Freud');    $c_ref = \%c;
• dereferencing:
    $a = $$a_ref;  @b = @$b_ref;  %c = %$c_ref;
• referencing components:
    $b_ref->[2] eq "an";
    $c_ref->{last_name};

bless me!
• An object is a reference that is blessed – it knows what class it belongs to.
• A class is a package with methods that deal with object references.
• A method is a subroutine that expects an object reference as its first argument.
• OO sugar coating:
    $obj_ref->method_name($p1, $p2)
  is the same as
    ClassName::method_name($obj_ref, $p1, $p2);  # ClassName is the package name

constructors:
• constructors can be called anything: usually called new()
• constructors are class methods
    $obj_ref = ClassName->new();
  is the same as
    $obj_ref = ClassName::new('ClassName');
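The pieces above (package, bless, constructor, and the two equivalent call syntaxes) fit together as in the following sketch. The Counter class and its methods are hypothetical and exist only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Counter;    # hypothetical class: a package whose subs act on object references

# Constructor: a class method, so the class name arrives as the first argument.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;    # the reference now knows its class
}

# Method: expects the object reference as its first argument.
sub increment {
    my ($self, $by) = @_;
    $self->{count} += defined $by ? $by : 1;
    return $self->{count};
}

package main;

my $obj = Counter->new(start => 5);

my $via_arrow = $obj->increment(2);             # OO syntax
my $via_name  = Counter::increment($obj, 3);    # same call, spelled out

print ref($obj), ": $via_arrow then $via_name\n";
```

The arrow form also searches parent classes for the method, which the fully qualified form does not; for a class without inheritance the two behave alike.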

class hierarchy:
[Diagram: class hierarchy of the filehandle classes]
• super class (IO::Handle): holds generic methods common to all filehandles
• IO::File: user friendly towards file handling
• all three kinds of filehandles are accessible via the same set of Handle methods

OO example:
    #!/usr/bin/perl
    # file: count_lines.pl
    use strict;
    use IO::File;
    my $file = shift;
    my $counter = 0;
    my $fh = IO::File->new($file) or die "Can't open file: $file";  # lazy evaluation of or; die only if new() returns undef
    while ( defined (my $line = $fh->getline) ) {
        $counter++;
    }
    STDOUT->print("Counted $counter lines\n");
• STDOUT is a built-in Handle object so it can use Handle methods
• the print() method is a simple wrapper around the standard perl print() subroutine

IO::File Methods 1:
    $fh = IO::File->new($filename [, $mode [,$perms]])
• main constructor; replaces open(), same rules for $mode and $perms
• Still need to look at $! if it fails
    $fh = IO::File->new_tmpfile;
• invisible in file system; file goes away when object destroyed
    $result = $fh->close;
• called automatically if you forget; an IO::Handle method

IO::File Methods 2:
    $result = $fh->open($filename [,$mode [,$perm]]);
• used to reopen a file in a redirection situation; an IO::File method
    STDOUT->open(">log.txt") or die "Can't reopen STDOUT: $!";

IO::File Methods 3:
    $fh = IO::File->new($filename [,$mode [,$perm]]);
• main constructor for IO::File; with one argument it acts like the two-argument version of open(). Returns undef and sets $! if an error occurs
    $fh = IO::File->new_tmpfile;
• just in case you need a temporary file
    $result = $fh->close();
• happens automatically if you forget

IO::File Methods 4:
    $result = $fh->print(@args);
    $result = $fh->printf($fmt, @args);
    $bytes = $fh->write($data [,$length [,$offset]]);
    $bytes = $fh->syswrite($data [,$length [,$offset]]);
    $bytes = $fh->read($buffer,$length[,$offset]);
    $bytes = $fh->sysread($buffer,$length[,$offset]);
• These work just like their standard counterparts
    $line = $fh->getline
    @lines = $fh->getlines
• replace <>
    $previous = $fh->autoflush([$boolean]);
• like $| = [0|1]; but for any filehandle

IO::File Methods 5:
    $boolean = $fh->opened;  # same as defined fileno($fh);
• Returns true if a file handle is valid.
    $boolean = $fh->eof
• Returns true if next read of the filehandle will return EOF
    $fh->flush
• Performs a one-time flush of the filehandle buffer. If write buffer then write occurs; if read buffer then data discarded.
    $boolean = $fh->blocking([$boolean]);
• Turns blocking on and off (Chapter 13).

IO::File Methods 6: $fh->clearerr; . . . $boolean = $fh->error; • Used together these report on any error in the intervening code (. . .).

Copying Filehandles:
    $fh = IO::File->new_from_fd($fd,$mode);
• Creates a duplicate handle for an existing filehandle object, previously opened with the same $mode.
• $fd can be an IO::Handle object, an IO::File object, a regular filehandle or a numeric file descriptor.
    $saveout = IO::File->new_from_fd(STDOUT,">");
• similar to
    open(SAVEOUT, ">&STDOUT");

Copying Filehandles 2:
    $result = $fh->fdopen($fd,$mode);
• Reopens an existing handle ($fh) as a copy of another existing filehandle object ($fd), previously opened with the same $mode.
• $fd can be an IO::Handle object, an IO::File object, a regular filehandle or a numeric file descriptor.
• Used with new_from_fd() to restore a saved filehandle
    $saveout = IO::File->new_from_fd(STDOUT,">");  # STDOUT filehandle saved
    STDOUT->open('>log.txt');                      # STDOUT redirected to file
    …
    STDOUT->print("Yippie yie yay!\n");
    …
    STDOUT->fdopen($saveout,">");                  # STDOUT redirected back to where it came from
