Introducing perl
Download
1 / 110

Introducing Perl - PowerPoint PPT Presentation


  • 89 Views
  • Uploaded on

Introducing Perl. P ractical E xtraction and R eport L anguage First developed by Larry Wall, a linguist working as a systems administrator for NASA in the late 1980s, as a way to make report processing easier. Since then, it has moved into a large number of roles:

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Introducing Perl' - palmer-park


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Introducing perl
Introducing Perl

  • Practical Extraction and Report Language

  • First developed by Larry Wall, a linguist working as a systems administrator for NASA in the late 1980s, as a way to make report processing easier.

  • Since then, it has moved into a large number of roles:

    automating system administration

    acting as glue between different computer systems;

    the language of choice for CGI programming on the Web?

  • Free: Source code, documentation, use

  • Portable: more than 50 operating system platforms


Introducing perl1
Introducing Perl

  • Perl is often called the Swiss Army chainsaw of languages: versatile, powerful and adaptable - resembles the Swiss Army knife

  • Perl is an interpreted language optimised for

    • Scanning arbitrary text files

    • extracting information from those text files

    • Printing reports based on that information

  • Perl is intended to be practical: easy to use, efficient and complete rather than tiny, elegant and minimal

  • Perl’s slogan is “There is more than one way to do it” and its philosophy is to “make easy things easy while making difficult things passible”


Introducing perl2
Introducing Perl

A little history:

  • December 1987, release 1.0

  • Current major release 5.0 in October 1994

  • July 1998 release 5.005

  • March 1999 release 5.005_03

  • March 2000 release 5.6

  • Latest version ?

    Portable

  • Unix: AIX, HSD, HP-UX, IRIX, Linux, Solaris …

  • MS Windows

  • MacOS

  • Others: AmigaOS, OS2, VMS ...


Introducing perl3
Introducing Perl

Programs, Scripts, Compilers, and Interpreters

  • Perl programs are call “Scripts”. There is no difference. It is just source code

  • Perl “compiler” is also call “interpreter”.

  • The source code is compiled into bytecode which is executed in the main memory(rather high level bytecode: “sort a list” is one operation)


Introducing perl4
Introducing Perl

Perl Internet References Resources

  • Home page of Perl: http://www.pwrl.com

  • Perl user groups: http://www.perl.org

  • CPAN(Comprehensive Perl Archive Network) http://cpan.org

    Getting Perl

  • Unix: www.cpan.org/ports

  • Windows: http://www.activestate.com/

  • MacOS: http://www.macperl.com/


Writing and running perl programs
Writing and Running Perl Programs

Writing Perl scripts:

  • Code is plain text, any text editor will do

  • UNIX: Emacs

  • Windows: Notepad

    Running Perl

  • from command line: perl myprog.pl

  • with option: perl -w myprog.pl

  • in UNIX, make the first line of your program to be #!/usr/bin/perl, and add execute permission using chmod


Documentation and online help
Documentation and online help

  • Extensive documentation comes with the standard distribution

    UNIX: man perl

    Windows: Programs->ActivePerl->Online Documentation

  • Online http://www.perl.com/pub/v/documentation

  • FAQs http://www.perl.com/pub/v/fags

  • The Perl Journal http://www.tpj.com/


Hello world
Hello World!

In C

#include <stdio.h>

main()

{

printf(“Hello World\n”);

}

In Java

public class Hello {

public static void main (String [] args)

{

System.out.println(“Hello World!”);

}

}

In Perl

print “Hello World!\n”;

More than one way to do it:

print (“Hello World!\n”);

print “Hello world!”, “\n”;

print “Hello”, “”, “World!”, “\n”


Some simple scripts one liners
Some Simple Scripts/One-liners

Example 1: Print lines containing the string ‘Shazzam!’

  • #/usr/bin/perl

  • while (<STDIN>) {

  • print if /Shazzam!/

  • };

  • #<STDIN> is a bit of Perl magic that delivers the next line of input each time round the loop

    Example 2: The same thing the hard way

  • #/usr/bin/perl

  • while ($line = <STDIN>) {

  • print $line if $line =~/Shazzam!/

  • };


Some simple scripts one liners1
Some Simple Scripts/One-liners

Example 3: A script with arguments

  • #/usr/bin/perl

  • $word = shift;

  • while (<>) {print if /$word/};

  • If we put the script in a file called match, we can invoke the script

    • match Shazzam!

    • Match Shazzam! file1 file 2

  • The shift operator returns the first argument from the command, and move others up one position.

  • Called with one argument, match reads standard input and prints those lines which contains the word given as the argument.

  • Called with two or more arguments, the first argument is the word to be searched for, and second and subsequent arguments are filenames of fles that will be searched in sequence for the target word.


Some simple scripts one liners2
Some Simple Scripts/One-liners

Example 4: Error Messages

  • #/usr/bin/perl

  • die “Need word to search for\n” if @ARGV ==0;

  • $word = shift;

  • while (<>) {print if /$word/};

  • @ ARGV is a special array which holds the command line parameters. A program is executed as a result of a system command, which consists of the executable program file, followed by a command tail, e.g. :

  • C:> program param1 param2 ... paramn

  • Then $ARGV[0] = "program", $ARGV[1] = "param1", $ARGV[2] = "param2" ... $ARGV[n] = "paramn".


Some simple scripts one liners3
Some Simple Scripts/One-liners

Example 5: Reverse order of lines in a file

  • #/usr/bin/perl

  • open IN, $ARGV[0] or die “Cannot open $ARGV[0]\n”;

  • @file = <IN>;

  • for ($I = @file-1; $I >= 0; $I--)

  • {

  • print $file[$I];

  • }

  • Do the same in C, Java ?


Variables and datatypes
Variables and Datatypes

  • $ Scalar

  • @array

  • %hash

  • The type of a variable is marked by the type prefix ($ @ %), which is always used.

    • $x = $y +3

  • Variable names are arbitrary long, which can consist of characters a - z, A- Z, the underscore _ (and must begin with any of these), and digits from 0 - 9. It is case sensitive: uppercase and lowercase are different.

    • $No_of_Students, @StudentList, %StudentRecord_01

  • Special control variables have “punctuation” in their names, e.g., the $^O variable which tells the name of the current operation system.


  • Variables and data types scalars
    Variables and Data types (Scalars)

    • A $scalar holds a single data item. The data can be

    • string, numeric, boolean depending on context.

    • When scalars are understood as numbers (that is, used as numbers), they are double precision floating point numbers.

    • Scalars are given a value by the assignment operator = . For example

    • $a = “The University of Nottingham, UK”;

    • $b = 129867445;

    • $c = 3.14159;

    • $d = 03776; #octal

    • $e = 0x3fff; #hex


    Variables and data types aggregates
    Variables and Data types (Aggregates)

    • There are two aggregate datatypes, @array and %hash. Both can hold an unlimited number (as long as there is memory) of scalars, there is no explicit declaration, allocation, deallocation, or any explicit memory management

      • @array:

      • Arrays are ordered and they are indexed by a number (a scalar in numeric context)

      • %hash:

      • Hashes are unordered and they ar indexed by a string (a scalar in string context)


    Variables and data types arrays
    Variables and Data types (Arrays)

    • An @array is an aggregate for storing scalars and indexed by a number (a scalar) inside square brackets [].

    • The indexing is zero-based.

    • Also negative indices can be used, they count from the end of the array.

      • @a = (“one”, “two”, “three”);

      • print “$a[1] $a[0] $a[-1]\n”;

    • will result in

      • two one three


    Variables and data types arrays1
    Variables and Data types (Arrays)

    • If you enclose the array in double quotes, the scalars of the array are printed separated by space.

      • @a = (“one”, “two”, “three”);

      • print “@a\n”;

        will give one two three

  • Note that while an array has a type prefix of @ and element of the array is a scalar and therefore has a prefix of $:

    • @a = (“one”, “two”, “three”);

    • $a[3]= “four”;

    • print “@a $a[0]\n”;

      will give one two three four one


  • Quoting basic
    Quoting: Basic

    • Inside double quotes variables and \-constructs (like \n) are expanded, inside single quotes they are not expanded.

    • The difference between single quotes and double quotes is that single quotes mean that their contents should be taken literally, while double quotes mean that their contents should be interpreted.

    print "This string\nshows up on two lines.";

    will show: This string

    shows up on two lines

    print 'This string \n shows up on only one.';

    will show: This string \n shows up on two lines

    @a = (“one”, “two”, “three”);

    print “@a\n”, ‘@a’, “\n”;

    will show one two three

    @a


    Quoting basic1
    Quoting: Basic

    • Inside double quotes, the following are the most common \-constructs

    • \n The logical newline

    • \t The tabulator

    • \$ The dollar

    • \@ The at sign

    • \xHH Character encoded in hexadecimal

    • \010 Character encoded in octal

    • \” The double quotes

    • \\ The backslash itself


    Quoting the qw operator
    Quoting: The qw operator

    • There is a special quoting construct for quoting “words”, or strings consisting only of alphanumeric characters.

      • @a = (“one”, “two”, “three”);

      • can be written as

      • @a = qw(one two three);

      • The qw stands for “quote words”

      • Note also that all the quoting constructs are operators.


    Variables and data types arrays2
    Variables and Data types (Arrays, $#)

    • The notation $#arrayname returns the index of the farthest array element ever modified.

    • @a = qw(one two three);

    • print “The las index of \@a is $#a. \n”;

    • will show

    • The last index of @a is 2


    Variables and data types undef
    Variables and Data types (undef)

    • What is in the aggregate elements that have not been assigned a value? The undef value (the value of all uninitialized scalars, not just of uninitialized aggregate elements).

    • This can be tested using defined, explicitly assigned by using the undef() function, and should be always caught by using -w switch.

    • @a = qw(one two three);

    • if (defined $a[1]) {print “Oh, yes\n”};

    • if (defined $a[9]) {print “Impossible. \n”};

    • Usin the undef() a scalar can be “returned” to an uninitialized state.

    • @a=qw(one two three);

    • undef $a[1];

    • if (defined $a[1]) {print “Impossible. \n”}


    Important w
    Important: -w

    • The following script will print The tenth element is . And you would waste your time by wondering what went wrong

    • @ = qw (one two three);

    • print “The tenth element is $a[9]. \n”;

    • But, by adding the -w switch you would have seen this:

    • Use of uninitialized value ….

    • Use -w is strongly encouraged.

    • The -w catches not only the use of uninialized values but also other mistakes and problems, such as using a variable only once (usually indicative of a typo)


    Scalar context strings or numbers
    Scalar Context: Strings or Numbers

    • Numbers in Perl can be manipulated with the usual mathematical operations: addition, multiplication, division and subtraction. (Multiplication and division are indicated in Perl with the * and / symbols, by the way.)

      $a = 5;

      $b = $a + 10; # $b is now equal to 15.

      $c = $b * 10; # $c is now equal to 150.

      $a = $a - 1; # $a is now 4.

    • You can also use special operators like ++, --, +=, -=, /= and *=. These manipulate a scalar's value without needing two elements in an equation.

      • $a = 5;

    • $a++; # $a is now 6; we added 1 to it.

    • $a += 10; # Now it's 16; we added 10.

    • $a /= 2; # And divided it by 2, so it's 8.


    Scalar context strings or numbers1
    Scalar Context: Strings or Numbers

    • Strings in Perl don't have quite as much flexibility. About the only basic operator that you can use on strings is concatenation. The concatenation operator is the period . Concatenation and addition are two different things:

    • $a = "8"; # Note the quotes. $a is a string.

      $b = $a + "1"; # "1" is a string too.

      $c = $a . "1"; # But $b and $c have different values!

    • Remember that Perl converts strings to numbers transparently whenever it's needed, so to get the value of $b, the Perl interpreter converted the two strings "8" and "1" to numbers, then added them. The value of $b is the number 9. However, $c used concatenation, so its value is the string "81".

    • Just remember, the plus sign adds numbers and the period puts strings together.


    Context scalar v s list
    Context: Scalar v.s. List

    • A very pervasive concept is different context. Certain constructs and functions behave differently depending on the context they are used. For example, the context of the left side of the assignment operator (=) forces the right side to comply:

    • @x = qw (adc de f)

    • @a = @x;

    • $a = @x;

    • print “@a: $a \n”;

    • will print abc de f: 3

    • In scalar context the value of an array is the size of the array ($#array plus one)


    Array versus list
    Array versus List

    • The difference between an array and a list is that an array has a name and the @ type prefix, while a list is a parenthesis-enclosed comma-separated entity. In

      @ x = qw(abs de f);

      • a list is assigned to an array

        Separate Name Space

    • The name space of scalars, arrays, and hash are completely separated because the type prefix explicitly tells which one we are talking about

      • $ x = “Tyreytio”;

      • @x = qw(asd df f);


    Hashes i
    Hashes (I)

    • A hash is an unordered aggregate which holds scalars, the values of the hash, indexed by strings (scalars), the keys of the hash. The index is enclosed in curly brackets

    • %a = qw ( Nottingham 0115

    • Sheffield 0114

    • Leeds 0113);

    • print “$a{Leeds}\n”;

    • will output

    • 0113


    Hashes ii
    Hashes (II)

    • The keys and the values of a hash can be returned by the keys and values functions.

    • %a = qw (Nottingham 0115

    • Sheffield 0114

    • Leeds 0113);

    • @k = keys %a;

    • @v = values %a

    • print “@k\[email protected]\n”;

    • will possibly (the order is pseudo random) output

    • Nottingham Sheffield Leeds

    • 0115 0114 0113


    Hashes iii
    Hashes (III)

    • The existence of a key-value pair in a hash can be verified using the exists function.

    • %a = qw( Nottingham 0115

    • Sheffield 0114

    • Leeds 0113);

    • $b = exists $a{Leeds} ? 1:0;

    • $c = exists $a{Birmingham} ? 1:0;

    • print “$b $c \n”

    • will print 1 0

    • The exists cares only about the existence of the key: the value is irrelevant.


    Hashes iv
    Hashes (IV)

    • The key-value pairs of a hash can be returned iteratively (in a loop) by each function

    • %a = qw(Nottingham 0115

    • Sheffield 0114

    • Leeds 0113);

    • while (($k, $v)= each %a){

    • print “$k $v\n”;

    • }

    • will possibly print (again the order is pseudo random)

    • Nottingham 0115

    • Sheffield 0114

    • Leeds 0113


    Hashes v
    Hashes (V)

    The => operator

    • The => operator is a variant of the , (comma) operator which as a side effect forces its left operand to be a bare word, effectively a string constant with implicit single quotes around it. This is a convenient notation which is most often used when specifying the key-value pairs for a hash

    • %b = (‘English’, ‘one’);

    • %b = (English => ‘one’);

    • these are equivalent.


    Hashes vi
    Hashes (VI)

    • Hash elements or groups of hash elements (slices) can be deleted using the delete function

    • %a =(English =>”one”, French => “un”,

    • German => “ein”, Finish => “yksi”

    • Japanese => “ichi”, Chinese “yi”);

    • delete $a{German};

    • delete $a{‘French’, ‘Finish’};

    • print values %a, “\n”;

    • will print the values one ichi yi in some order


    Slices of aggregates
    Slices of Aggregates

    • In addition to accessing the aggregates either as a whole or per element, it is possible to access them by groups of elements. These groups are called slices. The syntax is @ variable indices, or in other words, @array[number] or @hash{strings}. The slice is the list of scalars at the specified indices.

    • %a =(English =>”one”, French => “un”,

    • German => “ein”, Finish => “yksi”

    • Japanese => “ichi”, Chinese “yi”);

    • @s = @a{“German”,”English”};

    • print “@s\n”;

    • Should result in: ein one

    • The order of the returned list is well defined because of the order of the indices is well-defined.


    Operators
    Operators

    • A fairly standard set of mathematical, logical, and relational operators exists, the only somewhat exceptional one is the power operator **, for example 2**3 is 8.

    • Scalars can be either numbers or strings depending on the context they are used in - and this is exactly what operators do: they force a contexr on their operands. For example, while the + forces numeric context on its operands and sums them, the . (dot) operator forces string context on its operand and concatenates them.

      • (B, $c) = (2, 3);

      • $a = $b + $c;

      • $d = $b.$c;

      • print “$a $d \n”

  • will print: 5 23


  • Operators specialties
    Operators: Specialties

    • Perl has these

    • separate sets of comparison operators for string and numeric context

    • generalized comparison operators cmp and <=>

    • low-precedence and, or and not (&&,|| and ! Are high precedence)

    • string concatenator . (dot) and string/list repeater x

    • left-quoting pseudo-comma =>, range generator ..

    • Quoting operators

    • file/directory input operator < >

    • Pattern matching, substitition, and biding operators, m,s,=~, and !~


    Operators boolean
    Operators: Boolean

    • The && and || are short-circuiting as in C: they stop evaluating their operands as soon as the first decisive value is met (The first false for &&, the first true for||)

    • There are also variants of lower precedence: and, or, xor and not


    Operators precedence
    Operators: Precedence

    • Precedence rules are much like in C, C++, or Java

    • Some confusion may stem from the fact that when calling functions (either built-in or user defined), the parentheses are not required. In other words there are equivalent:

      • $n = length ($header);

      • $n = length $header;

    • This would be easy enough to comprehend, but things gets interesting when the functions have more than one argument, and especially interesting when the number of arguments varies.

    • When in doubt, parenthesize


    Operators assignments valued and modifiable
    Operators: Assignments, Valued and Modifiable

    • Assignments have values.

      • $a += ($b=$c);

      • This copies the value of $c to $b and then adds that to $a.

    • Assignments are modifiable.

    • This copies the value of $d to $c and then adds the number of elements in @e to $c.

    • Or, in other words, the left side of an assignment can be used in further assignment (this property is often called “lvalue”, left-value).


    Operators assignments list context
    Operators: Assignments, List Context

    • Assignment also works in list context.

      • ($a, $b) = ($b, $a);

      • This swaps the values of $a and $b


    Operators comparing
    Operators: Comparing

    • For comparing scalars with each other and with literal values all the usual comparison operators exist. The catch is that there are two sets of comparison operators: One for comparing in string context, one for comparing in numeric context.

    The cmp and < = > return -1, 0, or 1 depending on whether the

    comparands fullfil the less-than, equal-to, or greater than relation


    Operators concatenate and repeat
    Operators: Concatenate and Repeat

    • Scalars can be concatenated (as string) using the . (dot) operator

      • $a = “con”;

      • $b = “nate”;

      • $c = $a . “cate”. $b;

      • will result in $c being “concatenate”.

    • Scalars and lists can be repeated using the x operator. The repetition count comes after the operator

    • $a = “yes”; @b = ($a, “No”); $c = $a x 3;@c = @bx2;

    • print “@c, $c!\n”;

      • Will print: Yes No Yes No, No No No


    Operators range generator
    Operators: Range Generator . .

    • The . . Operator can be used to generate ranges of values (as lists) between two scalar endpoints. This works both for numbers and strings. The rules ffor “incrementing strings” to generate ranges are as with the ++ operator.

      • @a = 0 . . 4;

      • @b = “aa” . . “zz”;

      • print “@a @b[0 .. 4] @b[-4 .. -1]\n”;

    • will result in 0 1 2 3 4 aa ab ac ad ae zv zw zx zz


    Operators quoting
    Operators: Quoting

    • Single quotes do not extrapolate (do not expand) variables, double quotes do.

    • Double quotes have also several special constructs triggered by using the backquotes (\), for eaxmple \n for a logical new line and \U for uppercasing the value of the following variable.

      • If ($need eq ‘urgent’) { print “\U$task\n”}

      • else {print “$task\n”}


    Operators here documents i
    Operators: Here-Documents (I)

    • A quoting mechanism called here-documents (inherited from UNIX shell scripts) enables one to easily include multiline blocks of text.

    • The syntax is <<terminator, (the terminator can be any “word”: alphanumerics and underscores) followed by the lines of text, and terminated by the terminator string at the beginning of a line and alone on the line.

      • $message = <<HERE_IS_THE_MESSAGE;

      • One

      • Two

      • Three

      • HERE_IS_THE_MESSAGE

    • This is equivalent to $ message = “One\nTwo\nThree\n”;.


    Operators here documents ii
    Operators: Here-Documents (II)

    • The here-document mechanism can be used either in single quoted way (variables not extrapolated) or in double quoted way (variable extrapolated)

    • If there are no quotes around the terminator after the <<, the singlequoted case is implicitly assumed. Explicit single quoteing can be achieved by enclosing the terminator inside single quotes.

      • $x = <<EOF

      • no @expansions

      • EOF

      • $y = <<‘EOF’

      • no @expansions

      • EOF

    • After this, $x and $y are equivalently “no \@expansions\n”


    Operators here documents iii
    Operators: Here-Documents (III)

    • If there are scalars or arrays in the here-block that need extrapolated, double quotes around the terminator will help.

      • @x = qw (a be def);

      • $x = << “EOH”;

      • This

      • @x

      • is it.

      • EOH

    • Will have $x equal to “This \na bc def\nis it\n”.


    Operators input operator
    Operators: Input Operator < >

    • After having opened a file or a directory one gets a handle. The “next item” can be read from the handle using the diamond operator < >.

      Open (X, “FileName”) or die “$0:failed to open FileName: $!\n”;

      • $line = <X>;

    • will read the first line of the file called “FileName” into $line.

    • The $0 is a special variable that contains the name of the script. The $! Is a special variable that contains the error message caused by the latest failed system call, such as trying to open a file. In numeric context, $! is the numeric error code.

    • The definition of “next item” for files is the next line and for directories, the next filename.


    Control structures
    Control Structures

    • A very rich selection of flow control structures is available.

    • Control structures control blocks, statements enclosed in { }.

      • If ($a >= $b +$c) {

      • $a = 0;

      • } else {

      • $a++;

      • }


    Control structures if unless elsif else
    Control Structures: if, unless, elsif, else

    • The if control structure works as might be expected, the following block is executed if the condition is true.

      • If ($x <=100) { …..}

    • The unless is the logical oposite of if: the block follwing it is chosen if the condition is false

      • unless ($x>100) {……}

      • if ($x <10) {……)

      • elsif ($x <20) {…..}

      • elsif ($x <30) {….}

      • else {….}


    Control structures boolean
    Control Structures: Boolean

    • If you want to use a “bare” scalar as the condition (in Boolean context), note that there is no need for a separate Boolean type. Any scalar will do as the condition test of the control structures. The rules are as follows:

    • Everything is interpreted as a string.

    • Numbers are interpreted as strings

    • undef is interpreted as the empty string “ ”

    • The strings “0” and “ ” are false

    • All other strings are true


    Control structures while until do
    Control Structures: while, until, do

    • The if and unless are one shot; if you want to keep on executing the block of statements, use the while and until

      • while ($x < 100) {…..}

      • until ($x >102) { ….}

    • If you want to test the condition after at least one round of execution has been completed, use the do

      • do {…} while ($x <100);

      • do {…} until ($x >182);


    Control structures for foreach
    Control Structures: for, foreach

    • The for can be used as in C to initialize a state, to test for termination of the loop, and do something at the end of each round

      • for ($i =0; $i <100; $i++) {…..}

    • The foreach can be used to iterate over a list of scalars

      • @a = qw(as we qwe);

      • foreach (@a) {print $_, “\n”)

    • what happens here is that the default scalar $_ is aliased in turn to each of the values in the list

    • You can use some other variable

      • @a = qw(as we qwe);

      • foreach $b (@a) {print $b, “\n”)


    Control structures foreach alaising
    Control Structures: foreach Alaising

    • The aliasing that takes place in the foreach when iterating over a list is very real: if you change the loop variable, you change the scalar in the list:

      • @a = qw(abc def ghi);

      • foreach (@a) {

      • if ($_ eq “def”){ $_=“xyz”}

      • print “@a\n”;

    • This will output: abc xyz ghi


    Control structures while handle
    Control Structures: while (<handle>)

    • A very common construct is while (<handle>) which means “while there are lines in handle keep reading them into the default scalar, $_”

      Open (X, “Input”)

      or die “$0:failed to open Input: $!\n”;

      • while (<X>) {

      • print;

      • }

      • close X;

    • The print without an explicit argument implicitly outputs the default scalar (to the standard output). The above therefore copies (prints) the contents of the file called “Input” to the standard output, and finally close the file. The above loop is shorthand for

    • wile($_= <X>){print $_;}


    Subroutines i
    Subroutines (I)

    • A subrountine is defined with the sub keyword, and adds a new function to your program's capabilities. When you want to use this new function, you call it by name. For instance, here's a short definition of a sub called boo:

      • sub boo {

    • print "Boo!\n";

    • }

    • boo(); # Eek!


    Subroutines ii
    Subroutines (II)

    • In the same way that Perl's built-in functions can take parameters and can return values, your subs can, too.

    • Whenever you call a sub, any parameters you pass to it are placed in the special array @_. You can also return a single value or a list by using the return keyword.

    • sub multiply{my (@ops) = @_;

    • return $ops[0] * $ops[1];

    • }

    • for $i (1 .. 10) {

    • print "$i squared is ", multiply($i, $i), "\n";}

    The my indicates that the variables are private to that sub, so that any existing

    value for the @ops array we're using elsewhere in our program won't get

    overwritten.


    Formats
    Formats

    • If you want to produce text reports that have a certain layout, a feature called formats allows you to create “picture” or templates of the required output formats. The templates contain both the layout and the variables to fill in the layout.

    • printf, sprintf

    • The most common formatted printing tool is printf, and its string-producing twin, sprintf

      printf “%5.3f [%-5s] [%5s]\n”, 1/7, ‘abc, ‘def’;

      will produce: 0.143 [abc ] [ def].

      $s=sprintf“%5.3f [%-5s] [%5s]\n”, 1/7, ‘abc, ‘def’;

      print $s;

    • The sprintf returns the formatted output string.


    Formats defining i
    Formats: Defining (I)

    • Formats bare defined using the format keyword.

      • Format REPORT =

      • ….

      • ..

      • .

    • The REPORT is the name of the filehandle for which the format is to be used. If not specified, STDOUT is assumed.

    • The format definition ends with a .(dot) alone at the beginning of a line


    Formats defining ii
    Formats: Defining (II)

    • Format REPORT =

    • ….

    • ..

    • .

  • The actual format definition, between the format line and the terminating dot, consists of three kinds of lines:

  • comment lines, which start begin with #

  • picture lines, which contain format strings

  • argument lines, which the arguments for the formatting strings


  • Formats defining iii
    Formats: Defining (III)

    • The picture lines are printed as-is, except for certain special strings that begin with a @ (this is not an array) or ^

      • @<<<< to left-align a string

      • @|||| to center a string

      • @>>>> to right-aling a string

      • @#.## to right-align and format a number

    • The <<<<, ||||, and >>>> can naturally be as wide as required, not just width of four as shown here. The aligning and centering will be adjusted to match the width of the field.

    • The usage of the ^-prefixed field is more complex and explained in more detail in Perl formats, perlform


    Formats defining iv
    Formats: Defining (IV)

    • The argument line contain enough variables, separated with commas, to fill in the picture lines above them

      • @<<<<<<<<<< @|||||||||| @###.###

      • $month, $project, $budget_millions


    Formats using
    Formats: Using

    • Using the formats is much easier than defining them. The built-in write function fetches the template and the variables to be used and then outputs the filled-in template.

    • The write can be called with or without a filehandle name. With a filehandle name, the format of the said filehandle is used. Without a filehandle, the format of the currently selected default output handle is used. The default output filehandle is STDOUT, but this can be changed using the select() built-in function.


    Formats using1
    Formats: Using

    A simple example using formats:

    • format STDOUT =

    • [@<<<<<<<<<<<] @#.### [@>>>>>>>>>>>>>>>]

    • $x, $y, $z

    • .

    • ($x,$y,$z) = (‘left2, 1/7, ‘right’);

    • write;

      This will produce

    • [left ] 0.143 [ right]


    List processing
    List Processing

    • For inspecting and processing lists several useful functions are available.

      • grep

      • map

      • sort

      • reverse


    List processing1
    List Processing

    • To find elements of a list that satisfy some criterion, use the grep function. It has two slightly different syntax, which work the same way. The two statements below compute the same result.

    • @b = grep b($_> $max, @a);

    • @c = grep {$_>$max} @a; #Note: no comma after the block

    • Inside the expression or the block the default scalar $_ is aliased to each element of the list (here, an array) in turn. The expression or the expressions in the block are evaluated, and if the last value is true, the element value is returned.

    • @ = (31,14,15,92,65,35,89,79);

    • @b = grep {$_ > 50} @a;

    • print “@b\n”;

    • This will results in 92 65 89 79


    Aside the command line argument @argv
    Aside: The command Line argument: @ARGV

    • This will echo those command line arguments that are greater than the first argument.

      • my $I = shift;

      • foreach (@_) {print “$_\n” if $_>$I}

    • Outside any subroutine definitions the shift without an explicit array argument array argument will process the @ARGV array, the arguments of the program.


    List processing map transform
    List Processing : map-Transform

    • To get a copy of a list with the elements transformed through some mapping, use the map function. It has, like grep, two forms, with a block (and no comma) and with an expression (and a comma). The last value will be returned for each element, and from those values a new list can be constructed.

      • @ = qw(this little piggie went to the market);

      • @b = map {length} @a;

      • print “@b\n”;

    • This will print 4 6 6 4 2 3 6

    • The length without an argument works on the $_. The $_ is again an alias, so if you modify it you’ll modify the elements of the original list.


    List processing foreach transform
    List Processing : foreach Transform

    • If you want to modify (distructively) the elements of a list, don’t wast map on it, use foreach

      • @ = 1 .. 5;

      • for each (@a) {$_**=$_}

      • print “@a\n”;

    • This will output 1 4 27 256 3125


    List processing sort
    List Processing : sort

    • One of the most common tasks in computing is arranging information, sorting. It is so important there is a highly tuned and highly tunable built-in function for it, sort. It has three alternate syntax

      • @b = sort subname @a

      • @b = sort { …. } @a;

      • B= sort @a;

    • The first two are equivalent, they carry the comparison as their first argument, either as a name of a subroutine or as an inlined block.

    • The third one has an implied sorting order: stringwise comparison.


    List processing sort1
    List Processing : sort

    • The subroutine or the block get “magically” (meaning that you don’t have to care how they appear in there) passed two arguments called $a and $b. The subroutine or the block must then compare them, and finally returned something less than zero, a zero, or something greater than zero, depending on how the comparands are ordered.

    • Examples:

    • @b = sort {$a <=> $b} @a; #Sort as numbers.

    • @c = sort {$b<=>$a} @a ; #sort as descending numbers

    • @d = sort {lc($a) cmp lc($b)} @a; #Sort case insensiively

    • #sort keys by numeric values.

    • @e = sort {$d{$a} <=> $d{$b}} keys %a;

    • The $a and $b are passed as aliases, so don’t modify them.


    List processing reverse
    List Processing : reverse

    • A list can be reversed by the reverse function.

      • @a = qw(function reverse the by reversed be can list A);

      • @b = reverse @a;

      • print “@b.\n”;

    • Will print:

      • A list can be reversed by the reverse function.


    String processing index rindex
    String Processing: index, rindex

    • To find a substring from a string, use the index function. It returns the offset of the substring (more precisely, the offset of the first occurrence). If the substring cannot be found, -1 is returned.

      • print index(“foobar”, “bar”), “\n”;

      • print index(“foobar”, “baz”), “\n”;

      • This will print:

      • 3

      • -1

    • The index doesn’t do any pattern matching, just literal strings. To find the last occurrence of a substring in a string, use the rindex function.


    String processing substr
    String Processing: substr

    • To extract and modify substrings by position and length the substr function is available

    • $a = “I would like to by a camel, please.\n”;

    • print “I like “, substr($a, 22, 5), “s, of course.\n”;

    • substr {$a, 22, 5) = “llama”;

    • print “But $a\n”;

      This will output

    • I like camels, of course.

    • But I would like to buy a llama, please.

    • The offset and length parameter can be negative, which means from the end of the string.


    String processing chomp
    String Processing: chomp

    • The < > operator leaves the newlines to lines it returns. To remove the newlines use the chop operator:

      • while (STDIN>) {

      • chomp; #Operate on $_

      • print STDERR “[$_]\n”;

      • }

    • This will remove the possible newline, and then print out the reformated lines to the standard error stream.


    String processing split
    String Processing: split

    • With the split you can cut a string into a list. The first argument of split is the separator pattern. The second, optional, argument is the string to be split.

    • @b = split(/,/,$a);

    • @c = split /:/;

    • Because the first argument is a pattern, full understanding of the split requires understanding of patterns, and therefore we will revisit split later.

    • There is a commonly used special case, however, using ‘ ‘ as the separator.

    • @b = split(‘ ‘, @a);

    • This will split @a by any number of ant whitespace characters and any leading and tailing empty fields (resulting from leading or tailing whitespace) will be dropped (Not unlike the qw operator).


    String processing join
    String Processing: join

    • You can combine several scalar strings into a single string by using the join function.

      • $record = join (“, “, @field);

    • Note: split and join are NOT symmetrical because the separator of split is a pattern, not a literal string, as the separator of join is.


    String processing reverse
    String Processing: reverse

    • The reverse function applied on a scalar produces a scalar reversed characterwise.

    • @b = reverse (“yes or no.”);

    • print $b, “\n”;

    • This will print

    • .on ro sye


    Pattern matching
    Pattern Matching

    Matching

    • Perl has very powerful pattern matching capabilities.

    • The basis of pattern matching are the regular expressions.

    • Regular expressions are a language of their own within Perl.


    Matching the m operator
    Matching: The m operator

    • The m operator is used to match a pattern against a piece of text, a scalar. It returns true if a match can be found, false if not.

    • A strange thing about the m operator is that the m is quite often not written at all, but instead written like this

      • /pattern/

    • That is, the pattern between forward slashes. The pattern can contain even whitespaces, and the whole thing is still understood as a pattern matching operator.

    • To better understand how the m operator works and how it can be expressed without the m, we need to take another look at the quoting rules.


    Aside generalized quoting i
    Aside: Generalized Quoting (I)

    • The ‘single quotes’, “double quotes”, and ‘backquotes’, are in fact just “syntactic sugar” for the real quoting operators, q, qq, and qx.

    • The syntax of the quoting operators is very flexible. Because which characters should delimit the quote depends on the contents in it (you do not want to use as a delimiter a character that appears in the quote, unless you are fond of using backslashes). The delimiters are selected “dynamically” based on which character follows the quoting operator. Either the delimiters can be of the paired kind (like the parentheses), or they can be the same character (like the usual quotes). Examples:

      • $a = q(this is a single-quoted);

      • $b= qq[this is double quoted, @variables expand];

      • $c = qx ! Echo this goes to the operating system!;

    • The qw we have met earlier is another example of the quoting operators, a quoting construct produces a list.


    Aside generalized quoting ii
    Aside: Generalized Quoting (II)

    • The matching operator m has one double-quotish argument.

      • m {pattern}

      • m /pattern/

    • The substitution operator s has two double-quotish arguments

      • s /pattern/substitute/

      • s (pattern)(substitute)


    Matching binding
    Matching: Binding

    • The string to be matched against is either the default scalar $_ or it can be something else, by using the binding operator =~ or its logical oposite, the !~.

    • By using the default scalar and leaving out the m of m, we arriuve at the very common idiom

      • if (/pattern/) {……}

    • written out in full that would be

      • if ($_ =~ m/pattern/) {……}


    Matching the basic i
    Matching: The Basic (I)

    • The basic set of regular expression matching operators is the triad

    • . any character

    • * repeat

    • | alternative

    • With these and the parentheses for grouping, you can already construct perfectly fine regular experessions.

    • Alphanumeric characters like a, b, 1, 2, match themselves

    • The . Does not really match any character: rather, it matches any non-newline character.


    Matching the basic ii
    Matching: The Basic (II)

    • Regular expressions matches sub-strings, not whole strings, unless anchored to do otherwise.

    • In other words, the pattern need not match all of the target string. For example, /q/ will match the string “sesquipedalian” just fine.

    • Neither do regular expressions care about “words” (unless instruct to), the input string is just a flat string of characters with no implicit tokenization.


    Matching the basic iii
    Matching: The Basic (III)

    • Some examples:

    • /apple/ #matches ‘apple’

    • /bana*/ #matches “ban”, “bana”, “banaa”, “banaaaa”, ….

    • /ba(na)*/ #matches “ba”, “bana”, “banana”, “bananana”, ….

    • /(orange|cheery)/ # matches “orange” or “cherry”

    • /(pear|plum)*/ #matches e.g. “pearpear”, or #“plumpear”, ….

    • Beware of further punctuation characters: many of them have special meanings in regular expressions, collectively they are called metacharacters. If you want to match a punctuation character as itself, protect it by prefixing it with a backslash: \* matches the literal *


    Matching the anchors
    Matching: The Anchors

    • If you want to match only at the beginning of the target string, or only at the end, or at both (this is what you might call “match whole words”), use the anchoring meta-characters.

      • ^ beginning

      • $ end

    • For example /^abc/ will match abc only at the beginning of the string. The $ matches either at the end of the string (if there is no newline at the end) or before the last newline, if that is the last character of the string. If you want to match for a new line, use /n.

    • Again, if if you need to have these character matched literally, backslash them: \^, \$

    • Anchors are the most often used assertions: they do not themselves extend the match but they test for the validity of a (set of) conditions.


    Matching character classes
    Matching: Character Classes

    • You could use the alternation, |, for single characters, but that would be awefuuly inefficient. If you have a known set of single characters, use character classes.

    • Enclose the set of characters inside square brackets, [ ], and you are done. Inside those brackets the metacharacters lose their specialness, for example the dot, ., is just a dot, not any more matche-any-character. For example [ab*] will match any of the a, b, or *.

    • If you need to match “any character except these”, you can negate the character class by putting ^ as the first thing after the opening bracket, for example [^abc] will match any character that is not a, b, or c.


    Matching built in character classes
    Matching: Built-In Character Classes

    • For certain commonly used character classes and their negations there are shorthands available.

      • \w any alphanumeric or the underscore

      • \d any decimal digit

      • \s any space character

      • \W any non-\w

      • \D any non-\d

      • \S any non-\s

    • These can be used either inside or outside the [] character classes.

    • Note that the definition of \w(and \W) depend on you locale. If you want, say, to be understood as a \w character, you need to have you locale correctly set up and you must say use locale, at the top of your script. (The locale system is a UNIX only feature)


    Matching further repetition
    Matching: Further Repetition

    • With the basic repetition operator * and alternating (|) with an empty string you can construct repetitions of zero or more, one or more, …, but the expression can get rather ugly. Use instead the following handy postfix notations:

      • ? Not at all or once

      • + Once or more

      • {} {n}, {min,}, {min, max}


    Matching capturing sub matches
    Matching: Capturing Sub-matches

    • Often you will want to save parts of the match to process them further.

    • Parentheses let you do that.

    • Each sub-match corresponding to a parenthesized sub-pattern is saved a way to a special variable baned $1, $2, and so on.

      • “thx1138”=~/([a-z]+) - (\d+)/

    • After this:

      • $1 = “thx”, and $2 = “1138”


    Matching word boundary assertions
    Matching: Word Boundary Assertions

    • If you need to match at word boundaries, in other words, at places where a “word”, a string of alphanumeric characters begins or ends, you can use the word boundary assertion \b.

      • “to be or not to be” = ~/\b(..t)\b/;

    • This will save “not” to the $1 because that is the only place where word boundaries are three positions apart and there is an t at the third mid-position.

    • If you want to match at a non-boundary, use \B.


    Matching not so fast stingy matching
    Matching: Not so fast: Stingy Matching

    • The default behaviour of a regular expression is to gobble up as much as possible, that is, as long a match as possible. In other words, each repetition operator extends as far as possible. This is called greedy matching.

    • Sometimes this is not what you want. If you want to stop as soon as possible, you can use stingy matching. Changing the repetition operators to stingy is as simple as appending a ? to them: *?, ??, +?.

    • Example:

      • “(a,b) (c,d)” = ~ / \((.*?)\) /

    • This will set $1 = ‘(a,b)’, not ‘(a,b) (c,d)’. The * will stop at the first ) instead of continuing to match all the way until the last possible ) at the end of the string.


    Matching case in sensitiveness
    Matching: Case (In)Sensitiveness

    • Often the case of letters may different from what you would expect. The matching operator can have modifies after the “closing quote” that affect its behaviour. One possible effect is to ignore any case differences. For example so that Perl and perl are considered equal, the modifier for this is i.

    • This

      • /pc/i

    • will match

      • pc, Pc, pC and PC


    Matching non capturing sub matches
    Matching: Non-Capturing Sub-matches

    • Sometimes you need to use parentheses just for grouping (to use the | alternation, for example) but you do not want to collect the sub-matches. The (?: …) construct is for this situation: it is exactly like the ordinary parentheses but the sub-matches are not recorded into the $1 etc.

      • While (<>) {

        • print if /^(?:de|in|im|un)/;

        • }

    • This is not set $1.

    • Note also the special < > structure. Its meaning is: “iterate over all command line arguments (@ARGV), interpret them as filenames, open the files, and iterate over all the lines in those files”.


    Matching i want them all global match
    Matching: I want them all: Global Match

    • Normally a regular expression matches only once, returning the first successful match.

    • However, if you want to keep matching, you can do that using the “global” modifier, /g. The matching remembers the position it was in a scalar which means that you can use the /g match as the control condition of a repetitive control structure such as while

      • while (/(\d+)/g) {

        • print $1, “\n”;

        • }

  • In addition to the usual Boolean (scalar) beharviour of matching operator, the /g modifier introduces a new behaviour: in list context all the posiible matches are returned.

    • @m = /(\d+)/g;

  • This will return all the numbers of $_ and assign them to @m.


  • Matching iterative matching
    Matching: Iterative Matching

    • If you need to match exactly where the previous /g (if any) left off, you can use the \G assertion.

      • while (/ \G(\d+)./g){

      • print $1, “\n”;

      • }

    • This will match only as long as the numbers are separated by one character. The offset of a match can be returned (and forged!) with the pos function.


    Matching multi line matching
    Matching: Multi-line Matching

    • Normally, regular expression match only within one line, but it is possible to match over multiple lines.

    • First you need to get a “multi-line” scalar. If you are reading your text to be matched against from file, the easiest way to do that is to set the special variable $/ as necessary. This variable controls what is a “line”. If you set it to ‘’ (an empty string), the whole paragraph (text block separated by empty lines) are read. If you set it to undef, the whole file is read as one big scalar. You will most probably want to use local to localize your change of /$.

    • Secondly, you need to decide what do you actually mean by multi-line matching. Do you mean that the dot (.) should match also newlines? Or do you mean that ^ and $ should match at the “internal” new lines, not just at the string beginning and end?

    • For these two interpretations of “multi-line”, two different modifiers are available: /s and /m. They can be used simultaneously, as is usual with modifier.: /gims is perfectly fine.

    • /m Treat target string containing newline characters a multiple lines. In this case, the anchor ^ and $ are the start and end of a line: \A and \Z anchor to the start and end of string, respectively

    • /s Treat a target string containing newline characters as a single string: I.e. dot matches any character including newline.


    Matching multi line assertions
    Matching: Multi-Line Assertions

    • Because the /m changes definitions of ^ and $ but you might still need the old definitions, they are available via alternative assertions:

    • \A beginning of the string

    • \Z end of the string, or before newline at the end

    • \z end of the string

    • The first two are like the old ^ and $, the last one is like (\Z|\n)


    Matching split revisited
    Matching: split Revisited

    • Now we know enough of matching to review split

    • The first argument of split is a pattern, like the operand of the m operator. For example:

      • @f = split (/,\s*/, $s);

    • will split on a comma, followed by any amount of white space.

    • Normally the delimiters corresponding to the pattern are thrown away, but if you want to include the delimiters, surround the pattern with parentheses:

      • $s = “foo:bar; zap:foo”;

      • print join (“, “, split(/([:;])/, $s)), “\n”;

    • This will split on colons and semicolons, but it will also return those delimiters as list elements: foo, :, bar,;,zap,:,foo.


    Matching limits of regular expressions
    Matching: Limits of regular expressions

    • Though regular expressions are powerful, there are some (seemingly) simple tasks they cannot do.

    • Most importantly, they cannot match “balanced expressions”. For example ordinary mathematical expressions, which can have arbitrarily nested structures, cannot be matched using regular expressions

    • To do that kind of paring, more than just patterns is needed: the context (for example, how deep are we right now, and inside which structure) need to be known, and regular expressions do not give that.


    Substitution the s operator
    Substitution: The s operator

    • The substitution operator s operator has two quoted arguments: the pattern and the substitution. The pattern part is exactly as with the m operator, the substitution is a double-quotish string in which you can use sub-matches like $1 from the pattern side.

    • At the pattern side you can also use constructs like \1, \2, and so on, that refer to the same sub-matches as $1, $2. The \n refer to the on going match, while the $n would refer to the sub-matches of the previous match. Do not use the \n constructs outside the pattern side of s.

    • For example:

      • s/(gold|silver|platinum)/precious metal/;

    • This will replace in $_ the first occurrence of either gold, silver or platinum with precious metal.


    Substitution global substitution
    Substitution: Global Substitution

    • For the substitution operator the “global” modifier /g means to substitute all occurrences of the pattern with the substitution, not just the first one.

    • s/(gold|silver|platinum)/precious metal/g;

    • This will replace in $_ all occurrences of either gold, silver, or platinum with precious metal.

    • You will probably want to use \b here to avoid false substitutions like precious metalfish and quickprecious metals.


    Substitution evaluating substitution
    Substitution: Evaluating Substitution

    • The substitution side of the s operator is already in double-quotish context so all variables are extrapolated, but sometimes you need to evaluate a piece of code. For this you can use the /e modifier

      • s/(\w+)/reverse($1)/eg;

    • This will replace all “words” with their reverses

    • The /e even stacks: you can have arbitrarily many of them: /eeee


    Translation i
    Translation (I)

    • If you want to map characters, you can use the tr operator

    • Like the s operator, the tr has two quoted arguments, but unlike with the s, both of the arguments of the tr are single-quotish. They also are not regular expressions.

    • The first argument is called search list, and the second argument is the replacement list.

    • Both can have a range by simply putting a - between two characters.


    Translation ii
    Translation (II)

    • To complete the search list, add the /c modifier.

    • If the translated character is not found in the replacement list (because the replacement list is too short), and the /d modifier is used, the translated character is deleted from the result. If the /s modifier is used, duplicated translated characters squashed into one.

    • #A Caesar cipher, a.k.a. rot13, in $_

    • tr/A-MN-Za-mn-z/N-ZA-Mn-za-m/;

    • $a=~tr/ //s; #squash multiple spaces into one in $a.

    • tr/0-9/ /d;#Replace all non-digits with space in $_

    • You can count the number of times a character occurs by translating it ito itself, because the tr returns the number of translations made.

    • $bangs = ($text = ~/tr/!/!/);


    File i o section 6 of notes
    File I/O (Section 6 of Notes)

    To read from or write to a file, you have to open it.

    When you open a file, Perl asks the operating system if the file

    can be accessed - does the file exist if you're trying to read it

    (or can it be created if you're trying to create a new file),

    and do you have the necessary file permissions to do what you want?

    If you're allowed to use the file, the operating system will prepare

    it for you, and Perl will give you a filehandle.

    The following statement opens the file log.txt

    using the filehandle LOGFILE:

    open (LOGFILE, "log.txt");


    File i o section 6 of notes1
    File I/O (Section 6 of Notes)

    open (LOGFILE, "log.txt") \

    or die "I couldn't get at log.txt";

    $title = <LOGFILE>;

    print "Report Title: $title";

    for $line (<LOGFILE>) {

    print $line;

    }

    close LOGFILE;


    File i o section 6 of notes2
    File I/O (Section 6 of Notes)

    Writing files

    You also use open() when you are writing to a file.

    There are two ways to open a file for writing: overwrite and append.

    To indicate that you want a filehandle for writing,

    you put a single > character before the filename you want to use.

    open (OVERWRITE, ">overwrite.txt") \

    or die "$! error trying to overwrite";

    # The original contents are gone.

    This opens the file in overwrite mode.


    File i o section 6 of notes3
    File I/O (Section 6 of Notes)

    To open it in append mode, use two > characters.

    open (APPEND, ">>append.txt") \

    or die "$! error trying to append";

    # Original contents still there,

    #we're adding to the end of the file

    Once our filehandle is open, we can use the humble print statement

    to write to it. Specify the filehandle you want to write to and

    a list of values you want to write:

    print OVERWRITE "This is the new content.\n";

    print APPEND "We're adding to the end here.\n”;

    print APPEND "And here too.\n";


    ad