Introduction to Perl

Presentation Transcript


  1. Introduction to Perl Pawel Sirotkin 28.11-01.12.2008, Riga

  2. Overview
  • About programming
  • Why Perl?
  • How to write, how to run
  • Variables
  • Operations
  • Basic input and output
  • Conditionals and loops
  • Regular expressions
  Introduction to Perl, NLL Riga 2008, by Pawel Sirotkin

  3. About programming
  • Working with algorithms
  • A program needs to contain exact commands
    • (Mostly) not: "Go buy some bread"
    • But: "Put on your coat and shoes, open the door, go through it, close the door, go down the stairs…"
  • A program takes a certain input, processes it, and produces a certain output

  4. Why Perl?
  • Easy to learn
  • Simple syntax
  • Good at manipulating text
  • Good at dealing with regular expressions

  5. How to write a Perl program
  • Perl programs can be written in any text editor
    • Notepad, vim, even Word… (as long as the file is saved as plain text)
    • Recommended: a simple text editor with syntax highlighting
  • Write the program code
  • Save the file as xxx.pl
    • The .pl extension is not necessary, but useful

  6. What is a Perl program like?
    # This *very* simple program prints "Hello World!"
    print "Hello World!";

  7. What is a Perl program like?
  • Everything on a line after the # is a comment. It is ignored by Perl
  • What are comments for, then?
    • They are for you, and for others who will have to read the code
    • Imagine looking at a complex program in a few months and trying to figure out what it does
  • Write as many comments as you can
    # This *very* simple program prints "Hello World!"
    print "Hello World!";

  8. What is a Perl program like?
  • print is a Perl command
    • In this case, for printing text on the screen
  • Every command should start on a new line
    • Not a Perl requirement, but crucial for readability
  • Every command should end with a semicolon (;)
  • Many commands take arguments
    • Here: "Hello World!"
    # This *very* simple program prints "Hello World!"
    print "Hello World!";

  9. What to do with the program?
  • Perl works from the command line
    • Windows: "Start" → "Run…", then type cmd
  • Go to the directory where you saved the program
    • E.g.: cd C:\Perl\MyPrograms
  • Run the program:
    • perl myprogram.pl
  • See the results of your labours!

  10. Exercise (1)
  • Create a folder for your Perl programs
  • Open the editor of your choice and write the "Hello World" program
    • The command is print "Hello World!";
    • Don't forget the comment!
  • Save the program
  • Run it!
  • What happens if you misspell the print command?

  11. Variables
  • The "Hello World" program always has the same output
    • Not a very useful program, as such
  • We need to be able to change the output
  • Variables are named containers that can hold different values

  12. Defining variables
    # We define a variable "a" and assign it a value of 42
    $a = 42;
  • To define a (scalar) variable, write a dollar sign followed by the variable's name
  • Names should consist of letters, digits and the underscore
  • They should start with a letter (or an underscore)
  • Variable names are case-sensitive!
    • $a and $A are different variables!
  • Generally, a variable's name should tell you what the variable holds

  13. Defining variables
    # We define a variable "a" and assign it a value of 42
    $a = 42;
  • Variables can be assigned values:
    • Strings: text (character sequences) in single or double quotes
    • Numbers
    $a = 42;
    $a = "some text";
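A minimal sketch of the variable rules from slides 12 and 13. It adds use strict, use warnings and my declarations, which the slides do not use but which are commonly recommended. It also avoids the names $a and $b, because Perl uses those two as special variables inside sort; the descriptive names below are only illustrative.

```perl
use strict;
use warnings;

my $answer   = 42;            # a number
my $greeting = "some text";   # a string in double quotes
my $Answer   = 'forty-two';   # a different variable: names are case-sensitive

print "$answer / $Answer / $greeting\n";   # 42 / forty-two / some text
```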

  14. Changing variables
  • Arithmetic operations
    $a = 42 / 2;    # division
    $a = 42 + 5;    # addition
    $a = $b * 2;    # multiplication
    $a = $a - $b;   # subtraction
  • Also useful:
    $a += 42;       # the same as $a = $a + 42;
    • The same works for -=, *= and /=
  • String operations
    $a = "some" . " text";    # concatenation
    $a = $a . " more text";
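The operations from slide 14 can be traced step by step in a small sketch. The variable names are illustrative, and .= (the string counterpart of +=) is an extra operator that the slide does not mention.

```perl
use strict;
use warnings;

my $step = 10;
my $x = 42 / 2;      # division: 21
$x = 42 + 5;         # addition: 47
$x = $step * 2;      # multiplication: 20
$x = $x - $step;     # subtraction: 10
$x += 42;            # same as $x = $x + 42: 52
$x -= 2;             # same as $x = $x - 2: 50
$x *= 2;             # same as $x = $x * 2: 100
$x /= 4;             # same as $x = $x / 4: 25

my $text = "some" . " text";   # concatenation: "some text"
$text .= " more text";         # same as $text = $text . " more text"

print "$x\n$text\n";
```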

  15. Basic output
  • We have already seen an output command
    print "text";
    print $a;
    print "text $a";
    print "text " . ($a + $b) . " more text.";
  • The parentheses are needed: . and + have the same precedence, so without them Perl would concatenate "text " and $a first and then try to add $b to the result
  • Special characters:
    • \n – new line
    • \t – tab
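A short sketch of the precedence pitfall from slide 15, with illustrative variable names. Without the parentheses, Perl would first build "We have 3" and then try to add 4 to that string, treating it as the number 0 (and warning about it under use warnings).

```perl
use strict;
use warnings;

my ($apples, $pears) = (3, 4);

# "." and "+" share the same precedence and are evaluated left to right,
# so the sum is wrapped in parentheses before it is concatenated.
my $line = "We have " . ($apples + $pears) . " fruits.\n";
print $line;                                  # We have 7 fruits.
print "apples:\t$apples\npears:\t$pears\n";   # \t = tab, \n = new line
```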

  16. Exercise (2)
  • Define a variable
  • Assign it a value of 15
  • Print it
  • Double the value
  • Print it again
  • Define another variable with the string "apples"
  • Print both variables
  • Change the first variable to its square and the second to "pears"
  • Print both variables

  17. Basic input
  • The <> operator returns a line of input from the standard source (usually the keyboard)
  • Syntax:
    $a = <>;
  • Don't forget to tell the user what he's supposed to enter!
  • Try the following program:
    # This program asks the user for his name and greets him
    print "What is your name? ";
    $name = <>;
    print "Hello $name!";

  18. Input, output and new lines
  • As the user input is followed by the [Enter] key, the string in $name ends in a new line
  • The chomp function deletes the new line at the end of a string
  • Try the following, modified program:
    # This program asks the user for his name and greets him
    print "What is your name? ";
    $name = <>;
    chomp($name);
    print "Hello $name!";
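To see what chomp removes without typing anything, the sketch below fakes the keyboard with an in-memory filehandle. This is an assumption made for the demo only; a real program would pass a handle such as \*STDIN. The helper read_name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one line from a filehandle and removes
# the trailing new line left by the [Enter] key.
sub read_name {
    my ($fh) = @_;
    my $name = <$fh>;
    chomp($name);
    return $name;
}

# Simulated keyboard input (demo only).
my $input = "Pawel\n";
open(my $keyboard, '<', \$input) or die "Cannot open in-memory input: $!";
my $name = read_name($keyboard);
print "Hello $name!\n";   # Hello Pawel!
```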

  19. Exercise (3)
  • Let the user enter the radius of a circle
  • Tell him the diameter (2r), circumference (2πr) and area (πr²) of the circle
    • Try doing this using one variable for each measure
    • Try doing this using only one variable

  20. If, else
  • Until now, the program has always followed the same fixed path
  • The if clause allows us to take different actions in different circumstances
    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    } else {
        print "Wrong password! Access denied.";
    }

  21. If, else
  • Note: = is the assignment operator, == is the (numeric) comparison operator
  • else is optional; its block runs when the if condition is false
    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    } else {
        print "Wrong password! Access denied.";
    }
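== compares numbers. Text answers such as "circle" or "square" need the string operator eq instead, because == turns any two non-numeric strings into 0 and then reports them as equal. The sketch below shows if/else with eq. The function greet and its language codes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical example: choose a greeting from a language code typed by the user.
sub greet {
    my ($language) = @_;
    chomp($language);             # input from <> ends in a new line
    if ($language eq "lv") {      # eq compares strings
        return "Sveiki!";
    } else {
        return "Hello!";
    }
}

print greet("lv\n"), "\n";   # Sveiki!
print greet("en\n"), "\n";   # Hello!
```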

  22. Exercise (4)
  • Try out the password program.
    • Why doesn't it work correctly? Fix it.
  • Tell the user if the number he entered is too large or too small
    • Hint: the comparison operators you'll need are < and >
  • Ask the user for a geometric shape (circle or square), and then for a radius or side length. Return the area and perimeter.

  23. While
  • What if we want to keep checking until something happens?
  • The while loop repeats its commands as long as its condition is true
  • Note: in the example below, $password starts out with no value, so it is certainly not 42 and the loop runs at least once
    # Now on to a "while" loop
    while ($password != 42) {
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";
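Here is a sketch of the same loop that runs without a keyboard. Three attempts are fed in through an in-memory filehandle, which is a demo assumption, and the loop also stops if the input runs out.

```perl
use strict;
use warnings;

# Simulated user input: three attempts, one per line (demo assumption;
# a real program would read from <> or STDIN).
my $attempts = "7\n13\n42\n";
open(my $in, '<', \$attempts) or die "Cannot open in-memory input: $!";

my $password = 0;   # start with a value that is certainly not 42
my $tries    = 0;
while ($password != 42) {         # repeat as long as the condition is true
    my $line = <$in>;
    last unless defined $line;    # stop if the input runs out
    chomp($line);
    $password = $line;
    $tries++;
}

if ($password == 42) {
    print "Correct password after $tries tries! Welcome.\n";
} else {
    print "No more input.\n";
}
```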

  24. Exercise (5)
  • Write a small game: pick a number and let the user guess it. Tell him if the guess is too high or too low. If the user gets it right, the program terminates.
  • If you like, you can use a random number (here, a whole number from 0 to 9):
    $random = int(rand(10));
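The random-number hint works like this: rand and int are built-in functions, and the 1-to-6 variant is only an illustration of shifting the range.

```perl
use strict;
use warnings;

# rand(10) returns a fractional number from 0 up to (but not including) 10;
# int() drops the fractional part, so $random is a whole number from 0 to 9.
my $random = int(rand(10));

# Shifting the range: a whole number from 1 to 6, like a die.
my $die_roll = 1 + int(rand(6));

print "$random $die_roll\n";
```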

  25. Perl regular expressions
  • Regular expressions are very useful for text processing
  • Perl matching operator: =~
  • Perl non-matching operator: !~
  • The regular expression goes between slashes: /regex/
  • The program below accepts any password that contains the characters "42" anywhere
    # A "while" loop with regular expressions
    while ($password !~ /42/) {   # While the entered line doesn't contain "42"
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";
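A minimal sketch of =~ and !~ applied to a few fixed strings instead of user input. The helper is_accepted is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the password contains "42" anywhere, 0 otherwise.
sub is_accepted {
    my ($password) = @_;
    return $password =~ /42/ ? 1 : 0;   # =~ : does the string match?
}

for my $password ("42", "my42secret", "4-2", "hello") {
    my $verdict = is_accepted($password) ? "accepted" : "denied";
    print "$password: $verdict\n";
}

# !~ is simply the negation of =~
print "no 42 here\n" if "hello" !~ /42/;
```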

  26. Perl regular expressions
  • Simple string: some text
  • One of a number of symbols: [aA]
    • Matches a or A
    • Also possible: [tT]he, matching the or The
  • One of a range of symbols: [a-h][1-8]
    • Matches any two-character string from a1 to h8
  • Special characters
    • ^ matches the beginning of a line
    • $ matches the end of a line

  27. Perl regular expressions
  • More special characters
  • Wildcard: the dot . matches any single character (except a new line)
    • b.d matches bad, bed, bid, bud…
    • Don't forget: it also matches forbid, badly…
  • + matches one or more occurrences of the previous character
    • re+d matches red and reed (and also reeed and so on!)
  • * matches zero or more occurrences of the previous character
    • bel* matches be, bel and bell (and belll…)
  • ? matches zero or one occurrence of the previous character
    • soo?n matches son or soon

  28. Perl regular expressions
  • Character classes
  • \d: digits
    • Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
  • \w: "word characters" – letters, digits, _
    • \w+ \w+ – any two "words" separated by a blank
  • \s: any whitespace (blanks, tabs, new lines)
    • ^\s*\d – any line whose first non-whitespace character is a digit
  • Capitalize the symbols to get the opposite
    • \S is anything but whitespace, \D are non-digits…
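The patterns from slides 26 to 28 can be tried against sample strings in one go. The sketch stores each pattern with qr// in a list of array references. Both features go beyond the slides, and the test strings are made up.

```perl
use strict;
use warnings;

# Each entry: a pattern from slides 26-28 (stored with qr//) and a test string.
my @checks = (
    [ qr/[tT]he/,         "The cat"      ],   # one of a set of symbols
    [ qr/^[a-h][1-8]$/,   "e4"           ],   # ranges, anchored with ^ and $
    [ qr/^[a-h][1-8]$/,   "i9"           ],
    [ qr/b.d/,            "forbid"       ],   # . matches any single character
    [ qr/^re+d$/,         "reeed"        ],   # + : one or more
    [ qr/^bel*$/,         "be"           ],   # * : zero or more
    [ qr/^soo?n$/,        "sooon"        ],   # ? : zero or one
    [ qr/Rule \d+/,       "see Rule 334" ],   # \d : a digit
    [ qr/^\w+ \w+$/,      "hello world"  ],   # \w : a word character
    [ qr/^\s*\d/,         "   7 apples"  ],   # \s : whitespace
);

my @verdicts;
for my $check (@checks) {
    my ($pattern, $string) = @$check;
    my $matched = $string =~ $pattern ? 1 : 0;
    push(@verdicts, $matched);
    print "'$string': ", ($matched ? "match" : "no match"), "\n";
}
```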

  29. Exercise (6)
  • Write a program which asks the user for his e-mail address.
  • Check if the address is syntactically correct.
  • Possible rules:
    • Must contain an @ character
    • At least one symbol before it
    • Must contain a dot
    • At least two symbols between @ and .
    • At least two symbols after .
    • No fancy symbols like {§*
  • Do you accept addresses with more than one dot?

  30. Perl regular expressions
  • Switches
    • Tell Perl how to deal with the regular expression
  • /regex/i: ignore lower/upper case
    • /wiebke/i matches Wiebke and wiebke
  • s/regex/replacement/: substitute the first match of regex with replacement
    • $text =~ s/Mark/Euro/
  • /regex/g: apply to every match in the string, not just the first
    # What the //g switch does
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    $text2 = $text;
    $text =~ s/Mark/Euro/;     # "The meat costs 10 Euro, the fish costs 15 Mark."
    $text2 =~ s/Mark/Euro/g;   # "The meat costs 10 Euro, the fish costs 15 Euro."
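The sketch below runs the /i and /g switches. It also relies on a detail that is handy for counting exercises: in scalar context, s///g returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $text  = "The meat costs 10 Mark, the fish costs 15 Mark.";
my $text2 = $text;                         # an independent copy

$text =~ s/Mark/Euro/;                     # without /g: first match only
my $replaced = ($text2 =~ s/Mark/Euro/g);  # with /g: every match; returns the count (2)

my $found = ("Wiebke" =~ /wiebke/i) ? 1 : 0;   # /i: ignore upper/lower case

print "$text\n$text2\n$replaced substitutions, found: $found\n";
```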

  31. Perl regular expressions
  • Grouping
    • Lets us use the matched string
    • /(text)/ matches text and stores it in a variable
    • The first group is stored in $1, the second in $2...
    # Substitution and grouping
    $sum = 0;   # initializing the variable with zero
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    while ($text =~ s/(\d+) Mark/$1 Euro/) {   # digits, a space, then "Mark"
        $sum = $sum + $1;                       # adding the amount to $sum
    }
    print "Substituted $sum Mark for Euro!";
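Groups can also be collected without changing the text. In this sketch, a match with /g inside a while loop resumes after the previous match each time, and two groups fill $1 and $2. This is an alternative to the substitution loop on the slide, not a replacement for it.

```perl
use strict;
use warnings;

my $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
my $sum  = 0;

# With /g in a while loop, each match continues after the previous one.
# Group 1 captures the item, group 2 the amount.
while ($text =~ /(\w+) costs (\d+) Mark/g) {
    print "$1: $2 Mark\n";
    $sum += $2;
}
print "Total: $sum Mark\n";   # Total: 25 Mark
```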

  32. Reading files
  • What if we want to have input from a file, not from the user?
  • Open file for reading:
    open(INPUT, "<file.ext") or die "Cannot open file.ext: $!";
  • Read a line:
    $line = <INPUT>;
    $line = <>;   # is just a special case

  33. Writing files
  • What if we want to print to a file, not to the screen?
  • Open file for writing (an existing file is overwritten):
    open(OUTPUT, ">file.ext") or die "Cannot open file.ext: $!";
  • Write:
    print OUTPUT "Some text...";
  • Close the file when you are done:
    close(OUTPUT);

  34. Reading files
  • A program for testing e-mail addresses
  • Note: if we want to use a special character literally, we need to escape it with a backslash
    • In double-quoted strings: " $ @ and the backslash \ itself (e.g. \" for a literal quote)
    • In regular expressions: . + * ? ^ $ ( ) [ ] { } | and the backslash \ itself
    open(INPUT, "<test.txt") or die "Cannot open test.txt: $!";
    while ($line = <INPUT>) {
        chomp($line);
        if ($line =~ /^.+@..+\...+$/) {   # testing for e-mail: x@xx.xx
            print "\"$line\" is a valid e-mail address.\n";
        } else {
            print "E-mail address \"$line\" not valid.\n";
        }
    }
    close(INPUT);
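Slides 32 to 34 combine into a write-then-read round trip. The sketch uses lexical filehandles with the three-argument form of open, and it asks the core module File::Temp for a scratch file name instead of using test.txt. The sample addresses are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used here only to get a scratch file name

my ($tmp, $filename) = tempfile(UNLINK => 1);
close($tmp);

# Writing: ">" creates the file or overwrites it.
open(my $out, '>', $filename) or die "Cannot write $filename: $!";
print $out "anna\@example.lv\n";   # \@ stops Perl from reading @example as an array
print $out "no-at-sign.lv\n";
print $out "x\@y.z\n";
close($out);

# Reading: one line per loop iteration until the end of the file.
my @valid;
open(my $in, '<', $filename) or die "Cannot read $filename: $!";
while (my $line = <$in>) {
    chomp($line);
    if ($line =~ /^.+\@..+\...+$/) {   # same rule as the slide: x@xx.xx
        push(@valid, $line);
        print "\"$line\" is a valid e-mail address.\n";
    } else {
        print "E-mail address \"$line\" not valid.\n";
    }
}
close($in);
```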

  35. Exercise (7)
  • Make a text file and fill it with a Wikipedia article
  • Count the number of definite and indefinite articles
  • Count the number of numbers and digits
  • Insert a <number!> tag before every number

  36. Arrays
  • Arrays contain ordered lists of values
  • Syntax:
    @days = ("Monday", "Tuesday", "Friday");
    $days[0] = "Saturday";
    $day = $days[2];
  • Useful for storing linear sequences of values
  • Note: @ for whole lists, $ for single elements (counting starts at 0)

  37. Arrays
  • Useful array commands
    • push(@array, "element");
      • Adds a new element to the end of the array
      • Creates the array if necessary
    • $element = pop(@array);
      • Removes the last element of @array and stores it in $element
    # Trying out arrays
    @tags = ("N", "V", "Adj");
    $tag1 = pop(@tags);          # $tag1 is now "Adj", @tags is ("N", "V")
    $tag2 = pop(@tags);          # $tag2 is now "V", @tags is ("N")
    push(@tags, $tag2, $tag1);   # @tags is now again ("N", "V", "Adj")
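The examples from slides 36 and 37 can be run together. scalar(@days), which gives the number of elements, is an addition not shown on the slides.

```perl
use strict;
use warnings;

my @days = ("Monday", "Tuesday", "Friday");   # parentheses build the list
$days[0] = "Saturday";                        # indices start at 0
my $day   = $days[2];                         # "Friday"
my $count = scalar(@days);                    # number of elements: 3

my @tags = ("N", "V", "Adj");
my $tag1 = pop(@tags);        # "Adj"; @tags is now ("N", "V")
my $tag2 = pop(@tags);        # "V";   @tags is now ("N")
push(@tags, $tag2, $tag1);    # @tags is ("N", "V", "Adj") again

print "@days | $day | $count | @tags\n";   # Saturday Tuesday Friday | Friday | 3 | N V Adj
```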

  38. Hashes
  • Hashes are associative arrays
  • They are lists where the elements are not ordered, but identified by a "name" (a key)
  • Syntax:
    %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
    $probability{"noun"} = 0.52;
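This sketch builds the hash from slide 38 with the => notation, which behaves like a comma but quotes the word on its left. It then sorts the keys, since hash order is not fixed, and checks for a key with exists.

```perl
use strict;
use warnings;

my %probability = (verb => 0.32, adjective => 0.02, adverb => 0);
$probability{"noun"} = 0.52;          # adds a new key/value pair

# Hash order is not fixed, so sort the keys for predictable output.
for my $pos (sort keys %probability) {
    print "$pos\t$probability{$pos}\n";
}

print "noun is known\n" if exists $probability{"noun"};
```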

  39. Exercise (8)
  • What happens if you try to print an array?
  • What about a hash?
  • What happens if you convert an array into a hash, or the other way round?

  40. Practical: Tokenizer
  • Take a Wikipedia article and put it into a text file
  • Clean it up if necessary
  • Tokenize it!
    • We only want one word per line
    • Insert a "sentence boundary" symbol where appropriate
  • The output should be another file
  • Think about what choices you make and why!

  41. Practical: Tagger
  • Take the POS-annotated corpus from treebank.txt
  • Clean and tokenize it
  • Estimate the tag-token probabilities from counts
  • Estimate the transition probabilities from counts
    • For a first attempt, I strongly recommend bigrams
  • Apply the Viterbi algorithm and tag an input file of your choice!
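One possible way to get the two kinds of counts, using nested hashes, is sketched below. It assumes a hypothetical word/TAG input format and a tiny made-up sample rather than the real treebank.txt. It computes plain relative frequencies with no smoothing and no sentence-start symbol, so transitions are counted straight across sentence boundaries. The Viterbi step itself is not shown.

```perl
use strict;
use warnings;

# Tiny hand-made sample in a hypothetical word/TAG format (not treebank.txt).
my $corpus = "the/DT dog/NN barks/VBZ ./. the/DT cat/NN sleeps/VBZ ./.";

my (%tag_count, %emission, %transition, %outgoing);
my $previous;                                   # tag of the preceding token
for my $token (split ' ', $corpus) {
    my ($word, $tag) = $token =~ m{^(.+)/([^/]+)$} or next;
    $tag_count{$tag}++;
    $emission{$tag}{$word}++;                   # how often $tag is realised as $word
    if (defined $previous) {
        $transition{$previous}{$tag}++;         # how often $tag follows $previous
        $outgoing{$previous}++;
    }
    $previous = $tag;
}

# Relative frequencies (maximum-likelihood estimates, no smoothing).
for my $from (sort keys %transition) {
    for my $to (sort keys %{ $transition{$from} }) {
        printf "P(%s | %s) = %.2f\n", $to, $from,
            $transition{$from}{$to} / $outgoing{$from};
    }
}
printf "P(dog | NN) = %.2f\n", $emission{NN}{dog} / $tag_count{NN};
```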

  42. Practical: Tagger++
  • If it's still too easy, or if you want a long-term aim:
  • Implement smoothing: words can occur with tags you haven't seen them with, or in contexts you have never seen them in
  • Try to figure out a better way to guess the tags of unknown words
  • Write a program to train on 9/10 of the corpus, and test it on the rest
    • Compare your results to the actual annotations
    • Do this 10 times, once for each possible 9/10 split
  • Still too easy? Implement trigrams and compare the results.
