This guide covers regular expressions (regex) in Perl, focusing on string matching and substitution. It starts from shell patterns such as "*.exe", introduces the =~ operator for testing a string against a pattern, and works through practical examples: Canadian postal codes, pairing input files with their output files, and parsing log files. Exercises ask the reader to write progressively more complex patterns.
Regular Expressions CISC/QCSE 810
Recognizing Matching Strings
• ls *.exe
• translates to "any set of characters, followed by the exact string '.exe'"
• Strictly speaking, "*.exe" is a shell wildcard (glob) pattern, a simpler relative of the regular expression; the equivalent regex is /\.exe$/
• The shell finds all file names that match "*.exe" and passes only those to ls
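To make the comparison concrete, here is a minimal sketch that filters a list of file names with a regular expression playing the role of "*.exe". The file names are made up for illustration, and the Perl syntax used is introduced on the following slides.

```perl
use strict;
use warnings;

# Hypothetical file names, standing in for a directory listing.
my @files = ('setup.exe', 'notes.txt', 'game.exe', 'exe.bak');

# Regex counterpart of the pattern *.exe:
#   \.   a literal dot (an unescaped . would match any character)
#   exe  the exact letters "exe"
#   \z   end of the string
my @exe_files = grep { /\.exe\z/ } @files;

print "$_\n" for @exe_files;
```

Only the two names that end in ".exe" are printed; "exe.bak" contains the letters but not at the end, so it is rejected.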
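In Perl
• In Perl, you can test whether a string matches a pattern using the =~ operator
    $s = "Cat In the Hat";
    if ($s =~ /Cat/) {
        print "Matches Cat\n";     # printed: "Cat" occurs in $s
    }
    if ($s =~ /Chat/) {
        print "Matches Chat\n";    # not printed: "Chat" does not occur in $s
    }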
Exercise 1 • Write a regexp that matches only on Canadian postal codes
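One possible starting point for this exercise, offered as a sketch rather than a full answer: it assumes the format is letter, digit, letter, optional space, digit, letter, digit, and it accepts any letter. The real scheme excludes a few letters (for example D, F, I, O, Q and U), so tightening the character classes is left as part of the exercise. The test strings are made up.

```perl
use strict;
use warnings;

# Letter, digit, letter, optional space, digit, letter, digit.
# ^ and \z anchor the pattern so nothing may come before or after it;
# /i accepts lower-case letters as well.
my $postal_re = qr/^[A-Z]\d[A-Z] ?\d[A-Z]\d\z/i;

# Hypothetical test strings: the first two should pass, the rest fail.
my @candidates = ('K7L 3N6', 'k7l3n6', 'K7L-3N6', '12345', 'K7L 3N66');

my @valid = grep { $_ =~ $postal_re } @candidates;

print "Valid: $_\n" for @valid;
```

Without the anchors, the pattern would also accept longer strings that merely contain something shaped like a postal code.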
Exercise 2 • Write a regexp that matches typical intermediate files (.o, .dvi, .tmp) • helpful if you want a systematic way to delete them
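A hedged sketch of one way to approach this exercise, using a made-up list of file names. It only reports what it would delete; passing the matching names to Perl's built-in unlink would do the actual removal.

```perl
use strict;
use warnings;

# Hypothetical file names for illustration.
my @files = ('main.c', 'main.o', 'report.dvi', 'report.tex',
             'scratch.tmp', 'demo.old');

# A literal dot, then one of the three extensions, then end of string.
# (?:...) groups the alternatives without capturing them.
my @intermediate = grep { /\.(?:o|dvi|tmp)\z/ } @files;

print "Would delete: $_\n" for @intermediate;
```

The end-of-string anchor matters here: without it, "demo.old" would match because it contains ".o".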
String Substitution
• For each input file (*.dat), check for a matching output file (<same>.out)
    @input_files = <*.dat>;
    foreach $input_file (@input_files) {
        # Copy to output name
        $output_file = $input_file;
        # Replace the trailing .dat with .out (dot escaped, anchored at the end)
        $output_file =~ s/\.dat$/.out/;
        if (! -f $output_file) {
            print "Need to create output for $output_file\n";
        }
    }
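One detail worth checking in substitutions like this is the dot: left unescaped, it matches any character, and an unanchored pattern replaces the first place it fits. The sketch below compares a loose and a strict pattern on a made-up file name.

```perl
use strict;
use warnings;

my $name = 'update.dat';    # hypothetical file name

# Copy the name, then substitute on the copy.
(my $loose  = $name) =~ s/.dat/.out/;       # . matches any character
(my $strict = $name) =~ s/\.dat\z/.out/;    # literal dot, anchored at the end

print "loose:  $loose\n";
print "strict: $strict\n";
```

The loose pattern matches "pdat" inside "update" and produces "u.oute.dat", while the strict pattern produces "update.out".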
Translating
• $s = "Alternate Ending";
• $s =~ tr/a-z/A-Z/;
• tr takes plain character lists, not regex character classes, so the square brackets are not needed
• Can also use 'uc' and 'lc' (more generic for non-English languages)
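A short sketch comparing tr with uc on the string from this slide. It also shows that tr returns the number of characters it matched, which gives a quick way to count characters; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $s = "Alternate Ending";

# Translate a copy so the original string is left untouched.
(my $upper = $s) =~ tr/a-z/A-Z/;

# With an empty replacement list, tr only counts the matching characters.
my $lower_count = ($s =~ tr/a-z//);

print "tr: $upper\n";
print "uc: ", uc($s), "\n";
print "lower-case letters: $lower_count\n";
```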
Grabbing Substrings
• Get root URL
    $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";
    # Parentheses capture the matched text into $1; use it only if the match succeeded
    if ($url =~ /(www[\w.]*)/) {
        $short_url = $1;
        print "Full URL: $url\n";
        print "Site URL: $short_url\n";
    }
End options
• s/a/A/g - global; swap all matches
    • changes "aaaba" to "AAAbA"
• Compare with s/a/A/
    • changes "aaaba" to "Aaaba"
• /tmp/i - case insensitive
    • recognizes "tmp", "Tmp", "tMP", "TMP"…
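The options above can be tried directly. This is a minimal sketch using the strings from the slide, plus one extra string ("temp") added here to show a non-match.

```perl
use strict;
use warnings;

# Without /g only the first match is replaced.
my $once = 'aaaba';
$once =~ s/a/A/;

# With /g every match is replaced; s///g returns the number of replacements.
my $all   = 'aaaba';
my $count = ($all =~ s/a/A/g);

# /i ignores case; "temp" is an extra string that should not match.
my @hits = grep { /tmp/i } ('tmp', 'Tmp', 'tMP', 'TMP', 'temp');

print "once: $once\n";
print "all:  $all ($count replacements)\n";
print "case-insensitive hits: @hits\n";
```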
Exercise • Write a regexp line that returns all the integers in the text • Can it be extended to handle floating point values?
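A possible sketch for this exercise, on a made-up sentence. In list context, a match with /g returns every captured substring, which is what makes the one-line form work. The floating-point pattern is a simple assumption (optional sign, digits, optional fractional part) and does not handle exponent notation.

```perl
use strict;
use warnings;

# Hypothetical input text.
my $text = "Ran 3 trials at -12 degrees; mean 4.75, max 10.";

# Integers: optional minus sign followed by digits.
# Note that this splits 4.75 into two separate integers.
my @ints = $text =~ /(-?\d+)/g;

# Extended: allow an optional fractional part after the digits.
my @numbers = $text =~ /(-?\d+(?:\.\d+)?)/g;

print "integers: @ints\n";
print "numbers:  @numbers\n";
```

The integer pattern reports 4 and 75 separately, while the extended pattern keeps 4.75 together and leaves the sentence-ending period after 10 alone.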
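Functions with Regex
• split
    • split /\s+/, $line;    # split on runs of whitespace
    • split /,/, $line;      # split on commas
    • split /\t/, $line;     # split on tabs
    • split //, $line;       # split into individual characters
• grep
    • @v = qw( aaa bba bbc);
    • @matches = grep /bb/, @v;    # keeps "bba" and "bbc"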
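The calls above can be run as a small script. This sketch uses made-up input strings; the grep example reuses the list from the slide.

```perl
use strict;
use warnings;

# split on runs of whitespace (spaces and tabs alike)
my @words = split /\s+/, "alpha  beta\tgamma";

# split on commas; an empty field in the middle is kept
my @fields = split /,/, "1,2,,4";

# an empty pattern splits into individual characters
my @chars = split //, "cat";

# grep keeps the elements that match the pattern
my @v       = qw( aaa bba bbc );
my @matches = grep { /bb/ } @v;

print scalar(@words),  " words: @words\n";
print scalar(@fields), " fields\n";
print "chars: @chars\n";
print "matches: @matches\n";
```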
Longer example – Log files
• Parsing log files
    195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926
    195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971
    proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358
    j3194.inktomisearch.com - - [25/Mar/2003:03:13:12 -0800] "GET /~gcs/K-12.html HTTP/1.0" 200 3235
    kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0
    j3104.inktomisearch.com - - [25/Mar/2003:03:54:43 -0800] "GET /gcs/pa.html HTTP/1.0" 200 5614
    crawl11-public.alexa.com - - [25/Mar/2003:04:51:41 -0800] "GET /gcs/clinical.html HTTP/1.0" 200 20132
    …
    livebot-65-55-208-64.search.live.com - - [24/Jul/2007:22:16:58 -0700] "GET /gcs/webstats/usage_200602.html HTTP/1.0" 200 128720
    203.129.234.42 - - [24/Jul/2007:22:22:39 -0700] "GET /gcs/status/statuscheck.html HTTP/1.1" 200 1522624
    livebot-65-55-208-65.search.live.com - - [24/Jul/2007:22:47:32 -0700] "GET /gcs/webstats/usage_200610.html HTTP/1.0" 200 132580
    …
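As a rough sketch of how such lines might be taken apart, the script below matches three lines copied from the excerpt against a single pattern and totals the bytes sent per host. The field names (host, timestamp, method, path, protocol, status, bytes) are an interpretation of the layout shown, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

# Three lines copied from the log excerpt.
my @lines = (
    '195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926',
    '195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971',
    'kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0',
);

# The /x flag allows whitespace and comments inside the pattern.
my $log_re = qr{
    ^(\S+)                        # host name or IP address
    \s+\S+\s+\S+                  # two fields shown as dashes in the excerpt
    \s+\[([^\]]+)\]               # timestamp inside square brackets
    \s+"(\S+)\s+(\S+)\s+(\S+)"    # method, path, protocol
    \s+(\d+)                      # status code
    \s+(\d+)                      # bytes sent
}x;

my %bytes_by_host;
for my $line (@lines) {
    # A failed match returns an empty list, which makes the condition false.
    if (my ($host, $when, $method, $path, $proto, $status, $bytes) = $line =~ $log_re) {
        $bytes_by_host{$host} += $bytes;
        print "$host $method $path -> $status ($bytes bytes)\n";
    }
}

print "$_: $bytes_by_host{$_} bytes in total\n" for sort keys %bytes_by_host;
```

Capturing each field in its own group keeps the values available by name after the match, which is easier to extend than a chain of split calls when fields contain spaces (as the quoted request does).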
Alternate uses
• If you write your own program with many print statements, you can
    • make the print statements meaningful
        • "Time spent on loading: 23.5s"
    • parse the output afterwards to process/store values
        • $line =~ m/: ([\d.]+)s/;
        • $time = $1;
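A minimal sketch of this idea: the first output line is the one from the slide, and the second is a hypothetical line in the same style. The last statement shows why the + belongs inside the capturing parentheses when the whole number is wanted.

```perl
use strict;
use warnings;

my @output = (
    'Time spent on loading: 23.5s',
    'Time spent on solving: 101.25s',    # hypothetical second line
);

# Store each reported time under the name of its stage.
my %time_for;
for my $line (@output) {
    if ($line =~ /^Time spent on (\w+): ([\d.]+)s$/) {
        $time_for{$1} = $2;
    }
}

print "$_: $time_for{$_} s\n" for sort keys %time_for;

# With the + outside the parentheses, the group is repeated and only
# the last repetition (a single character) is captured.
my ($last_char) = $output[0] =~ /: ([\d.])+s/;
print "capture with + outside the group: $last_char\n";
```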
Resources
• Any web search for "perl regular expression tutorial"
• Perl reg exp by example
    • http://www.somacon.com/p127.php
• Reference card
    • http://www.erudil.com/preqr.pdf
• Perl site reference
    • http://perldoc.perl.org/perlre.html