
96-Summer Bioinformatics Programming Practicum (II)




  1. 96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl • 8/13~8/22 蘇中才 • 8/24~8/29 張天豪 • 8/31 曾宇鳯

  2. Schedule

  3. Regular expression File handle

  4. File handle • Reserved file handle • File manipulation • File test operator • File status • Localtime

  5. Reserved file handle • STDIN • STDOUT • STDERR • DATA • ARGV • ARGVOUT

  6. File handle - open • Input • open SEQ, "seq.txt"; • open SEQ, "< seq.txt"; • Output • open SEQ, "> seq.txt"; • Appended output • open LOG, ">> log.txt";

  7. File handle - close • Input/Output • close SEQ; • close LOG;
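
The three open modes and close can be tried together. The following is a minimal sketch, assuming a scratch directory from the core File::Temp module so that no real file is touched; the file name log.txt and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the script ends (assumption of this sketch).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/log.txt";

open LOG, "> $file" or die "cannot create $file: $!\n";      # output: create or truncate
print LOG "first\n";
close LOG;

open LOG, ">> $file" or die "cannot append to $file: $!\n";  # appended output
print LOG "second\n";
close LOG;

open LOG, "< $file" or die "cannot read $file: $!\n";        # input
my @lines = <LOG>;                                           # read every line at once
close LOG;

print "read ", scalar(@lines), " lines\n";
```

Opening with ">" a second time instead of ">>" would discard the first line, which is the practical difference between the two output modes.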

  8. File handle - die • Error handling • die "<your error message>"; • $! : system error message • Example

         #!/usr/bin/perl -w
         # log.pl : try to append to a read-only file
         open LOG, ">> disorder.fa" or die "LOG ERROR:$!\n";
         # write log
         close LOG;

  9. File handle - warn • Warning handling • warn "<your warning message>"; • $! : system error message • Example • open LOG, ">> disorder.txt" or warn "LOG ERROR:$!";

  10. File copy

          #!/usr/bin/perl -w
          # copy1.pl : copy data from the input file into the output file
          open INPUT, "<disorder.fa" or die "disorder.fa can't be opened\n";
          open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
          my $line;
          while ( $line = <INPUT> ) {
              chomp $line;
              print OUTPUT "$line\n";
          }
          close INPUT;
          close OUTPUT;
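
The copy loop can also be written with the three-argument form of open and lexical filehandles, a common alternative to bareword handles. This is a minimal sketch under that assumption, not the slide's own method; the sub name copy_file is hypothetical.

```perl
use strict;
use warnings;

# copy_file is a hypothetical helper name, not part of the slides.
# It copies $src to $dst line by line and returns the number of lines copied.
sub copy_file {
    my ($src, $dst) = @_;
    open my $in,  '<', $src or die "$src can't be opened: $!\n";
    open my $out, '>', $dst or die "$dst can't be created: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        print {$out} $line;    # the line still carries its "\n", so no chomp is needed
        $count++;
    }
    close $in;
    close $out or die "$dst can't be closed: $!\n";
    return $count;
}
```

A call such as `copy_file("disorder.fa", "temp.fa")` would mirror copy1.pl. Keeping the mode in its own argument means a file name that starts with "<" or ">" is not mistaken for a mode.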

  11. File test operators (1/3)

  12. File test operators (2/3)

  13. File test operators (3/3)
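
A few commonly used file test operators can be sketched as follows. The selection (-e, -f, -d, -r, -w, -z) is an assumption about what is most useful here, since the operator tables from these slides are not reproduced in the transcript; the sub name describe_path is hypothetical.

```perl
use strict;
use warnings;

# describe_path is a hypothetical helper that reports a few file tests.
sub describe_path {
    my ($path) = @_;
    return "missing" if not -e $path;      # -e : the path exists
    my @facts;
    push @facts, "file"      if -f $path;  # -f : plain file
    push @facts, "directory" if -d $path;  # -d : directory
    push @facts, "readable"  if -r $path;  # -r : readable by the current user
    push @facts, "writable"  if -w $path;  # -w : writable by the current user
    push @facts, "empty"     if -z $path;  # -z : exists and has size zero
    return join ", ", @facts;
}

print describe_path("disorder.fa"), "\n";
```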

  14. File copy +

          #!/usr/bin/perl -w
          # copy2.pl : copy data from the input file into the output file
          if (not -e "disorder1.fa") {
              die "disorder1.fa does not exist\n";
              print "continue to open disorder1.fa\n";   # never reached: die ends the program
          }
          open INPUT, "<disorder1.fa" or die "disorder1.fa can't be opened\n";
          if (-e "temp.fa") {
              warn "temp.fa already exists\n";
              print "continue to write temp.fa\n";       # reached: warn does not stop the program
          }
          open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
          my $line;
          while ( $line = <INPUT> ) {
              chomp $line;
              print OUTPUT "$line\n";
          }
          close OUTPUT;
          close INPUT;

  15. Exercise File handle

  16. File size • Get the size of a file • my $size = -s "disorder.fa"; • Check file size • if ( -s "disorder.fa" > 5*1024) { … } • if ($size=-s "disorder.fa" > 5*1024) { print "disorder.fa has $size bytes\n";} • What's the value of $size ? Why ?
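
The question about $size comes down to operator precedence: the comparison is evaluated before the assignment. The sketch below illustrates this, assuming a 6000-byte scratch file created with the core File::Temp module in place of disorder.fa; the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a 6000-byte scratch file so the example does not depend on disorder.fa.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "A" x 6000;
close $fh;

# '>' binds more tightly than '=', so $flag receives the result of the comparison.
my $flag = -s $path > 5 * 1024;

# Parentheses force the assignment first; the comparison then uses the stored size.
my $size;
if (($size = -s $path) > 5 * 1024) {
    print "file has $size bytes (flag was $flag)\n";
}
```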

  17. Exercise – linenumber.pl • Input (disorder.fa)

          >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
          MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
          TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
          ...
          EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
          GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
          DALGIDEYGG

      • Output

          1 >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
          2 MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
          3 TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
          ...
          128 EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
          129 GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
          130 DALGIDEYGG

  18. Regular expression File status, localtime

  19. File information - stat

  20. File status

          #!/usr/bin/perl -w
          # stat.pl : show the information of the file
          my $fn = shift @ARGV;
          die "please enter a filename\n" if (not defined($fn));
          die "$fn does not exist\n" if (not -e $fn);
          my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
              $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
          print "device = $dev\n";
          print "inode = $ino\n";
          print "mode = $mode\n";
          print "hard links = $nlink\n";
          print "user id = $uid\n";
          print "group id = $gid\n";
          print "rdev = $rdev\n";
          print "size = $size\n";
          print "atime = $atime\n";
          print "mtime = $mtime\n";
          print "ctime = $ctime\n";
          print "block size = $blksize\n";
          print "blocks = $blocks\n";

  21. Local time

          #!/usr/bin/perl -w
          # localtime1.pl : show the readable time of the file
          my $fn = shift @ARGV;
          die "please enter a filename\n" if (not defined($fn));
          die "$fn does not exist\n" if (not -e $fn);
          my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
              $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
          my $alocal = localtime $atime;
          my $mlocal = localtime $mtime;
          my $clocal = localtime $ctime;
          print "atime = $alocal\n";
          print "mtime = $mlocal\n";
          print "ctime = $clocal\n";

  22. Local time +

          #!/usr/bin/perl -w
          # localtime2.pl : show the user-defined time of the file
          my $fn = shift @ARGV;
          die "please enter a filename\n" if (not defined($fn));
          die "$fn does not exist\n" if (not -e $fn);
          my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
              $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
          my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
          print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";

  23. Local time • $sec : 0~59 • $min : 0~59 • $hour : 0~23 • $day : 1~31 • $mon : 0~11 (0 = January) • $year : years since 1900 (add 1900 for the calendar year) • $wday : 0 (Sunday) ~ 6 (Saturday) • $yday : 0 (Jan 1) ~ 364 or 365 • $isdst : daylight saving time flag (positive if in effect, zero otherwise)
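
The offsets in the field list matter as soon as the values are printed. The following is a minimal sketch of turning the list returned by localtime into a calendar-style string; the sub name format_fields and the output layout are assumptions made for illustration.

```perl
use strict;
use warnings;

my @week_days = qw(Sun Mon Tue Wed Thu Fri Sat);

# format_fields is a hypothetical helper. It takes the values that localtime
# returns in list context and builds a string like "YYYY/M/D hh:mm:ss (Day)".
sub format_fields {
    my ($sec, $min, $hour, $day, $mon, $year, $wday) = @_;
    return sprintf "%d/%d/%d %02d:%02d:%02d (%s)",
        $year + 1900,    # $year counts from 1900
        $mon + 1,        # $mon counts from 0 (January)
        $day, $hour, $min, $sec,
        $week_days[$wday];
}

print format_fields(localtime), "\n";    # the current time
```

Passing `localtime $mtime` instead of a bare `localtime` would format a file's modification time in the same way.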

  24. Exercise localtime

  25. Quiz – localtime

          my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
          print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";
          # prints: mtime = (107/7/2 10:10:16 (4;213;0)

          my $mlocal = localtime $mtime;
          print "mtime = $mlocal\n";
          # prints: mtime = Thu Aug 2 10:10:16 2007

          my ($mlocal) = localtime $mtime;
          # ?
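
The quiz turns on context: localtime returns one string in scalar context and a list of fields in list context. Here is a small sketch of the three cases, assuming an arbitrary epoch value in place of a real file's mtime.

```perl
use strict;
use warnings;

my $mtime = 1186020616;    # an arbitrary epoch value chosen for this sketch

my $as_string = localtime $mtime;    # scalar context: one readable string
my ($first)   = localtime $mtime;    # list context: only the first field (seconds) is kept
my @fields    = localtime $mtime;    # list context: all nine fields

print "scalar : $as_string\n";
print "first  : $first\n";
print "fields : ", scalar(@fields), "\n";
```

The parentheses around `$first` are what create the list context, so the variable ends up holding the seconds field and not the readable string.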

  26. Exercise • How can the time information of disorder.fa be shown as "2007/8/2 10:10:16 (Thu)" ? • Hint: year, month and weekday • @weekDays = qw(Sun Mon Tue Wed Thu Fri Sat); • How can the time information of disorder.fa be shown as "Aug 2 2007 10:10:16 (Thu)" ? • @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

  27. Regular expression Basic

  28. How to search a word in a text file ? • Unix command • grep • Perl • Regular expression

  29. An example of Regular expression

          #!/usr/bin/perl -w
          # google1.pl : check string with/without a certain pattern
          while (1) {
              print "Please enter your query:";
              $line = <>;
              last unless defined $line;    # stop at end of input
              if ($line =~ /google/) {
                  print "Found!!!\n";
              } else {
                  print "No match\n";
              }
          }

  30. If we want to find the following words • google, g01gle, g12gle, gabgle, …, gxxgle • ggle, gogle, google, gooogle, …, go…ogle • gogle, google, gooogle, …, go…ogle • google, goooogle, goooooogle, …, goo…oogle • ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …

  31. Meta-character • Wildcard (.) • matches any character except "\n" • Quantifier • ? : zero or one occurrence • * : zero or more occurrences • + : one or more occurrences

  32. If we want to find the following words • google, g01gle, g12gle, gabgle, …, gxxgle • /g..gle/ • ggle, gogle, google, gooogle, …, go…ogle • /go*gle/ • gogle, google, gooogle, …, go…ogle • /go+gle/ • google, goooogle, goooooogle, …, goo…oogle • /g(oo)+gle/ • ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, … • /g.*gle/
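
These patterns can be checked against a small word list. The sketch below assumes a hand-picked set of sample words; the sub name matching_words is hypothetical.

```perl
use strict;
use warnings;

# matching_words is a hypothetical helper: it returns the words that a pattern matches.
sub matching_words {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

# Sample words chosen for this sketch.
my @words = qw(ggle gogle google gooogle goooogle g12gle gabgle);

for my $pattern ('g..gle', 'go*gle', 'go+gle', 'g(oo)+gle', 'g.*gle') {
    my @hits = matching_words($pattern, @words);
    print "/$pattern/ : @hits\n";
}
```

Running it shows how each quantifier widens or narrows the set of accepted words, with `g.*gle` accepting every word in the list.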

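  33. Character class • [ ] : a set of allowed characters • - : a range inside [ ] • ^ : negation when it is the first character inside [ ] • Examples • [abcdefghijklmnopqrstuvwxyz] or [a-z] • [0123456789] or [0-9] • [abcxyz] • [02468], or [^13579] when the input is known to be a digit • [A-Za-z0-9]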

  34. Character class shorthand • [\d] : [0-9] • [\w] : [A-Za-z0-9_] • [\s] : [\f\t\n\r ] • Something you don't want • [\D] : [^\d] • [\W] : [^\w] • [\S] : [^\s] • How about [\s\S] ? • What is the difference between . and [\s\S] ?
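
The last question can be explored with a string that contains a newline. This is a minimal sketch with a made-up test string; the third case uses the /s flag from the flag options slide.

```perl
use strict;
use warnings;

my $text = "g\ngle";    # made-up string: a newline sits between the two g's

my $dot_matches   = ($text =~ /g.gle/)      ? 1 : 0;    # . does not match "\n"
my $class_matches = ($text =~ /g[\s\S]gle/) ? 1 : 0;    # [\s\S] matches any character
my $dot_s_matches = ($text =~ /g.gle/s)     ? 1 : 0;    # with /s, . matches "\n" as well

print "dot: $dot_matches, class: $class_matches, dot with /s: $dot_s_matches\n";
```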

  35. Please think … • /google/ • /g[\d][\d]gle/ • /g..gle/ • /g[\w]*gle/ • /g.*gle/ • /g[\d\D]*gle/ • /g..........gle/

  36. Additional quantifiers • | : alternation • { n, m } : at least n and at most m repetitions • Examples • /(google|Google)/ or /(G|g)oogle/ • /g..........gle/ or /g.{10}gle/ • /go{0,100}gle/ • /g(oo)+gle/ or /g(oo){1,}gle/

  37. Additional quantifiers • ^ : beginning of the string • $ : end of the string • \b : boundary of a word • \B : any position that is not a word boundary • Examples • /^google$/ • /\bgoogle\b/
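
The two example patterns behave differently on longer strings. The following sketch uses made-up sample strings; the sub name classify is hypothetical.

```perl
use strict;
use warnings;

# classify is a hypothetical helper. It returns two flags:
# whether the whole string is "google", and whether "google" occurs as a separate word.
sub classify {
    my ($s) = @_;
    my $whole = ($s =~ /^google$/)   ? 1 : 0;
    my $word  = ($s =~ /\bgoogle\b/) ? 1 : 0;
    return ($whole, $word);
}

for my $s ("google", "I use google daily", "googles", "mygoogle") {
    my ($whole, $word) = classify($s);
    print "$s : whole=$whole word=$word\n";
}
```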

  38. Additional quantifiers • ( ) • \1, \2, … : backreference • Examples • /g(o)\1gle/ • /g([\S])\1gle/ • Output (matched variable) • $1, $2, …
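
Capturing with ( ), repeating with \1 and reading $1 can be combined in one check. Here is a minimal sketch, assuming the hypothetical helper name doubled_char and a few sample words.

```perl
use strict;
use warnings;

# doubled_char is a hypothetical helper: it returns the character that the
# group captured, or nothing when the pattern does not match.
sub doubled_char {
    my ($word) = @_;
    if ($word =~ /g(\S)\1gle/) {    # \1 must repeat whatever (\S) captured
        return $1;                  # $1 holds the captured character after the match
    }
    return;
}

for my $word (qw(google g11gle g12gle)) {
    my $c = doubled_char($word);
    print "$word : ", (defined $c ? "doubled '$c'" : "no match"), "\n";
}
```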

  39. Exercise Basic regular expression

  40. Exercise • How to extract these words ? • gogle, gooogle, gooooogle, gooooooogle (No ggogles) • g11gle, g33gle, g55gle, g77gle, g99gle (excluding gg99gles) • What do those mean ? • /g[\d]+gle/ • /go?gle/ • /g([\w])([\w])\2\1gle/

  41. Magic variable - $_ • Magic

          while (<>) {
              chomp;
              if (/google/) {
                  print "$_\n";
              }
          }

      • Original

          while ($line = <>) {
              chomp($line);
              if ($line =~ /google/) {
                  print "$line\n";
              }
          }

  42. Magic variable - $_

          #!/usr/bin/perl -w
          # google2.pl : check string with/without a certain pattern
          print "Please enter your query:";
          while (<>) {
              chomp;
              if (/google/) {
                  print "Found!!!\n";
              } else {
                  print "No match\n";
              }
              print "Please enter your query:";
          }

  43. Regular expression Flags

  44. Regular Expression • String matching • m// or // • String substitution • s/// • String transliteration • tr/// or y///
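
Transliteration is the least familiar of the three operators, so a short sketch may help. The DNA example and the sub name reverse_complement are assumptions chosen to fit the course topic; they do not come from the slides.

```perl
use strict;
use warnings;

# reverse_complement is a hypothetical helper name chosen for this sketch.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;     # scalar context: reverse the characters of the string
    $rc =~ tr/ACGT/TGCA/;      # replace each base with its complement
    return $rc;
}

my $seq      = "ATGCGT";
my $gc_count = ($seq =~ tr/GC//);    # an empty replacement list only counts; $seq is unchanged

print "reverse complement : ", reverse_complement($seq), "\n";
print "G/C bases          : $gc_count\n";
```

Unlike m// and s///, tr/// works character by character and does not use regular expression syntax.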

  45. Matching • Complete syntax • m// • Examples • m/google/ • m/g(oo){0,}gle/ • Others • m<google>, m[google], m!google!, …

  46. Flag options • /i : case insensitivity • /s : let . become [\d\D] • /m : multi-line mode (^ and $ also match at line breaks inside the string) • Examples • google, Google, GOOGLE, gOOGLE, GooGle, … • m/google/i
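
The effect of each flag shows up on a string that spans several lines. This is a minimal sketch with a made-up three-line string.

```perl
use strict;
use warnings;

my $text = "Yahoo\nGoogle search\nend";    # made-up multi-line string

my $plain  = ($text =~ /google/)         ? 1 : 0;    # case matters, so no match
my $icase  = ($text =~ /google/i)        ? 1 : 0;    # /i ignores case
my $anchor = ($text =~ /^google/i)       ? 1 : 0;    # ^ is only the start of the string
my $multi  = ($text =~ /^google/im)      ? 1 : 0;    # /m lets ^ match after a "\n"
my $span   = ($text =~ /yahoo.google/is) ? 1 : 0;    # /s lets . match the "\n"

print "plain=$plain icase=$icase anchor=$anchor multi=$multi span=$span\n";
```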

  47. Matched patterns • $& : the text matched by the last successful match • $` : the text before $& • $' : the text after $& • Examples

          $string = "Microsoft google Yahoo";
          $string =~ m/google/i;
          print "[$`][$&][$']\n";
          # prints: [Microsoft ][google][ Yahoo]

  48. Matched pattern - $&, $`, $'

          #!/usr/bin/perl -w
          # google3.pl : check string with/without a certain pattern
          print "Please enter your query:";
          while (<>) {
              chomp;
              if (m/google/i) {
                  print "Match:[$&]\n";
                  print "prefix : [$`]\n";
                  print "suffix : [$']\n";
              } else {
                  print "No match\n";
              }
              print "Please enter your query:";
          }

  49. Substitution • Complete syntax • s/// or s### • Examples • $string =~ s/google/GOOGLE/ • s/(google|GOOGLE)/Microsoft/ • Others • s#^https://#http://#;

  50. Flag options • /i : case insensitivity • /s : let . become [\d\D] • /g : multiple replacement • Examples • s/google/yahoo/sg • s/\s+/ /g • s/^\s+// • s/\s+$// • s#^.*/##s
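
Several of these substitutions are often applied together. The sketch below wraps them in two small subs; the names tidy and strip_dir and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# tidy is a hypothetical helper that applies the whitespace substitutions.
sub tidy {
    my ($s) = @_;
    $s =~ s/^\s+//;      # remove leading whitespace
    $s =~ s/\s+$//;      # remove trailing whitespace
    $s =~ s/\s+/ /g;     # /g: collapse every remaining run of whitespace to one space
    return $s;
}

# strip_dir is a hypothetical helper: it keeps only the part after the last "/".
sub strip_dir {
    my ($path) = @_;
    $path =~ s#^.*/##s;  # .* is greedy, so everything up to the last "/" is removed
    return $path;
}

print "[", tidy("  General   control\tprotein  "), "]\n";
print strip_dir("/home/user/data/disorder.fa"), "\n";
```

Without /g, the third substitution in tidy would collapse only the first run of whitespace.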
