
Managing complexity (Advanced Perl)


paul2

Presentation Transcript


  1. Managing complexity (Advanced Perl): using Perl for specific tasks with help from Bioperl and others

  2. Login • Username: bioinfouser • Password: loginbioinfo

  3. Funny?

  4. Goals • I assume you already know Perl basics; we cover some more advanced features • Learn how to write OO code • More flexible modules • Understand other modules • Some APIs that you may need • Bioperl • Perl DBI

  5. What I assume you already know • Scalars • Arrays • Hashes • Control structures (if-then, for, foreach, while, etc.) • File IO

  6. Managing complexity • By managing complexity • Make hard tasks easy(er) • Perl itself does this • Regular expressions, text manipulations • Extensions (modules) do this • May come at the expense of execution speed • You may not care • Consider the big picture • Development time • Errors • Extremely custom software • Some things need speed

  7. How complex is it now? • Perl is very compact compared with human languages • Perl is large compared with other programming languages • TMTOWTDI (There's More Than One Way To Do It) • Perl has approximately 233 reserved words • Java has approximately 47 reserved words • Both are easy to learn but harder to use effectively

  8. General practices • Always use #!/usr/bin/perl -w or use warnings; • Consider use strict; for scripts longer than 10 lines • You can’t have too many comments • # starts a comment • =head1 starts a POD documentation block • =cut ends it • perldoc displays the POD

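  9. Getting values into the program or subroutine • The elements of @_ are aliases to the caller's variables; copying them into my variables gives pass-by-value behaviour • A scalar can have as a value a reference (a “pointer”) to an array, hash, function, etc. • The arguments to a subroutine arrive in a special array called @_ (command-line arguments to the program arrive in @ARGV) • my $first_value = shift @_; • my $first_value = $_[0]; • my $first_value = shift;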
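The three ways of reading the first argument can be compared side by side. The following is a minimal sketch; the subroutine and variable names are made up for illustration. It also shows that @_ holds aliases to the caller's variables, so assigning to $_[0] changes the original, while copying into a my variable does not.

```perl
use strict;
use warnings;

# Three equivalent ways to read the first argument of a subroutine.
sub first_by_explicit_shift { my $first = shift @_; return $first; }
sub first_by_index          { my $first = $_[0];    return $first; }
sub first_by_shift          { my $first = shift;    return $first; }

# @_ aliases the caller's variables: assigning to $_[0] changes the original.
sub upcase_in_place { $_[0] = uc $_[0]; return; }

# Copying into a lexical first gives pass-by-value behaviour.
sub upcase_copy { my ($text) = @_; $text = uc $text; return $text; }

my $dna    = "atgc";
my $copy   = upcase_copy($dna);    # $dna is untouched here
my $before = $dna;
upcase_in_place($dna);             # $dna itself is now upper case

print "copy=$copy before=$before after=$dna\n";
```

Copying @_ into lexicals at the top of every subroutine is the usual habit, because it avoids accidental changes to the caller's data.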

  10. References
      my @array = ("one", "two", "three", "four");
      function_call(@array);
      function_call(\@array);
      function_call(["one", "two", "three"]);
      sub function_call {
          my $passed = shift @_;
          print "$passed\n";
      }
      Output:
      one
      ARRAY(0x80601a0)
      ARRAY(0x804c9a0)
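Printing a reference only shows its type and address; to reach the data it has to be dereferenced. A short sketch, reusing the array from the slide (the subroutine name describe is hypothetical):

```perl
use strict;
use warnings;

my @array = ("one", "two", "three", "four");

# Receive an array reference and dereference it to reach the elements.
sub describe {
    my $array_ref = shift;
    my $count = scalar @{$array_ref};    # the whole array via @{...}
    my $first = $array_ref->[0];         # a single element via ->
    return "$count items, first is $first";
}

my $from_named = describe(\@array);                   # reference to a named array
my $from_anon  = describe(["one", "two", "three"]);   # anonymous array reference

print "$from_named\n$from_anon\n";
print ref(\@array), "\n";    # ref() reports the reference type
```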

  11. Debugging complex data structures • Print the reference • It will tell you a little bit of information (the type and the address) • Use the Data::Dumper module • This will give you a snapshot of the whole data structure
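Both debugging approaches can be tried on one nested structure. This is a sketch only; the hash layout is an assumption for illustration, filled with the ATP7B values that appear on the schema slide later in the deck.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare
$Data::Dumper::Indent   = 1;    # compact indentation

# Illustrative nested structure: a gene record with aliases and identifiers.
my %gene = (
    name    => "ATP7B",
    aliases => ["Wilson disease-associated protein", "Copper-transporting ATPase 2"],
    ids     => { RefSeq => "NM_000053", LocusLink => 540 },
);

# Printing the reference shows only the type and a memory address.
my $as_string = "" . \%gene;
print "Reference: $as_string\n";

# Dumper() returns the whole structure as readable Perl code.
my $dump = Dumper(\%gene);
print $dump;
```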

  12. Some more advanced features

  13. Regular expressions • Not Perl specific • Very useful • What they do: • String comparisons • String substitutions • Substring selection

  14. Regex
      $string =~ /find/     (could put ‘m’ before the first slash: m/find/)
      $string =~ /find$/
      $string =~ /^find/
      $string =~ /^find$/
      .   Match any character (except newline)
      \w  Match "word" character (alphanumeric plus "_")
      \W  Match non-word character
      \s  Match whitespace character
      \S  Match non-whitespace character
      \d  Match digit character
      \D  Match non-digit character
      \t  Match tab
      \n  Match newline
      \r  Match carriage return

  15. Repetition
      $string =~ /(ti){2}/
      $string =~ /A*T+G?C{3}A{3,}T{4,6}/
      Character classes
      $string =~ /[ATGCN]/
      $string =~ /[^ATGCNatgcn]/i
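The quantifiers and character classes above can be exercised on a few short strings. A minimal sketch; the test strings and the helper is_valid_dna are invented for illustration.

```perl
use strict;
use warnings;

# A negated class matches any single character NOT in the list, so a
# sequence is valid when the negated class finds nothing. The /i flag
# makes the class case-insensitive.
sub is_valid_dna {
    my ($sequence) = @_;
    return $sequence !~ /[^ATGCN]/i ? 1 : 0;
}

my $clean_ok = is_valid_dna("ATGCNNatgc");    # only base codes
my $dirty_ok = is_valid_dna("ATGCXATGC");     # X is not a base code

# Zero or more A, one or more T, optional G, exactly three C,
# three or more A, then four to six T.
my $pattern = qr/A*T+G?C{3}A{3,}T{4,6}/;
my $hit  = "TCCCAAATTTT" =~ $pattern ? 1 : 0;
my $miss = "TCCAAATTTT"  =~ $pattern ? 1 : 0;    # only two C in a row

# (ti){2} needs "ti" twice in a row, as in "repetition".
my $titi = "repetition" =~ /(ti){2}/ ? 1 : 0;

print "clean=$clean_ok dirty=$dirty_ok hit=$hit miss=$miss titi=$titi\n";
```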

  16. Selection/Replacement
      $string =~ /(A{3,8})/; print $1;
      $string =~ s/a/A/;
      $string =~ tr/atgc/ATGC/;
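A short sketch of capture, substitution and transliteration on one made-up sequence. Note that tr works on lists of characters rather than on a regex, so no square brackets are needed around the character lists.

```perl
use strict;
use warnings;

my $string = "ggcaaaaatt";    # hypothetical lower-case sequence

# Capture: the parentheses store the matched run in $1.
my $run = $string =~ /(a{3,8})/ ? $1 : undef;

# The (my $copy = $string) idiom edits a copy and leaves $string alone.
(my $first_only = $string) =~ s/a/A/;           # first match only
(my $all        = $string) =~ s/a/A/g;          # /g replaces every match
(my $upper      = $string) =~ tr/atgc/ATGC/;    # character-by-character mapping

# With an empty replacement list tr changes nothing and returns a count.
my $gc_count = ($string =~ tr/gc//);

# Reverse complement: reverse the string, then swap each base.
(my $revcomp = reverse $string) =~ tr/atgc/tacg/;

print "$run $first_only $all $upper $gc_count $revcomp\n";
```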

  17. Additional syntax
      $string =~ /AT*?AT/
      $string =~ m#/var/log/messages#
      $_ = "ATATATAGTGTGCGTGATATGGG";
      ($one, $two, $three) = /AT..AT/g;
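The following sketch tries non-greedy matching, alternative delimiters and a list-context /g match on the sequence from the slide. The patterns AT.*AT, AT.*?AT and AT.. are variations chosen here to make the differences visible; they are not from the slides.

```perl
use strict;
use warnings;

my $seq = "ATATATAGTGTGCGTGATATGGG";

# Greedy .* runs to the LAST "AT"; non-greedy .*? stops at the first one.
my ($greedy) = $seq =~ /(AT.*AT)/;
my ($lazy)   = $seq =~ /(AT.*?AT)/;

# With m and another delimiter, slashes in the pattern need no escaping.
my $path   = "/var/log/messages";
my $is_log = $path =~ m#^/var/log/# ? 1 : 0;

# In list context /g returns every non-overlapping match.
my ($one, $two, $three) = $seq =~ /AT../g;

# The slide's pattern finds a single non-overlapping match in this sequence.
my @six = $seq =~ /AT..AT/g;

print "$greedy\n$lazy\n$is_log\n$one $two $three\n@six\n";
```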

  18. What is a module • Two types • Object-oriented type • Provides something similar to a class definition • Remote function call • Provides a method to import subroutines or variables for the main program to use

  19. Howto: Making a module
      Create a file called workSaver.pm
      ###########
      package workSaver;
      sub doStuff {
          print "Stuff done\n";
      }
      1; #statement that evaluates to true
      ###########
      Now you can use it with "use workSaver;"*
      *Some restrictions apply
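Two of the restrictions are easy to run into: Perl must be able to find workSaver.pm in a directory listed in @INC, and because nothing is exported the subroutine has to be called with its package name. The sketch below writes the module into a temporary directory so that it is self-contained; in real use the file would simply sit next to the script or in a library directory. Here doStuff returns its message instead of printing it, which is a small change from the slide to make the result easy to check.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway copy of workSaver.pm into a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "workSaver.pm");
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} <<'END_MODULE';
package workSaver;
sub doStuff { return "Stuff done\n"; }
1;
END_MODULE
close($out) or die "Cannot close $file: $!";

# Restriction 1: Perl only looks for modules in the directories in @INC.
unshift @INC, $dir;
require workSaver;    # run-time counterpart of "use workSaver;"

# Restriction 2: nothing was exported, so the package name is required.
my $message = workSaver::doStuff();
print $message;
```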

  20. Howto: Making a module cont. • This method works very well for subroutines that are used in several programs • Reduces the “clutter” in your program • Provides one maintenance point instead of an unknown number • Eases bug fixes • Be careful of boundaries

  21. More Complete method: • Allows you to “pollute” the namespace of the original program selectively. • Makes the use of functions and variables easier • Still used about the same way as the simple method but things are clearer

  22. More Complete
      package functional;
      use strict;
      use Exporter;
      our @ISA = ("Exporter");
      our @EXPORT = qw ();
      our @EXPORT_OK = qw ($variable1 $variable2 printout);
      our $VERSION = 2.0;
      our $variable1 = "var1";
      our $variable2 = "var2";
      my $variable3 = "var3";
      sub printout {
          my $passed_variable = shift;
          print "Your variable is $passed_variable mine are $variable1 , $variable2, $variable3 \n";
      }
      1;
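On the calling side, names listed in @EXPORT_OK are imported only when asked for. The sketch below keeps a trimmed copy of the package in the same file, inside a BEGIN block, purely so that the example runs on its own; normally it would live in functional.pm. Setting %INC by hand is a trick for this single-file demo, and printout returns its text here instead of printing it.

```perl
use strict;
use warnings;

# Normally this package would be in its own file, functional.pm.
BEGIN {
    package functional;
    use Exporter;
    our @ISA       = ("Exporter");
    our @EXPORT    = qw();                        # nothing exported by default
    our @EXPORT_OK = qw($variable1 printout);     # available on request
    our $variable1 = "var1";
    my  $variable3 = "var3";    # lexical: cannot be exported or seen outside

    sub printout {
        my $passed_variable = shift;
        return "Your variable is $passed_variable, mine are $variable1, $variable3\n";
    }

    $INC{'functional.pm'} = __FILE__;    # mark the module as already loaded
}

# Selectively "pollute" the main namespace with two names.
use functional qw(printout $variable1);

my $text = printout("hello");
print $text;
print "Imported variable: $variable1\n";
```

Leaving @EXPORT empty means a plain "use functional;" imports nothing, so the caller always states which names it wants.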

  23. CPAN • Wouldn’t it be nice to have a place where: • You could find a bunch of Perl modules • It would be browsable • Searchable • Big pipe for people to download stuff • Other people would be encouraged to submit fixes and updates • And it was all free

  24. Sources of modules/Information • www.CPAN.org • www.bioperl.org • www.perl.com • www.cetus-links.org/oo_infos.html

  25. Bioperl • Set of modules that are extremely useful for working with biological data. Actively maintained. • www.bioperl.org is a very good place to get the basics of bioperl • We will go through an example to see a typical use

  26. Bioperl has several basic types of objects: • Seq: a sequence, the most common type (Bio::Seq) • Location objects: where it is, how long it is, etc. • Interface objects: Bio::xyzI, no implementation, mostly documentation
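A sequence object can be created and queried in a few lines. This is a sketch under the assumption that BioPerl is installed; the identifier and the nine-base sequence are made up, and the script exits quietly when Bio::Seq cannot be loaded.

```perl
use strict;
use warnings;

# Assumption: BioPerl is installed. Skip politely when it is not.
eval { require Bio::Seq; 1 }
    or do { print "BioPerl (Bio::Seq) is not installed\n"; exit 0 };

my $seq = Bio::Seq->new(
    -id       => "demo_seq",     # hypothetical identifier
    -seq      => "ATGGCCTAA",    # made-up sequence: start codon, Ala, stop
    -alphabet => "dna",
);

my $length  = $seq->length;            # number of bases
my $revcom  = $seq->revcom->seq;       # revcom returns a new sequence object
my $protein = $seq->translate->seq;    # translate returns a protein object

print "length=$length revcom=$revcom protein=$protein\n";
```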

  27. Bioperl documentation • Several different ways to find out about a module • perldoc Bio::Seq • bioperl.org • /usr/lib/perl5/site_perl/5.8.0/bptutorial.pl 100 Bio::Seq • Data::Dumper to print the data structure • Print the variable

  28. Bioperl demo

  29. Why use a database • Transaction control - only one user can modify the data at any one time. • Access control - some people can modify data, some can read data, others can create data-structures. • Fast handling of lots of data • Precise definition of data (mostly). • Easy to share data resources with others

  30. Many choices • There are many types: MS Access, Excel (sort of), Sybase, Oracle, Postgres, mSQL, MySQL … • They each have their niche and function best in certain cases, but there is also considerable overlap • SQL (Structured Query Language) is a common thread

  31. MySQL is better than YourSQL • Free on Unix • Good developer support • Constant bug fixes and feature additions • Good scalability to medium size and load, OK performance • Easy to install • Used at the Ensembl and UCSC genome browsers, so a lot of information is readily available in that format

  32. Table Structure - Schema
      Gene table: Gene_ID, Name
      Alias table: Alias_ID, Gene_ID, Alias
      Reference table: Reference_ID, Gene_ID, Reference, DataSource
      Example record
      Gene: ATP7B
      Aliases: Wilson disease-associated protein; Copper-transporting ATPase 2
      References:
      Enzyme Commission: 3.6.3.4
      UniGene: Hs.84999
      AffyProbeU133: 204624_at
      AffyProbeU95: 37930_at
      RefSeq: NM_000053
      GenBank: AF034838
      GenBank: U11700
      LocusLink: 540

  33. SQL (MySQL dialect) • SELECT col_name FROM table WHERE col_name = value; • SELECT COUNT(*) FROM table WHERE col_name LIKE '%value%'; • SELECT COUNT(DISTINCT col_name) FROM table WHERE col_name IS NOT NULL; • CREATE, UPDATE, DELETE, INSERT have similar forms

  34. SQL cont. • USE database_name • Can also be specified on the command line with -D • SHOW TABLES lists all the tables in that database (also SHOW DATABASES) • DESCRIBE table_name lists the columns and the datatype of each column, as does SHOW COLUMNS FROM table_name

  35. More advanced SELECTS
      SELECT (column_list) FROM (table_list) WHERE (constraints) GROUP BY (grouping columns) ORDER BY (sorting columns) LIMIT (limit number);
      SELECT col_name FROM (table1, table2) WHERE table1_val = table2_val AND table1_val2 > value;
      • Example of an equi-join

  36. Getting the names right • If you only have one table you only need to use the column name • When you are using joins this may not be adequate. • If two tables have the column primary you would need to call the column table1.primary or table2.primary
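An equi-join with qualified column names can be tried without a MySQL server. The sketch below uses an in-memory SQLite database through DBI, which is an assumption made to keep the example self-contained (the slides use MySQL, and DBI with DBD::SQLite must be installed). The gene and alias tables are a cut-down, lower-cased version of the Gene and Alias tables on the schema slide.

```perl
use strict;
use warnings;

# Assumption: DBI and DBD::SQLite are installed. Skip politely otherwise.
eval { require DBI; require DBD::SQLite; 1 }
    or do { print "DBI or DBD::SQLite is not installed\n"; exit 0 };

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });

$dbh->do("CREATE TABLE gene  (Gene_ID INTEGER PRIMARY KEY, Name TEXT)");
$dbh->do("CREATE TABLE alias (Alias_ID INTEGER PRIMARY KEY, Gene_ID INTEGER, Alias TEXT)");
$dbh->do("INSERT INTO gene  VALUES (1, 'ATP7B')");
$dbh->do("INSERT INTO alias VALUES (1, 1, 'Wilson disease-associated protein')");
$dbh->do("INSERT INTO alias VALUES (2, 1, 'Copper-transporting ATPase 2')");

# Equi-join: rows are paired where the two Gene_ID columns are equal.
# Both tables have a Gene_ID column, so each use names its table.
my $rows = $dbh->selectall_arrayref(q{
    SELECT gene.Name, alias.Alias
    FROM gene, alias
    WHERE gene.Gene_ID = alias.Gene_ID
    ORDER BY alias.Alias
    LIMIT 10
});

print "$_->[0]\t$_->[1]\n" for @{$rows};
$dbh->disconnect;
```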

  37. Data Types • INT • Tinyint: -128 to 127 • Smallint: -32768 to 32767 • Mediumint: -8388608 to 8388607 • Int: -2147483648 to 2147483647 • Bigint: -9223372036854775808 to 9223372036854775807 • FLOAT • Float: 4 bytes • Double: 8 bytes

  38. CHAR • Char(n): character string of exactly n characters, stored in n bytes • Varchar(n): character string up to n characters long, stored in L+1 bytes (L is the actual length) • Text: up to 2^16 bytes • BLOBs: Binary Large OBjects

  39. Perl DBI • Method for Perl to connect to a database (virtually any database) and read or modify data • The statements are constructed much like the SQL statements that would be entered on the command line, so learning SQL is still necessary

  40. Statements in DBI • Connect • Used to establish initial connection • Prepare • Prepare a statement to execute • Execute • Execute the statement • Do • prepare a statement that does not return results and execute it

  41. Fetch • Several types used to get returned data • Disconnect • Disconnect from the server

  42. Types of fetch • “fetchrow_array” • Returns the next row as a list of scalars • Can also use “fetchrow_arrayref” • “fetchrow_hashref” • Returns the next row as a reference to a hash indexed by column name • Slower but cleaner code
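The connect, do, prepare, execute, fetch and disconnect steps fit into one short script. As before, this sketch assumes DBI and DBD::SQLite are installed and uses an in-memory SQLite database instead of MySQL; the table contents (including the placeholder name GENE2) are invented for the demo.

```perl
use strict;
use warnings;

# Assumption: DBI and DBD::SQLite are installed. Skip politely otherwise.
eval { require DBI; require DBD::SQLite; 1 }
    or do { print "DBI or DBD::SQLite is not installed\n"; exit 0 };

# Connect: establish the initial connection.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });

# Do: prepare and execute in one step, for statements without results.
$dbh->do("CREATE TABLE gene (Gene_ID INTEGER, Name TEXT)");
$dbh->do("INSERT INTO gene VALUES (1, 'ATP7B')");
$dbh->do("INSERT INTO gene VALUES (2, 'GENE2')");    # hypothetical second row

# Prepare once, then execute.
my $sth = $dbh->prepare("SELECT Gene_ID, Name FROM gene ORDER BY Gene_ID");
$sth->execute;

my @first  = $sth->fetchrow_array;             # next row as a list
my @second = @{ $sth->fetchrow_arrayref };     # next row as an array reference
$sth->finish;                                  # done with this result set

# fetchrow_hashref returns one hash reference per row, keyed by column name.
# 'NAME_lc' asks DBI for lower-case keys regardless of the driver.
$sth->execute;
my @names;
while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
    push @names, $row->{name};
}

# Disconnect from the server.
$dbh->disconnect;

print "@first | @second | @names\n";
```

fetchrow_arrayref may hand back the same array reference on every call, which is why the row is copied with @{ ... } before it is kept.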

  43. More advanced statements • Quote • Used to properly quote data that is interpolated into an SQL statement • $value = $dbh->quote($blast_result); • Placeholders • Optional; they speed up repeated execution because the statement is prepared only once • my $prep = $dbh->prepare("select x from y where z = ?"); • loop_start • $prep->bind_param(1, $z); • $prep->execute(); • loop_end
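The loop_start and loop_end pseudocode can be written out as a real loop. The sketch below again assumes DBI with DBD::SQLite and an in-memory database; the table y and columns x and z follow the slide, while the rows are invented. One value contains an apostrophe to show why quoting matters.

```perl
use strict;
use warnings;

# Assumption: DBI and DBD::SQLite are installed. Skip politely otherwise.
eval { require DBI; require DBD::SQLite; 1 }
    or do { print "DBI or DBD::SQLite is not installed\n"; exit 0 };

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });
$dbh->do("CREATE TABLE y (x TEXT, z TEXT)");

# Placeholders on insert: the driver handles any quoting of the values.
my $insert = $dbh->prepare("INSERT INTO y (x, z) VALUES (?, ?)");
$insert->execute(@{$_}) for (["first", "a"], ["second", "b"], ["O'Brien", "c"]);

# Prepare once, execute many times with different bound values.
my $prep = $dbh->prepare("SELECT x FROM y WHERE z = ?");
my @found;
for my $z ("a", "c") {
    $prep->bind_param(1, $z);
    $prep->execute();
    my ($x) = $prep->fetchrow_array;
    $prep->finish;
    push @found, $x;
}

# quote() is for values pasted directly into the SQL text: it adds the
# surrounding quotes and escapes the embedded apostrophe.
my $quoted  = $dbh->quote("O'Brien");
my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM y WHERE x = $quoted");

$dbh->disconnect;
print join(", ", @found), " | count=$count\n";
```

When a value can be passed as a placeholder, that is usually the simpler choice; quote() remains useful where the SQL text has to be built as a string.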
