
BioPerl




  1. BioPerl • An Introduction to Perl – by Seung-Yeop Lee • XS extension – by Sen Zhang • BioPerl Introduction – by Hairong Zhao • BioPerl Script Examples – by Tiequan Zhang

  2. Part I. An Introduction to Perl by Seung-Yeop Lee

  3. What is Perl? • Perl is an interpreted programming language that resembles both a real programming language and a shell. • A language for easily manipulating text, files, and processes • Provides a more concise and readable way to do jobs formerly accomplished using C or shells. • Perl stands for Practical Extraction and Report Language. • Author: Larry Wall (1986)

  4. Why use Perl? • Easy to use • Basic syntax is C-like • Type-“friendly” (no need for explicit casting) • Automatic (“lazy”) memory management • A small amount of code goes a long way • Fast • Perl has numerous built-in optimizations that often make it run faster than other scripting languages. • Portability • The same script typically runs unmodified on any platform with a Perl interpreter.

  5. Why use Perl? • Efficiency • For programs that perform the same task in C and Perl, even a skilled C programmer would have to work harder to write code that: • Runs as fast as the Perl code • Uses fewer lines of code • Correctness • Perl fully parses and pre-“compiles” the script before execution. • This eliminates the potential for runtime syntax errors. • Free to use • Comes with source code

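  6. Hello, world! • The first line gives the interpreter path. • ‘#’ denotes a line comment. • print is a function which outputs its arguments. • Double quotes delimit a string, \n is the newline character, and ; is the statement terminator.
      #!/usr/local/bin/perl
      # Print a greeting
      print "Hello, world!\n";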

  7. Basic Program Flow • No “main” function • Statements executed from start to end of file. • Execution continues until • End of file is reached. • exit(int) is called. • Fatal error occurs.

  8. Variables • Data of any type may be stored within three basic types of variables: • Scalar • List • Associative array (hash table) • Variables are always preceded by a “dereferencing symbol”. • $ - Scalar variables • @ - List variables • % - Associative array variables

  9. Variables • Notice that we did NOT have to • Declare the variable before using it • Define the variable’s data type • Allocate memory for new data values

  10. Scalar variables • References to scalar variables always begin with “$”, in both assignments and accesses: • For scalars: • $x = 1; • $x = "Hello World!"; • $x = $y; • For individual array elements (each element is a scalar): • $a[1] = 0; • $a[1] = $b[1];
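The slide's scalar fragments can be assembled into a small runnable script. This is a minimal sketch assuming strict mode; the variables $y, $sum and $msg and the array contents are our own illustrative additions. It also shows the “type-friendly” conversion between numbers and strings:

```perl
use strict;
use warnings;

my $x = 1;                # a number
my $y = "41";             # a string that looks like a number
my $sum = $x + $y;        # numeric context: "41" is converted to 41
my $msg = "Sum: " . $sum; # string context: the number is converted back to a string

my @a = (5, 6, 7);
my @b = (10, 20, 30);
$a[1] = 0;                # a single array element is a scalar, so it uses $
$a[1] = $b[1];            # copy one element from @b into @a

print "$msg\n";
print "@a\n";
```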

  11. List variables • Lists are prefaced by an “@” symbol: @count = (1, 2, 3, 4, 5); @count = ("apple", "bat", "cat"); @count2 = @count; • A list is simply an array of scalar values. • Integer indexes, starting at 0, can be used to reference elements of a list. • To print an element of an array, do: print $count[2];
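A short sketch of common list operations, reusing @count from the slide. push, shift and $#count are standard Perl built-ins; the added element "dog" is just an illustrative value:

```perl
use strict;
use warnings;

my @count  = ("apple", "bat", "cat");
my @count2 = @count;            # copies the whole list

print "$count[2]\n";            # indexes start at 0, so this is the third element
print scalar(@count), "\n";     # an array in scalar context gives its length
print "$#count\n";              # $#count is the last valid index

push @count, "dog";             # append to the end
my $first = shift @count;       # remove and return the first element

print "@count\n";               # interpolated with spaces between elements
```

Note that @count2 is an independent copy, so changing @count afterwards leaves it untouched.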

  12. Associative Array variables • Associative array variables are denoted by the % dereferencing symbol. • Associative array variables are simply hash tables containing scalar values • Example: $fred{"a"} = "aaa"; $fred{"b"} = "bbb"; $fred{6} = "cc"; $fred{1} = 2; • To do this in one step: %fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);
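A sketch that builds the slide's %fred hash in one step and then works with it. The extra key "c" is a hypothetical addition; keys returns the keys in no particular order, so the loop sorts them for repeatable output:

```perl
use strict;
use warnings;

# Build the hash in one step, as on the slide (key, value, key, value, ...)
my %fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);

$fred{"c"} = "ccc";                    # add a new key/value pair
print "$fred{a}\n";                    # look up one value by key

# Sort the keys so the output order is stable
foreach my $key (sort keys %fred) {
    print "$key => $fred{$key}\n";
}

print "has b\n" if exists $fred{"b"};  # test whether a key is present
delete $fred{6};                       # remove a key/value pair
```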

  13. Statements & Input/Output • Statements • Perl has all the usual control statements (if, for, while) and more, such as unless, until and foreach. • Input/Output • A bareword name with no “$”, “@” or “%” prefix, used where Perl expects a filehandle, is treated as a filehandle. • There are several predefined filehandles, including STDIN, STDOUT and STDERR.
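The sketch below reads lines through a filehandle inside a while loop. To stay self-contained it opens an in-memory string rather than a file, and it stores the handle in a scalar ($fh), which Perl 5 also allows alongside bareword handles such as STDOUT and STDERR. The FASTA-style data is made up for illustration:

```perl
use strict;
use warnings;

my $data = ">seq1\nACGT\n>seq2\nGGCC\n";

# Open a filehandle on an in-memory string (a real script would open a file path)
open(my $fh, '<', \$data) or die "Cannot open: $!";

my $headers = 0;
while (my $line = <$fh>) {     # <$fh> reads one line per loop iteration
    chomp $line;               # strip the trailing newline
    if ($line =~ /^>/) {       # header lines start with '>'
        $headers++;
    } else {
        print STDOUT "sequence: $line\n";   # STDOUT is a predefined filehandle
    }
}
close($fh);

print STDERR "found $headers headers\n";    # diagnostics go to STDERR
```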

  14. Subroutines • We can reuse a segment of Perl code by placing it within a subroutine. • The subroutine is defined using the sub keyword and a name. • The subroutine body is defined by placing code statements within the {} code block symbols.
      sub MySubroutine {
          # Perl code goes here.
      }

  15. Subroutine call • To call a subroutine, prepend the name with the & symbol: &MySubroutine; • In Perl 5 the & is optional when you use parentheses: MySubroutine(); • Subroutines may be recursive (call themselves).
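A minimal recursive subroutine, using a hypothetical factorial example that is not from the slides. Arguments arrive in the special array @_, and both the & call form from the slide and the parenthesized Perl 5 form are shown:

```perl
use strict;
use warnings;

# Arguments arrive in the special array @_; return sends a value back
sub factorial {
    my ($n) = @_;
    return 1 if $n <= 1;            # base case
    return $n * factorial($n - 1);  # recursive call
}

my $f5 = &factorial(5);   # the slide's & form
my $f6 = factorial(6);    # Perl 5 form: & is optional with parentheses
print "5! = $f5, 6! = $f6\n";
```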

  16. Pattern Matching • Perl lets you compare a string against a regular expression pattern to test for a possible match. • The outcome of the test is a boolean result (TRUE or FALSE). • The basic syntax of a pattern match is $myScalar =~ /PATTERN/ • This asks: “Does $myScalar contain PATTERN?”
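A small sketch of pattern matching on a made-up DNA string. Beyond the basic =~ test from the slide, it shows !~ (negated match), a capture group ($1) and the /g modifier; the string and patterns are illustrative only:

```perl
use strict;
use warnings;

my $dna = "ATGCGTACGTTAGC";

# =~ binds the string to the pattern; the match returns true or false
if ($dna =~ /GTAC/) {
    print "contains GTAC\n";
}

# !~ negates the test
print "no N bases\n" if $dna !~ /N/;

# Parentheses capture the matched text into $1, $2, ...
my $start;
if ($dna =~ /^(ATG)/) {
    $start = $1;
}

# With /g in list context, every match is returned (here: consecutive triplets)
my @codons = ($dna =~ /(...)/g);
print "@codons\n";
```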

  17. Functions • Perl provides a rich set of built-in functions to help you perform common tasks. • Several categories of useful built-in functions include • Arithmetic functions (sqrt, sin, …) • List functions (push, chop, …) • String functions (length, substr, …) • Existence functions (defined, undef)
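One call from each category listed above, in a runnable sketch; the variable names and values are our own:

```perl
use strict;
use warnings;

my $root = sqrt(16);                 # arithmetic

my @stack = (1, 2);
push @stack, 3;                      # list: append to the end

my $line = "ACGT\n";
chop $line;                          # removes the last character (here the newline)

my $len   = length $line;            # string: number of characters
my $piece = substr($line, 1, 2);     # string: 2 characters starting at offset 1

my $maybe;                           # declared but never assigned
my $is_def = defined $maybe ? 1 : 0; # existence: $maybe is still undef
$maybe = "x";
undef $maybe;                        # existence: make $maybe undefined again
```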

  18. Perl 5 • Perl 5 introduced new features: • A new data type: the reference • A new kind of localization: the my keyword • Tools to allow object-oriented programming in Perl • New shortcuts like “qw” and “=>” • An object-oriented library system built around “Modules”

  19. References • A reference is a scalar value which “points to” another variable; following (dereferencing) the reference gives you that variable's value.

  20. Creating References • References to variables are created by using the backslash (\) operator.
      $name = "bio perl";
      $reference = \$name;
      $array_reference = \@array_name;
      $hash_reference = \%hash_name;
      $subroutine_ref = \&sub_name;

  21. Dereferencing a Reference • Use an extra $ or @ for scalars and arrays, and -> to reach hash elements (-> also works for array elements, e.g. $array_reference->[0]).
      print "$$scalar_reference\n", "@$array_reference\n", "$hash_reference->{'name'}\n";
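The creation and dereferencing slides combined into one runnable sketch. The variables, the hypothetical greet subroutine and the hash contents are illustrative stand-ins for the slides' @array_name, %hash_name and sub_name:

```perl
use strict;
use warnings;

my $name  = "bio perl";
my @array = (1, 2, 3);
my %hash  = ("name", "BioPerl");
sub greet { return "Hello $_[0]" }

# Create references with the backslash operator
my $scalar_reference = \$name;
my $array_reference  = \@array;
my $hash_reference   = \%hash;
my $sub_reference    = \&greet;

print "$$scalar_reference\n", "@$array_reference\n", "$hash_reference->{'name'}\n";

# The arrow also works for array elements and subroutine calls
my $second   = $array_reference->[1];
my $greeting = $sub_reference->("world");

# Changing data through a reference changes the original variable
push @$array_reference, 4;
```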

  22. Variable Localization • The local keyword temporarily gives a global variable a new value, which lasts until the enclosing block exits. • The new value is visible not only inside the enclosing braces but also in all subroutines called from within them.
      $a = 1;
      sub mySub { local $a = 2; &mySub1($a); }
      sub mySub1 { print "a is $a\n"; }
      &mySub;    # prints "a is 2"

  23. Variable Localization – cont’d • The my keyword hides the variable from the outside world completely. • It is visible only inside its enclosing block, not in subroutines called from that block.
      $a = 1;
      sub mySub { my $a = 2; &mySub1($a); }
      sub mySub1 { print "a is $a\n"; }
      &mySub;    # prints "a is 1"
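To compare local and my side by side, here is a variant of the two slides that returns values instead of printing them. It renames the variable to $level (a hypothetical name) because $a is also Perl's special sort variable, and it declares the global with our so the script runs under strict mode:

```perl
use strict;
use warnings;

our $level = 1;                    # package (global) variable

sub show_level { return $level }   # reads the global $level

sub with_local {
    local $level = 2;              # temporarily replaces the global until this sub returns
    return show_level();           # the called sub sees the new value (dynamic scope)
}

sub with_my {
    my $level = 2;                 # new lexical variable, visible only inside this block
    return show_level();           # the called sub still sees the global value
}

my $seen_local = with_local();
my $seen_my    = with_my();
my $after      = $level;           # local restored the old value on exit
print "local: $seen_local, my: $seen_my, after: $after\n";
```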

  24. Object Oriented Programming in Perl (1) • Defining a class • A class is simply a package with subroutines that function as methods. #!/usr/local/bin/perl package Cat; sub new { … } sub meow { … }

  25. Object Oriented Programming in Perl (2) • Perl Object • To instantiate an object from a class, call the class's “new” method: $new_object = new ClassName; (the arrow form $new_object = ClassName->new; is equivalent and generally preferred) • Using Methods • To call a method on an object, use the “->” operator: $cat->meow();

  26. Object Oriented Programming in Perl (3) • Inheritance • Declare a class array called @ISA. • This array stores the name(s) of the parent class(es) of the new class. package NorthAmericanCat; @NorthAmericanCat::ISA = ("Cat"); sub new { … }
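The class slides leave the method bodies elided. Below is a minimal sketch with bodies of our own invention (not the authors' code): new blesses a hash reference into the class, and NorthAmericanCat inherits new through @ISA while overriding meow and reaching the parent version via SUPER::. The cat names are illustrative:

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;               # the class name arrives as the first argument
    my $self = { name => $args{name} || "Tom" };
    return bless $self, $class;            # bless ties the hash to the class
}

sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package NorthAmericanCat;
our @ISA = ("Cat");                        # inherit methods from Cat

sub meow {                                 # override a parent method
    my ($self) = @_;
    return $self->SUPER::meow() . " (with an accent)";
}

package main;

my $cat  = Cat->new(name => "Felix");
my $ncat = NorthAmericanCat->new(name => "Rusty");   # new is inherited from Cat
print $cat->meow(), "\n";
print $ncat->meow(), "\n";
```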

  27. Miscellaneous Constructs • qw • The “qw” operator lets you omit the quote and comma characters in list definitions: @name = qw(Tom Mary Michael); is equivalent to @name = ("Tom", "Mary", "Michael");

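  28. Miscellaneous Constructs • => • The => operator works like a comma and makes hash definitions more readable: %client = ("name" => "Michael", "phone" => "123-3456", "email" => 'mich@nj.net'); is equivalent to %client = ("name", "Michael", "phone", "123-3456", "email", 'mich@nj.net'); • Use parentheses, not curly braces: {…} builds a hash reference instead of a hash. • Single-quote 'mich@nj.net' so that Perl does not try to interpolate @nj.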
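A runnable sketch of both constructs. The values come from the slides; the last line shows why curly braces produce a hash reference rather than a hash:

```perl
use strict;
use warnings;

my @name = qw(Tom Mary Michael);    # same list as ("Tom", "Mary", "Michael")

# => works like a comma and also quotes a bareword on its left
my %client = (
    name  => "Michael",
    phone => "123-3456",
    email => 'mich@nj.net',         # single quotes: no @nj interpolation
);

# A hash in list context flattens back to key, value pairs
my @pairs = %client;

# Curly braces build a hash *reference*, which belongs in a scalar
my $client_ref = { name => "Michael" };
```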

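  29. Perl Modules • A Perl module is a reusable package defined in a library file whose name is the same as the name of the package. • Similar to a C link library or C++ class
      package Foo;
      use Exporter 'import';           # lets "use Foo;" import the names in @EXPORT
      our @EXPORT = qw(bar blat);
      sub bar  { print "Hello $_[0]\n" }
      sub blat { print "World $_[0]\n" }
      1;                               # a module file must end with a true value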

  30. Names • Each Perl module has a unique name. • To minimize name space collision, Perl provides a hierarchical name space for modules. • Components of a module name are separated by double colons (::). • For example, • Math::Complex • Math::Approx • String::BitCount • String::Approx

  31. Module files • Each module is contained in a single file. • Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy. • All module files have an extension of .pm.

  32. Module libraries • The Perl interpreter has a list of directories in which it searches for modules. • This list is kept in the global array @INC; perl -V prints it:
      >perl -V
      @INC:
          /usr/local/lib/perl5/5.00503/sun4-solaris
          /usr/local/lib/perl5/5.00503
          /usr/local/lib/perl5/site-perl/5.005/sun4-solaris
          /usr/local/lib/perl5/site-perl/5.005

  33. Creating Modules • To create a new Perl module:
      ../development>h2xs -X -n Foo::Bar
      Writing Foo/Bar/Bar.pm
      Writing Foo/Bar/Makefile.PL
      Writing Foo/Bar/test.pl
      Writing Foo/Bar/Changes
      Writing Foo/Bar/MANIFEST
      ../development>

  34. Building Modules • To build a Perl module:
      perl Makefile.PL    # create the Makefile
      make                # create the build directory blib and install the module in it
      make test           # run test.pl
      make install        # install your module

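  35. Using Modules • A module is loaded with the use statement:
      use Foo;
      bar("a");
      blat("b");
  • use Foo; loads Foo.pm at compile time; it is equivalent to BEGIN { require Foo; Foo->import(); } • require compiles and runs the module file, much like eval. • The final 1; makes the file return a true value, which tells require that the module loaded successfully.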
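The steps behind use Foo; can be made visible at run time. This sketch, assuming only the core modules File::Temp and File::Spec, writes a Foo.pm similar to the one on slide 29 into a temporary directory (its subs return strings instead of printing, so they can be checked), adds that directory to @INC, then calls require and import by hand:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a small module file to a temporary directory (stands in for a real lib/ dir)
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, "Foo.pm");
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} <<'END_MODULE';
package Foo;
use Exporter 'import';
our @EXPORT = qw(bar blat);
sub bar  { return "Hello $_[0]" }
sub blat { return "World $_[0]" }
1;
END_MODULE
close($out);

# Roughly what "use Foo;" does, but at run time instead of compile time
unshift @INC, $dir;       # add the directory to the module search path
require Foo;              # find Foo.pm via @INC, compile and run it; it must return true
Foo->import();            # copy the names listed in @Foo::EXPORT into this package

my $greeting = bar("a");
my $where    = $INC{"Foo.pm"};   # %INC records the file each module was loaded from
print "$greeting (loaded from $where)\n";
```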

  36. End of Part I. Thank You…

  37. Part II: XS (eXternal Subroutine) extension by Sen Zhang

  38. XS • XS is an acronym for eXternal Subroutine. • With XS, we can call C subroutines directly from Perl code, as if they were Perl subroutines.

  39. Perl is not good at: • very CPU-intensive things, like numerical integration. • very memory-intensive things; for example, Perl programs that create more than 10,000 hashes can run slowly. • system software, like device drivers. • things that have already been written in other languages.

  40. Usually… • These tasks are done in highly efficient systems programming languages such as C/C++.

  41. Can we call a C subroutine from Perl? • Solution: the Perl C API

  42. When Perl calls a C subroutine through the Perl C API, two things must happen: • control flow: control must pass from Perl to C (and back) • C program execution • Perl program execution • data flow: data must pass from Perl to C (and back) • C data representation • Perl data representation

  43. To use the Perl C API, you need to understand • Perl's internal data structures. • How the Perl stack works, and how a C subroutine gets access to it. • How C subroutines get linked into the Perl executable. • The data paths through the DynaLoader module that associate the name of a Perl subroutine with the entry point of a C subroutine.

  44. If you code directly to the Perl C API • You will find that you keep writing the same little bits of code • to move parameters on and off the Perl stack; • to convert data from Perl's internal representation to C variables; • to check for null pointers and other Bad Things. • When you make a mistake, you don't get bad output: you crash the interpreter. • It is difficult, error-prone, tedious, and repetitive.

  45. The painkiller: XS

  46. What is XS? • Narrowly, XS is the name of the glue language. • More broadly, XS comprises a system of programs and facilities that work together: • MakeMaker, • Xsub glue routine, • XS language itself, • xsubpp, • h2xs, • DynaLoader.

  47. MakeMaker tool • Perl's MakeMaker facility can be used to generate a Makefile that makes it easy to install your Perl modules and scripts.


  50. Xsub • The kind of glue routine that the Perl interpreter calls is known as an xsub. • Rather than drag the Perl C API into all our C code, we usually write glue routines. (We'll refer to an existing C subroutine as a target routine.)
