
Subroutines and Files

Learn how subroutines can save time and prevent errors in bioinformatics. Discover built-in subroutines and how to find pre-defined ones. Explore file manipulation and basic file operations. Complete examples and assignments provided.


Presentation Transcript


  1. Subroutines and Files • Bioinformatics • Ellen Walker, Hiram College

  2. Why Subroutines? • Save typing • Avoid copy/paste errors • Collect a common algorithm in one place for reuse

  3. Built-In Subroutines • Provide common useful functions, e.g. • index • length • substr • Call with arguments, e.g. • index($string, $pat) # $string and $pat are arguments • Different arguments produce different results
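A minimal sketch of the three built-ins named on this slide. The sequence in `$dna` is made up for illustration.

```perl
use strict;
use warnings;

my $dna = "ATGGCCTGA";              # made-up example sequence

my $len   = length($dna);           # number of characters in the string
my $pos   = index($dna, "GCC");     # 0-based position of the first match
my $codon = substr($dna, 0, 3);     # 3 characters starting at offset 0
my $none  = index($dna, "TTT");     # index returns -1 when nothing matches

print "length=$len index=$pos codon=$codon missing=$none\n";
# prints: length=9 index=3 codon=ATG missing=-1
```

Note that positions count from 0, so the first character of the string is at position 0.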

  4. Finding Predefined Subroutines • Textbooks (Safari Online has several) • Google (include “Perl” in your string) • Online documentation • http://www.gotapi.com/perl is nicely searchable

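  5. How a Subroutine Works
     Calling code:
       my $code = "ACA";
       print length($code);
       print "goodbye\n";
     Subroutine (conceptual view of length):
       sub length {
         my $string = shift(@_);
         my $length = 0;
         … code to count …
         return $length;
       }
     Diagram: $code holds ACA; the argument "ACA" goes in, and the value 3 comes back.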

  6. Key Components • sub name • Declares this as a subroutine and names it • shift @_ • Pulls the arguments out of the list in parentheses, one at a time, left to right • Example: somesub("ACT", 1) • $a = shift @_; ($a is "ACT") • $b = shift @_; ($b is 1) • return value • Ends the subroutine and gives the call its value
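The three components can be seen together in a short sketch. The subroutine `repeat_seq` is hypothetical and exists only to show `sub`, `shift @_`, and `return` working together.

```perl
use strict;
use warnings;

# Hypothetical subroutine: repeat a sequence a given number of times.
sub repeat_seq {
    my $seq   = shift @_;    # first argument, e.g. "ACT"
    my $times = shift @_;    # second argument, e.g. 2
    return $seq x $times;    # "x" is Perl's string repetition operator
}

my $result = repeat_seq("ACT", 2);
print "$result\n";           # prints: ACTACT
```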

  7. Example (p. 122)
       # find all GC-rich 4-7mers and determine their complements
       my $GCmatch;
       while ($someDNA =~ m/([GC]{4,7})/g) {
         $GCmatch = $1;
         print "5' $GCmatch 3'\n\n";
         my $compl = complement($GCmatch);
         print "3' $compl 5'\n";
       }

  8. Subroutine (p. 123)
       # book version has good documentation
       sub complement {
         my $dna = shift(@_);   # get first arg
         my $anti = $dna;
         $anti =~ tr/ACGTacgt/TGCAtgca/;
         return $anti;
       }
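Slides 7 and 8 can be combined into one runnable sketch. The value of `$someDNA` is made up, and the `@found` array is an addition that collects the results so they can be inspected. The loop and the subroutine otherwise follow the slides.

```perl
use strict;
use warnings;

# Complement each base; case is preserved (from slide 8).
sub complement {
    my $dna  = shift(@_);
    my $anti = $dna;
    $anti =~ tr/ACGTacgt/TGCAtgca/;
    return $anti;
}

my $someDNA = "ATGCGCGATTACCCCGGTA";   # made-up example sequence
my @found;                             # assumption: kept for inspection

# Find every run of 4 to 7 G/C bases and print it with its complement.
while ($someDNA =~ m/([GC]{4,7})/g) {
    my $GCmatch = $1;
    my $compl   = complement($GCmatch);
    push @found, "$GCmatch/$compl";
    print "5' $GCmatch 3'\n";
    print "3' $compl 5'\n\n";
}
```

With this input the loop finds two runs, `GCGCG` and `CCCCGG`. The subroutine returns the complement, not the reverse complement, which is why the second printed strand is labelled 3' to 5'.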

  9. Download These (Ch. 7) • Counting nucleotides • countNucleotides($str, "C"); • countNucleotides($str, "[CG]"); • Printing sequences with fixed line width • printSequence($str, 80);
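The textbook versions of these two subroutines are not reproduced in the slides. The following is only a guess at how subroutines with these call signatures might work, assuming the second argument of `countNucleotides` is treated as a regular expression and matching is case-sensitive.

```perl
use strict;
use warnings;

# Sketch, not the book's code: count matches of a pattern in a sequence.
sub countNucleotides {
    my $seq     = shift @_;
    my $pattern = shift @_;
    # The "= () =" idiom forces list context, so $count is the number of matches.
    my $count = () = $seq =~ /$pattern/g;
    return $count;
}

# Sketch, not the book's code: print a sequence in lines of fixed width.
sub printSequence {
    my $seq   = shift @_;
    my $width = shift @_;
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        print substr($seq, $pos, $width), "\n";
    }
}

my $str = "ACGTCC";                          # made-up example sequence
print countNucleotides($str, "C"), "\n";     # prints: 3
print countNucleotides($str, "[CG]"), "\n";  # prints: 4
printSequence($str, 4);                      # prints ACGT, then CC
```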

  10. Variable Scope • Variables exist from when they are declared (“my”) until the end of the block (closing brace). • Variables in subroutines exist only during the subroutine • Each call to a subroutine re-initializes the variables
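A small sketch of these scope rules. The names `$count` and `next_label` are made up for illustration.

```perl
use strict;
use warnings;

my $count = 10;              # exists from here to the end of the file

sub next_label {
    my $count = 0;           # a separate variable, created fresh on each call
    $count = $count + 1;
    return "call sees $count";
}

my $first  = next_label();
my $second = next_label();   # same result: the inner $count started at 0 again

{
    my $inner = "temporary"; # exists only until the closing brace
}
# Referring to $inner here would be a compile-time error under "use strict".

print "$first, $second, outer is still $count\n";
# prints: call sees 1, call sees 1, outer is still 10
```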

  11. Files and Programs • Files are stored on the computer’s hard drive and maintained by the operating system. • Programs are connected to files via special subroutines • “open” creates a file handle • “close” releases the file (important!)

  12. Basic File Manipulation • Open a file and read • my $HANDLE; • open($HANDLE, '<', $filename); • $line = <$HANDLE>; • Open a file and write • my $HANDLE; • open($HANDLE, '>', $filename); • print $HANDLE "Hello world!"; • Close a file • close($HANDLE);

  13. Allowing for Errors • If you try to read a file that doesn't exist, or write to a file you don't have permission to create or change, open() returns false. (Opening an existing file with '>' does not fail; it overwrites the file.) • The rest of your program won't work. • To handle this, add: or die("some message $file: $!") to the end of the command ($! contains the system error message)

  14. Complete Open Examples
       open($HANDLE, '<', $filename) or die("Cannot open file: $filename: $!");
       open($HANDLE, '>', $filename) or die("Cannot write file: $filename: $!");
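A sketch of the full open, write, close, and read cycle, plus a failed open. It assumes a scratch directory made with the core `File::Temp` module so that no real files are touched; the file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir      = tempdir(CLEANUP => 1);   # scratch directory, removed at exit
my $filename = "$dir/hello.txt";        # made-up file name

# Write one line, then release the file.
open(my $OUT, '>', $filename) or die("Cannot write file: $filename: $!");
print $OUT "Hello world!\n";
close($OUT) or die("Cannot close file: $filename: $!");

# Read the line back.
open(my $IN, '<', $filename) or die("Cannot open file: $filename: $!");
my $line = <$IN>;
close($IN);
chomp $line;

# Opening a file that does not exist returns false and sets $!.
my $missing = "$dir/no_such_file.txt";
my $opened  = open(my $BAD, '<', $missing);
my $status  = $opened ? "opened" : "failed: $!";

print "$line / $status\n";
```

The last open is checked with an ordinary conditional instead of `or die` so the sketch can report the failure and keep running.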

  15. Reading lines • Subroutine chomp removes the '\n' character at the end of a line • $line = <$HANDLE> puts the next line in $line • When there are no more lines, the result is false • Example: put the whole file in one sequence
       while ($line = <$HANDLE>) {
         chomp $line;
         $seq = $seq . $line;
       }
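The loop can be tried without a real file. This sketch assumes an in-memory file handle (opening a reference to a string), which is standard Perl but not something the slides cover. The data is made up.

```perl
use strict;
use warnings;

my $data = "ACGT\nTTGA\nCC\n";   # made-up file contents, three lines

# Opening a reference to a string gives a handle that reads from the string.
open(my $HANDLE, '<', \$data) or die("Cannot open in-memory data: $!");

my $seq = "";
while (my $line = <$HANDLE>) {
    chomp $line;                 # strip the trailing newline
    $seq = $seq . $line;         # append this line to the sequence
}
close($HANDLE);

print "$seq\n";                  # prints: ACGTTTGACC
```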

  16. Printing to a file • The print commands (print and printf) can optionally be followed by a file handle before the string to print • Examples: • print $HANDLE "Hello\n"; • printf $HANDLE "GC percent is %.1f\n", $GCcount * 100.0 / $total;
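A sketch of the printf example with values filled in. The sequence is made up, the way `$GCcount` is computed is an assumption, and the output goes to an in-memory handle so the result can be inspected without creating a file.

```perl
use strict;
use warnings;

my $seq     = "GGCATTCGAA";               # made-up sequence: 5 of 10 bases are G or C
my $GCcount = () = $seq =~ /[GC]/g;       # count G and C characters
my $total   = length($seq);

my $report = "";
open(my $HANDLE, '>', \$report) or die("Cannot open in-memory report: $!");
printf $HANDLE "GC percent is %.1f\n", $GCcount * 100.0 / $total;
close($HANDLE);

print $report;                            # prints: GC percent is 50.0
```

The format `%.1f` prints a floating-point number with one digit after the decimal point. Note that there is no comma between the file handle and the format string.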

  17. ReadInDNA • Subroutine to read FASTA formatted file (p. 141) • Returns sequence as one long string • Removes whitespace, lines that begin with # (comments), and all digits

  18. FASTA File Format • One header line, begins with > • Many lines of text, sometimes capitalized, sometimes with spaces after every n characters • (ReadInDNA handles these variations)
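The book's ReadInDNA is not shown in the slides. The following sketch only illustrates the behaviour described on slides 17 and 18. The name `read_fasta_sketch`, the decision to convert to upper case, and the sample file contents are all assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch, not the book's ReadInDNA: return the sequence as one long string.
sub read_fasta_sketch {
    my $filename = shift @_;
    open(my $HANDLE, '<', $filename) or die("Cannot open file: $filename: $!");
    my $seq = "";
    while (my $line = <$HANDLE>) {
        next if $line =~ /^[>#]/;   # skip the header line and comment lines
        $line =~ s/[\s\d]//g;       # remove whitespace (including newline) and digits
        $seq .= uc $line;           # assumption: normalize to upper case
    }
    close($HANDLE);
    return $seq;
}

# Build a small made-up FASTA file to read.
my ($TMP, $filename) = tempfile(UNLINK => 1);
print $TMP ">demo sequence (made up)\n";
print $TMP "# a comment line\n";
print $TMP "  1 acgtac gtacgt\n";
print $TMP " 13 TTGACC\n";
close($TMP);

my $seq = read_fasta_sketch($filename);
print "$seq\n";                     # prints: ACGTACGTACGTTTGACC
```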

  19. Getting a FASTA File • Go to NCBI http://www.ncbi.nlm.nih.gov/ • Search for what you want and download the file to your current machine • Send the file to your directory on cs.hiram.edu (Demo to be provided)

  20. Assignment • Using subroutines from your text, determine the GC content of the given genomes. (Examples to be provided)
