
Perl scripting


kato

Presentation Transcript


  1. Perl scripting

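  2. Computer Basics • CPU, RAM, hard drive • The CPU can only operate directly on data held in its registers: data on the hard drive must first be loaded into RAM, and from RAM into the registers. [Diagram: CPU, RAM, hard drive]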

  3. Computer languages • Machine language: binary code executed directly by the CPU. Usually specific to a CPU model. Fast. • Assembly language: maps binary instructions to short mnemonics (often three letters). Platform-dependent. Fast. • High-level language: “human-like” syntax, often CPU-independent. Compiled into machine code before use. Fast. E.g. C, C++, Fortran, Pascal, Basic. • Scripting language: usually not compiled into binary code; interpreted and executed on request. Slow. E.g. Perl, PHP, Python, JavaScript, Bash, Ruby. • Byte-code language: source code is converted into platform-independent intermediate code that can be compiled quickly. E.g. Java, Microsoft .NET. Intermediate speed.

  4. Two elements of a program • Data structures & algorithms • Different data structures have corresponding, well-optimized algorithms for processing and extracting information (a core topic of computer science). • For example: inserting (algorithm) a node (data structure) into a linked list (data structure).
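The linked-list example can be sketched in Perl as follows. This is an illustration, not code from the slides: each node is assumed to be a hash reference with hypothetical value and next fields, and the function names are made up in the slides' fn + Verb + Noun style. Inserting after a known node only rewires two references, which is why insertion is cheap on this data structure.

```perl
use strict;
use warnings;

# Each node is a hash reference: { value => ..., next => <next node or undef> }.
sub fnCreateNode {
    my ($value) = @_;
    return { value => $value, next => undef };
}

# Insert a new node right after $node: no other elements are moved.
sub fnInsertAfter {
    my ($node, $value) = @_;
    my $newNode = fnCreateNode($value);
    $newNode->{next} = $node->{next};    # new node points to the old successor
    $node->{next}    = $newNode;         # old node now points to the new node
    return $newNode;
}

# Walk the list from $head and collect the values in order.
sub fnListToArray {
    my ($head) = @_;
    my @values;
    for (my $node = $head; defined $node; $node = $node->{next}) {
        push @values, $node->{value};
    }
    return @values;
}

my $head = fnCreateNode('A');
fnInsertAfter($head, 'C');                          # A -> C
fnInsertAfter($head, 'B');                          # A -> B -> C
print join(' -> ', fnListToArray($head)), "\n";     # prints A -> B -> C
```

Compare with an array, where inserting in the middle means shifting every later element by one position.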

  5. Basic Types • Bit: 1 bit has 2 states, 1 or 0. • 1 byte = 8 bits, i.e. the largest unsigned value in 1 byte is (binary) 11111111 = 255. • Characters in the ASCII encoding fit in 1 byte. C has no separate byte type; the 1-byte type is written “char”. • The byte is the smallest addressable unit of storage. • A Boolean (true/false) needs only 1 bit in theory, but in practice it usually takes at least 1 byte. • How many Boolean states can you store using 1 byte?
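One answer to the question above: 1 byte can hold 8 independent true/false flags (256 combinations), one per bit. A minimal Perl sketch using bitwise operators; the helper names fnSetFlag and fnGetFlag are hypothetical.

```perl
use strict;
use warnings;

# Turn bit number $bit (0..7) of $byte on or off, keeping the result within 0..255.
sub fnSetFlag {
    my ($byte, $bit, $isOn) = @_;
    return $isOn ? ($byte | (1 << $bit)) : ($byte & ~(1 << $bit) & 0xFF);
}

# Read bit number $bit of $byte: returns 1 or 0.
sub fnGetFlag {
    my ($byte, $bit) = @_;
    return ($byte >> $bit) & 1;
}

my $flags = 0;
$flags = fnSetFlag($flags, $_, 1) for (0, 3, 7);    # turn on bits 0, 3 and 7
printf "%08b = %d\n", $flags, $flags;               # prints 10001001 = 137
print 'Bit 3 is ', fnGetFlag($flags, 3) ? 'on' : 'off', "\n";
```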

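  6. BASIC TYPES • Integer: 32 bit; signed range -2^31 ~ +2^31 - 1, unsigned range 0 ~ +2^32 - 1. • Long integer: 64 bit. • Float: 32 bit: 1 sign bit, 8 exponent bits and a 24-bit significand (23 bits stored plus 1 implicit leading bit). • Floating-point numbers can lose precision; try this in Perl:

        print 0.6/0.2 - 3;    # does not print 0

    • Correct way (round first, then compare):

        sub round {
            my ($n) = @_;
            return 0 if $n == 0;                  # avoid division by zero below
            return int($n + $n / abs($n * 2));    # add +0.5 or -0.5, then truncate toward zero
        }
        print round(0.6/0.2) - 3;                 # prints 0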
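Rounding works when the expected result is an integer. A common alternative, not from the slides, is to compare floating-point values with a small tolerance instead of ==. The helper fnNearlyEqual and its default tolerance of 1e-9 are assumptions; pick a tolerance that fits the magnitude of your data.

```perl
use strict;
use warnings;

# Treat two floating-point numbers as equal if they differ by less than $tolerance.
sub fnNearlyEqual {
    my ($x, $y, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;    # assumed default tolerance
    return abs($x - $y) < $tolerance;
}

my $quotient = 0.6 / 0.2;
printf "exactly 3?              %s\n", ($quotient == 3 ? 'yes' : 'no');                # no
printf "equal within tolerance? %s\n", (fnNearlyEqual($quotient, 3) ? 'yes' : 'no');  # yes
```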

  7. Pointers / references • A pointer (called a reference in some languages, including Perl) is essentially an integer. • This integer stores a memory address. • That memory address refers to another variable. • http://perldoc.perl.org/perlref.html
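A short Perl sketch of references (the variable names are illustrative): the backslash operator creates a reference, and dereferencing it reaches the original variable, so changes made through the reference show up in that variable.

```perl
use strict;
use warnings;

my $count = 42;
my @names = ('Ann', 'Bob');
my %ages  = (Ann => 30, Bob => 25);

# The backslash operator creates a reference to an existing variable.
my $countRef = \$count;
my $namesRef = \@names;
my $agesRef  = \%ages;

$$countRef++;                                  # changes $count itself: now 43
push @{$namesRef}, 'Cy';                       # changes @names itself
print "$count @names $agesRef->{Bob}\n";       # prints 43 Ann Bob Cy 25

# Used as a number, a reference yields the memory address it holds.
printf "countRef holds address %d\n", $countRef + 0;
```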

  8. Complex types • Set: an unordered collection of unique values. • Array (vector): an ordered sequence of values of the same basic type (Perl arrays may mix types). • Indexes start from 0 in most languages, so the last index = length - 1. • Hash: key => value pairs; each key must be unique. An array can be thought of as a special hash whose keys are ordered, consecutive integers. • String*: in C, a string is simply an array of characters (terminated by a null character). In many other languages, strings are treated as a “basic type”. Most algorithms for arrays also work for strings.
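The same complex types in Perl, as a small sketch. Perl has no built-in set type, so the keys of a hash are used as a stand-in here; that is one common idiom, not the only one.

```perl
use strict;
use warnings;

# Array: ordered, indexed from 0; $#codons is the last index (= length - 1).
my @codons    = ('ATG', 'GAA', 'TAA');
my $lastIndex = $#codons;
print "first: $codons[0], last index: $lastIndex, length: ", scalar(@codons), "\n";

# Hash: unique keys mapped to values.
my %aminoAcid = (ATG => 'M', GAA => 'E', TAA => '*');
print "GAA codes for $aminoAcid{GAA}\n";

# Set stand-in: duplicate keys collapse into one entry.
my %seen;
$seen{$_} = 1 for qw(A T A G A);
my @uniqueBases = sort keys %seen;              # (A, G, T)

# String: a scalar, but it can be split into an array of characters.
my @letters = split //, 'ATG';                  # ('A', 'T', 'G')
print "@uniqueBases | @letters\n";              # prints A G T | A T G
```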

  9. Complex types • Classes: object-oriented programming • A class packages related data of different data types, together with the algorithms associated with them, into a neat black box for you to use.
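A minimal sketch of a Perl class, using the classic package + bless approach. The class DnaSequence and its methods are hypothetical examples: the DNA string is the packaged data, and getLength and getGcContent are the algorithms that come with it.

```perl
use strict;
use warnings;

package DnaSequence;

# Constructor: store the data in a hash and bless it into the class.
sub new {
    my ($class, $sequence) = @_;
    my $self = { sequence => uc($sequence) };
    return bless $self, $class;
}

# Methods: algorithms that work on the object's own data.
sub getLength {
    my ($self) = @_;
    return length($self->{sequence});
}

sub getGcContent {
    my ($self) = @_;
    return 0 unless $self->getLength();                   # empty sequence
    my $gcCount = ($self->{sequence} =~ tr/GC//);         # counts G and C, string unchanged
    return $gcCount / $self->getLength();
}

package main;

my $dna = DnaSequence->new('atgGCc');
printf "length %d, GC content %.2f\n", $dna->getLength(), $dna->getGcContent();
```

The caller only uses new, getLength and getGcContent; how the sequence is stored stays inside the black box.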

  10. Perl • Perl lumps all “basic types” together as “scalars”, written with $. • The Perl interpreter decides from context what a scalar “looks like” (a number or a string). • Convenient, but sometimes problematic, especially when you parse a user-provided data file. • Arrays: declared with @; single elements and array references use $. • Hashes: declared with %; single elements and hash references use $. • Regular expressions (RegExp) • Filehandles • use strict; • Perl has an ugly grammar. • Perl has many shortcuts, such as $_ • DO NOT USE THEM!
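A short sketch of scalar context and sigils. For user-provided data it checks fields with looks_like_number from the core module Scalar::Util before doing arithmetic; that check is one reasonable choice, not something the slides prescribe.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# One scalar type ($) holds numbers and strings; the operator decides the context.
my $text   = '10';
my $sum    = $text + 5;      # numeric context: 15
my $joined = $text . 5;      # string context: '105'
print "$sum $joined\n";      # prints 15 105

# User-provided data: check a field before treating it as a number.
for my $field ('42', '3.5e2', '10abc') {
    my $verdict = looks_like_number($field) ? 'number' : 'not a number';
    print "'$field' => $verdict\n";
}

# Sigils: @ for a whole array, % for a whole hash, $ for one element or a reference.
my @bases    = qw(A C G T);
my %count    = (A => 3, C => 1);
my $basesRef = \@bases;
print "$bases[0] $count{A} $basesRef->[3]\n";   # prints A 3 T
```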

  11. Flow control • for, foreach, while, unless, until, if elsif else • http://perldoc.perl.org/perlsyn.html#Compound-Statements
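A compact Perl sketch that uses each construct from the list once; the codon-length checks are only an illustrative use case.

```perl
use strict;
use warnings;

my @lengths = (60, 61, 0);

# foreach: visit each element; if / elsif / else: pick exactly one branch.
foreach my $length (@lengths) {
    if ($length == 0) {
        print "$length: empty\n";
    } elsif ($length % 3 == 0) {
        print "$length: whole codons only\n";
    } else {
        print "$length: incomplete last codon\n";
    }
}

# C-style for loop with an explicit counter.
for (my $i = 0; $i < 3; $i++) {
    print "for: $i\n";
}

my $n = 3;
while ($n > 0) { $n-- }      # repeat while the condition is true (ends at 0)
until ($n >= 3) { $n++ }     # repeat until the condition becomes true (ends at 3)
unless ($n == 0) {           # run the block only if the condition is false
    print "n is $n\n";       # prints n is 3
}
```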

  12. Functions (subroutines) • Traditionally, “subroutines” did not accept parameters. • “Function” is a better term, but because Perl is ugly it still uses the keyword sub:

        sub functionname {
            my ($param1, $param2) = @_;        # get the parameters
            my $result = "$param1 $param2";    # placeholder: compute the result here
            return $result;
        }

    • Call: functionname($param1, $param2); • I prefix all private functions with “fn”, but you don't need to do that. • However, capitalize the first letter of each word! • Use verb + noun phrases as function names, e.g. fnGetFileName(), fnDownloadPicture().
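A filled-in version of the naming advice. The slide names fnGetFileName() but does not give its body, so this sketch assumes it returns the file-name part of a path, using basename from the core module File::Basename; fnJoinWords is a hypothetical two-parameter example following the template.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Verb + noun name with the "fn" prefix for a private helper.
# Assumed behavior: return the last component of a path.
sub fnGetFileName {
    my ($path) = @_;              # get the parameter from @_
    return basename($path);
}

# Two parameters, unpacked from @_ in one statement.
sub fnJoinWords {
    my ($firstWord, $secondWord) = @_;
    return "$firstWord $secondWord";
}

print fnGetFileName('/data/run1/dist_0.1.txt'), "\n";   # prints dist_0.1.txt
print fnJoinWords('Perl', 'scripting'), "\n";           # prints Perl scripting
```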

  13. How to name variables • Variable names should reflect their basic types. • Give descriptive names, with each word capitalized. • I use C-style prefixes on them.

  14. Start with the DNA sequence ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT. Report its length, check whether the length is divisible by 3, and check whether it is a valid DNA sequence (only the letters A, C, G, T). If a check fails, do not continue. • Translate it into a peptide sequence using the universal codon table. • Display it on screen with the DNA on the first line and the translated amino acids on the second line, each amino acid aligned with the middle letter of its codon. • This DNA sequence goes through generation after generation of replication. • At each replication, it has a user-specified probability (0-1) of a single-nucleotide mutation. This mutation probability is specified on the command line.
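The checking, translation and display steps could be sketched as below. This is one possible approach, not a reference solution: the function names are hypothetical (following the slides' fn + Verb + Noun style), the codon table is built from the standard genetic code with codons listed in T, C, A, G order, and stop codons are shown as '*'. The input goes through uc, so lowercase sequences are accepted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Universal (standard) codon table; codons enumerated in T, C, A, G order, '*' = stop.
my @bases      = qw(T C A G);
my $aminoAcids = 'FFLLSSSSYY**CC*W' . 'LLLLPPPPHHQQRRRR'
               . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %codonTable;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codonTable{"$first$second$third"} = substr($aminoAcids, $index++, 1);
        }
    }
}

# Return an error message, or '' if the sequence passes both checks.
sub fnCheckDna {
    my ($dna) = @_;
    return 'length is not divisible by 3' if length($dna) % 3 != 0;
    return 'contains letters other than A, C, G, T' if $dna =~ /[^ACGT]/;
    return '';
}

# Translate codon by codon into one-letter amino acid codes.
sub fnTranslateDna {
    my ($dna) = @_;
    my $peptide = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $peptide .= $codonTable{ substr($dna, $i, 3) };
    }
    return $peptide;
}

# Two lines: the DNA, then each amino acid under the middle base of its codon.
sub fnFormatAlignment {
    my ($dna, $peptide) = @_;
    my $aligned = join '', map(" $_ ", split(//, $peptide));
    return "$dna\n$aligned\n";
}

my $dna = uc 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
print 'Length: ', length($dna), "\n";               # prints Length: 60
my $error = fnCheckDna($dna);
die "Invalid DNA sequence: $error\n" if $error;     # stop if a check fails
print fnFormatAlignment($dna, fnTranslateDna($dna));
```

Padding each amino acid with one space on either side gives it the same width as a codon, which is what keeps the letter under the middle base.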

  15. If a mutation happens, 1 random letter in the DNA is changed to A, T, C or G with equal probability. It's okay if the letter "changes" to the same letter. • At each generation, display the generation number and the DNA and protein sequences in the format described in step 3 (the display step). • At each generation, check whether a stop codon has occurred. If so, the protein has lost its function: stop the evolution and output the generation at which the stop codon occurred. • The program should handle DNA sequences written in upper- or lowercase letters.
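The mutation and stop-codon steps might look like this in Perl. It is a sketch under stated assumptions: fnMutateDna and fnHasStopCodon are hypothetical names, only in-frame stop codons (TAA, TAG, TGA) are checked, 0.1 is an assumed default when no probability is given, and a probability of 0 is rejected because the loop would then never end. For brevity it prints only the DNA each generation; the translation and alignment from step 3 would be called at the same point.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# With probability $mutationProb, replace one random base with A, C, G or T
# (each with probability 1/4; the "new" base may equal the old one).
sub fnMutateDna {
    my ($dna, $mutationProb) = @_;
    return $dna if rand() >= $mutationProb;      # no mutation this generation
    my @bases    = qw(A C G T);
    my $position = int(rand(length $dna));
    substr($dna, $position, 1) = $bases[int(rand(4))];
    return $dna;
}

# True if any in-frame codon is a stop codon (TAA, TAG or TGA).
sub fnHasStopCodon {
    my ($dna) = @_;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        return 1 if substr($dna, $i, 3) =~ /^(?:TAA|TAG|TGA)$/;
    }
    return 0;
}

# Mutation probability from the command line; 0.1 is an assumed default.
my $mutationProb = @ARGV ? $ARGV[0] : 0.1;
die "Usage: $0 PROBABILITY (a number > 0 and <= 1)\n"
    unless looks_like_number($mutationProb) && $mutationProb > 0 && $mutationProb <= 1;

# Lowercase input is accepted: uc converts it before any check.
my $dna = uc 'atggaaatggagaggcctctgcaaatgatgccggattgtttcagacatatagaaatgtct';
my $generation = 0;
until (fnHasStopCodon($dna)) {
    $generation++;
    $dna = fnMutateDna($dna, $mutationProb);
    print "Generation $generation: $dna\n";
}
print "Stop codon first occurred at generation $generation\n";
```

Usage, assuming the script is saved as mutation.pl: perl mutation.pl 0.1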

  16. Create a shell script called getdistr.sh • Run the simulation mutation.pl 1000 times for each mutation probability: 0.01, 0.1 and 0.5. • Collect all DNA and protein sequence outputs in dist_$mutationprob.log • Collect the generation at which a stop codon first occurs in dist_$mutationprob.txt • Use R to plot dist_0.01.txt, dist_0.1.txt and dist_0.5.txt as histograms, one color per probability. The X axis should be log10(Generation).
