Perl scripting

PowerPoint Slideshow about 'Perl scripting' - kato


Computer Basics


  • CPU, RAM, Hard drive

  • The CPU can operate directly only on data held in its registers.



Computer languages

  • Machine language: binary code executed directly by the CPU. Usually specific to a CPU architecture. Fast.

  • Assembly language: maps binary instructions to short mnemonics (e.g. MOV, ADD). Platform-dependent. Fast.

  • High-level language: “human-like” syntax, largely CPU-independent. Compiled into machine code before use. Fast. E.g. C, C++, Fortran, Pascal, Basic.

  • Scripting language: usually not compiled into binary code. Interpreted and executed on request. Slower. E.g. Perl, PHP, Python, JavaScript, Bash, Ruby.

  • Byte-code language: source code is compiled to platform-independent intermediate code, which a virtual machine then interprets or compiles at run time. E.g. Java, Microsoft .NET. Intermediate speed.

Two elements of a program

  • Data structure & Algorithm

  • Each data structure tends to come with its own well-optimized algorithms for processing and extracting information (this is the subject of computer science).

  • For example: Inserting (algorithm) a node (data structure) in a linked list (data structure).
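As a minimal sketch of that example in Perl, a linked list can be built from hash references, where each node holds a value and a link to the next node. The function names new_node, insert_after and list_values are made up for this illustration.

```perl
use strict;
use warnings;

# Data structure: each node is a hash reference with a value and a link to the next node.
sub new_node {
    my ($value) = @_;
    return { value => $value, next => undef };
}

# Algorithm: insert a new node right after an existing node.
sub insert_after {
    my ($node, $value) = @_;
    my $new = new_node($value);
    $new->{next}  = $node->{next};   # new node points to the old successor
    $node->{next} = $new;            # existing node now points to the new node
    return $new;
}

# Walk the list from the head and collect the values in order.
sub list_values {
    my ($head) = @_;
    my @values;
    for (my $n = $head; defined $n; $n = $n->{next}) {
        push @values, $n->{value};
    }
    return @values;
}

my $head = new_node('A');
insert_after($head, 'C');
insert_after($head, 'B');            # goes between A and C
print join(' -> ', list_values($head)), "\n";   # A -> B -> C
```

Only two links change during the insertion, no matter how long the list is. That is the kind of algorithm a linked list is well suited to.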

Basic Types

  • Bit: 1 bit has 2 states, 1 or 0

  • 1 Byte = 8 bits, i.e. max(1 Byte) = (binary)11111111 = 255

  • Each character in the ASCII encoding fits in 1 byte. In C, the byte data type is in fact written as “char”.

  • The byte is the smallest addressable unit of storage.

  • Boolean (true/false) theoretically takes only 1 bit, but in reality it takes 1 Byte.

  • How many Boolean states can you store using 1 byte?
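One byte has 8 bits, so it can hold 8 independent true/false flags. The sketch below packs flags into a single number with bit operators. The helper names set_flag, clear_flag and get_flag are invented for this illustration.

```perl
use strict;
use warnings;

# Turn on flag number $bit (0..7) and return the new byte.
sub set_flag   { my ($byte, $bit) = @_; return $byte | (1 << $bit); }

# Turn off flag number $bit and return the new byte (masked to 8 bits).
sub clear_flag { my ($byte, $bit) = @_; return $byte & ~(1 << $bit) & 0xFF; }

# Return 1 if flag number $bit is on, 0 otherwise.
sub get_flag   { my ($byte, $bit) = @_; return ($byte >> $bit) & 1; }

my $flags = 0;                   # all eight flags off
$flags = set_flag($flags, 0);
$flags = set_flag($flags, 7);
printf "%08b\n", $flags;         # 10000001

$flags = clear_flag($flags, 0);
printf "%08b\n", $flags;         # 10000000
```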

Basic Types

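  • Integer: 32 bits. Signed range: -2^31 to +2^31 - 1; unsigned range: 0 to +2^32 - 1.

  • Long integer: 64 bits.

  • Float: 32 bits. 1 bit for the sign, 8 bits for the exponent and 23 stored bits for the significand (24 bits of precision, counting the implicit leading bit).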

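  • Floating-point numbers can lose precision. Try this in Perl:

  • print 0.6/0.2-3;   # prints a tiny nonzero number instead of 0

  • One way to fix it is to round to the nearest integer first:

  • sub round {

    • my($n) = @_;

    • return int($n + ($n < 0 ? -0.5 : 0.5));   # adds +/-0.5 before truncating; also safe when $n is 0

  • }

  • print round(0.6/0.2)-3;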
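Rounding helps when the result is meant to be an integer. For general comparisons, a common alternative is to treat two numbers as equal when they differ by less than a small tolerance. The sketch below assumes a default tolerance of 1e-9, and the name nearly_equal is made up for this illustration.

```perl
use strict;
use warnings;

# Treat two numbers as equal when they differ by less than a small tolerance.
sub nearly_equal {
    my ($x, $y, $tolerance) = @_;
    $tolerance = 1e-9 unless defined $tolerance;   # assumed default, adjust to your data
    return abs($x - $y) < $tolerance ? 1 : 0;
}

# Exact comparison fails because 0.6/0.2 is stored as slightly less than 3.
print 0.6/0.2 == 3 ? "equal\n" : "not equal\n";              # not equal

# Comparison with a tolerance succeeds.
print nearly_equal(0.6/0.2, 3) ? "equal\n" : "not equal\n";  # equal
```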

Pointers / reference

  • A pointer (called a reference in some other languages) is essentially an integer.

  • This integer stores a memory address.

  • This memory address refers to another variable.
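Perl calls these references. A minimal sketch follows. The variable names are arbitrary, and the address printed for $ref will differ from run to run.

```perl
use strict;
use warnings;

my $count = 10;
my $ref   = \$count;          # $ref holds the address of $count, not a copy of its value
print "$ref\n";               # something like SCALAR(0x...); the address varies

$$ref = 42;                   # write through the reference
print "$count\n";             # 42: the original variable changed

my @bases    = ('A', 'T', 'C', 'G');
my $arrayRef = \@bases;       # a reference to a whole array
push @$arrayRef, 'N';         # modifies @bases itself
print scalar(@bases), "\n";   # 5
print "$arrayRef->[0]\n";     # A
```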


Complex types

  • Set: unordered values.

  • Array (vector): an ordered sequence of values, usually of the same basic type.

    • Indexes start from 0 in most languages; last index = length - 1.

  • Hash: key => value pairs. Keys must be unique. An array can be thought of as a special hash whose keys are ordered, consecutive integers.

  • String: in C, a string is simply an array of characters. In many other languages, strings are treated as a “basic type”. Most algorithms for arrays also work for strings.

Complex types

  • Classes: the basis of object-oriented programming.

  • A class packages related data of different data types, together with the algorithms that operate on them, into a black box for you to use.
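In Perl, a class is a package whose objects are blessed references. The sketch below is only an illustration: the class name DnaSequence and its methods size and gc_content are invented here and are not part of the slides.

```perl
use strict;
use warnings;

{
    package DnaSequence;      # hypothetical class name for this sketch

    # Constructor: keep the data inside a blessed hash reference.
    sub new {
        my ($class, $sequence) = @_;
        my $self = { sequence => uc($sequence) };
        return bless $self, $class;
    }

    # Method: number of nucleotides.
    sub size {
        my ($self) = @_;
        return length($self->{sequence});
    }

    # Method: fraction of G and C letters (assumes a non-empty sequence).
    sub gc_content {
        my ($self) = @_;
        my $gc = ($self->{sequence} =~ tr/GC//);   # tr with an empty replacement only counts
        return $gc / $self->size;
    }
}

my $dna = DnaSequence->new('atggaaatgg');
print $dna->size, "\n";                  # 10
printf "%.2f\n", $dna->gc_content;       # 0.40
```

The caller never touches the hash inside the object; it only uses the methods, which is the “black box” idea.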

Perl scripting

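  • Perl lumps all “basic types” together as scalars, marked with “$”.

  • The Perl interpreter decides whether a scalar is a number or a string from what it “looks like” and how it is used.

  • Convenient, but sometimes problematic, especially when you parse a user-provided data file.

  • Arrays: declared with @; a single element is accessed with $, e.g. $array[0].

  • Hashes: declared with %; a single value is accessed with $, e.g. $hash{key}.

  • Regular expressions (RegExp)

  • Handles, e.g. file handles.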

  • use strict;

  • Perl has an ugly grammar.

  • Perl has many shortcuts, such as $_
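A short sketch of the points above, with made-up variable names: one scalar used both as a number and as a string, the @ / % / $ sigils, a regular expression, and the $_ shortcut.

```perl
use strict;
use warnings;

my $text = "42";                  # a scalar that looks like a number
my $sum  = $text + 1;             # used as a number: 43
my $join = $text . "1";           # used as a string: "421"
print "$sum $join\n";

my @codons = ('ATG', 'GAA', 'TAA');        # array declared with @
print "$codons[0]\n";                      # one element is a scalar, so $

my %aminoAcid = (ATG => 'M', GAA => 'E');  # hash declared with %
print "$aminoAcid{GAA}\n";                 # one value is a scalar, so $

# Regular expression: accept only strings made of A, T, C and G.
foreach my $candidate ('ATGC', 'ATGX') {
    my $verdict = $candidate =~ /^[ATCG]+$/ ? 'valid' : 'invalid';
    print "$candidate is $verdict\n";
}

# $_ is the default variable: the loop and the match both use it implicitly.
foreach (@codons) {
    print "stop codon found: $_\n" if /^(?:TAA|TAG|TGA)$/;
}
```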


Flow control

  • for, foreach, while, unless, until, if elsif else
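A compact sketch that touches each of these keywords once; the data and variable names are arbitrary.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 3, 4, 5, 6);
my @labels;

# foreach with if / elsif / else
foreach my $n (@numbers) {
    if    ($n % 3 == 0) { push @labels, 'three'; }
    elsif ($n % 2 == 0) { push @labels, 'even';  }
    else                { push @labels, 'odd';   }
}
print "@labels\n";               # odd even three even odd three

# while loops as long as the condition is true; until loops as long as it is false.
my $i = 0;
while ($i < 3)  { $i++; }
until ($i == 0) { $i--; }

# unless means "if not".
print "countdown finished\n" unless $i != 0;

# C-style for loop.
my $total = 0;
for (my $k = 1; $k <= 4; $k++) { $total += $k; }
print "$total\n";                # 10
```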


Functions (subroutines)

  • Traditionally, “subroutines” do not accept parameters.

  • “Function” is a better term, but Perl, being ugly, continues to use the keyword sub.

  • sub functionname {

  • my($param1, $param2) = @_; #get the parameters

  • return xxxx;   # xxxx stands for the value to return

  • }

  • Call: functionname($param1, $param2);

  • I prefix all private functions with “fn”, but you don’t need to do that.

  • However, do capitalize the first letter of each word!

  • Use Verb + Noun phrases as function names, e.g. fnGetFileName(), fnDownloadPicture().
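A minimal sketch of two functions written in this style. The slides only give the name fnGetFileName; its behavior here, the second function fnGetMinMax and the example path are assumptions made for illustration.

```perl
use strict;
use warnings;

# Verb + Noun name with the "fn" prefix; the parameters arrive in @_.
sub fnGetFileName {
    my ($path) = @_;
    my @parts = split m{/}, $path;   # split the path on forward slashes
    return $parts[-1];               # the last piece is the file name
}

# A function can take any number of parameters and return a list.
sub fnGetMinMax {
    my (@values) = @_;
    my ($min, $max) = ($values[0], $values[0]);
    foreach my $v (@values) {
        $min = $v if $v < $min;
        $max = $v if $v > $max;
    }
    return ($min, $max);
}

print fnGetFileName('/home/user/data/sequence.fasta'), "\n";   # sequence.fasta

my ($min, $max) = fnGetMinMax(4, 9, 2, 7);
print "$min $max\n";                                           # 2 9
```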

How to name variables

  • Variable names should reflect their basic types.

  • Descriptive names should be given, with each word capitalized.

  • I use C-style prefixes on them.

Perl scripting

  • Start with the DNA sequence ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT. Report its length, check that the length is divisible by 3, and check that it is a valid DNA sequence. If a check fails, do not continue.

  • Translate it into a peptide sequence using the universal codon table.

  • Display it on screen with the DNA on the first line and, on the second line, each translated amino acid aligned with the middle letter of its codon.

  • This DNA sequence goes through generation after generation of replication.

  • At each replication, it has a user-specified probability (0-1) of single-nucleotide mutation. This mutational probability is specified through the command line.
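One possible starting point for the first three steps is sketched below. The function names fnCheckDna, fnTranslateDna and fnFormatProtein are invented for this illustration, and the codon table is built from the standard genetic code written in T, C, A, G order, with * marking stop codons.

```perl
use strict;
use warnings;

# Standard codon table: the amino-acid string follows the T, C, A, G order of all three bases.
my @bases      = ('T', 'C', 'A', 'G');
my $aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codonTable;
my $index = 0;
foreach my $b1 (@bases) {
    foreach my $b2 (@bases) {
        foreach my $b3 (@bases) {
            $codonTable{"$b1$b2$b3"} = substr($aminoAcids, $index++, 1);
        }
    }
}

# Return 1 if the sequence contains only A, T, C, G (any case) and its length is divisible by 3.
sub fnCheckDna {
    my ($dna) = @_;
    return 0 unless $dna =~ /^[ATCG]+$/i;
    return length($dna) % 3 == 0 ? 1 : 0;
}

# Translate codon by codon; '*' marks a stop codon.
sub fnTranslateDna {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = uc(substr($dna, $i, 3));
        $protein .= $codonTable{$codon};
    }
    return $protein;
}

# Pad each amino acid with one space on each side so it sits under the middle letter of its codon.
sub fnFormatProtein {
    my ($protein) = @_;
    return join('', map { sprintf(' %s ', $_) } split(//, $protein));
}

my $dna = 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
print "Length: ", length($dna), "\n";
die "Not a valid DNA sequence with a length divisible by 3\n" unless fnCheckDna($dna);
print "$dna\n";
print fnFormatProtein(fnTranslateDna($dna)), "\n";
```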


  • If a mutation happens, one random letter in the DNA is changed to A, T, C or G with equal probability. It's okay if the letter "changes" to the same letter.

  • At each generation, display the DNA and protein sequence as described in step 3, and also display the generation number.

  • Check at each generation whether a stop codon has occurred. If so, the protein has lost its function: stop the evolution and output the generation at which the stop codon occurred.

  • The program should be able to deal with DNA sequences in upper- or lowercase letters.
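The mutation and stop-codon steps could be sketched as follows. This is only one way to do it: the function names, the fallback probability of 0.1, the cap on the number of generations and the script name evolve.pl in the comment are all assumptions, and the protein display from step 3 is left out to keep the example short.

```perl
use strict;
use warnings;

# With probability $rate, replace one random position with a random base.
# The new base may equal the old one, which the exercise allows.
sub fnMutateDna {
    my ($dna, $rate) = @_;
    if (rand() < $rate) {
        my @bases    = ('A', 'T', 'C', 'G');
        my $position = int(rand(length($dna)));
        my $newBase  = $bases[int(rand(scalar(@bases)))];
        substr($dna, $position, 1, $newBase);
    }
    return $dna;
}

# Return 1 if any in-frame codon is a stop codon (TAA, TAG or TGA), 0 otherwise.
sub fnHasStopCodon {
    my ($dna) = @_;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        return 1 if substr($dna, $i, 3) =~ /^(?:TAA|TAG|TGA)$/i;
    }
    return 0;
}

# Mutational probability from the command line, e.g. "perl evolve.pl 0.1" (hypothetical name).
my $rate = defined $ARGV[0] ? $ARGV[0] : 0.1;    # 0.1 is an assumed fallback
die "Probability must be a number between 0 and 1\n"
    unless $rate =~ /^\d*\.?\d+$/ && $rate <= 1;

my $dna = uc('ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT');
my $maxGenerations = 100000;    # assumed safety cap so a probability of 0 cannot loop forever
my $generation     = 0;

while (!fnHasStopCodon($dna) && $generation < $maxGenerations) {
    $generation++;
    $dna = fnMutateDna($dna, $rate);
    print "Generation $generation: $dna\n";
}

if (fnHasStopCodon($dna)) {
    print "Stop codon first occurred at generation $generation\n";
} else {
    print "No stop codon after $generation generations\n";
}
```

Converting the sequence to uppercase once at the start is a simple way to accept both upper- and lowercase input.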

Perl scripting

  • Create a shell script that does the following:

  • Run the simulation 1000 times for each of the mutational probabilities 0.01, 0.1 and 0.5.

  • Collect all DNA and protein sequence outputs in dist_$mutationprob.log

  • Collect the generation at which the stop codon first occurs in dist_$mutationprob.txt

  • Use R to plot dist_0.01.txt, dist_0.1.txt and dist_0.5.txt in a histogram, with a different color for each parameter. The x axis should be log10(Generation).