
BINF 634 Bioinformatics Programming


Presentation Transcript


    1. BINF634 FALL09 LECTURE 1
       BINF 634 Bioinformatics Programming
       Instructor: Jeff Solka, Ph.D.
       Office: Room 312C OB
       Phone: 540-809-9799
       Email: jlsolka@gmail.com
       Office Hours: By appointment
       Required texts:
           Beginning Perl for Bioinformatics by Tisdall and Waliszewski
           Programming Perl (3rd Edition) by Wall, Christiansen and Orwant
       Course Meeting Times: 304B, M: 4:30 pm – 7:10 pm
       Course webpage: http://binf.gmu.edu/~jsolka/fall09/binf634/Fall_2009BINF_634_Syllabus_rev1.html

    2. Acknowledgements
       John Grefenstette
           Assistance with course development.
           Sharing course materials.
           Friendship :^).

    3. Experimental Biology, Computational Biology and Bioinformatics
       [Diagram labels: Database, Problem Statement, Experiment, Results]

    4. Bioinformatics Programming Tasks
       Manage large experimental data sets
           Sequence data
           Microarray data (gene expression)
           Mass spec data (proteomics)
           Genotype project data (HapMap)
           Clinical data
       Build tools for Knowledge Discovery
           Find motifs in sequence data
           Data clustering
           Visualization
       Build analysis pipelines
           Glue several analysis steps together into a single automated process
           "Munge" data: take data from one application or database and format it for input to another application or database
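
As a rough illustration of what "munging" can look like in Perl, the sketch below filters comma-separated records and rewrites them as tab-delimited lines for a downstream tool. The records, the column order, and the helper name munge are all made up for this example; they are not taken from the course materials.

```perl
use strict;
use warnings;

# Hypothetical input: gene name, expression value, condition (invented data).
my @csv_lines = (
    'EGF1,12.5,control',
    'TFEC,3.2,treated',
    'CFTR,48.0,treated',
);

# Hypothetical helper: keep rows at or above a threshold and
# re-emit them as tab-delimited text with the columns reordered.
sub munge {
    my ($min_value, @lines) = @_;
    my @out;
    for my $line (@lines) {
        my ($gene, $value, $condition) = split /,/, $line;
        next if $value < $min_value;                   # filter step
        push @out, join("\t", $gene, $condition, $value);   # transform step
    }
    return @out;
}

my @tsv = munge(10, @csv_lines);
print "$_\n" for @tsv;    # two tab-delimited lines: EGF1 and CFTR
```

Real data sets usually need more care (quoted fields, missing values, header rows), so treat this only as the general shape of a filter-and-reformat step.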

    5. Where the Course Fits

    6. Objectives
       Programming skills
           Problem solving and debugging
           Reading and writing documentation
           Data munging: data filtering and transformation
           Pattern matching and data mining
           Visualization and web presentation
           Object-oriented programming
       Bioinformatics skills
           Biological sequence analysis
           Interacting with biological databases
           Using Bioperl

    7. Background and Prerequisites
       Molecular Biology
           BIOL 482 or similar course
           Recombinant DNA by Watson, Gilman, Witkowski and Zoller
           http://www.amazon.com/Recombinant-DNA-Genes-Genomes-Course/dp/0716728664/ref=dp_ob_title_bk
           Online tutorials: http://www.biology-online.org/1/5_DNA.htm
       Computer Science
           IT 108, CS 112 or similar
           Previous programming experience

    8. Course Policies
       Programming assignments (50%)
           5 graded programming assignments
       Exams: Midterm (20%) and Final (20%)
           May include both a closed-book section and open-book programming problems
       In-class quizzes (10%)
       Weekly homework assignments
           All HW assignments must be submitted to me via email by the beginning of the next class.
           HW assignments will not be graded individually, but you may be called upon to discuss your work during the next class. Therefore, late assignments will not be accepted.
       Grading criteria: A: 90-100, B: 80-89, C: 70-79
       Keep an eye on the webpage: http://binf.gmu.edu/~jsolka/fall09/binf634/Fall_2009BINF_634_Syllabus_rev1.html

    9. Honor Code Policies
       I take honor code violations very seriously.
       Programming assignments must be your own work.
       Each assignment will specify whether you may use code from other sources.
       Any material you take from another source must be acknowledged within the program documentation.
       You must read and understand the honor code handout.
       Violations of the honor code WILL be referred to the Honor Council.
       All students must adhere to the GMU Honor Code. See: http://honorcode.gmu.edu/

    10. Pragmatics
        Assignments and Announcements
            Will be posted on the course webpage; check daily
            Class email will be sent to your email address from Patriot Web
        Accounts
            You should have an account on the server binf.gmu.edu
            Systems administrator: Chris Ryan, cryan1@gmu.edu
        Accessing perl:
            Log in from Rooms 304B or 320
            Log in from off-campus using ssh
                Go to ftp://ftp.ssh.com/pub/ssh/ for an academic Windows client
                Alternatively go to http://www.chiark.greenend.org.uk/~sgtatham/putty/
            Install perl on your own computer -- see textbooks and backup slide materials

    11. Pragmatics
        Unix
            This class will focus on using the Unix operating system
            We will be using Mac OS X (at least in the classroom)
            There are numerous UNIX tutorials: http://www.unixtools.com/tutorials.html
        Text Editors
            Perl programs are stored in plain text files
            I recommend emacs or vim for a Unix text editor (see links for Windows support)
                http://www.claremontmckenna.edu/math/ALee/emacs/emacs.html
                http://www.vim.org
            If you are interested in an integrated development environment, I recommend Eclipse (see backup slides): www.eclipse.org
            There are tutorials for each online
                http://www.gnu.org/software/emacs/tour/
                http://www.yolinux.com/TUTORIALS/LinuxTutorialAdvanced_vi.html

    12. Review: Molecular Biology
        Life evolved from a common origin about 3.5 billion years ago
        All life shares similar biochemistry
            Proteins: active elements
            Nucleic acids: informational elements
        Molecular Biology: the study of the structure and function of proteins and nucleic acids

    13. Proteins
        Functions:
            Structural proteins
            Enzymes
            Transport
            Antibody defense
        Structure:
            Chains of amino acids
            Typical size ~300 residues
            Range from about 100 to over 5000 residues

    14. [no text transcribed]

    15. [no text transcribed]

    16. Translation
        Translation involves mRNA and ribosomes
        Ribosomes are made of protein and ribosomal RNA (rRNA)
        Transfer RNAs (tRNAs) make the connection between specific codons in mRNA and amino acids
        As a tRNA binds to the next codon in the mRNA, its amino acid is bound to the last amino acid in the protein chain
        When a STOP codon is encountered, the ribosome releases the mRNA and synthesis ends
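
The codon-by-codon reading described on this slide can be mimicked in a few lines of Perl. This is only a sketch: the codon table below is deliberately tiny (a real table has all 64 codons), it uses DNA codons from the coding strand rather than mRNA, and the name translate_dna is an illustrative choice, not something from the lecture.

```perl
use strict;
use warnings;

# Partial codon table for illustration only; a full table has 64 entries.
my %codon_table = (
    ATG => 'M',                              # start codon (methionine)
    TTT => 'F', TTC => 'F',
    GGA => 'G', GGC => 'G',
    AAA => 'K', AAG => 'K',
    TAA => '*', TAG => '*', TGA => '*',      # STOP codons
);

# Hypothetical helper: read the sequence three bases at a time and
# stop at the first STOP codon, as the ribosome does.
sub translate_dna {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = uc substr($dna, $i, 3);
        # 'X' marks a codon that is missing from our small table.
        my $aa = exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
        last if $aa eq '*';
        $protein .= $aa;
    }
    return $protein;
}

print translate_dna('ATGTTTGGAAAATAAGGC'), "\n";   # prints MFGK
```

Here ATG, TTT, GGA and AAA are translated, and the TAA codon ends the loop before GGC is read.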

    17. [no text transcribed]

    18. DNA Structure
        DNA contains:
        Genes
            "a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions"
        Promoters
            "a promoter is a region of DNA that facilitates the transcription of a particular gene"
        Non-coding regions
            DNA which does not contain instructions for making proteins
        Reading frames
            An open reading frame (ORF): a contiguous sequence of DNA starting at a start codon and ending at a STOP codon
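
The ORF definition above maps fairly directly onto a Perl regular expression. The sketch below assumes ATG as the only start codon and searches just the forward strand; the helper name first_orf and the test sequence are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first ORF on the forward strand,
# or undef (in scalar context) when none is found.
sub first_orf {
    my ($dna) = @_;
    $dna = uc $dna;
    # ATG, then as few whole codons as possible, then an in-frame STOP codon.
    if ($dna =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/) {
        return $1;
    }
    return;
}

my $orf = first_orf('CCATGAAATTTTAGGG');
print "$orf\n";    # prints ATGAAATTTTAG
```

Note that the test sequence contains TGA right after the AT of the start codon, but that TGA is out of frame, so the match continues to the in-frame TAG. A fuller search would also scan the reverse complement and report every ORF, not only the first.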

    19. Shotgun DNA Sequencing

    20. Sequence Files -- FASTA Format
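
The slide's example did not survive in this transcript, so here is a brief reminder with a sketch. In FASTA format each record starts with a header line beginning with ">", followed by an identifier and an optional description, and the sequence follows on one or more lines. The records below and the helper name parse_fasta are made up for illustration.

```perl
use strict;
use warnings;

# Invented two-record FASTA text (not from the lecture).
my $fasta_text = <<'END_FASTA';
>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another made-up record
GGGCCC
END_FASTA

# Hypothetical helper: return a hash mapping each identifier to its sequence.
sub parse_fasta {
    my ($text) = @_;
    my (%seq_for, $id);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(\S+)/) {          # header: the ID is the first word after '>'
            $id = $1;
            $seq_for{$id} = '';
        }
        elsif (defined $id) {              # sequence line: append to the current record
            $line =~ s/\s+//g;
            $seq_for{$id} .= $line;
        }
    }
    return %seq_for;
}

my %seqs = parse_fasta($fasta_text);
for my $name (sort keys %seqs) {
    print "$name\t", length($seqs{$name}), "\n";   # prints seq1 12, then seq2 6
}
```

For real work, Bioperl's sequence I/O classes handle FASTA and many other formats; this hand-rolled parser is only meant to show the structure of the format.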

    21. GenBank Record
        LOCUS       AK091721     2234 bp    mRNA    linear   PRI 20-JAN-2006
        DEFINITION  Homo sapiens cDNA FLJ34402 fis, clone HCHON2001505.
        ACCESSION   AK091721
        VERSION     AK091721.1  GI:21750158
        KEYWORDS    oligo capping; fis (full insert sequence).
        SOURCE      Homo sapiens (human)
          ORGANISM  Homo sapiens
                    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                    Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
                    Hominidae; Homo.
          TITLE     Complete sequencing and characterization of 21,243 full-length
                    human cDNAs
          JOURNAL   Nat. Genet. 36 (1), 40-45 (2004)
        FEATURES             Location/Qualifiers
             source          1..2234
                             /organism="Homo sapiens"
                             /mol_type="mRNA"
             CDS             529..1995
                             /note="unnamed protein product"
                             /codon_start=1
                             /protein_id="BAC03731.1"
                             /db_xref="GI:21750159"
                             /translation="MVAERSPARSPGSWLFPGLWLLVLSGPGGLLRAQEQPSCRRAFD
                             ...
                             RLDALWALLRRQYDRVSLMRPQEGDEGRCINFSRVPSQ"
        ORIGIN
                1 gttttcggag tgcggaggga gttggggccg ccggaggaga agagtctcca ctcctagttt
               61 gttctgccgt cgccgcgtcc cagggacccc ttgtcccgaa gcgcacggca gcggggggaa
        ...
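
Fields in a record like this one can be pulled out with simple pattern matches. The sketch below copies a few lines of the record into a string and extracts the accession and the CDS coordinates. It assumes a plain start..end CDS location (no join(...) or complement(...)), and the variable names are illustrative.

```perl
use strict;
use warnings;

# A few lines taken from the record shown on the slide.
my $record = <<'END_GB';
LOCUS       AK091721     2234 bp    mRNA    linear   PRI 20-JAN-2006
ACCESSION   AK091721
VERSION     AK091721.1  GI:21750158
FEATURES             Location/Qualifiers
     CDS             529..1995
END_GB

# The /m modifier lets ^ match at the start of every line in the string.
my ($accession)   = $record =~ /^ACCESSION\s+(\S+)/m;
my ($start, $end) = $record =~ /^\s+CDS\s+(\d+)\.\.(\d+)/m;

# GenBank coordinates are 1-based and inclusive.
my $cds_length = $end - $start + 1;
print "$accession CDS $start..$end ($cds_length bp)\n";
# prints: AK091721 CDS 529..1995 (1467 bp)
```

A length of 1467 bp corresponds to 489 codons, including the STOP codon. Bioperl provides a full GenBank parser that copes with the many location forms this sketch ignores.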

    22. Why Perl?
        Widely used in Bioinformatics
            Bioperl: http://www.bioperl.org/wiki/Main_Page
        Ease of Programming
            Excellent pattern matching features
            Good for gluing other programs together
            Easy to learn (enough to get started)
        Rapid Prototyping
            Few lines of code needed for many problems
            One-liners
        Portability
            Runs on Unix, Windows, Macs
        Open Source Culture
            Many sources of help
                try: % perldoc perldoc
                % perldoc -f print
                http://perldoc.perl.org/index-tutorials.html
            Many sources of useful modules (http://www.cpan.org/)
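
To give a flavour of the pattern matching mentioned above, the sketch below locates a short motif in a DNA string and counts G and C bases. The sequence is borrowed from example4-1.pl, which appears later in this lecture; the motif ACGG is an arbitrary choice made for illustration.

```perl
use strict;
use warnings;

my $dna   = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
my $motif = 'ACGG';    # arbitrary motif, chosen only for this example

# With /g in a while loop, each iteration finds the next match;
# pos() gives the offset just past the end of that match.
my @starts;
while ($dna =~ /$motif/g) {
    push @starts, pos($dna) - length($motif);
}
print "Found $motif at: @starts\n";     # prints: Found ACGG at: 0 8 22

# tr/// with an empty replacement list counts characters
# without modifying the string.
my $gc = ($dna =~ tr/GC//);
print "G+C bases: $gc of ", length($dna), "\n";   # prints: G+C bases: 17 of 33
```

Positions are 0-based, matching Perl's string and array indexing.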

    23. Variables
        The types of Perl variables are indicated by the initial symbol:
        $var stores a scalar (a single string or number)
            $x = 10;
            $s = "ATTGCGT";
            $x = 3.1417;
        @var stores an array (a list of values)
            @a = (10, 20, 30);
            @a = (100, $x, "Jones", $s);
            print "@a\n";    # prints "100 3.1417 Jones ATTGCGT"
        %var stores a hash (associative array)
            %ages = ( John => 30, Mary => 22, Lakshmi => 27 );
            print $ages{"Mary"}, "\n";    # prints 22
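
A runnable version of the three variable types, written under use strict, might look like the sketch below. One detail worth noticing is that a hash is initialized with parentheses; curly braces create a reference to an anonymous hash, which is stored in a scalar and accessed with ->. The name $ages_ref is illustrative.

```perl
use strict;
use warnings;

my $x = 3.1417;                  # scalar holding a number
my $s = "ATTGCGT";               # scalar holding a string
my @a = (100, $x, "Jones", $s);  # array: an ordered list of scalars
print "@a\n";                    # prints: 100 3.1417 Jones ATTGCGT

# Hash: parentheses, and a single element is read with $name{key}.
my %ages = ( John => 30, Mary => 22, Lakshmi => 27 );
print $ages{"Mary"}, "\n";       # prints: 22

# Curly braces build a hash reference instead; it lives in a scalar.
my $ages_ref = { John => 30, Mary => 22 };
print $ages_ref->{John}, "\n";   # prints: 30

# Hash keys come back in no particular order, so sort them for stable output.
for my $name (sort keys %ages) {
    print "$name is $ages{$name}\n";
}
```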

    24. Declaring Variables
        use strict;
        Putting use strict; at the top of your programs will tell perl to slap your hands with a fatal error whenever you break certain rules.
            Requires us to declare all variables
            Avoids creating variables by typos
        Variables may be declared using my, our or local
        For now, we only need to use my:
            my $a;              # value of $a is undef
            my ($a, $b, $c);    # $a, $b, $c are all undef
            my @array;          # value of @array is ()
        Can combine declaration and initialization:
            my @array = qw/A list of words/;
            my $a = "A string";
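
To see use strict catch a typo without aborting the whole program, one option is to compile the faulty snippet inside a string eval, as in this sketch. The variable names and the misspelling are invented for the demonstration; in an ordinary script the same mistake would simply stop compilation with an error message.

```perl
use strict;
use warnings;

# The string eval compiles its snippet at run time, so the error that
# strict raises is captured in $@ instead of ending this program.
my $ok = eval q{
    my $sequence = 'ATG';
    $sequnce = 'CCC';      # typo: $sequnce was never declared with my
    1;
};
my $err = $@;

print "strict caught the typo: $err" unless $ok;
```

Without use strict, the misspelled name would silently become a new global variable and $sequence would keep its old value, which is exactly the kind of bug that is hard to spot later.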

    25. How Things Can Go Wrong

    26. Scalar and List Context
        All operations in Perl are evaluated in either scalar or list context, and may behave differently depending on context
            @array = ('one', 'two', 'three');
            $a = @array;                   # scalar context for assignment, return size
            print $a;                      # prints 3
            ($a) = @array;                 # list context for assignment
            print $a;                      # prints 'one'
            ($a, $b) = @array;
            print "$a, $b";                # prints 'one, two'
            ($a, $b, $c, $d) = @array;     # $d is undefined
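
The same ideas under use strict, plus one built-in whose result depends on context, are sketched below. The variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ('one', 'two', 'three');

my $count      = @array;           # scalar context: number of elements
my ($first)    = @array;           # list context: first element
my $size       = scalar(@array);   # scalar() forces scalar context explicitly
my $last_index = $#array;          # index of the last element

print "$count $first $size $last_index\n";   # prints: 3 one 3 2

# reverse behaves differently in each context.
my @reversed  = reverse @array;    # list context: reverses the order of the elements
my $backwards = reverse 'ATG';     # scalar context: reverses the characters

print "@reversed\n";               # prints: three two one
print "$backwards\n";              # prints: GTA
```

The parentheses around $first are what make the second assignment a list assignment; dropping them changes the meaning entirely.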

    27. String Operations
        Ways to concatenate strings
            $DNA1 = "ATG";
            $DNA2 = "CCC";
            $DNA3 = $DNA1 . $DNA2;     # concatenation operator
            $DNA3 = "$DNA1$DNA2";      # string interpolation
            print "$DNA3";             # prints ATGCCC
            $DNA3 = '$DNA1$DNA2';      # no string interpolation
            print "$DNA3";             # prints $DNA1$DNA2
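
Beyond concatenation, a handful of string built-ins cover many sequence tasks. The sketch below is not from the slides; it shows length, substr, tr and reverse applied to the concatenated string, with illustrative variable names.

```perl
use strict;
use warnings;

my $dna = "ATG" . "CCC";           # ATGCCC

my $len   = length($dna);          # number of characters
my $codon = substr($dna, 0, 3);    # three characters starting at offset 0

# Transcribe: copy the string, then replace every T with U.
(my $rna = $dna) =~ tr/T/U/;

# Reverse complement: reverse the string, then swap complementary bases.
my $revcom = reverse $dna;
$revcom =~ tr/ACGT/TGCA/;

print "$len $codon $rna $revcom\n";   # prints: 6 ATG AUGCCC GGGCAT
```

The transliteration operator tr works character by character, which is why a single tr/ACGT/TGCA/ can complement all four bases at once.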

    28. Arrays
        An array stores an ordered list of scalars:
            @gene_array = ('EGF1', 'TFEC', 'CFTR', 'LOC1691');
            print "@gene_array\n";
        Output:
            EGF1 TFEC CFTR LOC1691
        # there's more than one way to do it (see previous slide on declaring variables)
            @gene_array = qw/EGF1 TFEC CFTR LOC1691/;

    29. Arrays
        An array stores an ordered list of scalars:
            @a = ('one', 'two', 'three', 'four');
        The array is indexed by integers starting with 0:
            print "$a[1] $a[0] $a[3]\n";
        prints:
            two one four
        Notice: $a[i] is a scalar since we used the $ method of addressing the variable
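
A few common array operations, sketched with the gene list from the previous slide. BRCA1 is added here purely for illustration and is not part of the lecture's example.

```perl
use strict;
use warnings;

my @genes = qw/EGF1 TFEC CFTR LOC1691/;

push @genes, 'BRCA1';            # append to the end (illustrative extra gene)
my $first = shift @genes;        # remove and return the first element
my $last  = $genes[-1];          # negative indices count from the end
my @pair  = @genes[0, 1];        # array slice: note the @ for multiple elements

my @sorted = sort @genes;        # default sort is by string comparison
my $joined = join(',', @sorted);

print "$first $last\n";          # prints: EGF1 BRCA1
print "@pair\n";                 # prints: TFEC CFTR
print "$joined\n";               # prints: BRCA1,CFTR,LOC1691,TFEC
print scalar(@genes), "\n";      # prints: 4
```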

    30. Unix Commands I
        cat --- for creating and displaying short files
        chmod --- change permissions
        cd --- change directory
        cp --- for copying files
        date --- display date
        echo --- echo argument
        ftp --- connect to a remote machine to download or upload files
        grep --- search file
        head --- display first part of file
        ls --- see what files you have
        lpr --- standard print command
        more --- use to read files
        mkdir --- create directory
        mv --- for moving and renaming files

    31. Unix Commands II
        pwd --- find out what directory you are in
        rm --- remove a file
        rmdir --- remove directory
        setenv --- set an environment variable
        sort --- sort file
        tail --- display last part of file
        tar --- create an archive, add or extract files
        ssh --- log in to another machine
        wc --- count characters, words, lines
        This site has a nice reference card: http://www.digilife.be/quickreferences/QRC/UNIX%20commands%20reference%20card.pdf

    32. chmod and tar
        chmod
            There is a nice tutorial here: http://www.perlfect.com/articles/chmod.shtml
        tar
            There is a nice tutorial here: http://www.apl.jhu.edu/Misc/Unix-info/tar/tar_2.html

    33. Running perl on binf.gmu.edu
        % ssh binf.gmu.edu
        Password: ******
        -- Create binf634 directory (don't type stuff in red)
        % mkdir binf634
        % cd binf634
        % ls
        -- Copy a file to current directory
        -- (the "." means "current directory")
        % cp ~jsolka/public_html/fall09/binf634/bookcode/examples/example4-1.pl .
        % ls
        % ls -l
        % l

    34. Running perl on binf.gmu.edu
        % cat example4-1.pl
        #!/usr/bin/perl -w
        # Example 4-1 Storing DNA in a variable, and printing it out

        # First we store the DNA in a variable called $DNA
        $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

        # Next, we print the DNA onto the screen
        print $DNA;

        # Finally, we'll specifically tell the program to exit.
        exit;

        -- Changing permissions
        % chmod 755 example4-1.pl
        -- Running a perl script
        % example4-1.pl

    35. Editing a Perl Script
        -- Read the Emacs or vi tutorial.
        -- Make a copy and edit the copy
        % cp example4-1.pl first.pl
        % l
        % e first.pl
        -- 1. Change 'print $DNA;' to 'print $DNA, "\n";'
        -- 2. Now add a comment: # Author: your name
        % cat first.pl
        #!/usr/bin/perl -w
        # Author: Jeff Solka
        # Example 4-1 Storing DNA in a variable, and printing it out

        # First we store the DNA in a variable called $DNA
        $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

        # Next, we print the DNA onto the screen
        print $DNA, "\n";

        # Finally, we'll specifically tell the program to exit.
        exit;

    36. For Next Week
        Read Tisdall chapters 1-5.
            Be ready to ask questions
            Be ready to answer questions
        HW 1: Write programs as described in the following exercises from "Beginning Perl for Bioinformatics" by Tisdall: 4.3, 4.4, 4.5, 5.2, 5.4 and 5.6
        For each exercise, create a perl script called exX.Y.pl, for example, ex4.3.pl for the first exercise.
        Email me the assignments at jlsolka@gmail.com
        Use the following format: initialoffirstname.lastname.ex.4.3

    37. Some of the Details

    38. Alternative Development Environments

    39. What is Eclipse?
        Eclipse is a multi-language software development platform comprising an IDE and a plug-in system to extend it. It is written primarily in Java and is used to develop applications in this language and, by means of the various plug-ins, in other languages as well: C/C++, Cobol, Python, Perl, PHP and more. The initial codebase originated from VisualAge. In its default form it is meant for Java developers, consisting of the Java Development Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs provide translations into over a dozen natural languages. Released under the terms of the Eclipse Public License, Eclipse is free and open source software.
        http://en.wikipedia.org/wiki/Eclipse_(software)

    40. What Operating Systems Does Eclipse Run Under?
        Linux
        Mac OS X
        Windows
            XP
            Vista

    41. Languages Supported by the Eclipse IDE
        Java
            Out of the box
        Perl
            Via the EPIC library
            Note that one must also have a Perl interpreter installed
        Python
            Via the PyDev library
            Note that one must also have a Python interpreter installed

    42. Advantages and Disadvantages of the Eclipse Development Environment
        Advantages
            Support for a plethora of languages
            Industrial strength
                Used by many professional software developers
                Has support for configuration management
        Disadvantages
            Can be slow when developing in languages other than Java (may be merely anecdotal evidence)

    43. Installing Eclipse Under Windows XP - I
        First make sure that you have a Java Runtime Environment installed

        Microsoft Windows XP [Version 5.1.2600]
        (C) Copyright 1985-2001 Microsoft Corp.

        C:\Documents and Settings\Owner>java -version
        java version "1.5.0_05"
        Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
        Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode)

        C:\Documents and Settings\Owner>

        If you don't have a JRE installed go to http://java.sun.com/j2se/1.4.2/download.html

    44. Installing Eclipse Under Windows XP - II
        Obtain the Eclipse zipped file from the Eclipse downloads link at http://www.eclipse.org/downloads/
            I believe that I chose this one: Eclipse IDE for Java Developers (85 MB)
        Unzip it into an eclipse folder under your Windows Program Files directory
            In my case: C:\Program Files\eclipse
        Note that Eclipse does not modify your system's registry

    45. Installing Eclipse Under Windows XP - III
        Once installed (unzipped):
            Double click on the eclipse.exe icon
            There is a "hello world" Java tutorial
            There are a number of other tutorials
                Eclipse3-1.pdf (I will email it to you; it is publicly available on the web)

    46. Downloading ActiveState's ActivePerl
        Go here and click on the Windows download link: http://www.activestate.com/activeperl/
        You should be downloading version 5.10
        Use this self-extracting binary to install the program
        This takes a long time (30 minutes or more, go enjoy your favorite beverage)

    47. Installing the Eclipse EPIC Library
        This is my synopsis of this EPIC webpage tutorial: http://www.epic-ide.org/download.php
        This is also a helpful site: http://www.epic-ide.org/faq.php
        Under Eclipse, use the Help -> Software Updates menu
        Switch to the Available Software tab
        Choose Add Site and enter http://e-p-i-c.sf.net/updates
        Tick the newly created site and click the Install button

    48. Creating Your First Perl Program Under the Eclipse IDE - I
        Under Eclipse go to Window -> Open Perspective -> Other
            Choose Perl
        Under Eclipse go to Window -> Preferences
            Click on the Perl + and enter the full path to the ActiveState Perl executable
            In my case it is "C:\Perl\bin\perl5.10.0.exe"

    49. Creating Your First Perl Program Under the Eclipse IDE - II
        Click on File -> New Perl Project
            Call it something like HelloWorld
        Click on File -> New Perl File
            Call it something like HelloWorldPerl
            Left click on this file symbol and make sure its extension is .pl (now it should have a camel symbol)
        Enter your code:
            print "Hello from ActivePerl!\n";
        Now you should be able to choose Run from the top menu, or left click on the program symbol and choose Run As Perl Local
        If all goes well, a console window with the output Hello from ActivePerl! should show up

    50. Debugging With Eclipse and Perl
        The Perl PPM package PadWalker has to be installed before you can debug your Perl programs under Eclipse
        Follow the steps on the next two slides to install PadWalker within ActiveState Perl

    51. First Find the Package (PadWalker)
        Find a package. To find a package in the repository:
            Click the All packages button.
            Enter text from the package's name or abstract in the Filter field.
            As text is entered in the Filter field, the list of packages is automatically updated as the substring match becomes more precise.
            Click the magnifying glass icon to filter on different meta-data (e.g. Author).
        Alternatively, just start typing the name of the package. The Package List will highlight the first package that matches the string you have typed.

    52. Next Install the Package (PadWalker)
        Install a package. To install a package from the repository:
            Click on the desired package in the Package List to select it.
            Mark the package by:
                Clicking the Mark for install button or,
                Hitting the "+" key or,
                Selecting Install <package-name> from the Action menu or,
                Right-clicking the selection and choosing Install <package-name> from the context menu.
            Click the Run marked actions button or select Run Marked Actions (Ctrl-Enter) from the File menu.
        In my case I installed PadWalker 1.7

    53. Installing PadWalker Via ppm
        There are other interesting discussions here, but they seem to have been largely superseded by the GUI-based ActiveState Perl ppm interface
        http://trouchelle.com/perl/ppmrepview.pl

    54. Editors

    55. http://www.viemu.com/vi-vim-cheat-sheet.gif

    56. http://refcards.com/docs/gildeas/gnu-emacs/emacs-refcard-a4.pdf
