
MCB3895-004 Lecture #3 Sept 2/14



Presentation Transcript


  1. MCB3895-004 Lecture #3, Sept 2/14: Intro to UNIX terminal

  2. Introduction to UNIX • Nearly all bioinformatics software runs on UNIX and its derivatives (e.g., LINUX and Mac OS) • Very little bioinformatics software runs on Windows • Bioinformatics is very strongly tied to the open-source software movement • Lots of help available on-line • Most programs are free • Windows is not very open-source friendly

  3. Windows users: • Option 1: Do all of your work connected to the Biotechnology Cluster server. Download sshclient (ftp://ftp.uconn.edu/restricted/ssh/) • Option 2: Install LINUX to run in parallel with Windows (e.g., Biolinux, http://nebc.nerc.ac.uk/tools/bio-linux)

  4. Terminal • The terminal is the primary way to do computational biology • Mac: Applications/Utilities/Terminal • Linux: Applications/Accessories/Terminal • Windows: sshclient

  5. Assignment • A handy resource to learn the basics of UNIX is the “Unix and Perl Primer for Biologists”, which can be found here: http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf • The commands they demonstrate mainly involve creating, removing and moving around files and directories • Once you learn them, these commands will take you far beyond what you can do with a more familiar GUI like Mac Finder or Windows Explorer

  6. Worthy of special comment • Directory trees • Using tab to autocomplete • Wildcard characters like * to perform the same operation on multiple files (this is insanely useful once you get the hang of it!) • Using nano as a very basic text editor. Never, ever, ever use Word for this! • Use underscores “_”, not spaces, in your filenames
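A minimal sketch of the wildcard idea, assuming a scratch directory and made-up file names (sample_1.txt and so on are hypothetical, not course files). The shell expands `*.txt` to every matching file, so one command acts on all of them:

```shell
# Work in a throwaway directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Create three hypothetical text files and one unrelated file
touch sample_1.txt sample_2.txt sample_3.txt notes.md

# "*.txt" expands to every file name ending in .txt
mkdir text_files
mv *.txt text_files/

# Only the three .txt files were moved; notes.md stays behind
ls text_files
ls
```

Note that the file names use underscores rather than spaces, which keeps them safe to use with wildcards and tab completion.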

  7. Directory trees • All computer files are organized hierarchically • Each folder has an address, e.g., /Users/Jonathan/Laptop_backup/Destop/e-Books

  8. A quick reference to where you are in UNIX • “/” - root • “~” - your user home directory • “.” - “here”, the directory you are in now • “../” - one level up in the directory tree
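The shortcuts on this slide can be tried out as follows. This is only a sketch: the directory names `project` and `data` are invented for illustration and live in a temporary directory.

```shell
# Build a small hypothetical directory tree in a scratch location
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

# Move into the deepest directory and print where we are
cd "$workdir/project/data"
pwd

# ".." is one level up: we are now in "project"
cd ..
here=$(pwd)
echo "$here"

# "." is the current directory, so this lists "project" itself
ls .

# "~" expands to your home directory
echo ~

# "/" is the root of the whole directory tree
cd /
pwd
```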

  9. More UNIX tricks • “>” (greater than) redirects the output of a command into a file, creating it or overwriting it if it already exists e.g., ls * > list • A list of the files in this directory is now stored in the file “list”
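A short sketch of the redirect from this slide, run in a scratch directory with two hypothetical file names (genome_a.fasta and genome_b.fasta are placeholders):

```shell
# Scratch directory with two hypothetical files
workdir=$(mktemp -d)
cd "$workdir"
touch genome_a.fasta genome_b.fasta

# Nothing is printed to the screen; the listing goes into "list"
ls * > list

# Show what was captured
cat list
```

Running the same `ls * > list` again replaces the previous contents of `list` rather than adding to them.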

  10. More UNIX tricks • cat joins multiple files together e.g., cat file1 file2 > file3 • file3 contains file1 and file2 joined together • file1 and file2 still exist as they were
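The cat example above can be checked with two tiny files. The file contents here are invented for illustration:

```shell
# Scratch directory with two one-line files
workdir=$(mktemp -d)
cd "$workdir"
printf 'line from file1\n' > file1
printf 'line from file2\n' > file2

# Join them, in the order given, into a new file
cat file1 file2 > file3

# file3 holds both lines; file1 and file2 are untouched
cat file3
ls
```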

  11. More UNIX tricks • grep extracts all lines containing a particular pattern from a file e.g., grep "NP_" file1 • Prints every line that contains the pattern "NP_" to the screen
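A small sketch of the grep example, using an invented file1 whose header lines are made-up placeholders rather than real records:

```shell
# Scratch directory with a hypothetical three-line file
workdir=$(mktemp -d)
cd "$workdir"
cat > file1 <<'EOF'
>NP_000001 hypothetical protein A
>WP_000002 hypothetical protein B
>NP_000003 hypothetical protein C
EOF

# Print only the lines that contain "NP_" (the first and third)
grep "NP_" file1
```

The quotes around the pattern must be plain straight quotes; curly quotes pasted from a slide or a word processor are treated as part of the pattern.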

  12. More UNIX tricks • wc counts the newlines, words and bytes in a file e.g., wc file1 • Prints an output like this: 10602 18921 752002 file1 (newlines, words, bytes and filename, in that order)
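To see the three counts on something small enough to verify by eye, here is a sketch with an invented two-line file:

```shell
# Scratch directory with a tiny file: 2 lines, 3 words, 14 bytes
workdir=$(mktemp -d)
cd "$workdir"
printf 'one two\nthree\n' > file1

# All three counts followed by the file name
wc file1

# Each count on its own
wc -l file1   # newlines only
wc -w file1   # words only
wc -c file1   # bytes only
```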

  13. More UNIX tricks • “|” (pipe) directs the output of one command into another e.g., grep "NP_" file1 | wc • Sends the output of the grep command into wc. Because grep extracts lines from a file, this can be used to count the number of lines matching the grep expression e.g., grep "NP_" file1 | less • Displays the grep result as a list you can scroll through
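A minimal sketch of the pipe, again with an invented file1. It uses `wc -l` so that only the line count is printed:

```shell
# Scratch directory with a hypothetical four-line file
workdir=$(mktemp -d)
cd "$workdir"
cat > file1 <<'EOF'
>NP_000001 hypothetical protein A
>WP_000002 hypothetical protein B
>NP_000003 hypothetical protein C
>YP_000004 hypothetical protein D
EOF

# grep writes the matching lines; wc -l counts them instead of
# letting them scroll past on the screen
grep "NP_" file1 | wc -l

# Pipes can be chained: keep matches, then keep only the first one
grep "NP_" file1 | head -n 1
```

Piping into `less` instead of `wc` works the same way, but it opens an interactive viewer, so it is left out of this sketch.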

  14. More UNIX tricks • gzip/gunzip: single file compression e.g., gunzip file.txt.gz • Decompresses to file.txt, removes file.txt.gz e.g., gzip file.txt • Creates compressed file file.txt.gz, removes file.txt
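The round trip described above can be sketched like this, with a one-line placeholder file standing in for real data:

```shell
# Scratch directory with a small hypothetical file
workdir=$(mktemp -d)
cd "$workdir"
printf 'ACGT\n' > file.txt

# Compress: file.txt is replaced by file.txt.gz
gzip file.txt
ls

# Decompress: file.txt.gz is replaced by file.txt again
gunzip file.txt.gz
ls
cat file.txt
```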

  15. More UNIX tricks • tar: file archive management e.g., tar -cf all.tar * • Creates tar archive all.tar containing all files in that directory, individual files unchanged e.g., tar -xf all.tar • Extracts all files from tar archive all.tar to the current directory, all.tar not deleted • tar is very commonly used before gzip - “tarballs”
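A sketch of the tar workflow from this slide. The file names are invented, and the `-tf` (list contents) and `-C` (extract into another directory) options are additions for illustration that the slide does not cover:

```shell
# Scratch directory with two small hypothetical files
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\n' > file1
printf 'b\n' > file2

# Bundle everything in this directory into one archive;
# the original files are left in place
tar -cf all.tar *

# List what the archive contains without extracting it
tar -tf all.tar

# Extract into a separate directory so the copies are easy to see
mkdir extracted
tar -xf all.tar -C extracted
ls extracted

# Compressing the archive with gzip gives a "tarball"
gzip all.tar
ls
```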

  16. Connecting to the Bioinformatics facility server • UNIX command ssh • e.g., ssh -l jlklassen bbcsrv3.biotech.uconn.edu • Will ask for a password • The first time you connect, you will be asked to accept the server's RSA key (security feature) • Your terminal now controls the bioinformatics facility server, not your own machine • You can have multiple terminals open at the same time

  17. Transferring files to the Bioinformatics facility server • Method 1: Filezilla (https://filezilla-project.org/) • Nice GUI • Works on all platforms • Install the client, not the server

  18. Transferring files to the Bioinformatics facility server • Method 2: UNIX command scp • e.g., scp all.tar jlklassen@bbcsrv3.biotech.uconn.edu:all.tar • Copy all.tar from my computer to the biotech server (the source comes first, the destination second) • e.g., scp -r jlklassen@bbcsrv3.biotech.uconn.edu:dir/ . • Copy the directory “dir” from the biotech server to the current working directory • “-r” flag indicates “recursive”, needed for directories

  19. Text editors • Using nano works, but can be cumbersome for complex tasks • Word is always bad! It adds hidden formatting you don’t see. • Mac and LINUX have TextEdit and Gedit, respectively, as default text editors; both work well • Windows: Notepad and Wordpad are insufficient. I suggest downloading Gedit for Windows (https://wiki.gnome.org/Apps/Gedit) • Other options exist for all platforms

  20. Assignment • See instructions posted on the website at http://wp.mcb3895.mcb.uconn.edu • Part 1: work through Korf manual sections U1-U27 (some commands require external files, ignore these but understand what they do) • Part 2: log on to the Biotech server, download a genome from NCBI and answer the questions given • The assignment is due at the start of class 1 week from today

  21. Command line power! • The simplest way to download these data is to use the terminal command wget: $ wget -r --no-directories --retr-symlinks -P Acaricomes_phytoseiuli/ ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Acaricomes_phytoseiuli/latest_assembly_versions/GCF_000376245.1_ASM37624v1/ • Deconstructed: • -r: “recursive”, i.e., download everything in this directory • --no-directories: do not recreate the entire ftp directory structure locally • --retr-symlinks: NCBI uses a file structure built on “symbolic links”, where a file points to another file somewhere else; this flag gets the actual files, not just the links • -P Acaricomes_phytoseiuli/: where to put the output
