
Bioinformatics: Theory and Practice. Jijun Tang (唐继军), jtang@cse.sc.edu, 13928761660



Presentation Transcript


  1. Bioinformatics: Theory and Practice. Jijun Tang (唐继军), jtang@cse.sc.edu, 13928761660

  2. www.cse.sc.edu/~jtang/BJFU

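  3. Homework • GTTGCAGCAATGGTAGACTCAACGGTAGCAATAACTGCAGGACCTAGAGGAAAAACAGTAGGGATTAATAAGCCCTATGGAGCACCAGAAATTACAAAAGATGGTTATAAGGTGATGAAGGGTATCAAGCCTGAA • Why does a BLAST search with default settings return no results? Which options need to be selected? • What are the latest PubMed articles on the related species?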

  4. DNA sequencing capability has grown exponentially • DNA sequences in GenBank • Doubling time = 18 months
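
A doubling time of 18 months corresponds to exponential growth of the form N(t) = N0 · 2^(t/18), with t in months. The following is a minimal sketch of that arithmetic in Python; the function name `projected_size` and the starting value of 1.0 are illustrative assumptions, not GenBank figures.

```python
def projected_size(initial_size: float, months: float,
                   doubling_months: float = 18.0) -> float:
    """Size after `months`, assuming steady exponential growth."""
    return initial_size * 2 ** (months / doubling_months)


# Growth factor over 10 years (120 months) at one doubling every 18 months.
factor = projected_size(1.0, 120)
print(f"{factor:.1f}x")
```

Under this assumption a database grows roughly a hundredfold per decade, which is why storage and search methods have to scale with it.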

  5. BLAST Algorithm

  6. Sample Multiple Alignment

  7. Bioinformatics Paradigm • Find the data • Download the data • Reformat the data • Collect the samples • Run molecular analysis • Filter the data • Run analysis software • Collect and sort results • Publish / Data sharing

  8. Multi-Sequence FASTA file
     >FBpp0074027 type=protein; loc=X:complement(16159413..16159860,16160061..16160497); ID=FBpp0074027; name=CG12507-PA; parent=FBgn0030729,FBtr0074248; dbxref=FlyBase:FBpp0074027,FlyBase_Annotation_IDs:CG12507 PA,GB_protein:AAF48569.1,GB_protein:AAF48569; MD5=123b97d79d04a06c66e12fa665e6d801; release=r5.1; species=Dmel; length=294;
     MRCLMPLLLANCIAANPSFEDPDRSLDMEAKDSSVVDTMGMGMGVLDPTQ
     PKQMNYQKPPLGYKDYDYYLGSRRMADPYGADNDLSASSAIKIHGEGNLA
     SLNRPVSGVAHKPLPWYGDYSGKLLASAPPMYPSRSYDPYIRRYDRYDEQ
     YHRNYPQYFEDMYMHRQRFDPYDSYSPRIPQYPEPYVMYPDRYPDAPPLR
     DYPKLRRGYIGEPMAPIDSYSSSKYVSSKQSDLSFPVRNERIVYYAHLPE
     IVRTPYDSGSPEDRNSAPYKLNKKKIKNIQRPLANNSTTYKMTL
     >FBpp0082232 type=protein; loc=3R:complement(9207109..9207225,9207285..9207431); ID=FBpp0082232; name=mRpS21-PA; parent=FBgn0044511,FBtr0082764; dbxref=FlyBase:FBpp0082232,FlyBase_Annotation_IDs:CG32854-PA,GB_protein:AAN13563.1,GB_protein:AAN13563; MD5=dcf91821f75ffab320491d124a0d816c; release=r5.1; species=Dmel; length=87;
     MRHVQFLARTVLVQNNNVEEACRLLNRVLGKEELLDQFRRTRFYEKPYQV
     RRRINFEKCKAIYNEDMNRKIQFVLRKNRAEPFPGCS
     >FBpp0091159 type=protein; loc=2R:complement(2511337..2511531,2511594..2511767,2511824..2511979,2512032..2512082); ID=FBpp0091159; name=CG33919-PA; parent=FBgn0053919,FBtr0091923; dbxref=FlyBase:FBpp0091159,FlyBase_Annotation_IDs:CG33919-PA,GB_protein:AAZ52801.1,GB_protein:AAZ52801; MD5=c91d880b654cd612d7292676f95038c5; release=r5.1; species=Dmel; length=191;
     MKLVLVVLLGCCFIGQLTNTQLVYKLKKIECLVNRTRVSNVSCHVKAINW
     NLAVVNMDCFMIVPLHNPIIRMQVFTKDYSNQYKPFLVDVKIRICEVIER
     RNFIPYGVIMWKLFKRYTNVNHSCPFSGHLIARDGFLDTSLLPPFPQGFY
     QVSLVVTDTNSTSTDYVGTMKFFLQAMEHIKSKKTHNLVHN
     >FBpp0070770 type=protein; loc=X:join(5584802..5585021,5585925..5586137,5586198..5586342,5586410..5586605); ID=FBpp0070770; name=cv-PA; parent=FBgn0000394,FBtr0070804; dbxref=FlyBase:FBpp0070770,FlyBase_Annotation_IDs:CG12410-PA,GB_protein:AAF46063.1,GB_protein:AAF46063; MD5=0626ee34a518f248bbdda11a211f9b14; release=r5.1; species=Dmel; length=257;
     MEIWRSLTVGTIVLLAIVCFYGTVESCNEVVCASIVSKCMLTQSCKCELK
     NCSCCKECLKCLGKNYEECCSCVELCPKPNDTRNSLSKKSHVEDFDGVPE
     LFNAVATPDEGDSFGYNWNVFTFQVDFDKYLKGPKLEKDGHYFLRTNDKN
     LDEAIQERDNIVTVNCTVIYLDQCVSWNKCRTSCQTTGASSTRWFHDGCC
     ECVGSTCINYGVNESRCRKCPESKGELGDELDDPMEEEMQDFGESMGPFD
     GPVNNNY
     …
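
In a multi-sequence FASTA file, each record starts with a header line beginning with `>` and is followed by one or more sequence lines. The sketch below shows one simple way to read such a file in Python. The function name `parse_fasta` and the two short records (`seq1`, `seq2`) are hypothetical, made up for illustration, and the sketch assumes record IDs are unique.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its joined sequence."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line  # sequence may span several lines
    return records


# Hypothetical two-record input, not taken from FlyBase.
example = """>seq1 type=protein; length=12;
MRHVQF
LARTVL
>seq2 type=protein; length=5;
MKLVL
""".splitlines()

records = parse_fasta(example)
for seq_id, seq in records.items():
    print(seq_id, len(seq))
```

Comparing each computed length with the `length=` field in the header is a quick sanity check that a file was downloaded and parsed intact.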

  9. Fields

  10. Entrez is the GenBank web query tool

  11. Advanced query interface:

  12. Expasy.org

  13. Other Important Databases • Genomes • Proteins • Biochemical & Regulatory Pathways • Gene Expression • Genetic Variation (mutants, SNPs) • Protein-Protein Interactions • Gene Ontology (Biological Function)

  14. http://genome.ucsc.edu/

  15. UCSC Genome Browser Search by gene name: or by sequence:

  16. Lots of additional data can be added as optional "tracks" - anything that can be mapped to locations on the genome

  17. ensembl.org

  18. KEGG: Kyoto Encyclopedia of Genes and Genomes • Enzymatic and regulatory pathways • Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists) • Parallel maps of regulatory pathways

  19. Gene Ontology • Genetics is a messy science • Scientists have been working in isolation on individual species for many years, naming genes, mutants, and odd phenotypes • “sonic hedgehog” • Now that we have complete genome sequences, how do we reconcile the names across all species? • Gene Ontology uses a single three-part system • Molecular function (specific tasks) • Biological process (broad biological goals, e.g. cell division) • Cellular component (location)

  20. Unix/Linux

  21. Filename Extensions • Most Linux filenames start with a lower case letter and end with a dot followed by one, two, or three letters: myfile.txt • However, this is just a common convention and is not required. • It is also possible to have additional dots in the filename. • The part of the name following the dot is called the “extension.” • The extension is often used to designate the type of file.

  22. Some Common Extensions • By convention: • files that end in .txt are text files • files that end in .c are source code in the “C” language • files that end in .html are HTML files for the Web • Compressed files have the .zip or .gz extension • Linux does not require these extensions (unlike Windows), but using them is a sensible convention that you should follow
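
Because the extension is only a naming convention, a program can read it straight off the filename. The sketch below uses Python's standard `pathlib` module for this. The helper name `extensions` and the filenames `reads.fasta.gz` and `README` are illustrative assumptions; `myfile.txt` comes from the slide above.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def extensions(filename: str) -> Tuple[str, List[str]]:
    """Return the final extension and the list of all dot-suffixes."""
    path = PurePosixPath(filename)
    return path.suffix, path.suffixes


# One dot, several dots, and no extension at all.
for name in ["myfile.txt", "reads.fasta.gz", "README"]:
    last, every = extensions(name)
    print(f"{name}: last={last!r} all={every}")
```

A file with several dots, such as a compressed FASTA file, reports only the last suffix as its extension, which matches the convention that the final extension names the outermost file type.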

  23. Working with Directories • Directories are a means of organizing your files on a Linux computer. • They are equivalent to folders on Windows and Macintosh computers • Directories contain files, executable programs, and sub-directories • Understanding how to use directories is crucial to manipulating your files on a Linux system.

  24. Your Home Directory • When you log in to the server, you always start in your Home directory. • Create sub-directories to store specific projects or groups of information, just as you would place folders in a filing cabinet. • Do not accumulate thousands of files with cryptic names in your Home directory.

  25. File & Directory Commands • This is a minimal list of Linux commands that you must know for file management: • All of these commands can be modified with many options. Learn to use Linux ‘man’ pages for more information.

  26. Navigation
     • pwd (print working directory) shows the name and location of the directory where you are currently working:
         > pwd
         /home/jtang
     • This is a “pathname”; the slashes indicate sub-directories
     • The initial slash is the “root” of the whole filesystem
     • ls (list) gives you a list of the files in the current directory:
         > ls
         assembin4.fasta Misc test2.txt bin temp testfile
     • Use the ls -l (long) option to get more information about each file:
         > ls -l
         total 1768
         drwxr-x--- 2 browns02 users 8192 Aug 28 18:26 Opioid
         -rw-r----- 1 browns02 users 6205 May 30 2000 af124329.gb_in2
         -rw-r----- 1 browns02 users 131944 May 31 2000 af151074.fasta

  27. Sub-directories
     • cd (change directory) moves you to another directory
         > cd Misc
         > pwd
         /u/browns02/Misc
     • mkdir (make directory) creates a new sub-directory inside of the current directory
         > ls
         assembler phrap space
         > mkdir subdir
         > ls
         assembler phrap space subdir
     • rmdir (remove directory) deletes a sub-directory, but the sub-directory must be empty
         > rmdir subdir
         > ls
         assembler phrap space
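
The rule that rmdir only removes an empty sub-directory can be seen from a script as well as from the shell. The sketch below uses Python's standard `os` and `tempfile` modules inside a throwaway directory. The function name `rmdir_demo` and the file name `notes.txt` are illustrative assumptions.

```python
import os
import tempfile
from typing import List, Tuple


def rmdir_demo(workdir: str) -> Tuple[bool, List[str]]:
    """Create, fill, and remove a sub-directory; report if rmdir refused."""
    sub = os.path.join(workdir, "subdir")
    os.mkdir(sub)                                # like: mkdir subdir
    note = os.path.join(sub, "notes.txt")
    with open(note, "w") as handle:
        handle.write("placeholder\n")
    refused = False
    try:
        os.rmdir(sub)                            # like: rmdir subdir (not empty yet)
    except OSError:
        refused = True
    os.remove(note)                              # empty the sub-directory first
    os.rmdir(sub)                                # now the removal succeeds
    return refused, sorted(os.listdir(workdir))  # like: ls


with tempfile.TemporaryDirectory() as workdir:
    print(rmdir_demo(workdir))
```

The first removal attempt is refused because the sub-directory still holds a file; once the file is deleted, the same call succeeds and the listing is empty again.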

  28. Shortcuts • There are some important shortcuts in Linux for specifying directories • . (dot) means "the current directory" • .. means "the parent directory" - the directory one level above the current directory, so cd .. will move you up one level • ~ (tilde) means your Home directory, so cd ~ will move you back to your Home. • Just typing a plain cd will also bring you back to your home directory
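
The `.` and `..` shortcuts are resolved relative to whatever directory you are in. The sketch below shows that resolution with Python's standard `posixpath` module, without touching the real filesystem. The helper name `resolve` and the starting path `/home/jtang/Misc` are illustrative assumptions.

```python
import os
import posixpath


def resolve(current_dir: str, relative: str) -> str:
    """Resolve a relative path such as '.' or '..' against current_dir."""
    return posixpath.normpath(posixpath.join(current_dir, relative))


current = "/home/jtang/Misc"        # hypothetical working directory
print(resolve(current, "."))        # the current directory itself
print(resolve(current, ".."))       # one level up
print(resolve(current, "../bin"))   # a sibling of the current directory
print(os.path.expanduser("~"))      # home directory of whoever runs this
```

The last line depends on the account running the script, so its output differs from user to user.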

  29. Create new files • pico • nano • vi/vim • emacs

  30. Programming • perl • python • c/c++ • R • Java

  31. Linux File Protections • File protection (also known as permissions) enables the user to set up a file so that only specific people can read (r), write/delete (w), and execute (x) it. • Write and delete privileges are treated as the same here, since write privilege allows someone to overwrite a file with a different one. (Strictly, removing a file also requires write permission on the directory that contains it.)

  32. File Owners and Groups • Linux file permissions are defined according to ownership. The person who creates a file is its owner. • You are the owner of files in your Home directory and all its sub-directories • In addition, there is a concept known as a Group. • Members of a group have privileges to see each other's files. • We create groups as the members of a single lab - the students, technicians, postdocs, visitors, etc. who work for a given PI.

  33. View File Permissions
     • Use the ls -l command to see the permissions for all files in a directory:
         $ ls -l
         total 2
         -rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
         -rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
     • The username of the owner is shown in the third column. (The owner of the files listed above is jtang.)
     • The owner belongs to the group “None”
     • The access rights for these files are shown in the first column. This column consists of 10 characters known as the attributes of the file: r, w, x, and -
         r indicates read permission
         w indicates write (and delete) permission
         x indicates execute (run) permission
         - indicates no permission for that operation

  34.
         $ ls -l
         total 2
         -rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
         -rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
     • The first character in the attribute string indicates whether a file is a directory (d) or a regular file (-).
     • The next 3 characters (rwx) give the file permissions for the owner of the file.
     • The middle 3 characters give the permissions for other members of the owner's group.
     • The last 3 characters give the permissions for everyone else (others).
     • The default protections assigned to new files on our system are: -rw-r----- (owner = read and write, group = read, others = nothing)
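
The 10-character attribute string can be split mechanically into the file type and the three permission triplets described above. The sketch below shows one way to do that in Python. The function name `describe_mode` and the dictionary keys are illustrative assumptions.

```python
from typing import Dict


def describe_mode(attributes: str) -> Dict[str, str]:
    """Split an `ls -l` attribute string into type, owner, group, others."""
    if len(attributes) != 10:
        raise ValueError("expected 10 characters, e.g. '-rw-r--r--'")
    first = attributes[0]
    if first == "d":
        kind = "directory"
    elif first == "-":
        kind = "regular file"
    else:
        kind = "other"  # e.g. symbolic links, not covered on the slide
    return {
        "type": kind,
        "owner": attributes[1:4],
        "group": attributes[4:7],
        "others": attributes[7:10],
    }


print(describe_mode("-rwxr-xr-x"))  # test.pl in the listing above
print(describe_mode("-rw-r-----"))  # the default protections on the slide
```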

  35. Change Protections • Only the owner of a file can change its protections • To change the protections on a file, use the chmod (change mode) command. [Beware, this is a confusing command.] • Taken all together, it looks like this: > chmod 644 data.txt • This gives the owner read and write permission and gives the group and everyone else read permission • Other modes listed: 600, 755, 700
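
Each digit in a chmod mode such as 644 is the sum of read = 4, write = 2, and execute = 1, given in the order owner, group, others. The sketch below uses Python's standard `stat` module to convert the modes named above into the attribute strings that ls -l prints. The helper name `mode_string` is an illustrative assumption, and the sketch treats every mode as belonging to a regular file.

```python
import stat


def mode_string(octal_mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | octal_mode)


# The modes mentioned on the slide, written as Python octal literals.
for mode in (0o644, 0o600, 0o755, 0o700):
    print(oct(mode)[2:], mode_string(mode))
```

For example, 6 = 4 + 2 means read and write, while 7 = 4 + 2 + 1 adds execute, so 755 suits a script that everyone may run but only the owner may edit.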
