Introduction to the GCG Wisconsin Package

The Center for Bioinformatics

UNC at Chapel Hill

Jianping (JP) Jin Ph.D.

Bioinformatics Scientist

Phone: (919)843-6105

E-mail: [email protected]

Fax: (919)843-3103


What is GCG

  • An integrated package of over 130 programs (the GCG Wisconsin Package).

  • For extensive analyses of nucleic acid and protein sequences.

  • Associated with most major public nucleic acid and protein databases.

  • Runs on UNIX operating systems.


Why use GCG

  • Removes the need for the constant collection of new software by end users.

  • Removes the need to learn a new interface each time new software is released.

  • Provides a flow of analyses within a single interface.

  • Unix environment allows users to automate complex, repetitive tasks.

  • Allows users to use multiple processors to accelerate their jobs.

  • Supports almost all public databases, which can be updated daily, and provides fast local searches.


Flexibility or Automation

  • 1. MEME: upstream regulatory motifs;

  • 2. MotifSearch: genes sharing these potential regulatory motifs;

  • 3. PileUp: multiple sequence alignment;

  • 4. Distances: extract pairwise distances from the alignment;

  • 5. GrowTree: a phylogenetic tree.
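
Step 4 of the workflow above can be illustrated with a minimal Python sketch that computes uncorrected pairwise distances (the fraction of differing positions) from an alignment. This is not the Distances program itself, which offers its own correction methods; the function names, the toy alignment, and the choice of "-", "." and "~" as gap characters are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap_chars: str = "-.~") -> float:
    """Fraction of differing positions, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # gapped columns carry no substitution information
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment: dict) -> dict:
    """Map each pair of sequence names to its uncorrected distance."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Hypothetical three-sequence alignment, for illustration only.
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTTCGT",
    "seqC": "AC-TTCGA",
}
for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 3))
```

A matrix of this kind is the input that a tree-building step such as step 5 works from.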


Interfaces

  • Command Line: Running programs from UNIX system prompt.

  • SeqLab: graphical user interface (GUI), requiring an X Window display.

  • SeqWeb: web browser interface to a core set of sequence analysis programs.


Limitations with GCG

  • The graphical interface does not give users full access to the power of the command line, nor to the complete set of programs.

  • Many programs limit the maximum size of the sequences they can handle (350 kb). This limitation will be removed in version 11.


Databases GCG Supports

  • Nucleic acid databases

    • GenBank

    • EMBL (abridged)

  • Protein databases

    • NRL_3D

    • UniProt (SWISS-PROT, PIR, TrEMBL)

    • PROSITE, Pfam

  • Restriction Enzymes (REBASE)


Database Update Services

  • DataServe: automatically updates the nucleic acid databases daily via FTP.

  • DataExtended: the most complete set of nucleic acid and protein data. Releases are timed to coincide with the major GenBank releases, every 2-3 months.

  • DataBasic: Similar to DataExtended, but excludes EST and GSS data from GenBank and EMBL.


File Importing and Exporting

  • Reformat

  • FromEMBL

  • FromGenBank

  • FromPIR, ToPIR

  • FromStaden, ToStaden

  • FromIG, ToIG

  • FromFastA, ToFastA


File Formats with GCG

  • Single sequence files (in GCG format)

  • List (a list of files)

  • MSF (multiple sequence format)

  • RSF (rich sequence format)


Typical program


Result from MAP analysis


X-Windows server must be running


SeqLab Main Window (List Mode)


SeqLab Editor Mode


Display by Features


SeqLab Editor Mode (cont.)


SeqLab Output Manager


GCG Programs

  • 1. Comparison

  • 2. Database Searching and Retrieval

  • 3. DNA/RNA Secondary Structure

  • 4. Editing and Publication

  • 5. Evolution

  • 6. Fragment Assembly

  • 7. Importing and Exporting

  • 8. Mapping

  • 9. Primer Selection

  • 10. Protein Analysis

  • 11. Translation


Create your own sequence


PlasmidMap


FindPatterns


HmmerPfam Analysis


Gene Finding (FRAME)


Restriction Enzyme Map


Consensus Sequence


Phylogenetic Tree (Cladogram)


Peptide Structure


Peptide Structure (2)


Isoelectric Analysis


Transmembrane Domains


Nucleic Acid Secondary Structure


Pairwise Comparison (Gap)

  • Needleman and Wunsch algorithm.

  • Produces a global alignment that covers the whole length of both sequences; gaps are inserted so that the aligned sequences have the same length.

  • Good when two sequences are closely related.
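
The global alignment idea can be shown with a minimal Python sketch of the Needleman and Wunsch dynamic programme. It is not the Gap program: the function name and the scoring values are assumptions, and a single linear gap penalty is used for brevity where a production tool would normally distinguish gap creation from gap extension.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scoring values)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner so the whole of both sequences is covered.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Because the traceback always runs from the last cell to the first, the two output strings have equal length and contain every residue of both inputs, which is what makes the alignment global.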


Pairwise Comparison (BestFit)

  • Algorithm of Smith and Waterman.

  • Local homology alignment that finds the best segment of similarity between two sequences.

  • The most sensitive sequence comparison method available.
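
For contrast with a global alignment, here is a minimal Python sketch of the Smith and Waterman local alignment. It is not the BestFit program: the function name, scoring values and linear gap penalty are illustrative assumptions. The two differences from a global alignment are that cell scores are never allowed to drop below zero, and that the traceback starts at the highest-scoring cell and stops when the score reaches zero.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment with a linear gap penalty (illustrative scoring values)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a fresh local alignment here
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Hypothetical sequences sharing one internal segment.
best, top, bottom = smith_waterman("TTTTACGTACGTTTTT", "GGGACGTACGGGG")
print(best)
print(top)
print(bottom)
```

Only the shared segment is reported; the unrelated flanking regions are left out rather than forced into the alignment.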


Comparison of two sequences


GapShow


Multiple Comparison (PileUp)

  • Uses the method of Feng and Doolittle, which is similar to that of Higgins and Sharp.

  • A series of progressive pairwise alignments (up to 500 sequences) generates the final alignment.

  • An extension of Gap, not ideal for finding the best local region of similarity, such as a shared motif.
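
A progressive alignment needs an order in which to align sequences and clusters. The Python sketch below derives such an order from pairwise distances using average-linkage (UPGMA-style) clustering. Treating the clustering step as UPGMA is an assumption here, as are the function name and the toy distances; the sketch only produces the join order and does not perform the alignments themselves.

```python
def upgma_join_order(names, distance):
    """Return the order in which clusters are merged under average linkage.

    `distance` maps frozenset({a, b}) to the distance between a and b and
    must contain an entry for every pair of names.
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of members
    dist = dict(distance)
    order = []
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)  # closest remaining pair of clusters
        a, b = sorted(closest)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        for other in sizes:
            # Average linkage: size-weighted mean of the two old distances.
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        # Drop every distance that still refers to the two merged clusters.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        sizes[merged] = size_a + size_b
        order.append((a, b))
    return order


# Hypothetical distances: A and B are close, C and D are close.
toy = {
    frozenset(("A", "B")): 1.0,
    frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 6.0,
}
for step in upgma_join_order(["A", "B", "C", "D"], toy):
    print(step)
```

The most similar pair is aligned first, and each later step aligns a sequence or an existing cluster to another, which is the "series of progressive pairwise alignments" described above.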


Multiple Comparison by PileUp


Multiple Comparison by PileUp (cont.)


Dendrogram by PileUp


Database Search

  • Database searches nearly always employ local alignment algorithms.

  • They often use “heuristic” methods as a screen, e.g. FASTA and BLAST.

  • Sequences that pass the screen are given a correct local similarity score, but there is no guarantee that every sequence with a high Smith-Waterman score passes through the screen.


BLAST

  • Accepts a number of sequences as input and lets you specify any number of databases. Example: $ Blast -INfile2=PIR,SWPLUS -INfile=hsp70.msf{*}

  • Supports five BLAST programs, but gapped alignment is not available for TBLASTX.

  • For non-coding nucleotide homology searches, consider either reducing the word size from 11 to 6 or 7, or using FASTA.

  • The number of scoring matrices is limited: BLOSUM62/45/80 and PAM70 are available for the -MATRix parameter.
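
The effect of word size can be illustrated with a minimal Python sketch of an exact-word screen. It is not BLAST, which adds hit extension and statistics on top of seeding; the function name and both sequences are hypothetical. The subject is built from the query with a substitution every ninth base, so every 11-base window of the subject spans a substitution while shorter words still fit between substitutions.

```python
def shared_words(query, subject, word_size):
    """Return (position, word) for each subject word that also occurs in the query."""
    query_words = {
        query[i:i + word_size] for i in range(len(query) - word_size + 1)
    }
    return [
        (j, subject[j:j + word_size])
        for j in range(len(subject) - word_size + 1)
        if subject[j:j + word_size] in query_words
    ]


# Hypothetical query, and a subject that differs from it at every ninth base.
query = "ACGTGCATTAGCCGATAGGCTTAACGGATCCA"
subject = "".join(
    ("C" if base == "A" else "A") if i % 9 == 8 else base
    for i, base in enumerate(query)
)

for word_size in (11, 7, 6):
    hits = shared_words(query, subject, word_size)
    print(word_size, len(hits))
```

A divergent but related sequence can therefore fail to produce any seed at a large word size and never reach the alignment stage, which is why a smaller word size is suggested for non-coding searches.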


Database Search (SSearch)

  • A rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type.

  • The most sensitive method available for similarity search.

  • Very slow.


HmmerSearch

  • Uses a profile HMM as a query to search a sequence database.

  • Profile HMM: a position-specific scoring table, a statistical model of the consensus of a multiple sequence alignment.

  • Output can be used by any GCG program that accepts a list file.
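
The "position-specific scoring table" idea can be sketched in Python by turning an ungapped alignment into per-column log-odds scores. This is only the scoring-table part: a real profile HMM also models insertions and deletions with state transitions. The function names, the uniform background frequencies, the pseudocount and the toy alignment are all assumptions.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background (an assumption)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the column scores for a segment of the same length as the table."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


# Hypothetical ungapped alignment of four short sequences.
alignment = ["ACGT", "ACGT", "ACGA", "ACGT"]
pssm = build_pssm(alignment)
print(round(score_segment(pssm, "ACGT"), 3))
print(round(score_segment(pssm, "TGCA"), 3))
```

A segment that resembles the family's consensus scores higher than one that does not, and the score depends on position, unlike a single substitution matrix applied uniformly along the sequence.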


Profile Hidden Markov Model


HmmerSearch


HmmerSearch (cont.)


HmmerSearch (cont.)


HmmerSearch (cont.: histogram of scores)


HmmerSearch (cont.: resulting alignment)


NetBLAST

  • Sends your query sequences over the internet to a server at NCBI, Bethesda.

  • Has some limitations, e.g. TBLASTX searches against the nr database are prohibited; only Alu, EST, GSS and STS can be searched.

  • Does not support as many options as are available with BLAST.


NetBLAST


PSIBLAST

  • Similar to BLAST, except that it uses position-specific scoring matrices during the search.

  • Uses protein sequence(s) to iteratively search protein database(s).


MEME and MotifSearch

  • MEME (Multiple EM for Motif Elicitation): a tool for discovering motifs in a group of DNA or protein sequences.

  • Motif: a sequence pattern that occurs repeatedly in a group of related sequences.

  • MotifSearch: uses a set of MEME profiles to search a database for new sequences similar to the original family.


MEME Profile


MEME (cont.)


GrowTree (Cladogram)


Access to GCG on Campus

  • 1. An Onyen and password, plus a sign-up for the BioSci service at http://onyen.unc.edu;

  • 2. A computer connected to the campus network;

  • 3. A PostScript printer connected to the campus network;

  • 4. SSH Secure Client;

  • 5. An X-Windows server (optional).


Sign up BioScience


Log onto GCG


Log onto GCG (cont.)


GCG Welcome Page


How to get SeqLab to run

  • Start the X-Windows server;

  • Log on to the GCG server, nun.isis.unc.edu, through SSH Secure Shell Client;

  • At the prompt ($) enter the command “export DISPLAY=yourMachineIP:0.0”;

  • Enter the command “xterm &” to open an xterm window;

  • In the GCG main window, enter the command “seqlab &” to start the SeqLab GUI.


How to get SeqLab to run (cont.)


  • Login