


Introduction to the GCG Wisconsin Package

The Center for Bioinformatics

UNC at Chapel Hill

Jianping (JP) Jin Ph.D.

Bioinformatics Scientist

Phone: (919)843-6105

E-mail: [email protected]

Fax: (919)843-3103

What is GCG

  • An integrated package of over 130 programs (the GCG Wisconsin Package).

  • For extensive analyses of nucleic acid and protein sequences.

  • Associated with most major public nucleic acid and protein databases.

  • Works on UNIX OS.

Why use GCG

  • Removes the need for the constant collection of new software by end users.

  • Removes the need to learn a new interface each time new software is released.

  • Provides a flow of analyses within a single interface.

  • Unix environment allows users to automate complex, repetitive tasks.

  • Allows users to use multiple processors to accelerate their jobs.

  • Supports almost all public databases, which can be updated daily, and provides fast local searches.

Flexibility or Automation

  • 1. MEME: discover upstream regulatory motifs;

  • 2. MotifSearch: find genes sharing these potential regulatory motifs;

  • 3. PileUp: build a multiple sequence alignment;

  • 4. Distances: extract pairwise distances from the alignment;

  • 5. GrowTree: build a phylogenetic tree.
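Step 4 of the workflow above, extracting pairwise distances from an alignment, can be illustrated with a minimal Python sketch. It is not the GCG Distances program: it computes only the uncorrected p-distance (fraction of mismatched columns), and the function names, the example sequences, and the set of gap characters are assumptions made for illustration.

```python
from itertools import combinations

# Assumed gap symbols; alignments may mark gaps with '-', '.' or '~'.
GAP_CHARS = "-.~"


def p_distance(a, b):
    """Fraction of mismatched columns between two aligned sequences.

    Columns where either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def distance_matrix(alignment):
    """Map every pair of sequence names to its p-distance."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Hypothetical three-sequence alignment.
    example = {
        "seq1": "ACGTACGT",
        "seq2": "ACGTTCGT",
        "seq3": "AC.TTCGA",
    }
    for pair, dist in distance_matrix(example).items():
        print(pair, round(dist, 3))
```

A matrix like this is the kind of input a tree-building step (step 5) works from; real distance programs usually also apply a correction for multiple substitutions, which this sketch omits.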


Interfaces to GCG

  • Command Line: runs programs from the UNIX system prompt.

  • SeqLab: a graphical user interface (GUI) that requires an X Window display.

  • SeqWeb: web browser access to a core set of sequence analysis programs.

Limitations with GCG

  • The GUI does not give users full access to the power of the command line, nor to the complete set of programs.

  • Many programs limit the maximum size of the sequences they can handle (350 kb). This limitation will be removed in version 11.

Databases GCG Supports

  • Nucleic acid databases

    • GenBank

    • EMBL (abridged)

  • Protein databases

    • NRL_3D

    • UniProt (SWISS-PROT, PIR, TrEMBL)

    • PROSITE, Pfam

  • Restriction Enzymes (REBASE)

Database Update Services

  • DataServe: automatically updates nucleic acid data on a daily basis via FTP.

  • DataExtended: the most complete set of nucleic acid and protein data. Releases are timed to coincide with the major GenBank releases, every 2-3 months.

  • DataBasic: Similar to DataExtended, but excludes EST and GSS data from GenBank and EMBL.

File Importing and Exporting

  • Reformat

  • FromEMBL

  • FromGenBank

  • FromPIR, ToPIR

  • FromStaden, ToStaden

  • FromIG, ToIG

  • FromFastA, ToFastA

File Formats with GCG

  • Single sequence files (in GCG format)

  • List (a list of files)

  • MSF (multiple sequence format)

  • RSF (rich sequence format)

GCG Programs

  • 1. Comparison

  • 2. Database Searching and Retrieval

  • 3. DNA/RNA Secondary Structure

  • 4. Editing and Publication

  • 5. Evolution

  • 6. Fragment Assembly

  • 7. Importing and exporting

  • 8. Mapping

  • 9. Primer Selection

  • 10. Protein Analysis

  • 11. Translation

Nucleic Acid Secondary Structure

Pairwise Comparison (Gap)

  • Uses the algorithm of Needleman and Wunsch.

  • Produces a global alignment that covers the whole length of both sequences; once gaps are inserted, the aligned sequences have the same length.

  • Good when two sequences are closely related.
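As an illustration of the global alignment idea described above, here is a minimal Python sketch of Needleman-Wunsch dynamic programming. It is not the Gap program itself: the function name, the match/mismatch scores, and the single linear gap penalty are illustrative assumptions, not GCG defaults.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Because every residue of both sequences must appear in the result, the two output strings always have equal length, which is the property the slide describes.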

Pairwise Comparison (BestFit)

  • Algorithm of Smith and Waterman.

  • Local homology alignment that finds the best segment of similarity between two sequences.

  • The most sensitive sequence comparison method available.
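The local alignment idea can be sketched in the same way. The following Python example is a minimal Smith-Waterman implementation, not the BestFit program: the function name, scoring values, and linear gap penalty are assumptions chosen for illustration. The key difference from a global alignment is that cell scores are never allowed to drop below zero, so the best-scoring segment can start and end anywhere.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, segment_a, segment_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                           # start a new local alignment here
                score[i - 1][j - 1] + sub,   # extend diagonally
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two otherwise unrelated sequences that share the segment ACGT.
    print(smith_waterman("TTTTACGTGGGG", "CCCCACGTCCCC"))
```

The full matrix is filled for every pair of sequences, which is why rigorous Smith-Waterman comparison is sensitive but slow on large databases.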

Multiple Comparison (PileUp)

  • Uses the progressive alignment method of Feng and Doolittle, which is similar to the method of Higgins and Sharp.

  • A series of progressive pairwise alignments (up to 500 sequences) generates the final alignment.

  • An extension of Gap (global alignment), so it is not ideal for finding the best local region of similarity, such as a shared motif.

Database Search

  • Database searches nearly always employ local alignment algorithms.

  • They often use heuristic methods, such as FASTA and BLAST, as a screen.

  • Sequences that pass the screen are given correct local similarity scores, but there is no guarantee that every sequence with a high Smith-Waterman score passes through the screen.
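The trade-off of a heuristic screen can be shown with a small Python sketch. This is a deliberately simplified stand-in for the word-matching stage of tools such as FASTA and BLAST, not their actual algorithm: a database sequence passes only if it shares at least one exact word of length k with the query. The function names and the toy sequences are assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def word_screen(query, database, k=11):
    """Names of database sequences sharing at least one exact k-mer with query."""
    query_words = kmers(query.upper(), k)
    return [
        name
        for name, seq in database.items()
        if query_words & kmers(seq.upper(), k)
    ]


if __name__ == "__main__":
    query = "ACGTACGGTTCAGGCTAAGCTTGACC"
    database = {
        # Contains an exact 17-base stretch of the query.
        "close": "TT" + query[3:20] + "AA",
        # The query with a substitution every 8 bases: clearly related,
        # but no exact run longer than 7 bases survives.
        "diverged": "ACGTACGATTCAGGCCAAGCTTGTCC",
        "unrelated": "GGGGGGGGGGGGGGGGGGGG",
    }
    print(word_screen(query, database, k=11))
    print(word_screen(query, database, k=6))
```

With a long word size the diverged relative never reaches the alignment stage, even though a full Smith-Waterman comparison would score it well; a shorter word size lets it through at the cost of screening more candidates.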


Database Search (BLAST)

  • Accepts any number of query sequences and searches any number of databases, e.g. $ blast -INfile2=PIR,SWPLUS -INfile=hsp70.msf{*}.

  • Supports five BLAST programs, but gapped alignment is not available for TBLASTX.

  • For non-coding nucleotide homology searches, consider either reducing the word size from 11 to 6 or 7, or using FASTA.

  • The number of scoring matrices is limited: BLOSUM62, BLOSUM45, BLOSUM80, and PAM70 are available for the -MATRix parameter.

Database Search (SSearch)

  • A rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type.

  • The most sensitive method available for similarity search.

  • Very slow.


Database Search (HmmerSearch)

  • Uses a profile HMM as a query to search a sequence database.

  • Profile HMM: a position-specific scoring table; a statistical model of the consensus of a multiple sequence alignment.

  • Output can be used by any GCG program that accepts a list file.
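To make the idea of a position-specific scoring table concrete, here is a minimal Python sketch that builds a log-odds scoring matrix from an ungapped alignment and scans a sequence with it. This is a much simpler model than a profile HMM, which also has insert and delete states; the function names, the DNA alphabet, the pseudocount, the uniform background frequency, and the example alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; a protein profile would use 20 letters


def build_pssm(alignment, pseudocount=1.0, background=0.25):
    """Build one {residue: log-odds score} table per alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = len(alignment) + pseudocount * len(ALPHABET)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            ch: math.log2(((counts[ch] + pseudocount) / total) / background)
            for ch in ALPHABET
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores for a segment as long as the matrix."""
    return sum(column[ch] for column, ch in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Return (score, start) of the best window; sequence must be at least as long as the matrix."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Hypothetical ungapped alignment of four related sites.
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
    pssm = build_pssm(sites)
    print(best_hit(pssm, "GGCGTATAATGCGC"))
```

Unlike a single substitution matrix, the score for a residue here depends on the column it falls in, so conserved positions contribute more to a hit than variable ones.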


Database Search (NetBLAST)

  • Sends your query sequences over the Internet to a server at NCBI, Bethesda.

  • NetBLAST has some limitations, e.g. TBLASTX searches against the nr database are prohibited; TBLASTX can search only Alu, EST, GSS, and STS.

  • Does not support as many options as are available with BLAST.


Database Search (PSIBLAST)

  • Similar to BLAST, except that it uses position-specific scoring matrices during the search.

  • Uses one or more protein sequences to iteratively search one or more protein databases.

MEME and MotifSearch

  • MEME (Multiple EM for Motif Elicitation): a tool for discovering motifs in a group of DNA or protein sequences.

  • Motif: a sequence pattern that occurs repeatedly in a group of related sequences.

  • MotifSearch: uses a set of MEME profiles to search a database for new sequences similar to the original family.

Access to GCG on Campus

  • 1. Onyen and password plus sign up to BioSci service at;

  • 2. Computer connected to the Campus network;

  • 3. Postscript printer connected to the campus network;

  • 4. SSH Secure Client;

  • 5. X-Windows Server (optional).

How to Get SeqLab to Run

  • Open X-Windows;

  • Logon to the GCG server,, through SSH Secure Shell Client;

  • At the prompt ($) enter the command “export DISPLAY=yourMachineIP:0.0”;

  • Enter the command “xterm &” to activate the xterm window;

  • On the GCG main window enter the command “seqlab &” to activate the SeqLab GUI.