Introduction to the GCG Wisconsin Package

The Center for Bioinformatics

UNC at Chapel Hill

Jianping (JP) Jin Ph.D.

Bioinformatics Scientist

Phone: (919)843-6105

E-mail: [email protected]

Fax: (919)843-3103

What is GCG
  • An integrated package of over 130 programs (the GCG Wisconsin Package).
  • For extensive analyses of nucleic acid and protein sequences.
  • Associated with most major public nucleic acid and protein databases.
  • Runs on the UNIX operating system.
Why use GCG
  • Removes the need for the constant collection of new software by end users.
  • Removes the need to learn a new interface each time new software is released.
  • Provides a flow of analyses within a single interface.
  • Unix environment allows users to automate complex, repetitive tasks.
  • Allows users to use multiple processors to accelerate their jobs.
  • Supports almost all public databases, which can be updated daily, and provides fast local searches.
Flexibility or Automation
  • 1. MEME: find upstream regulatory motifs;
  • 2. MotifSearch: find genes sharing these potential regulatory motifs;
  • 3. PileUp: build a multiple sequence alignment;
  • 4. Distances: extract pairwise distances from the alignment;
  • 5. GrowTree: build a phylogenetic tree.
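Step 4 of this workflow reduces a multiple alignment to pairwise distances. The Python sketch below illustrates the idea using the simplest uncorrected measure: the fraction of differing positions among ungapped columns. This is an assumption for illustration only; the function names are hypothetical, and the Distances program may apply a different or corrected distance method.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of differing positions among columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise distances for every pair of named rows in the alignment."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Toy alignment (made-up sequences, already aligned to equal length).
alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "AC-TTCGA",
}

for pair, dist in distance_matrix(alignment).items():
    print(pair, round(dist, 3))
```

A tree-building step would take such a matrix as its input; the matrix itself carries no information about which columns differed.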
Interfaces
  • Command Line: Running programs from UNIX system prompt.
  • SeqLab: Graphical user interface, requiring an X Windows display.
  • SeqWeb: Web interface to a core set of sequence analysis programs.
Limitations with GCG
  • The GUI does not give users full access to the power of the command line, nor to the complete set of programs.
  • Many programs place a limit on the maximum size of the sequences that they can handle (350 Kb). This limitation will be removed in version 11.
Databases GCG Supports
  • Nucleic acid databases
    • GenBank
    • EMBL (abridged)
  • Protein databases
    • NRL_3D
    • UniProt (SWISS-PROT, PIR, TrEMBL)
    • PROSITE, Pfam
  • Restriction Enzymes (REBASE)
Database Update Services
  • DataServe: Automatically updates nucleic acid data on a daily basis via FTP.
  • DataExtended: The most complete set of nucleic acid and protein data. Releases are timed to coincide with the major GenBank releases, every 2-3 months.
  • DataBasic: Similar to DataExtended, but excludes EST and GSS data from GenBank and EMBL.
File Importing and Exporting
  • Reformat
  • FromEMBL
  • FromGenBank
  • FromPIR ToPIR
  • FromStaden ToStaden
  • FromIG ToIG
  • FromFastA ToFastA
File Formats with GCG
  • Single sequence files (in GCG format)
  • List (a list of files)
  • MSF (multiple sequence format)
  • RSF (rich sequence format)
GCG Programs
  • 1. Comparison
  • 2. Database Searching and Retrieval
  • 3. DNA/RNA Secondary Structure
  • 4. Editing and Publication
  • 5. Evolution
  • 6. Fragment Assembly
  • 7. Importing and Exporting
  • 8. Mapping
  • 9. Primer Selection
  • 10. Protein Analysis
  • 11. Translation
Pairwise Comparison (Gap)
  • Needleman & Wunsch algorithm.
  • A global alignment covering the whole length of both sequences; with gaps inserted, the two aligned sequences have the same length.
  • Good when two sequences are closely related.
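To make the global-alignment idea concrete, here is a minimal Python sketch of Needleman-Wunsch dynamic programming. It assumes a simple match/mismatch score and a linear per-position gap penalty; the scoring values and the function name are illustrative assumptions, and this is not the scoring scheme or implementation that Gap itself uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GCATGCG", "GATTACA")
print(total)
print(top)
print(bottom)
```

Both output rows always have the same length, and removing the gap characters recovers the two input sequences, which is the property described in the bullets above.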
Pairwise Comparison (BestFit)
  • Algorithm of Smith and Waterman.
  • Local homology alignment that finds the best segment of similarity between two sequences.
  • The most sensitive sequence comparison method available.
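The local-alignment idea can be sketched in Python as follows. This is a minimal Smith-Waterman illustration assuming a simple match/mismatch score and a linear gap penalty; the scoring values and function name are assumptions made for the example and do not reflect BestFit's actual parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, segment_a, segment_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Unlike global alignment, a cell may restart at 0.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that share only an internal segment.
print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))
```

Only the shared internal segment is reported; the unrelated flanking regions are left out, which is what distinguishes a local alignment from a global one.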
Multiple Comparison (PileUp)
  • Uses the method of Feng and Doolittle, which is similar to that of Higgins & Sharp.
  • A series of progressive pairwise alignments (up to 500 sequences) generates the final alignment.
  • An extension of Gap, not ideal for finding the best local region of similarity, such as a shared motif.
Database Search
  • Nearly always employs local alignment algorithms.
  • Often uses “heuristic” methods as an initial screen, e.g. FASTA and BLAST.
  • Sequences that pass the screen are given a correct local similarity score, but there is no guarantee that every sequence with a high Smith-Waterman score passes the screen.
BLAST
  • Accepts a number of sequences as input and lets you specify any number of databases, e.g. $Blast -INfile2=PIR,SWPLUS; -INfile=hsp70.msf{*}.
  • Supports five BLAST programs, but gapped alignment is not available for TBLASTX.
  • For non-coding nucleotide homology searches, consider either reducing the word size from 11 to 6 or 7, or using FASTA.
  • The number of scoring matrices is limited: BLOSUM62/45/80 and PAM70 are available for the -MATRix parameter.
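The effect of word size on sensitivity can be illustrated with a toy exact-word screen in Python. This sketch covers only the initial word-lookup step of a heuristic search; it assumes exact word matches and omits hit extension and scoring, so it is not how BLAST is implemented. The sequences and function names are made up for the example.

```python
from collections import defaultdict


def build_word_index(database, word_size):
    """Map every word of length word_size to the names of sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - word_size + 1):
            index[seq[i:i + word_size]].add(name)
    return index


def screen(query, database, word_size):
    """Names of database sequences sharing at least one exact word with the query."""
    index = build_word_index(database, word_size)
    hits = set()
    for i in range(len(query) - word_size + 1):
        hits |= index.get(query[i:i + word_size], set())
    return hits


query = "ACGTTGCAAGGCTTAC"
database = {
    "identical": "ACGTTGCAAGGCTTAC",
    "divergent": "ACGTTGCTAGGCTTAC",  # one substitution in the middle
    "unrelated": "TTTTTTTTTTTTTTTT",
}

print(sorted(screen(query, database, word_size=11)))
print(sorted(screen(query, database, word_size=6)))
```

With a word size of 11 the single substitution breaks every shared word, so the divergent sequence is missed; with a word size of 6 it passes the screen. The price of the smaller word size is more chance hits to examine, which is why it makes searches slower.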
Database Search (SSearch)
  • A rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type.
  • The most sensitive method available for similarity search.
  • Very slow.
HmmerSearch
  • Uses a profile HMM as a query to search a sequence database.
  • Profile HMM: a position-specific scoring table, a statistical model of the consensus of a multiple sequence alignment.
  • Output can be used by any GCG program that accepts a list file.
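The notion of a position-specific scoring table can be sketched in Python as below. This is a deliberately simplified illustration: it builds log-odds scores per column from an ungapped alignment, assuming a uniform background and a pseudocount of 1. A real profile HMM also models insertions and deletions, which this sketch omits. The function names and the toy alignment are assumptions for the example.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Per-column log2-odds scores from an ungapped alignment of equal-length rows."""
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return pssm


def score(pssm, segment):
    """Sum of column scores for a segment as long as the profile."""
    return sum(column[res] for column, res in zip(pssm, segment))


def best_hit(pssm, seq):
    """(score, start) of the highest-scoring window in seq."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Made-up alignment of four related six-base sites.
alignment = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(alignment)

best_score, start = best_hit(pssm, "GGCGTATAATGCGC")
print(start, round(best_score, 2))
```

Scanning every sequence in a database with `best_hit` and ranking by score is the basic pattern behind searching with a profile as the query.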
NetBLAST
  • Sends your query sequences over the internet to a server at NCBI, Bethesda.
  • NetBLAST has some limitations, e.g. TBLASTX searches against the nr database are prohibited; TBLASTX can be run only against Alu, EST, GSS, and STS.
  • Does not support as many options as are available with BLAST.
PSIBLAST
  • Similar to BLAST, except that it uses position-specific scoring matrices during the search.
  • Uses protein sequence(s) to iteratively search protein database(s).
MEME and MotifSearch
  • MEME: Multiple EM for Motif Elicitation, a tool for discovering motifs in a group of DNA or protein sequences.
  • Motif: a sequence pattern that occurs repeatedly in a group of related sequences.
  • MotifSearch: uses a set of MEME profiles to search a database for new sequences similar to the original family.
Access to GCG on Campus
  • 1. An Onyen and password, plus a subscription to the BioSci service at http://onyen.unc.edu;
  • 2. Computer connected to the Campus network;
  • 3. Postscript printer connected to the campus network;
  • 4. SSH Secure Client;
  • 5. X-Windows Server (optional).
How to Get SeqLab to Run
  • Open X-Windows;
  • Log on to the GCG server, nun.isis.unc.edu, through the SSH Secure Shell Client;
  • At the prompt ($) enter the command “export DISPLAY=yourMachineIP:0.0”;
  • Enter the command “xterm &” to activate the xterm window;
  • On the GCG main window enter the command “seqlab &” to activate the SeqLab GUI.