Instructions for using the SVARAP program

Plan

  • Principle of SVARAP program

  • Use of SVARAP:

    • GDE Alignment

    • Formatting the GDE alignment

    • Variability analysis

      • Activation of « macros »

      • Pasting the GDE alignment

      • Checking the GDE alignment format

      • Raw variability data by nucleotide site

      • Variability analysis by windows of 50 nucleotides over a length of 2000 nucleotides

      • Variability analysis by nucleotide site over a length of 2000 nucleotides

  • Program ASVARAP: study of amino acid variability

  • Examples

  • Download / References

  • Contact


Principle of SVARAP program

  • « SVARAP » (Sequence VARiability Analysis Program) analyses, highlights and graphically represents the variability, or genetic diversity, of nucleotide sequences. It uses a Microsoft Excel® file that can analyse up to 100 sequences simultaneously, each up to 4000 nucleotides long.

  • Variability is defined as the proportion of analysed sequences in which the nucleotide at a given position is not the one most frequently found at that position in the studied set of sequences.

  • The program generates graphs and calculates the mean, median, minimal and maximal values, and the coefficient of variation, for windows of 50 nucleotides. It also performs a site-by-site analysis.

  • Classically, sequence alignment tools identify the sites and the nature of nucleotide differences. A quantitative analysis of variability or diversity adds information that helps to find discriminant or conserved regions, which could be targeted by PCR, or highly polymorphic « spots ».

    Thompson J. D., Gibson T. J., Plewniak F., Jeanmougin F., Higgins D. G. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25(24): 4876-82.


How SVARAP works

  • Sequences are aligned, and the alignment in GDE format is copied and pasted into a cell of the program, which formats the sequences to facilitate the subsequent analysis. Notably, each nucleotide is placed in its own cell, so that all nucleotides corresponding to the same nucleotide site fall in the same column.

  • The consensus nucleotide at each site (defined as the nucleotide most frequently found at this position in the studied set of sequences) is generated automatically.

  • The program simultaneously calculates the absolute number of each of the 4 nucleotides (G, A, C, T), as well as of deletions or insertions, and their frequencies (in %).

  • Diversity, or variability, is defined as the proportion of sequences in which, at a given site, the nucleotide differs from the one most frequently found in the studied set of sequences. It is calculated with the formula: 100 - (maximal frequency, in %, among the four nucleotides at a given nucleotide site). The program also calculates the number of different nucleotides observed at a given site.

  • The results are then summarised, for windows of 50 nucleotides, by the median, mean, minimal and maximal values of variability. Concomitantly, a site-by-site analysis is performed and reported over a length of 2000 nucleotides.

  • Finally, SVARAP graphically represents the diversity/variability.
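
The variability formula above can be illustrated outside Excel with a minimal Python sketch. This is not the SVARAP implementation: the function names are hypothetical, and it assumes that frequencies are taken over all sequences in the column (gaps included in the denominator) and that only G, A, C and T are candidates for the maximal frequency.

```python
from collections import Counter

NUCLEOTIDES = "GACT"  # the four nucleotides considered by the formula


def site_variability(column):
    """Variability (in %) of one alignment column, given as a string."""
    counts = Counter(column.upper())
    total = len(column)  # assumption: gaps stay in the denominator
    # Frequency (in %) of each of the four nucleotides at this site.
    freqs = {n: 100.0 * counts[n] / total for n in NUCLEOTIDES}
    # 100 - (maximal frequency among the four nucleotides)
    return 100.0 - max(freqs.values())


def distinct_nucleotides(column):
    """Number of different nucleotides (G, A, C, T) observed at one site."""
    return len(set(column.upper()) & set(NUCLEOTIDES))


def alignment_variability(sequences):
    """Per-site variability for a list of equal-length aligned sequences."""
    # zip(*sequences) yields one tuple of characters per nucleotide site.
    return [site_variability("".join(col)) for col in zip(*sequences)]


aligned = ["ACGT", "ACGA", "ATGA", "ACGA"]
variability = alignment_variability(aligned)
print(variability)  # [0.0, 25.0, 0.0, 25.0]
```

In this toy alignment, sites 2 and 4 each have one sequence out of four that differs from the most frequent nucleotide, hence a variability of 25 %.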


Alignment of sequences in GDE format

  • The starting « material » is a set of sequences (maximum 100 sequences).

  • SVARAP uses an alignment in GDE (Genetic Data Environment) format. First, the sequences are aligned with ClustalX v.1.8 [Thompson, 1997], after requesting the creation of a GDE file in the Output Format Options. The alignment is then copied and pasted into a cell of our Microsoft Excel® file named « AnaVarNuc_Pos… ».


To get an alignment in GDE format using ClustalX v1.8 (1/2)

  • Open ClustalX (1.8) and load the sequences in FASTA format.

  • Open the « Alignment » menu, then select « Output Format Options... ».


To get an alignment in GDE format using ClustalX v1.8 (2/2)

  • Select GDE format.

  • Start alignment.

  • Locate the GDE file.


Formatting the GDE alignment using Microsoft Word®

  • As for most sequence analyses, the sequences must first be formatted.

  • Copy the alignment and paste it into a Microsoft Word® document, then: 1/ delete all paragraph marks; 2/ replace the hyphens « - » with another character (« . » for instance) that does not cause line breaks; 3/ add a paragraph mark before the name of each sequence. Finally, insert a paragraph mark (<enter>) after the name of each sequence (and before its 1st nucleotide).
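
The formatting steps above can also be sketched in Python rather than done by hand in Word. This is only an illustration under stated assumptions: the function name `format_gde` is hypothetical, each record is assumed to start with a `#name` line (as the find/replace on « # » described in this presentation suggests), and the sequence is assumed to follow on one or more lines until the next `#`.

```python
def format_gde(gde_text, gap_replacement="."):
    """Turn GDE-style text into a list of (name, sequence) pairs.

    Line breaks inside a sequence are removed and '-' is replaced by
    `gap_replacement`, mirroring steps 1/ to 3/ done in Word.
    """
    records = []
    name, chunks = None, []
    for line in gde_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A new sequence name closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.replace("-", gap_replacement))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = "#seq1\nacgt--ac\ngtac\n#seq2\nacgtttac\ngt-c\n"
records = format_gde(example)
for name, seq in records:
    # One line for the name, one line for the whole sequence.
    print("#" + name)
    print(seq)
```

The printed text has one line per sequence name followed by one line per sequence, which is the layout the Word steps aim for before pasting into Excel.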


Activating « macros »

  • The Microsoft Excel® file contains « macros ». They must be enabled to use the file; this step can be suppressed.


Pasting the GDE alignment in SVARAP

  • 1. Two files are available for download, analysing variability for nucleotides 1 to 2000 and 2001 to 4000 respectively, because 4000 nucleotides cannot be analysed simultaneously.

  • 2. When using the program, click on column B, then press the <Suppr> (Delete) key to delete any previous work.

  • 3. Paste the GDE alignment formatted using Microsoft Word® into a single cell (the white area, cell B2).

  • Sheet « Paste the alignment »


Verify format of GDE alignment (1/2)

  • Column A should contain only sequence names, and columns F, I, L and O only sequences. Check that the number of sequences is correct.

  • If not: check the GDE alignment.

  • Sheet « Sep1000 »


Verify format of GDE alignment (2/2)

  • Column B should contain only sequence names, and column C only sequences. Check that the number of sequences is correct.

  • If not: check the GDE alignment.

  • Sheet « Nuc 1-1000 » and « Nuc 1001-2000 »


Analysis of variability

  • This sheet and its table contain the main part of the variability analysis. The level of variability (1.) corresponds to the proportion of sequences in which, at a given nucleotide site, the nucleotide differs from the one most frequently found in the studied set of sequences. The positions shown (2.) correspond to those defined in your set of sequences. The number of distinct variations (3.) corresponds to the number of different nucleotides observed at a given site.

  • This analysis is done by windows of 200 bases for reasons related to the Microsoft Excel® software (4.).

  • 5. Analysis in absolute values. 6. Analysis in %.

  • Sheets « Var...»


Consensus sequence on a length of 2000 nucleotides

  • The consensus nucleotide is calculated for each nucleotide site over the whole length of the studied sequences.

  • # (1.) corresponds to an indeterminate site, for example when 2 nucleotides are equally the most represented, or when insertions or deletions are the most represented.

  • Sheet « Consensus »
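
The consensus rule described above, including the « # » symbol for indeterminate sites, can be sketched as follows. This is a hedged illustration, not the Excel formulas used by SVARAP: the function names are hypothetical, and any character other than G, A, C or T (for instance « - » or « . ») is assumed to denote an insertion or deletion.

```python
from collections import Counter


def consensus_symbol(column, nucleotides="GACT"):
    """Consensus character for one alignment column, '#' if indeterminate."""
    ranked = Counter(column.upper()).most_common()
    top_char, top_count = ranked[0]
    # Characters sharing the highest count (more than one means a tie).
    tied = [ch for ch, n in ranked if n == top_count]
    if len(tied) > 1 or top_char not in nucleotides:
        return "#"  # tie, or an insertion/deletion is the most represented
    return top_char


def consensus(sequences):
    """Consensus string for a list of equal-length aligned sequences."""
    return "".join(consensus_symbol("".join(col)) for col in zip(*sequences))


aligned = ["ACG-", "ACT-", "ATT-", "GCGA"]
result = consensus(aligned)
print(result)  # AC##
```

Here the third site is a tie between G and T, and the fourth site is dominated by gaps, so both are reported as « # ».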


Raw data of variability by nucleotide site over a length of 2000 nucleotides

  • The variability is calculated for each nucleotide position over the whole length of the studied sequences.

  • Sheet « Consensus »


Analysis by window of 50 nucleotides

  • Variability is calculated and analysed by windows of 50 nucleotides over the whole length of the studied sequences. The analysis is available:

  • as tables: Sheet « Data fen 50 »

  • as a graph: Sheet « Fig 1-2000 fen 50 »


Analysis by nucleotide site over a length of 2000 nucleotides (1/2)

  • A graph of the variability calculated for each nucleotide site over the whole length of the studied sequences is systematically generated.

  • Sheet « Fig var par position »

  • Each window of 250 nucleotides can be printed separately, or copied and pasted into another program (1.). Alternatively, all 2000 nucleotides can be printed at once.


Analysis by nucleotide site over a length of 2000 nucleotides (2/2)

  • Print preview of the variability calculated for each nucleotide position over the whole length of the studied sequences.

  • Sheet « Fig var par position »


How to analyse more than 4000 nucleotides

The program is not strictly limited by the length of the studied sequences: it can analyse more than 4000 nucleotides, and more than 2000 nucleotides at the same time.

To analyse more than 4000 nucleotides:

  • Copy the file « AnaVarNuc_Pos 1-2000 »

  • Go to sheet « Paste alignment »

  • Unhide all columns (<Format><Colonnes><Afficher>, i.e. Format > Column > Unhide)

  • In cells F2 to F201, replace 1 with the first site to analyse in your alignment (e.g. 8000, or 10224); then, in cells G2 to G201, replace 1001 with the value written in column F plus 1000 (e.g. 9000, or 11224).

  • The file is now set up to analyse nucleotides 8000 to 10000, or 10224 to 12224.
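
The arithmetic behind the values typed into columns F and G can be written out as a small Python sketch. The function name is hypothetical; the only rule taken from the steps above is that column G holds the value of column F plus 1000.

```python
def region_columns(start_site, step=1000):
    """Values to enter in columns F and G for a given first site."""
    # Column F: first site to analyse; column G: that value plus 1000.
    return start_site, start_site + step


for start in (8000, 10224):
    col_f, col_g = region_columns(start)
    print(f"F2:F201 = {col_f}, G2:G201 = {col_g}")
```

For the two examples given above, this yields 8000 and 9000, and 10224 and 11224.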


How to analyse more than 2000 nucleotides at the same time

To analyse more than 2000 nucleotides at the same time:

  • Use the variability values for the 2000 nucleotide sites, as stored in the sheet called « consensus ». By copying these values, 2000 nucleotides at a time, from several files into a new Microsoft Excel® file, you can create graphs for the required length.


Applications for SVARAP

An example of use of SVARAP

  • SVARAP rapidly produces graphical representations that are easy to interpret.

  • As a first step, it allows genetic diversity in a set of sequences to be analysed by windows of 50 nucleotides.

  • More precise information is then available from the site-by-site analysis.


Contact

  • For information or questions, you can contact me at: Philippe.Colson@ap-hm.fr


Download

  • Download the instructions for use of SVARAP

  • Download SVARAP to analyse nucleotide positions 1 to 2000 (Microsoft Excel® v97)

  • Link to Clustal X v1.8

  • Download SVARAP to analyse nucleotide positions 2001 to 4000 (Microsoft Excel® v97)

  • Download ASVARAP to analyse amino acid positions 1 to 1000 (Microsoft Excel® v97)

References

  • URL: http://ifr48.free.fr/recherche/jeu_cadre/jeu_rickettsie.html


1/ Delete the paragraph marks

In Microsoft Word® v97 (French edition):

  • <Edition><Remplacer><Plus><Spécial><Marque de paragraphe><Remplacer tout> (Edit > Replace > More > Special > Paragraph Mark > Replace All)


2/ Replace dashes

In Microsoft Word® v97 (French edition):

  • <Edition><Remplacer> (Edit > Replace)

  • In « Rechercher » (Find what): -

  • In « Remplacer par » (Replace with): ― (copy and paste this character)

  • <Remplacer tout> (Replace All)


3/ Add paragraph marks before and after the name of sequences

In Microsoft Word® v97 (French edition):

  • <Edition><Remplacer> (Edit > Replace)

  • In « Rechercher » (Find what): #

  • In « Remplacer par » (Replace with): « Marque de paragraphe » (paragraph mark) followed by #


Application ASVARAP

  • Variability can also be studied for amino acid sequences (amino acids 1 to 1000). The principle and use are the same as for SVARAP.

