Surabhi Agarwal. Genome Databases and Analysis.

With the advent of genome sequencing technology, biological researchers now have easy and fast access to the complete DNA sequences of many organisms. When stored in databases, this sequence information can be used for comparative genomics research.

Master Layout: Part 1

This animation consists of 3 parts:

Part 1: Genome sequencing protocol

Part 2: Genome databases

Part 3: Genome alignment and its analysis

Extract the DNA of the organism whose genome is to be sequenced

Fragment the genomic DNA and integrate it with Bacterial Artificial Chromosome vectors

Genomic DNA

Sequence the BAC fragments using DNA sequencing techniques

DNA fragments

GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGAGAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTCATCAAAAACAGAAATGGACAGCAGGACAAAGAGCAAGGATTACTGCAAAGTAATATTTCCATATGAGGCACAGAATGATGATGAATTGACAATCAAAGAAGGAGATATAGTCACTCTCATCAATAAGGACTGCATCGACGTAGGCTGGTGGGAAGGAGAGCTGAACGGCAGACGAGGCGTGTTCCCCGATAACTTCGTGAAGTTACTTCCACCGGACTTTGAAAAGGAAGGGAATAGACCCAAGAAGCCACCGCC

DNA sequences determined and stored in databases for future use

Generate a detailed physical map of the genome with clones derived from each chromosome organized in a series of contigs

Completed DNA sequence

Definitions of the components: Part 1 – Genome sequencing protocol

1. Genome: The complete hereditary information of an organism is referred to as the genome.

2. Restriction Enzyme: An enzyme that cleaves double-stranded or single-stranded DNA into smaller fragments at specific recognizable DNA sequences called restriction sites.

3. Bacterial Artificial Chromosomes (BACs): DNA constructs used as cloning vectors. They can carry DNA inserts of around 150-350 kbp and have been extremely useful in genome sequencing projects.

4. Sticky Ends: A cohesive or sticky end is a short single-stranded 3’ or 5’ overhang left on a DNA molecule after it has been cleaved by a restriction enzyme. When the insert and the cloning vector are cut with the same enzyme, their overhangs are complementary, so they can easily anneal with each other.

5. DNA ligase: An enzyme that repairs or joins single-stranded breaks (nicks) in double-stranded DNA; it is used to seal a DNA insert into a cloning vector.

6. Recombinant BAC: A BAC vector that carries recombinant DNA, i.e., vector DNA into which the foreign DNA to be cloned has been integrated.
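
The restriction digest and sticky ends defined above can be illustrated with a short Python sketch. It assumes EcoRI as the example enzyme (recognition site GAATTC, cut between G and AATTC on each strand, leaving 5’ AATT overhangs); the enzyme, the example sequence and the function names are illustrative choices, not part of the original protocol.

```python
# Sketch: cut a DNA sequence at every EcoRI site (assumed example enzyme).
# EcoRI recognises GAATTC and cuts after the G on each strand (G^AATTC),
# which leaves complementary 5' AATT overhangs (sticky ends).
SITE = "GAATTC"
CUT_OFFSET = 1  # cut position inside the site on the top strand


def digest(seq: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[str]:
    """Return the top-strand fragments produced by cutting at every site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def sticky_end(site: str = SITE, offset: int = CUT_OFFSET) -> str:
    """Overhang left by a palindromic site cut symmetrically on both strands."""
    return site[offset:len(site) - offset]


print(digest("TTGAATTCAAGCTGAATTCGG"))  # ['TTG', 'AATTCAAGCTG', 'AATTCGG']
print(sticky_end())                     # AATT
```

Because the BAC vector is cut with the same enzyme, its ends carry the same AATT overhang, which is what lets insert and vector anneal before DNA ligase seals them.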

7. Contigs: A set of overlapping DNA fragments derived from a single genetic source that together represent a continuous stretch of the genome. Contigs are mapped to deduce the complete chromosome sequence.

8. Shotgun Sequencing: A DNA sequencing method in which a long DNA fragment is first broken down into smaller fragments. Each small fragment is then sequenced using an established protocol such as Sanger’s chain termination method. The fragmentation and sequencing are repeated several times, for example with enzymes of different specificities, to obtain multiple overlapping reads. Computer programs then assemble the reads by matching their overlapping ends.

9. Pyrosequencing: A sequencing-by-synthesis strategy based on real-time detection of the pyrophosphate released when a nucleotide complementary to the template is added to the growing DNA strand.

10. Physical Map: A map that gives the distances between features along a DNA molecule in base pairs is known as a physical map.
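
The pyrosequencing principle defined above can be sketched as an idealized simulation. This sketch assumes that nucleotides are dispensed one at a time in the cyclic order T, A, C, G and that each dispensation gives a signal proportional to the number of identical bases incorporated; real instruments must also cope with noise, and the function names are hypothetical.

```python
from itertools import cycle

# Idealised pyrosequencing: nucleotides are dispensed one at a time in a fixed
# cyclic order (assumed T, A, C, G). Each dispensation releases pyrophosphate,
# and hence light, in proportion to how many identical bases are incorporated.


def flowgram(strand: str, order: str = "TACG") -> list[tuple[str, int]]:
    """Return (dispensed base, signal) pairs for the newly synthesised strand."""
    if set(strand) - set(order):
        raise ValueError("strand may contain only the dispensed bases")
    signals, i = [], 0
    for base in cycle(order):
        if i >= len(strand):
            break
        run = 0
        while i < len(strand) and strand[i] == base:  # homopolymer run
            run += 1
            i += 1
        signals.append((base, run))
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Read the sequence back: each base repeated as often as its signal."""
    return "".join(base * n for base, n in signals)


result = flowgram("TTAGGC")
print(result)  # [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
print(decode(result))  # TTAGGC
```

A signal of 2 for T means two T residues were added in a single dispensation, which is how homopolymer runs show up in this kind of readout.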

Step 1: Genome Sequencing

[Figure: genomic DNA + restriction enzyme → DNA digested into fragments → DNA fragments with appropriate sticky ends; BAC vector treated with the same restriction endonuclease; fragment + vector + DNA ligase → recombinant BAC; electroporation into E. coli cells → transformed E. coli cells]

Action: Sequential steps of an experimental process

Description of the action: Follow the steps in the animation. The animator needs to redraw all figures in the final animation. Show the pink curve “Restriction Enzyme” attaching to the “Genomic DNA”, and show the “Genomic DNA” breaking into fragments. Choose one fragment and attach it to the ring-shaped “BAC Vector”. The resulting figure is then taken up by the cell. Follow this with the last figure.

Audio Narration: The genomic DNA is cleaved into smaller fragments by restriction enzymes, which cut the DNA at specific sequences known as restriction sites. The BAC vector is cleaved at its restriction site using the same restriction endonuclease. A DNA fragment with suitable sticky ends then anneals with the BAC vector, and DNA ligase joins the two. This recombinant DNA is then introduced into bacterial cells such as E. coli.

Biochemistry by A. L. Lehninger et al., 3rd edition

Step 2: Genome Sequencing

[Figure: contigs are identified and mapped → BAC to be sequenced is fragmented → fragments are sequenced at random → sequence overlaps reveal final sequence]

Action: Sequential steps of an experimental process

Description of the action: Show figure 1, which has a cluster of aligned fragments. Select one fragment and break it further into smaller units. Show a unit being sequenced by one of the 3 technologies. This is followed by the next figure, which shows the overlap in sequences, and then by the fully sequenced DNA.

Audio Narration: The genomic DNA fragments of the library are organized into a physical map and aligned as contigs, after which a particular contig is identified for further sequencing. The BAC selected for sequencing is fragmented, and the fragments are sequenced using methods such as Sanger’s method and pyrosequencing. The sequence of the clone is then deduced by aligning these reads on their overlapping regions. The entire genomic sequence is obtained once each BAC has been sequenced in this manner. For a detailed study of the various sequencing methods, refer to the OSCAR animation titled “Genomics”.

Biochemistry by A. L. Lehninger et al., 3rd edition
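
Deducing a clone sequence by aligning reads on their overlapping regions can be sketched with a greedy overlap merge. This is an illustrative sketch only, assuming error-free reads from one strand and a hypothetical minimum overlap of 3 bases; real assemblers also handle sequencing errors, repeats and both DNA strands. The example reads are cut from the start of the sequence shown in the master layout.

```python
# Greedy overlap assembly sketch (assumes error-free reads from one strand).
MIN_OVERLAP = 3  # hypothetical minimum overlap length


def overlap(a: str, b: str, min_len: int = MIN_OVERLAP) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = MIN_OVERLAP) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # drop reads that lie entirely inside another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left: each string is a contig
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["GTTCTGGGAC", "GGACCTTTTC", "TTTTCAAACTG"]
print(greedy_assemble(reads))  # ['GTTCTGGGACCTTTTCAAACTG']
```

The greedy choice (always merging the largest overlap first) is simple to follow but can mis-assemble repetitive DNA, which is one reason production assemblers use graph-based methods instead.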

Master Layout: Part 2

This animation consists of 3 parts:

Part 1: Genome sequencing protocol

Part 2: Genome Databases

Part 3: Genome alignment and its analysis

GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGAGAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTCATCAAAAACAGAAATGGACAGCAGGACAAAGAGCAAGGATTACTGCAAAGTAATATTTCCATATGAGGCACAGAATGATGATGAATTGACAATCAAAGAAGGAGATATAGTCACTCTCATCAATAAGGACTGCATCGACGTAGGCTGGTGGGAAGGAGAGCTGAACGGCAGACGAGGCGTGTTCCCCGATAACTTCGTGAAGTTACTTCCACCGGACTTTGAAAAGGAAGGGAATAGACCCAAGAAGCCACCGCC

Establish, Maintain and Disseminate the Genomic data of various organisms

Organization, Search and Retrieval of Genomic Data

Definitions of the components: Part 2 – Genome databases

  • Nucleotide Database: A collection of records of the nucleotide sequences that are related to the DNA of an organism. This includes gene sequences, genome sequences, Expressed Sequence Tags (ESTs), etc.
  • Accession Number: A unique identifier given to each sequence entry in a biological database, which provides direct access to the sequence of interest. The accession number itself stays the same when a sequence is updated; instead, a version number attached to it (for example, the “.2” in an accession.version identifier) is incremented. The format of identifiers also varies between databases.
  • Entrez: An integrated search portal that enables the user to search many distinct biological databases simultaneously.
  • International Nucleotide Sequence Database Collaboration (INSDC): An international collaboration established for exchanging and sharing nucleotide sequence data. This includes the collection and dissemination of all DNA and RNA sequences submitted to the members of INSDC.
  • Word length (word size): The length of the initial stretch of nucleotides (the “word”) that must match exactly before the alignment of the two sequences can be extended. Decreasing the word size makes the search more sensitive but slower; increasing it makes the search faster but less sensitive.
  • Threshold (expect threshold): The number of matches expected to occur by chance; hits whose E-value exceeds this threshold are not reported. The statistical significance of the results can be judged based on this parameter. The default value in many cases is 10, which implies that in a random model about 10 such matches would be expected merely by chance.
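
The role of the word size can be illustrated with a sketch of the seeding step: index every word of the subject sequence, then look up each word of the query. This is a simplified illustration assuming exact word matches only; BLAST then extends each seed into a longer alignment, which is not shown here. The function name is hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where word_size bases match exactly."""
    index: dict[str, list[int]] = defaultdict(list)  # word -> subject positions
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


query, subject = "ACGTTGCA", "TTACGTTGAA"
print(word_hits(query, subject, 4))  # [(0, 2), (1, 3), (2, 4)]
print(word_hits(query, subject, 6))  # [(0, 2)]
```

With these example sequences, a word size of 4 yields three seeds while a word size of 6 yields one, which is why smaller words give a more sensitive but slower search.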

Gap Penalty and Gap Extension: When two or more nucleotide sequences are aligned, a gap is introduced where a base in one sequence has no counterpart in the other, i.e., at an insertion or deletion. In this context, the “gap penalty” (gap existence cost) is the deduction from the overall alignment score for opening a gap, while the “gap extension” cost is deducted as an existing gap is lengthened.

Alignment Score: Also reported as the bit score, this provides a comparative quantification of the quality of an alignment. The score increases as the number of matching residues rises and the number of mismatches falls, and the alignment with the higher bit score is the better match.

Match-Mismatch Scores: When nucleotide sequences are aligned, the scoring system adds a “reward” score for each matching pair of bases and subtracts a “penalty” score for each mismatch. In BLAST, these scores are given as a pair of values, such as 1, -2.

Percentage Identity: The percentage of aligned positions at which the two compared sequences have identical bases.

E-value: The number of alignments with an equal or better score that would be expected to occur by chance, rather than as a biologically significant match. For a search against a database, this value depends on the size of the database against which the sequence is compared. The closer the E-value is to zero, the more significant the match.
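
These scoring terms can be combined in a short sketch that scores an alignment that is already given. It assumes that '-' marks a gap and that a gap of length k costs existence + k × extension (the convention NCBI BLAST uses for its gap costs); the defaults mirror parameter values used in the animation (1, -2; existence 5, extension 2). Percentage identity and gaps are computed over all alignment columns, and the function name is hypothetical.

```python
def score_alignment(a: str, b: str, reward: int = 1, penalty: int = -2,
                    existence: int = 5, extension: int = 2) -> dict[str, float]:
    """Score two aligned rows ('-' = gap) and report identity and gap percentages.

    Assumed gap cost: a gap of length k costs existence + k * extension.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned rows must be non-empty and of equal length")
    score = identities = gap_columns = 0
    open_gap = None  # which row holds the gap currently open, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gap_columns += 1
            row = "a" if x == "-" else "b"
            if row != open_gap:
                score -= existence + extension  # opening a new gap
            else:
                score -= extension  # extending the current gap
            open_gap = row
        else:
            open_gap = None
            if x == y:
                identities += 1
                score += reward
            else:
                score += penalty
    n = len(a)
    return {"score": score,
            "identity_pct": 100 * identities / n,
            "gaps_pct": 100 * gap_columns / n}


print(score_alignment("ACGTACGTAC--GTACGTAC", "ACGTACGTACTTGTACGTAC"))
# {'score': 9, 'identity_pct': 90.0, 'gaps_pct': 10.0}
```

Here 18 matching columns earn 18 points and the single two-base gap costs 5 + 2 × 2 = 9, giving a raw score of 9; BLAST would then convert such a raw score into a bit score.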

Step 1.a: Submit a Sequence

[Screen mock-up: NUCLEOTIDE DATABASE page with the tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS, a “Submit your sequence here” box, a SUBMIT button and a “VERIFYING…” status sign]

Example entry: Albumin_S
TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAATGA

Action: Entries in Web-server

Description of the action: The animator needs to redraw all the images. Show the layout of the database first. Then show a click on “SUBMIT A SEQUENCE”. Show the sequence being entered in the white box, followed by a click on “SUBMIT”. Show the “VERIFYING…” sign in waiting mode. While verification is in progress, show the diagram on the next slide.

Audio Narration: To submit a sequence to a nucleotide database, it must be entered at the site of any one of the members of the International Nucleotide Sequence Database Collaboration: NCBI, EMBL-EBI or DDBJ. Upon submission, these entries are verified for their source and publication details.

Step 1.b: Submit a Sequence

[Figure: while “VERIFYING…” is displayed, information is exchanged between the databases of NCBI (National Center for Biotechnology Information), EBI (European Bioinformatics Institute) and DDBJ (DNA Data Bank of Japan) for the verification process]

Action: Entries in Web-server

Description of the action: Recreate all the images and screenshots. This image flashes on screen after the previous slide while the “VERIFYING…” button is in wait mode.

Audio Narration: The newly entered sequences are exchanged between the three member servers on a daily basis and verified by them. This helps in keeping track of updates to sequence information and in sharing data that is useful for research.

Step 1.c: Submit a Sequence

[Screen mock-up: NUCLEOTIDE DATABASE page (tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE, ANALYSIS TOOLS) showing the Nucleotide Sequence Database. The submitted “Albumin” sequence is marked VERIFIED and listed with the existing entries, each shown with a partial sequence]

Submitted entry: Albumin
TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAATGA

Insulin A, ID SG1: ACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAG…
Insulin B, ID SG2: CTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCC…
Albumin, ID SG3: TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT…

Action: Entries in Web-server

Description of the action: Recreate all the images. The right-most part of this screen is appended to the screen on the 10th slide after the flash image of the 11th slide disappears.

Audio Narration: The verified sequence is then given an accession number or gene ID, which acts as the primary key for identifying this entry in the database in future.
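
The idea of an accession number acting as a primary key can be sketched with a small in-memory store. This is a hypothetical illustration: the class names, the basic validity check and the "SG" numbering (borrowed from the example IDs on the slide) are assumptions, and real INSDC accessions are assigned differently.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    sequence: str


@dataclass
class NucleotideDB:
    """Hypothetical store whose primary key is the accession ID."""
    prefix: str = "SG"
    entries: dict[str, Entry] = field(default_factory=dict)

    def submit(self, name: str, sequence: str) -> str:
        """Check the sequence, store it and return its new accession ID."""
        sequence = sequence.upper()
        if not sequence or set(sequence) - set("ACGTN"):
            raise ValueError("not a valid nucleotide sequence")
        accession = f"{self.prefix}{len(self.entries) + 1}"
        self.entries[accession] = Entry(name, sequence)
        return accession

    def get(self, accession: str) -> Entry:
        """Retrieve an entry by its accession ID (KeyError if absent)."""
        return self.entries[accession]


db = NucleotideDB()
print(db.submit("Insulin A", "ACGTAAGAATTCTAG"))   # SG1
print(db.submit("Insulin B", "CTTTGGCACAATGAAG"))  # SG2
print(db.submit("Albumin", "TATCTTTTCTATCAACC"))   # SG3
print(db.get("SG3").name)                          # Albumin
```

Keying the dictionary by accession makes each lookup a direct access rather than a scan, which is the point of giving every entry a unique identifier.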

Step 2: Search Database

[Screen mock-up: NUCLEOTIDE DATABASE page (tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE, ANALYSIS TOOLS) showing the Nucleotide Sequence Database, a “Select the Database” list (GENE, GENOME, EST, SNP, NUCLEOTIDE, GEO DATASETS) with NUCLEOTIDE selected, a “Submit your query term” box containing “Serum Albumin”, and a SUBMIT button]

Result summary for “Albumin”:
LOCUS 9291
LENGTH 24158
ORGANISM Homo sapiens
GENE NAME ALB
LOCATION Chromosome 4
JOURNAL Journal of Science
SEQUENCE TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT…

Yellow boxes (narration text, not displayed on screen):
GENE: Searches for the term in the set of genes stored in the database.
GENOME: Searches for the term in whole-genome profiles. These genomes are divided into 6 organism groups.
EST: Contains sequences of “Expressed Sequence Tags”, or “single-pass cDNA sequences”.
SNP: Searches the database of single nucleotide polymorphisms.
NUCLEOTIDE: A collection of all nucleotide sequences from a variety of sources.
GEO DATASETS: The Gene Expression Omnibus repository stores curated gene expression DataSets as well as original Series and Platform records.

Action: Retrieval from Web-server

Description of the action: Recreate all the images and screenshots. The yellow boxes are the audio narration for each section; do not display them, as they are not part of the database animation. The content of the yellow boxes must be narrated as indicated in the audio narration. Follow the steps as shown in the animation.

Audio Narration: To search the database for a given gene, genome or nucleotide sequence, the user enters the query term in the search box. The query term can be the gene name or the gene identifier. The user must also select the database from which the sequence is to be retrieved. These databases include:

Gene <Narrate content in the yellow box>
Genome <Narrate content in the yellow box>
EST <Narrate content in the yellow box>
SNP <Narrate content in the yellow box>
NUCLEOTIDE <Narrate content in the yellow box>
GEO DATASETS <Narrate content in the yellow box>

Once the user clicks on SUBMIT, the nucleotide sequence is shown along with a summary of the result.
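
A term search restricted to the selected database can be sketched as a filter over stored records. This is a hypothetical sketch: the record fields reuse the values from the example result summary, and the matching rule (exact gene name or locus, or a substring of the description, ignoring case) is an assumption, not how NCBI Entrez actually matches or ranks results.

```python
# Hypothetical records; field values reuse the example result on the slide.
RECORDS = (
    {"database": "NUCLEOTIDE", "locus": "9291", "gene": "ALB",
     "description": "Serum Albumin", "organism": "Homo sapiens",
     "location": "Chromosome 4", "length": 24158},
)


def search(term: str, database: str, records=RECORDS) -> list[dict]:
    """Return records in the chosen database that match the query term."""
    t = term.strip().lower()
    return [r for r in records
            if r["database"] == database
            and (t in (r["gene"].lower(), r["locus"])
                 or t in r["description"].lower())]


for hit in search("albumin", "NUCLEOTIDE"):
    print(hit["locus"], hit["gene"], hit["organism"], hit["length"])
# 9291 ALB Homo sapiens 24158
```

Filtering on the database field first mirrors the "Select the Database" step: the same query term returns nothing if a database that holds no matching record is chosen.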

Step 3.a: Analysis Tools - Nucleotide Sequence Identification

[Screen mock-up: ANALYSIS TOOLS tab of the NUCLEOTIDE DATABASE page (tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE, ANALYSIS TOOLS)]

Enter sequence 1: TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA
SELECT A DATABASE: NUCLEOTIDE (options: NUCLEOTIDE, GENE, GEO, EST, SNP)
Word Size: 28
Threshold: 10
Gap penalty: Existence 5, Extension 2
Match-Mismatch Score: 1, -2
ALIGNMENT ALGORITHM (BLAST)

Action: Analysis from database servers

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Show the click on “Analysis Tools”. Follow it with the input of the sequence and the selection of “Nucleotide” from the “SELECT A DATABASE” drop-down list. Then show the input of the remaining parameters, followed by a click on the BLAST tool.

Audio Narration: An unknown nucleotide sequence can be identified by searching it against a suitable nucleotide database. Enter the sequence, then select the database against which the search is to be performed. Fill in the parameter values, then click on the BLAST tool.

Step 3.b: Analysis Tools - Nucleotide Sequence Identification

[Screen mock-up: BLAST result for the query, with each output shown separately]

IDENTIFICATION OF GENE: Homo sapiens afamin (AFM), mRNA. Identifies the name of the gene and the type of nucleotide.
SEQUENCE ALIGNMENT: Query Start Position, Query End Position, Subject Start Position, Subject End Position. Shows the alignment of the query nucleotide with the sequence of the identified nucleotide.
ALIGNMENT SCORE: 437 bits. The bit score for the alignment, a normalized measure for comparing scores with other hits.
E-Value: 2e-118. Measures how likely the alignment of the two sequences is to be a chance event; the nearer this value is to 0, the greater the biological significance of the match.
Percentage Identity: 100%. Shows the proportion of bases that match between the query sequence and the hit.
Gaps: 0%. Percentage of residues substituted by a gap.

Action: Analysis from database servers

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately.

Audio Narration: Sequence identification through BLAST provides various results after alignment, such as gene identification, alignment views, alignment score, E-value, percentage identity and gaps.
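
The bit score and E-value in the BLAST output are related by standard BLAST statistics: S′ = (λS − ln K)/ln 2 and E = m × n × 2^(−S′), where S is the raw score and m and n are the query and database lengths. The sketch below assumes λ and K are supplied for the scoring system (they are not given in the animation) and ignores the effective-length corrections BLAST applies, so it only illustrates the relationship; the 80-base query and 10^9-base database are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical search space: an 80-base query against 10**9 database bases.
for bits in (30, 40, 50):
    print(bits, f"{e_value(bits, 80, 10**9):.3g}")
```

Each additional 10 bits lowers the E-value about 1000-fold (2^10 = 1024), which is why a hit of several hundred bits can have an E-value as small as 2e-118.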

Step 3.c: Analysis Tools - Nucleotide Sequence Alignment

[Screen mock-up: ANALYSIS TOOLS tab of the NUCLEOTIDE DATABASE page (tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE, ANALYSIS TOOLS)]

Enter sequence 1: TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA
Enter sequence 2: AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTTC
Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Match-Mismatch Score: 1, -2
ALIGNMENT ALGORITHM (BLAST)

Action: Analysis from database servers

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Show a click on the 3rd tab, “ANALYSIS TOOLS”. Input the 2 sequences one by one, followed by the parameters one at a time. Show a click on the BLAST tool.

Audio Narration: Alignment can also be performed between two given nucleotide sequences. To align two sequences, enter them in the input boxes and enter the necessary parameters, whose values will vary according to the query. Then click on the alignment tool.
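
BLAST finds local alignments heuristically. To show what a local alignment of two sequences looks like, the sketch below swaps in a different, exact technique, the Smith-Waterman algorithm, using match 1 and mismatch -2 as in the animation. For simplicity it assumes a linear gap cost of -2 per gap position instead of separate existence and extension costs, and the example sequences are made up.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -2,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap cost.

    Returns (score, aligned a, aligned b), with '-' marking gaps.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,                                    # start a new alignment
                          h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          h[i - 1][j] + gap,                    # gap in b
                          h[i][j - 1] + gap)                    # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAAACCCCC", "AAAAAGCCCCC"))
# (8, 'AAAAA-CCCCC', 'AAAAAGCCCCC')
```

In the example, opening one gap (cost 2) lets ten bases match, which beats the ungapped alternative with a mismatch; the zero floor in the recurrence is what makes the alignment local rather than global.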

Step 3.d: Analysis Tools - Nucleotide Sequence Alignment

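[Screen mock-up: pairwise alignment result, with each output shown separately along with its definition box]

Alignment: the figure for the alignment of the 2 sequences, with a gap marked.
E-value: 1e-64. Measure of how likely the alignment is to have occurred by chance.
Gaps: 5%. Percentage of residues substituted by gaps.
Score: 241 bits. Comparative measure of the quality of the alignment.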

Action: Analysis from database servers

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately along with its definition box.

Audio Narration: Pairwise alignment gives various kinds of results, such as alignment views, alignment score, dot plot, E-value and percentage identity, amongst many others.
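
A dot plot, one of the pairwise outputs listed in the narration, compares every word of one sequence with every word of the other. This is a minimal text-based sketch, assuming exact matches of a chosen window length; rows follow the first sequence, columns follow the second, and diagonal runs of '*' mark similar regions. The function name is hypothetical.

```python
def dot_plot(a: str, b: str, window: int = 1) -> list[str]:
    """Rows follow a, columns follow b; '*' where a window-long word matches."""
    rows = []
    for i in range(len(a) - window + 1):
        row = "".join("*" if a[i:i + window] == b[j:j + window] else "."
                      for j in range(len(b) - window + 1))
        rows.append(row)
    return rows


for line in dot_plot("ACGTAC", "ACGTAC", window=2):
    print(line)
# *...*
# .*...
# ..*..
# ...*.
# *...*
```

The main diagonal shows the two sequences are identical, and the off-diagonal stars come from the repeated word "AC"; a larger window suppresses such short chance matches.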

Master Layout: Part 3

This animation consists of 3 parts:

Part 1: Genome sequencing protocol

Part 2: Genome Databases

Part 3: Genome alignment and its analysis

Tools available for comparing two genomes

Applications of comparing genomes

[Figure labels: Seq 1, Seq 2, Seq 3]

Definitions of the components: Part 3 – Genome alignment and its analysis

Orthologs: Genes in two different species that evolved from a single ancestral gene through speciation, and that usually retain the same function, are known as orthologs.

Paralogs: Genes within a single organism where one arose by duplication of the other; after duplication, the copies may accumulate mutations and come to perform separate functions.

Homologs: Genes that share a common evolutionary origin are called homologs; orthologs and paralogs are both kinds of homologs.

Gene Order: Gene order refers to the sequential arrangement of genes within an organism’s genome.

Gene Cluster: Genes involved in a common function of the organism often lie close together in the genome; such groups are referred to as gene clusters. Examples include genes for the enzymes of certain metabolic pathways.

GC-content: The GC content is the proportion, usually expressed as a percentage, of guanine (G) and cytosine (C) bases in the genome of an organism; it provides a simple measure for comparing two given genomes.
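
GC content can be computed directly from a sequence. This minimal sketch assumes that ambiguous bases such as N are excluded from the denominator; the example sequence is the opening of the sequence shown in Part 1, and the function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A, C, G or T bases")
    return 100 * (seq.count("G") + seq.count("C")) / acgt


print(round(gc_content("GTTCTGGGACCTTTTCAAACTG"), 1))  # 45.5
```

Applied to two complete genomes, the same function gives one number per genome that can be compared directly.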

7. Basic Local Alignment Search Tool (BLAST): An algorithm used to compare two given sequences, or one sequence against a database, by finding regions of local similarity. The BLAST program used for nucleotide comparison is “Nucleotide BLAST” (blastn).

8. FAST-All (FASTA): Another algorithm developed to compare two given sequences based on gapped local alignments.

9. Phylogeny: The evolutionary relatedness of different groups of organisms, which can be analyzed from molecular sequence data such as nucleotide and protein sequences.

Step 1: Comparative genomics - Tools

[Schematic: one tab for each comparative genomics tool (PipMaker, MUMmer, GeneOrder, GenScan, AVID-VISTA and Twinscan), each carrying the short description that is narrated below]

Action: Schematic for Comparative Genomics Tools

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Replace the “Comparative Genomics Tools” tab of the previous slides with these tabs one by one, and narrate the explanation given on this slide.

Audio Narration: Here, we present a brief summary of comparative genomics tools.

PipMaker – A server for aligning two genomes and producing Percent Identity Plots. It uses BLASTZ, a version of BLAST with parameters modified for aligning entire genomes.

GenScan – Developed at MIT, GenScan is an online program to identify complete gene structures in genomic DNA.

Twinscan – A web-based tool from Washington University for gene-structure prediction.

AVID-VISTA – A collection of programs and databases for comparative analysis of genomic sequences. VISTA also provides pre-computed whole-genome alignments of different species.

GeneOrder – A web-based interactive computational tool to compare the order of genes in two genomes.

MUMmer – A tool for rapidly aligning whole genomes.

http://genes.mit.edu/GENSCAN.html
http://mblab.wustl.edu/software/twinscan/
http://mummer.sourceforge.net/
http://genome.lbl.gov/vista/index.shtml
http://pipmaker.bx.psu.edu/pipmaker/
http://binf.gmu.edu:8080/GeneOrder3.0/
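
A very simple way to compare gene order, in the spirit of what GeneOrder does, is to count breakpoints: neighbouring genes in one genome that are no longer neighbours in the other. This is a hypothetical sketch, not GeneOrder's actual method; it ignores gene orientation, considers only genes present in both genomes, and uses placeholder gene names.

```python
def breakpoints(order1: list[str], order2: list[str]) -> list[tuple[str, str]]:
    """Neighbouring gene pairs in order1 that are not neighbours in order2."""
    shared = set(order1) & set(order2)
    g1 = [g for g in order1 if g in shared]
    g2 = [g for g in order2 if g in shared]
    # an adjacency is conserved in either orientation, so store unordered pairs
    conserved = {frozenset(pair) for pair in zip(g2, g2[1:])}
    return [(x, y) for x, y in zip(g1, g1[1:])
            if frozenset((x, y)) not in conserved]


genome1 = ["geneA", "geneB", "geneC", "geneD", "geneE"]  # placeholder names
genome2 = ["geneA", "geneB", "geneD", "geneE", "geneC"]
print(breakpoints(genome1, genome2))  # [('geneB', 'geneC'), ('geneC', 'geneD')]
```

Moving geneC to the end of genome2 breaks two of its neighbour relationships, so fewer breakpoints indicate more conserved gene order between the two genomes.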

Step 2: Comparative genomics

[Screen mock-up: ANALYSIS TOOLS tab of the NUCLEOTIDE DATABASE page (tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE, ANALYSIS TOOLS) with the input boxes “Enter genome 1” and “Enter genome 2” and a “COMPARATIVE GENOMICS TOOLS” button]

Example input sequences:
AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTTC
TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA

Action: Analysis from database servers

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Show the input of two sequences, followed by a click on “Comparative Genomics Tools”.

Audio Narration: For comparative genome analysis, obtain the full genome sequences of interest. The servers of the comparative genomics tools have text boxes into which these sequences can be uploaded. The user then clicks on the tool’s submit button.

Step 3: Comparative genomics: Analysis

[Schematic: tool outputs, each shown with its definition box]

PipMaker dot plot: A “Percent Identity Plot” is the visualization of the alignments retrieved from PipMaker for similar regions in two DNA sequences.
WHOLE GENOME ALIGNMENT: The entire genomes of two organisms can be aligned using tools such as MUMmer and AVID-VISTA.
PREDICTED EXONS AND INTRONS: Comparative genomics tools can also predict exons and introns on the aligned genomes.

Action: Schematic for Tool Output

Description of the action: Recreate all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately along with its definition box.

Audio Narration: The output of comparative genomics tools varies with the tool used. It may be a dot plot from PipMaker, a whole-genome alignment, or exons and introns predicted in the alignment. For detailed analysis of these results, users should visit the respective sites listed in the references.

http://genes.mit.edu/GENSCAN.html, http://mblab.wustl.edu/software/twinscan/, http://mummer.sourceforge.net/, http://genome.lbl.gov/vista/index.shtml, http://pipmaker.bx.psu.edu/pipmaker/, http://binf.gmu.edu:8080/GeneOrder3.0/
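
A percent identity plot summarizes how similar two aligned sequences are along the first sequence. PipMaker computes its plot from its own alignments; the sketch below is a simplified stand-in that assumes an alignment is already given (with '-' for gaps) and reports percent identity in fixed windows of alignment columns. The window size, the text bars and the function name are hypothetical.

```python
def percent_identity_profile(aln1: str, aln2: str,
                             window: int = 10) -> list[tuple[int, float]]:
    """Percent identity in consecutive windows of alignment columns.

    Each window is reported at its 1-based start position in sequence 1,
    counting only the non-gap characters of aln1.
    """
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have the same length")
    profile, pos = [], 0
    for start in range(0, len(aln1), window):
        cols = list(zip(aln1[start:start + window], aln2[start:start + window]))
        same = sum(1 for x, y in cols if x == y and x != "-")
        profile.append((pos + 1, 100 * same / len(cols)))
        pos += sum(1 for x, _ in cols if x != "-")
    return profile


for start, pid in percent_identity_profile("ACGTACGTAC--GTACGTAC",
                                           "ACGTACGTACTTGTACGTAC"):
    print(f"{start:>4} {'#' * int(pid // 10):<10} {pid:.0f}%")
```

Plotting identity against position in the first sequence is what lets conserved regions, such as exons, stand out as high plateaus when two genomes are compared.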

Interactivity option 1: Align the genomes of Potato Spindle Tuber Viroid and Hop Latent Viroid sequences

Tabs (shown in mixed order; the correct step number, shown in red on each tab, is given in brackets):
Click on the GenBank ID for the two organisms under study [6]
Input the 2 genomes in any genome alignment server of your choice [8]
In the options to select the databases, opt for Genome databases [2/3]
Open the NCBI Homepage on a web browser [1]
Obtain the alignment of the two genomes [9]
In the summary section, check for the source organism of the sequence [5]
Click on the search button. Obtain a list of completely sequenced Viroids [4]
Enter the term “Viroids” in the search box [3/2]
Click on the “FASTA” tab for the respective entries. Obtain the complete genome sequence in FASTA format [7]

Interactivity Type: Arrange the steps in the order to be performed.

Options: Remove the step number shown in red from the bottom of each tab. Show all the steps in mixed order. The user must click on the tabs in the correct order. If the user clicks on a tab that is not in the right order, flash the message “Try again”.

Boundary/limits: Steps 2 and 3 may be performed in either order.

Results: All the tabs must be arranged in the right order.
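
Step 7 of the exercise obtains each genome in FASTA format: a '>' header line followed by one or more sequence lines. A minimal parser can be sketched as follows; it assumes the record ID is the first word of the header, and the example records are made up rather than real viroid sequences.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its joined sequence."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            words = line[1:].split()
            header, chunks = (words[0] if words else ""), []
        elif header is not None:  # ignore text before the first header
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>genome_1 made-up example
ACGTACGTAC
GGTTAACC
>genome_2 made-up example
TTGCAAGG
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
# genome_1 18
# genome_2 8
```

Joining the wrapped sequence lines into one string per record gives the complete genome sequences that are then pasted into an alignment server in step 8.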

Questionnaire

1. Which amongst these is NOT a nucleotide database?

Answers: a) NCBI b) PDB c) EMBL d) DDBJ

2. PipMaker compares two genomes by finding which of the following?

Answers: a) Gene Order b) Cluster of Orthologous Genes c) Percent Identity Plots d) All of the Above

3. Which is the tool for Whole Genome Alignment?

Answers: a) MUMmer b) PipMaker c) Both d) Neither

4. Exons can be predicted using which tool?

Answers: a) Genscan b) FASTA c) BLAST d) None of the above

5. Which is the last step of a PCR cycle?

Answers: a) Annealing b) Elongation c) Denaturation d) None of the above

Links for further reading

Reference websites:

http://www.ebi.ac.uk/embl/

http://www.ddbj.nig.ac.jp/

http://www.ncbi.nlm.nih.gov

http://blast.ncbi.nlm.nih.gov/

www.icgeb.res.in/whotdr/presentation/comp-genomics.ppt

http://genome.crg.es/software/sgp2/

http://genes.mit.edu/GENSCAN.html

http://mblab.wustl.edu/software/twinscan/

http://mummer.sourceforge.net/

http://genome.lbl.gov/vista/index.shtml

http://pipmaker.bx.psu.edu/pipmaker/

http://binf.gmu.edu:8080/GeneOrder3.0/

The following URLs were used for the animations:

http://genes.mit.edu/GENSCAN.html

http://mblab.wustl.edu/software/twinscan/

http://mummer.sourceforge.net/

http://genome.lbl.gov/vista/index.shtml

http://pipmaker.bx.psu.edu/pipmaker/

http://binf.gmu.edu:8080/GeneOrder3.0/

http://www.ebi.ac.uk/embl/

http://www.ddbj.nig.ac.jp/

http://www.ncbi.nlm.nih.gov

http://blast.ncbi.nlm.nih.gov/

Books:

Bioinformatics: Sequence and Genome Analysis by David W. Mount

Biochemistry by A. L. Lehninger et al., 3rd edition