Genome Databases and Analysis

Surabhi Agarwal

With the advent of genome sequencing technology, biological research now has fast and easy access to the complete DNA sequences of many organisms. This DNA sequence information, when stored in databases, can be used for comparative genomics research.

Master Layout: Part 1

This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis

• Extract the DNA of the organism whose genome is to be sequenced.
• Fragment the genomic DNA and integrate the fragments into Bacterial Artificial Chromosome (BAC) vectors.
• Sequence the BAC fragments using DNA sequencing techniques.
• Generate a detailed physical map of the genome, with clones derived from each chromosome organized in a series of contigs.
• Protein sequences are determined and stored in databases for future use.

[Figure labels: Genomic DNA; DNA fragments; Completed DNA sequence (sample nucleotide sequence)]

Definitions of the components: Part 1 – Genome sequencing protocol

1. Genome: The complete hereditary information of an organism.
2. Restriction Enzyme: An enzyme that cleaves double-stranded or single-stranded DNA into smaller fragments at specific recognition sequences called restriction sites.
3. Bacterial Artificial Chromosomes (BACs): DNA constructs that are used as cloning vectors. They can carry DNA inserts of around 150-350 kbp and have been extremely useful in genome sequencing projects.
4. Sticky Ends: A cohesive or sticky end is a 3’ or 5’ single-stranded overhang left on a DNA molecule after it has been cleaved by a restriction enzyme. When the cloning vector is cut with the same enzyme, its overhangs are complementary to those of the fragment, so the two can easily anneal.
5. DNA ligase: An enzyme that repairs or joins single-stranded breaks or discontinuities in double-stranded DNA.
6. Recombinant BAC: A BAC vector that carries recombinant DNA, i.e., plasmid DNA joined with the foreign DNA to be cloned.

Definitions of the components: Part 1 – Genome sequencing protocol (continued)

7. Contigs: Sets of overlapping DNA fragments that are derived from a single genetic source. These contigs are mapped to deduce the complete chromosome sequence.
8. Shotgun Sequencing: A DNA sequencing method in which a long DNA fragment is first broken down into smaller fragments. Each small fragment is then sequenced using established DNA sequencing protocols such as Sanger’s chain termination method. This fragmentation and sequencing protocol is repeated several times with enzymes of different specificities to obtain multiple overlapping reads. The overlapping ends of different reads are then assembled by computer programs.
9. Pyrosequencing: A DNA sequencing strategy based on real-time detection of the pyrophosphate released when a nucleotide is added to a growing DNA strand that is complementary to its template.
10. Physical Map: A map that gives the distance, in base pairs, from one position on the DNA to another.

Step 1: Genome Sequencing

[Figure labels: GENOMIC DNA; RESTRICTION ENZYME; DNA DIGESTED INTO FRAGMENTS; DNA FRAGMENTS WITH APPROPRIATE STICKY ENDS; BAC VECTOR TREATED WITH SAME RESTRICTION ENDONUCLEASE; DNA LIGASE; RECOMBINANT BAC; ELECTROPORATION; E. coli CELLS; TRANSFORMED E. coli CELLS]

Action: Sequential steps of an experimental process.

Description of the action: Follow the steps in the animation. The animator needs to redraw all figures in the final animation. Show the pink curve “Restriction Enzyme” attaching to the “Genomic DNA”. Show the “Genomic DNA” breaking into fragments. Choose one fragment and attach it to the ring-shaped figure “BAC Vector”. The joined figure then enters the cell. Follow it with the last figure.

Audio narration: The genomic DNA is cleaved into fragments by restriction enzymes that cut the DNA at specific sequences known as restriction sites. The BAC vector is cleaved at its restriction site using the same restriction endonuclease. A DNA fragment having suitable sticky ends is then annealed to the BAC vector and joined to it by DNA ligase. This recombinant DNA is then introduced into bacterial cells such as E. coli by electroporation.

Source: Biochemistry by A.L.Lehninger et al., 3rd edition
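The digestion step can be sketched in a few lines of Python. This is only an illustration: it assumes the enzyme is EcoRI (recognition site GAATTC, cut after the first G), works on the top strand only, and uses a made-up input sequence. The function name `digest` and its parameters are hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand of `sequence` at every occurrence of `site`.

    `cut_offset` is the position inside the site where the strand is cut
    (1 for EcoRI, which cuts G^AATTC and leaves an AATT overhang).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


# Made-up sequence containing two EcoRI sites.
genomic_dna = "TTGAATTCAAGGGAATTCTT"
fragments = digest(genomic_dna, "GAATTC", 1)
print(fragments)
```

Every internal fragment starts with AATTC, which is the top-strand view of the sticky end that lets the fragment anneal to a vector cut with the same enzyme.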

Step 2: Genome Sequencing

[Figure labels: CONTIGS ARE IDENTIFIED AND MAPPED; BAC TO BE SEQUENCED IS FRAGMENTED; FRAGMENTS ARE SEQUENCED AT RANDOM; SEQUENCE OVERLAPS REVEAL FINAL SEQUENCE]

Sample sequence shown in the figure: GTTCTGGGACCTTTTCAAACTGAAGAGAGGAGGCTGGCTGCATCATGGGAGAAGAGACTATTGGGAAGAAGTTACCTGCAACTACAGCAACTCCAGACTCATCAAAAACAG

Action: Sequential steps of an experimental process.

Description of the action: Show figure 1, which has a cluster of aligned fragments. Select one fragment and break it further into smaller units. Show the unit getting sequenced by one of the 3 technologies. This is followed by the next figure, which shows the overlap in sequences, and then by the fully sequenced DNA.

Audio narration: The genomic DNA fragments of the library are organized into a physical map and aligned as contigs, after which a particular contig is identified for further sequencing. The BAC selected for sequencing is fragmented, and the fragments are sequenced by methods such as Sanger’s method and pyrosequencing. The sequence of the clone is then deduced by aligning the reads based on their overlapping regions. The entire genomic sequence is obtained once each BAC has been sequenced in this manner. For a detailed study of the various methods of sequencing, refer to the OSCAR animation titled “Genomics”.

Source: Biochemistry by A.L.Lehninger et al., 3rd edition
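The idea that sequence overlaps reveal the final sequence can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method used by real assembly software: it assumes error-free reads from one strand, and the three reads are hand-picked substrings of the sample sequence above. The function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping reads taken from the start of the sample sequence.
reads = ["GTTCTGGGACCTT", "GACCTTTTCAAACTG", "AAACTGAAGAGAGG"]
print(greedy_assemble(reads))
```

If the reads share no overlap of at least `min_len` bases, the function returns them as separate contigs, which mirrors the gaps left between contigs in a real project.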

Master Layout: Part 2

This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis

• Establish, maintain and disseminate the genomic data of various organisms.
• Organization, search and retrieval of genomic data.

[Figure: sample nucleotide sequence]

Definitions of the components: Part 2 – Genome databases

• Nucleotide Database: A collection of records of the nucleotide sequences that are related to the DNA of an organism. This includes gene sequences, genome sequences, Expressed Sequence Tags (ESTs), etc.
• Accession Number: A unique identification number that is given to each sequence entry in a biological database and gives direct access to the sequence of interest. The accession number itself stays stable; in GenBank-style records, the version suffix attached to it is incremented every time the sequence gets updated. The identifiers also vary with each database.
• ENTREZ: An integrated search portal that enables the user to search many distinct biological databases simultaneously.
• International Nucleotide Sequence Database Collaboration (INSDC): An international collaboration that has been established for exchanging and sharing nucleotide sequence data. This includes the collection and dissemination of all DNA and RNA sequences submitted to the members of INSDC.
• Word-length: The minimum length of the initial set of nucleotides that must match completely before alignment extension of the two sequences can be initiated. The sensitivity and speed of the search can be regulated by increasing or decreasing the word size.
• Threshold: The expected number of matches between nucleotide sequences that can occur by chance. The statistical significance of the results can be judged based on this parameter. The default value in most cases is 10, which implies that, in a random model, 10 such matches are expected to be found merely by chance.

Definitions of the components: Part 2 – Genome databases (continued)

• Gap Penalty and Gap Extension: During an alignment of two or more nucleotide sequences, a gap is introduced where one sequence has an insertion or deletion relative to the other, so that the bases on either side line up. In this context, “Gap penalty” refers to the deduction from the overall alignment score for introducing a gap, while “Gap Extension” is the deduction for extending an already existing gap.
• Alignment Score: This is reported as the Bit Score, a normalized form of the raw alignment score, and provides a comparative quantification of the quality of alignment. The score increases when a higher number of residue matches and a lower number of mismatches are encountered. The alignment having the higher bit score is the better match.
• Match-Mismatch Scores: During alignment of nucleotide sequences, the scoring system adds a “Reward” score for matching bases and subtracts a “Penalty” score for mismatching bases. These scores are represented as pairs of values in the BLAST algorithm.
• Percentage Identity: The percentage of nucleotide bases that are identical between the two sequences being compared.
• E-value: The E-value quantifies how likely an alignment between two or more sequences is to have arisen by chance rather than being a biologically significant match. For a similarity search against a database, this value is dependent on the size of the database against which the sequence is compared. The closer the E-value is to zero, the higher the significance of the match.
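The scoring definitions above can be tried out on a small, already aligned pair of sequences. This sketch assumes the affine convention in which a gap of length k costs existence + k × extension, and it reuses the example values from this animation (reward 1, penalty -2, existence 5, extension 2) as defaults. The function name and the two aligned strings are made up for illustration.

```python
def score_alignment(a: str, b: str, match: int = 1, mismatch: int = -2,
                    gap_open: int = 5, gap_extend: int = 2):
    """Score two aligned sequences of equal length ('-' marks a gap).

    Returns (raw score, percentage identity, percentage of gap columns).
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    score = 0
    identical = 0
    gap_columns = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for x, y in zip(a, b):
        gap = "a" if x == "-" else "b" if y == "-" else None
        if gap is not None:
            gap_columns += 1
            if gap != prev_gap:
                score -= gap_open   # a new gap starts here
            score -= gap_extend     # every gap column pays the extension cost
        elif x == y:
            score += match
            identical += 1
        else:
            score += mismatch
        prev_gap = gap
    length = len(a)
    return score, 100.0 * identical / length, 100.0 * gap_columns / length


print(score_alignment("ACGTTACG", "ACG--ACC"))
```

Here five columns match, one mismatches and a single two-base gap is opened, so the raw score is 5 × 1 - 2 - (5 + 2 × 2) = -6. Converting a raw score to a bit score needs statistical parameters that are not covered here.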

Step 1.a: Submit a Sequence

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS. The box “Submit your sequence here” holds the entry Albumin_S, followed by the SUBMIT button and the status “VERIFYING…”.]

Sample entry Albumin_S: TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAATGA

Action: Entries in web server.

Description of the action: The animator needs to redraw all the images. Show the layout of the database first. Then show a clicking effect on “SUBMIT A SEQUENCE”. Show the input of the sequence in the white box, followed by a clicking effect on “SUBMIT”. Show the “VERIFYING…” sign in waiting mode. While the verification goes on, show the diagram on the next slide.

Audio narration: To submit a sequence to a nucleotide database, it must be entered at the site of any one of the members of the International Nucleotide Sequence Database Collaboration, which consists of NCBI, EBI-EMBL and DDBJ. Upon submission, these entries are verified for their source of retrieval and publication details.

Step 1.b: Submit a Sequence

[Figure: while the status shows “VERIFYING…”, information is exchanged between the databases for the verification process: National Centre for Biotechnology Information (NCBI), European Bioinformatics Institute (EBI) and DNA Data Bank of Japan (DDBJ).]

Action: Entries in web server.

Description of the action: Re-create all the images and screenshots. This is the image that flashes on the screen after the previous slide while the “VERIFYING…” button is in wait mode.

Audio narration: The newly entered sequences are exchanged between the three member servers on a daily basis and verified by them. This helps in keeping track of updates in sequencing information and in sharing data that is useful for research.

Step 1.c: Submit a Sequence

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS. The box “Submit your sequence here” holds the entry Albumin, followed by the SUBMIT button and the status “VERIFIED”.]

Submitted sequence (Albumin): TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGTCGAGATACACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAGACAAGAATGTTTCAGTAAACCTTGAATCATTAATGA

Nucleotide Sequence Database (entries shown on screen):
Insulin A, ID SG1: ACGTAAGAATTCTAGTTTTCAATTGTTCAACTTTTCTTTCCTAG…
Insulin B, ID SG2: CTTTGGCACAATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCC…
Albumin, ID SG3: TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT…

Action: Entries in web server.

Description of the action: Re-create all the images. The rightmost part of this screen is appended to the screen on the 10th slide after the flash image of the 11th slide disappears.

Audio narration: The verified sequence is then given an accession number or a gene ID, which acts as the primary key for identifying this entry in the database in future.

Step 2: Search Database

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS. Fields: “Select the Database” (NUCLEOTIDE selected) and “Submit your query term” (Serum Albumin), followed by the SUBMIT button.]

Sample result (Nucleotide Sequence Database, entry Albumin):
LOCUS: 9291
LENGTH: 24158
ORGANISM: Homo sapiens
GENE NAME: ALB
LOCATION: Chromosome 4
JOURNAL: Journal of Science …
SEQUENCE: TATCTTTTCTATCAACCCCACAAAACTTTGGCACAATGAAGTGGGTGACTTTTATTT…

Database options (yellow boxes, narration only):
GENE: Searches the term in the set of genes stored in the database.
GENOME: Searches the term in the whole genome profiles. These genomes are divided into 6 organism groups.
EST: Contains sequences of “Expressed Sequence Tags” or “single-pass cDNA sequences”.
SNP: Searches the database of Single Nucleotide Polymorphisms.
NUCLEOTIDE: Collection of all nucleotide sequences from a variety of sources.
GEO DATASETS: The Gene Expression Omnibus repository stores the curated gene expression DataSets as well as original Series and Platform records.

Action: Retrieval from web server.

Description of the action: Re-create all the images and screenshots. The yellow boxes are the audio narration for each section. Do not display the yellow boxes, as they are not a part of the database animation; their content needs to be narrated as mentioned in the audio narration. Follow the steps as shown in the animation.

Audio narration: To search the database for a given gene, genome or nucleotide, the user can enter the query term in the search box. The query term can be the gene name or an identifier for the gene. The user needs to select the database from which the sequence has to be retrieved. These databases include Gene, Genome, EST, SNP, Nucleotide and GEO DataSets <narrate the content of the corresponding yellow box for each>. Once the user clicks on SUBMIT, the nucleotide sequence is shown along with a summary of the result.
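The roles of the accession number as a primary key and of the query term in a search can be mimicked with a small in-memory store. This is a toy sketch only: the class `SequenceStore`, its methods and the “SG” identifier prefix are hypothetical and do not correspond to any real database interface, and the sequences are shortened placeholders.

```python
import itertools


class SequenceStore:
    """Toy nucleotide database keyed by an accession-like identifier."""

    def __init__(self, prefix: str = "SG"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records = {}

    def submit(self, name: str, sequence: str) -> str:
        """Store a sequence and return the identifier assigned to it."""
        accession = f"{self._prefix}{next(self._counter)}"
        self._records[accession] = {"name": name, "sequence": sequence}
        return accession

    def get(self, accession: str) -> dict:
        """Direct lookup by primary key."""
        return self._records[accession]

    def search(self, term: str) -> list[str]:
        """Return identifiers whose ID equals, or whose name contains, the term."""
        term = term.lower()
        return [acc for acc, rec in self._records.items()
                if term == acc.lower() or term in rec["name"].lower()]


store = SequenceStore()
store.submit("Insulin A", "ACGTAAGAATTC")
store.submit("Insulin B", "CTTTGGCACAAT")
albumin_id = store.submit("Albumin", "TATCTTTTCTAT")
print(albumin_id, store.search("insulin"), store.get(albumin_id)["name"])
```

A lookup by identifier returns exactly one record, while a search by name may return several, which is why the identifier rather than the gene name serves as the primary key.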

Step 3.a: Analysis Tools - Nucleotide Sequence Identification

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS, and the button ALIGNMENT ALGORITHM (BLAST).]

Input shown on screen:
Enter sequence 1: TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA
SELECT A DATABASE: NUCLEOTIDE (options: NUCLEOTIDE, GENE, GEO, EST, SNP)
Word Size: 28
Threshold: 10
Gap penalty: Existence 5, Extension 2
Match-Mismatch Score: 1, -2

Action: Analysis from database servers.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Show the click on “Analysis Tools”. Follow it with the input of the sequence and the selection of “Nucleotide” in the drop-down “SELECT A DATABASE”. Follow it with the input of the rest of the parameters. Show a clicking effect on the BLAST tool.

Audio narration: An unknown nucleotide sequence can be identified by searching it against a suitable nucleotide database. Input the sequence, and then select the database against which the match search is to be performed. Fill in the parameter values and then click on the BLAST tool.
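The effect of the word size parameter can be seen by listing the exact word matches that would seed an alignment. This is a simplified sketch of the seeding idea only, not the BLAST implementation; the function name and the short test sequences are made up.

```python
def word_hits(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return (query position, subject position) for every exact word match."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up each word of the query in that index.
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


query = "ACGTAC"
subject = "TTACGTT"
print(word_hits(query, subject, 4))  # longer words: fewer seeds, faster search
print(word_hits(query, subject, 3))  # shorter words: more seeds, more sensitive
```

With these inputs the longer word yields a single seed and the shorter word yields three, which is the sensitivity and speed trade-off described for the word-length parameter.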

Step 3.b: Analysis Tools - Nucleotide Sequence Identification

Results shown on screen, each with its definition box:
IDENTIFICATION OF GENE: Identifies the name of the gene and the type of nucleotide. Example: Homo sapiens afamin (AFM), mRNA.
SEQUENCE ALIGNMENT: Shows the alignment of the query nucleotide with the sequence of the identified nucleotides, with the query start and end positions and the subject start and end positions.
ALIGNMENT SCORE: Bit score for the alignment, which is a normalized measure used to compare scores with other hits. Example: 437 bits.
E-Value: Shows the chance of the two sequences aligning as a chance event. The nearer this value is to 0, the greater the significance of the match. Example: 2e-118.
Percentage Identity: Shows the percentage of bases that matched between the query sequence and the hit. Example: 100%.
Gaps: Percentage of residues substituted by a gap. Example: 0%.

Action: Analysis from database servers.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately.

Audio narration: Sequence identification through BLAST provides various results after alignment, such as identification, alignment views, alignment score, E-value, percentage identity and gaps.
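The link between bit score, database size and E-value can be made concrete with the standard Karlin-Altschul relation E = m × n × 2^(-S'), where S' is the bit score, m is the query length and n is the total length of the database. The function name below is hypothetical, and the lengths used are made-up values, not the ones behind the 437-bit hit on this slide.

```python
def expected_chance_hits(bit_score: float, query_len: int, database_len: int) -> float:
    """Expected number of chance alignments scoring at least `bit_score`."""
    return query_len * database_len * 2.0 ** (-bit_score)


# Made-up search space: an 82-base query against 10**9 bases of database.
for bits in (20, 40, 80):
    print(bits, expected_chance_hits(bits, 82, 10**9))

# Doubling the database size doubles the E-value for the same bit score.
print(expected_chance_hits(40, 82, 2 * 10**9) / expected_chance_hits(40, 82, 10**9))
```

Each additional bit of score halves the E-value, so a high bit score drives the value towards zero, while a larger database pushes it up.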

Step 3.c: Analysis Tools - Nucleotide Sequence Alignment

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS, and the button ALIGNMENT ALGORITHM (BLAST).]

Input shown on screen:
Enter sequence 1: TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA
Enter sequence 2: AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTTC
Word Size: 3
Threshold: 10
Gap penalty: Existence 11, Extension 1
Match-Mismatch Score: 1, -2

Action: Analysis from database servers.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Show the clicking effect on the 3rd tab, “ANALYSIS TOOLS”. Input the 2 sequences one by one. Follow this by inputting the parameters one at a time. Show the clicking effect on the BLAST tool.

Audio narration: Alignment can also be performed between two given nucleotide sequences. To align two sequences, enter them in the input boxes. Enter the necessary parameters, whose values will vary according to the query. Then click on the alignment tool.

Step 3.d: Analysis Tools - Nucleotide Sequence Alignment

Results shown on screen, each with its definition box:
Gap Alignment: The figure for the alignment of the 2 sequences.
Gaps: Percentage of residues substituted by gaps. Example: 5%.
E-value: Measure of the alignment occurring as a chance event. Example: 1e-64.
Score: Comparative measure of the quality of alignment. Example: 241 bits.

Action: Analysis from database servers.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately along with its definition box.

Audio narration: Pair-wise alignment gives various kinds of results. These include alignment views, alignment score, dot-plot, E-value and percentage identity, among many others.

Master Layout: Part 3

This animation consists of 3 parts:
Part 1: Genome sequencing protocol
Part 2: Genome databases
Part 3: Genome alignment and its analysis

• Tools available for comparing two genomes.
• Applications of comparing genomes.

[Figure labels: Seq 1; Seq 2; Seq 3]

Definitions of the components: Part 3 – Genome alignment and its analysis

1. Orthologs: Genes in two different species that descend from a single gene in their last common ancestor. They usually retain the same function.
2. Paralogs: Two genes present in a single organism, one of which was produced by duplication of the other and has since accumulated mutations, often to the point where it performs a separate function.
3. Homologs: Genes that have a common origin in evolution. They often, though not always, perform similar functions; orthologs and paralogs are both kinds of homologs.
4. Gene Order: The sequential arrangement of genes within an organism’s genome.
5. Gene Cluster: Genes that are involved in a common functional aspect of the organism tend to cluster together and are referred to as gene clusters. For example, genes related to certain metabolic pathways.
6. GC-content: The proportion of Guanine and Cytosine bases in the genome of an organism, usually given as a percentage. It provides a useful measure for comparing two given genomes.

Definitions of the components: Part 3 – Genome alignment and its analysis (continued)

7. Basic Local Alignment Search Tool (BLAST): The algorithm that is used to compare two given sequences, or one sequence against a database. The BLAST version used for nucleotide comparison is “Nucleotide BLAST”.
8. FAST-All (FASTA): Another algorithm that was developed to compare two given sequences, based on gapped local alignments.
9. Phylogeny: The study of the evolutionary relatedness of different groups of organisms, analyzed from molecular sequencing data such as nucleotide and protein sequences.
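Of the definitions above, GC-content is the simplest to compute directly. The following sketch assumes the sequence is given as a plain string of bases and reports GC-content as a percentage of the sequence length; the function name is hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Percentage of bases in `sequence` that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


# One of the example sequences used elsewhere in this animation.
query = ("TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAAT"
         "ATTTTACAATTA")
print(round(gc_content(query), 1))
```

Comparing this value between two genomes gives a quick first impression of how different their base composition is.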

Step 1: Comparative genomics - Tools

PipMaker: Server for aligning two genomes and producing Percent Identity Plots. It uses BLASTZ, a version of BLAST with parameters modified for aligning entire genomes.
MUMmer: Tool for rapidly aligning whole genomes.
GeneOrder: Web-based interactive computational tool to compare the order of genes in two genomes.
GenScan: Developed at MIT, GenScan is an online program to identify complete gene structures in genomic DNA.
AVID-VISTA: A collection of programs and databases for comparative analysis of genomic sequences. VISTA also has pre-computed whole-genome alignments of different species.
Twinscan: A web-based tool by Washington University for gene-structure prediction.

Step 1: Comparative genomics - Tools

Action: Schematic for comparative genomics tools.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Replace the “Comparative Genomics Tools” tab in the previous slides with these tabs one by one and narrate the explanation given on this slide.

Audio narration: Here, we present a brief summary of comparative genomics tools.
PipMaker: Server for aligning two genomes and producing Percent Identity Plots. It uses BLASTZ, a version of BLAST with parameters modified for aligning entire genomes.
GenScan: Developed at MIT, GenScan is an online program to identify complete gene structures in genomic DNA.
Twinscan: A web-based tool by Washington University for gene-structure prediction.
AVID: AVID-VISTA is a collection of programs and databases for comparative analysis of genomic sequences. VISTA also has pre-computed whole-genome alignments of different species.
GeneOrder: Web-based interactive computational tool to compare the order of genes in two genomes.
MUMmer: Tool for rapidly aligning whole genomes.

Sources:
http://genes.mit.edu/GENSCAN.html
http://mblab.wustl.edu/software/twinscan/
http://mummer.sourceforge.net/
http://genome.lbl.gov/vista/index.shtml
http://pipmaker.bx.psu.edu/pipmaker/
http://binf.gmu.edu:8080/GeneOrder3.0/

Step 2: Comparative genomics

[Screen: NUCLEOTIDE DATABASE, with tabs SUBMIT A SEQUENCE, SEARCH THE DATABASE and ANALYSIS TOOLS, and the button COMPARATIVE GENOMICS TOOLS.]

Input shown on screen:
Enter genome 1: AGTATATTAGTGCTAATTTCCCTCCGTTTGTCCTAGCTTTTCTCTTCTGTCAACCCCACACGCCTTTGGCACAATGAAGTGGGTAACCTTTATTTCCCTTC
Enter genome 2: TTTATTGTTTTCAATATCTATATAATGAAAAACTAATACTGAACAATTCAATGCTTATATACCCAAAAATATTTTACAATTA

Action: Analysis from database servers.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Show the input of the two sequences and then follow it up with the clicking effect on “Comparative Genomics Tools”.

Audio narration: For comparative genome analysis, extract the full genome sequences of interest. The servers of the comparative genomics tools have text boxes to upload these sequences. Thereafter, the user needs to click on the submit button for the tool.

Step 3: Comparative genomics: Analysis

Outputs shown on screen, each with its definition box:
PipMaker dot plot: The “Percent Identity Plot” is the visualization of the alignments retrieved from PipMaker for similar regions in two DNA sequences.
WHOLE GENOME ALIGNMENT: The entire genomes of two organisms can be aligned using tools such as MUMmer and AVID-VISTA.
PREDICTED EXONS AND INTRONS: Comparative genomics tools can also predict exons and introns on the aligned genomes.

Action: Schematic for tool output.

Description of the action: Re-create all the images and screenshots. Follow the steps as shown in the animation. Each output must be displayed separately along with its definition box.

Audio narration: The output of the comparative genomics tools varies with the type of tool used. It may be a dot plot from PipMaker, a whole genome alignment, or predicted exons and introns in the alignment. For detailed analysis of these results, users must visit the respective sites as mentioned in the references.

Sources:
http://genes.mit.edu/GENSCAN.html
http://mblab.wustl.edu/software/twinscan/
http://mummer.sourceforge.net/
http://genome.lbl.gov/vista/index.shtml
http://pipmaker.bx.psu.edu/pipmaker/
http://binf.gmu.edu:8080/GeneOrder3.0/
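The idea behind a dot plot can be shown with a plain-text version. This is a minimal sketch of a generic dot plot, not PipMaker’s algorithm or its Percent Identity Plot: a mark is placed wherever a window of one sequence exactly matches a window of the other. The function name and the short input sequences are made up.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1) -> list[str]:
    """One text row per position in seq1; '*' marks an exact window match."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = ""
        for j in range(len(seq2) - window + 1):
            row += "*" if seq1[i:i + window] == seq2[j:j + window] else "."
        rows.append(row)
    return rows


# Made-up sequences that share the stretch ACGTAC at different offsets.
for row in dot_plot("GGACGTACTT", "ACGTACGA", window=3):
    print(row)
```

Similar regions show up as diagonal runs of marks, and a larger window suppresses the scattered single-base matches that would otherwise clutter the plot.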

Interactivity option 1: Align the genomes of Potato Spindle Tuber Viroid and Hop Latent Viroid

Steps, in the order to be performed:
1. Open the NCBI homepage in a web browser.
2. In the options to select the databases, opt for Genome databases.
3. Enter the term “Viroids” in the search box.
4. Click on the search button. Obtain a list of completely sequenced viroids.
5. In the summary section, check for the source organism of the sequence.
6. Click on the GenBank ID for the two organisms under study.
7. Click on the “FASTA” tab for the respective entries. Obtain the complete genome sequence in FASTA format.
8. Input the 2 genomes in any genome alignment server of your choice.
9. Obtain the alignment of the two genomes.

Interactivity type: Arrange the steps in the order to be performed.

Options: Remove the step number mentioned in red from the bottom of each tab. Show all the steps in mixed order. The user must click on the tabs in the correct order.

Boundary/limits: If the user clicks on a tab that is not in the right order, flash a message saying “try again”. Steps 2 and 3 are permitted in either order.

Results: All the tabs must be arranged in the right order.
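Before the two genomes from step 7 are passed to an alignment server, it can help to check what the FASTA text contains. The sketch below parses FASTA-formatted text into a dictionary; the function name, the record headers and the short sequences are hypothetical placeholders, not the actual viroid entries.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may be wrapped
    return {name: "".join(parts) for name, parts in records.items()}


# Placeholder records; real entries would be copied from the "FASTA" tab.
example = """>viroid_A hypothetical example
CGGAACT
AAACTCG
>viroid_B hypothetical example
CTGGGGA
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

Checking the header and the sequence length of each record is a quick way to confirm that the complete genome, and the intended organism, was copied.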

Questionnaire

1. Which amongst these is NOT a nucleotide database?
Answers: a) NCBI b) PDB c) EMBL d) DDBJ

2. PipMaker compares the two genomes by finding?
Answers: a) Gene Order b) Cluster of Orthologous Genes c) Percent Identity Plots d) All of the above

3. Which is the tool for Whole Genome Alignment?
Answers: a) MUMmer b) PipMaker c) Both d) Neither

4. Exons can be predicted using which tool?
Answers: a) Genscan b) FASTA c) BLAST d) None of the above

5. Which is the last step in a PCR reaction?
Answers: a) Annealing b) Elongation c) Denaturation d) None of the above

Links for further reading

Reference websites:
http://www.ebi.ac.uk/embl/
http://www.ddbj.nig.ac.jp/
http://www.ncbi.nlm.nih.gov
http://blast.ncbi.nlm.nih.gov/
www.icgeb.res.in/whotdr/presentation/comp-genomics.ppt
http://genome.crg.es/software/sgp2/
http://genes.mit.edu/GENSCAN.html
http://mblab.wustl.edu/software/twinscan/
http://mummer.sourceforge.net/
http://genome.lbl.gov/vista/index.shtml
http://pipmaker.bx.psu.edu/pipmaker/
http://binf.gmu.edu:8080/GeneOrder3.0/

Links for further reading

The following URLs are used for the animations:
http://genes.mit.edu/GENSCAN.html
http://mblab.wustl.edu/software/twinscan/
http://mummer.sourceforge.net/
http://genome.lbl.gov/vista/index.shtml
http://pipmaker.bx.psu.edu/pipmaker/
http://binf.gmu.edu:8080/GeneOrder3.0/
http://www.ebi.ac.uk/embl/
http://www.ddbj.nig.ac.jp/
http://www.ncbi.nlm.nih.gov
http://blast.ncbi.nlm.nih.gov/

Links for further reading

Books:
Bioinformatics: Sequence and Genome Analysis by David W. Mount
Biochemistry by A.L.Lehninger et al., 3rd edition
