Next Generation Sequencing and Gene Annotation
Ms. Shivani Bhagwat, Lecturer, School of Biotechnology, DAVV
DNA SEQUENCING
DNA sequencing includes several methods and technologies that are used for determining the order of the nucleotide bases—adenine, guanine, cytosine, and thymine—in a molecule of DNA.
The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography.
Maxam-Gilbert sequencing is a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases. It requires labelling one end of the DNA, typically by a kinase reaction.
Chain-termination methods
The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.
The classical chain-termination method requires a single-stranded DNA template, a labelled DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation.
The DNA sample is divided into four separate sequencing reactions, containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase.
To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) which are the chain-terminating nucleotides, lacking a 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, thus terminating DNA strand extension and resulting in DNA fragments of varying length.
The resulting fragments are separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C).
The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be directly read off the X-ray film or gel image.
NOTE: Limitations include non-specific binding of the primer to the DNA, which affects accurate read-out of the DNA sequence, and DNA secondary structures, which affect the fidelity of the sequence.
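The four-lane readout described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names are hypothetical, the primer length is ignored, and every possible termination product is assumed to be present and perfectly resolved on the gel.

```python
def sanger_fragments(synthesized: str) -> dict[str, list[int]]:
    """List the fragment lengths produced in each of the four ddNTP reactions.

    `synthesized` is the newly made strand (5'->3'); it is a hypothetical
    input used only for illustration.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        # A fragment of this length ends with the ddNTP matching `base`,
        # so it shows up as a band in that base's lane.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from the shortest band to the longest across lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_fragments("GATTACA")
print(lanes["A"])       # [2, 5, 7]
print(read_gel(lanes))  # GATTACA
```

Reading the bands in order of increasing length is what "reading the sequence off the gel" amounts to.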
Dye-terminator sequencing
Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each of which emits light at a different wavelength.
Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms.
Base calling software typically gives an estimate of quality to aid in quality trimming.
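A minimal sketch of quality trimming follows. It assumes the quality estimate is a Phred-style score, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The source does not name the scale, and the cutoff of 20 and the function names are illustrative choices only.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(bases: str, quals: list[int], min_q: int = 20) -> str:
    """Remove bases from both ends of a read until a base with Q >= min_q is met."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end]


read = "ACGTACGT"
quals = [5, 10, 30, 35, 40, 30, 12, 8]
print(trim_low_quality_ends(read, quals))  # GTAC
```

Real trimmers often use a sliding window rather than a per-base cutoff; the end-trimming shown here is the simplest variant.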
Massively parallel signature sequencing (MPSS)
Developed in the 1990s; the procedure is relatively complicated.
It is a sequence-based approach that can be used to identify and quantify the mRNA transcripts present in a sample. It is similar to serial analysis of gene expression (SAGE), but the biochemical manipulation and sequencing approach differ substantially.
mRNA transcripts are identified through the generation of a 17-20 bp (base pair) signature sequence adjacent to the 3' end.
Each signature sequence is cloned onto one of a million microbeads. The technique ensures that only one type of DNA sequence is on a microbead.
The microbeads are then arrayed in a flow cell for sequencing and quantification.
Fluorescently labeled encoders are used to decode the sequence.
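Once each bead has been decoded, quantification reduces to counting how often each signature occurs. The sketch below assumes the decoded bead reads are already available as strings. The function names and the transcripts-per-million normalization are illustrative assumptions, not a description of any vendor pipeline.

```python
from collections import Counter


def count_signatures(bead_reads: list[str], signature_length: int = 17) -> Counter:
    """Count decoded signatures, one per microbead."""
    counts: Counter = Counter()
    for read in bead_reads:
        if len(read) >= signature_length:
            # Only the first `signature_length` bases form the signature.
            counts[read[:signature_length]] += 1
    return counts


def transcripts_per_million(counts: Counter) -> dict[str, float]:
    """Scale raw signature counts to a total of one million."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}


sig_a = "ACGT" + "A" * 13   # two made-up 17-base signatures
sig_b = "ACGT" + "C" * 13
counts = count_signatures([sig_a, sig_a, sig_a, sig_b])
print(transcripts_per_million(counts))
```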
Pyrosequencing Technology
Developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics.
Based on emulsion PCR technology and detection of pyrophosphate release on nucleotide incorporation.
ssDNA template is hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS) and luciferin.
The addition of one of the four deoxynucleotide triphosphates (dNTPs) initiates the second step. DNA polymerase incorporates the correct, complementary dNTPs onto the template. This incorporation releases pyrophosphate (PPi).
ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´ phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP.
Unincorporated nucleotides and ATP are degraded by the apyrase, and the reaction can restart with another nucleotide.
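The cycle above (dispense one nucleotide, measure light, degrade the leftovers, repeat) can be sketched as a simulated flowgram. The flow order "TACG", the function names, and the idealized signal (exactly one light unit per incorporated base, no noise) are assumptions made for illustration.

```python
def flowgram(synthesized: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Simulate light signals while `synthesized` (5'->3') is built one flow at a time."""
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never dispensed")
    signals: list[tuple[str, int]] = []
    pos = 0
    while pos < len(synthesized):
        for nucleotide in flow_order:
            run = 0
            # The polymerase incorporates the dispensed nucleotide as many
            # times as it matches, so a homopolymer gives a stronger signal.
            while pos < len(synthesized) and synthesized[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
            if pos == len(synthesized):
                break
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * run for nucleotide, run in signals)


flows = flowgram("TTAGC")
print(flows)          # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(decode(flows))  # TTAGC
```

A signal of 0 means the dispensed nucleotide was not complementary at that step; a signal of 2 reflects two identical bases incorporated in one flow.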
Emulsion PCR (ePCR)
Sequential nucleotide addition
Light reaction
Sequencing by Synthesis technology (SBS)
Sequencing by ligation technology
Developed by Applied Biosystems (SOLiD).
Sequencing by ligation relies upon the sensitivity of DNA ligase for base-pairing mismatches.
The target molecule to be sequenced is a single strand of unknown DNA sequence, flanked on at least one end by a known sequence. A short "anchor" strand is brought in to bind the known sequence.
A mixed pool of probe oligonucleotides (8 or 9 bases long) is then brought in, each labeled (typically with a fluorescent dye) according to the base at the position that will be sequenced.
These molecules hybridize to the target DNA sequence, next to the anchor sequence, and DNA ligase preferentially joins the molecule to the anchor when its bases match the unknown DNA sequence. Based on the fluorescence produced by the molecule, one can infer the identity of the nucleotide at this position in the unknown sequence.
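A toy model of this inference step is sketched below. It is deliberately simplified: ligase is modelled as joining only a perfectly matched probe, the dye reports a single probe base (commercial platforms may use other encodings), strand orientation is ignored so that probe position i pairs with target position i, and the dye names and function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical dye assignment: the dye reports the probe base at the queried position.
DYE_FOR_PROBE_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
PROBE_BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_PROBE_BASE.items()}


def ligated_probe(target: str, probe_length: int) -> str:
    """Return the probe from the full pool that pairs perfectly with the target."""
    window = target[:probe_length]
    if len(window) < probe_length:
        raise ValueError("target is shorter than the probe")
    for bases in product("ACGT", repeat=probe_length):
        if all(COMPLEMENT[p] == t for p, t in zip(bases, window)):
            return "".join(bases)
    raise ValueError("target contains a base other than A, C, G, T")


def call_base(target: str, query_index: int, probe_length: int = 8) -> str:
    """Infer the target base at `query_index` from the dye on the ligated probe."""
    probe = ligated_probe(target, probe_length)
    dye = DYE_FOR_PROBE_BASE[probe[query_index]]  # what the instrument observes
    return COMPLEMENT[PROBE_BASE_FOR_DYE[dye]]    # inferred base in the unknown DNA


print(call_base("ACGTTGCA", query_index=4))  # T
```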
VisiGen Biotechnologies approach
VisiGen Biotechnologies introduced a specially engineered DNA polymerase for use in their sequencing.
This polymerase acts as a sensor: a donor fluorescent dye is incorporated close to its active centre. The donor dye acts by FRET (fluorescence resonance energy transfer), inducing fluorescence of the differently labeled nucleotides.
This approach allows reads to be performed at the speed at which the polymerase incorporates nucleotides into the sequence (several hundred per second).
The nucleotide fluorochrome is released after the incorporation into the DNA strand.
The expected read lengths in this approach should reach 1000 nucleotides; however, this remains to be confirmed.
Nanopore sequencing technology
Developed by Oxford Nanopore Technologies.
This method is based on reading the electrical signal produced as nucleotides pass through alpha-hemolysin pores covalently bound with cyclodextrin.
DNA passing through the nanopore changes the ion current through the pore. This change depends on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time.
The method has potential for further development because it does not require modified nucleotides; however, single-nucleotide resolution is not yet available.
Emulsion PCR
The single-stranded DNA fragments or templates are attached to the surface of beads using adaptors or linkers, and one bead is attached to a single DNA fragment from the DNA library.
The DNA library is generated through random fragmentation of the genomic DNA. The surface of the beads contains oligonucleotide probes with sequences that are complementary to the adaptors binding the DNA fragments.
After that, the beads will be compartmentalized into separate water-oil emulsion droplets.
In the aqueous water-oil emulsion, each of the droplets capturing one bead will serve as a PCR microreactor for amplification steps to take place and produce clonally amplified copies of the DNA fragment.
Bridge amplification on solid surface
High-density forward and reverse primers are covalently attached to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters.
The flow cell is exposed to reagents for polymerase-based extension, and priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface.
Repeated denaturation and extension results in localized amplification of DNA fragments in millions of unique locations across the flow cell surface. Solid-phase amplification can produce 100–200 million spatially separated template clusters (Illumina/Solexa), providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction.
Single-molecule templates
Some clonal amplification protocols are cumbersome to implement and require a large amount of genomic DNA material (3–20 μg).
The preparation of single-molecule templates is more straightforward and requires less starting material (<1 μg).
More importantly, these methods do not require PCR, which creates mutations in clonally amplified templates that masquerade as sequence variants.
AT-rich and GC-rich target sequences may also show amplification bias in product yield, which results in their underrepresentation in genome alignments and assemblies.
Single molecule templates are usually immobilized on solid supports using one of at least 3 different approaches:
1. Spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adaptors to the fragment ends, is then hybridized to the immobilized primer.
2. Spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template. In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction.
Both of the above approaches are used by Helicos BioSciences.
3. Spatially distributed single polymerase molecules are attached to the solid support, to which a primed template molecule is bound. Larger DNA molecules (up to 10,000 bp) can be used with this technique.
This approach is used by Pacific Biosciences.
GENE ANNOTATION
What is Annotation?
Extraction, definition, and interpretation of features on the genome sequence derived by integrating computational tools and biological knowledge.
Find the genes
– Heuristic signals
– Inherent features
– Intelligent methods
Characterize each gene
– Compare with other genes
– Find functional components
– Predict features
Heuristic Signals
DNA contains various recognition sites for internal machinery like:
• Promoter signals
• Transcription start signals
• Start Codon
• Exon, Intron boundaries
• Transcription termination signals
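As one example of using such signals, the sketch below scans the forward strand for open reading frames that begin with the start codon ATG and end at an in-frame stop codon. It is a minimal illustration only: it ignores the reverse strand, introns, promoters and all other signals, and the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) half-open coordinates of forward-strand ORFs.

    Each ORF starts at an ATG and ends with the first in-frame stop codon,
    which is included in the reported interval.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 11 ATGAAATAG
```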
DNA exhibits certain biases that can be exploited to locate coding regions
• Uneven distribution of bases
• Codon bias
• CpG islands
• Encoded amino acid sequence
• Imperfect periodicity
• Other global patterns
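Two of these biases are easy to quantify. The sketch below computes the GC fraction and the observed/expected CpG ratio of a sequence window, using the commonly quoted definition (number of CpG x window length) / (number of C x number of G). The function names are hypothetical, and any thresholds used to call a CpG island would be a separate choice not shown here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in the window that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for the window."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


window = "AACCGGTT"
print(gc_fraction(window))  # 0.5
print(cpg_obs_exp(window))  # 2.0
```

In practice these values are computed in sliding windows along the genome, and regions where both stay high are flagged as candidates.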
Intelligent Methods
Pattern recognition methods weigh inputs and predict gene location
– Content-based methods
– Site-based methods
– Comparative methods
• Neural Networks
• Hidden Markov Models
The term neural network was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes.
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.
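To make the idea of hidden states concrete, here is a minimal two-state HMM decoded with the Viterbi algorithm. The states ("coding" and "noncoding") and all probabilities are made-up values chosen only for illustration; real gene finders use far richer models trained on annotated sequence.

```python
import math

STATES = ("noncoding", "coding")
# All probabilities below are hypothetical illustration values.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq: str) -> list[str]:
    """Return the most probable hidden-state path for the observed bases."""
    if not seq:
        return []
    # Log probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for symbol in seq[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = (prev[best_prev] + math.log(TRANS[best_prev][s])
                      + math.log(EMIT[s][symbol]))
            ptr[s] = best_prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi("ATATATATGCGCGCGCGCGC"))
```

With these toy parameters the AT-rich prefix is labelled noncoding and the GC-rich suffix coding, which is the kind of segmentation an HMM-based gene finder produces.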
Looks at several structural features
– Splice donor/acceptor sites
– Putative coding regions
– Intronic regions
– Linear discriminant analysis to split exon / non-exon classes
– Dynamic programming to assemble best gene structure
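The last step, assembling the best gene structure, can be sketched as a dynamic program over scored candidate exons: choose the non-overlapping set with the highest total score. This is a simplified stand-in, not the algorithm of any particular program. The candidate tuples and scores are hypothetical, and real gene finders also enforce reading-frame and splice-site compatibility.

```python
from bisect import bisect_right

Exon = tuple[int, int, float]  # (start, end, score), half-open coordinates


def best_exon_chain(candidates: list[Exon]) -> tuple[float, list[Exon]]:
    """Pick non-overlapping exons that maximize the summed score."""
    exons = sorted(candidates, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons.
    best: list[tuple[float, list[Exon]]] = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end at or before this exon's start.
        j = bisect_right(ends, start, 0, i)
        take = (best[j][0] + score, best[j][1] + [(start, end, score)])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]


candidates = [(0, 100, 5.0), (50, 150, 6.0), (120, 200, 4.0), (160, 260, 2.0)]
print(best_exon_chain(candidates))
# (9.0, [(0, 100, 5.0), (120, 200, 4.0)])
```

The highest-scoring single exon (50-150) is left out because two compatible exons together score more, which is the reason a global dynamic program is used rather than a greedy choice.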
Quadratic discriminant analysis
– Exon length
– Exon-intron transitions
– Splice sites
– Branch sites
– Exon, strand, frame scores
– Detects internal exons
• Select by correlation coefficient
• Select by review paper
• Select by recommendation
• Use them all
Internet Resources
Banbury Cross http://igs-server.cnrs-mrs.fr/igs/banbury
Characterize a Gene
Collect clues for potential function
• Comparison with other known genes, proteins
• Predict secondary structure
• Fold classification
• Gene Expression
• Gene Regulatory Networks
• Phylogenetic comparisons
• Metabolic pathways