Genome Sequencing & the Human Genome Project
Speaker: Joy Scaria, Biological Sciences Group
“Today we are initiating an unending study of human biology. Whatever else [happens]…. It will be an adventure, a priceless endeavor” -Norton Zinder
Human genome: about 3 billion nucleotides (3.2 Gb)
E. coli: 4.7 million nucleotides
E. coli genome: 300 pages of a 1,000-page book
D. melanogaster: 10 books (8 chromosomes)
Human: 200 books (200,000 pages)
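The book analogy is simple arithmetic. The sketch below assumes a fixed page density back-calculated from the human figure (3.2 Gb over 200,000 pages, i.e. 16,000 letters per page, one letter per base) and 1,000 pages per book; neither number is stated on the slide, and the function name is hypothetical.

```python
# Assumption (not on the slide): page density implied by 3.2 Gb / 200,000 pages
LETTERS_PER_PAGE = 3.2e9 / 200_000   # 16,000 letters (bases) per page
PAGES_PER_BOOK = 1_000

def pages_needed(genome_size_bp: float) -> float:
    """Printed pages needed to spell out a genome, one letter per base."""
    return genome_size_bp / LETTERS_PER_PAGE

for name, size_bp in [("E. coli", 4.7e6), ("Human", 3.2e9)]:
    pages = pages_needed(size_bp)
    print(f"{name}: {pages:,.0f} pages = {pages / PAGES_PER_BOOK:.2f} books")
# E. coli: 294 pages = 0.29 books
# Human: 200,000 pages = 200.00 books
```

At this density E. coli fills about 294 pages, which the slide rounds to 300.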
Origin of the project
No one is certain who first suggested the idea
The first serious proposal came from Robert Sinsheimer
Renato Dulbecco, who was in attendance, told J.D. Watson at CSHL about it
DOE scientists were made aware of Sinsheimer’s proposal
Charles DeLisi, head of DOE’s Office of Health and Environmental Research, convened a meeting and offered the Los Alamos and Livermore labs to start the project
Celera vs. the government project
The cDNA dispute and patent claims
Craig Venter establishes Celera
Celera’s database requires payment, while the public project’s data are free
Male or female?
Whose money?
The major participant is the US government
In total, 20 groups from the US, UK, Japan, France, Germany and China are involved
Celera is a private company
Project goals are to
■ identify all the approximately 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that
make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
■ June 2000: completion of a working draft of the entire human genome
■ February 2001: analyses of the working draft are published
Published/complete genomes = 173
Prokaryotic ongoing projects = 432
Eukaryotic ongoing genomes = 368
Total = 974 (data as of 20 July 2004)
Many more on the way
Draft first, finished sequence later
Developing an overlapping genomic clone library
Cloning in bacteria
Robotic assembly for high-throughput genomic (HTG) sequencing
Determining the purity of DNA
To determine the DNA concentration, the eluted samples are transferred into
specially designed quartz glass microplates and the absorbance at 260 nm and 320 nm is measured.
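A hedged sketch of the conversion behind this step. It assumes the common rule of thumb that an A260 of 1.0 over a 1 cm path corresponds to about 50 µg/mL of double-stranded DNA, uses the 320 nm reading as a background correction, and treats path length and dilution as parameters. The function name and the example readings are hypothetical.

```python
# Rule-of-thumb factor (assumption): A260 = 1.0 over 1 cm ~ 50 ug/mL dsDNA
DSDNA_UG_PER_ML_PER_A260 = 50.0

def dsdna_concentration(a260: float, a320: float,
                        path_length_cm: float = 1.0,
                        dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ug/mL) from absorbance readings.

    A320 is treated as background (turbidity, plate effects) and subtracted
    from A260. The result is normalized to a 1 cm path, because microplate
    wells usually have a shorter optical path than a cuvette.
    """
    corrected = (a260 - a320) / path_length_cm
    return corrected * DSDNA_UG_PER_ML_PER_A260 * dilution_factor

# Hypothetical readings from one well with a 0.5 cm optical path
print(dsdna_concentration(0.45, 0.05, path_length_cm=0.5))  # about 40 ug/mL
```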
The robot is gripping a 384-well microplate (used for PCR or sequencing)
from a 384-well MJ PTC-225 Tetrad block.
Automated liquid handling
The Hydras are used for all liquid-handling steps,
e.g. setting up the PCR reactions, purification and sequencing.
The fridge holds plates whose contents (e.g. sequencing mix, PCR mix)
must be stored at +4 °C. Its door is opened and closed pneumatically.
Capillary Array Electrophoresis (CAE).
DNA samples are introduced into the 96-capillary array; as
the separated fragments pass through the capillaries, they are
irradiated all at once with laser light. Fluorescence is measured
by a charge-coupled device that acts as a simultaneous multichannel
detector. (Inset at upper left: close-up view of individual capillary
lanes with separated samples.)
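The slide does not describe how the detected fluorescence becomes a base sequence. As an illustration only, the sketch below assumes four-dye terminator chemistry (one dye per base, channels ordered A, C, G, T) and calls each peak from its brightest channel, returning "N" when the top two channels are too close. Real base callers also correct for spectral overlap, dye mobility and peak spacing; `call_bases` and `min_ratio` are hypothetical names.

```python
from typing import Sequence

# Assumption: one dye per terminating base, in this channel order
CHANNELS = "ACGT"

def call_bases(peaks: Sequence[Sequence[float]], min_ratio: float = 1.5) -> str:
    """Call one base per detected peak from its four channel intensities.

    The brightest channel wins, unless it is less than `min_ratio` times the
    runner-up, in which case the call is ambiguous ('N').
    """
    calls = []
    for intensities in peaks:
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if intensities[best] >= min_ratio * intensities[second]:
            calls.append(CHANNELS[best])
        else:
            calls.append("N")
    return "".join(calls)

# Hypothetical intensities (A, C, G, T) for four peaks passing the detector
peaks = [
    (900, 40, 35, 20),   # clearly A
    (30, 25, 880, 60),   # clearly G
    (50, 700, 40, 650),  # C and T too close -> N
    (10, 15, 20, 760),   # clearly T
]
print(call_bases(peaks))  # AGNT
```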
Raw sequence: Individual unassembled sequence reads,
produced by sequencing of clones containing DNA inserts.
Paired-end sequence: Raw sequence obtained from both
ends of a cloned insert in any vector, such as a plasmid or
bacterial artificial chromosome.
Finished sequence: Complete sequence of a clone or
genome, with an accuracy of at least 99.99% and no gaps
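An accuracy of at least 99.99% allows at most about 1 error per 10,000 bases, which corresponds to a Phred quality of 40. The sketch below makes that arithmetic explicit; the 150-kb clone length is a hypothetical value picked from the 100-200 kb BAC range given below.

```python
import math

def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(per-base error probability)."""
    return -10 * math.log10(1 - accuracy)

def expected_errors(length_bp: int, accuracy: float) -> float:
    """Expected number of incorrect bases in a sequence of the given length."""
    return length_bp * (1 - accuracy)

finished = 0.9999                                    # 99.99% per-base accuracy
print(round(phred_quality(finished)))                # 40
print(round(expected_errors(150_000, finished)))     # 15 (hypothetical 150-kb BAC)
```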
BAC clone: Bacterial artificial chromosome vector carrying a genomic
DNA insert, typically 100–200 kb. Most of the large-insert clones
sequenced in the project were BAC clones
Draft clone: A large-insert clone for which roughly half-shotgun
sequence has been produced.
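How much shotgun sequence a clone needs can be estimated with the Lander-Waterman model, which this sketch assumes (reads placed uniformly at random, redundancy c = N·L/G, expected fraction covered 1 - e^(-c)). The slide does not say what coverage "half-shotgun" corresponds to, so the read count, read length and insert size below are hypothetical.

```python
import math

def shotgun_coverage(num_reads: int, read_length_bp: int, target_bp: int) -> float:
    """Redundancy of coverage: c = N * L / G."""
    return num_reads * read_length_bp / target_bp

def fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation that a base lies in at least one read."""
    return 1 - math.exp(-coverage)

# Hypothetical: a 150-kb BAC insert, 500-bp reads, 1,200 reads
c = shotgun_coverage(1_200, 500, 150_000)
print(c)                             # 4.0
print(f"{fraction_covered(c):.3f}")  # 0.982
```

Because the uncovered fraction falls off as e^(-c), each extra unit of coverage closes a shrinking share of the remaining gaps.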
Predraft clone: A large-insert clone for which some shotgun sequence
is available, but which does not meet the standards for inclusion in the
collection of draft clones.
Contig: The result of joining an overlapping collection of sequences
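Joining overlapping sequences into a contig can be illustrated with a greedy overlap merge. This is a much-simplified sketch, not the assembler used by either project: it assumes error-free reads from a single strand and no repeats, and repeatedly merges the pair of sequences with the longest suffix-prefix overlap. Function names and reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap remains; return the contigs."""
    seqs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return seqs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)

reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```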
Scaffold: The result of connecting contigs by linking information from paired-end reads from plasmids, paired-end reads from BACs, known messenger RNAs or other sources. The contigs in a scaffold are ordered
and oriented with respect to one another
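Scaffolding can be sketched in the same spirit. The hypothetical function below assumes orientation is already resolved and that each paired-end link (left, right) means one read of a pair lies at the end of `left` and its mate at the start of `right`; links seen fewer than `min_support` times are treated as noise. It only orders contigs and does not estimate gap sizes.

```python
from collections import Counter

def build_scaffolds(contigs: list[str], links: list[tuple[str, str]],
                    min_support: int = 2) -> list[list[str]]:
    """Chain contigs into ordered scaffolds using paired-end links."""
    successor: dict[str, str] = {}
    has_predecessor: set[str] = set()
    # Best-supported links first; each contig end may be used only once
    for (left, right), n in Counter(links).most_common():
        if n < min_support or left in successor or right in has_predecessor:
            continue
        successor[left] = right
        has_predecessor.add(right)
    scaffolds = []
    for start in contigs:
        if start in has_predecessor:
            continue  # not the first contig of a chain
        chain, current = [], start
        while current is not None:
            chain.append(current)
            current = successor.get(current)
        scaffolds.append(chain)
    return scaffolds

contigs = ["c1", "c2", "c3", "c4"]
links = [("c1", "c2")] * 3 + [("c2", "c3")] * 2 + [("c1", "c3")]
print(build_scaffolds(contigs, links))  # [['c1', 'c2', 'c3'], ['c4']]
```

The single ("c1", "c3") pair falls below `min_support`, so it is ignored as a likely chimeric or misplaced read pair.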
STS: Sequence-tagged site, corresponding to a short (typically less than 500 bp) unique genomic locus
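Whether an STS is unique can be checked in silico by finding every place its primer pair would amplify, in the spirit of electronic PCR. This sketch assumes exact primer matches on the given strand only and a maximum product of 500 bp (the typical STS size above); the primers, sequence and function names are made up.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def find_sts_hits(genome: str, fwd_primer: str, rev_primer: str,
                  max_product: int = 500) -> list[tuple[int, int]]:
    """Return (start, end) of every predicted amplicon on the given strand.

    The forward primer must match exactly, and the reverse primer's reverse
    complement must match downstream, within `max_product` bases in total.
    """
    rev_site = reverse_complement(rev_primer)
    hits = []
    start = genome.find(fwd_primer)
    while start != -1:
        site = genome.find(rev_site, start + len(fwd_primer))
        if site != -1 and site + len(rev_site) - start <= max_product:
            hits.append((start, site + len(rev_site)))
        start = genome.find(fwd_primer, start + 1)
    return hits

genome = "GGGG" + "ACGTTG" + "CCCCCC" + "TCATGG" + "AAAA"
print(find_sts_hits(genome, "ACGTTG", "CCATGA"))  # [(4, 22)]: one hit, so unique
```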
Common Sources of STSs
Expressed sequence tags (ESTs) are short sequences obtained
by analysis of complementary DNA (cDNA) clones.
Complementary DNA is prepared by converting mRNA into
double-stranded DNA and is thought to represent the sequences
of the genes being expressed.
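At the sequence level, converting mRNA into double-stranded DNA can be sketched as follows: the first strand is the reverse complement of the mRNA (with T in place of U), and the second strand has the same sequence as the mRNA. This ignores priming, poly(A) tails and incomplete reverse transcription; the function names and example mRNA are hypothetical.

```python
# Watson-Crick pairing from RNA template to DNA: A->T, U->A, G->C, C->G
RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")

def first_strand_cdna(mrna: str) -> str:
    """First cDNA strand, written 5'->3', as made by reverse transcription."""
    return mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]

def double_stranded_cdna(mrna: str) -> tuple[str, str]:
    """(first strand, second strand); the second strand matches the mRNA with T for U."""
    return first_strand_cdna(mrna), mrna.upper().replace("U", "T")

print(double_stranded_cdna("AUGGCCUAA"))  # ('TTAGGCCAT', 'ATGGCCTAA')
```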
Simple sequence length polymorphisms (SSLPs) are arrays of
repeat sequences that display length variations. SSLPs that are
polymorphic and have already been mapped by linkage analysis
are particularly valuable because they provide a connection
between genetic and physical maps.
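The length variation of an SSLP comes from the number of repeat units. A minimal sketch, assuming a dinucleotide (CA) repeat and two made-up alleles, counts the longest uninterrupted run of the unit:

```python
import re

def longest_tandem_repeat(seq: str, unit: str) -> int:
    """Copies of `unit` in the longest uninterrupted tandem run within `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

allele_a = "GATT" + "CA" * 12 + "GGTC"   # hypothetical allele
allele_b = "GATT" + "CA" * 15 + "GGTC"   # same locus, 3 extra repeats
print(longest_tandem_repeat(allele_a, "CA"), longest_tandem_repeat(allele_b, "CA"))  # 12 15
```

The two alleles differ by 6 bp, a length difference that a PCR across the locus would reveal.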
Random genomic sequences
SNP: Single nucleotide polymorphism
RFLP: Restriction fragment length polymorphism
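A SNP is a single-base difference between aligned sequences, and an RFLP arises when such a difference creates or destroys a restriction site. The sketch below uses the EcoRI site (G^AATTC) as its example enzyme; the alleles and function names are hypothetical, and the alleles are assumed to be already aligned.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition site, cut between G and AATTC
ECORI_CUT_OFFSET = 1

def find_snps(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """(position, ref base, alt base) wherever two aligned sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]

def digest(seq: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Fragment lengths after cutting `seq` at every occurrence of `site`."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

allele_1 = "AAAAA" + "GAATTC" + "TTTTTTTTT"
allele_2 = "AAAAA" + "GAGTTC" + "TTTTTTTTT"   # A->G SNP destroys the site
print(find_snps(allele_1, allele_2))          # [(7, 'A', 'G')]
print(digest(allele_1), digest(allele_2))     # [6, 14] [20]
```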
What does the project reveal?
Gene numbers: Celera estimates 26,000;
the public project puts it at about 31,000
The genome is laden with repeat sequences,
but the pufferfish genome virtually lacks them
Human repetitive sequences are old and enfeebled,
while mouse repeats are dynamic
More than 1.4 million SNPs have been found
Origin of the genome: inherited from the evolutionary past
Only 94 of 1,278 protein families are vertebrate-specific
Why did evolution favor males?
No sexuality in the lower kingdoms
Hermaphrodites are perfect animals
After all, why do we need males?
Instead of being parasites on females, why don't
they just go away?
Total number of genes on the Y chromosome: 252
SRY is mapped to Yp11.3
It is just 896 bp
Question of Mitochondrial Eve
Artificial intelligence and much more…
• Transcriptomics involves large‑scale analysis of messenger RNAs
(molecules that are transcribed from active genes) to follow when, where,
and under what conditions genes are expressed.
• Proteomics—the study of protein expression and function—can bring
researchers closer than gene expression studies to what’s actually
happening in the cell.
• Structural genomics initiatives are being launched worldwide to
generate the 3‑D structures of one or more proteins from each protein
family, thus offering clues to function and biological targets for drug design.
• Knockout studies are one experimental method for understanding the
function of DNA sequences and the proteins they encode. Researchers
inactivate genes in living organisms and monitor any changes that could
reveal the function of specific genes.
• Comparative genomics—analyzing DNA sequence patterns of humans
and well‑studied model organisms side‑by‑side—has become one of the
most powerful strategies for identifying human genes and interpreting their function.
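Side-by-side comparison usually starts from an alignment. Assuming two sequences are already aligned (the alignment step itself is not shown), a minimal sketch computes percent identity over the columns where neither sequence has a gap; the human and mouse fragments and the function name are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical bases over aligned columns, skipping gap ('-') columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("inputs must be rows of the same alignment")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)

human = "ATGCTAGC-TA"   # hypothetical aligned fragment
mouse = "ATGCTGGCATA"   # hypothetical aligned fragment
print(percent_identity(human, mouse))  # 90.0
```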
Nature, Vol. 409, No. 6822, 15 February 2001