Introduction to Bioinformatics 2. Genetics Background


Course 341

Department of Computing

Imperial College, London

© Simon Colton

Coursework


  • 1 coursework – worth 20 marks

    • Work in pairs

  • Retrieving information from a database

  • Using Perl to manipulate that information


The Robot Scientist

  • Performs experiments

  • Learns from results

    • Using machine learning

  • Plans more experiments

  • Saves time and money

  • Team member:

    • Stephen Muggleton


Biological Nomenclature

  • Need to know the meaning of:

    • Species, organism, cell, nucleus, chromosome, DNA

    • Genome, gene, base, residue, protein, amino acid

    • Transcription, translation, messenger RNA

    • Codons, genetic code, evolution, mutation, crossover

    • Polymer, genotype, phenotype, conformation

    • Inheritance, homology, phylogenetic trees

Substructure and Effect (Top Down/Bottom Up)

[Figure: chain of biological substructures linked by "affects the behaviour of" and "affects the function of"; surviving labels: Amino Acid, DNA strand]


Cells


  • Basic unit of life

  • Different types of cell:

    • Skin, brain, red/white blood

    • Different biological function

  • Cells produced by cells

    • Cell division (mitosis)

    • 2 daughter cells

  • Eukaryotic cells

    • Have a nucleus


Nucleus and Chromosomes

  • Each cell has a nucleus

  • Rod-shaped particles inside

    • Are chromosomes

    • Which we think of in pairs

  • Different number for each species

    • Human (46), tobacco (48)

    • Goldfish (94), chimp (48)

    • Usually paired up

  • X & Y Chromosomes

    • Humans: Male (XY), Female (XX)

    • Birds: the reverse pattern, with the chromosomes named Z & W: Male (ZZ), Female (ZW)


DNA Strands

  • Chromosomes are the same in every cell of an organism

    • Supercoiled DNA (Deoxyribonucleic acid)

  • Take a human, take one cell

    • Determine the structure of all chromosomal DNA

    • You’ve just read the human genome (for 1 person)

    • Human genome project

      • 13 years, 3.2 billion chemicals (bases) in human genome

  • Other genomes being/been decoded:

    • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria


DNA Structure

  • Double Helix (Crick & Watson)

    • 2 coiled matching strands

    • Backbone of sugar phosphate pairs

  • Nitrogenous Base Pairs

    • Roughly 20 atoms in a base

    • Adenine  Thymine [A,T]

    • Cytosine  Guanine [C,G]

    • Weak bonds (can be broken)

    • Form long chains called polymers

  • Read the sequence on 1 strand



Differences in DNA

  • DNA differentiates:

    • Species/race/gender

    • Individuals

  • We share DNA with

    • Primates, mammals

    • Fish, plants, bacteria

  • Genotype

    • DNA of an individual

      • Genetic constitution

  • Phenotype

    • Characteristics of the resulting organism

      • Nature and nurture



[Figure annotation: "Share Material", "Roughly 4%"]

Genes


  • Chunks of DNA sequence

    • Between 600 and 1200 bases long

    • 32,000 human genes, 100,000 genes in tulips

  • Large percentage of human genome

    • Is “junk”: does not code for proteins

  • “Simpler” organisms such as bacteria

    • Are much more evolved (have hardly any junk)

    • Viruses have overlapping genes (zipped/compressed)

  • Often the active part of a gene is split into exons

    • Separated by introns
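
The exon/intron split can be sketched in a few lines of Python. This is only an illustration: the gene string, its exon coordinates, and the helper name `splice_exons` are all made up for the example, and lower-case letters are used purely as a visual marker for introns.

```python
def splice_exons(gene, exons):
    """Join the exon regions of a gene, dropping the introns between them.

    `exons` is a list of (start, end) index pairs with `end` exclusive.
    """
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical gene: upper case marks exons, lower case marks introns.
gene = "ATGGCCgtaagtttcagGATTGCgtgagcagTTCTAA"
exons = [(0, 6), (17, 23), (31, 37)]  # assumed coordinates for this toy string

coding = splice_exons(gene, exons)
print(coding)
```

Only the spliced sequence goes on to code for a protein; the intron letters are discarded.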


The Synthesis of Proteins

  • Instructions for generating Amino Acid sequences

    • (i) DNA double helix is unzipped

    • (ii) One strand is transcribed to messenger RNA

    • (iii) RNA acts as a template

      • ribosomes translate the RNA into the sequence of amino acids

  • Amino acid sequences fold into a 3d molecule

  • Gene expression

    • Every cell has every gene in it (has all chromosomes)

    • Which ones produce proteins (are expressed) & when?

Transcription


  • Take one strand of DNA

  • Write out the counterparts to each base

    • G becomes C (and vice versa)

    • A becomes T (and vice versa)

  • Change Thymine [T] to Uracil [U]

  • You have transcribed DNA into messenger RNA

  • Example:


    Intermediate: CCTACGGTTAC

    Transcribed: CCUACGGUUAC
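
The recipe above can be written as a short Python sketch. The function names are illustrative, and the starting strand `GGATGCCAATG` is an assumption: it is not given in the example, but it is the strand whose base-by-base counterparts give the intermediate shown.

```python
# Base pairing: G <-> C and A <-> T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna):
    """Write out the counterpart of each base on a DNA strand."""
    return "".join(COMPLEMENT[base] for base in dna)


def transcribe(dna):
    """Complement the strand, then change thymine (T) to uracil (U)."""
    return complement(dna).replace("T", "U")


strand = "GGATGCCAATG"        # assumed starting strand (not in the slides)
print(complement(strand))     # intermediate
print(transcribe(strand))     # messenger RNA
```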


Genetic Code

  • How the translation occurs

  • Think of this as a function:

    • Input: triples of base letters (codons)

    • Output: amino acid

    • Example: ACC becomes threonine (T)

  • Gene sequences end with:

    • TAA, TAG or TGA
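
Treating the genetic code as a function can be made concrete with a lookup table. The sketch below assumes the standard genetic code written with DNA letters (T rather than U), as in the example above; the compact 64-letter string is a common way of writing that table, and the input sequence is made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Map successive codons to one-letter amino acid codes, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(CODON_TABLE["ACC"])            # threonine, one-letter code T
print(translate("ATGACCGATGAATAA"))  # hypothetical sequence ending in TAA
```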

Genetic Code

[Table: the genetic code, mapping codons to amino acids; surviving legend entries: D = Asp = Aspartic acid, E = Glu = Glutamic acid]



Example Synthesis


    Transcribed to:


    Translated to:


Proteins


  • DNA codes for

    • strings of amino acids

  • Amino acid strings

    • Fold up into a complex 3D molecule

    • 3D structures: conformations

    • Between 200 & 400 “residues”

    • The folded molecules are proteins

  • Residue sequences

    • Always fold to the same conformation

  • Proteins play a part

    • In almost every biological process


Evolution of Genes: Inheritance

  • Evolution of species

    • Caused by reproduction and survival of the fittest

  • But actually, it is the genotype which evolves

    • Organism has to live with it (or die before reproduction)

    • Three mechanisms: inheritance, mutation and crossover

  • Inheritance: properties from parents

    • Embryo has cells with 23 pairs of chromosomes

    • Each pair: 1 chromosome from father, 1 from mother

    • Most important factor in offspring’s genetic makeup


Evolution of Genes: Mutation

  • Genes alter (slightly) during reproduction

    • Caused by errors, from radiation, from toxicity

    • 3 possibilities: deletion, insertion, alteration




  • Mutations are almost always deleterious

    • A single change can have a massive effect on translation

    • Causes a different protein conformation
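
The three kinds of mutation can be sketched on a DNA string to show why one change matters so much. The sequence and function names below are hypothetical. The point of the sketch is that an alteration changes one codon, while a deletion or insertion shifts every codon after it.

```python
def codons(dna):
    """Split a sequence into complete triples, ignoring any leftover bases."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def alter(dna, pos, base):
    """Replace the base at index `pos`."""
    return dna[:pos] + base + dna[pos + 1:]


def insert(dna, pos, base):
    """Insert a base before index `pos`."""
    return dna[:pos] + base + dna[pos:]


def delete(dna, pos):
    """Remove the base at index `pos`."""
    return dna[:pos] + dna[pos + 1:]


original = "ATGACCGATGAATAA"  # hypothetical sequence
print(codons(original))
print(codons(alter(original, 4, "A")))   # one codon differs
print(codons(delete(original, 4)))       # all later codons shift
print(codons(insert(original, 4, "G")))  # all later codons shift
```

In this toy case the insertion happens to create a TGA triple partway through, which would end translation early.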


Evolution of Genes: Crossover (Recombination)

  • DNA sections are swapped

    • From male and female genetic input to offspring DNA
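
A minimal sketch of the swapping idea, assuming the simplest possible model: a single crossover point on two equal-length sequences, as used in genetic algorithms. Real recombination is more involved, and the sequences and the function name are invented for the example.

```python
def crossover(a, b, point):
    """Single-point crossover: swap the tails of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return a[:point] + b[point:], b[:point] + a[point:]


# Hypothetical parental sections, chosen so the swap is easy to see.
paternal = "AAAAAA"
maternal = "CCCCCC"
child_1, child_2 = crossover(paternal, maternal, 2)
print(child_1, child_2)
```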

Bioinformatics Application #1: Phylogenetic Trees

  • Understand our evolution

  • Genes are homologous

    • If they share a common ancestor

  • By looking at DNA seqs

    • For particular genes

    • See who evolved from whom

  • Example:

    • Mammoth most related to

      • African or Indian Elephants?

  • LUCA:

    • Last Universal Common Ancestor

    • Roughly 4 billion years ago


Genetic Disorders

  • Disorders have fuelled much genetics research

    • Remember that genes have evolved to function

      • Not to malfunction

  • Different types of genetic problems

  • Down's syndrome: three chromosome 21s

  • Cystic fibrosis:

    • A small mutation in a single gene (most often a three-base deletion) disables a protein

    • Restricts the flow of ions into certain lung cells

    • Lung is less able to expel fluids

Bioinformatics Application #2: Predicting Protein Structure

  • Proteins fold to set up an active site

    • Small, but highly effective (sub)structure

    • Active site(s) determine the activity of the protein

  • Remember that translation is a function

    • Always the same structure given the same sequence of codons

    • Is there a set of rules governing how proteins fold?

    • No one has found one yet

    • “Holy Grail” of bioinformatics


Protein Structure Knowledge

  • Both protein sequence and structure

    • Are being determined at an exponential rate

  • 1.3+ million protein sequences known

    • Found with projects like Human Genome Project

  • 20,000+ protein structures known

    • Found using techniques like X-ray crystallography

  • Takes between 1 month and 3 years

    • To determine the structure of a protein

    • Process is getting quicker

Sequence versus Structure

[Figure: surviving labels "Protein sequence" and "Protein structure"]



Database Approaches

  • Slow(er) rate of finding protein structure

    • Still a good idea to pursue the Holy Grail

  • Structure is much more conservative than sequence

    • 1.3m genes, but only 2,000 – 10,000 different conformations

  • First approach to structure prediction:

    • Store [sequence,structure] pairs in a database

    • Find ways to score similarity of residue sequences

    • Given a new sequence, find closest matches

      • A good match will possibly mean similar protein shape

      • E.g., sequence identity > 35% will give a good match

    • The rest of the first half of the course is about these issues
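
The database approach can be sketched as a nearest-match lookup. This is a deliberately naive version: it assumes equal-length sequences compared position by position with no gaps, whereas practical methods align the sequences first. The database entries, structure labels, and function names are all hypothetical; the 35% threshold is the figure quoted above.

```python
def identity(a, b):
    """Fraction of positions holding the same residue (equal-length, ungapped)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_match(query, database, threshold=0.35):
    """Return (structure, score) for the most similar stored sequence.

    The structure is None when the best score does not exceed the threshold.
    """
    sequence, structure = max(database, key=lambda entry: identity(query, entry[0]))
    score = identity(query, sequence)
    return (structure if score > threshold else None), score


# Hypothetical [sequence, structure] pairs.
database = [("MKTAYIAKQR", "fold_A"), ("GSHMLEDPVA", "fold_B")]
print(closest_match("MKTAYLAKQR", database))
```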

Potential (Big) Payoffs of Protein Structure Prediction

  • Protein function prediction

    • Protein interactions and docking

  • Rational drug design

    • Inhibit or stimulate protein activity with a drug

  • Systems biology

    • Putting it all together: “E-cell” and “E-organism”

    • In-silico modelling of biological entities and processes


Further Reading

  • Human Genome Project at Sanger Centre


  • Talking glossary of genetic terms


  • Primer on molecular genetics

