Introduction to Bioinformatics 2. Genetics Background


Course 341

Department of Computing

Imperial College, London

© Simon Colton

Coursework
  • 1 coursework – worth 20 marks
    • Work in pairs
  • Retrieving information from a database
  • Using Perl to manipulate that information
The Robot Scientist
  • Performs experiments
  • Learns from results
    • Using machine learning
  • Plans more experiments
  • Saves time and money
  • Team member:
    • Stephen Muggleton
Biological Nomenclature
  • Need to know the meaning of:
    • Species, organism, cell, nucleus, chromosome, DNA
    • Genome, gene, base, residue, protein, amino acid
    • Transcription, translation, messenger RNA
    • Codons, genetic code, evolution, mutation, crossover
    • Polymer, genotype, phenotype, conformation
    • Inheritance, homology, phylogenetic trees
Substructure and Effect (Top Down/Bottom Up)
  • Substructure (top down)
    • Species, Organism, Cell, Nucleus, Chromosome, DNA strand, Gene, Base
  • Effect (bottom up)
    • Gene prescribes Amino Acid
    • Amino Acid folds into Protein
    • Protein affects the function of Cell
    • Cell affects the behaviour of Organism
Cells
  • Basic unit of life
  • Different types of cell:
    • Skin, brain, red/white blood
    • Different biological function
  • Cells produced by cells
    • Cell division (mitosis)
    • 2 daughter cells
  • Eukaryotic cells
    • Have a nucleus
Nucleus and Chromosomes
  • Each eukaryotic cell has a nucleus
  • Rod-shaped particles inside
    • Are chromosomes
    • Which we think of in pairs
  • Number differs between species
    • Human (46), tobacco (48)
    • Goldfish (94), chimp (48)
    • Usually paired up
  • X & Y Chromosomes
    • Humans: male (XY), female (XX)
    • Birds: the reverse, male has the matching pair (written ZZ), female the mixed pair (ZW)
DNA Strands
  • Chromosomes are the same in every cell of an organism
    • Supercoiled DNA (deoxyribonucleic acid)
  • Take a human, take one cell
    • Determine the structure of all chromosomal DNA
    • You’ve just read the human genome (for 1 person)
    • Human genome project
      • 13 years, 3.2 billion chemicals (bases) in human genome
  • Other genomes being/been decoded:
    • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria
DNA Structure
  • Double Helix (Crick & Watson)
    • 2 coiled matching strands
    • Backbone of sugar-phosphate pairs
  • Nitrogenous Base Pairs
    • Roughly 20 atoms in a base
    • Adenine ↔ Thymine [A,T]
    • Cytosine ↔ Guanine [C,G]
    • Weak bonds (can be broken)
    • Form long chains called polymers
  • Read the sequence on 1 strand
    • GATTCATCATGGATCATACTAAC
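As a minimal sketch of the pairing rules on this slide (A with T, C with G), the matching strand can be derived from the strand that was read. The name `complement` is an illustrative choice, not something from the slides, and the sketch ignores strand direction: it simply pairs each base in reading order.

```python
# Pairing rules from the slide: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the matching strand, base by base (strand direction is ignored)."""
    return strand.translate(PAIRS)

strand = "GATTCATCATGGATCATACTAAC"
print(complement(strand))  # CTAAGTAGTACCTAGTATGATTG
```

Applying `complement` twice gives back the original strand, which is why reading one strand is enough.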
Differences in DNA
  • DNA differentiates:
    • Species/race/gender
    • Individuals
  • We share DNA with
    • Primates, mammals
    • Fish, plants, bacteria
  • Genotype
    • DNA of an individual
      • Genetic constitution
  • Phenotype
    • Characteristics of the resulting organism
      • Nature and nurture

[Figure labels: “tiny”, “2%”, “Share Material”, “Roughly 4%”]

Genes
  • Chunks of DNA sequence
    • Between 600 and 1200 bases long
    • About 32,000 human genes by early estimates (later counts are closer to 20,000 protein-coding genes), 100,000 genes in tulips
  • Large percentage of human genome
    • Is “junk”: does not code for proteins
  • “Simpler” organisms such as bacteria
    • Are much more evolved (have hardly any junk)
    • Viruses have overlapping genes (zipped/compressed)
  • Often the active part of a gene is split into exons
    • Separated by introns
The Synthesis of Proteins
  • Instructions for generating Amino Acid sequences
    • (i) DNA double helix is unzipped
    • (ii) One strand is transcribed to messenger RNA
    • (iii) RNA acts as a template
      • ribosomes translate the RNA into the sequence of amino acids
  • Amino acid sequences fold into a 3D molecule
  • Gene expression
    • Every cell has every gene in it (has all chromosomes)
    • Which ones produce proteins (are expressed) & when?
Transcription
  • Take one strand of DNA
  • Write out the counterparts to each base
    • G becomes C (and vice versa)
    • A becomes T (and vice versa)
  • Change Thymine [T] to Uracil [U]
  • You have transcribed DNA into messenger RNA
  • Example:
    • Start: GGATGCCAATG
    • Intermediate: CCTACGGTTAC
    • Transcribed: CCUACGGUUAC
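The two steps of this slide can be sketched in a few lines of Python. This is only an illustration of the simplified rule given here (base-by-base counterparts, then T to U); the function name `transcribe` is an assumption made for the example.

```python
def transcribe(dna):
    """Transcribe one DNA strand into messenger RNA using the slide's two steps."""
    # Step 1: write out the counterpart of each base (G<->C, A<->T).
    intermediate = dna.translate(str.maketrans("ACGT", "TGCA"))
    # Step 2: change thymine (T) to uracil (U).
    return intermediate.replace("T", "U")

print(transcribe("GGATGCCAATG"))  # CCUACGGUUAC
```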

Genetic Code
  • How the translation occurs
  • Think of this as a function:
    • Input: triples of base letters (codons)
    • Output: amino acid
    • Example: ACC becomes threonine (T)
  • Gene sequences end with:
    • TAA, TAG or TGA
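Treating the genetic code as a function can be made concrete with a lookup table. The sketch below assumes the standard genetic code, written with DNA letters and built from the conventional TCAG ordering of the 64 codons; the names `CODON_TABLE` and `amino_acid` are illustrative, not from the slides.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def amino_acid(codon):
    """Map one codon (three DNA letters) to a one-letter amino acid code."""
    return CODON_TABLE[codon]

print(amino_acid("ACC"))  # T (threonine)

# The codons that end a gene sequence are the ones mapped to "*".
stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(stops)  # ['TAA', 'TAG', 'TGA']
```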
Genetic Code
  • A = Ala = Alanine
  • C = Cys = Cysteine
  • D = Asp = Aspartic acid
  • E = Glu = Glutamic acid
  • F = Phe = Phenylalanine
  • G = Gly = Glycine
  • H = His = Histidine
  • I = Ile = Isoleucine
  • K = Lys = Lysine
  • L = Leu = Leucine
  • M = Met = Methionine
  • N = Asn = Asparagine
  • P = Pro = Proline
  • Q = Gln = Glutamine
  • R = Arg = Arginine
  • S = Ser = Serine
  • T = Thr = Threonine
  • V = Val = Valine
  • W = Trp = Tryptophan
  • Y = Tyr = Tyrosine

Example Synthesis
  • TCGGTGAATCTGTTTGAT
  • Transcribed to: AGCCACUUAGACAAACUA
  • Translated to: SHLDKL
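The whole example can be reproduced end to end. The sketch below assumes the standard genetic code over RNA letters (UCAG ordering) and the simplified base-by-base transcription used in these slides; all function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code over RNA letters, in UCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
RNA_CODONS = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

def transcribe(dna):
    """Pair each DNA base with its RNA counterpart (A->U, C->G, G->C, T->A)."""
    return dna.translate(str.maketrans("ACGT", "UGCA"))

def translate(mrna):
    """Read the mRNA three letters at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = RNA_CODONS[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

mrna = transcribe("TCGGTGAATCTGTTTGAT")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```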
Proteins
  • DNA codes for
    • strings of amino acids
  • Amino acid strings
    • Fold up into a complex 3D molecule
    • 3D structures are called conformations
    • Typically between 200 and 400 “residues”
    • The folded molecules are proteins
  • Residue sequences
    • Always fold to the same conformation
  • Proteins play a part
    • In almost every biological process
Evolution of Genes: Inheritance
  • Evolution of species
    • Caused by reproduction and survival of the fittest
  • But actually, it is the genotype which evolves
    • Organism has to live with it (or die before reproduction)
    • Three mechanisms: inheritance, mutation and crossover
  • Inheritance: properties from parents
    • Human embryo has cells with 23 pairs of chromosomes
    • Each pair: 1 chromosome from father, 1 from mother
    • Most important factor in offspring’s genetic makeup
Evolution of Genes: Mutation
  • Genes alter (slightly) during reproduction
    • Caused by copying errors, radiation or toxic chemicals
    • 3 possibilities: deletion, insertion, substitution
  • Deletion: ACGTTGACTC → ACGTGACTC
  • Insertion: ACGTTGACTC → AGCGTTGACTC
  • Substitution: ACGTTGACTC → ACGATGACTT
  • Mutations are almost always deleterious
    • A single change can have a massive effect on translation
    • Causes a different protein conformation
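The three kinds of mutation can be sketched as simple string operations. The function names and zero-based positions below are assumptions chosen to reproduce the slide's examples; note that the slide's substitution example changes two bases (positions 3 and 9).

```python
def delete(seq, pos):
    """Remove the base at index pos."""
    return seq[:pos] + seq[pos + 1:]

def insert(seq, pos, base):
    """Insert a base before index pos."""
    return seq[:pos] + base + seq[pos:]

def substitute(seq, pos, base):
    """Replace the base at index pos."""
    return seq[:pos] + base + seq[pos + 1:]

original = "ACGTTGACTC"
print(delete(original, 3))       # ACGTGACTC
print(insert(original, 1, "G"))  # AGCGTTGACTC
print(substitute(substitute(original, 3, "A"), 9, "T"))  # ACGATGACTT
```

Deletion and insertion change the length of the sequence, so every later codon is read from a shifted position, which is one reason a single change can alter the whole translation.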
Evolution of Genes: Crossover (Recombination)
  • DNA sections are swapped
    • From male and female genetic input to offspring DNA
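As a rough sketch of swapping sections, the example below performs a single-point crossover on two equal-length strings. This is a simplification for illustration only: the function name, the single crossover point and the toy sequences are all assumptions, not details from the slides.

```python
def crossover(maternal, paternal, point):
    """Swap the sections after `point` between two equal-length sequences."""
    if len(maternal) != len(paternal):
        raise ValueError("sequences must have the same length")
    return (maternal[:point] + paternal[point:],
            paternal[:point] + maternal[point:])

# Toy sequences, made up so the swapped sections are easy to see.
child_a, child_b = crossover("AAAAAAAA", "CCCCCCCC", 3)
print(child_a)  # AAACCCCC
print(child_b)  # CCCAAAAA
```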
Bioinformatics Application #1: Phylogenetic Trees
  • Understand our evolution
  • Genes are homologous
    • If they share a common ancestor
  • By looking at DNA sequences
    • For particular genes
    • See who evolved from whom
  • Example:
    • Mammoth most related to
      • African or Indian Elephants?
  • LUCA:
    • Last Universal Common Ancestor
    • Roughly 4 billion years ago
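One simple way to compare DNA sequences for the same gene is to count the positions at which they differ: fewer differences suggest a more recent common ancestor. The sketch below assumes the sequences are already aligned to the same length; the sequences and species names are made up for illustration and are not real data.

```python
from itertools import combinations

def differences(a, b):
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

# Made-up toy sequences for the same gene in three hypothetical species.
genes = {
    "species_1": "ACGTTGACTC",
    "species_2": "ACGTTGACTT",
    "species_3": "ACCTAGACGT",
}

# The pair with the fewest differences is the candidate for closest relatives.
closest = min(combinations(genes, 2),
              key=lambda pair: differences(genes[pair[0]], genes[pair[1]]))
print(closest)  # ('species_1', 'species_2')
```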
Genetic Disorders
  • Disorders have fuelled much genetics research
    • Remember that genes have evolved to function
      • Not to malfunction
  • Different types of genetic problems
  • Down's syndrome: three copies of chromosome 21
  • Cystic fibrosis:
    • A small mutation in a single gene (most often a three-base deletion) disables a protein
    • Restricts the flow of ions into certain lung cells
    • Lung is less able to expel fluids
Bioinformatics Application #2: Predicting Protein Structure
  • Proteins fold to set up an active site
    • Small, but highly effective (sub)structure
    • Active site(s) determine the activity of the protein
  • Remember that translation is a function
    • Always the same structure given the same sequence of codons
    • Is there a set of rules governing how proteins fold?
    • No one has found one yet
    • “Holy Grail” of bioinformatics
Protein Structure Knowledge
  • Both protein sequence and structure
    • Are being determined at an exponential rate
  • 1.3+ Million protein sequences known
    • Found with projects like Human Genome Project
  • 20,000+ protein structures known
    • Found using techniques like X-ray crystallography
  • Takes between 1 month and 3 years
    • To determine the structure of a protein
    • Process is getting quicker
Sequence versus Structure
  • [Chart: number of known protein sequences and protein structures (axis 0 to 500,000) against year (1985 to 2000)]

Database Approaches
  • Slow(er) rate of finding protein structure
    • Still a good idea to pursue the Holy Grail
  • Structure is much more conserved than sequence
    • 1.3m sequences, but only 2,000 to 10,000 different conformations
  • First approach to structure prediction:
    • Store [sequence,structure] pairs in a database
    • Find ways to score similarity of residue sequences
    • Given a new sequence, find closest matches
      • A good match will possibly mean similar protein shape
      • E.g., sequence identity > 35% will give a good match
    • The rest of the first half of the course is about these issues
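The database approach can be sketched as a nearest-match lookup. The example below is a minimal illustration under strong assumptions: the stored sequences and structure labels are hypothetical, similarity is plain sequence identity, and the sequences are assumed to be already aligned to the same length.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical [sequence, structure] pairs; the sequences and labels are made up.
DATABASE = [
    ("SHLDKLAVGT", "fold_A"),
    ("MKTWYENPQR", "fold_B"),
    ("SHIDKLGVCT", "fold_C"),
]

def closest_structure(query, threshold=0.35):
    """Return (structure, identity) for the best match, or None below the threshold."""
    sequence, structure = max(DATABASE, key=lambda pair: identity(query, pair[0]))
    score = identity(query, sequence)
    return (structure, score) if score > threshold else None

print(closest_structure("SHLDKLAVCT"))  # ('fold_A', 0.9)
```

The default threshold mirrors the 35% sequence identity figure on the slide; a query with no match above it returns `None` rather than a misleading structure.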
Potential (Big) Payoffs of Protein Structure Prediction
  • Protein function prediction
    • Protein interactions and docking
  • Rational drug design
    • Inhibit or stimulate protein activity with a drug
  • Systems biology
    • Putting it all together: “E-cell” and “E-organism”
    • In-silico modelling of biological entities and processes
Further Reading
  • Human Genome Project at Sanger Centre
    • http://www.sanger.ac.uk/HGP/
  • Talking glossary of genetic terms
    • http://www.genome.gov/glossary.cfm
  • Primer on molecular genetics
    • http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html