
Introduction to Bioinformatics 2. Genetics Background

Course 341

Department of Computing

Imperial College, London

© Simon Colton


Coursework

  • 1 coursework – worth 20 marks

    • Work in pairs

  • Retrieving information from a database

  • Using Perl to manipulate that information


The Robot Scientist

  • Performs experiments

  • Learns from results

    • Using machine learning

  • Plans more experiments

  • Saves time and money

  • Team member:

    • Stephen Muggleton


Biological Nomenclature

  • Need to know the meaning of:

    • Species, organism, cell, nucleus, chromosome, DNA

    • Genome, gene, base, residue, protein, amino acid

    • Transcription, translation, messenger RNA

    • Codons, genetic code, evolution, mutation, crossover

    • Polymer, genotype, phenotype, conformation

    • Inheritance, homology, phylogenetic trees


Substructure and Effect (Top Down/Bottom Up)

[Diagram: a substructure chain (Species, Organism, Cell, Nucleus, Chromosome, DNA strand, Gene, Base) shown alongside Amino Acid and Protein, with arrows labelled "Prescribes", "Folds into", "Affects the Function of" and "Affects the Behaviour of".]

Cells

  • Basic unit of life

  • Different types of cell:

    • Skin, brain, red/white blood

    • Different biological function

  • Cells produced by cells

    • Cell division (mitosis)

    • 2 daughter cells

  • Eukaryotic cells

    • Have a nucleus


Nucleus and Chromosomes

  • Each cell has a nucleus

  • Rod-shaped particles inside

    • Are chromosomes

    • Which we think of in pairs

  • Different number for each species

    • Human (46), tobacco (48)

    • Goldfish (94), chimp (48)

    • Usually paired up

  • Sex chromosomes

    • Humans (X & Y): Male (XY), Female (XX)

    • Birds (Z & W, pattern reversed): Male (ZZ), Female (ZW)


DNA Strands

  • Chromosomes are the same in every cell of an organism

    • Supercoiled DNA (Deoxyribonucleic acid)

  • Take a human, take one cell

    • Determine the structure of all chromosomal DNA

    • You’ve just read the human genome (for 1 person)

    • Human genome project

      • 13 years, 3.2 billion chemicals (bases) in human genome

  • Other genomes being/been decoded:

    • Pufferfish, fruit fly, mouse, chicken, yeast, bacteria


DNA Structure

  • Double Helix (Crick & Watson)

    • 2 coiled matching strands

    • Backbone of sugar phosphate pairs

  • Nitrogenous Base Pairs

    • Roughly 20 atoms in a base

    • Adenine  Thymine [A,T]

    • Cytosine  Guanine [C,G]

    • Weak bonds (can be broken)

    • Form long chains called polymers

  • Read the sequence on 1 strand

    • GATTCATCATGGATCATACTAAC


Differences in DNA

  • DNA differentiates:

    • Species/race/gender

    • Individuals

  • We share DNA with

    • Primates, mammals

    • Fish, plants, bacteria

  • Genotype

    • DNA of an individual

      • Genetic constitution

  • Phenotype

    • Characteristics of the resulting organism

      • Nature and nurture

[Diagram annotations: "Share Material", "tiny", "2%", "Roughly 4%"]


Genes

  • Chunks of DNA sequence

    • Between 600 and 1200 bases long

    • 32,000 human genes, 100,000 genes in tulips

  • Large percentage of human genome

    • Is “junk”: does not code for proteins

  • “Simpler” organisms such as bacteria

    • Are much more evolved (have hardly any junk)

    • Viruses have overlapping genes (zipped/compressed)

  • Often the active part of a gene is split into exons

    • Separated by introns
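
As a rough illustration of exons and introns, the sketch below joins the exon segments of a toy gene and drops the intron between them. The sequence, the coordinates and the function name `splice_exons` are invented for illustration and are not taken from the course material.

```python
def splice_exons(gene, exons):
    """Join the exon segments of a gene, dropping the introns between them.

    `exons` is a list of (start, end) pairs, 0-based with `end` exclusive.
    """
    return "".join(gene[start:end] for start, end in sorted(exons))


# Toy gene (made up): lowercase letters mark the intron for readability.
toy_gene = "ATGGCC" + "gtaagtttag" + "TTCTAA"
toy_exons = [(0, 6), (16, 22)]  # hypothetical exon coordinates

coding = splice_exons(toy_gene, toy_exons)
print(coding)  # ATGGCCTTCTAA
```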


The Synthesis of Proteins

  • Instructions for generating Amino Acid sequences

    • (i) DNA double helix is unzipped

    • (ii) One strand is transcribed to messenger RNA

    • (iii) RNA acts as a template

      • ribosomes translate the RNA into the sequence of amino acids

  • Amino acid sequences fold into a 3D molecule

  • Gene expression

    • Every cell has every gene in it (has all chromosomes)

    • Which ones produce proteins (are expressed) & when?


Transcription

  • Take one strand of DNA

  • Write out the counterparts to each base

    • G becomes C (and vice versa)

    • A becomes T (and vice versa)

  • Change Thymine [T] to Uracil [U]

  • You have transcribed DNA into messenger RNA

  • Example:

    Start: GGATGCCAATG

    Intermediate: CCTACGGTTAC

    Transcribed: CCUACGGUUAC
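
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration of the slide's simplified model (it ignores strand direction, as the slide does); the name `transcribe` is chosen here for illustration.

```python
# Base-pairing rules from the slide: G <-> C and A <-> T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(dna):
    """Transcribe one DNA strand into messenger RNA in two steps."""
    intermediate = dna.upper().translate(COMPLEMENT)  # write out the counterparts
    return intermediate.replace("T", "U")             # change thymine to uracil


print(transcribe("GGATGCCAATG"))  # CCUACGGUUAC
```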


Genetic Code

  • How the translation occurs

  • Think of this as a function:

    • Input: triples of base letters (codons)

    • Output: amino acid

    • Example: ACC becomes threonine (T)

  • Gene sequences end with:

    • TAA, TAG or TGA
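
The "function" view of the genetic code can be written down directly as a lookup table. The sketch below builds the standard genetic code from a compact 64-letter string; the names `CODON_TABLE` and `amino_acid_for` are illustrative, and `*` is used here to mark the three stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_for(codon):
    """Return the one-letter amino acid for a DNA codon, or "*" for a stop codon."""
    return CODON_TABLE[codon.upper()]


print(amino_acid_for("ACC"))  # T (threonine)
print([c for c, aa in CODON_TABLE.items() if aa == "*"])  # ['TAA', 'TAG', 'TGA']
```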


Genetic Code

A=Ala=Alanine

C=Cys=Cysteine

D=Asp=Aspartic acid

E=Glu=Glutamic acid

F=Phe=Phenylalanine

G=Gly=Glycine

H=His=Histidine

I=Ile=Isoleucine

K=Lys=Lysine

L=Leu=Leucine

M=Met=Methionine

N=Asn=Asparagine

P=Pro=Proline

Q=Gln=Glutamine

R=Arg=Arginine

S=Ser=Serine

T=Thr=Threonine

V=Val=Valine

W=Trp=Tryptophan

Y=Tyr=Tyrosine


Example Synthesis

  • TCGGTGAATCTGTTTGAT

    Transcribed to:

  • AGCCACUUAGACAAACUA

    Translated to:

  • SHLDKL
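
The worked example can be checked end to end with a short script. This is a sketch under the slide's simplified model (complement, then T to U, then read codons left to right); the function names are illustrative, and the codon table is the standard genetic code written with U in place of T.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# RNA codon table in UCAG order; "*" marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Complement each base, then replace T with U."""
    return dna.upper().translate(COMPLEMENT).replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("TCGGTGAATCTGTTTGAT")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```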


Proteins

  • DNA codes for

    • strings of amino acids

  • Amino acid strings

    • Fold up into a complex 3D molecule

    • 3D structures: conformations

    • Between 200 & 400 “residues”

    • The folded molecules are proteins

  • Residue sequences

    • Always fold to the same conformation

  • Proteins play a part

    • In almost every biological process


Evolution of Genes: Inheritance

  • Evolution of species

    • Caused by reproduction and survival of the fittest

  • But actually, it is the genotype which evolves

    • Organism has to live with it (or die before reproduction)

    • Three mechanisms: inheritance, mutation and crossover

  • Inheritance: properties from parents

    • Embryo has cells with 23 pairs of chromosomes

    • Each pair: 1 chromosome from father, 1 from mother

    • Most important factor in offspring’s genetic makeup


Evolution of Genes: Mutation

  • Genes alter (slightly) during reproduction

    • Caused by errors, from radiation, from toxicity

    • 3 possibilities: deletion, insertion, substitution (alteration)

  • Deletion: ACGTTGACTC → ACGTGACTC

  • Insertion: ACGTTGACTC → AGCGTTGACTC

  • Substitution: ACGTTGACTC → ACGATGACTT

  • Mutations are almost always deleterious

    • A single change can have a massive effect on translation

    • Can cause a different protein conformation
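
The three kinds of mutation can be mimicked with simple string operations. The helper names below are illustrative only. Note that the slide's substitution example differs from the original sequence at two positions, so reproducing it takes two substitutions.

```python
def delete(seq, i):
    """Remove the base at index i."""
    return seq[:i] + seq[i + 1:]


def insert(seq, i, base):
    """Insert a base before index i."""
    return seq[:i] + base + seq[i:]


def substitute(seq, i, base):
    """Replace the base at index i."""
    return seq[:i] + base + seq[i + 1:]


original = "ACGTTGACTC"
print(delete(original, 3))       # ACGTGACTC
print(insert(original, 1, "G"))  # AGCGTTGACTC
print(substitute(substitute(original, 3, "A"), 9, "T"))  # ACGATGACTT
```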


Evolution of Genes: Crossover (Recombination)

  • DNA sections are swapped

    • From male and female genetic input to offspring DNA
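
A heavily simplified picture of crossover is a single swap point between two sequences of equal length. The sketch below assumes exactly that; real recombination is more involved, and the function name and toy sequences are invented for illustration.

```python
def crossover(maternal, paternal, point):
    """Swap the tails of two equal-length sequences at `point`."""
    if len(maternal) != len(paternal):
        raise ValueError("sequences must have the same length")
    child_a = maternal[:point] + paternal[point:]
    child_b = paternal[:point] + maternal[point:]
    return child_a, child_b


# Toy sequences (made up) chosen so the swapped sections are easy to see.
print(crossover("AAAAAAAA", "CCCCCCCC", 3))  # ('AAACCCCC', 'CCCAAAAA')
```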


Bioinformatics Application #1: Phylogenetic Trees

  • Understand our evolution

  • Genes are homologous

    • If they share a common ancestor

  • By looking at DNA seqs

    • For particular genes

    • See who evolved from who

  • Example:

    • Mammoth most related to

      • African or Indian Elephants?

  • LUCA:

    • Last Universal Common Ancestor

    • Roughly 4 billion years ago
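
One very crude way to ask "which is more closely related?" is to count the positions at which aligned sequences differ and pick the candidate with the fewest differences. The sketch below assumes the sequences are already aligned and of equal length. The sequences and species labels are invented placeholders, not real mammoth or elephant DNA.

```python
def mismatches(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Invented placeholder data for illustration only.
query = "ACGTACGTAC"
candidates = {
    "species_1": "ACGTACGAAC",
    "species_2": "ACCTACGATC",
}

closest = min(candidates, key=lambda name: mismatches(query, candidates[name]))
print(closest)  # species_1
```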


Genetic Disorders

  • Disorders have fuelled much genetics research

    • Remember that genes have evolved to function

      • Not to malfunction

  • Different types of genetic problems

  • Down's syndrome: three copies of chromosome 21

  • Cystic fibrosis:

    • Mutation in a single gene (most commonly a three-base deletion) disables a protein

    • Restricts the flow of ions into certain lung cells

    • Lung is less able to expel fluids


Bioinformatics Application #2: Predicting Protein Structure

  • Proteins fold to set up an active site

    • Small, but highly effective (sub)structure

    • Active site(s) determine the activity of the protein

  • Remember that translation is a function

    • Always the same structure given the same sequence of codons

    • Is there a set of rules governing how proteins fold?

    • No one has found one yet

    • “Holy Grail” of bioinformatics


Protein Structure Knowledge

  • Both protein sequence and structure

    • Are being determined at an exponential rate

  • 1.3+ million protein sequences known

    • Found with projects like Human Genome Project

  • 20,000+ protein structures known

    • Found using techniques like X-ray crystallography

  • Takes between 1 month and 3 years

    • To determine the structure of a protein

    • Process is getting quicker


Sequence versus Structure

[Chart: number of known protein sequences and protein structures by year, 1985 to 2000; vertical axis "Number" from 0 to 500,000, horizontal axis "Year".]


Database Approaches

  • Slow(er) rate of finding protein structure

    • Still a good idea to pursue the Holy Grail

  • Structure is much more conservative than sequence

    • 1.3m genes, but only 2,000 – 10,000 different conformations

  • First approach to structure prediction:

    • Store [sequence,structure] pairs in a database

    • Find ways to score similarity of residue sequences

    • Given a new sequence, find closest matches

      • A good match will possibly mean similar protein shape

      • E.g., sequence identity > 35% will give a good match

    • The rest of the first half of the course is about these issues
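
The database approach can be sketched as a scoring function plus a lookup. The version below assumes a deliberately naive similarity score (the fraction of identical positions between sequences of equal length, with no alignment) and applies the 35% identity cut-off from the slide. The database entries, the structure labels and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length residue sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_structures(query, database, threshold=0.35):
    """Return (identity, structure) pairs scoring above the threshold, best first."""
    hits = [
        (identity(query, sequence), structure)
        for sequence, structure in database
        if len(sequence) == len(query)  # naive: skip sequences we cannot compare
    ]
    return sorted((hit for hit in hits if hit[0] > threshold), reverse=True)


# Hypothetical [sequence, structure] pairs for illustration only.
database = [
    ("SHLDKLAG", "fold_A"),
    ("MKTAYIAK", "fold_B"),
]

print(closest_structures("SHLEKLAG", database))  # [(0.875, 'fold_A')]
```

A realistic scoring scheme would align the sequences first and weight substitutions rather than count exact matches.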


Potential (Big) Payoffs of Protein Structure Prediction

  • Protein function prediction

    • Protein interactions and docking

  • Rational drug design

    • Inhibit or stimulate protein activity with a drug

  • Systems biology

    • Putting it all together: “E-cell” and “E-organism”

    • In-silico modelling of biological entities and processes


Further Reading

  • Human Genome Project at Sanger Centre

    • http://www.sanger.ac.uk/HGP/

  • Talking glossary of genetic terms

    • http://www.genome.gov/glossary.cfm

  • Primer on molecular genetics

    • http://www.ornl.gov/TechResources/Human_Genome/publicat/primer/toc.html

