Gene expression analysis and transcriptomics Daniel Hurley

Presentation Transcript



Gene expression analysis and transcriptomics
Daniel Hurley



What are we going to talk about?

  • Understanding the core principles and ‘root hypothesis’ of transcriptomics

  • Choosing between different technologies

  • How to design an experiment

  • How to make sense of the data



Core principles: Transcriptomics

  • Transcriptomics is the study of the nature and abundance of transcribed elements in a population of cells or a tissue

  • ‘Transcribed elements’ are:

    • mRNA

    • But also ncRNAs (non-coding RNAs): miRNA, siRNA, piRNA, snoRNA, tRNA, rRNA

    • And many more being discovered



Core principles: Root hypothesis

  • Summarising in one statement:

The central dogma suggests that the abundance of transcribed elements affects cell behaviour and tissue function. Therefore, we hypothesise that comparing the abundance of transcribed elements between different conditions can tell us something about cell behaviour and tissue function in those conditions.

The central dogma suggests that the abundance of mRNA affects protein activity. Therefore, we hypothesise that comparing the abundance of mRNA between different conditions can tell us something about protein activity in those conditions.



But



Core principles: the ‘omics’ part

  • This isn’t very ‘omic’ yet

The central dogma suggests that the abundance of mRNA affects protein activity. Therefore, we hypothesise that comparing the abundance of mRNA between different conditions can tell us something about protein activity in those conditions.

So the ‘omics’ part is about large-scale measurements, and exploratory hypotheses



Core principles: what can you do with it?

Some answers:

  • Ask questions about relationships between specific genes

  • Identify potential drug targets

  • Learn the transcriptomic ‘signature’ of a condition

  • Make functional hypotheses about an uncharacterised gene

  • Classify conditions according to their ‘signature’ (e.g. disease)



Technology: different types of data

‘Transcriptomic’ data can be gathered using a number of methods:

  • Realtime RT-PCR – still regarded as ‘gold standard’ by many. Ubiquitous, but labour-intensive, and not really ‘transcriptome-scale’

  • Microarrays – revolutionary in the 1990s, driving an explosion in bioinformatics. Use has plateaued but still common

  • RNAseq – generating count data from high-throughput sequencing of the transcriptome. Perceived as the method of the future (next slides)

Pretty much everything here also applies to quantitative proteomics and to some extent metabolomics, although we will not discuss them in depth



Technology: Microarrays vs. RNAseq (1)

It’s easy to think that:

  • Microarrays = old

  • RNAseq = the new hotness

  • BUT it’s not that simple



Technology: Microarrays vs. RNAseq (2)

Microarrays

  • Only detect transcripts for which there are probes on the array

  • Generally detect only one type of RNA (e.g. mRNA OR miRNA)

  • Generally do not detect alternative splicing

  • Costs a lot

  • Need to be made specific to a species or condition (e.g. human, mouse, tobacco)

  • Dynamic range said to be less than RNAseq

  • Mature technology: known, reliable ways to analyse data. No arguments

RNAseq

  • Measure every assembled transcript

  • Measure mRNA and ncRNA and everything else

  • Detect alternative splicing

  • Also costs a lot (but getting rapidly cheaper)

  • Similar experimental protocol for every sample type

  • Dynamic range said to be greater than microarrays

  • New technology: no-one is really sure how to analyse the data. Lots of arguments



Technology: Microarrays vs. RNAseq (3)

  • So when should you use one or the other for a gene expression experiment?

  • Availability: If you have a well-characterised and popular organism (human, E. coli, mouse, rat, fruit fly, various plant species) for which a commercial microarray exists, it’s an option. Otherwise it’s RNAseq

  • Breadth: If you think alternative splicing or ncRNA is important in your biological process, then RNAseq might be a better choice

  • Cost: Per-sample, microarrays are lower-cost than RNAseq for human work. For organisms with a smaller transcriptome, the difference is less clear

  • Complexity: If you don’t want to spend a lot of time (= money) on difficult normalisation and bioinformatics decisions, microarrays may be a better choice. RNAseq bioinformatics is still very new (~20 competing R packages doing the same job!)

  • Futureproofing: If you really want to compare this data with future data, RNAseq is likely to be around longer. On the other hand, there is a huge amount of published microarray data to which you may be able to compare results.



Design: how do you do it?

  • Most important, you need a clear and coherent design document. This is important because the cost of repeating experiments is high, and because the data can be bewildering

  • Ask yourself ‘what is the experimental question I am asking?’ Good examples:

    • Which transcripts could be differentially expressed between control and treated samples across all replicates, correcting for variance between replicates?

    • Which transcripts could be differentially expressed between each of the 3 pairwise combinations of tissues across all patients, correcting for inter-patient variance?

    • Are there transcripts which represent the tissue type? That is, transcripts which are more similar across patients than we see between different samples from the same patient?

  • A good experimental design document:

    • Sets out the experimental question

    • Defines the conditions that will be compared

    • Defines the types of comparisons that will be done between conditions (e.g. pairwise comparisons looking for differences, or a before-and-after ‘paired’ analysis)
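One small thing a design document can do is list every planned comparison explicitly. The sketch below enumerates the pairwise comparisons for the tissue example above, assuming three tissues were sampled; the tissue labels are hypothetical placeholders, not names from the slides.

```python
from itertools import combinations

# Hypothetical tissue labels (assumption): the slides do not name the tissues.
tissues = ["tissue_A", "tissue_B", "tissue_C"]

# Every unordered pair of tissues is one planned comparison.
comparisons = list(combinations(tissues, 2))

for first, second in comparisons:
    print(f"{first} vs {second}")
```

Writing the list out like this makes it obvious how quickly the number of comparisons grows as conditions are added, which matters later when correcting for multiple testing.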



Design: variance and replication

  • How many replicates is ‘enough’?

  • The short answer is ‘it depends’

    • On your estimate of effect strength

    • On the signal-to-noise ratio of the detector

    • On the amount of variance within conditions

  • Three observations is the minimum to define a distribution

  • Choose your replication strategy to capture the variance that interests you…

  • …and correct for the variance that doesn’t
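To make ‘it depends’ a little more concrete, here is a rough sketch of how effect strength, within-condition variance and replicate number interact. It assumes two conditions with equal replicate numbers and equal within-condition standard deviation, and all numbers are made up for illustration; it is not a substitute for a proper power calculation.

```python
import math

def standard_error_of_difference(sd_within, n_replicates):
    """Standard error of the difference between two condition means,
    assuming equal within-condition SD and equal replicate numbers."""
    return sd_within * math.sqrt(2.0 / n_replicates)

def signal_to_noise(effect, sd_within, n_replicates):
    """Expected difference between conditions divided by its standard error."""
    return effect / standard_error_of_difference(sd_within, n_replicates)

effect = 1.0     # hypothetical: a two-fold change on the log2 scale
sd_within = 0.5  # hypothetical within-condition SD on the log2 scale

for n in (3, 6, 12):
    print(n, round(signal_to_noise(effect, sd_within, n), 2))
```

Under these assumptions the ratio grows with the square root of the number of replicates, so quadrupling replication only doubles it. A noisier system or a weaker effect needs disproportionately more replicates.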



What’s a ‘condition’?

  • What do I mean when I say a ‘condition’ in an experimental sense?

  • I mean any state of interest in which we can observe a cell population, tissue or organism.

  • Examples:

    • Patient A with melanoma

    • Patient B with melanoma

    • MTX sensitive cancer cell line

    • MTX resistant cancer cell line

    • HeLa cells 24h post-transfection with siRNA against BRCA1

    • HeLa cells 48h post-transfection with siRNA against BRCA1

    • Knockout mouse without Gene X

    • Wild-type mouse with Gene X



Design: variance and p-values

  • Precise interpretation of a p-value is complex

  • But it’s uncontroversial (I think!) to say that it’s a proxy measure of the weight of evidence against a null hypothesis

  • Multiple hypothesis testing problem: when we test thousands of transcripts at once, we are more likely to see what looks like an interesting result due to chance alone

  • Can correct for this using false discovery rate assessment and control

  • The more variance within a condition…

  • …the less convincing a result (= less evidence against the null hypothesis)

P-values capture this intuition in a numeric and rankable form.
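The multiple testing problem mentioned above can be put into numbers with a back-of-envelope sketch. It assumes every null hypothesis is true and the tests are independent, which real transcripts are not, so treat the figures as illustrative only.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the threshold by chance alone,
    assuming every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

alpha = 0.05  # conventional significance threshold
for n_tests in (1, 20, 20000):  # 20000 is a hypothetical transcript count
    print(
        n_tests,
        expected_false_positives(n_tests, alpha),
        round(prob_at_least_one_false_positive(n_tests, alpha), 3),
    )
```

With 20 tests the chance of at least one spurious hit is already around 64%, and with tens of thousands of transcripts an uncorrected 0.05 threshold would be expected to pass on the order of a thousand transcripts by chance.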



Data: handling the output

You’ve done an experiment, and you get a big bunch of data files. Then what?

Analysis approaches

  • Simple fold-change; not recommended – why?

  • LIMMA (R package) is the benchmark for microarrays

  • A raft of packages for RNAseq: edgeR and DESeq are the most common.
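One possible answer to the fold-change question is that fold-change ignores the variance between replicates. The sketch below uses made-up log2 expression values for two hypothetical genes and a plain Welch t-statistic as a stand-in. This is not the moderated statistic that LIMMA computes, only a simple way to show the idea.

```python
import math
import statistics

def log2_fold_change(control, treated):
    """Difference in mean log2 expression (values are already on the log2 scale)."""
    return statistics.mean(treated) - statistics.mean(control)

def welch_t(control, treated):
    """Welch t-statistic: difference in means scaled by its standard error."""
    standard_error = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treated) / len(treated)
    )
    return log2_fold_change(control, treated) / standard_error

# Hypothetical log2 expression values, three replicates per condition.
gene_tight = ([8.0, 8.1, 7.9], [9.0, 9.1, 8.9])    # replicates agree closely
gene_noisy = ([6.0, 8.0, 10.0], [7.0, 9.0, 11.0])  # replicates scatter widely

for name, (control, treated) in (("tight", gene_tight), ("noisy", gene_noisy)):
    print(name, round(log2_fold_change(control, treated), 2),
          round(welch_t(control, treated), 2))
```

Both genes show the same two-fold change, yet the evidence behind them is very different: the first has a large t-statistic and the second a small one. Ranking by fold-change alone would treat them as equally interesting.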



Data: what do the results look like? (1)

Output from a typical differential expression transcriptomic experiment might look something like this:

[Figure: results table with columns grouped into transcript information, model parameters and hypothesis strength data]

Note the sorting, colour-coding and annotation



Data: when do you believe the results?

  • SkepticalHippo says “multiple hypothesis testing is very important”

  • For realtime RT-PCR, a t-test, a chi-square test or Fisher’s exact test works fine

  • The LIMMA package for microarrays incorporates more sophisticated approaches for modelling difference and adjusting for multiple hypothesis testing

    • The Bonferroni correction is often too conservative

    • Benjamini-Hochberg FDR is a pragmatic approach for exploratory bioinformatics

  • Again, various ways of doing this in RNAseq data, but no one clear approach or piece of software
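The difference between the two corrections named above can be seen in a minimal pure-Python sketch. The p-values are made up for illustration, and in practice you would rely on an established implementation (for example the adjustment built into your analysis package) rather than hand-rolled code.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n_tests = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p-values for five transcripts.
p_values = [0.001, 0.008, 0.02, 0.03, 0.60]

print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these five p-values and a 0.05 threshold, Bonferroni keeps two transcripts while Benjamini-Hochberg keeps four. That is the sense in which Bonferroni is the more conservative of the two.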



Data: what do the results look like? (2)

If we zoom in on an individual transcript, it might look like this:

But not everything is differentially expressed!



Data: what do the results look like? (3)

We can get high-altitude views of the data by using:

Each tool represents the data in a different way, and all tell us something important.



Data: ranking heatmaps



Core principles: what can you do with it?

Some answers:

  • Ask questions about relationships between specific genes

  • Identify potential drug targets

  • Learn the transcriptomic ‘signature’ of a condition

  • Make functional hypotheses about an uncharacterised gene

  • Classify conditions according to their ‘signature’ (e.g. disease)



Summary: what to do

  • Spend time clearly defining your experimental question in transcriptomic terms

  • Get advice on technology, experimental design, and research outputs

  • Choose replication and conditions which capture the variance that interests you, and correct for the variance which doesn’t

  • Be conservative about the number of different questions you ask at once; consider pilot and follow-up experiments

  • Keep returning to your data. Actively look for as many ways as possible to visualise similarity and difference within the data



Fin

Any questions?

