- By **sora**
## Research Philosophy and Research Methodology


### Tell the difference

Research Philosophy and Research Methodology

Prof. Rudolf Wu

Biology & Chemistry Department

City University of Hong Kong

Outline

- What is research and what is good research?
- Different types of research
- Key processes in conducting research
- General principles in experimental design

Philosophy

- “Study of the truths and principles of the universe, life, and morals, and of human understanding of these” (Oxford dictionary)

What is research?

- Collection of data
- Analysis of data
- Interpretation of data

What is research?

- Is conducting a polling survey “research”?

Good research: Novelty and Originality

- Never done before
- Advance our knowledge and understanding

Enough? Anything else?

Good research: Generalization

- DNA structure (James Watson & Francis Crick, Nobel Laureate 1962)
- Genetic code for all living organisms
- Explain replication and protein synthesis

Good research: Prediction

- Chemical bonding and forces between molecules (Linus Pauling, Chemistry Nobel Laureate 1954)
- Predict molecular interactions and chemical reactions

Good research: Wide application

- Polymerase Chain Reaction (Kary Mullis, Nobel Laureate in Chemistry 1993)
- Make 1 million copies of DNA within hours
- Wide application in medicine, forensic sciences, molecular biology, genetics, biotechnology
- Form the basis of paleobiology

Good research: Wide application

- Fiber optics (Charles Kao and George Hockham, 1966)
- Showed that light loss in glass fiber is caused by scattering and by absorption due to impurities
- Led to the development of silica fiber pure enough to carry IR for > 100 km, speeding up transmission of signals and lowering energy requirements

Good research: Wide application

- Microsoft Windows
- Search engines (e.g. Google)

Vital elements of good research

- Originality & Novelty
- Able to make generalization
- Able to make prediction
- Wide application

Ask yourself this question:

“How about my own research project?”

Types of Research

- Discovery
- Technique development
- Hypothesis testing

Types of research

- Discovery (Fact finding)
- How many species of tree are there on Lantau island?
- What are the no. and proportion of yellow hair and white hair on my dog?

Types of research

1. Discovery (Fishing expedition)

- Which species of fish is most sensitive to cadmium?
- What is the optimal temperature and pH for crystallizing compound A?

Types of research

2. Technique development

- Lower the detection limit of chemicals
- New/more user friendly program for faster calculation
- Improve the resolution of microscope / image recognition/telescope

Types of research

3. Hypothesis testing

- Can Drug A increase blood pressure?
- Can Chemical A increase the reaction rate?

IMPORTANT: What makes you think that Drug A can increase blood pressure?

[Cartoon: “I’ll go fishing Sunday 4 pm. I hope there will be a lot of fish.” versus “I’ll go fishing Sunday 4 pm, because at that time the water temperature should be 20 °C, tidal current should be minimal, and October should be the spawning season of groupers.”]

Discovery research may also have very significant impact

- Penicillin (Alexander Fleming, Howard Florey & EB Chain, Nobel Laureates, 1945)
- Superconductor (Heike Onnes, who observed no electrical resistance in mercury below 4.2 K)

Types of research

- Discovery (Fact finding)
- How many species of tree are there on Lantau island?
- What are the no. and proportion of yellow hair and white hair on my dog?

It would be entirely different and becomes good research if:

- There are good indications that white hair and yellow hair on your dog and your neighbors’ dogs all appear in a certain proportion (e.g. 3 White: 1 Yellow)

It becomes good research because

- Once you have verified this fact, you can generalize and make predictions about other dogs in HK, China or, even better, worldwide

[Figure: widening scope of generalization: Hong Kong, China, Worldwide]

It would be even better research if you further ask the question:

- Why is this 3:1 ratio found in all dogs?

Remember how Mendel did his experiments on peas and came up with the 9:3:3:1 ratio in genetics?

How about your research project?

- Fact finding?
- Fishing expedition?
- Technique development?
- Hypothesis testing?

Conceptual model

Formulate testable hypothesis

Design & carry out experiment to test the hypothesis

Analyse, interpret and compare data

Extrapolation, generalization & prediction

OR

Application

Ask a Question

- Is reproductive output of fish lower when oxygen level is low?

Build a Conceptual model

[Diagram: Low oxygen → reduced sex hormone and reduced feeding → reduced energy available for reproduction → reduced gonad development → reduced no. & quality of egg/sperm and fertilization success]

Formulate testable hypotheses

- Low oxygen reduces feeding?
- Low oxygen reduces level of sex hormones (testosterone and/or estradiol or gonadotropins)?
- Low oxygen reduces energy channeled to reproduction (A smaller gonad)?
- Low oxygen affects gonad development (less mature sperm and eggs)?
- Low oxygen reduces no. of eggs and sperm, gamete quality (sperm motility, size of egg) and fertilization success?

Hypothesis testing

- In statistics, nothing is ever “proved”
- Hypotheses are only rejected as “unlikely”, and their logical counterparts are therefore accepted

[Cartoon, in memory of Peter Larkin. Mathematician: “I give up!” Statistician: “If I try 1 million times, I may be able to get there.” Distance labels: 5 m, 4 m.]

Hypothesis testing

- Null hypothesis (Ho): x = y
- Alternative hypothesis (H1): x ≠ y
- Set a probability level (α) that you are prepared to accept (e.g. 0.05, 0.01)
- Do your experiment
- Perform an appropriate statistical test on your experimental data (calculate the probability p)
- If p < α: reject Ho (because it is unlikely), accept H1
- If p ≥ α: do not reject Ho (the data are consistent with it); H1 is not supported
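
The decision rule above can be sketched in Python. This is only an illustration, not the method used in the slides: it swaps in an exact permutation test because that needs nothing beyond the standard library. The readings and the names `permutation_p_value`, `control` and `treated` are made up for the example.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-9:
            extreme += 1
    return extreme / count

ALPHA = 0.05                                # probability level set in advance
control = [100, 102, 98, 101, 99, 100]      # hypothetical readings
treated = [110, 112, 108, 111, 109, 113]    # hypothetical readings

p = permutation_p_value(control, treated)
decision = "reject Ho" if p < ALPHA else "do not reject Ho"
print(f"p = {p:.4f}: {decision}")
```

With 6 + 6 observations there are only 924 relabellings, so full enumeration is cheap; for larger samples a random subset of relabellings would be the usual substitute.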

It would be more likely to reject your null hypothesis if

- The effect is larger
- The sample size is larger
- The α value is larger

Experimental Design: Some General Principles

- Control
- Confounding factors
- Signal to noise ratio
- Randomization
- Error control
- Treatment and level of treatment
- Replication & Optimal sample size

Experimental Design: Controls

- Set up proper Control to compare with Treatment (all conditions in your Controls should be exactly as those in your “Treatments”, except without treatment)
- Treatment vs No Treatment
- Before and after

Question

- How do you set up a proper control if you want to test whether a new Drug can lower blood pressure?

- Blind (placebo)
- Double blind

Experimental Design: Confounding Factors

- Confounding factors (e.g. sex, size, different batch/origin of materials) may affect your result
- It may be desirable to control these confounding factors in order to minimize their effects (e.g. use the same sex, same size and same batch of materials in your experiment)
- Question: What is the problem in doing this? Is it a good thing or bad thing?

Signal to noise ratio

- Noise (Natural variations)
- Signal (effect that you want to detect)
- You cannot measure any signal if it is less than noise
- If noise is very high:
- You can only detect big difference OR
- You have to increase your sample size OR
- You have to reduce noise

Signal to noise ratio

- If your control = 100 ± 30 mg/L, you can only detect a “signal” that causes a > 30% change
- This is the reason why we always try to control and standardize size, age, source, reproductive stage, tissue (e.g. right lobe of liver), sampling method etc., to minimize noise so that we can detect signal more easily.
- Noise is generally high in field studies (100-200%), moderate in physiological studies (10%) and low in chemical (2-3%) and physical studies (<0.5%)

Experimental Design: Randomization

- Most (if not all) statistical tests assume that samples consist of individual observations drawn at random from the population. Your experimental results may be invalid if this is not so.

[Figure: individual observations drawn from a population (mean μ1) form a sample whose mean x1 is an estimate of μ1]

Experimental Design: Randomization

- Question: How do you know that you are taking representative samples in your experiment?
- Question: How do you ensure your sampling is random?

Random Sampling

- All units in the sampling area must have an equal probability of being selected in order to provide an unbiased estimate

[Figure: homogeneous/uniform distribution (rare)]
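
A minimal sketch of simple random sampling, assuming the sampling area has been divided into a 10 × 10 grid of quadrats. The grid, the sample size and the name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(units, k, seed=None):
    """Draw k distinct units, each with an equal probability of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(units), k)

# Hypothetical sampling area: quadrats labelled (row, column).
quadrats = [(row, col) for row in range(10) for col in range(10)]
chosen = simple_random_sample(quadrats, k=8, seed=1)
print(chosen)
```

Letting a random number generator pick the units, rather than choosing them by eye, is what keeps unconscious preferences out of the sample.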

Random Sampling

- If the population/distribution is heterogeneous, random sampling is particularly important in order to get an unbiased estimate

[Figure: heterogeneous distribution]

Stratified Random Sampling

- Population may be divided into “strata” if there are clearly defined groups
- Sample each “stratum” independently and randomly
- No. of units sampled is proportional to the total no. of units in each stratum or the size of the stratum
- Strata should be included as a predictor variable in the model

$$\bar{Y} = \sum_{h=1}^{l} W_h \bar{Y}_h$$

Where: $W_h$ = proportion of total units in stratum $h$; $\bar{Y}_h$ = mean of stratum $h$; $l$ = number of strata

[Figure: strata labelled High, Medium, Low]
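
A short sketch of proportional allocation and the weighted stratum mean, using three strata named as in the figure. The stratum sizes, the counts and the function names are invented for illustration.

```python
from statistics import mean

def proportional_allocation(stratum_sizes, total_samples):
    """Split total_samples across strata in proportion to stratum size.

    Rounding can make the parts sum to slightly more or less than
    total_samples when the proportions are not whole numbers.
    """
    total_units = sum(stratum_sizes.values())
    return {name: round(total_samples * size / total_units)
            for name, size in stratum_sizes.items()}

def stratified_mean(stratum_sizes, stratum_samples):
    """Weighted mean: sum over strata of W_h times the stratum mean."""
    total_units = sum(stratum_sizes.values())
    return sum(stratum_sizes[name] / total_units * mean(values)
               for name, values in stratum_samples.items())

# Hypothetical number of sampling units in each stratum.
sizes = {"High": 200, "Medium": 500, "Low": 300}
allocation = proportional_allocation(sizes, total_samples=20)

# Hypothetical measurements, one list per stratum, sized as allocated.
samples = {
    "High": [2, 4, 3, 3],
    "Medium": [8, 12, 10, 9, 11, 10, 10, 9, 11, 10],
    "Low": [20, 22, 18, 21, 19, 20],
}
estimate = stratified_mean(sizes, samples)
print(allocation, round(estimate, 2))
```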

Systematic Sampling

- Equally spaced
- Spatial: e.g. a plot every 10 km along a transect
- Time: e.g. every 10 days
- Interested in changes along a gradient
- Run into risk with an unknown gradient

Experimental Design: Error Control

- In any experiment, it is essential to
- Identify the major sources of variability of your data, and
- bring the variability under control

Have you done this in your experiment?

Error control

Example 1: Estimate no. of intertidal animals using quadrats

- Different shores 10-1000 (100 times)
- Different tidal levels 10-500 (50 times)
- Different quadrats 10-50 (5 times)

There is no point in counting animals accurately within each quadrat. You should spend more effort on sampling different shores.

Error control

Example 2: Compare mercury levels in fish sampled from a polluted site and a clean site

- Sites 50%
- Individuals 7%
- Tissue 200% (liver = 50× muscle)
- Sample 1%
- You cannot detect any difference if you analyze the whole fish (because the data will depend on the size of the liver).
- You can save some effort by pooling the same tissue from different fish for Hg analysis
- There is little merit in refining your analytical technique or in using high resolution equipment

Error control

You should always concentrate your effort on controlling large errors. You may neglect small errors.

Consumption = Growth + Respiration + Excretion + Feces

100% = 37% + 50% + 3% + 10%

There is no need to measure excretion. Instead, spend most of your effort on obtaining a more accurate estimate of respiration.
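
The arithmetic behind this advice can be sketched as follows. The budget percentages come from the slide; the equal 10% relative measurement error on every component is purely an assumption for illustration.

```python
# Energy budget from the slide, as percentages of consumption.
budget = {"Growth": 37.0, "Respiration": 50.0, "Excretion": 3.0, "Feces": 10.0}

# Assumption: every component is measured with the same 10% relative error.
RELATIVE_ERROR = 0.10

# Absolute error each component contributes, in percentage points of consumption.
absolute_error = {name: share * RELATIVE_ERROR for name, share in budget.items()}
largest = max(absolute_error, key=absolute_error.get)

for name, err in sorted(absolute_error.items(), key=lambda item: -item[1]):
    print(f"{name:12s} +/- {err:.1f} percentage points")
print("Largest contributor:", largest)
```

Under that assumption respiration contributes far more absolute error than excretion, so halving the error on respiration gains much more than measuring excretion perfectly.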

Error control

- If you want to compare concentration of mercury in fish from three different sites, error may derive from different:
- Sites
- Water depth
- Species
- Size
- Season
- Individuals
- Tissues

Error control

If you (a) can only afford to do a fixed no. of samples (say 50), and (b) have some idea about the variation associated with each sampling level, a hierarchical sampling design can help you to optimize the no. of samples among the various levels to minimize error and give you the max. power:

- Water depth 12
- Species 20
- Size 5
- Season 2
- Individuals 8
- Tissues 3

Experiment: Effects of nutrients on plant growth

Put 1 to 5 times the nutrient dose in flower pots and measure growth after 1 week

[Figure: layout showing a heater and three blocks (Block 1, Block 2, Block 3), each block holding treatments T1, T2 and T3; size labels Large, Medium, Small]

- Group fish in blocks according to their size (“alikes”)
- Assign treatment (T1-T3) at random to individuals within each block
- This reduces the within-group SS more than the within-group df, so MS within falls and F rises, making it more likely to reject Ho
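
A minimal sketch of the randomized block assignment described above: fish are grouped by size, and each treatment is given to one randomly chosen individual in each block. The fish IDs and the name `randomized_block_design` are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once, in random order, within each block."""
    rng = random.Random(seed)
    design = {}
    for block_name, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # independent shuffle for each block
        design[block_name] = dict(zip(units, order))
    return design

# Hypothetical fish IDs, grouped into blocks of similar size ("alikes").
blocks = {
    "Large": ["L1", "L2", "L3"],
    "Medium": ["M1", "M2", "M3"],
    "Small": ["S1", "S2", "S3"],
}
design = randomized_block_design(blocks, ["T1", "T2", "T3"], seed=42)
for block_name, assignment in design.items():
    print(block_name, assignment)
```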

Double Randomized Block (Latin Square) for experiments with two obvious sources of variability

[Figure: 3 × 3 grid crossing Sites A, B, C with Sizes 1, 2, 3 (Large, Medium, Small); each cell holds one of treatments T1, T2, T3]

Assign treatment (T1-T3) at random to individuals so that no treatment appears more than once in each row or column
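
One simple way to produce such a layout is sketched below: start from a cyclic square and shuffle its rows, columns and treatment labels. This is an illustrative construction rather than the slides' procedure, and it does not draw uniformly from all possible Latin squares. The name `random_latin_square` is hypothetical.

```python
import random

def random_latin_square(treatments, seed=None):
    """Latin square built by shuffling rows, columns and labels of a cyclic square."""
    rng = random.Random(seed)
    n = len(treatments)
    labels = list(treatments)
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(labels)
    rng.shuffle(rows)
    rng.shuffle(cols)
    # Cell (r, c) of the cyclic square holds label (r + c) mod n. Permuting
    # rows and columns keeps each label once per row and once per column.
    return [[labels[(r + c) % n] for c in cols] for r in rows]

square = random_latin_square(["T1", "T2", "T3"], seed=7)
for site, row in zip(["Site A", "Site B", "Site C"], square):
    print(site, row)
```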

Column = 1st blocking factor; Row = 2nd blocking factor

Treatment = A, B, C, D

Treatment, Levels of Treatment & Replication: Factorial Design

| Factor | Level | Replicate |
| --- | --- | --- |
| Salinity (‰) | 30, 20, 10 | 5, 5, 5 |
| Temperature (°C) | 30, 25, 20 | 5, 5, 5 |

[Figure: Salinity × Temp grid]

Test 3 Ho by 2-way ANOVA all in one go: salinity, temp and interaction (salinity × temp)
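
A small sketch of how the factorial layout can be enumerated. The levels are taken from the slide; using 5 replicates for every salinity × temperature combination is an assumption, since the slide lists replicates per level only.

```python
from itertools import product

salinity_levels = [30, 20, 10]      # parts per thousand, from the slide
temperature_levels = [30, 25, 20]   # degrees Celsius, from the slide
REPLICATES_PER_CELL = 5             # assumption: 5 replicates per combination

# Every salinity level is crossed with every temperature level.
cells = list(product(salinity_levels, temperature_levels))
runs = [(salinity, temperature, replicate)
        for salinity, temperature in cells
        for replicate in range(1, REPLICATES_PER_CELL + 1)]

print(len(cells), "treatment combinations,", len(runs), "experimental units")
```

Because every level of one factor occurs with every level of the other, the same units serve to test both main effects and their interaction.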

Levels of treatment

- Do you really need different levels of treatments? (Are you really interested in finding out the correlation between X and Y? predicting Y from X?)
- How many levels of treatments are required? (at least 4-5 for correlation/regression)

Y= aX + b (r=0.982)

Y

X

Experimental Design: Replication

- As the no. of individual observations (no. of replicates) increases, the sample estimates x1 and S1 get closer to the population values μ1 and σ1

[Figure: individual observations drawn from a population (μ1, σ1) form a sample (x1, S1) that estimates it]

Replication

- IMPORTANT: NEVER sacrifice no. of replicates: This may make your experimental results invalid

Looking at my data, I think perhaps Chemical A may possibly have some effect on growth in some cases --- but I am not sure!

Replication: Too many or too few?

- No. of replicates depends on:
- Variance among treatments (SS among, signal)
- Variance within treatment (SS within, noise)
- How large a difference you want to detect

No. of replicates

Length of fish (cm) from different sites

| Site A | Site B | Site C | Site D |
| --- | --- | --- | --- |
| 25 | 30 | 32 | 12 |
| 67 | 12 | 35 | 18 |
| 18 | 28 | 32 | 16 |
| 35 | 20 | 35 | 14 |
| 24 | 22 | 33 | 17 |

Question: Which data set above (A,B or C,D) requires a higher no. of replicates?


More replicates are required to detect a difference between Sites A & B than between Sites C & D, because the variance within Sites A and B is large relative to the difference between their means
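
The comparison can be checked directly from the fish-length data. A short sketch using only the standard library (the variable names are arbitrary):

```python
from statistics import mean, variance

# Length of fish (cm) from the slide.
lengths_cm = {
    "Site A": [25, 67, 18, 35, 24],
    "Site B": [30, 12, 28, 20, 22],
    "Site C": [32, 35, 32, 35, 33],
    "Site D": [12, 18, 16, 14, 17],
}

# Mean and sample variance (n - 1 in the denominator) for each site.
summary = {site: (mean(values), variance(values))
           for site, values in lengths_cm.items()}
for site, (site_mean, site_variance) in summary.items():
    print(f"{site}: mean = {site_mean:.1f} cm, variance = {site_variance:.1f}")
```

Sites A and B differ by 11.4 cm in mean length but have within-site variances of 381.7 and 50.8, whereas Sites C and D differ by 18.0 cm with variances of only 2.3 and 5.8.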

No. of replicates depends on

- Effect size: the magnitude of the effect you want to detect (nothing to do with statistics: vary between studies, depends on cost-effectiveness, scientific significance and your professional judgment)
- How variable is your data (σ)
- Level of statistical significance (α)
- What statistical test you use

Optimal sample size

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1}{n_1} + \dfrac{S_2}{n_2}}}$$

Where:

$\bar{X}$ = mean; $S$ = variance; $n$ = no. of replicates

We will reject Ho if t calculated > t tabulated
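
As a worked illustration, the statistic can be computed for the fish-length data from the “No. of replicates” slide. The function name `t_statistic` is hypothetical, and the tabulated t value still has to be looked up separately.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(sample_1, sample_2):
    """t = (mean1 - mean2) / sqrt(S1/n1 + S2/n2), with S the sample variance."""
    standard_error = sqrt(variance(sample_1) / len(sample_1)
                          + variance(sample_2) / len(sample_2))
    return (mean(sample_1) - mean(sample_2)) / standard_error

# Length of fish (cm) at four sites.
site_a, site_b = [25, 67, 18, 35, 24], [30, 12, 28, 20, 22]
site_c, site_d = [32, 35, 32, 35, 33], [12, 18, 16, 14, 17]

t_ab = t_statistic(site_a, site_b)
t_cd = t_statistic(site_c, site_d)
print(f"A vs B: t = {t_ab:.2f}   C vs D: t = {t_cd:.2f}")
```

With five replicates per site, t for A vs B is about 1.2 while t for C vs D is about 14, which is why the noisier pair needs more replicates.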

Optimal sample size

$$t = \frac{d}{\sqrt{\dfrac{2 S_c}{n}}}$$

$$n > \frac{2\, t_{tab}^{2}\, S_c}{d^{2}}$$

Where:

$d$ = difference that you want to look for; $S_c$ = common variance; $n$ = common sample size

Optimal sample size

Let’s try n = 26:

$$26 > \frac{2\, t^{2}_{(df=50,\ p=0.05)} \times 3.16}{1^{2}}$$

26 > 25.6 (YES!!)

Therefore n < 26 is not enough, and n > 26 is a waste of time and effort
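
A rough first guess for n can be computed without t tables by replacing the tabulated t with the normal quantile z. This is an approximation, not the slide's calculation: because z is slightly smaller than t, it gives a slightly smaller n, which should then be checked against the t-based inequality. The name `approx_sample_size` is hypothetical.

```python
from math import floor
from statistics import NormalDist

def approx_sample_size(difference, common_variance, alpha=0.05):
    """Smallest n with n > 2 * z^2 * Sc / d^2, using z in place of tabulated t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    bound = 2 * z ** 2 * common_variance / difference ** 2
    return floor(bound) + 1

# Same inputs as the worked example: d = 1, common variance = 3.16.
n_start = approx_sample_size(difference=1, common_variance=3.16)
print(n_start)
```

Starting from this value and increasing n until the t-based inequality holds avoids trying many values by hand.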

Optimal sample size

- For any large scale experiment, it pays to conduct a preliminary experiment to estimate the common variance beforehand
- You have to decide on:
- how large a difference that you want to look for
- how many samples that you can afford to do

Rule of thumb for sample size

- If you need a very large sample size, you may well be looking for a difference that has trivial scientific significance
- df < 5: only large difference can be detected
- df>30: further increase in sample size probably won’t help
- Fewer replicates are required in factorial design experiments

Type I error & Type II error

Power = (1 − β): the probability of correctly rejecting a hypothesis when it is false (i.e. detecting a real effect)

Power Analysis

- For a given effect size and sample size, as α is decreased, power is also decreased.
- By reducing α (say, from .10 to .01), we reduce the likelihood of a Type I error but increase the likelihood of a Type II error.

Power analysis

- If test result is not statistically significant, there are two possibilities:
- there is no real effect (That’s good!)
- your study design could not detect the real effect (That’s bad!!)
- Power analysis helps you to distinguish between these alternatives

Power analysis

- Enables you to:
- determine the probability of getting a statistically significant result given that the effect is real
- work out how small a change that you can detect
- work out the no. of replicates required (given that power, variance, significance level and effect size are known)

Power is related to

- How big is the change (ES: Effect Size)
- Sample size (n)
- Variance (σ2)
- Significance level (α)

$$\text{Power} \propto \frac{ES \cdot \alpha \cdot \sqrt{n}}{\sigma}$$
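
These relationships can be explored numerically. The sketch below uses a normal (z-test) approximation for comparing two groups with n replicates each; it is an assumption-laden illustration rather than the exact calculation a dedicated power program would perform, and `approx_power` is a hypothetical name. Here `sigma` is the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n per group."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true difference is effect_size.
    shift = effect_size / (sigma * sqrt(2 / n))
    # Probability that the statistic falls beyond either critical value.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)

baseline = approx_power(effect_size=1.0, sigma=2.0, n=20)
print(f"baseline power      = {baseline:.2f}")
print(f"larger effect size  = {approx_power(2.0, 2.0, 20):.2f}")
print(f"larger sample size  = {approx_power(1.0, 2.0, 40):.2f}")
print(f"smaller variability = {approx_power(1.0, 1.0, 20):.2f}")
print(f"larger alpha        = {approx_power(1.0, 2.0, 20, alpha=0.10):.2f}")
```

Each of the four changes raises power above the baseline, matching the direction of the proportionality.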

G*Power: free software for power analysis

Choosing your α and β

- Generally accept α=.05 and β=0.2 ( 80% power)
- This implies that type I error is 4 times as “harmful” as type II error (α : β = .05 : 0.2): No basis at all !!!
- You should strike a balance between α and β to suit your need, e.g.:
- In screening a new drug, we should set α=.20 and power at 95%, to ensure that a potentially useful drug is not overlooked.
- In studying side effects of a drug, we should set α=.01 while keeping power at 95%, to better detect harmful effect

Want More?

- Copy of this presentation can be obtained from:
- “What’s New” in http://www.cityu.edu.hk/bch/merit/
- Help for experimental design and data analysis:
- Statistic Consulting Unit (Director: Prof. YV Hui), Faculty of Business
- Experimental design course for RS (MS Dept.)
