Quantitative Research Methods for PhD Students

The following lecture has been approved for Post Graduate University Students. This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought provoking and challenging. It is not intended for the content or delivery to cause offence. Issues raised in the lecture may require the viewer to engage in further thought, insight, reflection, critical evaluation, reading, independent study, watching more TV, or listening to Radio 4.

Quantitative Research Methods for PhD students Prof Craig Jackson Head of Psychology Division Faculty of Education Law & Social Sciences Birmingham City University craig.jackson@bcu.ac.uk

Keep it simple “Some people hate the very name of statistics but.....their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the science of man.” Sir Francis Galton, 1889

Part 1 Probability, Error & Chance (crack these and research is easy)

Curse of probability Few subjects are more counter-intuitive than probability Understanding this is essential “Probability is common sense reduced to calculation” Pierre Simon Laplace “[Statistics] are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the science of man.” Sir Francis Galton

UK National Lottery 1994 Choose 6 numbers between 1 and 49 Jackpot approx. £8 million for all 6 numbers Smaller prizes for 5 numbers, 4 numbers, and 3 numbers Week 1 - Nobody won Week 2 - Rollover Week 2 - Factory worker in Bradford won £17,880,003 using 26, 35, 38, 43, 47, 49 LOTTERY FEVER STRUCK THE UK! Insurance company protection “Out, thou strumpet, Lady Fortune”

UK National Lottery Behaviour Buying all 13,983,816 tickets = a guaranteed win. If only winner = £6 million loss. If shared winner = lose more. Rollover: if only winner, possibly. 14th Jan 1995: rollover of £16,292,830 shared between 133 people who chose nos. 7, 17, 23, 32, 38, 42. If everyone selected numbers at random, only 4 should have picked this combination: some curious human psychology at work.
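The ticket count on this slide is the number of ways of choosing 6 balls from 49. A minimal sketch of that arithmetic follows; the helper name `expected_sharers` is illustrative, and the ticket sales figure is only what the slide's "only 4" would imply, not a reported number.

```python
from math import comb

# Number of distinct ways to choose 6 balls from 49 (order does not matter).
combinations = comb(49, 6)

# Chance that a single randomly chosen ticket wins the jackpot.
p_jackpot = 1 / combinations


def expected_sharers(tickets_sold: int) -> float:
    """Expected tickets on any one combination if every pick were random."""
    return tickets_sold / combinations


# Ticket sales implied by "only 4 should have picked this combination"
# (an inference from the slide, not a published sales figure).
implied_tickets = 4 * combinations

print(combinations)
print(f"{p_jackpot:.2e}")
print(implied_tickets)
```

Against an expectation of about 4, the 133 winners actually observed show how far real number choices are from random.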

UK National Lottery Behaviour Rule 1 Win a fortune Only bet when there is a rollover (Rollover Paradox) Rule 2 Never bet on numbers that other people will choose. Avoid numbers under 31 – birthday punters + amateur gamblers especially avoid 3, 7, 17 Do use “4” and “13” “Stupid” combinations are better e.g. “34,35,36,37,38,39”

Probability is always ahead. UK National Lottery, draw no. 631, Wed 9th Jan 2002. Number rack: 16 20 28 29 41 45. [Figure: lottery balls 3 13 15 19 21 25 28 31 38 41 44 49 47 18 39] Source: www.llednulb.demon.co.uk

UK lottery ball frequency, draw no. 631, Wed 9th Jan 2002. [Chart: frequency by ball number]

Counter-intuition: 4000 flips of a Euro coin. Lands on “heads” 2780 times (69.5%). Evidence of an unfair coin? [Chart: counts of heads and tails, axis 1000 to 5000]
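One way to answer the slide's question is to ask how far 2780 heads lies from what a fair coin would give. The sketch below takes the slide's counts at face value and uses the normal approximation to the binomial, which is an assumption of convenience here; an exact binomial test would lead to the same conclusion.

```python
from math import erfc, sqrt

flips, heads = 4000, 2780

# Under a fair coin, heads ~ Binomial(n=4000, p=0.5).
expected = flips * 0.5
sd = sqrt(flips * 0.5 * 0.5)

# Normal approximation: how many standard deviations from expectation?
z = (heads - expected) / sd

# Two-sided p value from the standard normal tails.
p_value = erfc(abs(z) / sqrt(2))

print(f"expected heads = {expected:.0f}, sd = {sd:.1f}")
print(f"z = {z:.1f}, p = {p_value:.3g}")
```

A result more than 20 standard deviations from expectation is not something random fluctuation plausibly produces, so these counts would be strong evidence of an unfair coin (or an unfair flipping method).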

Probability Basics Expressed as “P” or “p” Decimal measure of the likelihood of something happening P ranges from 0 through to 1 Certain events, P = 1 Impossible events P = 0 Equally likely events P = 0.5 java applet site demonstrations www.mste.uiuc.edu introductory article on probability Cohen, J & Stewart I (1998) That’s amazing isn’t it? New scientist, 17 Jan. pp24-28

Combining Probabilities Study 1. Drug x is more effective than a placebo in male patients Study 2. Drug x is more effective than a placebo in female patients Study 3. (Combining the data from study 1 & 2) Drug x is less effective than a placebo in all patients
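This reversal is usually called Simpson's paradox. It arises when the groups differ in size and in baseline success rate. The counts below are invented purely for illustration (they are not from any study mentioned in the lecture); they show the drug ahead within each sex but behind once the data are pooled.

```python
# Hypothetical (successes, patients) counts, invented for illustration only.
trials = {
    "male":   {"drug": (18, 20), "placebo": (68, 80)},
    "female": {"drug": (24, 80), "placebo": (5, 20)},
}


def rate(successes: int, patients: int) -> float:
    """Proportion of patients who improved."""
    return successes / patients


def pooled_rate(arm: str) -> float:
    """Success rate for one arm after combining both studies."""
    successes = sum(arms[arm][0] for arms in trials.values())
    patients = sum(arms[arm][1] for arms in trials.values())
    return successes / patients


for group, arms in trials.items():
    print(group, rate(*arms["drug"]), rate(*arms["placebo"]))

print("pooled", pooled_rate("drug"), pooled_rate("placebo"))
```

In this made-up data most drug patients are female, who have a low success rate whatever the treatment, and most placebo patients are male, so the pooled comparison is really a comparison of sexes.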

Basic Scientific Methodology
VARIABLES: IV, DV, Controlled
SAMPLING: Skewed, Methods, Bias
SUBJECTS: Independent, Matched, Repeated
PROBABILITY: P values
ERRORS: Type 1 and Type 2
SENSITIVITY: Tweaking the methodology

Types of Error #1 CONSTANT ERRORS lack of control poor variable measurement wrong tools for measuring the variable(s) HOW TO REMOVE / CONTROL CONSTANT ERRORS redefine troublesome variables control troublesome variables control measurement of variables

Everything has errors Werner Heisenberg Science involves proving changes in dependent variables are due to (manipulation of) particular independent variables Need to prove random luck alone has not produced changes in the dependent variables that were observed Heisenberg’s uncertainty principle (1927) is an eternal problem for researchers Cannot objectively measure a phenomenon without affecting the phenomenon in some way. e.g. scanning electron microscopes

Types of Error #2 RANDOM ERRORS natural fluctuation of the universe natural blips occurring in our variables and data little can be done about them universe is a “random” and chaotic place RANDOM ERRORS ARE HERE TO STAY scientific methods have to take account of this random errors cancel themselves out with a random sample Q.E.D the need for a truly random sample
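The claim that random errors cancel out can be checked by simulation. The sketch below is a toy model under stated assumptions: the "true" value of 82 and the error SD of 10 are hypothetical, and errors are assumed to be normally distributed around zero.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 82.0  # hypothetical true reading, e.g. a diastolic BP in mmHg
ERROR_SD = 10.0    # hypothetical size of the random error on each reading


def sample_mean(n: int) -> float:
    """Mean of n readings, each the true value plus a random error."""
    return mean(random.gauss(TRUE_VALUE, ERROR_SD) for _ in range(n))


for n in (5, 50, 5000):
    print(n, round(sample_mean(n), 2))
```

As n grows the sample mean settles on the true value, because positive and negative random errors offset each other. Constant errors would not cancel this way: they shift every reading in the same direction.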

The Meaning of P World is chaotic Need to know what causes the observed results in data: random luck / natural flux, or the IV? Use of an “arbitrary” figure (95% certainty) to let us decide THE P VALUE IN SCIENTIFIC TERMS A measure of likelihood of error in our results: the likelihood of seeing a change in the DV this large through random errors alone, and not the IV

The Meaning of P Statistical software gives a p value: it has calculated the likelihood of such results happening by chance. If that likelihood is < 5%, it is assumed that such results have not occurred by chance. “P > 0.05”: NON-SIGNIFICANT. Results could plausibly have derived from random or constant errors (or both), so there is no good evidence that the IV had any effect on the DV, i.e. something else may have changed the DV

The Meaning of P “P = 0.05” or “P <0.05” results are unlikely to have derived from random or constant errors, and the IV can be held responsible for the changes in the DV. SIGNIFICANT Repeating experiments is the only sure way of establishing if this is really true e.g. “The mean age of males in the group (n=64) was 45 years (±3) and the mean age of females (n=59) was 37 years (±5); P=0.05 and therefore males were significantly older than females”.

Errors continued. . . . . TYPE 1 ERRORS Claim that the IV produces an effect on the DV when it did not A false positive TYPE 2 ERRORS Claim that the IV did not produce an effect on the DV, when in fact it did A false negative
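Both error types can be seen by simulating many experiments. The sketch below is illustrative only: it assumes normally distributed scores with a known SD of 1 so that a simple z test applies, and the group size (20), effect size (0.5 SD) and number of runs are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(42)
ALPHA = 0.05
SIGMA = 1.0  # known population SD (an assumption that permits a z test)


def z_test_p(a, b, sigma=SIGMA):
    """Two-sided p value for a difference in means when sigma is known."""
    se = sigma * sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_difference, n=20, runs=2000):
    """Share of simulated experiments that reach p < ALPHA."""
    hits = 0
    for _ in range(runs):
        a = [random.gauss(true_difference, SIGMA) for _ in range(n)]
        b = [random.gauss(0.0, SIGMA) for _ in range(n)]
        if z_test_p(a, b) < ALPHA:
            hits += 1
    return hits / runs


# IV has no effect: every "significant" result is a false positive.
type1_rate = rejection_rate(0.0)
# IV has a real effect: every non-significant result is a false negative.
type2_rate = 1 - rejection_rate(0.5)

print(f"Type 1 rate: {type1_rate:.3f}")
print(f"Type 2 rate: {type2_rate:.3f}")
```

The Type 1 rate sits near the chosen 5% threshold by construction, while the Type 2 rate depends on the sample size and on how large the real effect is.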

Part 2 Data Considerations

How Many Make a Sample? “8 out of 10 owners who expressed a preference, said their cats preferred it.” How confident can we be about such statistics? 8 out of 10? 80 out of 100? 800 out of 1000? 80,000 out of 100,000?
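The same 80% deserves very different confidence at each sample size. As a rough sketch, the code below computes a 95% Wilson score interval for each case; the choice of the Wilson interval is an assumption made here for illustration, and other interval methods give similar pictures.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


cases = [(8, 10), (80, 100), (800, 1000), (80_000, 100_000)]
widths = []
for successes, n in cases:
    low, high = wilson_interval(successes, n)
    widths.append(high - low)
    print(f"{successes} out of {n}: {low:.3f} to {high:.3f}")
```

With 8 out of 10 the interval runs from roughly a half to over nine tenths, so the headline figure says very little; by 80,000 out of 100,000 it is pinned down to within a fraction of a percentage point.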

Types of Data / Variables
Continuous: BP, Height, Weight, Age
Discrete: Children, Age last birthday, Colds in last year
Ordinal: Grade of condition, Positions (1st, 2nd, 3rd), “Better-Same-Worse”, Height groups, Age groups
Nominal: Sex, Hair colour, Blood group, Eye colour

Conversion & Re-classification Easier to summarise Ordinal / Nominal data Cut-off Points (who decides this?) Allows Continuous variables to be changed into Nominal variables BP > 90 mmHg = Hypertensive BP ≤ 90 mmHg = Normotensive Easier clinical decisions Categorisation reduces quality of data Statistical tests may be more “sensational” Good for summaries Bad for “accuracy” BMI Obese vs Underweight

Dispersion
Range: spread of data (50-112 mmHg)
Mean: arithmetic average (82 mmHg)
Median: location (82 mmHg)
Mode: frequency (82 mmHg)
SD: spread of data about the mean (± 10 mmHg)

Multiple Measurement of small sample 25 cell clusters, 22 cell clusters, 24 cell clusters, 21 cell clusters. Total = 92 cell clusters. Mean = 23 cell clusters. SD = 1.8 cell clusters.

Small samples spoil research
N       Age    IQ     |  Age     IQ      |  Age     IQ
1       20     100    |  18      100     |  18      100
2       20     100    |  20      110     |  20      110
3       20     100    |  22      119     |  22      119
4       20     100    |  24      101     |  24      101
5       20     100    |  26      105     |  26      105
6       20     100    |  21      113     |  21      113
7       20     100    |  19      120     |  19      120
8       20     100    |  25      119     |  25      119
9       20     100    |  20      114     |  20      114
10      20     100    |  21      101     |  45      156
Total   200    1000   |  216     1102    |  240     1157
Mean    20     100    |  21.6    110.2   |  24      115.7
SD      0      0      |  ± 2.6   ± 8.0   |  ± 7.8   ± 15.9
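The effect of a single unusual subject on a sample of ten can be recomputed directly from the listed values. This is a minimal sketch using the sample standard deviation (n - 1 denominator), which is an assumption about the formula intended on the slide.

```python
from statistics import mean, stdev

# Second data set from the slide: ten ordinary subjects.
ages = [18, 20, 22, 24, 26, 21, 19, 25, 20, 21]
iqs = [100, 110, 119, 101, 105, 113, 120, 119, 114, 101]

# Third data set: subject 10 replaced by an outlier (age 45, IQ 156).
ages_outlier = ages[:-1] + [45]
iqs_outlier = iqs[:-1] + [156]

datasets = [
    ("Age", ages),
    ("Age with outlier", ages_outlier),
    ("IQ", iqs),
    ("IQ with outlier", iqs_outlier),
]
for label, values in datasets:
    print(f"{label}: mean = {mean(values):.1f}, SD = {stdev(values):.1f}")
```

One subject out of ten is enough to move the mean noticeably and to roughly double or triple the SD, which is the sense in which small samples spoil research.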

Presentation of data: table of means
             Exposed (n=197)   Controls (n=178)   T      P
Age (yrs)    45.5 (9.4)        48.9 (7.3)         2.19   0.07
I.Q          105 (10.8)        99 (8.7)           1.78   0.12
Speed (ms)   115.1 (13.4)      94.7 (12.4)        3.76   0.04

Correlation and Association
Correlation is a numerical expression between 1 and -1 (extending through all points in between). Properly called the Correlation Coefficient. A decimal measure of association (not necessarily causation) between variables.
Correlation of 1 (perfect +ve): maximal; any value of one variable precisely determines the other.
Correlation of -1 (perfect -ve): any value of one variable precisely determines the other, but in the opposite direction to a correlation of 1. As one value increases, the other decreases.
Correlation of 0 (“nothing”): no linear relationship between the variables.
Correlation of 0.5 (medium +ve): a moderate relationship. The squared correlation is 0.25, so about a quarter of the variation in one variable is accounted for by the other.
Correlations from 0 to 0.3 are weak; from 0.4 to 0.7 moderate; from 0.8 to 1 strong.
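The correlation coefficient can be computed from first principles. The sketch below implements Pearson's r on small made-up data sets (the numbers are illustrative only) and also prints r squared, the share of variation in one variable accounted for by the other.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
perfect_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises exactly with x
perfect_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls exactly as x rises
partial = pearson_r(x, [2, 1, 4, 3, 5])        # y tends to rise with x

for name, r in [("perfect +ve", perfect_pos),
                ("perfect -ve", perfect_neg),
                ("partial", partial)]:
    print(f"{name}: r = {r:.2f}, r squared = {r * r:.2f}")
```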

Correlation and Association With a scatter diagram, each individual observation becomes a point on the scatter plot, based on two co-ordinates, measured on the abscissa (x axis) and the ordinate (y axis). Two perpendicular lines are drawn through the medians, dividing the plot into quadrants. If the variables are unrelated, each quadrant should contain about 25% of all observations.

Part 3 Research Design

It all depends on the size of the needle

Background on Surveys • Large-scale • Quantitative • Can be descriptive • (“2% of women think they are beautiful”) • Can be inferential • (“Significantly more single women think they’re beautiful than married women do”) • Done with a sample of patients, respondents, consumers, or professionals • Differences between any groups assessed with hypothesis testing • Important that sample size must be large enough to detect any such difference if it truly exists

Importance of Sample Size • “Forgotten” in many studies • Little consideration given • Appropriate sample size needed to confirm / refute hypotheses • Small samples are too small to detect anything but the grossest difference • Real differences are reported as “non-significant” – Type 2 error • Too large a sample – unnecessary waste of (clinical) resources • Ethical considerations – waste of patient time, inconvenience, discomfort • Essential to assess optimal sample size before starting investigation
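Assessing sample size before starting can be sketched with the standard normal-approximation formula for comparing two group means, n per group = 2 × ((z for alpha + z for power) / d) squared, where d is the expected difference in SD units. The defaults below (two-sided alpha of 0.05, 80% power) are conventional choices assumed here, not values given in the lecture, and a t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate subjects needed per group to detect a difference in means.

    effect_size is the expected difference divided by the SD (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: {n_per_group(d)} per group")
```

Halving the effect size roughly quadruples the sample needed, which is why small studies can detect only the grossest differences.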

Qualitative studies need to sample wisely too… • Asian GPs’ attitudes to ANP • Objective: • To determine attitudes to ANP among Asian doctors in East Birmingham PCT • Method: • Send invitation to 55 Asian GPs (approx. 47% of East Birmingham PCT) • Intends to interview (30 mins) the first 20 GPs who respond • Sample would be 36% of Asian GPs – and only 17% of GPs in PCT • Severely Biased Research (and ethically dodgy too)
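The two percentages on this slide follow from the figures given. A quick check, assuming the 55 invited GPs are all the Asian GPs in the PCT and taking "approx 47%" as exact (variable names are illustrative):

```python
asian_gps = 55            # Asian GPs invited, from the slide
share_of_pct = 0.47       # "approx 47%" of all GPs in the PCT
interviews_planned = 20   # first 20 responders

# Total GPs in the PCT implied by the two figures above.
total_gps_in_pct = asian_gps / share_of_pct

frac_of_asian_gps = interviews_planned / asian_gps
frac_of_all_gps = interviews_planned / total_gps_in_pct

print(f"Implied GPs in PCT: {total_gps_in_pct:.0f}")
print(f"Share of Asian GPs interviewed: {frac_of_asian_gps:.0%}")
print(f"Share of all GPs interviewed: {frac_of_all_gps:.0%}")
```

The arithmetic is fine; the bias comes from interviewing whoever replies first, since early responders are self-selected.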

The POPULATION POPULATIONS Sampling a Population Process of selecting units (e.g. people, organisations) from a population Generalise results to the population First question should be… Who do you want to generalize findings to ?

Sampling a Population A POPULATION REPRESENTATIVE SAMPLE (theoretical) ACCESSIBLE SAMPLE (actual) Is this lot REPRESENTATIVE of the POPULATION?

Types of Sampling
CONSCRIPTIVE sampling: ethically unsound; bias
QUOTA sampling: favourite of ICM and MORI; quotas of the population; efficient; flaw potential
RANDOM sampling: theoretically ideal; costly; time-consuming; all elements of the population
OPPORTUNISTIC sampling: desperate measure; take any subject available; cheap; fast; bias
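The difference between random and opportunistic sampling can be caricatured in a few lines. Everything here is hypothetical: the population is simulated heights, and "opportunistic" is modelled as taking only the tallest people, as if recruiting at a basketball club.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 adult heights in cm.
population = [random.gauss(175, 7) for _ in range(10_000)]

# RANDOM sampling: every member has the same chance of being selected.
random_sample = random.sample(population, 200)

# OPPORTUNISTIC sampling, caricatured: take whoever is easiest to reach.
# Here the "easy" subjects happen to be the 200 tallest people.
opportunistic_sample = sorted(population)[-200:]

print(f"Population mean:    {mean(population):.1f}")
print(f"Random sample mean: {mean(random_sample):.1f}")
print(f"Opportunistic mean: {mean(opportunistic_sample):.1f}")
```

Both samples have the same size; only the random one gives an estimate close to the population mean.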

[Figure: distributions of height (5’6” to 6’4”) against N of population for RANDOM, OPPORTUNISTIC, CONSCRIPTIVE and QUOTA sampling]

Specificity and the acceptable N Jackson’s paradox: as study populations become smaller, acceptable study sample sizes reduce. [Figure: relative population size against acceptable sample size for General, Working, Specific and Rare populations]

Specificity and the acceptable N
Student   Pop                                             N     In-depth
I.D       Forces yachting training schools                300
E.M       Companies using stress counselling              150
S.M       Divers and ear barotrauma                       142
N.O       Solvent exposure in Myanmar                     80
V.W       Routine flu vaccinations                        900
A.F       Dermatitis in hairdressers                      102
S.M       O.H needs of NHS staff                          23    yes
T.R       NIHL in student employees                       14    yes
I.C       Blood tests in British Army pilots              408
O.Y       Upstream oil company deaths                     161
A.A       Renal colic in flight deck crew                 254
A.C       Hepatitis B in army regulars and territorials   476

Selection Bias Sampling properly is Crucial Samples may be askew Specialist publications attract a specialist response group Exists a self-selection bias of those with special interests Controversial topics, or litigious areas Gulf War Syndrome A&E Violence C dif Call Centres Depleted Uranium Weaponry THIS IS AN INHERENT PROBLEM WITH HEALTH RESEARCH COMBAT IT WITH LARGE SAMPLES AND CLEVER METHODOLOGY Organophosphate Pesticides Stress Telecomms

Sampling Keywords
POPULATIONS: can be mundane or extraordinary
SAMPLE: must be representative
INTERNAL VALIDITY OF SAMPLE: sometimes validity is more important than generalizability
SELECTION PROCEDURES: Random, Opportunistic, Conscriptive, Quota

Sampling Keywords
THEORETICAL: developing, exploring, and testing ideas
EMPIRICAL: based on observations and measurements of reality
NOMOTHETIC: rules pertaining to the general case (nomos - Greek)
PROBABILISTIC: based on probabilities
CAUSAL: how causes (treatments) affect the outcomes

Example 1 - Independent Design
Workers exposed to pesticide versus controls (not exposed to pesticide)
Independent T test
         Exposed (n = 5)   Controls (n = 5)   T      P
Age      25.2 (sd 2.7)     26.4 (sd 2)        -.77   .46
Psych    16.8 (sd 4.7)     14.8 (sd 4.9)      .65    .53
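The T values in Example 1 can be approximately recovered from the summary figures alone. This is a sketch assuming the standard pooled-variance (Student's) independent t test; the slide does not say which variant was used, and rounding of the reported means and SDs means the results will match only to about one decimal place.

```python
from math import sqrt


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary figures from Example 1 (exposed first, controls second).
t_age, df_age = independent_t(25.2, 2.7, 5, 26.4, 2.0, 5)
t_psych, df_psych = independent_t(16.8, 4.7, 5, 14.8, 4.9, 5)

print(f"Age:   t = {t_age:.2f} on {df_age} df")
print(f"Psych: t = {t_psych:.2f} on {df_psych} df")
```

With only 5 subjects per group the standard errors are large, so differences of a point or two in the means come nowhere near significance.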

Example 2 - Matched Design
Workers exposed to pesticide versus controls not exposed to pesticide
Paired Samples T test
         Exposed (n = 5)   Controls (n = 5)   T      P
Age      30.8 (sd 7.6)     30.8 (sd 7.6)
Psych    13.8 (sd 2.1)     19.8 (sd 4.5)      -4.8   .008

Example 3 - Repeated Design
Workers before and after exposure to pesticide
Independent T test
         Pre (n = 10)      Post (n = 10)      T      P
Psych    14.1 (sd 5.7)     19.9 (sd 4.2)      2.5    .02
N numbers doubled from independent methods
Repeated subjects is efficient
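One reason repeated (and matched) designs are efficient is that each subject acts as their own control, so differences between people drop out of the comparison. The sketch below uses invented before and after scores for six workers (not the lecture's data) and compares the t statistic obtained by treating the two sets as independent groups with the one obtained by pairing them.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same 6 workers before and after exposure.
pre = [12, 15, 9, 20, 14, 17]
post = [15, 19, 11, 24, 16, 22]


def independent_t(a, b):
    """Pooled-variance t, treating a and b as unrelated groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t, based on each subject's own change score."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_independent = independent_t(pre, post)
t_paired = paired_t(pre, post)

print(f"Treated as independent groups: t = {t_independent:.2f}")
print(f"Treated as paired scores:      t = {t_paired:.2f}")
```

In this made-up data every worker's score rises by a similar amount, so the paired statistic is far larger than the independent one even though the group means are identical in both analyses.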

Sampling & Deployment
RANDOM SAMPLING: selecting a sample from the POPULATION. Related to the EXTERNAL VALIDITY of the research and to the GENERALIZABILITY of the findings to the POPULATION.
RANDOM ASSIGNMENT: how to assign the sample into different treatments or groups. Related to the INTERNAL VALIDITY of the research. Ensures groups are similar (EQUIVALENT) to each other prior to TREATMENT.
Both RANDOM SAMPLING and RANDOM ASSIGNMENT can be used together, or singly, or not at all… Waste of time randomly sampling but not randomly allocating. Having a choice in this matter is a luxury.
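Random assignment is simple to carry out in practice. A minimal sketch, in which the function name, group labels and subject IDs are all illustrative: shuffle the sample, then deal subjects into groups in turn so the groups end up equal in size.

```python
import random


def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle subjects, then deal them into groups in rotation."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(subject)
    return allocation


# Hypothetical sample of 20 subject IDs.
sample = [f"S{i:02d}" for i in range(1, 21)]
allocation = randomly_assign(sample, seed=2024)

for group, members in allocation.items():
    print(group, members)
```

Because chance alone decides who goes where, any pre-existing differences between subjects are spread across the groups rather than concentrated in one of them.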
