ST1232 Statistics in the Life Sciences PowerPoint PPT Presentation



ST1232 Statistics in the Life Sciences. YY Teo Associate Professor Saw Swee Hock School of Public Health, NUS Department of Statistics & Applied Probability, NUS Life Sciences Institute, NUS Genome Institute of Singapore, A*STAR. Lesson Structure. 13 weeks of 2 lectures (of 2 hours) per week


Presentation Transcript


Slide1 l.jpg

ST1232

Statistics in the Life Sciences

YY Teo

Associate Professor

Saw Swee Hock School of Public Health, NUS

Department of Statistics & Applied Probability, NUS

Life Sciences Institute, NUS

Genome Institute of Singapore, A*STAR


Slide2 l.jpg

Lesson Structure

  • 13 weeks of 2 lectures (of 2 hours) per week

  • Practically, 17-18 lectures planned, newspaper statistics, conferences, etc.

  • Tutorials in computer labs from week 3 onwards (11 weeks of tutorials)

  • Consultation (Fridays 2pm – 3.30pm)

  • 3 assessments:

    • tutorial participation (10%)

    • mid-term quiz (30%)

    • end-of-term exam (60%)


Slide3 l.jpg

Resources

  • Lectures, slides, tutorials

  • Fred Ramsey and Dan Schafer (2001) The Statistical Sleuth. 2nd edition, Duxbury Press

  • Julie Pallant. SPSS Survival Manual: A Step-by-Step Guide to Data Analysis Using SPSS for Windows. 3rd edition, Open University Press

  • http://www.statistics.nus.edu.sg/~statyy/ST1232


Slide4 l.jpg

Tutorials

  • Note the available time slots and sign up at the CORS system: http://www.nus.edu.sg/cors/. The tutorial will be at S16-05-102 (Com lab 2)

  • T1: Mondays (8am – 9am)

  • T2: Mondays (9am – 10am)

  • T3: Tuesdays (8am – 9am)

  • T4: Tuesdays (9am – 10am)

  • T5: Wednesdays (9am – 10am)

  • T6: Wednesdays (10am – 11am)

  • T7: Wednesdays (11am – 12pm)

  • T8: Thursdays (9am – 10am)

  • T9: Thursdays (10am – 11am)

  • T10: Thursdays (11am – 12pm)

  • T11: Fridays (8am – 9am)

  • T12: Fridays (9am – 10am)

  • T13: Fridays (10am – 11am)


Slide5 l.jpg

Medical Statistics

  • Quantitative basis to human diseases and traits

  • Progression from observational science!

  • Statistics and mathematics required for this advancement, from observational to quantitative


Slide6 l.jpg

Statistics in medical research

Identification of risk factors

Disease prevention / treatment

Association with genes and environment

Disease risk modeling and prediction

Pharmaceutical developments / clinical trials

Understand inter-population risks to diseases

Establish population-specific risk architecture

Relevance of international trials and findings

Applications in a multi-ethnic setting

Medical statistics


Slide7 l.jpg

Pregnancy Test Kit

A woman buys a pregnancy test kit and wants to find out whether she is pregnant.

One hypothesis in this case (the status quo) is that she is not pregnant.

The other hypothesis (the hypothesis of interest) is that she is pregnant.

The test kit may show:

+ve: indicating there is evidence to suggest pregnancy

–ve: indicating a lack of evidence to suggest pregnancy


Slide8 l.jpg

Pregnancy Test Kit

The test kit may either be accurate, or inaccurate.

                      Actually pregnant                               Actually not pregnant
Test kit shows +ve    Correct +ve diagnosis (Sensitivity, or Power)   Incorrect +ve diagnosis
Test kit shows –ve    Incorrect –ve diagnosis                         Correct –ve diagnosis (Specificity)
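As a rough illustration of how the two quantities are read off such a table, the sketch below computes sensitivity and specificity from the four cell counts. The counts are hypothetical, invented only for this example; they do not come from the lecture.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly pregnant women whom the kit correctly shows as +ve."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly non-pregnant women whom the kit correctly shows as -ve."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed for illustration, not from the slides):
# 100 pregnant women, 200 non-pregnant women.
true_pos, false_neg = 95, 5
true_neg, false_pos = 180, 20

kit_sensitivity = sensitivity(true_pos, false_neg)
kit_specificity = specificity(true_neg, false_pos)

print(f"Sensitivity: {kit_sensitivity:.2f}")
print(f"Specificity: {kit_specificity:.2f}")
```

Each quantity uses only one column of the table: sensitivity looks at the women who are actually pregnant, specificity at those who are not.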


Slide12 l.jpg

Sensitivity or Specificity?

  • Objective of the experiment

  • HIV diagnostic kit, 99.9% sensitive and 99.5% specific

Sensitivity (99.9%): correct identification of HIV +ves

Specificity (99.5%): correct identification of HIV –ves


Slide15 l.jpg

Sensitivity or Specificity?

  • Objective of the experiment

  • HIV diagnostic kit, 99.9% sensitive and 99.5% specific

  • Tests on immigrants, assume 1,001,000 applications each month, of which 1000 are truly HIV-positive

HIV +ve x 1000

HIV –ve x 1,000,000

On average, 999 correctly identified as HIV +ve, 1 incorrectly diagnosed as HIV –ve

On average, 995,000 correctly identified as HIV –ve, 5000 incorrectly diagnosed as HIV +ve


Slide16 l.jpg

Sensitivity or Specificity?

On average, 999 correctly identified as HIV +ve, 1 incorrectly diagnosed as HIV –ve

On average, 995,000 correctly identified as HIV –ve, 5000 incorrectly diagnosed as HIV +ve

995,001 identified as HIV –ve in total

5,999 identified as HIV +ve in total

BUT…

About 5 in 6 of those identified as HIV +ve are FALSE (5000 of 5,999)!
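The arithmetic behind this result can be checked with a short sketch. It uses only the figures given in the example (1000 true positives, 1,000,000 true negatives, 99.9% sensitivity, 99.5% specificity); the function and variable names are made up for illustration.

```python
def screening_counts(n_pos, n_neg, sensitivity, specificity):
    """Expected counts when a test is applied to n_pos affected and n_neg unaffected people."""
    true_pos = round(sensitivity * n_pos)    # affected, test shows +ve
    false_neg = n_pos - true_pos             # affected, test shows -ve
    true_neg = round(specificity * n_neg)    # unaffected, test shows -ve
    false_pos = n_neg - true_neg             # unaffected, test shows +ve
    return true_pos, false_neg, true_neg, false_pos


true_pos, false_neg, true_neg, false_pos = screening_counts(
    n_pos=1_000, n_neg=1_000_000, sensitivity=0.999, specificity=0.995
)

total_test_pos = true_pos + false_pos
total_test_neg = true_neg + false_neg

# Share of +ve results that are wrong, and share that are right.
false_share = false_pos / total_test_pos
true_share = true_pos / total_test_pos

print(f"Identified as +ve: {total_test_pos}, of which false: {false_pos}")
print(f"Identified as -ve: {total_test_neg}, of which false: {false_neg}")
print(f"Share of +ve results that are false: {false_share:.3f}")
```

Even with a very accurate kit, false positives dominate here because truly negative applicants outnumber truly positive ones by 1000 to 1.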


Slide17 l.jpg

Height and Weight


Slide18 l.jpg

Height and Weight


Slide19 l.jpg

Height and Weight


Slide20 l.jpg

Height and Weight


Slide21 l.jpg

Scientific Process


Slide22 l.jpg

Research hypothesis:

- What is your scientific question?

- What are you trying to achieve?


Slide23 l.jpg

Scientific Process


Slide24 l.jpg

Human Diversity


Slide27 l.jpg

Human Diversity

  • Even within the human race, variation exists between people of different ethnicities, cultures and populations

  • Genetic basis to a substantial fraction of such variation

  • Observable differences – physical appearances, build, weight

  • Variation in susceptibility to diseases

  • Influenced by evolutionary processes, over many generations

  • Cross-sectional observation of adaptation and natural selection


Slide28 l.jpg

Target population

  • Depends entirely on your research hypothesis!


Slide29 l.jpg

Target population:

- Everyone in Singapore?

- Every female individual in Singapore?

- Every female individual of a certain age in Singapore?

- Every female individual of a certain age in Singapore who could be pregnant?


Slide30 l.jpg

Target population:

- Everyone in Singapore?

- Everyone of a certain age in Singapore?

- Everyone of a certain age in NUS?

- Everyone of a certain age from a specific population group in Singapore?


Slide31 l.jpg

Target populations

  • Depends entirely on your research hypothesis!

  • Example: Interest in investigating the genetic factors that increase the risk of type 2 diabetes in Chinese adults in Singapore.

  • Target population(s):

    • Every Chinese adult in Singapore who is affected by type 2 diabetes

    • Normal Chinese adults (unaffected by type 2 diabetes) of the same age band

    • Classic case-control design in medical epidemiology.

But, is this sufficient???


Slide32 l.jpg

Samples versus Population

  • Obviously not possible to perform an experiment on every diabetic Chinese adult in Singapore

  • Select a representative set of individuals from the appropriate population to perform the experiment on

  • This set of individuals is known as your sample.

All diabetic Chinese adults in Singapore

Selected samples in research


Slide33 l.jpg

Scientific Process


Slide34 l.jpg

What is your intuition?

  • A pharmaceutical firm is developing a drug that purportedly treats severe headaches.

  • During the clinical trials (testing the efficacy and safety of the drug), it was tested on 10 people, of whom 7 reported that it reduced their headaches, while 3 claimed it had no effect.

  • Another pharmaceutical firm developed a competing treatment and tested it on 1000 people, of whom 704 reported that it helped to reduce headaches, 294 claimed it had no effect, and 2 claimed their headaches worsened.

Which setting do you think gives you more information about the developed drug? And why?
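One way to make the intuition concrete is to put a confidence interval around each observed success rate. The sketch below uses the Wilson score interval at 95% confidence; that choice of interval and confidence level is an assumption made for this illustration, not something the slides prescribe.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Figures from the two trials described above.
small_lo, small_hi = wilson_interval(7, 10)
large_lo, large_hi = wilson_interval(704, 1000)

print(f"10 people:   {small_lo:.3f} to {small_hi:.3f}")
print(f"1000 people: {large_lo:.3f} to {large_hi:.3f}")
```

Both trials observe a success rate of about 70%, but the interval from the 1000-person trial is far narrower, which is one way of saying it carries more information about the drug.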


Slide35 l.jpg

Sample Size Determination

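[Figure: detectable effects compared across three study sizes: 200 cases and 200 controls, 1000 cases and 1000 controls, 4000 cases and 4000 controls. Relative risks shown: RR = 2.5, RR = 1.8, RR = 1.2. Annotation: "For complex diseases!"]

  • The types of effects that can be detected depend entirely on sample size.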
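A rough sketch of why sample size limits the detectable effect, using the normal approximation to the power of a two-sided test comparing two proportions. Several things here are assumptions for illustration only and are not taken from the lecture: the risk factor is carried by 20% of controls, the relative risk is treated simply as the ratio of that frequency between cases and controls, and the significance level is 0.05.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(rr, n_per_group, p_controls=0.20, alpha=0.05):
    """Approximate power to detect a frequency difference between cases and controls.

    p_controls and alpha are hypothetical defaults; rr scales the control
    frequency to give the case frequency (a deliberate simplification).
    """
    p_cases = rr * p_controls
    if not 0 < p_cases < 1:
        raise ValueError("rr * p_controls must lie strictly between 0 and 1")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p_cases + p_controls) / 2
    # Standard deviations of the difference under the null and the alternative
    # (each without the common 1/sqrt(n) factor).
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls))
    shift = abs(p_cases - p_controls) * sqrt(n_per_group)
    return NormalDist().cdf((shift - z_crit * sd_null) / sd_alt)


for n in (200, 1000, 4000):
    powers = ", ".join(f"RR={rr}: {approx_power(rr, n):.2f}" for rr in (2.5, 1.8, 1.2))
    print(f"{n} cases and {n} controls -> {powers}")
```

Under these assumptions a large effect is detectable with a few hundred individuals per group, while a small effect needs thousands, which matches the slide's message about complex diseases.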


Slide36 l.jpg

Pregnancy Test Kit

The test kit may either be accurate, or inaccurate.

                      Actually pregnant                               Actually not pregnant
Test kit shows +ve    Correct +ve diagnosis (Sensitivity, or Power)   Incorrect +ve diagnosis
Test kit shows –ve    Incorrect –ve diagnosis                         Correct –ve diagnosis (Specificity)


Slide37 l.jpg

Sample Size Determination

  • An issue commonly discussed in medical research!

  • Power calculations, sample size, effect sizes, statistical significance?

Power calculations

Sample size

Effect sizes

Statistical Significance

Recall: Your ability to identify a true pregnancy

Require  evidence, means Power

What level of statistical evidence do you consider “believable”?


Slide38 l.jpg

Scientific Process


Slide39 l.jpg

Sample Selection

  • Simple Random Sample

    • Every individual in the population has an equal chance of being selected (e.g. phonebook sampling)

  • Stratified Sample

    • Every individual in the population belongs uniquely to a specific category (e.g. gender), and individuals are sampled from within each category

  • Cluster Sampling

    • Each cluster has the characteristics of the population, and sampling is performed within the cluster rather than in the population (e.g. diabetic patients in one hospital in Singapore, compared to all diabetic patients in Singapore)

  • Multistage Sampling

    • A combination of different sampling schemes
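The first two schemes can be sketched in a few lines. The population below is hypothetical (1000 made-up records with a gender field), and the helper names are invented for this example; the stratified version samples the same fraction from every category, which is only one of several possible allocation rules.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Every individual has the same chance of being selected."""
    return rng.sample(population, k)


def stratified_sample(population, stratum_of, fraction, rng):
    """Split the population into categories, then sample within each one."""
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    selected = []
    for members in strata.values():
        k = round(fraction * len(members))
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical population: 600 women and 400 men.
population = [{"id": i, "gender": "F" if i % 5 < 3 else "M"} for i in range(1000)]
rng = random.Random(1232)  # fixed seed so the example is repeatable

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda p: p["gender"], 0.10, rng)

srs_women = sum(p["gender"] == "F" for p in srs)
strat_women = sum(p["gender"] == "F" for p in strat)
print(f"Simple random sample: {srs_women} women out of {len(srs)}")
print(f"Stratified sample:    {strat_women} women out of {len(strat)}")
```

The stratified sample always reproduces the 60:40 split of the population, whereas the simple random sample only does so on average.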


Slide40 l.jpg

Scientific Process


Slide41 l.jpg

Data exploration and Statistical analysis

  • Exploratory data analysis

  • Probability and Bayes' Theorem

  • Theoretical distributions (Uniform, Bernoulli, Binomial, Poisson, Normal)

  • Confidence Interval

  • Hypothesis testing (t-test, ANOVA, test of proportions, Chi-square tests)

  • Non-parametric tests

  • Linear regression and correlation

  • Logistic regression

GIBBERISH?!


Slide42 l.jpg

Data exploration and Statistical analysis

  • Data checking, identifying problems and characteristics

  • Understanding chance and uncertainty

  • How will the data for one attribute behave, in a theoretical framework?

  • Theoretical framework assumes complete information, need to address uncertainties in real data

  • Testing your beliefs, do the data support what you think is true?

  • What happens when the assumptions of the theoretical framework are not valid

  • Modeling relationships between multiple outcomes and a numerical response

  • Ditto, but with a two-state outcome.


Slide43 l.jpg

Data exploration, categorical / numerical outcomes

Data

Model each outcome with a theoretical distribution

Model relationships between different outcomes

Estimation of parameters, quantifying uncertainty

Estimation of parameters, quantifying uncertainty

Hypothesis testing

Linear regression (Numerical response)

Logistic regression (Categorical response)

Parametric tests

(t-tests, ANOVA, test of proportions)

Non-parametric tests

(Wilcoxon, Kruskal-Wallis, rank test)

Confidence intervals, to quantify uncertainty


Slide44 l.jpg

Scientific Process


Slide45 l.jpg

Statistics – Truths or Lies

  • 21st century – age of information

  • Responsible for driving scientific progress in multiple disciplines

  • Core skills for data analysis

  • Ability and knowledge to ingest and digest information is at a premium


Slide46 l.jpg

Statistics – Truths or Lies


Computers and statistics l.jpg

Computers and Statistics


  • Excel, SPSS, Minitab, Stata, MATLAB, R, etc.

  • RExcel for this course:

    • http://www.stat.nus.edu.sg/~statyy/ST1232/bin/RExcel_installation.docx

  • Advantages

    • Speed, accuracy, ease of data manipulation

    • Easy to produce plots, cross-tabulation tables, summary statistics

  • Disadvantages

    • Inappropriate analysis / use of wrong tests

    • Data dredging


Slide48 l.jpg

Brief introduction to RExcel and SPSS


Slide49 l.jpg

Features

  • RExcel and SPSS – extremely similar in terms of data entry and usage

  • Spreadsheet-based data entry system


Slide52 l.jpg

Link data in Excel to R


Slide55 l.jpg

Features

  • RExcel and SPSS – extremely similar in terms of data entry and usage

  • Spreadsheet-based data entry system

  • Remember: a unique individual/entry per row!

  • Drop-down menu option for data analysis


Slide58 l.jpg

Features

  • RExcel and SPSS – extremely similar in terms of data entry and usage

  • Spreadsheet-based data entry system

  • Remember: a unique individual/entry per row!

  • Drop-down menu option for data analysis

  • While both are extremely intuitive, SPSS is slightly more user-friendly, in terms of defining variables and format of output


Slide61 l.jpg

In RExcel


Slide63 l.jpg

Output is in the R Commander tab


Slide64 l.jpg

Features

  • RExcel and SPSS – extremely similar in terms of data entry and usage

  • Spreadsheet-based data entry system

  • Remember: a unique individual/entry per row!

  • Drop-down menu option for data analysis

  • While both are extremely intuitive, SPSS is slightly more user-friendly, in terms of defining variables and format of output

  • Details will be given in the subsequent lectures

  • It is important to know how to use and interpret both SPSS and RExcel well: this is examinable and practically important!


Slide65 l.jpg

Reminders

Book your tutorial slots!

Work on your tutorials before going to the classes!

