School of Education University of Tampere, Finland

Issues in Study Design. Petri Nokelainen (petri.nokelainen@uta.fi), School of Education, University of Tampere, Finland

CONTENTS: Issues in Study Design • Writing Scientific Reports • Ten Questions About YOUR Research

Issues in Study Design [Figure: scientific research combines theoretical elements (T1: Research body, T2: Innovation) and empirical elements (E1: Numerical data, E2: Textual / nominal data) through AND / OR relations.]

Issues in Study Design • Qual: {T1,T2,E2} • {T1,T2} Quan: {E1} ! • A: {T1,T2} Theoretical research • B: {T1,T2,E1} Empirical research • C: {T1,T2,E2} • D: {T1,T2,E1,E2}

Issues in Study Design • Research question • B: {T1,T2,E2}: Description of a person’s attitudes, feelings, meanings, knowledge, etc. about S. • C: {T1,T2,E1}: How many cases in the data D have a certain attribute x? How many cases in the data D have a certain attribute y? Is there a relationship between x and y? Is that relationship causal?

Issues in Study Design (Nokelainen, 2008, p. 119)


Issues in Study Design • The left-hand side of the figure shows two main categories of data collection: • Probability sample (PS) and • Non-probability sample (NPS). • Both methods aim to produce a scientific, representative sample from the target population.

Issues in Study Design • According to Jackson (2006), a representative sample is “like” the population. • Thus, we can be confident that the results we find based on the sample also hold for the population. • This is not a problem with PS, which is based on random, stratified or cluster sampling. • In random sampling each member of the population has an equal likelihood of being selected into the sample. • Stratified random sampling allows taking into account different subgroups in the population. • If the population is too large for random sampling of any sort, cluster sampling is applied.
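The three probability sampling techniques named above can be sketched in a few lines. This is a minimal illustration, not a survey-grade implementation: the population of 1000 teachers, the field names `school` and `level`, and the helper function names are all hypothetical and do not come from the source.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Every member has an equal likelihood of being selected."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, rng):
    """Draw the same fraction at random from each subgroup (stratum)."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each one."""
    clusters = defaultdict(list)
    for member in population:
        clusters[cluster_of(member)].append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for c in chosen for m in clusters[c]]


# Hypothetical population: 1000 teachers in 20 schools, two school levels.
rng = random.Random(42)
population = [
    {"id": i, "school": i % 20, "level": "primary" if i % 4 else "secondary"}
    for i in range(1000)
]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda m: m["level"], 0.1, rng)
clus = cluster_sample(population, lambda m: m["school"], 2, rng)
```

Stratified sampling keeps the 3:1 ratio of the two levels in the sample by construction, whereas simple random sampling reproduces it only on average.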

Issues in Study Design • Problems arise with NPS as the individual members of the population do not have an equal likelihood of being selected to be a member of the sample. • The most commonly applied NPS technique is convenience sampling (CS) in which participants are obtained wherever they can be found and wherever is convenient for the researcher (Hair, Anderson, Tatham & Black, 1998).

Issues in Study Design • Why, then, do educational scientists use NPS, typically CS? • Simply because it “tends to be less expensive [than RS] and it is easier to generate samples using this technique” (Jackson, 2006, p. 84).

Issues in Study Design • However, the lower left-hand part of the figure shows that when the researcher ensures that the CS is like the population on certain characteristics (location and dispersion descriptive statistics about, for example, age and job title), it becomes a quota sample (QS). • A quota sample is better than a CS as it gives us more confidence that the results we find based on the sample also hold for the population.
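One way to picture how a convenience sample becomes a quota sample is to accept participants in the order they happen to arrive, but only until each category has reached its share of the population. The sketch below assumes a single characteristic (job title) and hypothetical quotas of 14 teachers and 6 principals; the function name and the data are illustrative only.

```python
from collections import Counter


def quota_sample(arrivals, category_of, quotas):
    """Fill fixed per-category quotas from participants in arrival order."""
    filled = Counter()
    sample = []
    for person in arrivals:
        category = category_of(person)
        if filled[category] < quotas.get(category, 0):
            sample.append(person)
            filled[category] += 1
        # Stop as soon as every quota has been met.
        if all(filled[c] >= q for c, q in quotas.items()):
            break
    return sample


# Hypothetical convenience stream: whoever is reachable, in arrival order.
arrivals = [
    {"id": i, "job": "teacher" if i % 5 else "principal"} for i in range(200)
]

# Hypothetical population shares (70 % teachers, 30 % principals), n = 20.
quotas = {"teacher": 14, "principal": 6}
sample = quota_sample(arrivals, lambda p: p["job"], quotas)
job_counts = Counter(p["job"] for p in sample)
```

The selection within each category is still non-random, so the sample matches the population only on the characteristics the quotas control.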

Issues in Study Design • The upper part of the figure contains two sections, namely “parametric” and “non-parametric”, divided into eight sub-sections (“DNIMMOCS OLD”). • The parametric approach is viable only if: 1) both the phenomenon modeled and the sample follow a normal distribution; 2) the sample size is large enough (at least 30 observations); 3) continuous indicators are used; 4) dependencies between the observed variables are linear. • Otherwise, non-parametric techniques should be applied.

Issues in Study Design • First, study design (D) is made on the basis of the research question and major goal. • According to de Vaus (2004, p. 9), “… research design is to ensure that the evidence obtained enables us to answer the initial question as unambiguously as possible.”

Issues in Study Design • In order to obtain relevant evidence, we need to specify the type of evidence needed to answer the research question. • More specifically, we need to ask: Given this research question, what type of evidence (data) is needed to answer the question in a convincing way?

Issues in Study Design • Sometimes we proceed with so-called qualitative designs, sometimes a quantitative orientation is more appropriate, and sometimes we work both qualitatively and quantitatively (mixed-methods research; for a thorough discussion, see Brannen, 2004). • Methodological, conceptual etc. triangulation. • Design research is a quite new approach, see Bannan-Ritland (2003).

Issues in Study Design • Experimental design (a.k.a. ‘pretest post-test randomized experiment’) is the most recommended approach, but it is only possible with a random sample (a.k.a. ‘probability sample’) and random assignment (participants are randomly assigned to the experimental and control groups). • Research is conducted in a controlled environment (e.g., laboratory) with experimental and control groups (threat to external validity due to the artificial environment). • Using experimental design, both reliability and validity are maximized via random sampling and control in the given experiment (de Vaus, 2004).

Issues in Study Design [Figure: random sample with random assignment; experimental group: pretest, intervention (I), post-test; control group: pretest, no intervention (-), post-test.]

Issues in Study Design • Random assignment to groups
Group | Pretest | Intervention | Post-test
Experimental group | Measurement (X) | Treatment | Measurement (Y)
Control group | Measurement (X) | No treatment | Measurement (Y)
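The random assignment step of the experimental design can be sketched as a shuffle followed by a split. This is a minimal illustration with 20 hypothetical participant codes; the function name is not from the source.

```python
import random


def randomly_assign(participants, rng):
    """Shuffle a copy of the participants and split it into two groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(7)
participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical codes
experimental, control = randomly_assign(participants, rng)
```

Because chance alone decides group membership, pre-existing differences between participants are spread evenly across the two groups on average, which is what the quasi-experimental design gives up.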

Issues in Study Design • Quasi-experimental design (a.k.a. non-equivalent groups design) resembles experimental design but lacks random assignment (sometimes also random sampling) and a controlled research environment. • This type of design is sometimes the only way to do research in certain populations as it minimizes the threats to external validity (natural environments instead of artificial ones). [Figure: random or convenience sample; experimental group: pretest, intervention (I), post-test; control group: pretest, no intervention (-), post-test.]

Issues in Study Design • The most popular quantitative approach in educational research, correlational design (a.k.a. ‘descriptive study’ or ‘observational study’), allows the use of a non-probability sample (a.k.a. ‘convenience sample’). • Most correlational designs are missing control, and thus lose some of their scientific power (Jackson, 2006). • Some research journals accept factorial analysis (main and interaction effects, e.g., MANOVA) based on quasi-experimental design. [Figure: convenience sample; experimental group only: pretest, intervention (I), post-test.]

Issues in Study Design • Observational studies can further be classified into cross-sectional and longitudinal studies (see Caskie & Willis, 2006). • Longitudinal design includes a series of measurements over time. • Change over time, age effect. • A cross-sectional study usually involves one measurement and is thus considerably cheaper and faster to conduct (although producing less controllable and less powerful results). • If there are several measurements, individual participants’ answers are not connected over time (e.g., due to anonymity). • Causal conclusions are usually out of scope of this research type (ibid.).

Issues in Study Design • Longitudinal design • One sample that remains the same throughout the study. • A longitudinal study produces more convincing results as it allows the understanding of change in a construct over time and of the variability and predictors of such change over time (ibid.). • However, it naturally takes more time to carry out and suffers from participant drop-out.
Sample | Pretest | Intervention | Post-test
Random sample | Measurement (X) | Treatment | Measurement (Y)

Issues in Study Design • Cross-sectional design • Measurement is conducted once (or several times) and the sample varies throughout the study.
Sample | Pretest | Intervention | Post-test
Convenience or random sample | - | Treatment | Measurement (Y)
Convenience or random sample | - | No treatment | Measurement (Y)

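Issues in Study Design [Figure: three designs compared. Pretest-posttest randomized experiment: random sampling (RS) and random selection; test group: pre, intervention (I), post; control group: pre, no intervention (-), post. Non-equivalent groups design: RS; test group: pre, I, post; control group: pre, -, post. Correlational design: convenience sample (CS); test group only: pre, I, post.]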

Issues in Study Design • Why, then, do educational scholars use correlational designs over controlled experiments? • The first answer is simple: correlational designs are far easier, faster and less expensive to conduct than experimental designs. • The second answer is more complex, as we need to ask whether the controlled experiment approach is a viable method at all for studying educational research questions.

Issues in Study Design • In science and psychology, most areas of interest are quite easily quantifiable and replicable (for example, the freezing point of chocolate or systolic blood pressure). • However, in educational research we study, for example, topics like ‘pedagogical aspects of digital learning material’ (Nokelainen, 2006) or compare pre-existing characteristics of interest (e.g., gender, age, educational level). • In such situations researchers do apply correlational designs, but still aim to employ different types of data in the analysis in a complementary way (quasi-experimental study).

Issues in Study Design • Case study design is applied in qualitative research. • The aim is to collect information from one or more cases and to study, describe and explain them through how and why questions. • Cases are represented, for example, by individuals, their communication and experiences. (For a thorough discussion, see Flyvbjerg, 2004.)

Issues in Study Design • As a conclusion, Abelson’s (1995) concept of statistics as principled argument becomes useful: • Data analysis should not be pointlessly formal, but instead “ ... it should make an interesting claim; it should tell a story that an informed audience will care about and it should do so by intelligent interpretation of appropriate evidence from empirical measurements or observations” (p. 2).

Issues in Study Design • Second, optimal sample size (N) is divided into two sections in the figure: • Samples that operate in the optimal area (n = 30–250) for traditional parametric frequentist techniques (Black, 1993; Tabachnick & Fidell, 1996), such as the t-test or exploratory factor analysis, and samples that fail to do so (n < 30 or n > 250).

Estimation of sample size • N: Population size. • n: Estimated sample size. • Sampling error (e): Difference between the true (unknown) value and the observed values, if the survey were repeated (= sample collected) numerous times. • Confidence interval: Spread of the observed values that would be seen if the survey were repeated numerous times. • Confidence level: How often the observed values would be within sampling error of the true value if the survey were repeated numerous times. (Murphy & Myors, 1998)
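The slide defines the quantities but does not state a formula. One common way to combine them, shown here purely as an assumption, is Cochran's formula for a proportion with a finite population correction: n0 = z² p(1 − p) / e² and n = n0 / (1 + (n0 − 1) / N). The default `proportion=0.5` is the most conservative choice and, like the function name, is not taken from the source.

```python
import math
from statistics import NormalDist


def estimate_sample_size(population_size, sampling_error=0.05,
                         confidence_level=0.95, proportion=0.5):
    """Estimate n from N, e and the confidence level (Cochran's formula).

    Assumption: a proportion is being estimated from a simple random sample.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size for an infinitely large population.
    n0 = z ** 2 * proportion * (1 - proportion) / sampling_error ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


n_small = estimate_sample_size(500)
n_medium = estimate_sample_size(10_000)
n_huge = estimate_sample_size(10 ** 9)
```

The required n grows very slowly with N: once the population is large, the sampling error and the confidence level dominate the estimate.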

Issues in Study Design • Traditional non-parametric techniques, such as the Mann-Whitney U-test, are considered to operate robustly, also with small samples (-> lack of power?). • The Bayesian approach, however, is free of such restrictions.
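To show why a technique like the Mann-Whitney U-test depends only on the ordering of values, the statistic can be computed by counting pairwise comparisons. The sketch below computes the U statistics only (no p-value), and the two small groups of scores are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairwise 'wins' for each group, ties count 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical post-test scores for two small groups.
group_a = [12, 15, 18, 20, 22]
group_b = [8, 9, 11, 14, 16]
u_a, u_b = mann_whitney_u(group_a, group_b)
u_statistic = min(u_a, u_b)
```

Any monotone rescaling of the scores leaves U unchanged, which is why no distributional shape has to be assumed for the raw values. For actual inference a statistics package would normally be used to obtain the p-value.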

Issues in Study Design • Third, independent observations (IO) are always expected, also in time series analysis.

Issues in Study Design • Controlled experiment designs, when conducted properly, rule out IO violations quite effectively (Martin, 2004), but correlational designs usually lack such control (e.g., to rule out employees’ co-operation when they respond to the survey questions). • On the other hand, some qualitative techniques, like focus group analysis (Macnaghten & Myers, 2004), are heavily based on non-independent observations, as informants are asked to talk to each other as an important part of the data collection.

Issues in Study Design • Fourth, parametric techniques assume a continuous (c) measurement level (ML) of indicators (i.e., so-called ‘quantitative’ variables).

Issues in Study Design [Figure: PHENOMENON and OBSERVATION, both on a discrete scale (0, 1, 2, ..).]

Issues in Study Design [Figure: PHENOMENON and OBSERVATION on continuous scales (0 to ∞), alongside a discrete scale (0, 1, 2, ..).]

Issues in Study Design [Figure: classification of measurements into qualitative and quantitative, discrete and continuous, with the levels nominal, ordinal, interval and ratio.]

Issues in Study Design • Non-parametric analysis is based on the ordering of values, and thus discrete (d) or, when applicable, nominal (n) values are expected (i.e., so-called ‘qualitative’ variables). • A respondent’s income level (euros) or age (years or months) is a representative example of the first indicator type (continuous values). • A Likert scale from 1 to 5 is an example of the second indicator type (ordered discrete values). • A respondent’s gender is an example of the third indicator type (nominal discrete values).

Issues in Study Design • It is important to note that the central limit theorem, discovered by Pierre-Simon Laplace (1749–1827), assures an approximately normal distribution for practically all sums of independent random variables. • For example, it allows the use of the parametric t-test with binomial or ordinal indicators (as the approximately normally distributed group means are compared, not the indicator values themselves). • Bayesian analysis is based on discrete values, and thus continuous values must be discretized (automatically or manually) before the analysis.
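The central limit theorem argument can be checked with a small simulation: draw many samples from a clearly non-normal Likert item and look at how the sample means behave. The response probabilities, the sample size of 50 and the 2000 replications below are hypothetical choices made only for this sketch.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical, heavily skewed 5-point Likert item.
likert_values = [1, 2, 3, 4, 5]
weights = [0.40, 0.30, 0.15, 0.10, 0.05]

# Population mean and variance of the item itself.
pop_mean = sum(v * w for v, w in zip(likert_values, weights))
pop_var = sum(w * (v - pop_mean) ** 2 for v, w in zip(likert_values, weights))

n = 50               # respondents per sample
replications = 2000  # number of repeated samples

sample_means = [
    statistics.fmean(rng.choices(likert_values, weights=weights, k=n))
    for _ in range(replications)
]

# Under the CLT the means are roughly normal with this standard error.
expected_se = (pop_var / n) ** 0.5
observed_mean = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(
    abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means
) / replications
```

If the share in `within` is close to 0.95, the sample means behave approximately like a normal variable even though the individual responses are discrete and skewed.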

Issues in Study Design • Fifth, parametric techniques are technically based on the assumption of a multivariate distribution (MD) that is normal (n) by nature. • Non-parametric techniques expect similarly shaped distributions (s) of any form. • This is great news to anyone who has collected real-life empirical data in educational science and checked both univariate and multivariate variable distributions, as usually almost all variables violate the normal distribution assumption quite heavily with small sample sizes (e.g., below n = 100).

Issues in Study Design • Some researchers try to force their indicators to follow a multivariate normal distribution by applying various transformation techniques (e.g., logarithmic, square), but with varying success. • The motivation for transformations lies in the fact that, in order to enable parametric analysis (i.e., based on, e.g., the normal distribution), the bivariate or multivariate statistical dependencies (S) must be linear (l). • It is important to note that this assumption does not hold for the Bayesian techniques.
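The effect of a logarithmic transformation on a skewed indicator can be illustrated with simulated data. The sketch assumes a hypothetical income variable drawn from a log-normal distribution, which is the ideal case for this transformation; with real data the success varies, as noted above.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(3)

# Hypothetical right-skewed indicator, e.g. income in euros.
incomes = [rng.lognormvariate(10, 0.8) for _ in range(500)]
log_incomes = [math.log(v) for v in incomes]

skew_before = skewness(incomes)
skew_after = skewness(log_incomes)
```

A skewness near zero after the transformation indicates a roughly symmetric distribution. It does not by itself establish multivariate normality or linear dependencies.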
