Qualitative Data Statistical Analysis. Xin Zhenhui, Department of Preventive Medicine, May 15th, 2014. Main Content: Qualitative Data (Enumeration Data), Statistical Description, Statistical Inference.
Qualitative Data Statistical Analysis
Xin Zhenhui
Department of Preventive Medicine
May 15th, 2014
Main Content
• Qualitative Data (Enumeration Data)
• Statistical Description
• Statistical Inference
Qualitative Data (Enumeration Data)
• Qualitative data have no quantitative interpretation; observations can only be classified into categories.
• Examples:
  • Nationality (China, America, others)
  • Racial-ethnic group (white, black, others)
  • Sex (male or female)
Statistical Description
Numeric description
• Absolute number: the number of observations that fall in a category.
• Relative number: the proportion of the total number of observations that fall in a category. Relative numbers include:
  • Rate
  • Constituent ratio
  • Ratio
When describing qualitative observations, we define the categories so that each observation falls in one and only one category. The data set is then described by giving the number of observations (absolute number), or the proportion of the total number of observations (relative number), in each category.
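The description above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the sample values and the helper name `frequency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical sample: sex recorded for ten subjects (illustrative data only).
observations = ["male", "female", "female", "male", "female",
                "female", "male", "female", "female", "female"]

def frequency_table(values):
    """Return {category: (absolute number, relative number)}."""
    counts = Counter(values)          # absolute number per category
    total = sum(counts.values())      # total number of observations
    return {cat: (n, n / total) for cat, n in counts.items()}

table = frequency_table(observations)
for category, (absolute, relative) in sorted(table.items()):
    print(f"{category:<8} {absolute:>3} {relative:6.1%}")
```

Because each observation falls in exactly one category, the absolute numbers add up to the sample size and the relative numbers add up to 1.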
e.g. 1 (table columns: Relative number, Absolute number)
Graphical presentation
• Bar graphs
• Pie charts
• Percentage bars
Statistical Inference
Populations are characterized by parameters, and inferences about parameter values are based on statistics computed from a sample selected from the population of interest.
How do we estimate a parameter based on a sample from a single population?
• Parameter estimation
  • Point estimate
  • Interval estimation (confidence interval)
A point estimate of a parameter is a statistic: a single value computed from the observations in a sample and used to estimate the value of the target parameter. Because we also need to state how close the estimate is likely to be to the true population value, interval estimation is necessary. A confidence interval for a parameter is an interval of numbers within which we expect the true value of the population parameter to lie. The endpoints of the interval are computed from sample information.
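As a rough sketch of point and interval estimation for a population proportion, the code below uses the normal-approximation interval, p ± z·sqrt(p(1 − p)/n). The slides do not say which interval formula is intended, so that choice is an assumption, as are the function name `proportion_ci` and the example counts.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Point estimate and normal-approximation confidence interval for a
    proportion. z = 1.96 gives roughly 95% confidence (an assumed default)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n                      # point estimate (sample proportion)
    se = math.sqrt(p * (1 - p) / n)        # estimated standard error of p
    lower = max(0.0, p - z * se)           # clip the endpoints to [0, 1]
    upper = min(1.0, p + z * se)
    return p, lower, upper

# Hypothetical sample: 40 positive results among 200 subjects.
p_hat, lower, upper = proportion_ci(40, 200)
print(f"point estimate {p_hat:.3f}, interval ({lower:.3f}, {upper:.3f})")
```

The approximation is only reasonable for large samples with a proportion not too close to 0 or 1.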
How do we test whether a parameter differs between two or more populations, based on samples from each?
• Hypothesis test
  • U test
  • Chi-square test
U test: the sample size must be sufficiently large to guarantee approximate normality of the sampling distribution of the sample proportion, p.
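A minimal sketch of a U test comparing two sample proportions follows. It assumes the common pooled-proportion form of the statistic and a two-sided alternative, neither of which is spelled out in the slides; the function name and the counts are hypothetical.

```python
import math

def two_proportion_u_test(x1, n1, x2, n2):
    """U (z) statistic and two-sided P value for H0: the two population
    proportions are equal. Assumes both samples are large."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                     # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the statistic is undefined")
    u = (p1 - p2) / se
    p_value = math.erfc(abs(u) / math.sqrt(2))         # two-sided normal tail area
    return u, p_value

# Hypothetical data: 30/100 positive in group 1, 15/100 positive in group 2.
u, p = two_proportion_u_test(30, 100, 15, 100)
print(f"u = {u:.2f}, P = {p:.4f}")
```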
Chi-square test
The chi-square test is a statistical test mainly used to compare observed data with the data expected under a specific hypothesis. It compares the counts of categorical responses between two (or more) independent groups.
Note: chi-square tests can only be applied to actual counts, not to percentages, proportions, means, etc.
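To make the comparison of observed and expected counts concrete, here is a small sketch of the Pearson chi-square statistic for a two-way table of counts. Expected counts are computed under the hypothesis of no difference between groups (row total × column total / grand total). The function name and the counts are hypothetical, and no continuity correction is applied.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of
    raw counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under H0
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)          # (R - 1)(C - 1)
    return chi2, df

# Hypothetical 2x2 table: rows are groups, columns are positive / negative counts.
observed = [[30, 70],
            [15, 85]]
chi2, df = pearson_chi_square(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The input must be counts: feeding in percentages would change the statistic, which is the point of the note above.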
Contingency tables
There are several types of chi-square test, depending on how the data were collected and on the hypothesis being tested. A contingency table is used to determine whether a dependence exists among different qualitative variables. It is:
• useful in situations involving multiple population proportions
• used to classify sample observations according to two or more characteristics
Table 1. General notation for a contingency table
1. 2×2 contingency table χ² test
Comparison of urinary brown pigment positive rates between two groups of people
3. R×C contingency table χ² test
Formula
ν = (R − 1)(C − 1)
Applications:
1) Comparison of multiple sample rates:
2) Comparison of multiple constituent ratios
χ² = 297.59, ν = (3 − 1)(4 − 1) = 6, P < 0.005
Comparison of blood type constituent ratios across three regions
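The reported result can be checked numerically. The sketch below computes the degrees of freedom for a 3×4 table and the upper-tail probability of the chi-square distribution at 297.59. It relies on the closed-form tail formula that holds only for even degrees of freedom, a shortcut chosen here to avoid external libraries; the function name is hypothetical.

```python
import math

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with an
    even number of degrees of freedom:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    term = 1.0      # current term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

rows, cols = 3, 4                       # table dimensions from the example
df = (rows - 1) * (cols - 1)            # degrees of freedom: 6
p_value = chi_square_sf_even_df(297.59, df)
print(f"df = {df}, P = {p_value:.3g}")
```

The tail probability is far below 0.005, consistent with the conclusion stated in the example.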
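3) Test of association for categorical data: correlation analysis of a contingency table
Question: Is there an association between the two blood group systems?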
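Note:
Example 1: Analgesic effect of acupuncture at different acupoints
Question: If the χ² test were applied to this example, what would the result show? The χ² test can only show whether the constituent ratios differ; it cannot compare the average level of the effect.
Data characteristics: one-way ordered in rows (grouping variable unordered, outcome variable ordered); use the rank-sum test.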
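Example 2: Tumor occurrence in 60 rats
Data characteristics: one-way ordered in columns (grouping variable ordered, outcome variable unordered); use the linear trend test.
Purpose: to test whether a linear trend exists between the two categorical variables.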
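Example 3: Comparison of the results of two physicians who independently examined 100 suspected cases of retinopathy
Data characteristics: two-way ordered, one variable.
Purpose: to test whether the two physicians' diagnoses agree, using an agreement test (kappa).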
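A minimal sketch of an agreement measure for this kind of data is the unweighted Cohen's kappa, computed from a square table in which rows are one physician's grades and columns are the other's. The slides name kappa but give no formula or data here, so the unweighted form, the function name, and the 3×3 counts below are all assumptions for illustration.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table of counts
    (rows: rater A's category, columns: rater B's category)."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(k)) / n                 # observed agreement
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical grades assigned by two physicians to the same 100 cases.
ratings = [[30, 5, 0],
           [5, 25, 5],
           [0, 5, 25]]
kappa = cohen_kappa(ratings)
print(f"kappa = {kappa:.3f}")
```

Kappa equals 1 for perfect agreement and 0 when agreement is no better than chance. For ordered grades, a weighted kappa is another option that gives partial credit to near-misses.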
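Example 4
Data characteristics: two variables, two-way ordered.
Purpose: to study whether a relationship exists between the two variables (whether age and efficacy, disease duration and efficacy, or course of treatment and efficacy are related, and how closely), using rank correlation analysis or the linear trend test.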
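Summary of R×C table χ² test applications
1. R×C table with both variables unordered
1) To compare multiple sample rates or constituent ratios, use the R×C table χ² test;
2) To analyze whether two categorical variables are associated, and how closely, use the R×C table χ² test together with the Pearson contingency coefficient.
2. R×C table with one variable ordered
1) Grouping variable ordered, outcome variable (effect) unordered: use the R×C table χ² test or the linear trend test;
2) Outcome variable (effect) ordered, grouping variable unordered: use the rank-sum test.
3. R×C table with both variables ordered
1) Both categorical variables ordered and of the same attribute: use an agreement test;
2) Both categorical variables ordered but of different attributes: A. to analyze whether the two variables are correlated, use rank correlation analysis; B. to analyze whether a linear trend exists between the two ordered categorical variables, use the linear trend test.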