slide1
Human Population Genetics: Basic Principles and Analytical Methods


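Graduate course of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: Human Population Genetics. Human Population Genetics: Basic Principles and Analytical Methods. CAS-MPG Partner Institute for Computational Biology. Xu Shuhua, Jin Li. Lecture 6. Analysis of Population Genetic Structure (I). Lecture 4. Basic concepts in population genetics (4): population genetic structure; statistics describing population genetic structure (hierarchical F statistics); software demonstration: computing hierarchical F statistics for human populations with Arlequin. What is genetic structure? Finding structure in differences! Genetic structure is the pattern of how genetic polymorphism is distributed differently across time and space.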

Presentation Transcript
slide2
Lecture 6

Analysis of Population Genetic Structure (I)

slide3
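Lecture 4
  • Basic concepts in population genetics (4)
  • Population genetic structure
  • Statistics describing population genetic structure
    • Hierarchical F statistics
  • Software demonstration
    • Computing hierarchical F statistics for human populations with Arlequin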
slide4
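What is genetic structure?
  • Finding structure in differences!
  • Genetic structure is the pattern of how genetic polymorphism is distributed differently across time and space.
    • Time: different eras; different generations.
    • Space: different geographic distributions; different populations in the same region; different genomic regions.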

slide6
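Why study human genetic structure?
  • Human origins, migration, evolutionary history, and future prospects
  • Relationships among modern populations (ethnic groups)
  • Genetic basis and gene mapping of complex diseases
    • Cancer
    • Obesity
    • Asthma
    • Psychiatric disorders
    • Type II diabetes
    • Cardiovascular diseases
  • Public health care
  • Personalized medication and personalized treatment
  • Forensic science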
an example
An example
  • Population structures and association studies
population structures make trouble in association studies
Population structures make trouble in association studies
  • Population stratification in Epidemiology.
  • Analysis of mixed samples having different allele frequencies is a primary concern in human genetics, as it leads to false evidence for allelic association.
odds ratio
Odds ratio

                    Disease
Exposure     yes      no       total
yes          a        b        a + b
no           c        d        c + d
total        a + c    b + d    a + b + c + d

Odds for case: a/c

Odds for control: b/d

Odds ratio: OR = (a/c) / (b/d) = ad/bc

explanation of or
Explanation of OR
  • OR>1: exposure factors increase the risk of disease; positive association
  • OR<1: exposure factors decrease the risk of disease; negative association
  • OR=1: no association
slide11
Example

Odds for case 50:50 = 1

Odds for control 20:80 = 0.25

Odds ratio = 50:50/20:80 = 1/0.25 = 4
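The arithmetic in this example can be checked with a short sketch. The function name `odds_ratio` and the argument order (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls) are illustrative choices that follow the 2x2 table layout, not something defined in the slides.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 exposure-by-disease table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_case = a / c       # odds of exposure among cases
    odds_control = b / d    # odds of exposure among controls
    return odds_case / odds_control

# Slide example: cases split 50:50, controls split 20:80
example_or = odds_ratio(a=50, b=20, c=50, d=80)
print(example_or)
```

A value above 1, as here, corresponds to the "positive association" case in the explanation of OR.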

heterogeneity stratification
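Heterogeneity/Stratification

Subpopulation 1
            case     control   total
exp(+)      50       50        100
exp(-)      450      450       900
total       500      500       1,000

Subpopulation 2
            case     control   total
exp(+)      1        9         10
exp(-)      99       891       990
total       100      900       1,000

Total population
            case     control   total
exp(+)      51       59        110
exp(-)      549      1,341     1,890
total       600      1,400     2,000

Exposure frequency in cases: 51/600 = 8.5%

Exposure frequency in controls: 59/1,400 = 4.2%

Within each subpopulation the odds ratio is 1, yet in the pooled table the ratio of exposure frequencies is 8.5%/4.2% = 2.02 (cross-product odds ratio (51 × 1,341)/(59 × 549) ≈ 2.11).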
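A sketch of the stratification effect, using the counts from this slide. The helper `odds_ratio` and the tuple layout are illustrative assumptions. Note that the cross-product odds ratio of the pooled table comes out near 2.11; the 2.02 quoted on the slide matches the ratio of exposure frequencies, 8.5% / 4.2%.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)

# (exposed cases, exposed controls, unexposed cases, unexposed controls)
sub1 = (50, 50, 450, 450)
sub2 = (1, 9, 99, 891)

# Pool the two strata cell by cell, ignoring subpopulation membership
pooled = tuple(x + y for x, y in zip(sub1, sub2))

or_sub1 = odds_ratio(*sub1)
or_sub2 = odds_ratio(*sub2)
or_pooled = odds_ratio(*pooled)

# Exposure frequency among cases and among controls in the pooled table
exp_cases = pooled[0] / (pooled[0] + pooled[2])
exp_controls = pooled[1] / (pooled[1] + pooled[3])

print(or_sub1, or_sub2, round(or_pooled, 2))
print(round(exp_cases, 3), round(exp_controls, 3))
```

Neither subpopulation shows any association on its own; the apparent association appears only because the two groups differ in both disease prevalence and exposure frequency.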

human migration
Human migration
  • Anatomically modern humans evolve in Africa > 160,000 ybp.
  • Some leave Africa sometime around 75,000 - 55,000 ybp.
  • Replace Neanderthals in Europe and archaic humans around the world.
  • Arrive in Western hemisphere between 34,000 and 18,000 ybp.
  • Multiple migrations in different pre-historic periods, followed by different migrations in historical periods.
note on definitions biological race
Note on Definitions: Biological Race
  • Morphology (phenotype)
  • Geographical location
  • Population based (frequency of genes)

Socially Constructed Race: Arbitrarily utilizes aspects of morphology, geography, culture, language, religion, etc. in the service of a social dominance hierarchy.

slide19
Statistics describing genetic structure
  • Hierarchical F statistics
slide20
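Fixation index
  • Fixation index (F):
  • For a locus with two alleles, any deviation from Hardy-Weinberg proportions can be measured by a parameter F, called the fixation index. With $x$ the frequency of allele A1, the genotype frequencies are then given by:

$P(A_1A_1) = x^2 + F\,x(1-x)$
$P(A_1A_2) = 2x(1-x)(1-F)$
$P(A_2A_2) = (1-x)^2 + F\,x(1-x)$

  • From the second equation:

$F = \dfrac{2x(1-x) - P(A_1A_2)}{2x(1-x)}$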
slide21
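The expression above can be written as

$F = \dfrac{h - h_0}{h}$

where $h$ is the expected frequency of heterozygotes under random mating and $h_0$ is the observed frequency of heterozygotes in the population.

  • The fixation index F can be positive or negative, depending on the situation.
  • When h0 is smaller than h, F is positive; when h0 is larger than h, F is negative. Under inbreeding the observed heterozygote frequency decreases, so F is positive.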
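As a minimal sketch, the fixation index can be computed from genotype counts at a two-allele locus as F = (h - h0)/h. The function name `fixation_index` and the example counts are hypothetical and chosen only to show the sign behavior described here.

```python
def fixation_index(n_a1a1, n_a1a2, n_a2a2):
    """Fixation index F = (h - h0) / h from genotype counts at one locus."""
    n = n_a1a1 + n_a1a2 + n_a2a2
    x = (2 * n_a1a1 + n_a1a2) / (2 * n)  # frequency of allele A1
    h = 2 * x * (1 - x)                  # heterozygotes expected under random mating
    h0 = n_a1a2 / n                      # heterozygotes actually observed
    return (h - h0) / h                  # undefined when the locus is monomorphic

# Hypothetical samples of 100 individuals, all with allele frequency 0.5
deficit = fixation_index(30, 40, 30)   # fewer heterozygotes than expected
hw = fixation_index(25, 50, 25)        # exactly Hardy-Weinberg proportions
excess = fixation_index(20, 60, 20)    # more heterozygotes than expected
print(deficit, hw, excess)
```

The three calls give a positive, a zero, and a negative F, in that order.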

sub population
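Subpopulations
  • So far we have considered a single simple population, whether or not it is inbred.
  • In practice, however, most natural populations can be subdivided into many different breeding units, or subpopulations, even though these are not completely isolated. In this situation it is important to study genetic variation both within and between populations.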
slide23
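Genotype frequencies in a subdivided population
  • Suppose a population is divided into s subpopulations, each in Hardy-Weinberg equilibrium. Let $x_k$ be the frequency of allele A1 in the k-th subpopulation; the frequencies of genotypes A1A1, A1A2, and A2A2 there are $x_k^2$, $2x_k(1-x_k)$, and $(1-x_k)^2$, respectively.
  • Let $w_k$ denote the relative size of the k-th subpopulation, with $\sum_k w_k = 1$. The frequencies of A1A1, A1A2, and A2A2 in the whole population are then:

$P(A_1A_1) = \sum_k w_k x_k^2 = \bar{x}^2 + \sigma^2$
$P(A_1A_2) = 2\sum_k w_k x_k(1-x_k) = 2\bar{x}(1-\bar{x}) - 2\sigma^2$
$P(A_2A_2) = \sum_k w_k (1-x_k)^2 = (1-\bar{x})^2 + \sigma^2$

where $\bar{x} = \sum_k w_k x_k$ and $\sigma^2 = \sum_k w_k (x_k - \bar{x})^2$ are the mean and variance of the allele frequency among subpopulations.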

slide24
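Fixation index in a subdivided population
  • Comparing the whole-population genotype frequencies with the genotype frequencies written in terms of the fixation index, we see that

$\sigma^2 = F\,\bar{x}(1-\bar{x})$

and therefore

$F = \dfrac{\sigma^2}{\bar{x}(1-\bar{x})}$

where $\bar{x}$ and $\sigma^2$ are the mean and variance of the allele frequency among subpopulations.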

slide25
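Wahlund's principle
  • This shows that if a population is divided into several breeding units, the frequency of homozygotes is higher than the Hardy-Weinberg proportions. This property was first discovered by Wahlund (1928) and is known as Wahlund's principle, or the Wahlund effect.
  • When allele frequencies are identical in all subpopulations, F is 0; when every subpopulation is fixed for one allele or the other, F is 1.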
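The Wahlund effect can be illustrated numerically. The sketch below assumes each subpopulation is in Hardy-Weinberg equilibrium and uses F = variance / (mean × (1 - mean)) for the allele frequency among subpopulations; the function name `wahlund` and the example frequencies are hypothetical.

```python
def wahlund(freqs, weights):
    """Whole-population genotype frequencies for Hardy-Weinberg subpopulations.

    freqs:   frequency of allele A1 in each subpopulation
    weights: relative size of each subpopulation (should sum to 1)
    """
    x_bar = sum(w * x for w, x in zip(weights, freqs))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, freqs))
    hom_a1 = sum(w * x ** 2 for w, x in zip(weights, freqs))
    het = sum(w * 2 * x * (1 - x) for w, x in zip(weights, freqs))
    f = var / (x_bar * (1 - x_bar))  # undefined if x_bar is 0 or 1
    return {"x_bar": x_bar, "var": var, "A1A1": hom_a1, "A1A2": het, "F": f}

# Two equally sized subpopulations with different allele frequencies
result = wahlund([0.2, 0.8], [0.5, 0.5])
print(result)

# Limiting cases named in the text
same = wahlund([0.5, 0.5], [0.5, 0.5])   # identical frequencies
fixed = wahlund([0.0, 1.0], [0.5, 0.5])  # each subpopulation fixed
print(same["F"], fixed["F"])
```

In the first example the A1A1 frequency exceeds the Hardy-Weinberg value by exactly the variance of the allele frequency, and the heterozygote frequency falls short by twice that variance.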
wahlund
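Implications of the Wahlund effect
  • Population structure exists!
  • Conversely, when F is negative, the heterozygote frequency is higher than expected under Hardy-Weinberg equilibrium, which suggests heterozygote advantage, that is, some form of natural selection at work.

Heterozygote advantage and balancing selection (discussed in detail in the later "natural selection" lecture)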

wright s fixation index f st
Wright’s Fixation Index (FST)

Sewall Wright

1889-1988

f statistics
F-statistics
  • Different F-statistics for different scales
    • Individual (I)
    • Subpopulation (S)
    • Total population (T)
  • Those are the traditional scales, but in theory there is no limit to the number of levels of analysis.
  • Originally defined for 2 alleles
  • Extended to >2 alleles as G-statistics
f statistics derived from inbreeding coefficient
F-statistics Derived from inbreeding coefficient
  • FIS
    • inbreeding in individuals relative to subpopulation (Weir and Cockerham’s f)
  • FST
    • inbreeding among subpopulations relative to total population (Weir and Cockerham’s θ)
  • FIT
    • inbreeding among individuals relative to total population (Weir and Cockerham’s F)
slide32
Remember that inbreeding coefficient, F, is related to loss of heterozygosity

F = 1 – (Ho/He)

  • F-statistics can be expressed in the same way

FIS = 1 – (HI/HS)

FST = 1 – (HS/HT)

FIT = 1 – (HI/HT)

HI = Ho averaged across subpopulations

HS = He averaged across subpopulations

HT = He for the total population

slide33
Deficit of heterozygotes

Subpopulation 1: aa aa aa aa, so P(A) = p = 0 and P(a) = q = 1

Subpopulation 2: AA AA AA AA, so P(A) = p = 1 and P(a) = q = 0

HS = He within subpopulation

HS = 1 - Σpi² = 1 - (1² + 0²) = 0 in each subpopulation

Mean HS = 0

HT = He for total population

For total population, p = 0.5 & q = 0.5

HT = 1 - Σpi² = 1 - [(0.5)² + (0.5)²] = 0.5

FST = 1 – (HS/HT) = 1 - (0/0.5) = 1

slide34
Deficit of homozygotes

Subpopulation 1: Aa Aa Aa Aa, so P(A) = p = 0.5 and P(a) = q = 0.5

Subpopulation 2: Aa Aa Aa Aa, so p = 0.5 and q = 0.5

HS = He within subpopulation

HS = 1 - Σpi² = 1 - (0.5² + 0.5²) = 0.5 in each subpopulation

Mean HS = 0.5

HT = He for total population

For total population, p = 0.5 & q = 0.5

HT = 1 - Σpi² = 1 - [(0.5)² + (0.5)²] = 0.5

FST = 1 – (HS/HT) = 1 - (0.5/0.5) = 0

slide35
Subpopulation 1: AA Aa Aa aa, so P(A) = p = 0.5 and P(a) = q = 0.5

Subpopulation 2: AA Aa Aa aa, so p = 0.5 and q = 0.5

HS = He within subpopulation

HS = 1 - Σpi² = 1 - (0.5² + 0.5²) = 0.5 in each subpopulation

Mean HS = 0.5

HT = He for total population

For total population, p = 0.5 & q = 0.5

HT = 1 - Σpi² = 1 - [(0.5)² + (0.5)²] = 0.5

FST = 1 – (HS/HT) = 1 - (0.5/0.5) = 0

FST uses expected heterozygosity, not observed heterozygosity!
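The three worked examples can be reproduced with a small sketch. It assumes two-letter genotype strings over the alleles "A" and "a", equally weighted subpopulations, and HT computed from the pooled allele frequency; the function names are illustrative.

```python
def allele_freq(genotypes, allele="A"):
    """Frequency of one allele in a list of two-letter genotypes."""
    alleles = "".join(genotypes)
    return alleles.count(allele) / len(alleles)

def expected_het(p):
    """Expected heterozygosity He = 1 - sum(p_i^2) for two alleles."""
    return 1 - (p ** 2 + (1 - p) ** 2)

def fst(subpops):
    """FST = 1 - HS/HT, using expected (not observed) heterozygosity."""
    hs = sum(expected_het(allele_freq(s)) for s in subpops) / len(subpops)
    pooled = [g for s in subpops for g in s]
    ht = expected_het(allele_freq(pooled))
    return 1 - hs / ht  # undefined if the pooled sample is monomorphic

fixed = [["aa"] * 4, ["AA"] * 4]            # no heterozygotes at all
all_het = [["Aa"] * 4, ["Aa"] * 4]          # only heterozygotes
hw = [["AA", "Aa", "Aa", "aa"]] * 2         # Hardy-Weinberg proportions

print(fst(fixed), fst(all_het), fst(hw))
```

The last two cases differ in observed heterozygosity (1.0 versus 0.5) but share the same allele frequencies, so they give the same FST.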

f statistics1
F statistics
  • FIS tells us if there is inbreeding within subpopulations by comparing HI and HS: $F_{IS} = (\bar{H}_S - \bar{H}_I)/\bar{H}_S$
  • Bars mean that the values are the averages over all the subpopulations that we are considering.
  • So FIS measures whether there is, on average, a deficit of heterozygotes within subpopulations.
f statistics2
F statistics
  • FST is the statistic that tells us how differentiated the subpopulations are. Formally, FST tells us if there is a deficit of heterozygotes in the metapopulation, due to differentiation among subpopulations: $F_{ST} = (H_T - \bar{H}_S)/H_T$
  • Bars mean that the values are the averages over all the subpopulations that we are considering.
f statistics3
F statistics
  • FIT tells us how much population structure has affected the average heterozygosity of individuals within the population: $F_{IT} = (H_T - \bar{H}_I)/H_T$
  • Also (1-FIS) (1-FST) = (1-FIT).
f statistics measure departure from hardy weinberg equilibrium
F-statistics Measure departure from Hardy-Weinberg equilibrium
  • FIS = departure from HW in local subpopulations
  • FST = genetic divergence among subpopulations
  • FIT = total departure from HW including that within and among subpopulations
partitioning of structure
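Partitioning of structure

Levels: Individuals → Subpopulations → Total population

  • FIS: individuals relative to subpopulations (inbreeding)
  • FST: subpopulations relative to total population (Wahlund effect or fragmentation)
  • FIT: individuals relative to total population

1 – FIT = (1 – FST)(1 – FIS)

FIT = FIS + FST – (FIS)(FST)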

the three f statistics are related to each other
The three F statistics are related to each other
  • FST = (FIT - FIS) / (1 - FIS)
  • FST is never negative (it lies between 0 and 1)
  • FIS is frequently positive; it is negative if there is systematic avoidance of inbreeding
  • FIT is positive unless there are no clear subdivisions and inbreeding is avoided
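The relationships among the three statistics can be checked numerically. This sketch uses the heterozygosity definitions FIS = 1 - HI/HS, FST = 1 - HS/HT, and FIT = 1 - HI/HT; the input heterozygosities are hypothetical values chosen for illustration.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (FIS, FST, FIT) from observed and expected heterozygosities.

    h_i: observed heterozygosity averaged across subpopulations
    h_s: expected heterozygosity averaged across subpopulations
    h_t: expected heterozygosity of the total population
    """
    f_is = 1 - h_i / h_s
    f_st = 1 - h_s / h_t
    f_it = 1 - h_i / h_t
    return f_is, f_st, f_it

# Hypothetical heterozygosities
f_is, f_st, f_it = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)

# Both forms of the identity should agree up to rounding error
lhs = 1 - f_it
rhs = (1 - f_is) * (1 - f_st)
f_st_from_others = (f_it - f_is) / (1 - f_is)
print(f_is, f_st, f_it, lhs, rhs, f_st_from_others)
```

Because each statistic is one minus a ratio of heterozygosities, the product form of the identity follows directly: (HI/HS)(HS/HT) = HI/HT.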
extensions
Extensions
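  • FST can be written in terms of the variance of allele frequencies across subpopulations: $F_{ST} = \mathrm{Var}(q) / [\bar{q}(1-\bar{q})]$
  • When the whole population is in HW (no subdivision), Var(q) = 0, therefore FST = 0
  • As Var(q) increases, divergence of subpopulations increases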
intuitive meaning of f st
Intuitive meaning of FST
  • The proportion of total genetic variation that is distributed among subpopulations, rather than within subpopulations.
unbiased estimates of f st
Unbiased estimates of FST
  • Unbiased estimates of FST were calculated as described by Weir and Hill 2002.
  • Suppose we have r subpopulations indexed by i (where i = 1,…, r). The estimator is built from the sample allele frequency in each subpopulation and the average frequency over samples.
slide48
FST can then be estimated from these quantities together with an average sample size across samples that also incorporates and corrects for the variance in sample size over subpopulations.
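The equations on this slide did not survive in the transcript. As a sketch only, the code below uses a moment estimator of the form (MSP - MSG) / (MSP + (n_c - 1) MSG), in the style of Weir and Cockerham, where MSG and MSP are within- and between-subpopulation mean squares and n_c is a corrected average sample size. Whether the slide showed exactly this form is an assumption, as are the function name, the argument names, and the example numbers.

```python
def fst_estimate(freqs, sizes):
    """Moment estimator of FST for one biallelic locus (assumed form).

    freqs: sample allele frequency in each subpopulation
    sizes: sample size (number of sampled alleles) in each subpopulation
    """
    r = len(freqs)
    n_total = sum(sizes)
    # Sample-size-weighted average allele frequency
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / n_total
    # Mean square within subpopulations
    msg = sum(n * p * (1 - p) for n, p in zip(sizes, freqs)) / sum(n - 1 for n in sizes)
    # Mean square between subpopulations
    msp = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, freqs)) / (r - 1)
    # Average sample size corrected for variance in sample size
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)
    return (msp - msg) / (msp + (n_c - 1) * msg)

# Hypothetical samples: two subpopulations, 100 sampled alleles each
differentiated = fst_estimate([0.2, 0.8], [100, 100])
identical = fst_estimate([0.5, 0.5], [100, 100])
print(round(differentiated, 4), round(identical, 4))
```

An estimator of this kind can return slightly negative values when the sample frequencies are identical, which is the price of removing sampling bias.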

problems with f st
Problems with FST
  • Assumes Infinite Alleles Model (IAM) or K-alleles model with very low mutation rates (not appropriate for microsat data)
  • All alleles differ equally from each other (magnitude of difference between alleles ignored)
  • Does not work well with high heterozygosity
  • Assumes alleles arrive in population via migration rather than mutation
special version for microsatellites
Special version for microsatellites
  • RST (Slatkin 1995)
  • Analogue of FST
  • Assumes Stepwise Mutation Model (mutation model most appropriate for microsats)
  • Allows for high mutation rates
  • Allows differences in magnitude between alleles to be accounted for
  • $R_{ST} = (S - S_W)/S$, where S = average sum of squared differences in allele size in the total population, and SW = average sum of squared differences within populations
which to use for sats
Which to use for sats?
  • FST and RST can differ using same data
  • If loci don’t conform to SMM model, RST will be underestimated
  • If mutation rates are large relative to migration rates, RST is superior
  • Longer divergence times between populations favors RST
  • RST favored under ideal conditions and with large samples
  • FST favored with small samples and when a more conservative estimator is desired
distance measures for microsatellites
Distance measures for microsatellites
  • (δµ)² (Goldstein et al. 1995): (δµ)² = ∑(µx − µy)²/L
    • µx is the mean allele size in population x
    • µy is the mean allele size in population y
    • Summed across all loci and divided by # of loci (L)
    • Allele size expressed as # repeat units
    • Stepwise mutation model (SMM)
    • E[(δµ)²] = 2αt
      • α = mutation rate per generation
      • t = # generations since the populations diverged
    • Problems
      • α not constant among different loci
      • Variance very high
      • µsats don’t strictly follow the SMM
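A minimal sketch of this distance, assuming allele sizes are given in repeat units as one list per locus for each population. The function names, the example allele sizes, and the mutation rate are hypothetical.

```python
def delta_mu_squared(pop_x, pop_y):
    """Squared difference in mean allele size, averaged over loci.

    pop_x, pop_y: one list of allele sizes (in repeat units) per locus,
                  with loci in the same order in both populations.
    """
    total = 0.0
    for alleles_x, alleles_y in zip(pop_x, pop_y):
        mu_x = sum(alleles_x) / len(alleles_x)  # mean allele size in x
        mu_y = sum(alleles_y) / len(alleles_y)  # mean allele size in y
        total += (mu_x - mu_y) ** 2
    return total / len(pop_x)

def generations_since_split(distance, alpha):
    """Invert E[distance] = 2 * alpha * t under the stepwise mutation model."""
    return distance / (2 * alpha)

# Hypothetical data: two loci, a few sampled alleles per population
pop_x = [[10, 12, 14], [20, 22]]
pop_y = [[11, 13, 15], [24, 26]]

distance = delta_mu_squared(pop_x, pop_y)
t_hat = generations_since_split(distance, alpha=1e-3)  # assumed mutation rate
print(distance, t_hat)
```

With so few loci the estimate is dominated by the locus with the larger difference, which illustrates the high variance listed among the problems.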
distance measures for microsatellites1
Distance measures for microsatellites
  • DSA (Bowcock et al., 1994, Nature)
    • SA = shared alleles
    • PSA = (∑S)/2U
      • Where S = # shared alleles at a locus between 2 populations
      • U = # loci
    • DSA = 1 – PSA
    • Infinite alleles model (IAM)
    • May be superior to (δµ)² for closely related populations, even for µsat data
degree of f statistics
Degree of F statistics

According to Sewall Wright:

  • FST ranges from 0-1
  • 0 = no genetic differentiation; panmixia
  • 0.00–0.05 = little genetic differentiation
  • 0.05–0.15 = moderate genetic differentiation
  • 0.15–0.25 = great genetic differentiation
  • 0.25–1.00 = very great genetic differentiation
  • 1 = complete genetic differentiation
calculate hierarchical f st by arlequin
Calculate hierarchical FST by Arlequin

Chromosome 21 SNP data

#Asian
Group ={
"CHB"
"JPT"
"CHU"
"HMO"
"AVA"
}

#European
Group ={
"CEU"
"NEuro"
"Basque"
"Italian"
"Hungarian"
}

#African
Group ={
"YRI"
}

meta population structure drift within populations migration between populations
Meta-population structure: Drift within populations, migration between populations

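[Figure: six subpopulations with allele frequencies and sizes (p=0.7, N=15), (p=0.4, N=70), (p=0.6, N=50), (p=0.3, N=10), (p=0.5, N=150), and (p=1.0, N=20), connected by migration at rates m=.02, m=.07, and m=.01]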

drift and migration have opposite effects
Drift and migration have opposite effects
  • Drift makes subpopulations different
  • Migration homogenizes subpopulations
useful for estimating gene flow
Useful for estimating gene flow
  • If you know FST and Ne, you can calculate m
slide60
In addition, very little migration is required to prevent substantial genetic divergence among subpopulations resulting from random genetic drift
this can be shown by the following equation
This can be shown by the following equation:

$F_{ST} \approx \dfrac{1}{4Nm + 1}$

where FST is the equilibrium fixation index and Nm is the # of migrants/generation.

estimation of gene flow
Estimation of gene flow
  • Indirect (based on FST)
      • Nm = (1 - FST)/(4 FST)
      • Some drawbacks but often acceptable if limitations are considered
      • High variance at low values of FST
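The two directions of this relationship can be sketched as follows, assuming the equilibrium approximation FST ≈ 1/(4Nm + 1). The function names are illustrative.

```python
def fst_from_nm(nm):
    """Equilibrium FST for a given number of migrants per generation."""
    return 1 / (4 * nm + 1)

def nm_from_fst(fst):
    """Indirect estimate of migrants per generation from FST (0 < FST <= 1)."""
    return (1 - fst) / (4 * fst)

# FST drops quickly as the number of migrants per generation rises
for nm in (0, 0.5, 1, 5, 10):
    print(nm, round(fst_from_nm(nm), 3))

# The two functions are inverses of each other
roundtrip = nm_from_fst(fst_from_nm(1))
print(roundtrip)

# Small changes in a low FST move the Nm estimate a lot
print(round(nm_from_fst(0.01), 2), round(nm_from_fst(0.02), 2))
```

The last line shows why the estimate has high variance at low FST: halving or doubling a small FST roughly doubles or halves the inferred Nm.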
problems with f st1
Problems with FST
  • Assumptions of model not realistic
    • All populations have same N
    • Nm is equal among all demes
    • Mutations do not occur
    • Markers are truly neutral
    • Selection not operating (local adaptation causes overestimate of FST and underestimate of Nm; uniform selection underestimates FST and overestimates Nm)
    • Recent isolation of demes won’t be detected
  • Related to gene flow on evolutionary time scales
  • Not appropriate for ecological time scales
    • Ignores ongoing dynamics in allele frequencies (rare alleles)
slide64
Best in situations where
    • Spatial scale small (island model holds and spatially varying selection unlikely)
    • Migration rates high (rapid attainment of genetic equilibrium)
    • Sample sizes and number of loci used are large - accuracy of estimates
    • Long-term estimate of Nem “averaged” over many generations desired
    • Not useful for short-term nonequilibrium situations e.g. recently fragmented, rapidly declining populations
population differentiation under migration and drift
Population differentiation under migration and drift
  • If Ne and m are small, FST is large
  • If Nem < 1, then FST > 0.2
  • “If there is > 1 migrant per generation, populations do not diverge much.”
slide66
[Figure: fixation index (Fst) plotted against the number of migrants per generation (Nm), for Nm from 0 to 10]

ompg rule of thumb
OMPG rule of thumb
  • From this analysis emerged a genetic rule of thumb that one migrant individual per local population per generation (OMPG) is sufficient to obscure any disruptive effects of drift.
slide68
Biologists concerned with population insularization caused by habitat fragmentation began advocating the application of this principle for conservation purposes
examples
Examples:

1. Mace and Lande (1991) used the OMPG rule as a criterion in defining threatened species categories of the World Conservation Union

2. In the U.S. nearly every recovery plan that considers genetic issues and insularization applies the OMPG rule

3. Widely applied by managers charged with initiating connectivity between isolated populations - e.g., reduce concerns about inbreeding depression

important aspects of ompg
Important Aspects of OMPG

Unlikely that polymorphism will be lost within subpopulations - unlikely to reach equilibrium gene frequencies where one allele or the other is lost or “fixed”

Provides a desirable balance between drift and gene flow by preventing the loss of alleles and minimizing loss of heterozygosity within subpopulations but allowing genetic divergence to exist among subpopulations

how much gene flow might be too much

How much gene flow might be too much?

Difficult to answer without extensive genetic and demographic information on the population

slide72
Frankel and Soule (1981) proposed an upper limit of 5 migrants per generation.

Mills and Allendorf (1996) suggest that a minimum of 1 and a maximum of 10 migrants per generation would be the appropriate general rule of thumb for genetic purposes.
slide73
Mutation has the same effect

[Figure: fixation index (Fst) plotted against the number of mutations per generation (Nu), for Nu from 0 to 10]

slide74
Commonly used software
  • Arlequin 3.01
    • http://anthro.unige.ch/software/arlequin/
slide75
Exercise
  • Analyze population structure using HapMap data;
    • http://www.hapmap.org