## PowerPoint Slideshow about 'Human Population Genetics: Basic Principles and Analytical Methods' - rianna


### How much gene flow might be too much?

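Lecture 6

Analysis of Human Population Genetic Structure (I)

Lecture 4

- Basic concepts in population genetics (4)
  - Population genetic structure
  - Statistics describing population genetic structure
  - Hierarchical F statistics
- Software demonstration
  - Using Arlequin to calculate hierarchical F statistics for human populations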

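Why study human genetic structure?

- Human origins, migration, evolutionary history and future prospects
- Genetic relationships among modern populations (ethnic groups)
- Genetic basis and gene mapping of complex diseases
  - Cancer
  - Obesity
  - Asthma
  - Psychiatric disorders
  - Type II diabetes
  - Cardiovascular disease
- Public health care
- Personalized medicine and personalized treatment
- Forensic science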

An example

- Population structures and association studies

Population structure causes problems in association studies

- Population stratification in epidemiology
- Analysis of mixed samples having different allele frequencies is a primary concern in human genetics, as it leads to false evidence for allelic association.

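Odds ratio

| Exposure \ Disease | Yes (case) | No (control) | Total |
|---|---|---|---|
| Yes | a | b | a + b |
| No | c | d | c + d |
| Total | a + c | b + d | a + b + c + d |

Odds of exposure for cases: a/c

Odds of exposure for controls: b/d

Odds ratio: $OR = \frac{a/c}{b/d} = \frac{ad}{bc}$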

Explanation of OR

- OR>1: exposure factors increase the risk of disease; positive association
- OR<1: exposure factors decrease the risk of disease; negative association
- OR=1: no association
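A minimal Python sketch of the odds-ratio calculation, assuming the cell labels of the table above (a, b = exposed cases and controls; c, d = unexposed cases and controls). The function names and example counts are hypothetical, and the sketch simply rejects tables with an empty cell instead of applying a continuity correction.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table laid out as above.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        # This sketch rejects empty cells rather than applying a correction
        raise ValueError("all four cells must be positive")
    odds_cases = a / c      # odds of exposure among cases
    odds_controls = b / d   # odds of exposure among controls
    return odds_cases / odds_controls  # algebraically equal to (a*d)/(b*c)


def interpret_or(or_value: float, tol: float = 1e-9) -> str:
    """Map an OR onto the three cases listed above."""
    if abs(or_value - 1.0) <= tol:
        return "no association"
    return "positive association" if or_value > 1 else "negative association"


# Hypothetical counts, for illustration only
print(odds_ratio(20, 10, 80, 160), interpret_or(odds_ratio(20, 10, 80, 160)))
```

With these counts the odds of exposure are 0.25 among cases and 0.0625 among controls, so OR = 4.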

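Subpopulation 1

| | Case | Control | Total |
|---|---|---|---|
| exp(+) | 50 | 50 | 100 |
| exp(-) | 450 | 450 | 900 |
| Total | 500 | 500 | 1,000 |

Subpopulation 2

| | Case | Control | Total |
|---|---|---|---|
| exp(+) | 1 | 9 | 10 |
| exp(-) | 99 | 891 | 990 |
| Total | 100 | 900 | 1,000 |

Total population

| | Case | Control | Total |
|---|---|---|---|
| exp(+) | 51 | 59 | 110 |
| exp(-) | 549 | 1,341 | 1,890 |
| Total | 600 | 1,400 | 2,000 |

Proportion exposed among cases: 51/600 = 8.5%; among controls: 59/1,400 = 4.2%.

Heterogeneity/stratification: within each subpopulation OR = 1 ((50×450)/(50×450) = 1 and (1×891)/(9×99) = 1), yet in the pooled sample exposure is about 2.02 times as frequent among cases as among controls (8.5%/4.2%), and the pooled OR is (51×1,341)/(59×549) ≈ 2.11.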
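The stratification example can be re-checked in a few lines of Python. The counts are taken from the tables above; the helper name `odds_ratio` is hypothetical, and this sketch only computes crude (unadjusted) odds ratios.

```python
def odds_ratio(a, b, c, d):
    """OR = (a*d)/(b*c); a, b = exposed cases/controls, c, d = unexposed cases/controls."""
    return (a * d) / (b * c)


# Counts from the tables: (exposed cases, exposed controls, unexposed cases, unexposed controls)
sub1 = (50, 50, 450, 450)
sub2 = (1, 9, 99, 891)
pooled = tuple(x + y for x, y in zip(sub1, sub2))  # pool the two subpopulations cell by cell

or_sub1 = odds_ratio(*sub1)
or_sub2 = odds_ratio(*sub2)
or_pooled = odds_ratio(*pooled)

# Proportion exposed among cases and among controls in the pooled sample
a, b, c, d = pooled
exp_cases = a / (a + c)
exp_controls = b / (b + d)

print(pooled)
print(f"OR sub1 = {or_sub1:.2f}, OR sub2 = {or_sub2:.2f}, pooled OR = {or_pooled:.2f}")
print(f"exposed: cases {exp_cases:.1%}, controls {exp_controls:.1%}, "
      f"ratio {exp_cases / exp_controls:.2f}")
```

Both subpopulations give OR = 1, while the pooled table gives OR ≈ 2.11. The spurious association arises only because subpopulation 1 has both a higher proportion of cases (50% vs 10%) and a higher exposure frequency (10% vs 1%) than subpopulation 2.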

Human migration

- Anatomically modern humans evolve in Africa > 160,000 ybp.
- Some leave Africa sometime around 75,000 - 55,000 ybp.
- Replace Neanderthals in Europe and archaic humans around the world.
- Arrive in Western hemisphere between 34,000 and 18,000 ybp.
- Multiple migrations in different pre-historic periods, followed by different migrations in historical periods.

Note on Definitions: Biological Race

- Morphology (phenotype)
- Geographical location
- Population based (frequency of genes)

Socially Constructed Race: arbitrarily utilizes aspects of morphology, geography, culture, language, religion, etc. in the service of a social dominance hierarchy.

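Statistics describing genetic structure

- Hierarchical F statistics

The fixation index F compares two quantities:

- $h$: expected frequency of heterozygotes under random mating
- $h_0$: observed frequency of heterozygotes in the population

$$F = \frac{h - h_0}{h}$$

- The fixation index F can be positive or negative, depending on the situation.
- When $h_0 < h$, F is positive; when $h_0 > h$, F is negative. Under inbreeding the observed heterozygote frequency decreases, so F is positive.

The formula above can be rewritten as

$$h_0 = h(1 - F)$$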
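As a sketch, assuming a single biallelic locus with genotype counts for AA, Aa and aa, F can be computed directly from h (taken as 2pq) and h0. The function name and counts are hypothetical; the formula is undefined when the locus is monomorphic (h = 0).

```python
def fixation_index(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """F = (h - h0) / h for one biallelic locus, from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    h = 2 * p * (1 - p)              # expected heterozygote frequency under random mating
    h0 = n_Aa / n                    # observed heterozygote frequency
    return (h - h0) / h


# Hypothetical genotype counts (AA, Aa, aa)
print(round(fixation_index(30, 40, 30), 3))  # heterozygote deficit -> F > 0
print(round(fixation_index(20, 60, 20), 3))  # heterozygote excess  -> F < 0
print(round(fixation_index(25, 50, 25), 3))  # Hardy-Weinberg proportions -> F = 0
```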

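Subpopulations

- The discussion above considers a single simple population, whether or not it is inbred.
- In reality, however, most natural populations can be subdivided into many different breeding units or subpopulations, even though these are not completely isolated from one another. In this situation it is important to study genetic variation both within and among subpopulations.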

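Genotype frequencies in a subdivided population

- Assume a population can be divided into s subpopulations, each in Hardy-Weinberg equilibrium. Let $x_k$ be the frequency of allele A1 in the k-th subpopulation; then genotypes A1A1, A1A2 and A2A2 have frequencies $x_k^2$, $2x_k(1 - x_k)$ and $(1 - x_k)^2$, respectively.

- Let $w_k$ denote the relative size of the k-th subpopulation, with $\sum_k w_k = 1$. The frequencies of A1A1, A1A2 and A2A2 in the whole population are then:

$$P_{11} = \bar{x}^2 + \sigma^2, \qquad P_{12} = 2\bar{x}(1 - \bar{x}) - 2\sigma^2, \qquad P_{22} = (1 - \bar{x})^2 + \sigma^2$$

where

$$\bar{x} = \sum_{k=1}^{s} w_k x_k$$

and

$$\sigma^2 = \sum_{k=1}^{s} w_k (x_k - \bar{x})^2$$

are the mean and variance of the allele frequency among subpopulations.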

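Wahlund's principle

- If a population is subdivided into several mating units, the frequency of homozygotes is higher than the Hardy-Weinberg proportions. This property was first described by Wahlund (1928) and is known as Wahlund's principle, or the Wahlund effect.
- With $\bar{x}$ and $\sigma^2$ the mean and variance of the allele frequency among subpopulations, $F = \sigma^2 / [\bar{x}(1 - \bar{x})]$. When allele frequencies are identical in all subpopulations, F = 0; when every subpopulation is fixed for one allele or the other, F = 1.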
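A small Python sketch of Wahlund's principle, assuming two equal-sized subpopulations that are each in Hardy-Weinberg equilibrium (the allele frequencies 0.1 and 0.9 and the function name are hypothetical). It pools genotype frequencies with weights w_k and compares them with the Hardy-Weinberg expectation at the mean allele frequency.

```python
def wahlund(x, w):
    """Pooled genotype frequencies for subpopulations that are each in HWE.

    x[k]: frequency of A1 in subpopulation k
    w[k]: relative size of subpopulation k (assumed to sum to 1)
    Returns (P11, P12, P22, x_bar, var, F).
    """
    p11 = sum(wk * xk ** 2 for xk, wk in zip(x, w))
    p12 = sum(wk * 2 * xk * (1 - xk) for xk, wk in zip(x, w))
    p22 = sum(wk * (1 - xk) ** 2 for xk, wk in zip(x, w))
    x_bar = sum(wk * xk for xk, wk in zip(x, w))
    var = sum(wk * (xk - x_bar) ** 2 for xk, wk in zip(x, w))
    f = var / (x_bar * (1 - x_bar))
    return p11, p12, p22, x_bar, var, f


# Two equal-sized hypothetical subpopulations
p11, p12, p22, x_bar, var, f = wahlund([0.1, 0.9], [0.5, 0.5])
print(f"pooled: {p11:.2f} {p12:.2f} {p22:.2f}")
print(f"HWE:    {x_bar ** 2:.2f} {2 * x_bar * (1 - x_bar):.2f} {(1 - x_bar) ** 2:.2f}")
print(f"F = {f:.2f}")
```

The pooled homozygote frequencies (0.41) exceed the Hardy-Weinberg values (0.25) by exactly σ² = 0.16, and F = 0.16/0.25 = 0.64.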

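Implications of the Wahlund effect

- Evidence that population structure exists!
- Conversely, when F is negative,

heterozygotes are more frequent than expected under Hardy-Weinberg equilibrium, which may indicate heterozygote advantage, i.e. some form of natural selection.

Heterozygote advantage and balancing selection (discussed in detail in the later lecture on natural selection)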

F-statistics

- Different F-statistics for different scales:
  - Individual (I)
  - Subpopulation (S)
  - Total population (T)
- These are the traditional scales, but in theory there is no limit to the number of levels of analysis.
- Originally defined for 2 alleles
- Extended to >2 alleles as G-statistics

F-statistics: derived from the inbreeding coefficient

- FIS
  - inbreeding in individuals relative to the subpopulation (Weir and Cockerham's f)
- FST
  - inbreeding in subpopulations relative to the total population (Weir and Cockerham's θ)
- FIT
  - inbreeding in individuals relative to the total population (Weir and Cockerham's F)

Remember that inbreeding coefficient, F, is related to loss of heterozygosity

F = 1 – (Ho/He)

- F-statistics can be expressed in the same way

FIS = 1 – (HI/HS)

FST = 1 – (HS/HT)

FIT = 1 – (HI/HT)

HI= HO averaged across subpopulations

HS= He averaged across subpopulations

HT = He for the total population
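The definitions above translate directly into code. This is a sketch for one biallelic locus, assuming genotype counts per subpopulation and equal weighting of subpopulations; sample-size-corrected estimators such as that of Weir and Hill (2002) will give somewhat different numbers. The function name and counts are hypothetical.

```python
def f_statistics(subpops):
    """Hierarchical F-statistics for one biallelic locus.

    subpops: list of (n_AA, n_Aa, n_aa) genotype counts, one tuple per subpopulation.
    Subpopulations are weighted equally (an assumption of this sketch).
    Returns (F_IS, F_ST, F_IT).
    """
    h_obs, h_exp, freqs = [], [], []
    for n_AA, n_Aa, n_aa in subpops:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)           # frequency of allele A
        freqs.append(p)
        h_obs.append(n_Aa / n)                     # observed heterozygosity Ho
        h_exp.append(1 - (p ** 2 + (1 - p) ** 2))  # expected heterozygosity He
    k = len(subpops)
    H_I = sum(h_obs) / k                           # Ho averaged across subpopulations
    H_S = sum(h_exp) / k                           # He averaged across subpopulations
    p_bar = sum(freqs) / k
    H_T = 1 - (p_bar ** 2 + (1 - p_bar) ** 2)      # He of the total population
    return 1 - H_I / H_S, 1 - H_S / H_T, 1 - H_I / H_T


# Hypothetical genotype counts (AA, Aa, aa) for two subpopulations
f_is, f_st, f_it = f_statistics([(30, 40, 30), (60, 30, 10)])
print(round(f_is, 4), round(f_st, 4), round(f_it, 4))
```

Here FIS = 0.2, FST = 1/15 ≈ 0.0667 and FIT ≈ 0.2533, and (1 - FIT) = (1 - FIS)(1 - FST) holds exactly because HI/HT = (HI/HS)(HS/HT).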

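Deficit of heterozygotes

Example: two subpopulations, one consisting entirely of AA individuals and the other entirely of aa individuals.

FST = 1 – (HS/HT)

- Subpopulation 1: P(A) = p = 1, P(a) = q = 0
- Subpopulation 2: p = 0, q = 1

HS = He within a subpopulation:

$H_S = 1 - \sum_i p_i^2 = 1 - (1^2 + 0^2) = 0$ in each subpopulation, so mean HS = 0.

HT = He for the total population. For the total population, p = 0.5 and q = 0.5:

$H_T = 1 - \sum_i p_i^2 = 1 - [(0.5)^2 + (0.5)^2] = 0.5$

FST = 1 – (HS/HT) = 1 - (0/0.5) = 1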

Deficit of homozygotes

Example: two subpopulations, each consisting entirely of Aa heterozygotes.

FST = 1 – (HS/HT)

- In each subpopulation P(A) = p = 0.5 and P(a) = q = 0.5

HS = He within a subpopulation:

$H_S = 1 - \sum_i p_i^2 = 1 - (0.5^2 + 0.5^2) = 0.5$ in each subpopulation, so mean HS = 0.5.

HT = He for the total population. For the total population, p = 0.5 and q = 0.5:

$H_T = 1 - \sum_i p_i^2 = 1 - [(0.5)^2 + (0.5)^2] = 0.5$

FST = 1 – (HS/HT) = 1 - (0.5/0.5) = 0

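Hardy-Weinberg proportions within subpopulations

Example: two subpopulations, each containing one AA, two Aa and one aa individual.

FST = 1 – (HS/HT)

- In each subpopulation P(A) = p = 0.5 and P(a) = q = 0.5
- $H_S = 1 - \sum_i p_i^2 = 1 - (0.5^2 + 0.5^2) = 0.5$, so mean HS = 0.5
- For the total population, p = 0.5 and q = 0.5, so $H_T = 1 - [(0.5)^2 + (0.5)^2] = 0.5$
- FST = 1 – (HS/HT) = 1 - (0.5/0.5) = 0

FST uses expected heterozygosity, not observed heterozygosity!! Subpopulations made up entirely of Aa (observed heterozygosity 1) and subpopulations in Hardy-Weinberg proportions (observed heterozygosity 0.5) give the same FST = 0.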
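The three toy examples can be reproduced in Python to make this point concrete. Genotypes are encoded as strings such as "Aa" and subpopulations are weighted equally; both are assumptions of this sketch, and the function name is hypothetical.

```python
def summarize(subpops):
    """subpops: list of subpopulations, each a list of genotype strings like "Aa".

    Returns (mean observed heterozygosity, mean H_S, H_T, F_ST),
    weighting subpopulations equally.
    """
    def freq_A(genos):
        return sum(g.count("A") for g in genos) / (2 * len(genos))

    def expected_het(p):
        return 1 - (p ** 2 + (1 - p) ** 2)  # 1 - sum of p_i^2 for two alleles

    k = len(subpops)
    h_obs = sum(sum(g[0] != g[1] for g in s) / len(s) for s in subpops) / k
    ps = [freq_A(s) for s in subpops]
    h_s = sum(expected_het(p) for p in ps) / k
    h_t = expected_het(sum(ps) / k)
    return h_obs, h_s, h_t, 1 - h_s / h_t


examples = {
    "heterozygote deficit": [["AA"] * 4, ["aa"] * 4],
    "homozygote deficit": [["Aa"] * 4, ["Aa"] * 4],
    "HW proportions": [["AA", "Aa", "Aa", "aa"], ["AA", "Aa", "Aa", "aa"]],
}
for name, subpops in examples.items():
    print(name, summarize(subpops))
```

The printed tuples are (0.0, 0.0, 0.5, 1.0), (1.0, 0.5, 0.5, 0.0) and (0.5, 0.5, 0.5, 0.0): observed heterozygosity differs between the last two cases, but FST depends only on HS and HT.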

F statistics

- FIS tells us whether there is inbreeding within subpopulations by comparing HI and HS:

$$F_{IS} = \frac{\bar{H}_S - \bar{H}_I}{\bar{H}_S}$$

- Bars mean that the values are averages over all the subpopulations under consideration.
- So FIS measures whether there is, on average, a deficit of heterozygotes within subpopulations.

F statistics

- FST tells us how differentiated the subpopulations are. Formally, FST tells us whether there is a deficit of heterozygotes in the metapopulation due to differentiation among subpopulations:

$$F_{ST} = \frac{H_T - \bar{H}_S}{H_T}$$

F statistics

- FIT tells us how much population structure has affected the average heterozygosity of individuals within the population:

$$F_{IT} = \frac{H_T - \bar{H}_I}{H_T}$$

- Also, (1 - FIS)(1 - FST) = (1 - FIT).

F-statistics measure departure from Hardy-Weinberg equilibrium

- FIS = departure from HW in local subpopulations
- FST = genetic divergence among subpopulations
- FIT = total departure from HW including that within and among subpopulations

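Partitioning of structure: individuals, subpopulations, total population

- FIS: individuals relative to subpopulations (inbreeding)
- FST: subpopulations relative to the total population (Wahlund effect or fragmentation)
- FIT: individuals relative to the total population

1 – FIT = (1 – FST)(1 – FIS)

FIT = FIS + FST – (FIS)(FST)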

The three F statistics are related to each other

- FST = (FIT - FIS) / (1 - FIS)
- FST is always non-negative (≥ 0) as a parameter, although sample estimates can be slightly negative
- FIS is frequently positive; it is negative if there is systematic avoidance of inbreeding
- FIT is positive unless there is no clear subdivision and inbreeding is systematically avoided
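A quick numerical check of these relationships, using hypothetical values FIS = 0.2 and FST = 0.1; the function names are illustrative only.

```python
def fit_from(fis: float, fst: float) -> float:
    """F_IT = F_IS + F_ST - F_IS * F_ST, i.e. 1 - F_IT = (1 - F_IS)(1 - F_ST)."""
    return fis + fst - fis * fst


def fst_from(fit: float, fis: float) -> float:
    """F_ST = (F_IT - F_IS) / (1 - F_IS); undefined when F_IS = 1."""
    return (fit - fis) / (1 - fis)


# Hypothetical values
fis, fst = 0.2, 0.1
fit = fit_from(fis, fst)
print(round(fit, 4), round(fst_from(fit, fis), 4))
```

This gives FIT = 0.28, and recovering FST from FIT and FIS returns 0.1.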

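Extensions

- Variance of allele frequencies across subpopulations: $F_{ST} = \mathrm{Var}(q) / [\bar{q}(1 - \bar{q})]$
- When allele frequencies are identical in all subpopulations, Var(q) = 0, therefore FST = 0
- As Var(q) increases, divergence of subpopulations increases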
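A sketch, assuming a biallelic locus and equally weighted subpopulations, showing that the variance formulation and 1 - HS/HT give the same number when H = 2q(1 - q). The allele frequencies and function names are hypothetical; `statistics.pvariance` is the population (divide-by-n) variance, which is what this identity requires.

```python
from statistics import fmean, pvariance


def fst_from_variance(q):
    """F_ST = Var(q) / [q_bar (1 - q_bar)] with equally weighted subpopulations."""
    q_bar = fmean(q)
    return pvariance(q) / (q_bar * (1 - q_bar))


def fst_from_heterozygosity(q):
    """Same quantity via F_ST = 1 - H_S / H_T, with H = 2q(1 - q) at a biallelic locus."""
    q_bar = fmean(q)
    h_s = fmean([2 * qi * (1 - qi) for qi in q])
    h_t = 2 * q_bar * (1 - q_bar)
    return 1 - h_s / h_t


q = [0.2, 0.4, 0.6, 0.8]  # hypothetical allele frequencies in four subpopulations
print(round(fst_from_variance(q), 4), round(fst_from_heterozygosity(q), 4))
```

Both routes give FST = 0.05/0.25 = 0.2 here.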

Intuitive meaning of FST

- The proportion of total genetic variation that is distributed among subpopulations, rather than within subpopulations.

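Unbiased estimates of FST

- Unbiased estimates of FST were calculated as described by Weir and Hill (2002).
- Suppose we have r subpopulations (indexed i = 1, …, r) with sample sizes $n_i$. Denote the sample allele frequency in subpopulation i as $\tilde{p}_i$, and the average frequency over samples as $\bar{p} = \sum_i n_i \tilde{p}_i / \sum_i n_i$.

The observed mean square for loci within populations is denoted MSG:

$$\mathrm{MSG} = \frac{1}{\sum_{i=1}^{r} (n_i - 1)} \sum_{i=1}^{r} n_i \tilde{p}_i (1 - \tilde{p}_i)$$

The observed mean square between populations is denoted MSP:

$$\mathrm{MSP} = \frac{1}{r - 1} \sum_{i=1}^{r} n_i (\tilde{p}_i - \bar{p})^2$$

Then FST can be estimated as follows:

$$\hat{F}_{ST} = \frac{\mathrm{MSP} - \mathrm{MSG}}{\mathrm{MSP} + (n_c - 1)\,\mathrm{MSG}}$$

where $n_c$ is the average sample size across samples that also incorporates and corrects for the variance in sample size over subpopulations:

$$n_c = \frac{1}{r - 1}\left(\sum_{i=1}^{r} n_i - \frac{\sum_{i=1}^{r} n_i^2}{\sum_{i=1}^{r} n_i}\right)$$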
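A direct Python transcription of the MSG/MSP formulas for a single biallelic locus. It assumes n_i counts sampled allele copies (chromosomes) in each subpopulation; that choice, the function name and the example data are assumptions of this sketch. The estimate can be slightly negative when subpopulations do not differ, and it is undefined when the locus is monomorphic in every sample (MSP = MSG = 0).

```python
def fst_weir_hill(counts, sizes):
    """Moment estimate of F_ST at one biallelic locus from the MSG/MSP formulas.

    counts[i]: copies of the reference allele observed in sample i
    sizes[i]:  n_i, number of allele copies sampled in sample i (assumed)
    """
    r = len(sizes)
    n_total = sum(sizes)
    p = [c / n for c, n in zip(counts, sizes)]                  # sample frequencies
    p_bar = sum(n * pi for n, pi in zip(sizes, p)) / n_total    # size-weighted mean
    msg = sum(n * pi * (1 - pi) for n, pi in zip(sizes, p)) / sum(n - 1 for n in sizes)
    msp = sum(n * (pi - p_bar) ** 2 for n, pi in zip(sizes, p)) / (r - 1)
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)
    return (msp - msg) / (msp + (n_c - 1) * msg)


# Hypothetical samples of 100 allele copies each
print(fst_weir_hill([0, 100], [100, 100]))   # fixed for different alleles
print(fst_weir_hill([50, 50], [100, 100]))   # identical frequencies
```

The first call returns 1.0; the second returns -1/99 ≈ -0.0101, a small negative value of the kind the unbiased estimator produces when there is no real differentiation.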

Problems with FST

- Assumes Infinite Alleles Model (IAM) or K-alleles model with very low mutation rates (not appropriate for microsat data)
- All alleles differ equally from each other (magnitude of difference between alleles ignored)
- Does not work well with high heterozygosity
- Assumes alleles arrive in population via migration rather than mutation

Special version for microsatellites

- RST (Slatkin 1995)
  - Analogue of FST
  - Assumes the Stepwise Mutation Model (the mutation model most appropriate for microsatellites)
  - Allows for high mutation rates
  - Allows differences in magnitude between alleles to be accounted for
  - $R_{ST} = (\bar{S} - \bar{S}_W) / \bar{S}$
  - where $\bar{S}$ = average squared difference in allele size between all pairs of alleles in the total population, and $\bar{S}_W$ = the corresponding average within populations
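A simplified sketch of RST, assuming allele sizes are given in repeat units. It averages squared size differences over all pairs of allele copies, takes S_W as the unweighted mean over populations, and applies none of the sample-size corrections used in published RST estimators; the function names and data are hypothetical.

```python
from itertools import combinations


def mean_sq_diff(alleles):
    """Average squared difference in allele size over all pairs of allele copies."""
    pairs = list(combinations(alleles, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def r_st(pops):
    """Simplified R_ST = (S - S_W) / S.

    pops: one list of allele sizes (in repeat units) per population.
    S_W: unweighted mean of within-population averages; S: average over the pooled sample.
    No sample-size or unequal-weighting corrections are applied.
    """
    s_w = sum(mean_sq_diff(p) for p in pops) / len(pops)
    s = mean_sq_diff([a for p in pops for a in p])
    return (s - s_w) / s


# Hypothetical allele sizes for two populations
print(round(r_st([[10, 10, 11, 11], [20, 20, 21, 21]]), 3))
```

Because the two hypothetical populations differ by about 10 repeat units, nearly all of the squared size variance lies between populations and RST is close to 1.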

Which to use for microsatellites?

- FST and RST can differ when computed from the same data
- If loci don't conform to the SMM, RST will be underestimated
- If mutation rates are large relative to migration rates, RST is superior
- Longer divergence times between populations favor RST
- RST is favored under ideal conditions and with large samples
- FST is favored with small samples and when a more conservative estimator is desired

Distance measures for microsatellites

- µ2, i.e. $(\delta\mu)^2$ (Goldstein et al. 1995): $(\delta\mu)^2 = \sum (\mu_x - \mu_y)^2 / L$
  - µx is the mean allele size in population x
  - µy is the mean allele size in population y
  - Summed across all loci and divided by the number of loci (L)
  - Allele size expressed as number of repeat units
  - Stepwise mutation model (SMM)
  - $E[(\delta\mu)^2] = 2\alpha t$
    - α = mutation rate per generation
    - t = number of generations
  - Problems
    - α is not constant among different loci
    - Variance is very high
    - Microsatellites don't strictly follow the SMM
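A sketch of the µ2 calculation and of inverting E[(δµ)²] = 2αt to get a rough divergence time. The per-locus mean allele sizes, the mutation rate α = 10⁻³ and the function names are hypothetical; given the problems listed above, the resulting t is an order-of-magnitude figure at best.

```python
def delta_mu_sq(mu_x, mu_y):
    """(delta mu)^2: squared difference in mean allele size, averaged over L loci.

    mu_x[j], mu_y[j]: mean allele size (repeat units) at locus j in populations x and y.
    """
    L = len(mu_x)
    return sum((mx - my) ** 2 for mx, my in zip(mu_x, mu_y)) / L


def generations_since_split(dmu2, alpha):
    """Invert E[(delta mu)^2] = 2 * alpha * t under the SMM: t ~ dmu2 / (2 * alpha)."""
    return dmu2 / (2 * alpha)


# Hypothetical mean allele sizes at three loci
d = delta_mu_sq([10.0, 12.0, 15.0], [11.0, 12.0, 13.0])
print(d, generations_since_split(d, alpha=1e-3))
```

Here (δµ)² = (1 + 0 + 4)/3 ≈ 1.67, which corresponds to roughly 830 generations under the assumed α.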

Distance measures for microsatellites

- DSA (Bowcock et al., 1994, Nature)
  - SA = shared alleles
  - PSA = (∑S)/2U
    - where S = number of shared alleles at a locus between 2 populations
    - U = number of loci
  - DSA = 1 - PSA
  - Assumes the IAM
  - May be superior to µ2 for closely related populations, even for microsatellite data
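A sketch of DSA, assuming each of the two units compared is represented by one diploid genotype per locus, so that S ranges from 0 to 2 and ∑S is divided by 2U. Genotypes are tuples of allele sizes; the function names and data are hypothetical.

```python
from collections import Counter


def shared_alleles(g1, g2):
    """Alleles shared (0, 1 or 2) by two diploid genotypes at one locus."""
    return sum((Counter(g1) & Counter(g2)).values())  # multiset intersection


def d_sa(genos_1, genos_2):
    """D_SA = 1 - P_SA, with P_SA = (sum of S over loci) / 2U."""
    U = len(genos_1)
    total = sum(shared_alleles(a, b) for a, b in zip(genos_1, genos_2))
    return 1 - total / (2 * U)


# Hypothetical genotypes (allele sizes) at three loci
x = [(10, 12), (5, 5), (7, 8)]
y = [(10, 12), (5, 6), (9, 9)]
print(d_sa(x, y))
```

The two units share 2, 1 and 0 alleles at the three loci, so PSA = 3/6 and DSA = 0.5.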

Degree of F statistics

According to Sewall Wright:

- FST ranges from 0-1
- 0 = no genetic differentiation; panmixia
- 0.00-0.05 = little genetic differentiation
- 0.05-0.15 = moderate genetic differentiation
- 0.15-0.25 = great genetic differentiation
- 0.25-1.00 = very great genetic differentiation
- 1 = complete genetic differentiation

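Calculating hierarchical FST with Arlequin

Chromosome 21 SNP data, grouped by continent in the structure section of the Arlequin project file:

```
# [[Structure]] header and NbGroups added so the section is complete;
# population labels must match the sample names used in the project file
[[Structure]]
StructureName = "Chromosome 21 SNP data"
NbGroups = 3

#Asian
Group = {
  "CHB"
  "JPT"
  "CHU"
  "HMO"
  "AVA"
}

#European
Group = {
  "CEU"
  "NEuro"
  "Basque"
  "Italian"
  "Hungarian"
}

#African
Group = {
  "YRI"
}
```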

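Meta-population structure: drift within populations, migration between populations

(Figure: six subpopulations with allele frequency p and size N: p = 0.7, N = 15; p = 0.4, N = 70; p = 0.6, N = 50; p = 0.3, N = 10; p = 0.5, N = 150; p = 1.0, N = 20; connected by migration rates m = 0.02, 0.07 and 0.01.)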

Drift and migration have opposite effects

- Drift makes subpopulations different
- Migration homogenizes subpopulations
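A toy Wright-Fisher simulation illustrating the two opposing forces. It assumes an island model (migrants drawn from the metapopulation mean), equal deme sizes, a single biallelic locus and hypothetical parameter values; it is a sketch, not a realistic demographic model.

```python
import random


def island_model_fst(n_demes, N, m, generations, p0=0.5, seed=1):
    """Simulate drift plus island-model migration; return F_ST among demes at the end.

    Each generation: (1) migration pulls every deme's allele frequency toward the
    current mean at rate m; (2) drift: 2N gene copies are drawn binomially.
    """
    rng = random.Random(seed)
    p = [p0] * n_demes
    for _ in range(generations):
        p_bar = sum(p) / n_demes
        p = [(1 - m) * pi + m * p_bar for pi in p]                    # migration
        p = [sum(rng.random() < pi for _ in range(2 * N)) / (2 * N)   # drift
             for pi in p]
    p_bar = sum(p) / n_demes
    if p_bar in (0.0, 1.0):
        return float("nan")  # allele lost or fixed everywhere; F_ST undefined
    var = sum((pi - p_bar) ** 2 for pi in p) / n_demes
    return var / (p_bar * (1 - p_bar))


# Hypothetical settings: 20 demes of N = 50 individuals, 200 generations
f_drift = island_model_fst(20, 50, 0.0, 200)
f_mig = island_model_fst(20, 50, 0.1, 200)
print("drift only:    ", round(f_drift, 3))
print("with migration:", round(f_mig, 3))
```

With these settings, drift alone should push FST towards 1 - (1 - 1/2N)^t ≈ 0.87, whereas m = 0.1 should hold it near a small equilibrium value.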

Useful for estimating gene flow

- If you know FST and Ne, you can calculate m

In addition, very little migration is required to prevent substantial genetic divergence among subpopulations resulting from random genetic drift.

This can be shown by the following equation:

$$F_{ST} \approx \frac{1}{4Nm + 1}$$

where FST is the equilibrium fixation index and Nm is the number of migrants per generation.

Estimation of gene flow

- Indirect (based on FST)
- Nm = (1 - FST)/(4FST)
- Some drawbacks but often acceptable if limitations are considered
- High variance at low values of FST
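The equilibrium relation and the indirect estimate are one-line functions. This sketch assumes the island-model approximation given above; `m_from_fst` additionally assumes Ne is known, as the earlier bullet requires. Function names are hypothetical.

```python
def equilibrium_fst(Nm: float) -> float:
    """Island-model approximation F_ST ~ 1 / (4Nm + 1)."""
    return 1 / (4 * Nm + 1)


def nm_from_fst(fst: float) -> float:
    """Indirect estimate of migrants per generation: Nm = (1 - F_ST) / (4 F_ST)."""
    return (1 - fst) / (4 * fst)


def m_from_fst(fst: float, Ne: float) -> float:
    """Migration rate m = Nm / Ne when Ne is known independently."""
    return nm_from_fst(fst) / Ne


for Nm in (0.1, 1, 10):
    print(Nm, round(equilibrium_fst(Nm), 3))
```

Nm = 0.1, 1 and 10 give FST ≈ 0.714, 0.2 and 0.024; Nm = 1 corresponds to the FST = 0.2 threshold quoted below for Nem = 1.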

Problems with FST-based estimates of gene flow

- Assumptions of the model are not realistic:
  - All populations have the same N
  - Nm is equal among all demes
  - Mutations do not occur
  - Markers are truly neutral
  - Selection is not operating (local adaptation causes overestimation of FST and underestimation of Nm; uniform selection causes underestimation of FST and overestimation of Nm)
- Recent isolation of demes won't be detected
- Related to gene flow on evolutionary time scales
  - Not appropriate for ecological time scales
- Ignores ongoing dynamics in allele frequencies (rare alleles)

Best in situations where

- Spatial scale small (island model holds and spatially varying selection unlikely)
- Migration rates high (rapid attainment of genetic equilibrium)
- Sample sizes and numbers of loci are large (for accurate estimates)
- A long-term estimate of Nem "averaged" over many generations is desired
- Not useful for short-term nonequilibrium situations, e.g. recently fragmented or rapidly declining populations

Population differentiation under migration and drift

- If Ne and m are small, FST is large
- If Nem < 1 then
- FST > 0.2
- “If there is > 1 migrant per generation, populations do not diverge much.”

OMPG rule of thumb

- From this analysis emerged a genetic rule of thumb that one migrant individual per local population per generation (OMPG) is sufficient to obscure any disruptive effects of drift.

Biologists concerned with population insularization caused by habitat fragmentation began advocating the application of this principle for conservation purposes

Examples:

1. Mace and Lande (1991) used the OMPG rule as a criterion in defining threatened species categories of the World Conservation Union

2. In the U.S. nearly every recovery plan that considers genetic issues and insularization applies the OMPG rule

3. Widely applied by managers charged with initiating connectivity between isolated populations - e.g., reduce concerns about inbreeding depression

Important Aspects of OMPG

Unlikely that polymorphism will be lost within subpopulations - unlikely to reach equilibrium gene frequencies where one allele or the other is lost or “fixed”

Provides a desirable balance between drift and gene flow by preventing the loss of alleles and minimizing loss of heterozygosity within subpopulations but allowing genetic divergence to exist among subpopulations

How much gene flow might be too much? This is difficult to answer without extensive genetic and demographic information on the population.

Frankel and Soulé (1981) proposed an upper limit of 5 migrants per generation.

Mills and Allendorf (1996) suggest that a minimum of 1 and a maximum of 10 migrants per generation would be the appropriate general rule of thumb for genetic purposes.

Commonly used software

- Arlequin 3.01
- http://anthro.unige.ch/software/arlequin/

Exercise

- Use HapMap data to analyze population structure
- http://www.hapmap.org
