SJTU CMGPD 2012 Methodological Lecture

Day 9: Kinship

Ancestry identifiers: specific patrilineal ancestors
• In the Basic file:
  • FATHER_ID
  • GRANDFATHER_ID
• In the Kinship file:
  • F_ID_1: same as FATHER_ID
  • F_ID_2: same as GRANDFATHER_ID
  • F_ID_3: great-grandfather
  • F_ID_4: great-great-grandfather

Ancestry identifiers: specific patrilineal ancestors (continued)
• Wives of paternal ancestors:
  • M_ID_1: mother
    • Same as MOTHER_ID in the Basic file
  • M_ID_2: paternal grandmother
    • Father's mother (fm)
  • M_ID_3: paternal great-grandmother
    • Father's father's mother (ffm)
  • M_ID_4: paternal great-great-grandmother
    • Father's father's father's mother (fffm)
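The F_ID and M_ID variables follow a simple recursion: each F_ID_k is the father of F_ID_(k-1), and each M_ID_k is the mother of the (k-1)-th patrilineal ancestor (the man himself when k = 1). The Python sketch below illustrates that recursion on hypothetical in-memory mappings; `father_of`, `mother_of`, and the function names are illustrative, not CMGPD variables. It follows observed links only, so it does not reproduce inferred ancestors.

```python
def patrilineal_ancestor(pid, k, father_of):
    """Return the k-th patrilineal ancestor of pid (k=1 father, k=2 grandfather, ...).

    k=0 returns pid itself; None means the chain breaks before k steps.
    """
    current = pid
    for _ in range(k):
        current = father_of.get(current)
        if current is None:  # ancestor not identified
            return None
    return current


def paternal_line_mother(pid, k, father_of, mother_of):
    """M_ID_k analogue: mother of the (k-1)-th patrilineal ancestor.

    k=1 is the mother, k=2 the father's mother (fm), k=3 ffm, k=4 fffm.
    """
    man = patrilineal_ancestor(pid, k - 1, father_of)
    return None if man is None else mother_of.get(man)


# Hypothetical links for one man ("ego"), with IDs named by kin abbreviation
father_of = {"ego": "f", "f": "ff", "ff": "fff", "fff": "ffff"}
mother_of = {"ego": "m", "f": "fm", "ff": "ffm", "fff": "fffm"}

print(patrilineal_ancestor("ego", 3, father_of))            # fff (F_ID_3)
print(paternal_line_mother("ego", 2, father_of, mother_of))  # fm (M_ID_2)
```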

Ancestry identifiers: inferred ancestors
• Most identifiers refer to actual individuals observed in the dataset
• In some cases, the existence of a common ancestor who died before the earliest available register is inferred
  • Based on relationship codes
  • Brothers in the earliest available register are inferred to have a common father
  • Cousins in the earliest available register are inferred to have a common grandfather
• For grouping purposes, such an ancestor is assigned an identifier that does not refer to anyone observed in the dataset
  • No corresponding PERSON_ID
  • FATHER_ID_IMPUTED and GRANDFATHER_ID_IMPUTED are flags indicating that these IDs do not refer to anyone observed in the dataset
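The inference can be sketched as follows: brothers share one synthetic father ID, and all cousins in a group share one synthetic grandfather ID, each flagged as imputed. This Python sketch assumes the sibling and cousin sets have already been derived from relationship codes; the input layout, the `impute_ancestors` name, and the `IMP-` ID format are hypothetical and do not follow the CMGPD-LN ID scheme. As a simplification it always creates both levels.

```python
from itertools import count


def impute_ancestors(cousin_groups):
    """Assign synthetic FATHER_ID / GRANDFATHER_ID values to men in the earliest register.

    cousin_groups: list of cousin groups; each group is a list of brother sets
    (lists of PERSON_IDs). Brothers share a synthetic father; every man in a
    cousin group shares a synthetic grandfather. ID format is hypothetical.
    """
    serial = count(1)
    out = {}
    for group in cousin_groups:
        grandfather = f"IMP-GF{next(serial)}"  # matches no observed PERSON_ID
        for brothers in group:
            father = f"IMP-F{next(serial)}"
            for pid in brothers:
                out[pid] = {
                    "FATHER_ID": father, "FATHER_ID_IMPUTED": 1,
                    "GRANDFATHER_ID": grandfather, "GRANDFATHER_ID_IMPUTED": 1,
                }
    return out


# p1 and p2 are brothers; p3 is their cousin
kin = impute_ancestors([[["p1", "p2"], ["p3"]]])
print(kin["p3"])
```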

Distributions of men by numbers of descendants

* Sons: number of distinct males (SEX == 2) present in the registers for each FATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1       // one record per man
keep FATHER_ID
keep if FATHER_ID != "-99"              // drop men whose father is not identified
bysort FATHER_ID: generate sons = _N
bysort FATHER_ID: keep if _n == 1
rename FATHER_ID PERSON_ID              // key the counts by the father's own ID
save Sons, replace

* Grandsons: the same count for each GRANDFATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep GRANDFATHER_ID
keep if GRANDFATHER_ID != "-99"
bysort GRANDFATHER_ID: generate grandsons = _N
bysort GRANDFATHER_ID: keep if _n == 1
rename GRANDFATHER_ID PERSON_ID
save Grandsons, replace

* Great-grandsons: F_ID_3 comes from the Kinship file (DS0004), matched on RECORD_NUMBER
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", keepusing(F_ID_3) keep(match master)
keep F_ID_3
keep if F_ID_3 != "-99" & F_ID_3 != ""  // also drop unmatched records, whose F_ID_3 is blank
replace F_ID_3 = substr(F_ID_3,3,.)     // drop the first two characters so the values can be merged on PERSON_ID
bysort F_ID_3: generate ggrandsons = _N
bysort F_ID_3: keep if _n == 1
rename F_ID_3 PERSON_ID
save GGrandsons, replace

* Great-great-grandsons: the same count for each F_ID_4 from the Kinship file
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", keepusing(F_ID_4) keep(match master)
keep F_ID_4
keep if F_ID_4 != "-99" & F_ID_4 != ""  // also drop unmatched records, whose F_ID_4 is blank
replace F_ID_4 = substr(F_ID_4,3,.)
bysort F_ID_4: generate gggrandsons = _N
bysort F_ID_4: keep if _n == 1
rename F_ID_4 PERSON_ID
save GGGrandsons, replace

* Base population: men first observed in or before 1810, with all four counts attached
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID (YEAR): keep if _n == 1 & YEAR <= 1810   // earliest record, if it is by 1810
keep PERSON_ID
merge 1:1 PERSON_ID using Sons, keep(match master)
replace sons = 0 if sons == .           // men with no sons observed
drop _merge
merge 1:1 PERSON_ID using Grandsons, keep(match master)
replace grandsons = 0 if grandsons == .
drop _merge
merge 1:1 PERSON_ID using GGrandsons, keep(match master)
replace ggrandsons = 0 if ggrandsons == .
drop _merge
merge 1:1 PERSON_ID using GGGrandsons, keep(match master)
replace gggrandsons = 0 if gggrandsons == .
drop _merge

* Top-code each count at 20, then count the men at each value
replace sons = 20 if sons >= 20
bysort sons: generate first_in_sons = _n == 1
bysort sons: generate sons_number = _N
label variable sons_number "Sons"
replace grandsons = 20 if grandsons >= 20
bysort grandsons: generate first_in_grandsons = _n == 1
bysort grandsons: generate grandsons_number = _N
label variable grandsons_number "Grandsons"
replace ggrandsons = 20 if ggrandsons >= 20
bysort ggrandsons: generate first_in_ggrandsons = _n == 1
bysort ggrandsons: generate ggrandsons_number = _N
label variable ggrandsons_number "Great-grandsons"
replace gggrandsons = 20 if gggrandsons >= 20
bysort gggrandsons: generate first_in_gggrandsons = _n == 1
bysort gggrandsons: generate gggrandsons_number = _N
label variable gggrandsons_number "Great-great-grandsons"

* One line per generation, log scale on the y axis
twoway (line sons_number sons if first_in_sons, sort) ///
    (line grandsons_number grandsons if first_in_grandsons, sort) ///
    (line ggrandsons_number ggrandsons if first_in_ggrandsons, sort) ///
    (line gggrandsons_number gggrandsons if first_in_gggrandsons, sort), ///
    yscale(log) ylabel(1 10 100 1000 10000) scheme(s1mono) ///
    xtitle("Number of descendants") ytitle("Number of men")
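The Stata steps above amount to four operations: keep one record per man, count men per ancestor ID (skipping the "-99" missing code), attach the counts to men first observed by 1810 with zeros for men who have none, and tabulate the top-coded counts. The Python sketch below reproduces that pipeline on invented toy records (not CMGPD data); the record layout and function names are assumptions for illustration only.

```python
from collections import Counter

MISSING = "-99"  # missing-ID code used in the Stata code


def one_record_per_man(records):
    """Keep the first record seen for each PERSON_ID (cf. bysort PERSON_ID: keep if _n == 1)."""
    first = {}
    for r in records:
        first.setdefault(r["PERSON_ID"], r)
    return list(first.values())


def descendants_per_ancestor(men, field, strip_prefix=0):
    """Count men per ancestor ID in `field`, skipping blank and missing IDs.

    strip_prefix=2 mimics substr(F_ID_k, 3, .) in the Stata code.
    """
    ids = (m.get(field, "") for m in men)
    return Counter(i[strip_prefix:] for i in ids if i not in ("", MISSING))


def base_population(records, last_year=1810):
    """PERSON_IDs of men whose earliest record is in or before last_year."""
    first_year = {}
    for r in records:
        pid = r["PERSON_ID"]
        first_year[pid] = min(r["YEAR"], first_year.get(pid, r["YEAR"]))
    return [pid for pid, year in first_year.items() if year <= last_year]


def distribution(base_ids, counts, cap=20):
    """Number of men at each descendant count, top-coded at cap; men absent from counts have 0."""
    tally = Counter(min(counts.get(pid, 0), cap) for pid in base_ids)
    return dict(sorted(tally.items()))


# Toy records for men present in the registers (invented, not CMGPD data)
records = [
    {"PERSON_ID": "A", "YEAR": 1789, "FATHER_ID": "-99"},
    {"PERSON_ID": "B", "YEAR": 1792, "FATHER_ID": "A"},
    {"PERSON_ID": "B", "YEAR": 1795, "FATHER_ID": "A"},
    {"PERSON_ID": "C", "YEAR": 1795, "FATHER_ID": "A"},
    {"PERSON_ID": "D", "YEAR": 1820, "FATHER_ID": "B"},
]
sons = descendants_per_ancestor(one_record_per_man(records), "FATHER_ID")
print(distribution(base_population(records), sons))  # {0: 1, 1: 1, 2: 1}
```

Man D counts as B's son but is not in the base population, because he is first observed after 1810; this mirrors how the Stata code counts descendants over all present men but restricts the tabulated men to the early cohort.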

Kinship variables for grouping: uses
• Controlling for kin group membership
  • Via random-effects models
  • Alongside village, household, and other levels
  • Multiple levels are computationally demanding
    • Often requires tricks to collapse observations or otherwise reduce the size of the dataset
• Computing explanatory variables
  • Aggregate measures of kin network status to use as right-hand-side variables
• Units of analysis in their own right
  • See yesterday's lecture
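One simple aggregate measure of kin network status for the right-hand side is the number of other men in the same kin group present in the same register year. The Python sketch below computes it for any grouping variable; the measure, the `kin_present_same_year` name, and the toy records are illustrative assumptions rather than CMGPD variables, and the sketch assumes one record per person per year.

```python
from collections import Counter


def kin_present_same_year(records, group_field):
    """(PERSON_ID, YEAR) -> number of *other* records in the same kin group that YEAR.

    group_field can be any grouping variable, e.g. FOUNDER_ID or UNIQUE_GROUP.
    Assumes one record per person per register year.
    """
    size = Counter((r[group_field], r["YEAR"]) for r in records)
    return {
        (r["PERSON_ID"], r["YEAR"]): size[(r[group_field], r["YEAR"])] - 1
        for r in records
    }


records = [
    {"PERSON_ID": "p1", "YEAR": 1800, "UNIQUE_GROUP": "G"},
    {"PERSON_ID": "p2", "YEAR": 1800, "UNIQUE_GROUP": "G"},
    {"PERSON_ID": "p3", "YEAR": 1800, "UNIQUE_GROUP": "H"},
    {"PERSON_ID": "p1", "YEAR": 1803, "UNIQUE_GROUP": "G"},
]
peers = kin_present_same_year(records, "UNIQUE_GROUP")
print(peers)
```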

Kinship variables for grouping: in ascending order of kin distance
• FOUNDER_ID
  • Descent from a common male ancestor observed in the registers
• FOUNDER_INFERRED_ID
  • Descent from a common male ancestor inferred from relationship codes in the earliest available register
• UNIQUE_YI_HU
  • Descent from members of the same yihu in the earliest available register
• UNIQUE_GROUP
  • Descent from members of adjacent yihu with the same surname in the earliest available register
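The ordering above suggests that the four variables nest, from narrowest to broadest. Before using them as levels in a multilevel model it is worth checking that assumption: the Python sketch below tests whether each value of a narrower variable occurs with exactly one value of the next broader one. The field names match the slide; the list-of-dicts layout and the `rec` helper are hypothetical, and real data would also need a rule for missing values.

```python
LEVELS = ("FOUNDER_ID", "FOUNDER_INFERRED_ID", "UNIQUE_YI_HU", "UNIQUE_GROUP")


def is_nested(records, fine, coarse):
    """True if every value of `fine` occurs with exactly one value of `coarse`."""
    parent = {}
    for r in records:
        if parent.setdefault(r[fine], r[coarse]) != r[coarse]:
            return False
    return True


def check_hierarchy(records, levels=LEVELS):
    """Nesting check for each adjacent pair of grouping variables, narrowest first."""
    return {(a, b): is_nested(records, a, b) for a, b in zip(levels, levels[1:])}


def rec(f, i, y, g):
    """Hypothetical helper: build one record from the four grouping IDs."""
    return dict(zip(LEVELS, (f, i, y, g)))


records = [rec("f1", "i1", "y1", "g1"), rec("f2", "i1", "y1", "g1"), rec("f3", "i2", "y1", "g1")]
print(check_hierarchy(records))
```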

Numbers and average sizes of units

Kinship variables for grouping: FOUNDER_ID
• PERSON_ID of the earliest male ancestor located in the registers
• The most narrowly defined grouping variable
  • Based on descent from a single observed individual
• Many lines go extinct, often within one or two generations
  • As a result, the average size of the groups defined by FOUNDER_ID rises over time

* Distribution of FOUNDER_ID groups by number of observations (top-coded at 200)
preserve
bysort FOUNDER_ID: generate founder_id_obs = _N
bysort FOUNDER_ID: generate first_in_founder_id = _n == 1   // flag one record per group
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) fraction ///
    scheme(s1mono) xtitle("Number of observations with same FOUNDER_ID")
restore
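The histogram counts groups, not observations: each group contributes once, with its size top-coded at 200. The Python sketch below computes the same kind of fractions from a list of group IDs (one entry per observation). One assumption to flag: its bins start at 0, whereas Stata's `histogram` starts the first bin at the minimum observed value unless `start()` is given, so bin edges can differ.

```python
from collections import Counter


def size_histogram(group_ids, cap=200, width=10):
    """Fraction of groups in each size bin.

    group_ids: one entry per observation (e.g. the FOUNDER_ID of each record).
    Group size = observations per ID, top-coded at `cap`
    (cf. replace founder_id_obs = 200 if founder_id_obs > 200).
    Bins are [0, width), [width, 2*width), ...
    """
    sizes = [min(n, cap) for n in Counter(group_ids).values()]
    bins = Counter(s // width * width for s in sizes)
    return {start: k / len(sizes) for start, k in sorted(bins.items())}


ids = ["a"] * 3 + ["b"] * 15 + ["c"] * 250  # one entry per observation
print(size_histogram(ids))
```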

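* Mean number of observations per FOUNDER_ID, by register year
preserve
bysort FOUNDER_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_ID YEAR: keep if _n == 1       // one row per group and year
collapse founder_id_obs_year, by(YEAR)        // default statistic is the mean
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_ID") ylabel(0(2)12)
restore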
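Why extinction alone raises the mean group size: when a small line dies out it leaves the denominator, so the average over surviving groups rises even if no surviving group grows. The Python sketch below repeats the Stata steps (count per group-year, one row per group-year, mean within year) on invented toy records that isolate this effect; the function name and data are illustrative only.

```python
from collections import Counter, defaultdict


def mean_group_size_by_year(records, group_field):
    """YEAR -> mean number of observations per group present that year.

    Mirrors the Stata steps: count observations per (group, YEAR), keep one
    row per (group, YEAR), then average those counts within YEAR.
    """
    per_group_year = Counter((r["YEAR"], r[group_field]) for r in records)
    sums = defaultdict(lambda: [0, 0])  # YEAR -> [total observations, number of groups]
    for (year, _), n in per_group_year.items():
        sums[year][0] += n
        sums[year][1] += 1
    return {year: total / groups for year, (total, groups) in sorted(sums.items())}


# Toy data: two one-man lines (A, B) and one four-man line (C) in 1800;
# by 1803 A and B have died out and C is unchanged.
toy = (
    [{"YEAR": 1800, "FOUNDER_ID": g} for g in ["A", "B", "C", "C", "C", "C"]]
    + [{"YEAR": 1803, "FOUNDER_ID": "C"} for _ in range(4)]
)
print(mean_group_size_by_year(toy, "FOUNDER_ID"))  # {1800: 2.0, 1803: 4.0}
```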

Kinship variables for grouping: FOUNDER_INFERRED_ID
• Uses the earliest available inferred ancestor
  • Based on relationship codes in the earliest available register
• Useful for grouping records in the earliest registers
  • Until 1789, relationships were recorded relative to the head of the yihu, not the head of the linghu
  • This allowed common ancestry to be inferred
• The average size of groups defined by FOUNDER_INFERRED_ID increases over time because smaller groups go extinct

* Distribution of FOUNDER_INFERRED_ID groups by number of observations (top-coded at 200)
preserve
bysort FOUNDER_INFERRED_ID: generate founder_id_obs = _N
bysort FOUNDER_INFERRED_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) fraction ///
    scheme(s1mono) xtitle("Number of observations with same FOUNDER_INFERRED_ID")
restore

* Mean number of observations per FOUNDER_INFERRED_ID, by register year
preserve
bysort FOUNDER_INFERRED_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_INFERRED_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_INFERRED_ID") ylabel(0(2)12)
restore

Kinship variables for grouping: UNIQUE_YI_HU
• Descendants of members of the same yihu in the earliest available register
• Clusters are much larger than those defined by FOUNDER_ID or FOUNDER_INFERRED_ID

* Mean number of observations per UNIQUE_YI_HU, by register year
preserve
bysort UNIQUE_YI_HU YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_YI_HU YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_YI_HU") ylabel(0(5)60)
restore

Kinship variables for grouping: UNIQUE_GROUP
• Descendants of members of consecutive yihu in the earliest available register who share a surname
• The most stable grouping over time, in both group size and number of groups
  • Ideal for analyzing change over the long term

* Mean number of observations per UNIQUE_GROUP, by register year
preserve
bysort UNIQUE_GROUP YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_GROUP YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_GROUP") ylabel(0(5)60)
restore
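The claim that UNIQUE_GROUP is the most stable grouping can be checked by comparing, across grouping variables, how much the number of groups and their mean size change between the first and last register years. The Python sketch below computes those two ratios (1.0 means no change). The toy records are invented to show the contrast, with a line dying out under FOUNDER_ID while its broader group survives; they are not CMGPD results, and a first-to-last ratio is only a crude summary of stability.

```python
from collections import Counter, defaultdict


def yearly_profile(records, group_field):
    """YEAR -> (number of distinct groups, mean observations per group)."""
    per = Counter((r["YEAR"], r[group_field]) for r in records)
    sizes = defaultdict(list)
    for (year, _), n in per.items():
        sizes[year].append(n)
    return {y: (len(s), sum(s) / len(s)) for y, s in sorted(sizes.items())}


def drift(records, group_field):
    """Ratios of last-year to first-year group count and mean size."""
    profile = yearly_profile(records, group_field)
    (n0, m0), (n1, m1) = profile[min(profile)], profile[max(profile)]
    return n1 / n0, m1 / m0


def row(year, founder, group):
    return {"YEAR": year, "FOUNDER_ID": founder, "UNIQUE_GROUP": group}


# Invented toy data: founder A dies out, but its group G1 survives through founder D
toy = [
    row(1800, "A", "G1"), row(1800, "D", "G1"), row(1800, "C", "G2"), row(1800, "C", "G2"),
    row(1803, "D", "G1"), row(1803, "D", "G1"), row(1803, "C", "G2"), row(1803, "C", "G2"),
]
for field in ("FOUNDER_ID", "UNIQUE_GROUP"):
    print(field, drift(toy, field))
```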
