The statistical analysis of personal network data

Part I: Cross-sectional analysis • Part II: Dynamic analysis

A word about quantitative and qualitative approaches • Quantitative and qualitative approaches play complementary roles in personal network analysis • A qualitative pilot study can help to identify important predictors, and qualitative analyses can provide insights into the sources of error and temporal instability • Quantitative analyses are crucial for determining the statistical effect of characteristics, because individuals often do not know how, for example, their own stable characteristics influence their network.

In summary, the types of information collected with Egonet are: • Information about the respondent (ego; e.g., age, sex, nationality) • Information about the associates (alters) to whom ego is connected (e.g., age, sex, nationality) • Information about the ego-alter pairs (e.g., closeness, frequency and/or means of contact, time of knowing, geographic distance, whether they discuss a certain topic, type of relation such as family, friend, neighbour, or workmate) • Information about the relations among alters as perceived by ego (either simply whether they are related or not, or a strong/weak/no relation rating)

The statistical analysis of personal versus sociocentric networks: what are the differences? • Whereas sociocentric network researchers often (yet not always) concentrate on a single network, personal network researchers typically investigate a sample of networks. • The dependency structure of sociocentric networks is complex, which creates the need for specialized social network software. Personal network researchers, who often make little use of the data on alter-alter relations*, face a simpler dependency structure...

Personal network data have a “multilevel structure” • E.g., with a sample of 20 respondents and data collected on 45 alters per respondent, we have 20 × 45 = 900 ego-alter dyads in total
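The nesting can be made concrete with a long-format table that holds one row per ego-alter dyad. The sketch below is only an illustration: the sizes come from the example above, while the field names `ego_id` and `alter_id` are hypothetical and no real data are involved.

```python
# Build a long-format table: one row per ego-alter dyad.
# Sizes follow the example in the text; the identifiers are hypothetical.
N_EGOS = 20
ALTERS_PER_EGO = 45

dyads = [
    {"ego_id": ego, "alter_id": alter}
    for ego in range(1, N_EGOS + 1)
    for alter in range(1, ALTERS_PER_EGO + 1)
]

# Level 2 units are the egos (networks); level 1 units are the dyads nested in them.
n_level2 = len({row["ego_id"] for row in dyads})
n_level1 = len(dyads)
print(n_level2, n_level1)
```

Every row carries its ego identifier, which is what later allows an analysis to separate variation between networks from variation within networks.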

Three types of analysis have been used in past research • Type 1: Aggregated analysis • Type 2: Disaggregated analysis (not okay, forget about it quickly!) • Type 3: Multilevel analysis

Type 1: Aggregated analysis • First, aggregate all information to the ego level: • Compositional variables (aggregated characteristics of alters or ego-alter relations): e.g., percentage of women, average age of the alters, average time of knowing, average closeness • Structural variables (aggregated characteristics of alter-alter relations): e.g., network size, density of the network, betweenness, number of isolates, cliques • Then use standard statistical procedures to, for example: • Describe the network composition or structure, or compare them across populations • Explain the networks (network as a dependent variable) • Relate the networks to some variable of interest (network as an explanatory variable) • This approach is statistically correct, provided that you are aware of your level of analysis
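The aggregation step can be sketched as a function that reduces one personal network to a single ego-level record. This is a minimal illustration, not the procedure used in Egonet: the data, the field names, and the helper `aggregate_network` are all hypothetical.

```python
from statistics import mean

# Hypothetical data for one ego: alter attributes and alter-alter ties.
alters = [
    {"id": "a1", "sex": "f", "age": 34, "closeness": 4},
    {"id": "a2", "sex": "m", "age": 51, "closeness": 2},
    {"id": "a3", "sex": "f", "age": 28, "closeness": 5},
    {"id": "a4", "sex": "m", "age": 45, "closeness": 3},
]
# Undirected alter-alter ties as perceived by ego (ego itself is excluded).
alter_ties = {("a1", "a2"), ("a1", "a3"), ("a2", "a3")}


def aggregate_network(alters, alter_ties):
    """Reduce one personal network to a single ego-level record."""
    n = len(alters)
    possible_ties = n * (n - 1) / 2
    connected = {node for tie in alter_ties for node in tie}
    return {
        # Compositional variables (from alter and ego-alter characteristics)
        "pct_women": 100 * sum(a["sex"] == "f" for a in alters) / n,
        "mean_age": mean(a["age"] for a in alters),
        "mean_closeness": mean(a["closeness"] for a in alters),
        # Structural variables (from alter-alter relations)
        "size": n,
        "density": len(alter_ties) / possible_ties if possible_ties else 0.0,
        "n_isolates": sum(a["id"] not in connected for a in alters),
    }


ego_record = aggregate_network(alters, alter_ties)
print(ego_record)
```

Applying the function to each respondent gives one row per ego, which is the level at which standard procedures such as regression or ANOVA can then be run.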

Example: An effect at the network level cannot be interpreted at the tie level

Type 2: Disaggregated analysis • Disaggregated analysis of dyadic relations (e.g., running a linear regression analysis on the 900 alters) is not statistically correct, even though it has been done (e.g., Wellman et al., 1997; Suitor et al., 1997) • Observations of alters within the same network are not statistically independent, as standard statistical procedures assume • Standard errors are underestimated, and consequently significance is overestimated
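A rough sense of how much the standard errors are understated comes from the design effect approximation for equally sized clusters, deff = 1 + (n − 1)ρ, where n is the number of alters per ego and ρ is the intraclass correlation. The sketch below uses the 20 × 45 example; the value ρ = 0.10 is an assumption chosen purely for illustration, and the approximation applies strictly to the standard error of a mean, so it is only indicative for regression coefficients.

```python
import math


def design_effect(cluster_size, icc):
    """Design effect approximation for equally sized clusters."""
    return 1 + (cluster_size - 1) * icc


n_dyads = 900
alters_per_ego = 45
icc = 0.10  # hypothetical intraclass correlation, not taken from the study

deff = design_effect(alters_per_ego, icc)
effective_n = n_dyads / deff        # dyads "worth" of independent information
se_inflation = math.sqrt(deff)      # factor by which the naive SE is too small

print(round(deff, 2), round(effective_n, 1), round(se_inflation, 2))
```

Under this assumption the 900 dyads carry roughly as much information as 167 independent observations, which is why treating them as independent overstates significance.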

Type 3: Multilevel analysis • Multilevel analysis of dyadic relations • Multilevel analysis is a generalization of linear regression in which the variance in outcome variables can be analyzed at multiple hierarchical levels. In our case, alters (level 1) are nested within egos / networks (level 2), hence the variance is decomposed into variance between networks and variance within networks. • Software: e.g., MLwiN, HLM, VARCL • Dependent variable: some characteristic of the dyadic relation (e.g., strength of tie), that is, networks as the dependent variables. Note: Special multilevel models have been developed for discrete dependent variables. • Explanatory variables can be (among others): • characteristics of ego (level 2), • characteristics of alters (level 1), • characteristics of the ego-alter pairs (level 1).
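The variance decomposition described above can be written as a random intercept model. The notation is an assumption that follows common multilevel textbook conventions and is not taken from the slides: Y_ij is the tie characteristic for alter i in the network of ego j, x_ij is a level 1 predictor (alter or ego-alter pair), and z_j is a level 2 predictor (ego).

```latex
Y_{ij} = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + U_{0j} + R_{ij},
\qquad U_{0j} \sim N(0, \tau^{2}), \quad R_{ij} \sim N(0, \sigma^{2})

\rho = \frac{\tau^{2}}{\tau^{2} + \sigma^{2}}
```

Here τ² is the variance between networks, σ² is the variance within networks, and ρ is the intraclass correlation, the share of the total variance that lies between networks.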

For a good article on the possibilities of multilevel analysis of personal networks (including a quick comparison with the aggregated and disaggregated types of analysis), see: • Van Duijn, M. A. J., Van Busschbach, J. T., & Snijders, T. A. B. (1999). Multilevel analysis of personal networks as dependent variables. Social Networks, 21, 187-209.

In summary, cross-sectional analysis... • The two valid types of analysis, even when focusing on the same variable, address different types of questions: • Multilevel analysis: e.g., what predicts the strength of ties? • Aggregated analysis: e.g., what predicts the average strength of ties in personal networks?

Illustration of Type 1 (aggregated analysis): The case of migrants in Spain • We collected information on about 300 migrants in Catalonia with Egonet (in 2004-2005), from four countries of origin • For each respondent, information was collected about: • Ego (country of origin, years of residence in Spain, sex, age, marital status, level of education, etc.) • Alters (country of origin, country of living, etc.) • Ego-alter pairs (closeness, tie strength, type of relation, etc.) • Relations among alters

Illustration: The case of migrants in Spain • Our research questions were: • Can we distinguish different types of personal networks (profiles) among migrants? • Can the type of personal network be predicted by the years of residence of a migrant? • If so, do years of residence still predict network profiles when controlled for other important background characteristics?

Method • For each personal network (excluding ego), we first calculated compositional and structural characteristics (aggregate level) • Then, we used the following statistical procedures to analyse the 286 valid cases: • K-means cluster analysis based on various network characteristics (see next slide), to identify homogeneous groups of networks (“network profiles”) • ANOVA to see whether profiles differ in years of residence • Multinomial logistic regression to predict profile membership from years of residence, controlling for the background variables age, sex, country of origin, and employment

K-means cluster analysis (SPSS) • Based on the network variables (all standardized): • 1. Proportion of alters whose country of origin is Spain • 2. Proportion of fellow migrants • 3. Density • 4. Network betweenness centralization • 5. Number of clusters (“subgroups”) within the network • 6. Subgroup homogeneity regarding living in Spain • 7. Average frequency of contact (7-point scale) • 8. Average closeness (5-point scale) • 9. Proportion of family in the network
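The clustering step can be sketched as follows: standardize each network variable, then run k-means on the z-scores. This is a plain implementation of Lloyd's algorithm with fixed starting centres, not a reproduction of the SPSS procedure (which selects initial centres differently); the two variables and all values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centres, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    centres = [list(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each network goes to its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the partition has converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster becomes empty
                centres[k] = [mean(col) for col in zip(*members)]
    return labels, centres


# Hypothetical ego-level records: (proportion of Spanish alters, density).
networks = [
    (0.05, 0.80), (0.10, 0.75), (0.08, 0.85),
    (0.60, 0.20), (0.55, 0.25), (0.65, 0.15),
]
z = standardize(networks)
labels, centres = k_means(z, initial_centres=[z[0], z[3]])
print(labels)
```

Standardizing first matters because variables such as a 7-point contact scale and a 0 to 1 proportion would otherwise contribute very unequally to the distances.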

Results of the cluster analysis • The five-cluster solution was the most interpretable and reasonably balanced • Cluster sizes: • Profile 1, “the scarce network”: N = 54 • Profile 2, “the dense family network”: N = 28 • Profile 3, “the multiple subgroups network”: N = 73 • Profile 4, “the two worlds connected network”: N = 75 • Profile 5, “the embedded network”: N = 50 • The characteristics that contributed most to the cluster partition are: • density • homogeneity of the subgroups regarding living in Spain • percentage of Spanish alters in the network

Description of profiles

Profile 1: Scarce network Color: country of origin (white = foreign, black = Spain); Size: country of living (large = Spain, small = other country)

Profile 2: Dense family network Color: country of origin (white = foreign, black = Spain); Size: country of living (large = Spain, small = other country)

Profile 3: Multiple subgroups network Color: country of origin (white = foreign, black = Spain); Size: country of living (large = Spain, small = other country)

Profile 4: Two worlds connected Color: country of origin (white = foreign, black = Spain); Size: country of living (large = Spain, small = other country)

Profile 5: Embedded network Color: country of origin (white = foreign, black = Spain); Size: country of living (large = Spain, small = other country)

Is the partition related to years of residence? (ANOVA in SPSS) Overall: F (4, 2.67) = 6.634, p < .001 Per profile: There are two homogeneous subsets that differ significantly in years of residence: Profiles 1 and 2, versus profiles 3, 4, and 5.
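For readers who want to see what the F statistic measures, a one-way ANOVA can be computed directly from the between-profile and within-profile sums of squares, with k − 1 and N − k degrees of freedom for k profiles and N networks. The sketch below is not the SPSS output: the function name is hypothetical and the years of residence are invented for three small groups.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical years of residence for the networks in three profiles.
years_by_profile = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]
f_stat, df_between, df_within = one_way_anova(years_by_profile)
print(round(f_stat, 3), df_between, df_within)
```

A large F indicates that the profiles differ more in mean years of residence than would be expected from the variation within profiles; identifying which profiles differ requires a post hoc comparison such as the homogeneous subsets reported above.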

Is the partition also related to years of residence when controlled for background characteristics? Multinomial logistic regression (SPSS) • Age and employment status did not have significant effects • Sex and country of origin, however, influenced profile membership significantly: e.g., Senegambians were more likely than others to have a “dense family network”. • However, even when controlled for these background characteristics, years of residence still predicts cluster membership.
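In general form, a multinomial logistic regression models the probability that the network of ego j belongs to profile k as a function of the predictors. The notation below is an assumption and not taken from the slides: x_j collects years of residence and the background variables, and one profile serves as the reference category with its coefficients fixed at zero.

```latex
P(Y_{j} = k \mid \mathbf{x}_{j})
  = \frac{\exp(\mathbf{x}_{j}^{\top} \boldsymbol{\beta}_{k})}
         {\sum_{m=1}^{5} \exp(\mathbf{x}_{j}^{\top} \boldsymbol{\beta}_{m})},
\qquad \boldsymbol{\beta}_{\text{ref}} = \mathbf{0}
```

Each coefficient in β_k is then interpreted as the change in the log odds of belonging to profile k rather than to the reference profile.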

Conclusion of our illustration • The network profiles give valuable information about adaptation to a host country • The scarce network and the dense family network seem to be “transitional networks”, whereas the other three seem more settled.

But... • In order to investigate whether the networks of migrants really follow a certain pattern of change (or multiple patterns depending on for example country of origin or entry situation), we need a longitudinal model.

... and what about the analysis of alter-alter relations? • Most researchers are only interested in alter-alter relations in order to say something about the structure of respondents' personal networks: • Use structural measures (density, betweenness, number of cliques, etc.) in an aggregated analysis • Apply triad census analysis (Kalish & Robins, 2006) • If you’re interested in predicting who is related to whom (among the alters): • Specify an Exponential Random Graph Model (ERGM) for each network and then run a meta-analysis over the results (cf. Lubbers, 2003; Lubbers & Snijders, 2007)

ERGMs • ERGMs are available in, among others, the software StOCNET (where you can find SIENA as well) • Dependent variable: whether alters are related or not • Independent variables: characteristics of alters, the relation alters have with ego, the alter-alter pair, endogenous network characteristics such as transitivity (in the meta-analysis, characteristics of ego can be added as well) • Type of analysis: Apply a common ERGM to each network (leaving ego out), then run a meta-analysis (cf. Lubbers, 2003; Snijders & Baerveldt, 2003; Lubbers & Snijders, 2007).
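The second stage, combining per-network estimates, can be illustrated with the simplest form of meta-analysis: an inverse-variance weighted mean of one coefficient across networks. This fixed-effect sketch is a deliberate simplification and an assumption on our part; the cited papers should be consulted for the procedure they actually use, which may also estimate variation in the coefficient between networks. The estimates and standard errors are invented.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted mean of per-network coefficients."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical transitivity estimates from ERGMs fitted to three networks.
estimates = [0.8, 1.2, 1.0]
standard_errors = [0.2, 0.4, 0.2]

pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

Networks whose coefficient is estimated more precisely receive more weight, so a small or sparse network with a large standard error has little influence on the pooled result.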

Part II. Dynamic analysis • How do personal networks change over time? • Data on personal networks are collected in two or more waves in a panel study

Interest in dynamic analysis • “Networks at one point in time are snapshots, the results of an untraceable history” (Snijders) • E.g., personal communities in Toronto (Wellman et al.) • Changes following a focal life event (individual level) • E.g., transition from high school to university (Degenne & Lebeaux, 2005); childbearing, moving, return to school in midlife (Suitor & Keeton, 1997); retirement (Van Tilburg, 1992); marriage (Kalmijn et al., 2003); divorce (Terhell, Broese Van Groenou, & Van Tilburg, 2007); widowhood (Morgan, Neal, & Carder, 2000); migration (Molina et al.) • Broader studies of social change: social and cultural changes in countries undergoing dramatic institutional change • E.g., post-communism in Finland, Russia (Lonkila, 1998), and Eastern Germany (Völker & Flap, 1995)

Types of dynamic personal network research (networks as dependent variables) • Feld et al. (2007), Field Methods 19, 218-236:


Illustration: The case of migrants in Spain • Migrants in Catalonia (Barcelona, Vic, Girona). • We collected information about the personal networks of about 300 migrants (in 2004-2005). • Sample of 90 individuals for the second wave (1.5 to 2 years later on average). • The questionnaire at t2 was identical to that at t1, but supplemented with questions about the changes, such as about alters who disappeared from the network • For the present illustration, we focus on Argentinean migrants only (part of the interviews, N = 22).

Type 1: Persistence of ties with alters across time • Dependent variable: whether or not a tie persists to a subsequent wave (dichotomous) • Explanatory variables: characteristics of ego, alter, the ego-alter pair, and the situation, especially in combination with the initial characteristics of the relationship • Type of analysis: Logistic multilevel analysis
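A logistic multilevel model for tie persistence can be written as a random intercept logit model. The notation is an assumption that follows common multilevel conventions: Y_ij equals 1 if the tie between ego j and alter i persists and 0 otherwise, x_ij is a characteristic of the alter or the ego-alter pair, and z_j is a characteristic of ego.

```latex
\operatorname{logit} P(Y_{ij} = 1)
  = \log \frac{P(Y_{ij} = 1)}{1 - P(Y_{ij} = 1)}
  = \gamma_{00} + \gamma_{10}\, x_{ij} + \gamma_{01}\, z_{j} + U_{0j},
\qquad U_{0j} \sim N(0, \tau^{2})
```

The random intercept U_0j captures the fact that some egos retain their ties more than others, which is the dependence that an ordinary logistic regression on all ties would ignore.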

Illustration Type 1: The case of migrants in Spain • Cases: 900 alters nested within 20 respondents • Descriptive: How persistent are ties over time? • 53% of these alters were nominated again in Wave 2 (N = 473), whereas 47% of the nominations were not repeated (N = 427). • Explanatory: What predicts the persistence of ties over time? • Logistic multilevel analysis (see Table 1)

Table 1. Regression coefficients and standard errors (in brackets) of the logistic multilevel regression model predicting persistence of ties (N = 900).

Additionally: Differences between dissolved and new ties • Are the new ties qualitatively better than the broken ones? • Alters newly nominated in Wave 2 were somewhat • more frequently contacted (3.2 versus 2.8 on the frequency of contact scale, t = 5.32, df = 888, p < .001), and somewhat • closer (2.9 versus 2.4 on closeness, t = 3.70, df = 888, p < .001) than the alters who were not nominated again in Wave 2. • Furthermore, new relations were somewhat more often family members (18%) than relations that were broken (12%; χ2 = 6.03, df = 1, p < .05). Involution?

Type 2: Changes in characteristics of persistent ties across time • Dependent variable: change in some characteristic of the relationship (e.g., change in strength of tie) • Explanatory variables: characteristics of ego, alter, the ego-alter pair, and the situation, especially in combination with the initial characteristics of the relationship • Type of analysis: Multilevel analysis

Illustration Type 2: The case of migrants in Spain • Cases: 473 persistent ties • Descriptive: • There was a fair amount of change in frequency of contact (Mt1 = 3.50, Mt2 = 2.94; t = 8.231, df = 472, p < .05) and less change in closeness in stable ties (Mt1 = 3.68, Mt2 = 3.87; t = -4.065, df = 472, p < .05) • Explanatory: • Multilevel analysis (see Table 2).
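The descriptive comparisons above have the form of a paired t-test on the differences between the two waves, with df equal to the number of ties minus one (472 for 473 ties). The sketch below shows the computation on invented scores; the function name is hypothetical, and like any ordinary t-test it treats the ties as independent, so it ignores the nesting of ties within egos that the multilevel analysis addresses.

```python
import math
from statistics import mean, stdev


def paired_t(wave1, wave2):
    """Return (t, df) for the paired differences wave1 - wave2."""
    diffs = [a - b for a, b in zip(wave1, wave2)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1


# Hypothetical frequency-of-contact scores for five persistent ties.
contact_t1 = [4, 3, 5, 4, 3]
contact_t2 = [3, 3, 4, 2, 3]

t_stat, df = paired_t(contact_t1, contact_t2)
print(round(t_stat, 3), df)
```

With differences taken as wave 1 minus wave 2, a positive t indicates a decrease over time and a negative t an increase, which matches the signs reported for frequency of contact and closeness.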

Table 2. Regression coefficients and standard errors (in brackets) of the multilevel regression model predicting changes in frequency of contact and closeness in stable ties (N = 473). * p < .05
