Social Network Analysis American Sociological Association San Francisco, August 2004 James Moody


Introduction We live in a connected world: “To speak of social life is to speak of the association between people – their associating in work and in play, in love and in war, to trade or to worship, to help or to hinder. It is in the social relations men establish that their interests find expression and their desires become realized.” Peter M. Blau, Exchange and Power in Social Life, 1964. "If we ever get to the point of charting a whole city or a whole nation, we would have … a picture of a vast solar system of intangible structures, powerfully influencing conduct, as gravitation does in space. Such an invisible structure underlies society and has its influence in determining the conduct of society as a whole." J.L. Moreno, New York Times, April 13, 1933. These patterns of connection form a social space that can be seen in multiple contexts:

Introduction Source: Linton Freeman “See you in the funny pages” Connections, 23, 2000, 32-42.

Introduction High Schools as Networks

Introduction • And yet, standard social science analysis methods do not take this space into account. • “For the last thirty years, empirical social research has been dominated by the sample survey. But as usually practiced, …, the survey is a sociological meat grinder, tearing the individual from his social context and guaranteeing that nobody in the study interacts with anyone else in it.” • Allen Barton, 1968 (Quoted in Freeman 2004) • Moreover, the complexity of the relational world makes it impossible to identify social connectivity using only our intuitive understanding. • Social Network Analysis (SNA) provides a set of tools to empirically extend our theoretical intuition of the patterns that construct social structure.

Introduction Why do Networks Matter? Local vision

Introduction • Why networks matter: • Intuitive: “goods” travel through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space. • Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior.

Introduction • Social network analysis is: • a set of relational methods for systematically understanding and identifying connections among actors. SNA • is motivated by a structural intuition based on ties linking social actors • is grounded in systematic empirical data • draws heavily on graphic imagery • relies on the use of mathematical and/or computational models. • Social Network Analysis embodies a range of theories relating types of observable social spaces and their relation to individual and group behavior.

Introduction • Social Network Data • Basic data elements • Collecting network data • Basic data structures • Measuring Networks • Flow of goods in networks • Topology • Time • Structure of Social Space • Small Worlds, Scale-Free, Triads • Cohesive Groups • Role Positions • Modeling with Networks • Modeling Behaviors with Networks • Peer attribute models • Network Autocorrelation Models • Dyad / QAP Models • Modeling Network Structure • QAP for network structure • Exponential Random Graph Models • SNA Computer Programs

Social Network Data The unit of interest in a network is the combined set of actors and their relations. We represent actors with points and relations with lines. Actors are referred to variously as nodes, vertices, or points. Relations are referred to variously as edges, arcs, lines, or ties. [Figure: example graph on five actors, a through e]

Social Network Data In general, a relation can be binary or valued, and directed or undirected. [Figure: the five-actor example graph (a through e) drawn four ways: directed, binary; undirected, binary; directed, valued; undirected, valued (the valued versions carry tie values on the lines)]

Social Network Data • Social network data are substantively divided by the number of modes in the data. • 1-mode data represent edges based on direct contact between actors in the network. All the nodes are of the same type (people, organizations, ideas, etc.). Examples: • Communication, friendship, giving orders, sending email. • 1-mode data are usually singly reported (each person reports on their friends), but you can use multiple-informant data, which is more common in child development research (Cairns and Cairns).

Social Network Data Social network data are substantively divided by the number of modes in the data. 2-mode data represent nodes from two separate classes, where all ties are across classes. Examples: people as members of groups; people as authors on papers; words used often by people; events in the life history of people. The two modes of the data represent a duality: you can project the data as people connected to people through joint membership in a group, or as groups connected to each other through common membership. There may be multiple relations of multiple types connecting your nodes.
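The projection can be sketched in Python, assuming the 2-mode data sit in a people-by-groups incidence matrix (1 = member). Multiplying that matrix by its transpose gives a person-by-person matrix whose cells count shared groups; the reverse product gives group-by-group overlaps. The people, groups, and helper names (`project`, `transpose`) below are hypothetical illustrations, not part of any SNA package.

```python
def project(incidence):
    """Return incidence x incidence^T: cell (i, j) counts columns rows i and j share."""
    n_rows = len(incidence)
    return [
        [
            sum(x * y for x, y in zip(incidence[i], incidence[j]))
            for j in range(n_rows)
        ]
        for i in range(n_rows)
    ]


def transpose(matrix):
    """Swap rows and columns of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical 2-mode data: rows are people, columns are groups (1 = member).
people = ["Ann", "Bob", "Cy"]
groups = ["club", "church", "team"]
membership = [
    [1, 1, 0],  # Ann: club, church
    [1, 0, 0],  # Bob: club
    [0, 1, 1],  # Cy: church, team
]

person_by_person = project(membership)            # people tied through shared groups
group_by_group = project(transpose(membership))   # groups tied through shared members
print(person_by_person)  # [[2, 1, 1], [1, 1, 0], [1, 0, 2]]
print(group_by_group)    # [[2, 1, 0], [1, 2, 1], [0, 1, 1]]
```

The diagonal cells hold each person's number of memberships (or each group's size) and are usually ignored or set to zero when the projection is analyzed as a network.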

Social Network Data We can examine networks across multiple levels: • 1) Ego-network • - Have data on a respondent (ego) and the people they are connected to (alters). Example: 1985 GSS module • - May include estimates of connections among alters • 2) Partial network • - Ego networks plus some amount of tracing to reach contacts of contacts • - Something less than full account of connections among all pairs of actors in the relevant population • - Example: CDC Contact tracing data for STDs

Social Network Data We can examine networks across multiple levels: • 3) Complete or “Global” data • - Data on all actors within a particular (relevant) boundary • - Never exactly complete (due to missing data), but boundaries are set • Example: Coauthorship data among all writers in the social sciences, friendships among all students in a classroom • For the most part, I will be discussing techniques surrounding global networks today, though I will briefly mention some standard uses of ego-network data.

Social Network Data Collecting Network Data • Data capture any connection between the nodes. Sources include surveys, published accounts, special informants, etc. • In general, you can only make conclusions about relations among the set of nodes you have collected, so it is important to observe as much of the network as possible. • See W&F, chap 2 on different types of data collection

Social Network Data Collecting Network Data • If you use surveys to collect data, some general rules of thumb: • Network data collection can be time-consuming. It is better (I think) to favor breadth over depth. Having detailed information on <50% of the sample will make it very difficult to draw conclusions about the general network structure. • Question format: • a) If you ask people to recall names (an open list format), fatigue will result in under-reporting. • b) If you ask people to check off names from a full list, you can often get over-reporting. • c) It is common to limit people to ~5 nominations. This will bias network statistics for stars, but is sometimes the best choice to avoid fatigue. • d) Concrete relational indicators (who did you talk to?) are better than attitudes that are harder to define (who do you like?).

Social Network Data Collecting Network Data • Existing Sources of Social Network Data • Check INSNA: The International Network for Social Network Analysis • Many secondary sources (particularly for 2-mode data) • National Longitudinal Survey of Adolescent Health (Add Health)

Social Network Data Basic Data Structures Working with pictures. There is no standard way to draw a sociogram: each of these drawings is equivalent:

Social Network Data Basic Data Structures In general, graphs are cumbersome to work with analytically, though there is a great deal of good work to be done on using visualization to build network intuition. I recommend using layouts that optimize on the feature you are most interested in, and find that either a hierarchical layout or a force-directed layout is best.

Social Network Data Basic Data Structures From pictures to matrices [Figure: the five-actor example graph (a through e), drawn as undirected, binary and as directed, binary, each beside its adjacency matrix]

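Social Network Data Basic Data Structures From matrices to lists
Adjacency matrix (undirected, binary):
    a b c d e
a   0 1 0 0 0
b   1 0 1 0 0
c   0 1 0 1 1
d   0 0 1 0 1
e   0 0 1 1 0
Arc list: (a, b), (b, a), (b, c), (c, b), (c, d), (c, e), (d, c), (d, e), (e, c), (e, d)
Adjacency list: a: b; b: a, c; c: b, d, e; d: c, e; e: c, d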
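The conversion from matrix to lists is mechanical. A minimal Python sketch, assuming the matrix is stored as nested lists with rows as senders and columns as receivers; it uses the five-actor example above, and the function names are illustrative only:

```python
def matrix_to_arc_list(labels, matrix):
    """Return every (sender, receiver) pair whose matrix cell is nonzero."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


def arc_list_to_adjacency_list(labels, arcs):
    """Group arcs by sender; actors with no out-ties get an empty list."""
    adjacency = {label: [] for label in labels}
    for sender, receiver in arcs:
        adjacency[sender].append(receiver)
    return adjacency


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0],  # b
    [0, 1, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [0, 0, 1, 1, 0],  # e
]

arcs = matrix_to_arc_list(labels, matrix)
adjacency = arc_list_to_adjacency_list(labels, arcs)
print(arcs)
print(adjacency)  # {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd', 'e'], 'd': ['c', 'e'], 'e': ['c', 'd']}
```

Because the example is undirected, each tie appears twice in the arc list; for directed data the same code yields one arc per nonzero cell.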

Measuring Networks: Flow “Goods” flow through networks:

Measuring Networks: Flow • In addition to the simple probability that one actor passes information on to another (p_ij), two factors affect flow through a network: • Topology • - the shape, or form, of the network • - Example: one actor cannot pass information to another unless they are either directly or indirectly connected • Time • - the timing of contact matters • - Example: an actor cannot pass information he has not yet received

Measuring Networks: Flow Two features of the network’s topology are known to be important: connectivity and centrality • Connectivity refers to how actors in one part of the network are connected to actors in another part of the network. • Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another. • Distance: Given they can be reached, how many steps are they from each other? • Number of paths: How many different paths connect each pair?

Measuring Networks: Flow Without full network data, you can’t distinguish actors with limited information potential from those more deeply embedded in a setting.

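Measuring Networks: Flow Reachability Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them. Paths can be directed, leading to a distinction between “strong” and “weak” components: in a strong component every actor can reach every other by following the direction of the ties, while a weak component only requires that actors be connected when direction is ignored.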

Measuring Networks: Flow Reachability If you can trace a sequence of relations from one actor to another, then the two are reachable. If there is at least one path connecting every pair of actors in the graph, the graph is connected. A maximal set of actors connected in this way is called a component: intuitively, a component is the set of people who are all connected by a chain of relations.
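Components can be found with breadth-first search. This is a minimal sketch assuming an undirected graph stored as a Python dict mapping each actor to a list of neighbors (every tie listed in both directions); the example graph is hypothetical. For directed data, symmetrize the ties first, and the result is then the set of weak components.

```python
from collections import deque


def components(adjacency):
    """Return the connected components of an undirected graph as sorted lists."""
    seen = set()
    result = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            actor = queue.popleft()
            members.append(actor)
            for neighbor in adjacency[actor]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result.append(sorted(members))
    return result


# Hypothetical graph with two components and an isolate.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "d": ["e"], "e": ["d"],
    "f": [],
}
print(components(graph))  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```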

Measuring Networks: Flow Reachability This example contains many components.

Measuring Networks: Flow Distance & number of paths Distance is measured by the (weighted) number of relations separating a pair. Actor “a” is: 1 step from 4 actors; 2 steps from 5; 3 steps from 4; 4 steps from 3; 5 steps from 1.
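Geodesic distance in a binary network can be computed with breadth-first search, and tallying the results gives the kind of profile listed on the slide (how many actors sit 1, 2, 3 or more steps away). A sketch under the same dict-of-neighbors assumption, using the five-actor example graph from the data-structures slides; weighted distances would need Dijkstra's algorithm instead.

```python
from collections import Counter, deque


def distances_from(adjacency, source):
    """Breadth-first search: unweighted geodesic distance to each reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for neighbor in adjacency[actor]:
            if neighbor not in dist:  # first visit is along a shortest path
                dist[neighbor] = dist[actor] + 1
                queue.append(neighbor)
    return dist


def distance_profile(adjacency, source):
    """Map each distance to the number of actors at that distance from source."""
    dist = distances_from(adjacency, source)
    counts = Counter(d for actor, d in dist.items() if actor != source)
    return dict(sorted(counts.items()))


# Five-actor example graph: ties a-b, b-c, c-d, c-e, d-e.
example = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
print(distance_profile(example, "a"))  # {1: 1, 2: 1, 3: 2}
```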

Measuring Networks: Flow Distance & number of paths Paths are the different routes one can take. Node-independent paths, which share no actors other than their endpoints, are particularly important. In the example, there are 2 independent paths connecting a and b, but many non-independent paths.
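Counting node-independent paths is a maximum-flow problem (Menger's theorem): split every actor into an "in" and an "out" copy joined by a unit-capacity arc, so that no actor can carry more than one path, then count augmenting paths. The sketch below is one standard way to do this (node splitting plus Edmonds-Karp), not necessarily the method behind the slide's figures; it assumes the dict-of-neighbors format with every actor present as a key, and the example graphs are hypothetical.

```python
from collections import defaultdict, deque


def node_independent_paths(adjacency, source, target):
    """Count source-target paths that share no actors except the endpoints.

    Node splitting: actor v becomes (v, "in") -> (v, "out") with capacity 1,
    and each tie v -> w becomes (v, "out") -> (w, "in") with capacity 1.
    The maximum flow from (source, "out") to (target, "in") equals the
    number of node-independent paths.
    """
    capacity = defaultdict(lambda: defaultdict(int))
    for v, neighbors in adjacency.items():
        capacity[(v, "in")][(v, "out")] = 1
        for w in neighbors:
            capacity[(v, "out")][(w, "in")] = 1
    s, t = (source, "out"), (target, "in")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Every arc on the path has residual capacity >= 1: push one unit.
        v = t
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= 1
            capacity[v][u] += 1
            v = u
        flow += 1


cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
bowtie = {"a": ["x", "y"], "x": ["a", "m"], "y": ["a", "m"],
          "m": ["x", "y", "b"], "b": ["m"]}
print(node_independent_paths(cycle, "a", "c"))   # 2
print(node_independent_paths(bowtie, "a", "b"))  # 1: every route passes through m
```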

Measuring Networks: Flow Distance & number of paths Probability of transfer by distance and number of paths, assuming a constant p_ij of 0.6 [Figure: probability of transfer (y-axis) by path distance (x-axis, 2 to 6), with separate curves for 1, 2, 5, and 10 paths]
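One simple model behind curves like these, stated here as an assumption rather than the slide's exact calculation: if each tie transmits independently with probability p, a path of length d transmits with probability p^d, and at least one of k independent paths transmits with probability 1 - (1 - p^d)^k. A Python sketch:

```python
def transfer_probability(p, distance, n_paths):
    """Probability that at least one of n_paths independent paths delivers.

    Assumes every tie transmits independently with the same probability p,
    so one path of `distance` steps succeeds with probability p ** distance.
    """
    p_path = p ** distance
    return 1 - (1 - p_path) ** n_paths


# Evaluate the model with p = 0.6, as on the slide.
for k in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, k), 3) for d in range(2, 7)]
    print(f"{k:>2} paths:", row)
```

Under this model each added step multiplies a single path's success probability by p, while extra independent paths raise the chance that at least one gets through.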

Reachability in Colorado Springs (Sexual contact only) • High-risk actors over 4 years • 695 people represented • Longest path is 17 steps • Average distance is about 5 steps • Average person is within 3 steps of 75 other people • 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths (Node size = log of degree)

Measuring Networks: Flow Centrality • Centrality refers to (one dimension of) location, identifying where an actor resides in a network. • For example, we can compare actors at the edge of the network to actors at the center. • In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders.

Measuring Networks: Flow Centrality • At the individual level, one dimension of position in the network can be captured through centrality. • Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated, but substantively we often have reason to believe that people at the center are very important. • Three standard centrality measures capture a wide range of “importance” in a network: • Degree • Closeness • Betweenness

Measuring Networks: Flow Centrality The most intuitive notion of centrality focuses on degree. Degree is the number of ties, and the actor with the most ties is the most important:
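For directed relations, degree splits into out-degree (nominations sent) and in-degree (nominations received). A minimal sketch counting both from an arc list; the arcs are hypothetical nominations.

```python
from collections import Counter


def in_out_degree(arcs):
    """Count ties sent (out-degree) and received (in-degree) per actor."""
    out_degree = Counter(sender for sender, _ in arcs)
    in_degree = Counter(receiver for _, receiver in arcs)
    return out_degree, in_degree


# Hypothetical nominations: (who named, who was named).
arcs = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "a")]
out_degree, in_degree = in_out_degree(arcs)
most_chosen = max(in_degree, key=in_degree.get)
print(most_chosen, in_degree[most_chosen])  # b 3
```

A `Counter` returns 0 for actors that never appear in a given role, so isolates and non-senders need no special handling.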

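Measuring Networks: Flow Centrality If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality. Simple: the variance of the individual centrality scores. Or, using Freeman’s general formula for centralization (which ranges from 0 to 1): $C_A = \frac{\sum_{i=1}^{n} [C_A(n^*) - C_A(n_i)]}{\max \sum_{i=1}^{n} [C_A(n^*) - C_A(n_i)]}$, where $C_A(n_i)$ is actor i's score on centrality measure A, $C_A(n^*)$ is the largest observed score, and the maximum in the denominator is taken over all possible graphs with n actors.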
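Both summaries can be computed directly for degree. This sketch assumes an undirected graph in dict-of-neighbors form with at least three actors; for degree, the denominator of Freeman's formula is (n - 1)(n - 2), the value a star attains. The example graphs (an eight-actor star and an eight-actor ring) are illustrative, and variance here is the population variance.

```python
from statistics import pvariance


def degrees(adjacency):
    """Degree = number of ties, read from a dict of neighbor lists."""
    return {actor: len(neighbors) for actor, neighbors in adjacency.items()}


def freeman_degree_centralization(adjacency):
    """Freeman centralization for degree, between 0 (all equal) and 1 (a star).

    Numerator: sum over actors of (largest degree - actor's degree).
    Denominator: its maximum over all graphs with n actors, (n - 1)(n - 2).
    Requires n >= 3.
    """
    deg = list(degrees(adjacency).values())
    n = len(deg)
    top = max(deg)
    return sum(top - d for d in deg) / ((n - 1) * (n - 2))


def degree_variance(adjacency):
    """Population variance of the degree scores."""
    return pvariance(list(degrees(adjacency).values()))


star = {0: list(range(1, 8)), **{i: [0] for i in range(1, 8)}}
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(freeman_degree_centralization(star), degree_variance(star))  # 1.0 3.9375
print(freeman_degree_centralization(ring), degree_variance(ring))  # both zero
```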

Measuring Networks: Flow Degree Centralization Scores [Figure: four example graphs] Example 1: Freeman 0.0, variance 0.0. Example 2: Freeman 1.0, variance 3.9. Example 3: Freeman .02, variance .17. Example 4: Freeman .07, variance .20.

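Measuring Networks: Flow Centrality A second measure of centrality is closeness centrality. An actor is considered important if he/she is relatively close to all other actors. Closeness is based on the inverse of the distance of each actor to every other actor in the network. Closeness centrality: $C_C(n_i) = \left[\sum_{j=1}^{n} d(n_i, n_j)\right]^{-1}$. Normalized closeness centrality: $C'_C(n_i) = (n - 1)\, C_C(n_i)$, where $d(n_i, n_j)$ is the geodesic distance between actors i and j and n is the number of actors.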
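A minimal sketch of normalized closeness, (n - 1) divided by an actor's summed geodesic distance to all others, assuming a connected, undirected, binary graph in dict-of-neighbors form (closeness in this form is undefined when some actors cannot be reached). It reuses the five-actor example graph.

```python
from collections import deque


def closeness(adjacency):
    """Normalized closeness: (n - 1) / sum of geodesic distances to all others."""
    n = len(adjacency)
    scores = {}
    for source in adjacency:
        # Breadth-first search gives unweighted geodesic distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            actor = queue.popleft()
            for neighbor in adjacency[actor]:
                if neighbor not in dist:
                    dist[neighbor] = dist[actor] + 1
                    queue.append(neighbor)
        if len(dist) < n:
            raise ValueError("closeness is undefined for a disconnected graph")
        scores[source] = (n - 1) / sum(dist.values())
    return scores


example = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
print({k: round(v, 3) for k, v in closeness(example).items()})
# {'a': 0.444, 'b': 0.667, 'c': 0.8, 'd': 0.571, 'e': 0.571}
```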

Measuring Networks: Flow Centrality Closeness Centrality in the examples C=0.0 C=1.0 C=0.36 C=0.28

Measuring Networks: Flow Centrality Betweenness Centrality: Model based on communication flow: a person who lies on communication paths can control communication flow, and is thus important. Betweenness centrality counts the number of shortest paths between j and k that actor i resides on.

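Measuring Networks: Flow Centrality Betweenness Centrality: $C_B(n_i) = \sum_{j<k} g_{jk}(n_i) / g_{jk}$, where $g_{jk}$ = the number of geodesics connecting j and k, and $g_{jk}(n_i)$ = the number of those geodesics that actor i is on. Usually normalized by: $C'_B(n_i) = C_B(n_i) / [(n-1)(n-2)/2]$, where n is the number of actors.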
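Betweenness for every actor can be computed without listing all geodesics by using Brandes' algorithm. The sketch below assumes an undirected, binary graph in dict-of-neighbors form; it halves the raw scores (each pair is reached from both endpoints) and, by default, divides by (n - 1)(n - 2)/2 as in the normalization above. The example is the five-actor graph used earlier.

```python
from collections import deque


def betweenness(adjacency, normalized=True):
    """Brandes' algorithm; raw score of i is the sum over pairs of g_jk(i) / g_jk."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Forward pass: BFS counts shortest paths (sigma) and records predecessors.
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate each actor's share of the geodesics from s.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adjacency)
    for v in score:
        score[v] /= 2  # undirected: every pair was counted from both ends
        if normalized and n > 2:
            score[v] /= (n - 1) * (n - 2) / 2
    return score


example = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
print({k: round(v, 3) for k, v in betweenness(example).items()})
# {'a': 0.0, 'b': 0.5, 'c': 0.667, 'd': 0.0, 'e': 0.0}
```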

Measuring Networks: Flow Centrality Betweenness Centrality: centralization in the four examples: Example 1: 1.0. Example 2: 0. Example 3: .59. Example 4: .31.

Measuring Networks: Flow Centrality Actors that appear very different when seen individually are comparable in the global network. (Node size proportional to betweenness centrality)

Measuring Networks: Flow Time • Two factors that affect network flows: • Topology • - the shape, or form, of the network • - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected • Time • - the timing of contacts matters • - simple example: an actor cannot pass information he has not yet received.

Measuring Networks: Flow Time Timing in networks • A focus on contact structure has often slighted the importance of network dynamics, though a number of recent pieces are addressing this. • Time affects networks in two important ways: • 1) The structure itself evolves, in ways that will affect the topology and thus flow. • 2) The timing of contact constrains information flow.
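The second point can be made concrete with timestamped contacts. In this sketch (an illustration, assuming undirected contacts given as (time, u, v) tuples and that an actor who receives something at time t can pass it on only at a strictly later time), the same static graph a-b-c lets a reach c under one ordering of contacts but not the other. The function name and data are hypothetical.

```python
def time_respecting_reach(contacts, source):
    """Return {actor: time first reached} for something that starts with `source`.

    contacts: iterable of (time, u, v) tuples for undirected contacts.
    Assumption: an actor who receives at time t can pass it on only at a
    strictly later time, so chains cannot form within one time step.
    """
    reached = {source: float("-inf")}  # the source holds it from the start
    for time, u, v in sorted(contacts):  # process contacts in time order
        for sender, receiver in ((u, v), (v, u)):
            if (
                sender in reached
                and reached[sender] < time
                and receiver not in reached
            ):
                reached[receiver] = time
    return reached


# Same static graph (a-b, b-c); only the timing differs.
late = [(2, "a", "b"), (1, "b", "c")]   # b meets c before b has anything
early = [(1, "a", "b"), (2, "b", "c")]  # b meets c after hearing from a
print(sorted(time_respecting_reach(late, "a")))   # ['a', 'b']
print(sorted(time_respecting_reach(early, "a")))  # ['a', 'b', 'c']
```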

Measuring Networks: Flow Time Drug Relations, Colorado Springs, Year 1 Data on drug users in Colorado Springs, over 5 years
