Introduction to Social Network Analysis

Anne ter Wal, February 12, 2008
http://econ.geo.uu.nl/terwal/terwal.html

Structure
• A. Networks in cluster research
• B. What is different about network data?
• C. Basic terminology
• D. Analysis
• E. Network data: primary vs. secondary
GEOGRAPHY OF NETWORKS

A. Networks in cluster research
Network analysis makes the flows between agents explicit.
Agents (nodes): firms, inventors, technicians.
Flows (links): knowledge, labour, capital, goods, for example:
• Business relationships (incl. buyer-supplier relationships)
• Knowledge exchange
• Cooperation (incl. joint ventures and strategic alliances)
• Labour mobility of workers
• Spin-off relationships
• Social relations between entrepreneurs / technicians

A. Three central questions
Applied SNA research addresses three questions:
• What is the structure of the network?
• ORIGINS: How can this network structure be explained?
• EFFECTS: How does this network structure affect the performance of its agents (or of the region in which it is located)?

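B. What is different about network data?
According to conventional statistics, persons A and U are similar: they both have five friends. Network analysis also takes into account how these friends are positioned in the wider network, and in that respect A and U may differ considerably.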


B. What is different about network data?
• The characteristics of an actor are described in terms of its position in a wider structure of interrelated actors → THESE CHARACTERISTICS ARE INTERDEPENDENT.
• The behavioural choices of individual actors, taken together, constitute a self-organizing system → THE PROPERTIES OF A NETWORK DEPEND ON INDIVIDUALS’ CHOICES.

C. Some basic terminology
The same concepts go by different names in different disciplines:
• Node (computer science) = vertex (physics) = actor (sociology)
• Link (computer science) = edge (physics) = tie (sociology)

C. Network data
Basic network data record the presence or absence of links between a set of nodes. The same data can be stored as a matrix or drawn as a graph.
[Figure: the same network shown as a matrix and as a graph]
NOTE: a graph only visualizes the presence or absence of links. The location of the nodes and the length of the links have no meaning!
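As a minimal sketch (a hypothetical four-actor network), the same basic network data can be kept as a list of links or as an adjacency matrix; neither form records where the nodes are drawn. The names `nodes`, `links` and `adjacency_matrix` are illustrative, and the convention that rows are senders and columns are receivers is an assumption.

```python
# Hypothetical example data: four actors and their undirected links.
nodes = ["a", "b", "c", "d"]
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def adjacency_matrix(nodes, links, directed=False):
    """Return a binary adjacency matrix (list of lists) for the given links."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in links:
        matrix[index[u]][index[v]] = 1      # assumed: row = sender, column = receiver
        if not directed:
            matrix[index[v]][index[u]] = 1  # undirected: record both directions
    return matrix


matrix = adjacency_matrix(nodes, links)
for name, row in zip(nodes, matrix):
    print(name, row)
# a [0, 1, 1, 0]
# b [1, 0, 1, 0]
# c [1, 1, 0, 1]
# d [0, 0, 1, 0]
```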


C. Graphs
Undirected graph
• The direction of the links does not matter
• Also called a bonded graph
• Examples: friendship network; railroad network; cooperation between firms
Directed graph
• The direction of the links matters
• Examples: a street map with one-way streets; labour mobility between firms

C. Graphs
Binary graphs / simple graphs
• A link is either present (value 1) or absent (value 0)
Valued graphs
• The links that are present can carry a value, for example: distance, size of the flow (e.g. trade), intensity, capacity, frequency, cost, time
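A valued graph can be turned back into a binary graph by choosing a cut-off value. The sketch below assumes hypothetical data (joint projects between three firms); the `threshold` is an analyst's choice, not something the data dictate.

```python
# Hypothetical valued matrix, e.g. number of joint projects between three firms.
valued = [
    [0, 3, 0],
    [3, 0, 1],
    [0, 1, 0],
]


def dichotomize(matrix, threshold=1):
    """Binary version of a valued matrix: 1 where value >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


print(dichotomize(valued))               # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(dichotomize(valued, threshold=2))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```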

C. Valued network of international calls in Europe (Source: TeleGeography)

C. Graphs
• Simplex versus multiplex graphs
• Multiplex graphs visualize links of various types in a single graph. Examples:
- cities connected by motorway and/or by railroad
- firms having a business relationship and/or a cooperative relationship

C. Matrices
Matrices are square: rows and columns contain the same actors, in the same order. In the example matrices, rows indicate where a link comes from and columns where it goes to. The diagonal of the matrix refers to self-loops, which are usually left out of consideration.
• Binary matrix: undirected links give a symmetric matrix; directed links give an asymmetric matrix.
• Valued matrix: undirected links give a symmetric matrix; differential weights give an asymmetric matrix.
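Whether a matrix is symmetric can be checked directly. A minimal sketch, assuming a square list-of-lists matrix and ignoring the diagonal, as the slide suggests:

```python
def is_symmetric(matrix):
    """True if m[i][j] == m[j][i] for all i != j; the diagonal (self-loops) is ignored."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("a network matrix must be square")
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(i + 1, n))


undirected = [[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]]
directed = [[0, 1, 0],   # a -> b
            [0, 0, 1],   # b -> c
            [1, 0, 0]]   # c -> a
print(is_symmetric(undirected))  # True
print(is_symmetric(directed))    # False
```

Note that a directed network in which every link happens to be reciprocated also yields a symmetric matrix.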

C. Levels of analysis
In social network analysis three levels of analysis can be distinguished:
• Individual nodes (section D1)
• The dyad: pairs of actors (section D2)
• The network as a whole (section D3)

D1 Nodes
• persons, families, animals
• firms, organizations, universities, political parties, scientific communities
• continents, countries, regions, cities, neighbourhoods
• newspaper articles, patents, scientific articles, words, letters, languages, web pages
• atoms, molecules, cells, bacteria, ions
• etc.

D1 Nodes: degree
• The basic property of a node is its degree: the number of its direct relationships. A node with degree 0 (no links at all) is an isolate.

D1 Nodes: in-degree and out-degree
• In directed graphs, in-degree (the number of incoming links) and out-degree (the number of outgoing links) can be distinguished.
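With a directed binary matrix whose rows are senders and columns receivers (an assumed convention), out-degree is a row sum and in-degree a column sum; in an undirected graph both reduce to the ordinary degree. A sketch on hypothetical data:

```python
# Hypothetical directed matrix; rows = sender ("from"), columns = receiver ("to").
nodes = ["a", "b", "c", "d"]
m = [
    [0, 1, 1, 0],  # a -> b, a -> c
    [0, 0, 1, 0],  # b -> c
    [0, 0, 0, 0],  # c sends nothing
    [0, 0, 0, 0],  # d has no links at all
]

out_degree = {n: sum(m[i]) for i, n in enumerate(nodes)}               # row sums
in_degree = {n: sum(row[i] for row in m) for i, n in enumerate(nodes)}  # column sums
isolates = [n for n in nodes if out_degree[n] + in_degree[n] == 0]

print(out_degree)  # {'a': 2, 'b': 1, 'c': 0, 'd': 0}
print(in_degree)   # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
print(isolates)    # ['d']
```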

D1 Nodes: attributes
Besides by their degree, nodes can be described by their attributes. Attributes are characteristics of a node that are not related to its position in a network, for example:
• Age, gender, religion, residence, income of people in a friendship network
• Location, sector, number of employees, revenues, age of firms in an inter-firm cooperation network
• Number of inhabitants, geographical coordinates, average income of cities connected in a high-speed train network

D1 Centrality
• Degree
• Betweenness
• Closeness

D1 Degree centrality
• The number of nodes adjacent to a given node
[Figure: node with the highest degree centrality]
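A sketch of degree centrality for an undirected graph stored as a dict of neighbour sets (a hypothetical representation). Dividing by N − 1, the maximum possible degree, is a common normalization convention that the slide does not prescribe.

```python
# Hypothetical undirected network: node -> set of neighbours.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}


def degree_centrality(graph, normalized=True):
    """Number of adjacent nodes, optionally divided by N - 1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) if normalized else len(nbrs)
            for v, nbrs in graph.items()}


print(degree_centrality(graph, normalized=False))  # {'a': 3, 'b': 1, 'c': 1, 'd': 2, 'e': 1}
print(degree_centrality(graph)["a"])               # 0.75
```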

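D1 Betweenness centrality
• Loosely: the number of times that a node lies on the shortest path between two other nodes
• More precisely: for every pair of other nodes, the share of their shortest paths that pass through the node, summed over all pairs
[Figure: node with the highest betweenness centrality]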
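One standard way to compute betweenness exactly is Brandes' algorithm. The sketch below assumes an unweighted, undirected graph stored as a dict of neighbour sets, returns unnormalized scores, and counts each unordered pair of other nodes once; when a pair has several shortest paths, its contribution is split among them.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s: count shortest paths (sigma), note predecessors.
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency on s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each of its two endpoints.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical "bow tie": two triangles sharing node c.
bow_tie = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
print(betweenness(bow_tie))  # {'a': 0.0, 'b': 0.0, 'c': 4.0, 'd': 0.0, 'e': 0.0}
```

Node c scores 4 because it lies on the only shortest path of the pairs a-d, a-e, b-d and b-e.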

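D1 Closeness centrality
• The sum of the geodesic distances (shortest-path lengths) to all other nodes
• This is an inverse measure of centrality: the lower the sum, the more central the node
[Figure: node with the "highest" closeness centrality, i.e. the lowest sum of distances]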
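A sketch of the sum of geodesic distances ("farness"), computed by breadth-first search on a hypothetical connected chain. Taking (N − 1) divided by that sum, to obtain a score where higher means more central, is a common convention added here as an assumption.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def farness(graph):
    """Sum of geodesic distances to all other nodes (graph assumed connected)."""
    return {v: sum(distances_from(graph, v).values()) for v in graph}


# Hypothetical chain a - b - c - d - e.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
far = farness(chain)
print(far)  # {'a': 10, 'b': 7, 'c': 6, 'd': 7, 'e': 10}

# Lowest sum = most central; (N - 1) / sum turns this into a "higher is more central" score.
closeness = {v: (len(chain) - 1) / f for v, f in far.items()}
print(max(closeness, key=closeness.get))  # c
```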

[Figure: degree, betweenness and closeness centrality compared in one network. Data courtesy of David Krackhardt.]

D1 Centrality
• Degree: how well connected a node is; direct influence
• Closeness: how far a node is from all others; how long information takes to arrive
• Betweenness: brokerage, gatekeeping, control of information

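D1 Clustering coefficient
"The extent to which friends of friends are friends."
The clustering coefficient of a node is the extent to which its direct neighbours are linked to each other: the number of existing links among the neighbours divided by the number of possible links among them.
[Figure example: CC = 1/3]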
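A sketch of the local clustering coefficient for an undirected graph stored as a dict of neighbour sets (hypothetical data). Returning 0 for nodes with fewer than two neighbours, where the measure is undefined, is an assumed convention.

```python
from itertools import combinations


def clustering(graph, v):
    """Share of pairs of v's neighbours that are themselves linked."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0 by assumption here
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


# Hypothetical ego with three neighbours, only two of which (x, y) are linked.
graph = {
    "ego": {"x", "y", "z"},
    "x": {"ego", "y"},
    "y": {"ego", "x"},
    "z": {"ego"},
}
print(clustering(graph, "ego"))  # 0.333... (1 of 3 possible neighbour links)
```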

D2 Dyads
• A dyad is a pair of nodes in a network.
• In a simple graph, a dyad can have value 0 (the link is absent) or value 1 (the link is present).
• In a valued graph, a dyad can have value 0 (the link is absent) or a value x other than 0 (a link is present, with value x).
• The cells of a matrix display the values of all dyads.

D2 Dyads: geodesic distance
A path is a sequence of links connecting two nodes. A particular type of path is the geodesic path: the shortest path between a pair of nodes. Its length (number of links) is the geodesic distance.
[Figure: matrix of geodesic distances]
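In an unweighted graph, a matrix of geodesic distances can be built with one breadth-first search per node. A minimal sketch on a hypothetical chain, using `None` (an assumed convention) where no path exists:

```python
from collections import deque


def geodesic_matrix(nodes, graph):
    """All-pairs shortest-path lengths via breadth-first search; None where no path exists."""
    matrix = []
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        matrix.append([dist.get(t) for t in nodes])
    return matrix


# Hypothetical chain a - b - c - d.
nodes = ["a", "b", "c", "d"]
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
for name, row in zip(nodes, geodesic_matrix(nodes, graph)):
    print(name, row)
# a [0, 1, 2, 3]
# b [1, 0, 1, 2]
# c [2, 1, 0, 1]
# d [3, 2, 1, 0]
```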

D2 Dyads: reachability
• A graph is not necessarily one integrated network: it can consist of several components.
• A reachability matrix displays for every dyad whether there is a path between the two nodes (1) or not (0).
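A reachability matrix follows from a graph search started at every node. In this sketch (hypothetical network with two components) each node is counted as reaching itself, which is an assumed convention for the diagonal.

```python
def reachability_matrix(nodes, graph):
    """1 if a path exists between the two nodes, else 0 (depth-first search)."""
    matrix = []
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        matrix.append([1 if t in seen else 0 for t in nodes])
    return matrix


# Hypothetical graph with two components: a - b and c - d.
nodes = ["a", "b", "c", "d"]
graph = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
for name, row in zip(nodes, reachability_matrix(nodes, graph)):
    print(name, row)
# a [1, 1, 0, 0]
# b [1, 1, 0, 0]
# c [0, 0, 1, 1]
# d [0, 0, 1, 1]
```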

D2 Bridge
• A tie that, if removed, would disconnect the network (more precisely: would increase the number of components)
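For small networks, bridges can be found by brute force: remove each link in turn and check whether the number of components goes up. This is a simple sketch on hypothetical data; faster linear-time algorithms (e.g. Tarjan's) exist for large networks.

```python
def count_components(graph):
    """Number of connected components of an undirected graph (dict of neighbour sets)."""
    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


def bridges(graph):
    """Links whose removal increases the number of components (brute force)."""
    base = count_components(graph)
    found = []
    for u in graph:
        for v in graph[u]:
            if u < v:  # consider each undirected link once
                reduced = {n: set(nbrs) for n, nbrs in graph.items()}
                reduced[u].discard(v)
                reduced[v].discard(u)
                if count_components(reduced) > base:
                    found.append((u, v))
    return sorted(found)


# Hypothetical: two triangles (a, b, c) and (d, e, f) joined by the link c - d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(bridges(graph))  # [('c', 'd')]
```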

D2 Structural holes
• Basic idea: a lack of ties among alters (the direct neighbours of a focal node) may benefit ego (the focal node)
• Benefits:
- Autonomy
- Control
- Information

D2 Brokerage roles for node B
[Figure: five configurations of a flow from A through broker B to C, differing in which of A, B and C belong to the same group, labelled Coordinator, Representative, Gatekeeper, Consultant and Liaison]
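The five role names match the brokerage typology usually attributed to Gould and Fernandez. The sketch below assumes a directed two-step flow A → B → C in which A and C are not directly linked, and classifies B's role purely from the group labels of the three nodes; the function name and arguments are hypothetical.

```python
def brokerage_role(group_a, group_b, group_c):
    """Role of broker B in the flow A -> B -> C, based on group membership."""
    if group_a == group_b == group_c:
        return "coordinator"     # all three in B's own group
    if group_a == group_c:
        return "consultant"      # B, an outsider, mediates within another group
    if group_b == group_c:
        return "gatekeeper"      # B controls what enters its own group
    if group_a == group_b:
        return "representative"  # B passes its own group's flow to an outsider
    return "liaison"             # all three in different groups


print(brokerage_role("x", "y", "y"))  # gatekeeper
print(brokerage_role("x", "y", "z"))  # liaison
```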

D3 Network: size and density
• The size of a network can be expressed as the total number of nodes N or the total number of links L.
• The density Δ of a network is the number of existing links divided by the number of possible links: N(N − 1)/2 in an undirected graph, N(N − 1) in a directed graph.
• Density Δ is a number between 0 (a completely disconnected graph) and 1 (a completely connected graph).
[Figure examples: Δ = 0.25 and Δ = 0.39]
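As a worked sketch, density only needs N, L and whether links are directed; self-loops are assumed to be excluded from the possible links. The example numbers are hypothetical.

```python
def density(n_nodes, n_links, directed=False):
    """Existing links divided by possible links (self-loops excluded)."""
    possible = n_nodes * (n_nodes - 1)  # ordered pairs: possible directed links
    if not directed:
        possible //= 2                  # unordered pairs: possible undirected links
    return n_links / possible


print(density(5, 4))                 # 0.4  (4 of 10 possible undirected links)
print(density(5, 5, directed=True))  # 0.25 (5 of 20 possible directed links)
```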

D3 Network: average geodesic distance
• Geodesic distance: the length of the shortest path between two nodes
• Average geodesic distance: the average over all dyads of a graph
[Figure examples: average geodesic distance = 1.9 ("core-periphery" structure); average geodesic distance = 2.4 ("clique structure")]
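A sketch of the average geodesic distance over all unordered pairs, assuming a connected, unweighted graph (in a graph with several components some distances are undefined). The star network is hypothetical.

```python
from collections import deque
from itertools import combinations


def distances_from(graph, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_geodesic_distance(graph):
    """Mean shortest-path length over all unordered pairs (graph assumed connected)."""
    dist = {v: distances_from(graph, v) for v in graph}
    pairs = list(combinations(graph, 2))
    return sum(dist[u][v] for u, v in pairs) / len(pairs)


# Hypothetical star: one hub linked to three peripheral nodes.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(average_geodesic_distance(star))  # 1.5 (three pairs at distance 1, three at 2)
```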

D3 Network: diameter
• The diameter D is the longest geodesic distance in the graph.
[Figure examples: diameter = 4; diameter = 3]
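The diameter is the largest entry of the geodesic distance matrix. A sketch for a connected, unweighted graph, on hypothetical data:

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(graph):
    """Longest geodesic distance over all pairs (graph assumed connected)."""
    return max(max(distances_from(graph, v).values()) for v in graph)


# Hypothetical networks: a chain of five nodes and a star with three leaves.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(diameter(chain), diameter(star))  # 4 2
```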

D3 Network: number of components
[Figure: network with parts labelled "Original company", "Older acquisitions" and "Recent acquisition". Data drawn from Cross, Borgatti & Parker 2001.]
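Counting components amounts to repeated graph searches: start from any node not yet visited, collect everything reachable, and repeat. A sketch on a hypothetical network with an isolate:

```python
def components(graph):
    """Connected components of an undirected graph, each as a sorted list of nodes."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [start], [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    component.append(w)
                    stack.append(w)
        result.append(sorted(component))
    return result


# Hypothetical network: a - b - c, d - e, and the isolate f.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}, "f": set()}
parts = components(graph)
print(len(parts), parts)  # 3 [['a', 'b', 'c'], ['d', 'e'], ['f']]
```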

D3 Cliques
• A clique is a maximal complete subgraph: a set of nodes that are all directly linked to each other, to which no further node can be added
• Cliques have at least three members
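Maximal cliques can be enumerated with the Bron–Kerbosch algorithm. The sketch below uses the pivoting variant on an undirected graph stored as a dict of neighbour sets (hypothetical data) and keeps only cliques with at least three members, as on the slide.

```python
def maximal_cliques(graph, min_size=3):
    """Bron-Kerbosch with pivoting: maximal complete subgraphs of >= min_size nodes."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: nodes already handled.
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical: triangles (a, b, c) and (c, d, e), plus a single link e - f.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d", "f"}, "f": {"e"},
}
print(maximal_cliques(graph))  # [['a', 'b', 'c'], ['c', 'd', 'e']]
```

The pair e - f is also a maximal complete subgraph, but with two members it does not count as a clique here.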

E Data collection
Primary network data:
• Full network methods
• Snowball methods
• Ego networks
Secondary network data:
• e.g. patent data

E Data collection: full network methods
• Collecting all links for all actors in your population
• A census of the whole population rather than a sample
• Required for most of the node and network properties
Popular method: roster-recall methodology
+ different kinds of links among the same set of actors can be collected
+ data on characteristics of the links can be collected
- a high response rate is required
- time- and labour-intensive
- only static network data

E Data collection: missing data
For a proper social network analysis it is extremely important to have complete network data: all links for all nodes in your network.
[Figure: full network data, 26 nodes; missing data, 1 node out of 26 (node l) refused to cooperate with the survey]

E Data collection: the snowball method
• In the snowball method:
- you start by asking a focal actor for its links;
- you continue by asking the actors mentioned for their links;
- you go on until no new actors are added to your list.
• You will identify the full network only if:
- there are no isolates in the network, and
- the network consists of one large component.
In most cases you cannot know in advance whether these conditions are satisfied!
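The snowball procedure is essentially a graph search driven by survey answers. In this sketch, `ask` is a hypothetical stand-in for the survey (it returns the actors a respondent names), and the population is invented to show that a second component and an isolate are never reached.

```python
def snowball(ask, seed):
    """Ask named actors for their links until no new actors are added."""
    found, to_ask, links = {seed}, [seed], set()
    while to_ask:
        actor = to_ask.pop()
        for other in ask(actor):
            links.add(frozenset((actor, other)))  # undirected link
            if other not in found:
                found.add(other)
                to_ask.append(other)
    return found, links


# Hypothetical population: component a - b - c, component d - e, isolate f.
population = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}, "f": set()}
found, links = snowball(lambda actor: population[actor], seed="a")
print(sorted(found), len(links))  # ['a', 'b', 'c'] 2  (d, e and f are never reached)
```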

E Data collection: ego networks
If it is unfeasible to collect full network data or to use a snowball method, you can rely on ego networks. There are two possibilities:
- Ego networks without alter connections: for a sample of your population, you identify the direct links of each sampled node (ego).
- Ego networks with alter connections: in addition, you identify whether the direct neighbours (alters) of each ego are connected among themselves.
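The difference between the two kinds of ego network can be sketched as follows, assuming the true network is known (hypothetical data) so that we can see what each design would record; `ego_network` is an illustrative name.

```python
def ego_network(graph, ego, with_alter_ties=True):
    """Alters of `ego` and the (undirected) links recorded around it."""
    alters = set(graph[ego])
    links = {frozenset((ego, alter)) for alter in alters}
    if with_alter_ties:
        # Also record links among the alters themselves (not links to outsiders).
        links |= {frozenset((a, b)) for a in alters for b in graph[a] if b in alters}
    return alters, links


# Hypothetical network: ego's alters are p, q, r; p and q are linked; s is not an alter.
graph = {
    "ego": {"p", "q", "r"},
    "p": {"ego", "q", "s"},
    "q": {"ego", "p"},
    "r": {"ego"},
    "s": {"p"},
}
_, without_ties = ego_network(graph, "ego", with_alter_ties=False)
_, with_ties = ego_network(graph, "ego")
print(len(without_ties), len(with_ties))  # 3 4
```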

E Data collection: ego networks
Suppose you do network research on a population of 26 actors and decide to survey one third of them at random: a, d, g, j, m, p, s, v and y (blue in the graph on the right).

E Data collection: ego networks
You will discover only part of the network:
- only the direct alters of the nodes in your sample;
- for these alters, only their links to the sampled actors.
Hence, you should not:
- calculate network properties (density, average geodesic distance, diameter, etc.) on a network obtained by aggregating ego networks;
- calculate the degree of nodes that were not in the sample.
In fact, ego networks are not real network data: degree becomes a node characteristic comparable to ordinary statistical attribute data (which do not depend on the wider network structure).
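Why degree cannot be computed for non-sampled nodes can be shown with a small hypothetical network: when only sampled actors report their ties, links between two non-sampled actors are never observed. The function names are illustrative.

```python
def observed_links(graph, sample):
    """Links visible when only sampled actors report their direct ties."""
    return {frozenset((ego, alter)) for ego in sample for alter in graph[ego]}


def observed_degree(links, node):
    """Number of observed links that involve the node."""
    return sum(1 for link in links if node in link)


# Hypothetical true network; only a and c are in the sample.
graph = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
seen = observed_links(graph, sample={"a", "c"})
for node in ["b", "d"]:  # nodes outside the sample
    print(node, observed_degree(seen, node), "observed vs", len(graph[node]), "true")
# b 2 observed vs 3 true
# d 1 observed vs 2 true
```

The link b - d is missing because neither of its endpoints was asked.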

E Secondary (patent) data
Inter-firm network:
• Co-patenting
• Multiple applicant inventorship
Inventor network:
• Co-inventing
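A co-inventor network can be derived from patent records by linking every pair of inventors listed on the same patent; counting shared patents gives a valued network. The records below are hypothetical, and the same approach would apply to applicant firms (co-patenting).

```python
from collections import Counter
from itertools import combinations

# Hypothetical patent records: patent id -> listed inventors.
patents = {
    "P1": ["i1", "i2"],
    "P2": ["i1", "i2", "i3"],
    "P3": ["i3"],
}


def co_inventor_links(patents):
    """Valued co-inventing links: number of patents each pair of inventors shares."""
    weights = Counter()
    for inventors in patents.values():
        for pair in combinations(sorted(set(inventors)), 2):
            weights[pair] += 1
    return weights


links = co_inventor_links(patents)
print(links[("i1", "i2")], links[("i1", "i3")], links[("i2", "i3")])  # 2 1 1
```

A single-inventor patent such as P3 adds no link.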

E Secondary (patent) data
+ Possibility to do longitudinal network analysis
+ Less time-consuming
- Only cooperative links that have led to a patent are detected
- Patenting behaviour varies strongly across sectors and over time
- Patenting behaviour is strongly related to firm size
- Universities and research institutes are underrepresented in patent data
