
Flow Processes and the Structural Importance of Nodes

Steve Borgatti, Boston College. Data courtesy of Valdis Krebs.


Presentation Transcript


  1. Flow Processes and the Structural Importance of Nodes • Steve Borgatti, Boston College • [Figure: network diagram; labeled node: Mohamed Atta] • Data courtesy of Valdis Krebs

  2. Attacking Terrorist Nets • Find and eliminate structurally important nodes and lines • bridges, cut-points; minimum weight cutsets • measures of centrality • closeness, betweenness, eigenvector, etc.

  3. Terrorist Network

  4. Terrorist Network [Figure: network diagram; labeled nodes: Usman Bandukra, Djamal Beghal, Essid Sami Ben Khemais, Mohamed Atta, Mamoun Darkazanli, Nawaf Alhazmi, Raed Hijazi] Data courtesy of Valdis Krebs

  5. Many Problems • Data not good enough • Mostly known after an event • Sensitive to error • Benefits are short-term at best • Must address recruitment, training • it is precisely those organizations that make heavy use of suicide bombers that are organized as networks

  6. Was al Qaeda incapacitated by removal of 19 hijackers? [Figure: network diagram; legend: Dead, Alive?] Data courtesy of Valdis Krebs

  7. One Additional Problem • Centrality measures make certain assumptions about how things flow • and may produce poor estimates when misapplied • need to work that out before deciding which node to remove

  8. Objective • Enumerate kinds of flow processes • Analyze properties • Relate to structural importance of nodes • Relate to existing measures of centrality

  9. Types of Flow Processes • Gift process • Currency process • Transport process • Postal process • Gossip process • E-mail process • Infection process • Influence process • (several others)

  10. Gift Process • Canonical example: • passing along used paperback novel • Single object in only one place at a time • Doesn’t travel between same pair twice • Could be received by the same person twice • A--B--C--B--D--E--B--F--C ...

  11. Currency Process • Canonical example: • specific dollar bill moving through the economy • Single object in only one place at a time • Can travel between same pair more than once • A--B--C--B--C--D--E--B--C--B--C ...

  12. Gossip Process • Example: • juicy story moving through informal network • Multiple copies exist simultaneously • Person tells only one person at a time* • Doesn’t travel between same pair twice • Can reach same person multiple times * More generally, they tell a very limited number at a time.

  13. E-Mail Process • Example: • forwarded jokes and virus warnings • e-mail viruses themselves • Multiple copies exist simultaneously • All (or many) connected nodes told simultaneously (except the immediate source?)

  14. Influence Process • Example: • attitude formation • Multiple “copies” exist simultaneously • Multiple simultaneous transmission, even between the same pairs of nodes

  15. Infection Process • Example: • virus which activates effective immunological response • Multiple copies may exist simultaneously • Cannot revisit a node • A--B--C--E--D--F...

  16. Postal Process • Example: • package delivered by postal service • Single object at only one place at one time • Map of network enables the intelligent object to select only the shortest paths to all destinations

  17. Uncovering Flow Properties • Take componential analysis approach • identify a set of flow processes • compare and contrast to discover minimum set of attributes (properties) that distinguish them from each other • view each distinct flow process as unique bundle of properties -- typology

  18. Properties of Flow Processes • Sequence type: path, trail, walk • path: can’t revisit node nor edge (tie) • trail: can revisit node but not edges • walk: can revisit edges & nodes • Deterministic vs non-deterministic • blind vs guided • always chooses best route; aware of map • Combine into 4-way “pattern” property: • geodesics, paths, trails, walks
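The path / trail / walk distinction above can be checked mechanically for any recorded sequence of nodes. The following is a minimal sketch, not the author's code; the function name `classify_sequence` is hypothetical, and ties are assumed to be undirected, so A--B and B--A count as the same edge.

```python
def classify_sequence(nodes):
    """Return 'path', 'trail', or 'walk' for a sequence of visited nodes.

    Assumption: ties are undirected, so (A, B) and (B, A) are the same edge.
    """
    edges = [frozenset(pair) for pair in zip(nodes, nodes[1:])]
    if len(set(edges)) < len(edges):
        return "walk"   # at least one edge is reused
    if len(set(nodes)) < len(nodes):
        return "trail"  # a node is revisited, but no edge is
    return "path"       # neither nodes nor edges are revisited


# The infection example from slide 15 never revisits a node.
print(classify_sequence(list("ABCEDF")))
# A made-up sequence that returns to B over fresh edges only.
print(classify_sequence(list("ABCDBE")))
# The currency example from slide 11 reuses the B--C tie.
print(classify_sequence(list("ABCBCDEBCBC")))
```

Checking edges before nodes matters: any sequence that reuses an edge also revisits a node, so the stricter test has to come first.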

  19. Properties -- cont. • Duplication vs transfer (copy vs move) • transfer/move: only one place at one time • duplication/copy: multiple copies exist • Serial vs parallel duplication • serial: only one transmission at a time • parallel: broadcast to all surrounding nodes • Combine into “method” 3-way property: • parallel dup., serial dup., transfer
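One way to hold the typology is as a small lookup table keyed by process, with the 4-way "pattern" and 3-way "method" as fields. This is only a sketch: the names `FlowProcess`, `FLOW_PROCESSES`, and `processes_with` are hypothetical, and only the four processes whose pattern and method are stated outright in the slides are filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowProcess:
    pattern: str  # "geodesic", "path", "trail", or "walk"
    method: str   # "parallel duplication", "serial duplication", or "transfer"


# Only processes whose properties the slides state explicitly are listed.
FLOW_PROCESSES = {
    "gift": FlowProcess("trail", "transfer"),            # used paperback
    "currency": FlowProcess("walk", "transfer"),         # dollar bill
    "postal": FlowProcess("geodesic", "transfer"),       # package delivery
    "gossip": FlowProcess("trail", "serial duplication"),  # rumors
}


def processes_with(pattern=None, method=None):
    """Return the names of processes matching the given property values."""
    return sorted(
        name
        for name, p in FLOW_PROCESSES.items()
        if (pattern is None or p.pattern == pattern)
        and (method is None or p.method == method)
    )


print(processes_with(method="transfer"))
print(processes_with(pattern="trail"))
```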

  20. Simplified Typology [Table not recoverable from the transcript; surviving labels: goods, information]

  21. So What? • The properties of a flow process (together w/ node position) determine which nodes are structurally important • a node that is important in one process may not be important in another • off-the-shelf centrality measures implicitly assume certain flow properties and are only interpretable for certain flow processes (à la Friedkin)

  22. Closeness Centrality [Figure: example network with nodes labeled T, L, M, P, X, Q, S] • A node’s centrality is the sum of geodesic distances to all others. • Length of shortest paths • Is index of expected time until arrival of that-which-flows for consistent processes: • non-deterministic (e.g., postal) • parallel duplication (e.g., e-mail, nameserver)
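As a rough illustration of the definition above, the sum of geodesic distances can be computed with one breadth-first search per node. This is a sketch under stated assumptions: the network is undirected, unweighted, and connected; the adjacency dictionary and the function names are made up for the example.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def freeman_closeness(adj):
    """Sum of geodesic distances to all other nodes (smaller means more central)."""
    return {v: sum(geodesic_distances(adj, v).values()) for v in adj}


# Hypothetical toy network: a line A - B - C - D.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(freeman_closeness(line))  # {'A': 6, 'B': 4, 'C': 4, 'D': 6}
```

Because the score is a sum of distances, it is really a "farness": the two middle nodes, with the smallest totals, are the most central.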

  23. Calculating Closeness Centrality How long does a token take to reach a node?

  24. Betweenness Centrality [Figure: example network with nodes labeled T, L, M, P, X, Q, S] • Count no. of geodesic paths from each node to every other node that pass through X • if there is more than one geodesic from S to T, count the proportion that pass through X • Interpret as • how often node utilized by others • potential for control & synthesis
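The proportional counting rule above can be sketched directly: for every pair S, T, node X is credited with the fraction of S-T geodesics that run through it. The code below is an illustrative, unoptimized version, assuming an undirected, unweighted network and counting each unordered pair once; the function names and the toy graphs are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): geodesic distance and number of geodesics from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every geodesic to u extends to v
    return dist, sigma


def freeman_betweenness(adj):
    """Sum over unordered pairs (s, t) of the share of s-t geodesics through each node."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are disconnected
        for x in adj:
            if x in (s, t) or x not in dist_s:
                continue
            # x lies on an s-t geodesic exactly when the distances add up
            if dist_s[x] + dist_t[x] == dist_s[t]:
                score[x] += sigma_s[x] * sigma_t[x] / sigma_s[t]
    return score


# Hypothetical 4-cycle: opposite corners are joined by two geodesics.
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(freeman_betweenness(cycle))
```

In the 4-cycle, each pair of opposite corners has two geodesics, so each intermediate node receives half a point from that pair.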

  25. Betweenness Flow Processes • Consistent processes • postal process • Nearly consistent • parallel processes (all routes at same time) • but ... needn’t choose between geodesics • Implication • better for modeling transportation of goods than information

  26. Calculating Betweenness Centrality How often does a token pass through a node?

  27. Eigenvector Centrality • Eigenvector of adjacency matrix • in effect, counts number of walks of all lengths emanating from node, weighted inversely by length: row sums of $\sum_k \lambda^{-k} A^k = \lambda^{-1}A^1 + \lambda^{-2}A^2 + \lambda^{-3}A^3 + \dots$ • Interpreted as popularity or being in the thick of things • Assumes flow can return to same nodes & lines
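The principal eigenvector of the adjacency matrix can be approximated by power iteration, which repeatedly replaces each node's score with the sum of its neighbors' scores. The sketch below assumes a connected, undirected network; the function name and the star graph are hypothetical. It iterates on A + I rather than A, a choice made here so that the iteration also settles on bipartite graphs; the shift leaves the eigenvectors unchanged.

```python
import math


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), normalized to unit length each round."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # own score (the identity shift) plus the neighbors' scores
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        score = {v: x / norm for v, x in new.items()}
    return score


# Hypothetical star network: one hub tied to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = eigenvector_centrality(star)
print(scores["hub"] / scores["a"])  # approaches sqrt(3) for this star
```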

  28. “Cross-Platform” Centrality • How far off are these centrality measures when used with wrong flow process? • How can we correctly measure closeness and betweenness concepts in different flow contexts? • Simulation modeling

  29. Realized Centrality • Essence of closeness is the expected time until arrival of fluenda • realized closeness is an empirical measurement of the average time until arrival • Freeman closeness is an estimator of this • model-based formula that should correspond to actual long-term values if the model fits • Betweenness is expected number of times a fluendum passes through node

  30. Simulation Procedure (for deterministic flow processes) • For each of 10,000 trials* ... • For each node, • let token originate at the node & propagate according to flow process rules until it can go no further • record which nodes are visited along way and # of units of time needed to arrive at each node for first time • Cumulate realized closeness and realized betweenness *NOTE: Parallel processes only require 1 trial -- no randomness
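The per-source procedure on this slide might look as follows for the gift process (blind transfer along trails). This is a sketch under stated assumptions rather than the author's simulation: ties are undirected, the token picks uniformly at random among unused ties, "passing through" counts only visits that are neither the start nor the end of a run, and arrival times are averaged only over runs that actually reach the node. All function names are hypothetical.

```python
import random
from collections import defaultdict


def run_gift_token(adj, source, rng):
    """Move one token along a random trail until no unused tie remains.

    Returns the nodes visited in order, starting with the source.
    """
    used = set()
    current = source
    visited = [source]
    while True:
        options = [v for v in adj[current] if frozenset((current, v)) not in used]
        if not options:
            return visited  # token can go no further
        nxt = rng.choice(options)
        used.add(frozenset((current, nxt)))
        current = nxt
        visited.append(current)


def simulate(adj, trials=1000, seed=0):
    """Cumulate realized betweenness and mean first-arrival time over all runs."""
    rng = random.Random(seed)
    pass_through = defaultdict(int)   # intermediate visits per node
    arrival_sum = defaultdict(float)  # total first-arrival time per node
    arrival_n = defaultdict(int)      # runs in which the node was reached
    for _ in range(trials):
        for source in adj:
            visited = run_gift_token(adj, source, rng)
            first_seen = {}
            for t, node in enumerate(visited):
                first_seen.setdefault(node, t)
                if 0 < t < len(visited) - 1:
                    pass_through[node] += 1
            for node, t in first_seen.items():
                if node != source:
                    arrival_sum[node] += t
                    arrival_n[node] += 1
    runs = trials * len(adj)
    betweenness = {v: pass_through[v] / runs for v in adj}
    mean_arrival = {v: arrival_sum[v] / arrival_n[v] for v in adj if arrival_n[v]}
    return betweenness, mean_arrival


# Hypothetical toy network: a line A - B - C.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
betweenness, mean_arrival = simulate(line, trials=200)
print(betweenness, mean_arrival)
```

On the line network, a token started at B reaches only one of the two ends before it runs out of unused ties, which is why the arrival average is conditioned on the token actually arriving.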

  31. Simulation Procedure (for non-deterministic processes) • For each of 10,000 trials ... • For each ordered pair of (source,target) nodes • let token originate at source node & propagate according to flow process rules until it either reaches target node or can go no further • record which nodes are visited and # of units of time needed to arrive at each node for first time • Cumulate realized closeness and realized betweenness

  32. Alternative Methods • Can use non-deterministic procedure on all processes, for comparability to Freeman betweenness • numerical results quite different • but larger conclusions are the same • But, logically, not sensible • Freeman’s dyadic method presupposes source & target • i.e., non-deterministic process

  33. Empirical Results • Compare realized closeness & betweenness with Freeman measures across different flow processes • Dataset is known ties among terrorists compiled by Valdis Krebs • Start with betweenness

  34. Betweenness in Postal Proc. (all the rest are zeros on both measures)

  35. Betweenness / Gossip Process • Sequential duplication across trails: rumors • Scores standardized to mean = 0, s.d. = 1 [Figure panels: ranks; scores]

  36. Betweenness / Gossip

  37. Betweenness in Gossip Proc. [Figure annotations:] • Under-estimated by betweenness centrality • Over-estimated by betweenness centrality • Freeman measure is zero when contacts are connected • Token rarely gets to 46, so its realized betweenness cannot be as high as the Freeman measure estimates • Data courtesy of Valdis Krebs

  38. Blind vs Guided Flows [Figure labels: Path redundancy; Individual performance; Type of flow] • Nodes embedded in dense regions are more important in blind processes than in nondeterministic processes. • It is in blind processes that we see the bottling-up phenomenon that Granovetter alludes to

  39. Betweenness in Gift Process • Physical transfer along trails: used paperback • Scores standardized to mean = 0, s.d. = 1

  40. Betweenness / Gift

  41. Closeness in Gossip Process • Sequential duplication across trails: rumors • Scores* standardized to mean = 0, s.d. = 1 • Correlation is high -- much better than betweenness corr [Figure panels: ranks; scores]

  42. Closeness / Gossip

  43. Closeness in Gossip Process [Figure: colors based on average arrival times; annotations: Under-estimated by closeness centrality; Over-estimated by closeness centrality] • In gossip process, token gets bottled up by dense regions, takes long time to escape to other groups. Hard for blind process to find way out. • Data courtesy of Valdis Krebs

  44. Closeness in Currency Process

  45. Lack of Symmetry • In many processes, avg distance to node does not equal distance from the node • even though network is symmetrical • People who can reach others in few steps are NOT the same as people who can be reached by others in few steps • Freeman closeness uncorrelated w/ former
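A small worked case shows how distance to a node can differ from distance from it in a symmetric network. The sketch below uses an unguided random walk (the closest match to the currency process as described earlier) as the flow model, which is an assumption made for illustration; the function name and the star graph are hypothetical. Expected arrival times are found by repeatedly applying the rule "one step plus the average over neighbors" until the values settle.

```python
def expected_hitting_times(adj, target, sweeps=2000):
    """Expected number of steps for a uniform random walk to first reach target.

    Assumes a connected, undirected network; solved by fixed-point iteration.
    """
    h = {v: 0.0 for v in adj}
    for _ in range(sweeps):
        for v in adj:
            if v != target:
                # one step, then continue from a uniformly chosen neighbor
                h[v] = 1.0 + sum(h[u] for u in adj[v]) / len(adj[v])
    return h


# Hypothetical star network: one hub tied to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
to_hub = expected_hitting_times(star, "hub")
to_a = expected_hitting_times(star, "a")
print(to_hub["a"])   # leaf "a" to hub: 1 step
print(to_a["hub"])   # hub to leaf "a": about 5 steps
```

Every tie in the star is symmetric, yet a token reaches the hub from a leaf in one step, while it needs about five steps on average to get from the hub to a particular leaf, because the hub's higher degree gives the token more wrong turns to take.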

  46. Asymmetry Due to Degree Variance [Figure: “Distance” matrix; axis labels: From, To]

  47. Lack of Computability • Closeness in Gift Process • Gift gets stuck in cul-de-sac, resulting in infinite time/distance • Can’t compute expected time until arrival

  48. Correlations Among Centralities

  49. MDS of Correlations Among Centrality Scores

  50. Summary • Variety of flow processes • Distinguished by a system of properties • Key properties include • blind / guided • copy / move • serial / parallel • path / trail / walk
