Workshop on Complex Systems Research Initiative: An Introduction to Agent-Based Modelling

Edmund Chattoe-Brown (ecb18@le.ac.uk), Department of Sociology, University of Leicester, UK. http://www.simian.ac.uk

Thanks This research is funded by the Economic and Social Research Council of the UK (http://www.esrc.ac.uk) as part of the National Centre for Research Methods (http://www.ncrm.ac.uk). Thanks are due to Nigel Gilbert (SIMIAN Co-Director) for the use of some training materials initially developed primarily by him. Thanks to you all for inviting me! The usual disclaimer applies.

Plan of the workshop • Mornings: Introductory lecture/discussion. • The rest: Discussion, questions. • Afternoon: Hands on, initially exploring existing models then (?) programming. • Generally: Your proposed research. http://www.simian.ac.uk

Plan for day 1 The role of research methods in shaping what we see. Examples of qualitative and quantitative research and the need for a “third way”. A very brief interlude on social versus physical science. A simple “running” example: The Schelling segregation model. (Microcosm.) What should we learn from this example? Key concepts: Emergence, non-linearity, complexity. The distinctive methodology of ABSS/MAM and data. http://www.simian.ac.uk 4

Opening Thoughts “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” (The Psychology of Science, Abraham Maslow) “When scientists and mathematicians fail to find positive clues leading towards solutions of their problems, they sometimes reverse their frontal strategies and employ reductio ad absurdum, which by a process of eliminating all the impossibles and improbables, leaves a residue of least absurd, ergo most plausible solutions, which may be reduced, by physically testing to unequivocable answers.” (Buckminster Fuller, foreword to Confessions of a Trivialist, p. ix) http://www.simian.ac.uk 5

Overall goals To introduce a novel method (MAM/ABSS) for understanding the social world using relevant examples. To distinguish it clearly from some existing methods and thus lay out a coherent research strategy arising from it. To introduce (and provide hands on experience for) a “typical” piece of software (NetLogo) for implementing that method. To offer a “vision” of the future in research of this kind. http://www.simian.ac.uk 6

What are we used to? I may well be talking to quite a diverse audience. I shall try not to assume too much. I’ll start with sociology and we can take it from there. I can’t always promise obvious relevance of examples but this isn’t just laziness! The two main methods of representing theory in sociology are narratives and equations. These are almost invariably associated with qualitative (ethnographic) and quantitative (statistical) analysis respectively. Other methods: Experiments/randomised control trials, history, analysis of artifacts/documents, monitoring … (Interesting!) http://www.simian.ac.uk 7

Example of narrative analysis “Turkish interviewees do not include themselves when they are evaluating the status of ‘Turkish women’ in general. While referring to ‘Turkish women’, most Turkish interviewees use the pronoun ‘they’: Turkish women are more home-oriented. I think that they are left in the backstage because they do not have education, because they are not given equal opportunities with men. (T3) One of the Turkish interviewees stated that it was difficult for her to answer the questions related to her status ‘as a woman’, because: I don’t think of myself as a Turkish women, but as a Turkish person. I mean I never think about what kind of role I have in the society as a woman. (T1) Most Norwegian interviewees, on the other hand, identify with ‘Norwegian women’ in general, and they refer to ‘Norwegian women’ as ‘we’: I think that in a way Norwegian women, that is we, at least have our rights on paper. We have equal rights for education and we have good welfare arrangements … (N1)” (Sümer, Acta Sociologica, 1998, 41(1), p. 122) http://www.simian.ac.uk 8

Narrative analysis pros and cons As rich as you want it to be. Crosses levels of analysis (self reports on decision making). Limited at some unknown and fuzzy barrier with psychology. Real dangers of subjectivity (should be “regulated” by the method though). The price of that richness is that incompleteness, ambiguity and inconsistency can exist within the narrative and be hard to spot. TANSTAAFL: Rich but “expensive” to collect and analyse, especially with observational data. Can it generalise? http://www.simian.ac.uk 9

Example of quantitative analysis “The most important empirical findings of this study can be summarized as follows: … there is a moderate tendency for individuals with higher service class origins to be more likely than others to enrol in PhD programmes. … The estimated effect of class drops to zero when controlling for parents’ education and employment in research or higher education. The overall implication of these findings is that the transition from graduate to doctoral studies is influenced by social origins to a considerable degree. Thus, the notion that such effects disappear at transitions at higher educational levels - due either to changes over the life course or to differential social selection - is not supported.” (Mastekaasa, Acta Sociologica, 2006, 49(4), pp. 448-449.) http://www.simian.ac.uk 10

Quantitative analysis pros and cons Can’t be too rich to “solve” or “fit”. Mostly completely explicit (though some methodological background may be tacit, i.e. assumptions about distributions of data) thus avoiding ambiguity, incompleteness and inconsistency. Can it particularise? Hits data collection and analysis problem of “atomisation”: “50 cases per variable” rule of thumb in simple regression.

Aside: No theory, no data, no logo Example: Educational success. Girls and boys go through a school system, get grades/qualifications and reach different levels. They may start biologically different, be socialised differently, form different peer groups, be selected differently into schools or subjects, be treated differently by teachers, develop different interests and motivations, be offered different resources, choose differently and so on. All these processes unfold in parallel, in diverse combinations for diverse individuals. http://www.simian.ac.uk 12

What does that mean for methods? If individuals are unique, we can all give up (but there are reasons not to be so pessimistic). We often disagree (fruitlessly?) on where social regularities lie: Attributes versus practices. Clearly gender is associated with educational success through all these processes but the notion of causality is much harder to apply. Why would there be “big” patterns to find? Ethnography can subject tiny parts of sequences to detailed examination (and practices should generalise) but cannot look at the whole. http://www.simian.ac.uk 13

Stepping back: Levels of description A micro level, where individual action occurs in an “environment”. A macro level (environment), which shapes and is shaped by the micro level. The eminent sociologist James S. Coleman argues that in order to explain properly, a theory must link one level by a process description to another. (Mechanism/middle range sociology.) There are grounds for arguing that, although they may appear (or claim) to, neither statistical nor ethnographic accounts actually do this. http://www.simian.ac.uk 14

Aside: Physical and social systems Physical systems cannot give accounts of themselves nor respond adaptively to their “environment”. They “follow” the same “laws of nature” that we try to deduce from them. (Atoms in gas.) Regularities in social systems cannot be of this kind because of reflection and adaptation. The unique (but fuzzy edged) domain of social action arises from the almost unique ability of humans to make rich models of their world (including social science models). Marx? http://www.simian.ac.uk 15

Cashing this out: Segregation model Agents live on a square grid so each has a maximum of 8 neighbours. There are two “types” of agents (red and green) and some grid spaces are vacant. Initially, agents and vacancies are distributed randomly. All agents decide what to do in the same very simple way. Each agent has a preferred proportion (PP) of neighbours of its own kind (a PP of 0.5 means you want at least half your neighbours to be your own kind, but you would accept all of them, i.e. PP is a minimum). If an agent is in a position that satisfies its PP then it does nothing; otherwise it moves to a vacancy chosen at random. A time period is defined (arbitrarily) as the time it takes for each agent (chosen in random order to avoid non-robust patterns) to “take a turn at” deciding and possibly moving.
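
The rules on this slide can be sketched in a few lines of code. What follows is a minimal Python sketch, not the NetLogo model used in the workshop. Its function names, the non-wrapping grid edges, the choice to measure PP over occupied neighbours only, and the rule that an agent with no neighbours is content are all assumptions made for illustration.

```python
import random

VACANT = None  # marker for an empty grid space


def make_world(size, vacancy_rate, rng):
    """Create the world: a size x size grid of 'red', 'green' or vacant cells."""
    return [[VACANT if rng.random() < vacancy_rate else rng.choice(["red", "green"])
             for _ in range(size)] for _ in range(size)]


def neighbours(world, r, c):
    """Contents of the (up to) 8 surrounding cells; assumption: edges do not wrap."""
    size = len(world)
    return [world[r + dr][c + dc]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < size and 0 <= c + dc < size]


def satisfied(world, r, c, pp):
    """True if at least a proportion pp of occupied neighbours share the agent's type."""
    occupied = [n for n in neighbours(world, r, c) if n is not VACANT]
    if not occupied:
        return True  # assumption: an agent with no neighbours is content
    same = sum(1 for n in occupied if n == world[r][c])
    return same / len(occupied) >= pp


def step(world, pp, rng):
    """One time period: each agent, in random order, decides and possibly moves."""
    size = len(world)
    agents = [(r, c) for r in range(size) for c in range(size)
              if world[r][c] is not VACANT]
    rng.shuffle(agents)
    moved = 0
    for r, c in agents:
        if satisfied(world, r, c, pp):
            continue
        vacancies = [(i, j) for i in range(size) for j in range(size)
                     if world[i][j] is VACANT]
        if not vacancies:
            continue
        i, j = rng.choice(vacancies)  # move to a vacancy chosen at random
        world[i][j], world[r][c] = world[r][c], VACANT
        moved += 1
    return moved


rng = random.Random(42)  # fixed seed so the run is repeatable
world = make_world(size=20, vacancy_rate=0.2, rng=rng)
for period in range(50):
    if step(world, pp=0.3, rng=rng) == 0:
        break  # nobody moved, so every agent is satisfied
```

Changing `pp` in the final loop is the quickest way to explore how the preferred proportion affects whether the system settles.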

Marker I’m going to show you exactly how the computer does this before too long. In a nutshell, the description amounts to: Create the world. Do some things to each agent and repeat. http://www.simian.ac.uk 17

Initial random state http://www.simian.ac.uk 18

Clustering Aside: This is a NetLogo “world window”. http://www.simian.ac.uk 19

Two questions What is the smallest PP (i.e. a number between 0 and 1) that will produce clusters? What happens when the PP is 1?

Answers About 0.3. No clusters form. Revisit 1: Had you “seen” the cluster data generated by PP=0.3, might you (if of a particular political or sociological persuasion) have attributed xenophobia to the system? Reflection: Is PP=0.1 behaviourally indistinguishable in cross section from PP=1? Problem? http://www.simian.ac.uk 21

Why and so what? Because PP is a minimum, people are always happy “inside” a cluster of their own kind. If a cluster is “full” (no internal vacancies) then it cannot be disrupted. Whether clusters form depends on whether their shape is compatible with the PP for each “edge agent”. (No “sharp corners” possible: Minimum size?) When PP is 1, no shape of the cluster edge is compatible with the satisfaction of edge agents so the cluster cannot form. An aggregate entity (the cluster) thus becomes a structuring principle for individuals. http://www.simian.ac.uk 22
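
The argument about edge agents can be checked with simple arithmetic. The sketch below assumes a full cluster that directly borders the other type, so that all eight neighbouring cells are occupied; vacancy buffers would change the counts. The position names and the helper function are illustrative only.

```python
from fractions import Fraction

# Share of own-kind neighbours (out of 8) for an agent in a full cluster that
# directly borders the other type. Assumption: no vacancies next to the agent.
POSITIONS = {
    "interior": Fraction(8, 8),       # surrounded by its own kind
    "straight edge": Fraction(5, 8),  # three neighbours lie across the edge
    "convex corner": Fraction(3, 8),  # five neighbours lie outside the cluster
}


def content_positions(pp):
    """Names of the positions whose own-kind share is at least pp."""
    return [name for name, share in POSITIONS.items() if share >= pp]


for pp in (0.3, 0.5, 1):
    print(pp, content_positions(pp))
```

Under these assumptions a PP of 0.5 rules out sharp corners, and a PP of 1 leaves only interior agents content, so no cluster edge can be stable.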

Simple individuals/complex system Counter-intuitive macro (social) results from simple micro interactions. A non-linear (and complex) system. http://www.simian.ac.uk 23

A vision: To be revisited/expanded Simulation is a “macroscope” (or “complexoscope”) because it allows us to “see” complexity in a way that is similar to the way that a microscope allows us to see very small things. The explicit process specification (that should mirror real social processes) shows us why existing methods have difficulty linking micro and macro levels. The “process” in a statistical model is just the equation system linking variables. In qualitative research there may be no such process. (The reasons why are interesting and puzzling.) http://www.simian.ac.uk 24

Connection 1: Data and methods This is a patently unrealistic model: Identical decisions, random movement, no housing market, no schools or jobs to attend to. (I chose it deliberately!) How, broadly, would it be made more realistic? Using qualitative methods to study neighbourhoods, perceptions and decision processes. Using quantitative methods to compare (in some sense) the simulated clusters with some real ones. Does this look anything like residential patterns by ethnicity in Toronto? How like? (I’ll return to this.) Existing research methods are used in ways that are clearly different but certainly not unrecognisable. http://www.simian.ac.uk 25

Connection 2: Explanation It is the simulation that links the interplay of situated micro processes (choosing agents with neighbours) with macroscopic patterns (clusters). A social theory is thus neither represented as a narrative or set of equations but as a computer programme. (Coleman is happy!) The rigour of quantitative research is retained (complete specification) but the behaviour only needs to be “generated” not “solved” or “fitted” so can be of arbitrary sophistication. (I’ll show this.) If we can “generate” something then we have explained it. (Methodology hazard!) http://www.simian.ac.uk 26

Connection 3: Complexity concepts Complexity: “Rich” patterns (here, non-linearity for example) do not need to come from “rich” agents or “rich” interactions. They can arise from simple interactions between simple agents. World view? Emergence: The need to use categories at one level of description that do not make sense at another. (You cannot have a one agent cluster or a one car traffic jam.) Non-linearity: We cannot assume things we often do assume (large effects imply large causes, similar effects have “close” causes). http://www.simian.ac.uk 27

Informal thoughts on methodology These will be made more rigorous later. Generally, don’t use MAM/ABSS to “explain” a straight line. The idea of “over fitting” (and Occam’s Razor) applies but we can’t formalise it as we can in statistics. We need to worry about “how many” simulations can match a given real system. This is our “leap of faith”. We discover this by general experience (clustering, Power Law, S-shaped innovation curve) and address it by “bar raising” and choice of research question/model. Some of these issues arise not from weaknesses in the methodology itself but from the fact it is still being established. (Equating poor methodology and poor practice is a defence mechanism against novelty.) http://www.simian.ac.uk 28

Similarity in the Schelling model A two (three?) state system. Hollow versus full clusters, direct red/green interfaces versus vacancy buffers. (Vacancy chains idea.) Exact match to Toronto? Cluster sizes of correct distribution (but no location stability across runs?) Cluster “shapes” correct? “There are clusters”: Actually pretty weak. Now consider 3 types: Separated versus concentric clusters. The latter is much more discriminating. Or, what is internal structure of clusters with regard to PP? (Most tolerant at edges?) Naïve (but useful?) notion: Ratio of possible world states to states compatible with your theory as measure of “power”. http://www.simian.ac.uk 29

Richness in the Schelling model Emphasis (so far) on spatial pattern. What about “biography” or “history” of agents? What are effects of in and out migration to produce a dynamic rather than static equilibrium? (Convergence as an “artefact” or a finding?) What are the distributions of any heterogeneous parameters (PP for example) with respect to clusters? Very loose idea: Can we “fit” on some comparisons of real and simulated data and then “explain” on others? (Hazard warning: We don’t know how orthogonal different “aspects” - like biographies and clusters - are.) http://www.simian.ac.uk 30

A speculation Some research methods must be “radical innovations” (rather than just “more of the same”). If MAM/ABSS is such an RI, what follows? Humility needed! Possible evidence of MAM/ABSS as an RI is its ability/requirement to reuse existing data and draw attention to novel data previously ignored. But this casts doubt on the “origins” of the Schelling model: “If I were you, I wouldn’t start from here at all”. http://www.simian.ac.uk 31

How to start with MAM/ABSS? Think of it just like research design. What (one sentence?) are we trying to do/explain? How does phosphorus “move” around Lake Simcoe? How best can we make it do something different? Why did we pick this method? (In some sense reasons already given but need to defend against claims of existing methods, i.e. don’t “explain a straight line”.) What is known? (TANSTAAFL again: Not just in all the relevant domains but in MAM/ABSS too!)

Why do all this work? Read a few articles at random. Make a set of weakly grounded assumptions. Build a model: Throw in a few more invisible assumptions so it can’t be replicated. Play with the model, get “results” and publish in an enclave simulation journal. Defend your arbitrary assumptions to the death against others with equally arbitrary ones. Avoid collecting data to decide. Be ignored by domain experts. Ignore them. Wait for MAM/ABSS to become a footnote in social science history. http://www.simian.ac.uk 33

Plan for day 2 Going from informal methodology to formal. How to turn these general guidelines into a plan for a research project. What relevant parts of NetLogo do we need to know about and (broadly) how do they work? http://www.simian.ac.uk 34

Developing the vision 1 MAM/ABSS is new which offers huge opportunities for innovation and originality. I can “offer you” whole social science disciplines with barely any models. The price we pay is that we cannot yet fall back on a widely agreed “normal science”. We have to raise our own standards “from within” without wrecking the community. We have to “try harder” to convince the rest of the world. We have to manage the “us or them” boundary especially carefully. You may decide (quite reasonably) that you want to “come back later”. http://www.simian.ac.uk 35

Developing the vision 2 It can be done: We are looking for “win win” ideas. Social networks have an enormous number of potential characterisations courtesy of existing Social Network Analysis. If you can “generate” simulated networks that look like real networks according to many of these (potentially orthogonal) characterisations, you are really on to something. First win: Tools (Which n measures of social networks make the most effective index for similarity?) Second win: Perspective (What can we learn if we treat existing SNA data as a sample rather than a bunch of “ethnographically unique” case studies?) http://www.simian.ac.uk 36

Formalising the methodology 1 The “Gilbert and Troitzsch Box” http://www.simian.ac.uk 37

Formalising the methodology 2 Choice of target: Clear research question. Avoid TOE: Theory of Everything. (Geographer example from Borges.) Choice of target: Research question, theory (or theories), process in unknown environment, model. Process of abstraction: Start from key “stylised facts” in domain. (Class example. Citation test.) Process of abstraction: Not all abstractions are equally “harmful”. (“The assumptions you don’t realise you are making are the ones that will do you in”. Compare existing methods? More later.) http://www.simian.ac.uk 38

Formalising the methodology 3 Similarity: Already raised. How high can you go? Transparency and replication? Do it yourself like Darwin? (Commenting code.) Identification of novel data requirements may reintroduce the really strong falsification test that is so appealing in physical science (Einstein and Mercury perihelion, position of Pluto conditional on theory of gravitation being true). Can’t do this with statistics because model fitting requires all data “up front”. More on methodology later as needed. http://www.simian.ac.uk 39

MAM/ABSS abstraction example A return to the Turkish/Norwegian women. Is it significant that I sometimes say “we” and sometimes “they?” Perhaps groups behave in certain ways and I either wish to behave in that way or in some other. Suppose there are a number of “roles” that prescribe actions in different social settings. I may choose a role (self interest?) but will I be accepted in it? (White rastafarians.) It depends how I behave. Maybe the role I am “put in” most often shapes how I see the world and how it satisfies me. (Role strain? Roles are two sided.) A dynamic between behaviours, roles and interests? How does it unfold? Do roles mutate? Reality check: what do we need to know here? Very broadly, how people behave, how they think they ought to behave and how they feel about it. “Killer” app? Women and work? http://www.simian.ac.uk 40

Back to no theory, no data, no logo Phosphorus and Lake Simcoe: A blanket apology in advance. Surprising how often we are “following stuff” around a system whether the stuff is dirty syringe needles, phosphorus or gazelles. Goal is water phosphorus levels not much above the natural “carrying capacity”, a huge reduction over a short period. “Instant attention” of policy makers? What else are people doing with this? (NHS epidemics example.) Where does the phosphorus come from, how does it move or “stick” and what removes it from the area of study? Set of “phosphorus actors” and “P actions”: Fertilisation by golf courses and farmers, waste water run off from residential areas, sewage works, manufacturing, other. Levels on rivers and open water are the key measurement points. Abstract by not modelling dog walkers … Padding? Exogenous processes: Air pollution from other regions, outflow from the study region, natural “leaching” from some patches perhaps? An “accounting” approach based on overview of existing knowledge?

NOTNODNOL 2 Physical processes: Can phosphorus be absorbed or naturally converted at some locations up to some level? How does it behave in ponds and lakes? Does it “coat” patches? (Relatively easy to “split” raindrops?) What “can” we do? “PhosLok?”, taxes and subsidies, dredging/scrubbing, prohibitions and enforcement, “giving up” on some rivers and making them “sewers”, relocation, drains and changes to wastewater management. Out of the box thinking (Perhaps motivated by the simulation itself: What if we could move this lake?) and collecting the union of suggestions from stakeholders and feeding them back iteratively. Back to Buckminster Fuller: Are there solutions that appear to cost impossibly much or are simply unacceptable to all but one stakeholder? Do some proposed solutions simply appear not to work? Interesting question: How much does it matter if the model is “wrong” in comparing the relative costs of different strategies? How “social” a model is this? Do we need to model how the local community forms advice networks or shops for groceries or just how they allocate crops to fields and decide when to feed their lawns? (This is why having a clear research question matters: Does this move phosphorus?)

Getting started We now have a pretty good NOTNODNOL blueprint for our literature review (and phosphorus is a pretty good search term!). We also have some notion of what kind of team “leaders” we might need (hydrologist/chemist, some sort of social scientist/community studies person and modeller). Models as common language. We are looking for physical models, problem regions, management strategies, relevant social science on behaviour change in particular groups. (Don’t close in too soon though: What other water run off product problems are there?)

The world Patches and attributes: Altitude, water held, surface water on patch to flow away, even cloud saturation above? (Don’t dismiss “kludges”.) Rules of “transfer”: Water downhill, surface water by patch permeability, surface water by patch surface, clouds by (exogenous?) wind direction. I don’t know how much of this is “known” or how existing models “translate” to this level of abstraction. I know some atmosphere models do do this! Some aspects (water flow) are likely to be good approximations (pooling) at low cost.
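
The “water downhill” transfer rule can be sketched very cheaply. The Python below is an illustration under stated assumptions, not a hydrological model: each patch sends all of its surface water to its lowest strictly lower neighbour in one step, and water on a patch with no lower neighbour stays put (pooling). The function name and the grids are hypothetical.

```python
def flow_downhill(elevation, water):
    """Move each patch's surface water to its lowest strictly lower neighbour.

    Water on a patch with no lower neighbour stays where it is (pooling).
    Assumption: all surface water moves in a single step.
    """
    rows, cols = len(elevation), len(elevation[0])
    new_water = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = (r, c)  # default: water stays on this patch
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    i, j = r + dr, c + dc
                    if (0 <= i < rows and 0 <= j < cols
                            and elevation[i][j] < elevation[best[0]][best[1]]):
                        best = (i, j)
            new_water[best[0]][best[1]] += water[r][c]
    return new_water


# A one-row slope: water runs from the high patch towards the low one.
slope = [[3, 2, 1]]
rain = [[1.0, 1.0, 1.0]]
after_one_step = flow_downhill(slope, rain)
```

Permeability, patch surface and wind-driven clouds would each be a further rule of the same shape: read patch attributes, then move some quantity between patches.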

Relevant NetLogo Earth Sciences (Grand Canyon).
file-open "realplacedata.txt" ; this file is just a list of altitudes extracted from other data (not an NL issue)
let patch-elevations file-read
file-close
Note: The patch-elevations variable comes directly from the creation of the elevation variable in patches-own. NL does the mapping for you. See also how this programme makes buttons for “tracking” raindrops work.
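
For readers who want to prepare such a file outside NetLogo, here is a hedged Python sketch of turning a flat list of altitudes into a grid. The file layout (one bracketed, whitespace-separated list) and the grid width are assumptions; the slide does not specify either.

```python
def parse_elevations(text, width):
    """Turn a flat, whitespace-separated list of altitudes into rows of `width` values.

    Assumption: the text is a single list such as "[10 20 30 ...]".
    """
    tokens = text.replace("[", " ").replace("]", " ").split()
    values = [float(token) for token in tokens]
    if len(values) % width != 0:
        raise ValueError("number of altitudes is not a multiple of the grid width")
    return [values[i:i + width] for i in range(0, len(values), width)]


# Hypothetical file contents for a 3-patch-wide world.
grid = parse_elevations("[10 20 30 40 50 60]", width=3)
```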

Aside Back to the “us and them problem”. Except with policy makers/funders “in charge” (who have to be handled with tact), it is not enough to say that a phenomenon exists to require a model redesign. This is “death by detail”. There must be data and reasonable grounds (perhaps from other studies) for thinking that the effect “matters”. Clear research designs are also defensible. This is far from trivial. (Example of SNA and large scale survey data.) http://www.simian.ac.uk 46

What about “brains?” Schelling agents had decision processes based on observation but they didn’t have “memories” or “practices” to draw on in alternative situations. Mostly, agent brains are represented as sets of “if then” rules, partly for interpretability and partly for data access. (Other possibilities exist if needed like “learning systems”.) Like most programming languages NL has “data structures”. For example, lists representing the x, y co-ordinates of my “required” daily activities. Example: Social Science (El Farol). http://www.simian.ac.uk 47

Doing things to lists
set foo (list (random 10) (random 10) 7 2)
set foo (list (list 0 0 0) (list 1 1 1))
set foo but-first foo ; also but-last: past behaviours being forgotten
if empty? foo [ do-thing ]
set foo filter [? < 3] [1 2 4 5 6 8 2] ; in NetLogo 6 and later: filter [x -> x < 3] [1 2 4 5 6 8 2]
set foo fput 2 [3 4 5]
set bar (item 2 [2 3 4 5]) ; note: item numbering starts from 0
set foo (replace-item 2 [2 3 4] 15)
Look at NetLogo Dictionary in Help.
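
For readers more familiar with general-purpose languages, the list operations on this slide have close Python analogues. This is a rough correspondence for orientation, not a statement about how NetLogo implements them. The variable names and the `replace_item` helper are made up for the example.

```python
import random

foo = [random.randrange(10), random.randrange(10), 7, 2]  # (list (random 10) (random 10) 7 2)
nested = [[0, 0, 0], [1, 1, 1]]                           # a list of lists
without_first = foo[1:]                                   # but-first
without_last = foo[:-1]                                   # but-last
is_empty = not foo                                        # empty?
small = [x for x in [1, 2, 4, 5, 6, 8, 2] if x < 3]       # filter
prepended = [2] + [3, 4, 5]                               # fput
third = [2, 3, 4, 5][2]                                   # item 2 (counting from 0)


def replace_item(index, items, value):
    """Return a copy of items with position index replaced, like replace-item."""
    return items[:index] + [value] + items[index + 1:]


replaced = replace_item(2, [2, 3, 4], 15)
```

As in NetLogo, none of these operations changes the original list; each produces a new one.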

Aside Strings look rather like lists but can mix words, numbers and punctuation. A nifty trick (like LISP) is to use run (or runresult for reporters) to “execute” strings as NL code; read-from-string only parses literal values such as numbers and lists. So, for example, suppose you want an agent to act by if … then … rules. If you put these in procedures they are “hard coded” for each run. If what you actually want is for agents to be able to change their set of practices (borrowing from others or deleting failed rules), then store the rules as strings (probably a list of strings) and execute them one at a time in each situation.
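
The idea of keeping an agent's rules as text, so that they can be borrowed or deleted during a run, can be sketched as follows. This is a Python illustration under assumptions, not NetLogo: the rule format, the variable names `rain_forecast` and `soil_p`, and the action labels are all hypothetical. Python's `eval` stands in for executing a string as code and should never be used on untrusted text.

```python
def act(rules, situation):
    """Return the action of the first rule whose condition holds in this situation.

    Each rule is a (condition, action) pair of strings. The condition is
    evaluated as code against the situation's variables.
    """
    for condition, action in rules:
        # Builtins are disabled so a condition can only read the situation.
        if eval(condition, {"__builtins__": {}}, dict(situation)):
            return action
    return "do-nothing"


# Hypothetical practices for a farmer agent.
farmer_rules = [
    ("rain_forecast", "delay-fertiliser"),
    ("soil_p < 20", "apply-fertiliser"),
]

# Borrowing a rule from another agent is just list manipulation.
neighbour_rules = [("soil_p >= 40", "skip-fertiliser")]
farmer_rules = farmer_rules + neighbour_rules

# Deleting a failed rule is equally simple.
farmer_rules = [rule for rule in farmer_rules if rule[1] != "delay-fertiliser"]

choice = act(farmer_rules, {"rain_forecast": True, "soil_p": 50})
```

Because the rules are data, the same `act` function serves every agent, however its set of practices changes over the run.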

Communication Once agents have “brains”, communication and imitation fall out very naturally. Examples: Reputation in the Prisoner’s Dilemma, the Gilbert and Troitzsch “shopping agents”. Warning! Don’t let your model develop feature creep. This is not a model of how we diffuse better practices in communities. We only want to know what happens if we change the distribution of behaviours. Is a farmer just a “ghostly presence” floating over a farm? http://www.simian.ac.uk 50
