The 10th HKBU-CSD Postgraduate Research Symposium Social Knowledge Dynamics: A Case Study on Modeling Wikipedia Presenter: Benyun Shi Supervisor: Prof. Jiming Liu Department of Computer Science Hong Kong Baptist University September, 2009
Outline • Wikipedia and Social Knowledge Dynamics • Previous Work on Wikipedia • Degree distribution • Reciprocity and feedback loops • Motifs • Modeling Wikipedia’s Growth • A model about reference • A model about degree distribution • AOC-based Models • Conclusion
Wikipedia • Anyone can create, edit, and delete articles; • Some properties: • Each article can be treated as the collective “knowledge” of a group of users; • Users can exchange “knowledge” through “talk” pages; • Users with similar “knowledge” may form communities; • The underlying structure of an article may in turn influence its users' “knowledge”;
Social knowledge dynamics Social dynamics: culture dynamics, language dynamics, crowd behaviors, …, social knowledge dynamics “Knowledge is embodied in people gathered in communities and networks. The road to knowledge is via people, conversations, connections and relationships. Knowledge surfaces through dialog, all knowledge is socially mediated and access to knowledge is by connecting to people that know or know who to contact.” -- Denham Grey • Social dynamics: • A society of individuals reacts to inner and/or outer changes; • Global patterns can emerge even from simple individuals; • Phase transitions, catastrophes, etc.
Difficulties and Motivations • Two levels of difficulty in explaining global emergence with local dynamic models: • Defining sensible and realistic microscopic models; (complete data is needed) • The usual problem of inferring macroscopic phenomena from the microscopic dynamic models; • Motivations for studying Wikipedia: • The formation of Wikipedia is a kind of social knowledge dynamics; (if articles are treated as knowledge) • Complete data is available for download: • Articles, categories, images and multimedia, talk pages, redirects and broken links, and so on.
Related Analysis on Wikipedia • Wikipedia is treated as a complex network in which articles are the nodes and hyperlinks are the links. • Degree distribution • Reciprocity and feedback loops • Motifs
Degree distribution • Degree: the number of articles that link into an article (in-degree) or that it links out to (out-degree) • Meanings of degree: • Two articles sharing a link are related in some way in terms of their contents; • Articles with high degree are more likely to be common knowledge;
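As a minimal sketch of the degree definition on this slide, the snippet below counts in-degree and out-degree from a list of (source, target) hyperlinks. The function name `degree_counts` and the toy article names are hypothetical and used only for illustration; they are not taken from the talk or from any Wikipedia data set.

```python
from collections import Counter


def degree_counts(links):
    """Return (in_degree, out_degree) counters for directed (source, target) links."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in links:
        out_deg[src] += 1  # src links out to one more article
        in_deg[dst] += 1   # dst is linked from one more article
    return in_deg, out_deg


# Hypothetical toy hyperlink list (article names are illustrative only).
links = [
    ("Physics", "Mathematics"),
    ("Chemistry", "Mathematics"),
    ("Mathematics", "Logic"),
    ("Physics", "Chemistry"),
]
in_deg, out_deg = degree_counts(links)
print(in_deg["Mathematics"], out_deg["Physics"])
```

Tallying how many articles have each degree value (for example with `Counter(in_deg.values())`) gives the empirical degree distribution discussed on the following slides.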
Observations: Scale-free Figure: the out-degree distribution of the Japanese Wikipedia (adapted from Fig. 3 in the reference). Figure: the in-degree distribution of the Japanese Wikipedia (adapted from Fig. 3 in the reference). Reference: V. Zlatic, M. Bozicevic, H. Stefancic, and M. Domazet, “Wikipedias: Collaborative Web-based Encyclopedias as Complex Networks”, Physical Review E 74, 016615, 2006.
Scale-free and Phase Transition “The theory of phase transitions told us loud and clear that the road from disorder to order is maintained by the powerful forces of self-organization and is paved by power laws. It told us that power laws are the patent signatures of self-organization in complex systems….” --Barabasi AL. 2002. Linked: The new science of networks. Cambridge: Perseus Publishing. Similar results are observed for Wikipedias in other languages. What is the fundamental principle behind this common type of growth? Preferential attachment?
Reciprocity and Feedback Loops • Reciprocal links: a link pointing from node i to node j is reciprocal if there also exists a link pointing from node j to node i. Reciprocity quantifies mutual “exchange” between two articles. • Feedback loops: a loop of directed links that starts from and ends at the same node. The density of the links
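The two definitions on this slide can be sketched in a few lines of Python. Two assumptions are made here that are not in the talk: reciprocity is measured as the plain fraction of links that have a link in the opposite direction (the slide's own density-based measure is not recoverable from the text), and feedback loops are restricted to directed loops of length three. The function names are hypothetical.

```python
def reciprocity(links):
    """Fraction of directed links (i, j) for which the link (j, i) also exists."""
    edges = {(i, j) for i, j in links if i != j}  # drop self-links and duplicates
    if not edges:
        return 0.0
    mutual = sum(1 for (i, j) in edges if (j, i) in edges)
    return mutual / len(edges)


def count_3_loops(links):
    """Count directed feedback loops of length three: i -> j -> k -> i."""
    edges = {(i, j) for i, j in links if i != j}
    successors = {}
    for i, j in edges:
        successors.setdefault(i, set()).add(j)
    hits = 0
    for i, j in edges:
        for k in successors.get(j, ()):
            if k != i and (k, i) in edges:
                hits += 1
    # Every loop is found once from each of its three links.
    return hits // 3


# Hypothetical toy network: A and B link to each other, and A -> B -> C -> A is a loop.
links = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]
print(reciprocity(links), count_3_loops(links))
```

Longer feedback loops can be counted with a general cycle-enumeration routine, at a cost that grows quickly with loop length.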
Feedback Loops in Ecological Systems An ecological study observed that the number of feedback loops in the species network is correlated with system lifetime. Figure panels: state before crash; normal state. Reference: R. Mehrotra, V. Soni, and S. Jain. Diversity sustains an evolving network. Journal of the Royal Society Interface, 6(38):793–799, 2009.
Motifs (figure labels: feedback loops; triadic subgraphs) • Motifs are small subgraphs of networks, which are used to systematically study similarity in the local structure of networks. Questions: Do Wikipedias in different languages share the same functions? Is the formation of social knowledge driven by the same fundamental function? Reference: R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303(5663):1538–1542, 2004.
Modeling Reference Growth At each time step t, a number of entries and r_t references are added; the references are distributed among all entries following a probability Figure: frequency distribution of the expected and actual number of references added each month to each article (adapted from Fig. 3b in the reference). The expected number of references added to entry i at time t is Reference: D. Spinellis and P. Louridas. The collaborative organization of knowledge. Communications of the ACM, 51(8):68–73, 2008.
Modeling Degree Distribution • The model consists of two steps: • A new node t attaches to the network with m outgoing links. The probability that a given link attaches to some node s is proportional to the in-degree k_i(s) of node s. • For every new link, with probability r a reciprocal link is formed from node s back to node t. Figure: comparison of in-degree distributions. Chosen parameters are t = 94094, m = 16.75, r = 0.18 (adapted from the reference). Reference: Vinko et al. Model of Wikipedia growth based on information exchange via reciprocal arcs. Physics and Society, 2009.
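The two-step model on this slide can be simulated roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: m is taken as an integer (the slide's fitted value 16.75 is an average), attachment weights are in-degree plus one so that nodes with no incoming links can still be chosen, the network starts from a single node, and a new node never links twice to the same target. The function name `grow_network` and its parameters are hypothetical.

```python
import random


def grow_network(n_nodes, m, r, seed=0):
    """Grow a directed network by in-degree preferential attachment
    with reciprocal links; returns (links, in_degree)."""
    rng = random.Random(seed)
    in_deg = [0] * n_nodes
    links = []
    for t in range(1, n_nodes):  # node 0 is the assumed seed network
        # Assumption: weight = in-degree + 1, so zero in-degree nodes are reachable.
        weights = [in_deg[s] + 1 for s in range(t)]
        wanted = min(m, t)  # cannot link to more nodes than exist
        targets = set()
        while len(targets) < wanted:
            targets.add(rng.choices(range(t), weights=weights)[0])
        for s in sorted(targets):
            links.append((t, s))  # step 1: new node t links to s
            in_deg[s] += 1
            if rng.random() < r:  # step 2: reciprocal link s -> t
                links.append((s, t))
                in_deg[t] += 1
    return links, in_deg


links, in_deg = grow_network(n_nodes=200, m=3, r=0.18, seed=1)
print(len(links), max(in_deg))
```

Plotting the histogram of `in_deg` on log-log axes for a larger `n_nodes` is one way to compare the simulated in-degree distribution with the measured one; the sketch recomputes the weights at every step, so it is only practical for small networks.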
Insufficiency (1) • The above two models seem to reflect preferential attachment as the principle behind scale-free phenomena. • However, other researchers have shown that selective removal can also produce a scale-free distribution. • Models for scale-free networks can be divided into two groups: • Scale-free as the result of an optimization or phase transition process; • Scale-free as the result of a growth model, such as preferential attachment. Reference: M. Salathé, Robert M May, and S. Bonhoeffer, “The Evolution of Network Topology by Selective Removal”, Journal of Royal Society, Interface, 2(5): 533–536, 2005.
Insufficiency (2) • The above two models are based on simple stochastic processes. • However, the real Wikipedia is driven by social dynamics, including user-user, user-group, and group-group interactions, rather than by simple stochastic processes.
AOC-based Models • Components of Autonomy-Oriented Computing (AOC): • Entities; • Interactions; • Behavioral rules; • Self-organization; • Collective regulation; • Aggregations; • In Wikipedia: Wikipedia users; interactions on a page; behaviors; self-organized groups; feedback; relationships • AOC is used to solve large-scale, dynamically evolving, and/or highly distributed computational problems.
Questions • What are the fundamental behavioral rules (e.g., explicit/implicit optimization objectives) of entities that form the global patterns of Wikipedia? • How do entities self-organize during the evolution of Wikipedia? • Do these rules and this self-organization reflect the formation rules of social knowledge and social organization?
Three Possible Directions-1 • Wikipedia as a system • As a collaborative system based solely on users’ spontaneous actions, what drives its birth, boom, and death? • Existing results on ecosystems: • Large randomly assembled ecosystems tend to be less stable as they increase in complexity; • The complexity is measured by the connectance and the average interaction strength between species; • The typical lifetime of the system increases with the diversity of its components.
Three Possible Directions-2 • Topic evolution on Wikipedia • We can treat topic evolution on Wikipedia as a result of user-to-user interactions, or even of interactions among groups of users. (Like cultural dynamics) • Existing work: • Static data mining; (Time windows for dynamic data mining) • Semantic/content analysis; (What is the driving force?)
Three Possible Directions-3 • User community dynamics on Wikipedia • Each user may be associated with multiple articles; • For each article, there will be multiple users acting on it; • Communities may emerge from entities' local interactions, which may change over time; • Existing work: • Modularity; • The linkage-based measurement cannot reflect multiple relationships
Three Levels of Consideration • Describing the structure • Such as food webs in ecosystems, neural networks in organisms, etc. • How the structure influences what happens in the system • For example, the food-web structure affects the population dynamics of species; • How the structure changes over time • For example, species going extinct will change the food-web structure
Conclusion • The relation of Wikipedia and social knowledge; (Motivations) • The current studies on Wikipedia and their insufficiency; • The possibility of adopting AOC-based modeling; • Three research directions;