Analyzing the Social Web

Analyzing the Social Web: An Introduction

Outline • Introduction • Network Structure and Measures • Social Information Filtering

1. Introduction • Social media has become the dominant way of using the Internet, and it has infiltrated and changed the way millions of people interact and communicate. • Social networking in particular has become extremely popular, with more than 1 billion users on Facebook alone and billions more accounts across thousands of social networking sites online.

Understanding social networks, both those explicitly formed on social networking websites and those implicitly formed in many other types of social media, has taken on new importance in light of this popularity. • Analysis of these social connections and interactions can help us understand who the important people are in a network, what roles a person plays, what subgroups of users are highly interconnected, how things like diseases or rumors will spread through a network, and how users participate.

Applications of these analyses • Organizations can prevent or control the spread of disease outbreaks. • Websites can support participation and contributions from many types of users. • Businesses can provide immediate assistance to customers who have problems or complaints. • Users can band together to better understand their communities and government or take collective action. • Content providers online can filter and sort information to show users the most relevant, interesting, and trusted content.

The methods for analyzing social networks have been around for decades or longer, but social media provides new challenges and opportunities. • Networks online are orders of magnitude larger than the networks analyzed in the past. • Often, networks are simply too big to be analyzed in their entirety. • A good social network analyst working with social media needs to know how to analyze the structure of networks, apply sociological principles to understand user behavior, and deal with the size, scope, and application of networks.

Analyzing the social web • Classic social network analysis studies a network’s structure. • In a social network, a person is considered a ‘node’ or ‘vertex’, and a relationship between people is a ‘link’ or ‘edge’. • When all the people and relationships are identified, there are many statistics that can provide insight into the network. • However, even before learning those statistics or anything about social network analysis, you can probably identify some important and interesting things in a network.

Tie Strength and Trust • Tie strength, which is the strength of the relationship between two people, and trust are two relationship features that have great impact on what happens in a social network. • Furthermore, learning what role a person plays in a network by analyzing his or her behavior can link quantitative measures with qualitative analysis to help better understand what goes on in a social group.

With those analysis methods at hand, the next step is to use them to understand network phenomena. • One of the most important of these phenomena is propagation: How do things like information, diseases, or rumors spread in a network? • A combination of quantitative and qualitative features informs our understanding of propagation, and another set of analysis techniques is available to study the spread of things through networks.

Nodes, Edges, and Network Measures • The term ‘social network’ has entered common language and is understood to describe circles of friends, acquaintances, colleagues, and so on. • However, networks are well grounded in mathematics, and understanding how to represent, describe, and measure properties of networks is the foundation of quantitative network analysis.

Representing networks • Adjacency lists

Adjacency matrix
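
The two representations named on these slides can be sketched in a few lines of Python. This is a minimal illustration, not taken from the original slides: the four-node network and its node names are made up for the example.

```python
# Hypothetical undirected friendship network (node names are illustrative only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

nodes = sorted({n for edge in edges for n in edge})

# Adjacency list: each node maps to the set of its neighbors.
adjacency_list = {n: set() for n in nodes}
for u, v in edges:
    adjacency_list[u].add(v)
    adjacency_list[v].add(u)  # undirected: store the edge in both directions

# Adjacency matrix: entry [i][j] is 1 if nodes i and j share an edge, else 0.
index = {n: i for i, n in enumerate(nodes)}
adjacency_matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    adjacency_matrix[index[u]][index[v]] = 1
    adjacency_matrix[index[v]][index[u]] = 1

print(adjacency_list)
for row in adjacency_matrix:
    print(row)
```

The list form stores only the edges that exist, which suits large, sparse networks; the matrix form uses one cell per pair of nodes but makes "are these two connected?" a single lookup.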

2. Network Structure and Measures • Describing nodes and edges • Degree (undirected) • In-degree and out-degree (directed)

Centrality • Centrality is one of the core principles of network analysis. It measures how “central” a node is in the network, which is used as an estimate of the node's importance. • However, what counts as “central” varies with the application and point of view. Correspondingly, there are a number of ways to measure the centrality of a node. • Four common measures are degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality.

Degree centrality • Degree centrality is one of the easiest to calculate. The degree centrality of a node is simply its degree—the number of edges it has. The higher the degree, the more central the node is. This can be an effective measure, since many nodes with high degrees also have high centrality by other measures.
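
Degree, and its directed variants in-degree and out-degree, can be counted directly from an edge list. The sketch below assumes edges are given as pairs of node names; the function names and the sample edges are hypothetical and chosen only for illustration.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def in_out_degrees(directed_edges):
    """In-degree and out-degree of each node; each edge is (source, target)."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in directed_edges:
        out_deg[source] += 1  # edge leaves the source
        in_deg[target] += 1   # edge arrives at the target
    return dict(in_deg), dict(out_deg)

# Hypothetical sample data.
undirected = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
directed = [("A", "B"), ("C", "A"), ("C", "B"), ("D", "C")]

degree_centrality = degrees(undirected)
in_degree, out_degree = in_out_degrees(directed)
print(degree_centrality)
print(in_degree, out_degree)
```

Nodes with no incoming (or outgoing) edges simply do not appear in the corresponding dictionary, so `in_degree.get(node, 0)` is the safe way to read a value.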

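Closeness centrality • Closeness centrality indicates how close a node is to all other nodes in the network. It is calculated as the average of the shortest path lengths from the node to every other node in the network. • Under this definition, a lower value means the node is closer to the rest of the network; some tools report the reciprocal instead, so that higher values mean more central.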
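
As a rough sketch of the calculation, the average shortest path length can be found with a breadth-first search. The code assumes an unweighted, connected, undirected network stored as an adjacency list; the function names and the four-node path network are hypothetical.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def average_shortest_path_length(graph, source):
    """Average distance from source to all other reachable nodes.

    Assumes the graph is connected and has at least two nodes.
    """
    distances = shortest_path_lengths(graph, source)
    others = [d for node, d in distances.items() if node != source]
    return sum(others) / len(others)

# Hypothetical path network: A - B - C - D
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

closeness_a = average_shortest_path_length(graph, "A")
closeness_b = average_shortest_path_length(graph, "B")
print(closeness_a, closeness_b)
```

Here the end node A averages 2 hops to the others, while the interior node B averages about 1.33, so B is the more central of the two under this measure.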

Betweenness centrality • Betweenness centrality measures how important a node is to the shortest paths through the network. • To compute betweenness for a node N, we select a pair of nodes and find all the shortest paths between those nodes. Then we compute the fraction of those shortest paths that include node N.

If there were 5 shortest paths between a pair of nodes, and 3 of them went through node N, then the fraction would be 3/5 = 0.6. • We repeat this process for every pair of nodes in the network. We then add up the fractions we computed, and this is the betweenness centrality for node N.

Betweenness centrality • The betweenness centrality of A is zero, since no shortest paths between B, C, D, E, and F go through A. • Betweenness centrality is one of the most frequently used centrality measures. It captures how important a node is in the flow of information from one part of the network to another.
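
The pair-by-pair procedure described above can be sketched directly. This is a straightforward, unoptimized illustration for small unweighted, undirected networks, not a production algorithm; the function names and the four-node cycle used as sample data are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(graph, source):
    """Return (distance, number of shortest paths) from source to each node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                count[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                # every shortest path to node extends to a shortest path to nb
                count[nb] += count[node]
    return dist, count

def betweenness(graph, n):
    """Sum, over all pairs (s, t) not containing n, of the fraction of
    shortest s-t paths that pass through n."""
    info = {v: bfs_counts(graph, v) for v in graph}
    dist_n, count_n = info[n]
    total = 0.0
    for s, t in combinations([v for v in graph if v != n], 2):
        dist_s, count_s = info[s]
        if t not in dist_s or n not in dist_s or t not in dist_n:
            continue  # some node is unreachable; no shortest path to count
        if dist_s[n] + dist_n[t] == dist_s[t]:
            # n lies on a shortest path: paths s->n times paths n->t
            total += count_s[n] * count_n[t] / count_s[t]
    return total

# Hypothetical four-node cycle: A - B - C - D - A
graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
betweenness_b = betweenness(graph, "B")
print(betweenness_b)
```

In this cycle, A and C are joined by two shortest paths, one through B and one through D, so that pair contributes 1/2 to B; the other pairs are directly connected and contribute nothing.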

Eigenvector centrality • Eigenvector centrality measures a node’s importance while giving consideration to the importance of its neighbors. • For example, a node with 300 relatively unpopular friends on Facebook would have lower eigenvector centrality than someone with 300 very popular friends (like Barack Obama).

It is sometimes used to measure a node’s influence in the network. It is determined by performing a matrix calculation on the adjacency matrix to find what is called the principal eigenvector. • Not only is it used to determine influence in social networks, but a variant of eigenvector centrality is at the core of Google’s PageRank algorithm, which is used to rank web pages.

The main principle is that links from important nodes (as measured by degree centrality) are worth more than links from unimportant nodes. • All nodes start off equal, but as the computation progresses, nodes with more edges start gaining importance. • Their importance propagates out to the nodes to which they are connected. • After re-computing many times, the values stabilize, resulting in the final values for eigenvector centrality.
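
The repeated re-computation described above is commonly implemented as power iteration. The sketch below is one such version under stated assumptions: an undirected, connected network stored as an adjacency list, scores normalized to unit length each round, and each node keeping its own score when summing its neighbors' scores (a common trick to avoid oscillation on some networks). The function name, parameters, and sample network are hypothetical.

```python
import math

def eigenvector_centrality(graph, max_iterations=200, tolerance=1e-10):
    """Approximate eigenvector centrality by power iteration."""
    scores = {node: 1.0 for node in graph}  # all nodes start off equal
    for _ in range(max_iterations):
        # Each node collects its own score plus the scores of its neighbors.
        new_scores = {
            node: scores[node] + sum(scores[nb] for nb in graph[node])
            for node in graph
        }
        # Normalize so the values do not grow without bound.
        norm = math.sqrt(sum(v * v for v in new_scores.values()))
        new_scores = {node: v / norm for node, v in new_scores.items()}
        change = sum(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if change < tolerance:
            break  # values have stabilized
    return scores

# Hypothetical network: a triangle A-B-C with D attached to C.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
scores = eigenvector_centrality(graph)
for node, value in sorted(scores.items()):
    print(node, round(value, 3))
```

In this sample network C scores highest, A and B tie, and D scores lowest even though it is attached to the most central node, because it has only that one connection.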

3. Social Information Filtering

Social sharing and social filtering • One way to find useful information among all the links, news, videos, and photos posted each day is to rely on other people to find it for us. • Social sharing and social filtering use the interests of others, especially friends on social networks, to highlight information that is more likely to be of interest.

Social-sharing websites, like Digg, Slashdot, and reddit, are designed for people to share interesting content. • The community then votes items up or down, and the most interesting links are highlighted. Relying on a large number of people to help complete a task like this is a type of crowdsourcing. • From the “crowd” of people online, each contributes a tiny amount of work by sharing or voting on content, and the aggregate results are a valuable contribution.

Automated recommender systems • Recommender systems are major parts of e-commerce sites and social media sites. • We introduce the major types here and discuss how they take advantage of social patterns and connections to suggest items that users might like.

Even if the term recommender system is not a familiar one, nearly all Internet users will be familiar with the systems themselves. These are the features of websites that suggest items a user might like. • Amazon.com uses them to suggest other items a customer might want to buy.

Traditional recommender systems • Recommender systems basically work in one of two ways: • suggesting items similar to the ones a person likes (content-based approach) • suggesting items liked by people who are similar to the user (collaborative filtering technique) • [… or hybrid]

Content-based approach

Content-based systems might look at all the items that a user has rated and then look for items that are similar to the things the user likes. This is how Pandora, the online music streaming service, works.

A user starts with a song or artist, and Pandora creates a musical profile of it. Then, Pandora selects songs that are similar in profile and plays those.
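
One common way to compare item profiles in a content-based system is cosine similarity between feature vectors. The sketch below is a generic illustration of that idea only; it is not a description of how Pandora actually builds or compares profiles. The song names, the three numeric features, and the function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(seed, catalog, top_n=1):
    """Rank every other item in the catalog by similarity to the seed item."""
    ranked = sorted(
        (name for name in catalog if name != seed),
        key=lambda name: cosine_similarity(catalog[seed], catalog[name]),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical profiles: [tempo, acoustic-ness, vocal prominence], scaled 0-1.
catalog = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.2],
}

recommendations = most_similar("song_a", catalog)
print(recommendations)
```

Starting from `song_a`, the closest profile is `song_b`, so that is the item a content-based recommender of this kind would suggest next.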

Collaborative Filtering • Collaborative filtering looks at each pair of users, finds the items that both people have rated, and computes a similarity score for the two people based on their ratings. • That similarity measure is then used to give similar people more say in how much the user might like a new item.
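
A similarity score of this kind is often computed as the Pearson correlation over the items both users have rated. The sketch below assumes ratings are stored as dictionaries from item name to numeric rating; the users, items, and ratings are hypothetical sample data.

```python
import math

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation of two users' ratings over their co-rated items."""
    shared = [item for item in ratings_a if item in ratings_b]
    if len(shared) < 2:
        return 0.0  # not enough overlap to measure agreement
    a = [ratings_a[item] for item in shared]
    b = [ratings_b[item] for item in shared]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same gives no signal
    return covariance / math.sqrt(var_a * var_b)

# Hypothetical ratings on a 1-5 scale.
user_1 = {"movie_1": 5, "movie_2": 3, "movie_3": 1}
user_2 = {"movie_1": 4, "movie_2": 3, "movie_3": 2, "movie_4": 5}
user_3 = {"movie_1": 1, "movie_2": 3, "movie_3": 5}

similarity_1_2 = pearson_similarity(user_1, user_2)
similarity_1_3 = pearson_similarity(user_1, user_3)
print(similarity_1_2, similarity_1_3)
```

Users 1 and 2 rank the shared movies in the same order, so their correlation is positive, while users 1 and 3 have opposite tastes and get a negative score. Only co-rated items count, so `movie_4` is ignored.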

Consider this simple example of collaborative filtering. A user, Alice, has rated a set of movies. Two other users, Bob and Chuck, have also rated those movies. These are shown in Table 13.1.

Now assume Alice wants to know how much she might like the movie Vertigo, which she has never seen. Both Bob and Chuck have seen it. Bob rated it a 3 and Chuck rated it a 5. • What would be a good recommendation to Alice for how much she will like it?

One option is to show the average rating for the movie, which is a 4 in this case. However, that does not take into account that Chuck is more similar to Alice than Bob is. • A simple form of collaborative filtering uses the correlation between users to compute a weighted average. • Bob's and Chuck’s ratings are each multiplied by that user's correlation with Alice, and the total is divided by the sum of the weights.
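
The weighted average can be worked through in a few lines. The ratings of 3 and 5 come from the example above; the correlation values are made-up placeholders, since the actual values depend on the ratings in Table 13.1, which are not reproduced in this transcript. The function name is also hypothetical.

```python
def weighted_prediction(ratings, weights):
    """Correlation-weighted average of other users' ratings for one item.

    Assumes the weights are positive; negative or zero-sum correlations
    need extra handling that this sketch does not attempt.
    """
    total_weight = sum(weights[user] for user in ratings)
    if total_weight == 0:
        raise ValueError("weights sum to zero; cannot form a weighted average")
    weighted_sum = sum(ratings[user] * weights[user] for user in ratings)
    return weighted_sum / total_weight

# Ratings for Vertigo, as given in the example.
vertigo_ratings = {"Bob": 3, "Chuck": 5}

# ASSUMED correlations with Alice, for illustration only.
assumed_correlations = {"Bob": 0.2, "Chuck": 0.8}

prediction = weighted_prediction(vertigo_ratings, assumed_correlations)
print(round(prediction, 2))
```

With these assumed weights the prediction is (3 × 0.2 + 5 × 0.8) / (0.2 + 0.8) = 4.6, which sits above the plain average of 4 because Chuck, the more similar user, gets more say.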

Examples of explicit data collection include the following: • Asking a user to rate an item on a sliding scale. • Asking a user to rank a collection of items from favorite to least favorite. • Presenting two items to a user and asking him or her to choose the better one. • Asking a user to create a list of items that he or she likes.

Examples of implicit data collection include the following: • Observing the items that a user views in an online store. • Analyzing item/user viewing times. • Keeping a record of the items that a user purchases online. • Obtaining a list of items that a user has listened to or watched on his or her computer. • Analyzing the user's social network and discovering similar likes and dislikes.
