Lada Adamic, HP Labs, Palo Alto, CA


Implicit Structure and Dynamics of BlogSpace




slide1

Information dynamics in the networked world

Lada Adamic, HP Labs, Palo Alto, CA

slide2

Talk outline

Information flow through blogs

Information flow through email

Search through email networks

Search within the enterprise

Search in an online community

Implicit Structure and Dynamics of BlogSpace

Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose

Blog use:

Record real-world and virtual experiences

Note and discuss things “seen” on the net

Blog structure: blog-to-blog linking

Use + Structure

Great to track “memes” (catchy ideas)

Approaches and uses of blog analysis

Patterns of information flow

How does the popularity of a topic evolve over time?

Who is getting information from whom?

Ranking algorithms that take advantage of transmission patterns

Tracking popularity over time

[Figure: popularity (vertical axis) vs. time (horizontal axis), annotated “Slashdot Effect” and “BoingBoing Effect”]

Blogdex, BlogPulse, etc. track the most popular links/phrases of the day

Different kinds of information have different popularity profiles

[Figure: four panels showing % of hits received on each day since first appearance (vertical axis 0 to 1; horizontal axis ticks at days 5, 10, 15). Panels: Major-news site (editorial content) – back of the paper; Products, etc.; Slashdot postings; Front-page news]

Microscale Dynamics

What do we need to track specific info ‘epidemics’?

Timings

Underlying network

[Figure: blogs b1, b2, b3 placed along a time-of-infection axis from t0 to t1]

Microscale Dynamics

Challenges

Root may be unknown

Multiple possible paths

Uncrawled space, alternate media (email, voice)

No links

[Figure: blogs b1, b2, b3, … bn placed along a time-of-infection axis from t0 to t1; includes two “?” markers]

Microscale Dynamics: who is getting info from whom

Explicit blog to blog links (easy)

Via links are even better

Implicit/Inferred transfer (harder)

Use ML algorithm for link inference problem

Support Vector Machine (SVM)

Logistic Regression

What we can use

Full text

Blogs in common

Links in common

History of infection

Visualization

Zoomgraph tool

Using GraphViz (by AT&T) layouts

Simple algorithm

If single, explicit link exists, draw it

Otherwise use ML algorithm

Pick the most likely explicit link

Pick the most likely possible link

Tool lets you zoom around space, control threshold, link types, etc.

http://www-idl.hpl.hp.com/blogstuff

slide12

Giant Microbes epidemic visualization

[Figure legend: blog; explicit link; via link; inferred link]

iRank

Find early sources of good information using inferred information paths or timing

[Figure: blogs b1, b2, b3, b4, b5, … bn, with annotations “True source” and “Popular site”]

iRank Algorithm
  • Draw a weighted edge for all pairs of blogs that cite the same URL
  • higher weight for mentions closer together
  • run PageRank
  • control for ‘spam’

[Figure: time-of-infection axis from t0 to t1]
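
A minimal Python sketch of the iRank steps listed above, not the authors' implementation. The slide does not give the weighting function, the edge direction, or the spam control, so this sketch assumes that edges point from the later citer to the earlier one, uses a hypothetical weight 1 / (1 + gap / scale_hours), and leaves spam control out. The URLs, blog names, and times are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_citation_graph(mentions, scale_hours=24.0):
    """mentions: dict url -> list of (blog, time_in_hours).

    Returns weights[later_blog][earlier_blog]: an edge from the later citer
    to the earlier one, heavier when the two mentions are close in time.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for cites in mentions.values():
        for (b1, t1), (b2, t2) in combinations(cites, 2):
            if b1 == b2 or t1 == t2:
                continue  # skip self-pairs and ties (no direction to infer)
            later, earlier = (b1, b2) if t1 > t2 else (b2, b1)
            # Assumed weighting: decays with the gap between the two mentions.
            weights[later][earlier] += 1.0 / (1.0 + abs(t1 - t2) / scale_hours)
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a dict-of-dicts graph."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = weights.get(v, {})
            total = sum(out.values())
            if total == 0.0:
                # Dangling node (cites nobody earlier): spread its rank evenly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u, w in out.items():
                    new[u] += damping * rank[v] * w / total
        rank = new
    return rank


# Hypothetical data: b1 mentions both URLs first.
mentions = {
    "http://example.org/a": [("b1", 0.0), ("b2", 5.0), ("b3", 6.0)],
    "http://example.org/b": [("b1", 1.0), ("b3", 30.0)],
}
ranks = pagerank(build_citation_graph(mentions))
for blog in sorted(ranks, key=ranks.get, reverse=True):
    print(blog, round(ranks[blog], 3))
```

Pointing edges at the earlier citer means rank accumulates at blogs that mention a URL first, which is one way to read "find early sources"; other directions or weightings would be equally consistent with the slide.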

Do Bloggers Kill Kittens?

02:00 AM Friday Mar. 05, 2004 PST: Wired publishes:

“Warning: Blogs Can Be Infectious.”

7:25 AM Friday Mar. 05, 2004 PST: Slashdot posts:

“Bloggers' Plagiarism Scientifically Proven”

9:55 AM Friday Mar. 05, 2004 PST: Metafilter announces:

“A good amount of bloggers are outright thieves.”

slide16

Information flow in social groups

Fang Wu, Bernardo Huberman, Lada Adamic, Joshua Tyler

slide17

Spread of disease is affected by the underlying network

[Figure: contact network around “mike”; node labels: co-worker (three times), mom, college friend]

slide18

Spread of computer viruses is affected by the underlying network

[Figure: contact network around “mike”; node labels: co-worker (three times), mom, college friend]

slide19

Difference between information flow and disease/virus spread

Viruses (computer and otherwise) are shared indiscriminately (involuntarily)

Information is passed selectively from one host to another based on knowledge of the recipient’s interests

slide20

Spread of information is affected by its content, potential recipients, and network topology

[Figure: contact network around “mike”; node labels: co-worker (three times), mom, college friend]

slide21

homophily: individuals with like interests associate with one another

personal homepages at Stanford

distance between personal homepages

slide22

The Model:

[Figure: nodes labeled by distance from the originating node: m=0, m=1, m=2]

Decay in transmission probability as a function of the distance m between potential target and originating node

$T(m) = (m+1)^{-\beta}\, T$

power-law implies slowest decay
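
The decay rule can be evaluated directly. This is a small sketch of the formula only; the function name and the values T = 0.5 and β = 1 are illustrative assumptions, not figures from the talk.

```python
def transmission_probability(m, base_t, beta):
    """T(m) = (m + 1) ** (-beta) * T, for distance m >= 0 from the source."""
    if m < 0:
        raise ValueError("distance m must be non-negative")
    return base_t * (m + 1) ** (-beta)


# Illustrative values only: base transmissibility 0.5, decay exponent 1.
probs = [transmission_probability(m, base_t=0.5, beta=1.0) for m in range(3)]
print(probs)
```

With β = 0 the function returns T at every distance, which recovers indiscriminate spread with no decay.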

slide23

Virus, information transmission on a scale free network

P(k)

outdegree k

Degree distribution of all senders of email passing through the HP email server

slide24

epidemics on scale free graphs

$10^6$ nodes, epidemic if 1% ($10^4$) infected

[Figure: critical threshold (vertical axis, 0 to 1) vs. $\alpha$ (horizontal axis, 1 to 4); curves for $\kappa=\infty, \beta=0$; $\kappa=100, \beta=0$; $\kappa=100, \beta=1$; annotated with Wu et al. (2004), Newman (2002), Pastor-Satorras & Vespignani (2001)]

slide25

Study of the spread of URLs and attachments

40 participants (30 within HPL, 10 elsewhere in HP & other orgs)

6370 URLs and 3401 attachments cryptographically hashed

Question: How many recipients in our sample did each item reach?

caveats:

messages are deleted (still, the median number of messages > 2000)

non-uniform sample

slide26

forwarded

message

forwarded URLs

Only forwarded messages are counted

slide27

Results

[Figure: log-log plot of number of items with so many recipients (vertical axis, $10^0$ to $10^4$) vs. number of recipients (horizontal axis); power-law fits $x^{-4.1}$ for email attachments and $x^{-3.6}$ for URLs; annotations: “short term expense control” and “ads at the bottom of hotmail & yahoo messages”]

average = 1.1 for attachments, and 1.2 for URLs

slide28

Simulate transmission on email log

each message has a probability p of transmitting information from an infected individual to the recipient

02/19/2003 15:45:33 I-1 I-2

02/19/2003 15:45:33 I-1 I-3

02/19/2003 15:45:40 E-1 I-4

02/19/2003 15:45:52 I-5 E-2

02/19/2003 15:45:55 E-3 I-6

02/19/2003 15:45:58 I-7 I-8

02/19/2003 15:46:00 E-4 I-9

02/19/2003 15:46:05 I-10 I-11

02/19/2003 15:46:10 I-12 I-13

02/19/2003 15:46:10 I-12 I-14

02/19/2003 15:46:10 I-12 I-15

02/19/2003 15:46:14 I-16 E-5

. . . .

. . . .

[Legend: I-n = internal node, E-n = external node]

slide29

Simulation of information transmission on the actual HP Labs email graph

an individual is infected if they receive a particular piece of information

individuals remain infected for 24 hours

start by infecting one individual at random

every time an infected individual sends an email they have a probability p of infecting the recipient

track epidemic over the course of a week, most run their course in 1-2 days
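
The simulation rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: the log format is taken to be "MM/DD/YYYY HH:MM:SS sender recipient", the seed is infected at the time of the first log entry, an already infected node is never reinfected, and the caller picks the seed. The four log lines are made up.

```python
import random
from datetime import datetime, timedelta


def parse_log(lines):
    """Parse 'MM/DD/YYYY HH:MM:SS sender recipient' lines into sorted events."""
    events = []
    for line in lines:
        date, clock, sender, recipient = line.split()
        when = datetime.strptime(date + " " + clock, "%m/%d/%Y %H:%M:%S")
        events.append((when, sender, recipient))
    return sorted(events)


def simulate(events, seed_node, p, infectious_for=timedelta(hours=24), rng=None):
    """Replay the log in time order; return dict node -> time of infection."""
    rng = rng or random.Random()
    infected_at = {seed_node: events[0][0]}  # assumption: seed infected at log start
    for when, sender, recipient in events:
        t = infected_at.get(sender)
        if t is None or when < t or when - t > infectious_for:
            continue  # sender was never infected or is no longer infectious
        if recipient not in infected_at and rng.random() < p:
            infected_at[recipient] = when
    return infected_at


# Hypothetical log excerpt in the same format as the sample shown earlier.
log = [
    "02/19/2003 15:45:33 I-1 I-2",
    "02/19/2003 15:45:33 I-1 I-3",
    "02/19/2003 15:45:58 I-2 I-8",
    "02/21/2003 09:00:00 I-3 I-9",
]
result = simulate(parse_log(log), seed_node="I-1", p=1.0)
print(sorted(result))
```

With p = 1.0 every message from an infectious sender transmits, so the run is deterministic; the last message is sent more than 24 hours after I-3 was infected and therefore transmits nothing.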

slide30

Introduce a decay in the transmission probability based on the hierarchical distance

[Figure: organizational hierarchy with individuals A and B; labels: distance 1 (twice), distance 2 (twice); $h_{AB} = 5$]

slide32

Conclusions on info flow in social groups

Information spread typically does not reach epidemic proportions

Information is passed on to individuals with matching properties

The likelihood that properties match decreases with distance from the source

Model gives a finite threshold

Results are consistent with observed URL & attachment frequencies in a sample

Simulations following real email patterns also consistent

slide33

How to search in a small world

Milgram’s experiment:

Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.

[Figure: locations labeled NE and MA]

slide34

Small world experiment at Columbia

Dodds, Muhamad, Watts, Science 301, (2003)

email experiment conducted in 2002

18 targets in 13 different countries

24,163 message chains

384 reached their targets

average path length 4.0

slide35

Why study small world phenomena?

Curiosity:

Why is the world small?

How are people able to route messages?

Social Networking as a Business:

Friendster, Orkut, MySpace

LinkedIn, Spoke, VisiblePath

slide36

Six degrees of separation - to be expected

Pool and Kochen (1978) - average person has 500-1500 acquaintances

Ignoring clustering, other redundancy …

~ $10^3$ first neighbors, $10^6$ second neighbors, $10^9$ third neighbors

But networks are clustered:

my friends’ friends tend to be my friends

Watts & Strogatz (1998) - a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph
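
The effect of a few random links on a clustered graph can be illustrated with a small experiment. This sketch is a variant rather than the Watts & Strogatz procedure: it adds random shortcuts to a ring lattice instead of rewiring existing edges (an assumption made so the graph stays connected), and the sizes n = 200, k = 2 and 20 shortcuts are arbitrary choices.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on each side of a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def add_shortcuts(adj, count, rng):
    """Add `count` random new edges without removing any existing ones."""
    nodes = list(adj)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


graph = ring_lattice(n=200, k=2)
clustered = average_shortest_path(graph)
add_shortcuts(graph, count=20, rng=random.Random(0))
small_world = average_shortest_path(graph)
print(clustered, small_world)
```

Because shortcuts are only added, no path can get longer, so the second average is never larger than the first.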

slide37

But how are people able to find short paths?

How to choose among hundreds of acquaintances?

Strategy:

Simple greedy algorithm - each participant chooses correspondent who is closest to target with respect to the given property

Models

geography

Kleinberg (2000)

hierarchical groups

Watts, Dodds, Newman (2001), Kleinberg(2001)

high degree nodes

Adamic, Puniyani, Lukose, Huberman (2001), Newman(2003)

slide38

Spatial search

Kleinberg (2000)

“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”

S. Milgram, ‘The small world problem’, Psychology Today 1, 61 (1967)

nodes are placed on a lattice and connect to nearest neighbors

additional links placed with $f(d) \sim d(u,v)^{-r}$

if r = 2, can search in polylog ($< (\log N)^2$) time
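
A hedged Python sketch of greedy routing on such a lattice. It assumes a square grid without wraparound, Manhattan distance, and exactly one long-range contact per node drawn with probability proportional to d^(-r); the grid size, seed, and endpoints are arbitrary, and the function names are made up for this illustration.

```python
import random


def manhattan(a, b):
    """Lattice distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_lattice(side, r, rng):
    """side x side grid: each node gets its grid neighbours plus one
    long-range contact chosen with probability proportional to d ** -r."""
    nodes = [(x, y) for x in range(side) for y in range(side)]
    contacts = {}
    for u in nodes:
        x, y = u
        local = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < side and 0 <= y + dy < side]
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[u] = local + [long_range]
    return contacts


def greedy_route(contacts, source, target):
    """Always forward to the contact closest to the target."""
    path = [source]
    while path[-1] != target:
        path.append(min(contacts[path[-1]], key=lambda v: manhattan(v, target)))
    return path


rng = random.Random(1)
contacts = build_lattice(side=20, r=2, rng=rng)
path = greedy_route(contacts, source=(0, 0), target=(19, 19))
print(len(path) - 1, "hops; lattice distance is", manhattan((0, 0), (19, 19)))
```

Greedy routing always terminates here because some grid neighbour is one step closer to the target, so the hop count can never exceed the lattice distance; the long-range contacts only shorten it.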

slide39

Kleinberg: searching hierarchical structures

‘Small-World Phenomena and the Dynamics of Information’, NIPS 14, 2001

Hierarchical network models:

h is the distance between two individuals in hierarchy with branching b

$f(h) \sim b^{-\alpha h}$

If $\alpha = 1$, can search in O(log n) steps

Group structure models:

q = size of smallest group that two individuals belong to

$f(q) \sim q^{-\alpha}$

If $\alpha = 1$, can search in O(log n) steps
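
A small sketch of the hierarchical link rule, under the assumption that individuals are the leaves of a complete b-ary tree numbered left to right and that h is the height of the lowest common ancestor of two leaves. The function names and the example tree (b = 2, 8 leaves) are illustrative.

```python
def tree_distance(u, v, b):
    """Height of the lowest common ancestor of leaves u and v in a
    complete b-ary tree whose leaves are numbered left to right."""
    h = 0
    while u != v:
        u //= b
        v //= b
        h += 1
    return h


def link_probabilities(u, n_leaves, b, alpha):
    """Normalised f(h) ~ b ** (-alpha * h) over all leaves other than u."""
    raw = {v: b ** (-alpha * tree_distance(u, v, b))
           for v in range(n_leaves) if v != u}
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()}


probs = link_probabilities(u=0, n_leaves=8, b=2, alpha=1.0)
for v in sorted(probs):
    print(v, tree_distance(0, v, 2), round(probs[v], 4))
```

At α = 1 every level of the hierarchy receives the same total probability: the number of leaves at height h grows as b^h while the per-leaf weight shrinks as b^(-h).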

slide40

Identity and search in social networks

Watts, Dodds, Newman (2001)

individuals belong to hierarchically nested groups

multiple independent hierarchies coexist

$p_{ij} \sim \exp(-\alpha x)$

slide41

Identity and search in social networks

Watts, Dodds, Newman (2001)

There is an attrition rate r

Network is ‘searchable’ if a fraction q of messages reach the target

N=102400

N=204800

N=409600

slide42

High degree search

Adamic et al. Phys. Rev. E, 64 46135 (2001)

“Who could introduce me to Richard Gere?”

[Figure: acquaintances labeled Mary, Bob, Jane]
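
A simplified sketch of a high degree search, not the algorithm as specified in the cited paper. It assumes the walker always moves to the unvisited neighbour with the most connections, treats a target as found once the walker stands on it or next to it, and gives up rather than backtracking. The acquaintance graph is made up.

```python
def build_graph(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def high_degree_search(adj, start, target, max_steps=1000):
    """Return the number of steps until the target is the current node or
    one of its neighbours, or None if the walk gets stuck."""
    current = start
    visited = {start}
    for steps in range(max_steps + 1):
        if target == current or target in adj[current]:
            return steps
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates:
            return None  # simplification: no backtracking or revisiting
        current = max(candidates, key=lambda v: len(adj[v]))
        visited.add(current)
    return None


# Hypothetical acquaintance graph using the names from the slide.
adj = build_graph([
    ("me", "Bob"),
    ("me", "Jane"),
    ("Jane", "Mary"),
    ("Mary", "Richard Gere"),
])
steps = high_degree_search(adj, "me", "Richard Gere")
print(steps)
```

Here the walker prefers Jane (two acquaintances) over Bob (one), then reaches Mary, who knows the target.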

slide43

[Figure: power-law graph, annotated “number of nodes found”; numeric labels: 67, 63, 54, 1, 94, 6, 2]

slide44

[Figure: Poisson graph, annotated “number of nodes found”; numeric labels: 19, 15, 11, 7, 3, 1, 93]

slide45

Scaling of search time with size of graph

Sharp cutoff at $k \sim N^{1/\alpha}$, 2nd degree neighbors

[Figure: log-log plot of cover time for half the nodes (vertical axis, $10^0$ to $10^3$) vs. size of graph (horizontal axis, $10^1$ to $10^5$); random walk: $\alpha = 0.37$ fit; degree sequence: $\alpha = 0.24$ fit]

slide46

Testing the models on social networks

(w/Eytan Adar)

Use a well defined network:

HP Labs email correspondence over 3.5 months

Edges are between individuals who sent at least 6 email messages each way

Node properties specified:

degree

geographical location

position in organizational hierarchy

Can greedy strategies work?

slide47

Strategy 1: High degree search

Degree distribution of all senders of email passing through the HP email server

outdegree

slide48

Filtered network

(6 messages sent each way)

Degree distribution no longer power-law, but Poisson

450 users

median degree = 10

mean degree = 13

average shortest path = 3

High degree search performance (poor):

median # steps = 16

mean = 40

slide49

Strategy 2:

Geography

slide50

Communication across corporate geography

87% of the 4000 links are between individuals on the same floor

[Figure: building layout with labels 1L, 1U, 2L, 2U, 3L, 3U, 4U]

slide51

Cubicle distance vs. probability of being linked

[Figure annotation: optimum for search]

slide56

Example of search path

[Figure: search path through the organizational hierarchy; steps labeled distance 1, distance 1, distance 2, distance 1]

hierarchical distance = 5

search path distance = 4

slide57

Probability of linking vs. distance in hierarchy

in the ‘searchable’ regime: $0 < \alpha < 2$ (Watts 2001)

slide60

Group size and probability of linking

[Figure: probability of linking vs. group size g, annotated “optimum for search (Kleinberg 2001)”]

slide61

Search Conclusions

  • Individuals associate on different levels into groups.
  • Group structure facilitates decentralized search using social ties.
  • HP Labs as a social network is searchable but not quite optimal.
      • searching using the organizational hierarchy is faster than using physical location
  • A fraction of ‘important’ individuals are easily findable
  • Humans may be much more resourceful in executing search tasks:
      • making use of weak ties
      • using more sophisticated strategies
slide62

PeopleFinder2 – a search engine for HP people

Extract & disambiguate names from publicly available documents

Enrich information available about individuals

Search for them by topic

Identify knowledge communities from co-occurrence of names

Live Demo

If live demo fails:

Current PeopleFinder functionality

PeopleFinder2 info on a person

Extracted topics for a person

Social network

Social network visualization

Search for individuals by topic

Visualize knowledge network

Find social network paths to experts

slide63

To find out more:

(papers, slides, other research in the group)

Information dynamics group (IDL) at HP Labs:

http://www.hpl.hp.com/research/idl

List of publications

http://www.hpl.hp.com/personal/Lada_Adamic/research.html
