
Peeking into massive Online Social Networks (aka “Walking on Facebook”)



Presentation Transcript


  1. Miniprojects: Peeking into massive Online Social Networks (aka “Walking on Facebook”). Maciej Kurant

  2. Miniprojects Essentially, measure

  3. LastFM • www.last.fm/user/rj

  4. LastFM API • www.last.fm/api

  5. LastFM API: http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=rj&limit=10&page=1&api_key=1b4218629b50c1159e15a6b8285b90ba

  6. LastFM API in Python

     Request: http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=rj&limit=10&page=1&api_key=1b4218629b50c1159e15a6b8285b90ba

         import urllib2
         import re

         api_key = '1b4218629b50c1159e15a6b8285b90ba'
         user = "rj"
         command = "http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user="+user+"&limit=10&page=1&api_key="+api_key
         data = urllib2.urlopen(command).read()  # XML format
         degree = int(re.search('total="(\d+)"', data).group(1))
         friends = re.findall("<name>(.*)</name>", data)
         print degree   # number of friends of "rj"
         print friends  # first 10 friends (because page=1 and limit=10)

     For BFS, you need all friends. Set “limit=500” and pull multiple pages if necessary.

     For Random Walks, you will need only the degree and one neighbor. Set “limit=1” and: 1) learn the degree, 2) select the index i of the neighbor, 3) get the name by setting “page=i”.
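
The two access patterns described on this slide (all pages for BFS, a single page for a random walk) could be sketched in Python 3 roughly as follows; the slide's own snippet is Python 2. This is a minimal sketch, not the author's code: the helper names and the `fetch` parameter (any callable that takes a URL and returns the XML response as text) are assumptions made for illustration, and the regexes are the slide's, with the name pattern made non-greedy.

```python
import math
import random
import re
from urllib.parse import urlencode

API_ROOT = "http://ws.audioscrobbler.com/2.0/"

def friends_url(user, api_key, limit, page):
    """Build the user.getfriends request URL shown on the slide."""
    params = {"method": "user.getfriends", "user": user,
              "limit": limit, "page": page, "api_key": api_key}
    return API_ROOT + "?" + urlencode(params)

def parse_friends(xml_text):
    """Return (degree, names on this page) from the XML response text."""
    degree = int(re.search(r'total="(\d+)"', xml_text).group(1))
    names = re.findall(r"<name>(.*?)</name>", xml_text)
    return degree, names

def all_friends(user, api_key, fetch, limit=500):
    """BFS case: read page 1, then pull the remaining pages if there are any."""
    degree, names = parse_friends(fetch(friends_url(user, api_key, limit, 1)))
    pages = math.ceil(degree / limit)
    for page in range(2, pages + 1):
        names += parse_friends(fetch(friends_url(user, api_key, limit, page)))[1]
    return names

def random_friend(user, api_key, fetch, rng=random):
    """Random-walk case: with limit=1, page i holds exactly the i-th friend."""
    degree, _ = parse_friends(fetch(friends_url(user, api_key, 1, 1)))
    if degree == 0:
        return degree, None          # nobody to walk to
    i = rng.randint(1, degree)       # index of the neighbor, 1-based like the pages
    _, names = parse_friends(fetch(friends_url(user, api_key, 1, i)))
    return degree, names[0]
```

For live requests, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Passing it in as a parameter keeps the paging logic testable without touching the network.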

  7. Surprises
     • Banned user (once reached, seems to have 0 friends)
     • Server not responding
     • Friendship graph not connected (solution: consider only the component connected to user 'rj')
     • Case sensitivity? (rj == RJ ??)
     • …
     • Your program has to deal with them!

     [Figure: example graph with nodes labeled A to N]
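
One way a crawler might guard against two of these surprises (an unresponsive server and differing capitalization of user names) is sketched below in Python 3. Everything here is illustrative: `with_retries`, `canonical`, and the `fetch` callable are hypothetical names, and treating names as case-insensitive is an assumption that should be checked against the API's actual behavior.

```python
import time

def with_retries(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on network errors.

    Returns None if every attempt fails, so the caller can decide whether
    to skip the node or stop the crawl.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:  # urllib's URLError and socket timeouts derive from OSError
            sleep(delay * 2 ** attempt)
    return None

def canonical(name):
    """ASSUMPTION: 'rj' and 'RJ' denote the same user, so fold the case."""
    return name.lower()

# Track visited users by canonical name so one user is never counted twice.
visited = set()
for name in ["rj", "RJ", "Rj"]:
    visited.add(canonical(name))
```

A banned user who appears to have 0 friends is a dead end for a walk, so the degree is worth checking before stepping to a neighbor.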

  8. Miniprojects

     Data: LastFM, the component connected to user 'rj'.

     1) Random node. Use MHRW of length L=50 to select a node uniformly at random from LastFM. Repeat it 100 times. Report the average degree of the selected nodes, and of their neighbors. What changes if L counts only unique nodes in MHRW? Why? What happens if you use RW instead of MHRW?

     2) RW vs RWRW. Run RW in LastFM. What are the average <playcount>, <playlists>, <age>, <id>, and number of friends observed in the sample? How do they change after correcting for the degree bias (RWRW)?

     3) Component size. Based on RW, estimate the size of the component connected to user 'rj'. Use two approaches: [Katzir’11] and [Kurant’13?].

     4) BFS. Collect a BFS sample starting from user 'rj' in LastFM. What node degrees, <playcount>, <playlists>, <age>, <id> do you sample as you collect more nodes? How about implementing it on multiple threads?

     5) Barbarian sampling. Try to download the entire component connected to user 'rj'. You will probably need to use a cluster of machines, multiple threads, etc. Use your own API key, please. Once you have it, report basic properties: size, average degree, degree distribution, etc. (e.g., average <age>?). Compare with others.
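
To make the three samplers named above concrete, here is a minimal Python 3 sketch on a toy graph, using their standard textbook definitions: RW moves to a uniformly chosen neighbor, MHRW accepts a proposed move from u to v with probability min(1, deg(u)/deg(v)) and otherwise stays put, and RWRW reweights RW samples by 1/degree. The graph and all function names are assumptions made for illustration; on LastFM the adjacency lists would come from the API instead.

```python
import random

# Toy undirected graph (illustration only, not LastFM data):
# a star with hub "rj" and four leaves, stored as adjacency lists.
GRAPH = {"rj": ["a", "b", "c", "d"],
         "a": ["rj"], "b": ["rj"], "c": ["rj"], "d": ["rj"]}

def rw_step(graph, u, rng):
    """Simple random walk: move to a uniformly chosen neighbor."""
    return rng.choice(graph[u])

def mhrw_step(graph, u, rng):
    """Metropolis-Hastings RW: propose a neighbor v, accept with
    probability min(1, deg(u) / deg(v)); otherwise stay at u."""
    v = rng.choice(graph[u])
    if rng.random() < min(1.0, len(graph[u]) / len(graph[v])):
        return v
    return u

def walk(graph, start, length, step, rng):
    """Return the list of visited nodes: start plus `length` steps."""
    nodes = [start]
    for _ in range(length):
        nodes.append(step(graph, nodes[-1], rng))
    return nodes

def rwrw_mean(samples, degree, value):
    """Re-weighted RW estimate of the mean of value(x): weight each
    RW sample by 1/degree(x) to undo the bias toward high degrees."""
    weights = [1.0 / degree(x) for x in samples]
    return sum(w * value(x) for w, x in zip(weights, samples)) / sum(weights)

if __name__ == "__main__":
    rng = random.Random(1)
    deg = lambda x: len(GRAPH[x])
    # Project 1 pattern: 100 independent MHRWs of length 50, keep the last node.
    picked = [walk(GRAPH, "rj", 50, mhrw_step, rng)[-1] for _ in range(100)]
    print(sum(map(deg, picked)) / len(picked))
    # Project 2 pattern: one RW, raw average degree vs. degree-corrected one.
    rw = walk(GRAPH, "rj", 999, rw_step, rng)
    print(sum(map(deg, rw)) / len(rw), rwrw_mean(rw, deg, deg))
```

On this star the true average degree is 1.6, while a plain RW spends half of its time at the hub and therefore overestimates it. The RWRW reweighting removes that bias.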
