Measuring Serendipity: Connecting People, Locations and Interests in a Mobile 3G Network • Ionut Trestian, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci • Northwestern University (http://networks.cs.northwestern.edu) • Narus Inc. (http://www.narus.com)
Online Social Networks • Social network websites are among the most popular websites on the Internet
Mobage Town • Japan-based mobile social network • 11 million users • Allows users to send messages, chat in communities, exchange music, read pocket novels, write blogs, play games, etc.
Loopt • Allows contacts to see one another’s location on their mobile phones and to share information • Available for Sprint, Verizon, AT&T, and T-Mobile on devices such as BlackBerry, iPhone and gPhone
Other Location-Based Services • Sharing your location with friends (BuddyBeacon for iPhone) • Location-based searches (EarthComber) • Notifications about places and events around you (LightPole) • Tagging locations (Metosphere)
Research Questions • How likely are we to meet, in our daily lives, people who share common interests in the cyber domain? • What is the relationship between mobility properties, location, and application affiliation in the cyber domain? • Dataset: 3,162,818 packet data sessions generated by 281,394 clients at 1,196 locations (base stations) across a large metropolitan area
Extracting Human Movement • [Diagram: Base Station 1, Base Station 2, and the RADIUS server; the RADA Start, RADA Update, and RADA Stop messages each contain a BSID] • 1. Intra-session movement • 2. Inter-session movement • Note that we have only a sampled view of human movement. How well can we do?
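The two kinds of movement named on this slide can be illustrated with a minimal Python sketch. It assumes each Start/Update/Stop message has been reduced to a hypothetical tuple `(user, session_id, timestamp, bsid)`; that layout, the function name, and the sample records are illustrative assumptions, not the format used in the study.

```python
from collections import defaultdict

def extract_movements(records):
    """Split observed base-station changes into intra- and inter-session moves.

    records: iterable of (user, session_id, timestamp, bsid) tuples, one per
    Start/Update/Stop message (a hypothetical layout, assumed for illustration).
    Returns two lists of (user, from_bsid, to_bsid).
    """
    by_user = defaultdict(list)
    for user, session, ts, bsid in records:
        by_user[user].append((ts, session, bsid))

    intra, inter = [], []
    for user, events in by_user.items():
        events.sort()  # chronological order per user
        for (_, prev_session, prev_bsid), (_, session, bsid) in zip(events, events[1:]):
            if bsid == prev_bsid:
                continue  # no base-station change observed
            if session == prev_session:
                intra.append((user, prev_bsid, bsid))  # moved while the session was open
            else:
                inter.append((user, prev_bsid, bsid))  # moved between two sessions
    return intra, inter

# Made-up sample data
sample = [
    ("u1", "s1", 0, "BS1"),    # Start
    ("u1", "s1", 60, "BS2"),   # Update: base station changed during the session
    ("u1", "s1", 120, "BS2"),  # Stop
    ("u1", "s2", 900, "BS3"),  # Start of a later session at another base station
]
intra, inter = extract_movements(sample)
```

Because a position is only visible when a message is sent, any movement between two messages is unobserved, which is the sampling limitation the slide points out.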
Extracting Human Movement • Despite the sampled observations, we can still characterize user movement well • The ordering of the curves reflects the time span: a larger time span can accommodate larger travel distances • Most human movement is over short distances
Extracting Application Interest • Keyword-based URL mining • Social networking website: http://www.facebook.com • Music download website: http://www.mp3.com • Dating website: http://www.singlesnet.com
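Keyword-based URL mining can be sketched in a few lines of Python. The three categories and example URLs come from the slide; the keyword lists and the `classify_url` helper are assumptions made for illustration, not the keywords used in the study.

```python
from urllib.parse import urlparse

# Hypothetical keyword lists: the categories are from the slide, the keywords are not.
CATEGORY_KEYWORDS = {
    "social networking": ["facebook", "myspace"],
    "music": ["mp3", "music"],
    "dating": ["singles", "dating"],
}

def classify_url(url):
    """Return the first category whose keyword appears in the URL's host name, else None."""
    host = urlparse(url).netloc.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in host for keyword in keywords):
            return category
    return None

labels = {
    url: classify_url(url)
    for url in (
        "http://www.facebook.com",
        "http://www.mp3.com",
        "http://www.singlesnet.com",
    )
}
```

Substring matching on the host name is deliberately crude. A real classifier would need to handle ambiguous keywords and probably inspect the URL path as well.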
Rule Mining • Rule notation: (A, B, w, δ), from Location A to Location B • Rule support: number of people present at A • Rule confidence: number of people that move from A to B • Rule confidence probability: confidence/support
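The three rule metrics can be made concrete with a small Python sketch. The slide does not define w and δ, so the sketch assumes that w is a time window in which users are observed at A and δ is the maximum delay before they appear at B. Both readings, the `(user, time, location)` record layout, and the sample data are assumptions.

```python
def evaluate_rule(sightings, a, b, window, delta):
    """Compute (support, confidence, confidence probability) for a rule A -> B.

    sightings: list of (user, time, location) tuples (hypothetical layout).
    window:    (start, end) interval in which users must be seen at A
               (assumed meaning of w).
    delta:     maximum delay between being seen at A and at B
               (assumed meaning of δ).
    """
    start, end = window
    # Earliest time each user is seen at A inside the window
    first_at_a = {}
    for user, t, loc in sightings:
        if loc == a and start <= t < end:
            first_at_a[user] = min(t, first_at_a.get(user, t))
    support = len(first_at_a)

    # Users later seen at B within delta of their first sighting at A
    movers = {
        user
        for user, t, loc in sightings
        if loc == b and user in first_at_a and 0 < t - first_at_a[user] <= delta
    }
    confidence = len(movers)
    probability = confidence / support if support else 0.0
    return support, confidence, probability

# Made-up sample data
sightings = [
    ("u1", 10, "A"), ("u1", 20, "B"),   # moves A -> B within delta
    ("u2", 15, "A"), ("u2", 30, "C"),   # leaves A for another location
    ("u3", 50, "A"), ("u3", 200, "B"),  # reaches B, but too late
    ("u4", 12, "B"),                    # never seen at A
]
support, confidence, probability = evaluate_rule(sightings, "A", "B", (0, 60), 30)
```

In this sample three users are present at A and one of them moves to B in time, so the confidence probability is 1/3.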
Rule Statistics • The number of active users increases at commute hours (8AM and 5PM) • Movement rules are more active during daytime and less active during the weekend • [Plot: total confidence of rules]
Location Rank – Application Accesses • Music downloads: anti-correlated with mobility span • Mail: correlated with mobility span • Social networking: dominates the medium mobility range
Location Ranking • Comfort zone: all users spend most of their time in their top 3 locations
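A minimal Python sketch of location ranking is shown here. It assumes time spent is available as hypothetical `(user, location, duration)` visit records; how the study measured time per location is not stated on the slide, so the input layout and sample values are illustrative only.

```python
from collections import defaultdict

def comfort_zone(visits, top_n=3):
    """Rank each user's locations by time spent and report the top-N share.

    visits: iterable of (user, location, duration) tuples (hypothetical layout).
    Returns {user: (top_locations, share_of_time_in_top_locations)}.
    """
    time_at = defaultdict(lambda: defaultdict(float))
    for user, loc, duration in visits:
        time_at[user][loc] += duration

    result = {}
    for user, per_loc in time_at.items():
        # Rank by descending time; break ties by location name for determinism
        ranked = sorted(per_loc.items(), key=lambda kv: (-kv[1], kv[0]))
        top = ranked[:top_n]
        total = sum(per_loc.values())
        share = sum(d for _, d in top) / total if total else 0.0
        result[user] = ([loc for loc, _ in top], share)
    return result

# Made-up sample data
visits = [
    ("u1", "home", 600), ("u1", "work", 300), ("u1", "gym", 60),
    ("u1", "cafe", 30), ("u1", "mall", 10),
]
zones = comfort_zone(visits)
```

For the sample user, the top 3 locations account for 960 of 1000 time units.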
Location Rank – Application Accesses • Music downloads, Dating, and Trading are heavily accessed in the comfort zone • Social networking, News, and Mail tend to be accessed outside it too • Note that Dating is accessed more in the comfort zone
Hotspots • Via rule mining we detect highly active locations • We identify 4 types of such locations: • Noon hotspots (28 locations): highly active during noon hours • Night hotspots (62 locations): highly active during night hours • Day-office hotspots (23 locations): highly active during day hours • Evening hotspots (8 locations): highly active during the evening
Biased Application Access at Hotspots • Despite a similar userbase at hotspots during the seven-day interval, application accesses are highly skewed towards certain applications • [Plots: normalized user affiliation; application accesses at hotspots]
Application Access - Time of Day • However, the bias in application access is not entirely due to an illusive “time of day” effect! • [Plots: application accesses at non-hotspots; application accesses at non-hotspot times]
Regional Analysis – Spectral Clustering • Using spectral clustering we: • Cluster locations as belonging to regions • Cluster users as belonging to regions • Spectral clustering doesn’t make any assumptions about the shape of the clusters (as opposed to k-means)
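The core idea of spectral clustering can be sketched in pure Python as a two-way split. The sketch below is a simplified illustration, not the procedure used in the study: it assumes a symmetric affinity matrix between locations (for example, how many users move between two base stations, which is an assumption), builds the unnormalized graph Laplacian L = D - W, approximates its second-smallest eigenvector (the Fiedler vector) by power iteration, and splits locations by sign. Full spectral clustering typically uses several eigenvectors followed by a step such as k-means on the embedding.

```python
import math

def fiedler_vector(weights, iterations=500):
    """Approximate the Fiedler vector of L = D - W by shifted power iteration."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2.0 * max(degrees)  # upper bound on the largest eigenvalue of L
    # Deterministic, non-constant start vector with zero mean
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iterations):
        # Multiply by (shift * I - L) = shift * I - D + W
        nxt = [
            (shift - degrees[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(nxt) / n
        nxt = [x - mean for x in nxt]  # remove the constant eigenvector
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        v = [x / norm for x in nxt]
    return v

def bipartition(weights):
    """Assign each node to cluster 0 or 1 by the sign of its Fiedler component."""
    return [0 if x >= 0 else 1 for x in fiedler_vector(weights)]

# Made-up affinity matrix: two tightly connected groups of three locations,
# joined by one weak link between locations 2 and 3.
W = [[0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 10), (0, 2, 10), (1, 2, 10),
                (3, 4, 10), (3, 5, 10), (4, 5, 10), (2, 3, 1)]:
    W[i][j] = W[j][i] = w
labels = bipartition(W)
```

The split depends only on the connectivity encoded in the affinity matrix, not on geometric distance to a cluster center, which is why the method does not presuppose a cluster shape.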
Regional Analysis – Research Issues • Two relevant issues for location-based services: • Time-independent interactions (useful for tagging services): parts of user trajectories overlap irrespective of the time of the movement • Time-dependent interactions: same location, same time • Questions: • How many distinct people with the same interests do we meet? • Strongly dependent on userbase (the probability of meeting people is higher in clusters with a bigger userbase) • How often do we meet people?
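The difference between the two interaction types can be illustrated with a short Python sketch. It assumes hypothetical `(user, time, location)` sightings and treats "same time" as falling into the same fixed-length time slot; the one-hour slot, the function name, and the sample data are assumptions, not values from the study.

```python
from collections import defaultdict
from itertools import combinations

def count_interactions(sightings, slot_seconds=3600):
    """Return (time_independent, time_dependent) sets of interacting user pairs.

    sightings:    iterable of (user, time, location) tuples (hypothetical layout).
    slot_seconds: slot length used to decide "same time" (an assumed value).
    """
    users_at_loc = defaultdict(set)       # location -> users ever seen there
    users_at_loc_slot = defaultdict(set)  # (location, slot) -> users seen there then
    for user, t, loc in sightings:
        users_at_loc[loc].add(user)
        users_at_loc_slot[(loc, t // slot_seconds)].add(user)

    def pairs(groups):
        found = set()
        for users in groups.values():
            found.update(combinations(sorted(users), 2))
        return found

    return pairs(users_at_loc), pairs(users_at_loc_slot)

# Made-up sample data
sightings = [
    ("u1", 100, "A"), ("u2", 200, "A"), ("u3", 8000, "A"),
    ("u1", 4000, "B"), ("u3", 5000, "B"),
]
independent, dependent = count_interactions(sightings)
```

To restrict either count to people with the same interests, the sightings could first be filtered to users sharing an application category.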
Time-Independent Interactions • Cluster 1 has a higher number of interactions per location, mainly because of its larger hotspot density • For night hotspots: 27/162 ≈ 0.17 (Cluster 1) > 26/257 ≈ 0.10 (Cluster 4)
Who Will Win the Interaction Race? • Mobile users clearly win the interaction race • However, it pays off to spend time in popular locations
Conclusions • First study at such a large scale aimed at correlating mobility, location, and application usage • Provided new insights from the user perspective, the location perspective, and the provider perspective that show the enormous potential of location-based services