
Characterizing Instant Messaging Traffic in an Enterprise Network

2006 autumn intern presentation: Characterizing Instant Messaging Traffic in an Enterprise Network. Lei Guo, the Ohio State University. Mentor: Zhen Xiao; manager: John Tracey.

Presentation Transcript


  1. 2006 autumn intern presentation: Characterizing Instant Messaging Traffic in an Enterprise Network. Lei Guo, the Ohio State University. Mentor: Zhen Xiao; manager: John Tracey

  2. Instant messaging • Quick response • User presence service • Interactive communication • Multitasking • Private chat • Enterprise cooperation • Peak online users: Skype 7 M, QQ 20 M • Users: AIM 53 M, MSN 29 M, Jabber 13.5 M, SameTime 15 M

  3. Challenges of IM measurements • No large-scale measurement study has characterized IM traffic so far • No server logs are available, in contrast to Web and streaming media servers • Online packet analysis is difficult • User privacy concerns

  4. Our Objective and Methodology • First large-scale IM traffic measurement, supporting: IM system design and optimization; an experimental basis for IM workload generation; security in IM networks • Online IM traffic parser that protects privacy-related user information: packet-level workloads of AIM and MSN Messenger (identified by port number); packet headers of Yahoo and GTalk/Jabber (identified by port number) • Nearly one month of collection in a large enterprise network with thousands of employees • More than 20,000 user conversations by 469 AIM users and 408 MSN users

  5. IM Sniffer • Protocols: MSNP; AIM protocol (Classic: OSCAR; Triton: new, N/A, 10% AIM traffic) • Components (diagram): network interface, OS kernel, pcap library, Ethernet packets, IP packets, online packet reconstructor, IM packets, AIM packet parser, MSN packet parser, protect user privacy information, dump data (pcap format file), offline analysis • Privacy example: "lguo@us.ibm.com: hello, how are you doing" is passed through an MD5 hash and stored as "4d347c1b: e51c49a1043fc"
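
The privacy step on slide 5 can be illustrated with a minimal Python sketch. It assumes the user ID and the message text are hashed separately with MD5 and written as full hex digests; the function names `md5_hex` and `anonymize_record` are hypothetical, and the slide does not say how the shorter digests in its example were derived, so this is not necessarily the sniffer's actual procedure.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def anonymize_record(user_id: str, message: str) -> str:
    """Replace the user ID and the message body with their MD5 digests.

    Assumption: both fields are hashed independently, so the same user
    always maps to the same token and per-user statistics still work.
    """
    return f"{md5_hex(user_id)}: {md5_hex(message)}"


record = anonymize_record("lguo@us.ibm.com", "hello, how are you doing")
print(record)
```

Because the mapping is deterministic, conversations by the same user can still be grouped during offline analysis without storing the original address or text.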

  6. Instant messaging in AIM • Functions: authentication, redirection, user-to-user chat, multi-user chat, P2P communication (voice/video chat, file transfer, …) • Servers (diagram): authentication server, BOS servers, chat room server, buddy icon server, email server, other services

  7. Instant messaging in MSN Messenger • Servers (diagram): dispatch server, notification servers, switchboard server, MSN passport server, email server, other services • P2P voice/video chat, file transfer, …

  8. Outline • Overview of IM traffic • Online activity of IM users • Characterizing IM servers • Analysis of IM traffic • Conclusion

  9. Overview of IM traffic • Figures: traffic volume (MB); number of packets with TCP payload (×10^6) • For most IM systems, a client receives much more traffic from IM servers than it sends

  10. IM servers in our workloads • Figures: total number of server IPs collected; cumulative number of server IPs collected over time • The number of IM servers is very large

  11. IM TCP connections • Figures: number of TCP requests; failed TCP requests (%) • The percentage of failed TCP requests is non-trivial

  12. IM traffic rate • Figures: IM traffic rate sampled per minute (8.9 Kbps on average); IM traffic rate sampled per hour • IM traffic is highly bursty, with many spikes
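
A per-minute or per-hour traffic rate like the one plotted on slide 12 can be computed by binning packet sizes by timestamp. The sketch below is illustrative only: the `(timestamp_seconds, size_bytes)` input format and the function name `traffic_rate_kbps` are assumptions, and 1 Kbps is taken as 1000 bits per second.

```python
from collections import defaultdict


def traffic_rate_kbps(packets, bin_seconds=60):
    """Average traffic rate per time bin.

    packets: iterable of (timestamp_seconds, size_bytes) pairs (assumed format).
    Returns a dict mapping each bin's start time to its rate in Kbps.
    Bins with no packets are omitted.
    """
    bytes_per_bin = defaultdict(int)
    for timestamp, size in packets:
        bin_start = int(timestamp // bin_seconds) * bin_seconds
        bytes_per_bin[bin_start] += size
    return {
        bin_start: total_bytes * 8 / bin_seconds / 1000
        for bin_start, total_bytes in sorted(bytes_per_bin.items())
    }


# Three hypothetical packets: two in the first minute, one in the second.
rates = traffic_rate_kbps([(0, 1500), (30, 1500), (61, 750)])
print(rates)
```

Calling the same function with `bin_seconds=3600` gives the hourly view; a short burst that dominates a one-minute bin is averaged down in the hourly bin, which is why the two plots can look so different.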

  13. IM traffic rate • Figures: hourly traffic rate of AIM; hourly traffic rate of MSN • Each spike is due to a very limited number of TCP connections (typically one or two) carrying voice/video chat or file transfers

  14. IM traffic rate • Figures: hourly traffic rate of Yahoo; hourly traffic rate of GTalk • The GTalk traffic rate has clear diurnal and weekly patterns, because voice/video chat and file transfers are used less

  15. Summary of IM traffic overview • A client receives much more traffic from IM servers than it sends (Yahoo is an exception) • A large number of servers are used for IM services • The failure ratio of IM TCP connections is non-trivial • IM traffic is highly bursty due to voice/video chat and file transfers

  16. Outline • Overview of IM traffic • Online activity of IM users • Characterizing IM servers • Analysis of IM traffic • Conclusion

  17. Online session and chat conversation: AIM • Online session duration: from login time to logout/disconnect time, measured as the duration of the TCP connection to the BOS server • Conversation: all messages are forwarded by the BOS server and interleaved in a single TCP connection; a 5-minute threshold on message inter-arrival time is used to identify a conversation • Diagram: users A, B, and C connected through the BOS server; a gap of more than 5 minutes separates conversations AB1 and AB2, while AC1 is a separate conversation between A and C
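
The 5-minute rule on slide 17 can be sketched as follows. This is a minimal illustration, assuming messages are available as `(timestamp_seconds, sender, receiver)` tuples and that a conversation is defined per unordered user pair; the function name `split_conversations` is hypothetical.

```python
from collections import defaultdict

GAP_SECONDS = 5 * 60  # inter-arrival threshold from the slide


def split_conversations(messages, gap=GAP_SECONDS):
    """Group interleaved messages into conversations per user pair.

    messages: iterable of (timestamp_seconds, sender, receiver) (assumed format).
    Returns a dict mapping frozenset({user1, user2}) to a list of
    conversations, each a list of message timestamps.
    """
    by_pair = defaultdict(list)
    for timestamp, sender, receiver in messages:
        by_pair[frozenset((sender, receiver))].append(timestamp)

    conversations = {}
    for pair, stamps in by_pair.items():
        stamps.sort()
        groups = [[stamps[0]]]
        for timestamp in stamps[1:]:
            if timestamp - groups[-1][-1] > gap:
                groups.append([timestamp])  # silence > 5 min: new conversation
            else:
                groups[-1].append(timestamp)
        conversations[pair] = groups
    return conversations


# Mirrors the slide's diagram: A-B talk twice (AB1, AB2), A-C once (AC1).
result = split_conversations([
    (0, "A", "B"), (30, "A", "C"), (60, "B", "A"), (500, "A", "B"),
])
print(len(result[frozenset("AB")]), len(result[frozenset("AC")]))
```

Keying on an unordered pair means a reply from B to A continues the same conversation that A started.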

  18. Online session and chat conversation: MSN • Online session duration: from login time to logout/disconnect time, measured as the duration of the TCP connection to the notification server • Conversation: each conversation is forwarded by a new switchboard server; the connection is dropped automatically if idle for more than 5 minutes; conversations without chat messages are removed • Diagram: switchboard server, notification server

  19. Online activity of AIM users • Figures: number of online users (annotated: 120 users); number of simultaneous chat conversations (annotated: 12 chat conversations) • Clear diurnal and weekly patterns; peak time at about 2:00 PM • # of chat conversations << # of online users

  20. Online activity of MSN users • Figures: number of online users (annotated: 90 users); number of simultaneous chat conversations (annotated: 14 chat conversations) • Clear diurnal and weekly patterns; peak time at about 2:00 PM (lunch break) • # of chat conversations << # of online users

  21. Number of conversations per user • Users are idle most of the time • Few users chat with two buddies simultaneously • Average: 0.058 (AIM), 0.075 (MSN)

  22. Distribution of user online duration • A Weibull distribution has been reported by a P2P study (IMC 2006) • Complementary cumulative distribution: $P(X > x) = \exp[-(x/x_0)^c]$ • Taking logarithms twice: $\log(-\log P) = \log[(x/x_0)^c] = c \log x - c \log x_0$, which is a straight line in $\log x$ • Figures (AIM, MSN): the data do not fit this straight line well
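
The straight-line test on slide 22 can be tried numerically. The sketch below is an assumption-laden illustration, not the authors' fitting procedure: it estimates P(X > x) from ranks, applies the log(-log P) transform, and fits a least-squares line whose slope estimates c and whose intercept gives x0. The function name `weibull_plot_fit` and the synthetic data are hypothetical.

```python
import math
import random


def weibull_plot_fit(samples):
    """Fit log(-log P(X > x)) = c*log(x) - c*log(x0) by least squares.

    Returns (c, x0, r_squared). An r_squared well below 1 suggests the
    data do not follow a Weibull distribution.
    """
    xs = sorted(s for s in samples if s > 0)
    n = len(xs)
    u, v = [], []
    for i, x in enumerate(xs):
        p = 1.0 - (i + 0.5) / n  # empirical P(X > x), strictly inside (0, 1)
        u.append(math.log(x))
        v.append(math.log(-math.log(p)))
    mean_u, mean_v = sum(u) / n, sum(v) / n
    sxx = sum((a - mean_u) ** 2 for a in u)
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_u
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(u, v))
    ss_tot = sum((b - mean_v) ** 2 for b in v)
    return slope, math.exp(-intercept / slope), 1.0 - ss_res / ss_tot


# Synthetic Weibull data (scale x0 = 100, shape c = 0.7) as a sanity check.
rng = random.Random(0)
data = [rng.weibullvariate(100.0, 0.7) for _ in range(5000)]
c_hat, x0_hat, r2 = weibull_plot_fit(data)
print(c_hat, x0_hat, r2)
```

On genuinely Weibull data the transformed points fall close to a line; a bimodal sample, such as session durations split around a long-session mode, would bend away from it.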

  23. Online duration of IM user sessions • Figures: CCDF and CDF of session duration • Two-mode distribution • 10 hours is the divide between long and short online durations

  24. Online activity of AIM users • Login events: peak time at about 9:00 AM • Logout events: peak time at about 5:00 PM

  25. Online activity of MSN users • Login events: peak time at about 9:00 AM • Logout events: peak time at about 5:00 PM

  26. The 10-hour divide of online duration • Figures: login and logout events for AIM and MSN, with the 10-hour mark annotated • Online time of roughly 10 hours: some employees work longer than 8 hours • Online time longer than 10 hours: users do not turn off their computers when leaving work

  27. Number of online days • Figures: AIM, MSN • Not a heavy-tailed distribution; shows user activity from another perspective • Inactive users: online occasionally • Active users: online every weekday • Random users: online sporadically

  28. Summary of user online activity • The numbers of online users and of simultaneous chat conversations have clear diurnal and weekly patterns • Users are idle for most of their online time: # of chat conversations << # of online users • User online duration does not follow a Weibull distribution • Most user sessions: login and logout events are closely tied to working hours • Long user sessions (> 10 hours): users do not turn off their computers when they leave work • Two-mode online duration distribution • Users can be classified into three categories based on their online days: actively online, inactively online, and sporadically online

  29. Outline • Overview of IM traffic • Online activity of IM users • Characterizing IM servers • Analysis of IM traffic • Conclusion

  30. Characterizing IM servers • Server response time measurement (diagram: IM client, sniffer, IM server, with CRT, SRT, and RTT marked) • Purpose: a first step toward understanding the server load from the client side • CRT: client-perceived response time • SRT: server response time of MSNP commands • RTT: packet round-trip time (obtained from the TCP handshake)
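
The slide names the three quantities but not the formula relating them. One plausible reading, assuming the sniffer sits close to the client, is that the RTT is the delay between the SYN and the SYN-ACK, the CRT is the delay between a command and its response, and the server's own processing time is roughly CRT minus RTT. The sketch below encodes that assumption; the `ConnectionTimes` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConnectionTimes:
    """Timestamps (seconds) observed by the sniffer for one TCP connection."""
    syn: float            # client SYN seen
    syn_ack: float        # server SYN-ACK seen
    command_sent: float   # first MSNP command seen leaving the client
    response_seen: float  # matching server response seen


def response_times(t: ConnectionTimes):
    """Return (crt, rtt, srt) in seconds.

    Assumption: srt = crt - rtt, i.e. the client-perceived delay minus
    the network round trip measured during the TCP handshake.
    """
    rtt = t.syn_ack - t.syn
    crt = t.response_seen - t.command_sent
    srt = max(crt - rtt, 0.0)  # clamp in case of timestamp jitter
    return crt, rtt, srt


crt, rtt, srt = response_times(ConnectionTimes(0.0, 0.05, 1.0, 1.25))
print(crt, rtt, srt)
```

Under this reading, a large SRT with a small RTT points at server load rather than network delay.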

  31. MSN server response time • Figures: dispatch server, notification server, switchboard server • Response time is measured for the first MSNP command of a TCP connection, when the RTT is still accurate • Reflects the server load • Some commands receive a response only after a long latency

  32. Outline • Overview of IM traffic • Online activity of IM users • Characterizing IM servers • Analysis of IM traffic • Conclusion

  33. Message level analysis of IM traffic • Figures: AIM, MSN • Inbound traffic >> outbound traffic • # of msgs: chat < hint < presence (AIM hint msgs are small because OSCAR is a binary protocol) • MSN has more binary msgs, used for user icons and voice/video chats

  34. Size of chat messages • Figures: CDF (semi-log scale); CCDF (log-log scale) • AIM: messages are in HTML format (not extracted online) • MSN: the format is described in the message header and is easy to remove • MSN: 90% of messages are smaller than 50 bytes

  35. Number of messages in a user conversation • Figures: CDF (semi-log scale; annotations: 90%, 25, 40); CCDF (Weibull scale) • Most conversations have a small number of messages • The number of msgs in a conversation does not follow a power law; it approximately follows a Weibull distribution

  36. Number of messages in a user conversation • Figures: Weibull fitting results for AIM and MSN

  37. Number of conversations by a user • Figures: CDF (semi-log scale); CCDF (Weibull scale) • Most users have a small number of conversations • The number of conversations per user does not follow a power law; it approximately follows a Weibull distribution

  38. Distribution of MSN conversation duration • Most conversations are short (figure annotation: < 200 sec) • The MSN client disconnects from the switchboard (SB) server after a long idle time

  39. IM social network: number of users contacted • Figures: rank (log-log scale); CCDF (Weibull scale) • Users in buddy list: contact list packets may be lost or may not be completely parsed by the IM sniffer • Users chatted with: IM spammers • MSN: Weibull; AIM: a little rough

  40. Number of buddies an IM user chats with • A user contacts only a small portion of the buddies in their contact list • MSN: a user chats with 5.5 buddies (about 25%) on average • AIM: a user chats with 1.9 buddies (about 7%) on average • Are MSN users more active? Not sure: AIM Triton users are not counted

  41. Concluding remarks • IM sniffer and measurement • Packet level • User privacy protection • IM traffic characterization • Diurnal and weekly patterns of IM traffic • The traffic volume a client receives is much greater than it sends • Chat msgs only account for a small percentage of total msgs • Online activity of IM users • Messages in conversations: Weibull • Conversations of users: Weibull • Social network: Weibull roughly

  42. Future work • Implement the IM sniffer in the Linux kernel, for heavy workload collection • Larger-scale measurement at Cornell University: a larger user population, dominated by students • Collect SameTime workload on the server side, to understand IM servers better • How IM is used in work cooperation: a global map of the IM user social network
