Reliable, Scalable and Interoperable Internet Telephony

First practice talk for thesis defense by Kundan Singh
Advisor: Henning Schulzrinne
Computer Science Department, Columbia University, New York
March 28, 2006

Outline of the presentation (46 slides)
• Introduction
  • What is the problem? Why is it important?
  • My contributions
• Server redundancy
  • Load sharing and failover in SIP telephony
  • Comparison of thread models for SIP servers
• Peer-to-peer (P2P)
  • SIP servers using an external P2P network
  • Additionally, P2P maintenance using SIP
• Enterprise IP telephony
  • Multi-platform collaboration using SIP
  • Scalable centralized conferencing architecture
  • Interworking between SIP/SDP and H.323
• Conclusion

Telephone reliability & scalability (PSTN: Public Switched Telephone Network)
[Figure: PSTN hierarchy with the SS7 signaling network and the "bearer" network; telephone switches (SSP) connect through signaling routers (STP) to databases (SCP).]
• Local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
• Regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
• Signaling router (STP): 1 million customers, 1.5 million calls/hour
• Database (SCP) for freephone, calling card, …: 10 million customers, 2 million lookups/hour
• Switches strive for 99.999% availability. Lucent's 5E-XC supports 4 million BHCA (busy hour call attempts).
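To make the 99.999% target concrete, the sketch below converts an availability fraction into expected downtime per year. `downtime_minutes_per_year` is a hypothetical helper, and the 365-day year is an assumption.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
        print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
```

With "five nines", a switch may be down for only about 5.3 minutes per year.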

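Internet telephony (SIP: Session Initiation Protocol)
[Figure: bob@example.com sends REGISTER to the example.com server, which keeps the binding in a DB; alice@yahoo.com sends an INVITE through yahoo.com, which uses DNS to locate example.com, and the INVITE is forwarded to Bob. Addresses shown: 129.1.2.3, 192.1.2.4.]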
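The registrar role in the figure can be pictured as a table of bindings from an address-of-record to contact URIs, each with an expiry time: REGISTER adds or refreshes a binding and an INVITE looks it up. This is a minimal, hypothetical in-memory sketch (the figure's server keeps bindings in a DB), and the contact address used below is illustrative.

```python
import time


class LocationService:
    """Hypothetical in-memory SIP location service: address-of-record -> contacts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._bindings: dict[str, dict[str, float]] = {}  # aor -> {contact: expiry}

    def register(self, aor: str, contact: str, expires: int = 3600) -> None:
        """REGISTER: add or refresh a binding; Expires: 0 removes it."""
        contacts = self._bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = self._clock() + expires

    def lookup(self, aor: str) -> list[str]:
        """INVITE: return the unexpired contacts for the address-of-record."""
        now = self._clock()
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)
```

Bindings that are not refreshed before they expire simply stop being returned, which is why phones re-REGISTER periodically.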

What are the problems?
• Can a SIP server provide carrier-grade reliability and scalability using commodity hardware?
• What affects the server performance?
• How can we build a server-less, self-organizing peer-to-peer VoIP network?
• Can this be done in a standards-compliant way?
• Can communication be extended to multi-platform collaboration using existing protocols?
• How well does multi-party conferencing scale?
• How can SIP and H.323 interoperate?

My contributions
• Server redundancy
  • Implemented failover using database replication
  • Two-stage architecture for SIP load sharing
  • Comparison of thread models for SIP servers
• Peer-to-peer (P2P)
  • SIP servers using an external P2P network
  • Additionally, P2P maintenance using SIP
• Enterprise IP telephony
  • Multi-platform collaboration using SIP
  • Scalable centralized conferencing architecture
  • Interworking between SIP/SDP and H.323


Server redundancy: the problem is failure or overload
• Options: replicate the registration, or search on call
[Figure: several servers, each receiving REGISTER and INVITE requests.]

High availability: failover implementation in our test-bed, CINEMA
[Figure: proxies P1 (phone.cs.columbia.edu) and P2 (sip2.cs.columbia.edu), each with a MySQL database (D1, D2) replicated master/slave in both directions; web scripts update the databases.]
• DNS: _sip._udp SRV 0 0 5060 phone.cs.columbia.edu and SRV 1 0 5060 sip2.cs.columbia.edu
• Phone configuration: proxy1 = phone.cs, backup = sip2.cs; the phone sends the INVITE to P2 either on an ICMP error or after 10 s.
• MySQL has no locking protocol between master and slave, so there is a race if an entry is inserted into D1 and removed from D2.
• sipd has an in-memory cache: REGISTER refreshes arrive well before expiry; the web interface gets delayed data; this is not an issue for Cisco phones.
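The SRV records above drive the failover: a client tries the lowest priority value first and uses weights to spread load among records with equal priority. The sketch below orders targets following the selection procedure described in RFC 2782. `order_srv` is a hypothetical helper, and the retry policy (moving on after an ICMP error or the 10 s timeout) is left to the caller.

```python
import random
from itertools import groupby

# An SRV record as (priority, weight, port, target), as in "SRV 0 0 5060 phone.cs.columbia.edu".
Srv = tuple[int, int, int, str]


def order_srv(records: list[Srv], rng=random) -> list[Srv]:
    """Order SRV targets for trial: lowest priority first, weighted-random within a priority."""
    ordered: list[Srv] = []
    by_priority = sorted(records, key=lambda rec: rec[0])
    for _priority, group in groupby(by_priority, key=lambda rec: rec[0]):
        pending = list(group)
        while pending:
            # RFC 2782: zero-weight records go first so they keep a small chance of selection.
            pending.sort(key=lambda rec: rec[1] != 0)
            total = sum(rec[1] for rec in pending)
            pick = rng.randint(0, total)  # uniform over 0..total, inclusive
            running = 0
            for rec in pending:
                running += rec[1]
                if running >= pick:
                    ordered.append(rec)
                    pending.remove(rec)
                    break
    return ordered
```

With the test-bed records, phone.cs.columbia.edu (priority 0) is always tried before sip2.cs.columbia.edu (priority 1).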

High availability: analysis
[Figure: timing diagrams for caller and callee with DNS, proxies P1 and P2, and databases D1 and D2 in master/slave replication; the intervals T_R, T_r, T_c and T_d are marked.]
• System reliability with two redundant servers: $1-(1-R)^2$
• Call setup latency: $T_R\,(1-R)\,P[t_M < T_D]$, where $T_D$ is the DNS TTL, $t_M$ is the time-to-repair with mean $T_M$, and $P[t_M < T_D] = 1 - e^{-T_D/T_M}$
• User unavailability: none (registration refresh; double registration)
• For a first-time registration, the probability that the server goes down before replication is $1 - e^{-(T_d + T_c)/T_F}$, where $T_F$ is the mean time-to-failure
• Redundant servers: tradeoff between reliability and capacity
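The analysis formulas can be evaluated directly. This sketch simply transcribes them; the exponential repair-time assumption is what produces the closed form for $P[t_M < T_D]$, the function names are hypothetical, and times may be in any consistent unit.

```python
import math


def parallel_reliability(r: float, n: int = 2) -> float:
    """Probability that at least one of n independent redundant servers is up: 1 - (1 - R)^n."""
    return 1.0 - (1.0 - r) ** n


def prob_repair_within_ttl(ttl: float, mean_repair: float) -> float:
    """P[t_M < T_D] = 1 - exp(-T_D / T_M), i.e. exponentially distributed repair time."""
    return 1.0 - math.exp(-ttl / mean_repair)


def setup_latency_term(retry: float, r: float, ttl: float, mean_repair: float) -> float:
    """The call-setup latency term T_R * (1 - R) * P[t_M < T_D]."""
    return retry * (1.0 - r) * prob_repair_within_ttl(ttl, mean_repair)


def first_registration_loss(t_d: float, t_c: float, mttf: float) -> float:
    """Probability the server fails before replication: 1 - exp(-(T_d + T_c) / T_F)."""
    return 1.0 - math.exp(-(t_d + t_c) / mttf)
```

For example, two servers that are each 99% reliable give a system reliability of 99.99%.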

Scalability: load sharing with redundant proxies and databases
• REGISTER: write to D1 and D2
• INVITE: read from D1 or D2
• Database write/synchronization traffic becomes the bottleneck
[Figure: proxies P1, P2 and P3 sharing databases D1 and D2.]

Scalability: load sharing by dividing the user space
• Proxy and database on the same host
• Hashing: static vs. dynamic
• The stateless proxy can become overloaded: use many of them
[Figure: P1/D1 serve users a-h, P2/D2 serve i-q, P3/D3 serve r-z.]
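Both ways of dividing the user space can be sketched in a few lines. The letter ranges come from the figure; the SHA-1 modulo hashing is a hypothetical alternative shown for comparison, not necessarily the scheme used in the test-bed.

```python
import hashlib

# Static partition from the figure: first letter of the user name -> proxy/database host.
RANGES = [("a", "h", "P1"), ("i", "q", "P2"), ("r", "z", "P3")]


def range_server(user: str) -> str:
    """Pick the host whose letter range covers the user's first letter."""
    first = user[:1].lower()
    for low, high, server in RANGES:
        if low <= first <= high:
            return server
    raise ValueError(f"no range covers user {user!r}")


def hash_server(user: str, servers: list[str]) -> str:
    """Hash the user name and take it modulo the number of servers."""
    digest = hashlib.sha1(user.lower().encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Letter ranges are easy to configure, but the load follows the distribution of first letters, which is not uniform. Modulo hashing spreads users more evenly, but it moves most users to a different server whenever the number of servers changes. That is the static-versus-dynamic tension noted above.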

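Scalability: comparison of the two designs
[Figure: left, proxies P1, P2, P3 over replicated databases D1, D2; right, P1/D1 serve a-h, P2/D2 serve i-q, P3/D3 serve r-z.]
Notation: D = number of database servers; P = number of proxy servers; N = number of writes (REGISTER); r = #reads/#writes = (INV+REG)/REG; T = write latency; t = read latency/write latency; R_p, R_d = reliability of a proxy server and of a database server.
• Replicated databases (high reliability, low scalability):
  • Total time per DB: $\left(\frac{tr}{D} + 1\right) TN = \frac{A}{D} + B$
  • System reliability: $\left(1-(1-R_p)^P\right)\cdot\left(1-(1-R_d)^D\right)$
• Partitioned user space (high scalability, low reliability):
  • Total time per DB: $\frac{tr+1}{D}\, TN = \frac{A}{D} + \frac{B}{D}$
  • System reliability: $R_0 \cdot R_p^D \cdot R_d^D$
Here A = trTN is the total read time and B = TN is the total write time. With replication every database absorbs all the writes (B); with partitioning the writes are also divided by D.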
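The two designs' formulas can be compared numerically. This is a direct transcription using the notation above; the function names and the sample values in the usage note are hypothetical.

```python
def time_per_db_replicated(t: float, r: float, T: float, N: float, D: int) -> float:
    """Replicated: each DB applies all N writes plus 1/D of the r*N reads: ((t r / D) + 1) T N."""
    return (t * r / D + 1) * T * N


def time_per_db_partitioned(t: float, r: float, T: float, N: float, D: int) -> float:
    """Partitioned: each DB owns 1/D of the users, so 1/D of all work: ((t r + 1) / D) T N."""
    return (t * r + 1) / D * T * N


def reliability_replicated(Rp: float, Rd: float, P: int, D: int) -> float:
    """Up while any proxy and any DB are up: (1 - (1 - Rp)^P) (1 - (1 - Rd)^D)."""
    return (1 - (1 - Rp) ** P) * (1 - (1 - Rd) ** D)


def reliability_partitioned(R0: float, Rp: float, Rd: float, D: int) -> float:
    """As the formula assumes, every partition's proxy and DB must be up: R0 * Rp^D * Rd^D."""
    return R0 * Rp ** D * Rd ** D
```

With D = 1 the two designs cost the same per database. As D grows, the partitioned design's per-database time falls as 1/D while its reliability falls as $R^D$, which is the tradeoff the slide summarizes.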

Scalability (and reliability): two-stage architecture
• I stage: stateless proxies s1, s2 and s3 for example.com
  • example.com: _sip._udp SRV 0 40 s1.example.com, SRV 0 40 s2.example.com, SRV 0 20 s3.example.com, SRV 1 0 ex.backup.com
• II stage: master/slave server pairs, one group per range of users
  • a*@example.com: a.example.com, _sip._udp SRV 0 0 a1.example.com, SRV 1 0 a2.example.com
  • b*@example.com: b.example.com, _sip._udp SRV 0 0 b1.example.com, SRV 1 0 b2.example.com
• Example: the first stage forwards sip:bob@example.com as sip:bob@b.example.com
• Capacity = f(#stateless, #groups)
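The first-stage routing step (sip:bob@example.com becomes sip:bob@b.example.com) can be sketched as a Request-URI rewrite keyed on the user's first letter. The one-letter grouping and `first_stage_route` are assumptions for illustration. URI parameters such as `;transport=udp` are not handled, and such URIs are returned unchanged.

```python
def first_stage_route(request_uri: str, domain: str = "example.com") -> str:
    """Rewrite sip:user@domain to sip:user@<group>.domain, grouping users by first letter."""
    if not request_uri.startswith("sip:"):
        raise ValueError("expected a sip: URI")
    user, _, host = request_uri[len("sip:"):].partition("@")
    if host != domain or not user:
        return request_uri  # not for this domain: leave it unchanged
    group = user[0].lower()
    return f"sip:{user}@{group}.{domain}"
```

DNS then resolves the group domain (for example b.example.com) via its SRV records to the master (b1) and its backup (b2), so the first stage stays stateless.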

Load sharing: performance results (UDP, stateless, no DNS, no memory pool)

I stage (s)   II stage (p)   calls/s
3             3              2800
2             3              2100
2             2              1800
1             2              1050
0             1               900

• With S3P3 (three first-stage and three second-stage servers), 2800 calls/s means about 10 million BHCA (busy hour call attempts).
• A stateful proxy gave similar graphs, with 650 CPS for a single server.
• The line segments are due to non-uniform distribution in the II stage; I have also verified a uniform distribution.
• The registration test also gave similar graphs, with about 2400 RPS (no authentication). This means 10 million subscribers using S3P3.
• On commodity hardware: 3 GHz Pentium 4, 1 GB memory.
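The BHCA figures follow from sustaining the measured call rate for a full hour. This sketch redoes that arithmetic for the table rows. `bhca` is a hypothetical helper, and the sustained-rate assumption is the only modeling step.

```python
SECONDS_PER_HOUR = 3600

# Measured calls/s from the table: (first-stage servers, second-stage servers) -> calls/s.
MEASURED_CPS = {(3, 3): 2800, (2, 3): 2100, (2, 2): 1800, (1, 2): 1050, (0, 1): 900}


def bhca(calls_per_second: float) -> float:
    """Busy hour call attempts if the rate is sustained for a full hour."""
    return calls_per_second * SECONDS_PER_HOUR


if __name__ == "__main__":
    for (first, second), cps in MEASURED_CPS.items():
        print(f"S{first}P{second}: {cps} calls/s ~ {bhca(cps) / 1e6:.2f} million BHCA")
```

For S3P3, 2800 calls/s × 3600 s gives about 10.1 million BHCA.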

Server performance: what happens inside a server? What thread/event models are possible?
[Figure: message processing. Receive (recvfrom, or accept/recv) and parse. A REGISTER updates the DB and gets a built response. Other requests match a transaction and look up the DB. If found, they are proxied (stateless or stateful: modify the request, DNS); otherwise they are redirected or rejected. Responses match a transaction and are modified. Sending uses sendto, send or sendmsg. Steps are marked as (blocking) I/O, critical section (lock), or critical section (r/w lock).]
• Pure event-based (one thread)
• Thread per message or transaction
• Pool thread per message
• Two-stage thread pool
• Process pool
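The "pool thread per message" model can be sketched with a fixed thread pool in which each parsed message is one task and the shared registration table is a critical section. This is a hypothetical structure-only sketch. A plain lock stands in for the figure's r/w lock because Python's standard library has none, and Python's GIL means it does not show real CPU parallelism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LocationDB:
    """Shared registrations; one lock stands in for the figure's critical sections."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._bindings: dict[str, str] = {}

    def update(self, aor: str, contact: str) -> None:
        with self._lock:
            self._bindings[aor] = contact

    def lookup(self, aor: str):
        with self._lock:
            return self._bindings.get(aor)


def handle(db: LocationDB, msg) -> str:
    """Process one parsed message given as (method, aor, contact)."""
    method, aor, contact = msg
    if method == "REGISTER":
        db.update(aor, contact)
        return "200 OK"
    target = db.lookup(aor)
    return f"proxy to {target}" if target else "404 Not Found"


def serve(db: LocationDB, messages, workers: int = 4) -> list[str]:
    """Pool-thread-per-message model: each message is one task on a fixed pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda m: handle(db, m), messages))
```

A pure event-based server would instead call `handle` in a single loop with no lock. The pool only pays off when some steps block, such as DB or DNS lookups.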

Server performance: results of my measurements; effect of multiple processors
• On a single processor, both Pentium and Sparc took approximately 2 MHz of CPU per call/s.
[Graphs: calls/s for a stateless proxy (UDP, no DNS, 6 messages/call; better performance because this includes the memory-pool changes) and for a stateful proxy (UDP, no DNS, 8 messages/call).]
• The software architecture further improves performance: S3P3 can support 16 million BHCA.

Not much concurrency in stateful mode: needs more investigation

Problem with servers
[Figure: clients (C) attached to a server (S), compared with peers (P) connected to each other.]
• Server-based
  • Cost: maintenance, configuration
  • Central points of failure, catastrophic failures
  • Controlled infrastructure (e.g., DNS)
• Peer-to-peer
  • Robust: no central dependency
  • Self-organizing, no configuration
  • Scalability

We built: P2P-SIP
• Unlike the proprietary Skype architecture
  • Robust and efficient lookup using a DHT
  • Interoperability: the DHT algorithm uses SIP communication
  • Hybrid architecture: lookup in SIP+P2P
  • Inter-domain P2P-SIP
• Unlike file-sharing applications
  • Data storage, caching, delay, reliability
• Disadvantages
  • Lookup delay and security

How to combine SIP + P2P?
• SIP-using-P2P: replace the SIP location service by a P2P protocol
  [Figure: Alice INSERTs her contact (128.59.19.194) into the P2P network; the caller FINDs alice and sends INVITE sip:alice@128.59.19.194.]
• P2P-over-SIP: additionally, implement P2P using SIP messaging
  [Figure: REGISTER and INVITE for alice carried over the P2P-SIP overlay.]

Deployment scenarios
• P2P proxies: zero-configuration server farm; trusted servers and user identities
• P2P clients: plug and play; may use adaptors; untrusted peers; super-nodes
• Interoperate among these!

SIP-using-P2P: using an external P2P network (distributed hash table, DHT)
• Data model: treat the DHT as a database (Bob's contact is 192.1.2.3)
  [1] put(k, 192.1.2.3), where k is H(bob)
  [2] get(k) gives 192.1.2.3
  [3] INVITE sip:bob to 192.1.2.3
• Service model: join the DHT to provide service (service node 128.3.4.5)
  [1] join(128.3.4.5)
  [2] lookup(H(bob)) gives 128.3.4.5
  [3] REGISTER sip:bob to 128.3.4.5
  [4] lookup(H(bob)) gives 128.3.4.5
  [5] INVITE sip:bob to 128.3.4.5
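The data-model steps [1] to [3] can be sketched with a toy DHT that only offers put/get. `ToyDHT` is a single-process stand-in, not OpenDHT's client API, and hashing the user name with SHA-1 is an assumption.

```python
import hashlib


def key_for(user: str) -> str:
    """k = H(user); the SHA-1 hex digest is an assumed choice of hash."""
    return hashlib.sha1(user.encode("utf-8")).hexdigest()


class ToyDHT:
    """Single-process stand-in for an external DHT's put/get interface (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def put(self, key: str, value: str) -> None:
        self._store.setdefault(key, []).append(value)

    def get(self, key: str) -> list[str]:
        return list(self._store.get(key, []))


def register(dht: ToyDHT, user: str, contact: str) -> None:
    dht.put(key_for(user), contact)  # [1] put(k, contact), k = H(user)


def invite_targets(dht: ToyDHT, user: str) -> list[str]:
    # [2] get(k) returns the stored contacts; [3] the caller sends an INVITE to each one
    return [f"sip:{user}@{contact}" for contact in dht.get(key_for(user))]
```

In the service model, the caller would instead look up H(bob) to find a service node and send the REGISTER or INVITE there, leaving storage to that node.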

SIP-using-P2P: logical operations
• Contact management: put(user id, signed contact)
• Key storage: user certificates and private configurations
• Presence: put(subscribee id, signed encrypted subscriber id); composition needs the service model
• Offline message: put(recipient, signed encrypted message)
• NAT and firewall traversal: STUN and TURN server discovery needs the service model
• XML-based data format

SIP-using-P2P: implementation in SIPc, with the help of Xiaotao Wu
• OpenDHT: trusted nodes; robust; fast enough (<1 s)
• Identity protection: certificate-based; SIP id == email
• P2P for calls, IM, presence, offline messages, STUN server discovery and name search
• P2P clients are better than proxies: fewer DHT calls; OpenDHT quota for fairness

P2P-over-SIP: node architecture (registrar, proxy, user agent)
[Figure: node components are the user interface (buddy list, etc.), audio devices, codecs, RTP/RTCP, ICE, SIP, user location, DHT (Chord) and discovery. Life cycle: on startup, discover peers (multicast REGISTER) and detect NAT; when a peer is found, join; sign up and find buddies; IM and calls (REGISTER, INVITE, MESSAGE); on reset, sign out and transfer; leave. Labels: SIP-over-P2P, P2P-using-SIP.]
• DHT communication uses SIP REGISTER
• Known node: sip:15@192.2.1.3
• Unknown node: sip:17@sippeer.net
• User: sip:alice@example.com

P2P-over-SIP: implementation
• sippeer: C++, Linux, Chord
• Nodes join and form the DHT
• Node failure is detected and the DHT is updated
• Registrations are transferred on node shutdown
• A co-located SIPc can use the sippeer service
[Figure: Chord ring with nodes 1, 9, 11, 15, 19, 25, 26, 29, 30 and 31.]
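The ring in the figure can be used to illustrate how Chord assigns keys and builds routing tables. This sketch uses the textbook Chord definitions, not sippeer's code. It assumes a 5-bit identifier space (IDs 0 to 31), which fits the node IDs shown.

```python
from bisect import bisect_left

M_BITS = 5  # assumed identifier space of 2**5 = 32 IDs
NODES = sorted([1, 9, 11, 15, 19, 25, 26, 29, 30, 31])  # node IDs from the figure


def successor(key: int, nodes: list[int] = NODES, m: int = M_BITS) -> int:
    """First node whose ID is >= key on the ring, wrapping past the largest ID."""
    key %= 2 ** m
    i = bisect_left(nodes, key)
    return nodes[i % len(nodes)]


def finger_table(n: int, nodes: list[int] = NODES, m: int = M_BITS) -> list[int]:
    """finger[i] = successor(n + 2**i) for i = 0..m-1, Chord's routing table."""
    return [successor(n + 2 ** i, nodes, m) for i in range(m)]
```

A registration for key 12 is therefore stored at node 15, and node 1's fingers point at nodes 9, 9, 9, 9 and 19. When a node shuts down, its keys move to its successor, which matches the registration transfer listed above.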

P2P-over-SIP: analysis of scalability
• The number of messages per node per second (M) depends on:
  • the number of peer nodes (N)
  • the keep-alive rate ($r_s$) and finger-table refresh rate ($r_f$)
  • the call arrival rate (c)
  • the node join, leave and failure rate ($\lambda$)
  • the number of users (k·N)
  • the user registration refresh interval (t)
• $M = \{r_s + r_f (\log N)^2\} + c \log N + \frac{k}{t} \log N + \frac{\lambda (\log N)^2}{N}$
• The number of nodes is a function of node capacity: N_max ≈ min[2M/(r+c), 2M/r] for large N
• With the measured capacity of M = 800 registrations/s, and assuming aggressive refresh and a call rate of 1/min, this gives 2219 nodes.
• Even for a conservative capacity of 10 requests/s, it gives more than 16 million nodes (super-nodes) in the network.
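Instead of a closed form, the largest overlay that a node of given capacity can support can be found numerically by evaluating M(N) and searching for the largest N that fits. This sketch transcribes the formula for M under two assumptions: logarithms are base 2 (as in Chord), and the join/leave/failure-rate symbol is written `lam`. The search also assumes that M grows with N.

```python
import math


def per_node_load(n, r_s, r_f, c, k, t, lam):
    """Per-node message rate M for an n-node overlay (log base 2 assumed)."""
    log_n = math.log2(n)
    return (r_s + r_f * log_n ** 2) + c * log_n + (k / t) * log_n + lam * log_n ** 2 / n


def max_nodes(capacity, **rates):
    """Largest n with per_node_load(n) <= capacity, assuming the load grows with n."""
    if per_node_load(2, **rates) > capacity:
        raise ValueError("capacity too small for even a two-node overlay")
    lo, hi = 2, 4
    while per_node_load(hi, **rates) <= capacity:  # exponential search for an upper bound
        lo, hi = hi, hi * 2
        if hi > 2 ** 64:
            raise ValueError("load never reaches capacity; check the rates")
    while hi - lo > 1:  # bisection keeps load(lo) <= capacity < load(hi)
        mid = (lo + hi) // 2
        if per_node_load(mid, **rates) <= capacity:
            lo = mid
        else:
            hi = mid
    return lo
```

The rate values passed in are up to the reader. The sketch only shows how a node capacity translates into a maximum N once rates are fixed.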

P2P-over-SIP: analysis of availability and call setup latency
• To increase user availability:
  • increase the keep-alive rate (faster failure detection)
  • increase the user registration refresh rate (shorter unavailability)
  • replicate user and node registrations
• Call setup latency:
  • same as the DHT lookup latency: O(log N)
  • calls to known locations ("buddies") are direct
  • DHT optimizations further reduce latency
  • Chord: 10000 nodes => about 6 hops
  • at most a few seconds; also affected by user availability and retransmission timers
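The "10000 nodes => about 6 hops" figure matches Chord's average lookup path length of roughly half of log2 N. The sketch below computes it. The equal per-hop delay used for `lookup_latency` is an assumption.

```python
import math


def expected_chord_hops(n_nodes: int) -> float:
    """Average Chord lookup path length, about (1/2) * log2(N) hops."""
    return 0.5 * math.log2(n_nodes)


def lookup_latency(n_nodes: int, per_hop_seconds: float) -> float:
    """Rough lookup latency if every overlay hop costs the same (an assumption)."""
    return expected_chord_hops(n_nodes) * per_hop_seconds


if __name__ == "__main__":
    print(f"{expected_chord_hops(10_000):.1f} hops")  # about 6.6 for 10000 nodes
```

At a few hundred milliseconds per hop this stays within a few seconds, which is consistent with the bound above.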

SIP-using-P2P vs P2P-over-SIP
• Low transport and transaction overhead
• Not SIP-specific, hence no implementation overhead for non-VoIP P2P applications
• No P2P security burden on SIP
• Single DHT implementation
• Reuse SIP naming, routing, security and NAT/firewall traversal
• Easily reuse existing SIP components, such as voicemail and conferencing, without change
• Readily supports the service model

Internet telephony infrastructure: CINEMA (Columbia InterNet Extensible Multimedia Architecture)
[Figure: the CINEMA servers (labelled "My work") are sipd (proxy, redirect and registrar server), sipconf (conference server), sipum (unified messaging), rtspd (media server), siph323 (SIP-H.323 translator), a SIP VXML server, an SQL database, and a web server with CGI for web-based configuration. Other elements are the department PBX (internal extensions 7040, 713x), a SIP/PSTN gateway, the telephone switch and PSTN (local/long distance, e.g., 1-212-5551212), RTSP clients such as QuickTime, and H.323 clients such as NetMeeting.]

Communication to collaboration
• Comprehensive
  • Personalized view
  • Calendar, address book, groups and access control
• Synchronous (tightly-coupled) collaboration
  • Conferencing: audio, video, IM, white-board, screen sharing, shared web browsing
• Asynchronous (loosely-coupled) collaboration
  • Unified messaging, shared files, discussion forum, notification
• Multi-platform (device)
  • Telephone: touch-tone input and audio (IVR)
  • PC: multimedia client, email, IM
• Reuse existing protocols and tools
• Unified messaging
  • The gaps among different media (audio, video, text), devices (PC, phone) and means of communication (email, SIP, IM) disappear for messaging

Implementation: voicemail
[Figure: voicemail call flow among the caller, the callee's phone, sipum and rtspd. Steps (1) to (5) use INVITE, CANCEL, OK, RTSP SETUP, RTP and BYE, followed by delivery by email.]
• Goals: universal access; scalability; provider independence
• Why SIP and RTSP? Reuse existing infrastructure and tools

Implementation: web interface
• Retrieval
  • Web interface
  • RTSP: rtsp://server/alice/inbox/1677.au
  • SIP: sip:alice-1677-retrieve@server
  • Telephone: "press 1 to listen…"
• Configuration
  • Folders
  • Options
  • Email
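The two retrieval handles follow a simple naming pattern, so they can be built from a user name and message number and parsed back. This is a hypothetical helper pair that mirrors the examples above; "server" stands for the real host name.

```python
def retrieval_uris(user: str, msg_id: int, server: str = "server") -> dict[str, str]:
    """Build the RTSP and SIP handles for one stored voice message."""
    return {
        "rtsp": f"rtsp://{server}/{user}/inbox/{msg_id}.au",
        "sip": f"sip:{user}-{msg_id}-retrieve@{server}",
    }


def parse_retrieve_uri(uri: str) -> tuple[str, int]:
    """Inverse of the SIP form: 'sip:alice-1677-retrieve@server' -> ('alice', 1677)."""
    userpart = uri.removeprefix("sip:").split("@", 1)[0]
    user, msg_id, action = userpart.rsplit("-", 2)  # user names may themselves contain '-'
    if action != "retrieve":
        raise ValueError(f"not a retrieval URI: {uri!r}")
    return user, int(msg_id)
```

Encoding the message in the SIP user part lets an ordinary SIP phone call a specific message without any special client support.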

Conferencing models (non-multicast)
[Figure: star, full-mesh and ad-hoc topologies among participants A, B, C and D, with mixes such as B+C+D, A+C+D, A+B+D and A+B+C.]
• Topologies: star, full-mesh, ad-hoc
• Advantages: no central point of failure; heterogeneous, simple clients
• Disadvantages: typically only three-party conferences; complex endpoints; complex signaling; external server with a high-bandwidth link
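One way to compare the star and full-mesh models is to count unicast media streams for an N-party conference. This sketch assumes one stream per sender-receiver pair and no multicast. The ad-hoc case is omitted because it depends on which endpoint mixes.

```python
def streams(topology: str, n: int) -> int:
    """Unicast media streams for an n-party conference without multicast."""
    if topology == "star":
        return 2 * n  # each participant sends one stream to the server and receives one mix back
    if topology == "full-mesh":
        return n * (n - 1)  # every participant sends separately to every other participant
    raise ValueError(f"unknown topology {topology!r}")
```

The quadratic growth of the full mesh is why endpoints become complex there, while the star shifts the load to a server with a high-bandwidth link.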

Implementation: conference

Implementation: conference
[Figure: mixer data path. Each participant's stream passes a playout delay and is decoded (D) to linear audio. On a periodic timer the streams are summed into M = A+B+C, and each participant receives M minus its own signal (e.g., M − A = B+C), encoded (E) in its codec (G.711, DVI or GSM).]
• G.711, GSM, DVI, Speex and G.722 mixing (decode-mix-encode)
• Video replication; IM; text; VNC screen sharing; floor control; IVR for joining
• Optimization is possible for participants using the same codec
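The "M minus own signal" step can be sketched on decoded 16-bit linear frames: compute the full mix once, then subtract each participant's own samples and clip. The function names are hypothetical. Codec decoding, encoding and playout buffering are omitted.

```python
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767  # 16-bit linear PCM range


def clip(sample: int) -> int:
    """Saturate a sample to the 16-bit range."""
    return max(SAMPLE_MIN, min(SAMPLE_MAX, sample))


def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """Given equal-length linear frames per participant, return each one's output frame.

    The full mix M = A + B + C is summed once without clipping; each participant then
    gets M minus its own samples, clipped, so nobody hears themselves.
    """
    total = [sum(samples) for samples in zip(*frames.values())]
    return {
        name: [clip(m - s) for m, s in zip(total, own)]
        for name, own in frames.items()
    }
```

Summing once and subtracting keeps the cost linear in the number of participants, instead of building a separate sum of the others for each listener.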

Performance evaluation
• CPU usage $= (\alpha P + \beta)\,C$, where $\alpha = M_e + aB' + b$, $\beta = M_d + cB' + d$ and $B' = B + 320/T$, for C conferences, each with P participants. Here a, b, c and d are constants, and b and d are comparatively insignificant.
• For the G.711 codec, the encoding and decoding costs $M_e$ and $M_d$ are insignificant (5.5 and 1.7 μs), thus CPU $= C\,(aP + c)\,(B + 320/T)$.
• For GSM and G.722 (or G.723.1), $M_e$ and $M_d$ are dominant (70 and 30/50 μs), thus CPU $= C\,(M_e P + M_d)$.
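The CPU model can be transcribed directly, which also lets one check that the two special cases follow from it. Parameter units are whatever the measurements used (the slide does not define B and T further). The sketch only reproduces the algebra.

```python
def cpu_usage(C, P, B, T, Me, Md, a, b, c, d):
    """CPU = (alpha * P + beta) * C, alpha = Me + a*B' + b, beta = Md + c*B' + d, B' = B + 320/T."""
    b_prime = B + 320 / T
    alpha = Me + a * b_prime + b
    beta = Md + c * b_prime + d
    return (alpha * P + beta) * C
```

Setting $M_e = M_d = b = d = 0$ gives the G.711 form $C(aP + c)(B + 320/T)$. Setting $a = b = c = d = 0$ gives the GSM/G.722 form $C(M_e P + M_d)$.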

Performance evaluation
• Delay is less than 20 ms; it increases from the first to the last participant in a conference
• About 480 participants in a single conference with one speaker
• A packetization interval of 40 ms gives better performance (720 participants) but also increases delay
• About 80 four-party conferences
• Memory used: 20 kB per call or participant
• Both Pentium and Sparc took about 6 MHz per participant

Cascaded conference servers
[Figure: two cascading topologies, one supporting N·(N−1) participants with higher delay, the other N²/4 participants with 2/3 to 3/4 lower delay.]
• The SIP REFER message is used to create the cascade.
• I measured the CPU usage for two cascaded servers: they support about 1000 participants.

Interworking between SIP and H.323
[Figure: protocol stacks. H.323: terminal control/devices; H.245, Q.931, RAS and RTCP; codecs over RTP; TPKT over TCP, and UDP. SIP: terminal control/devices; SIP, SDP and RTCP; codecs over RTP; TCP/UDP. Both run over IP and lower layers.]
• SDP is simple. H.245 is very exhaustive and can represent inter-codec dependencies.
• Both use RTP for media, so the translator only needs to handle signaling, which makes it scalable.
• H.323 has multi-stage call setup; SIP has a single stage. H.323's single-step fast connect is optional.
• Basic calls can be translated. Complete interworking (e.g., conferencing, security) is not possible without modifying the protocols.
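For a basic call, the translator's core job is to pair each Q.931 call-signaling message with its SIP counterpart. The table below is an illustrative basic-call correspondence, not a complete interworking specification. Capability exchange (H.245 vs SDP), SIP's ACK, fast connect, conferencing and security are deliberately left out.

```python
# Illustrative basic-call correspondence: H.323 (Q.931) message -> SIP message.
BASIC_CALL_MAP = {
    "Setup": "INVITE",
    "Call Proceeding": "100 Trying",
    "Alerting": "180 Ringing",
    "Connect": "200 OK",
    "Release Complete": "BYE",
}


def sip_equivalent(q931_message: str) -> str:
    """Return the SIP counterpart of a Q.931 call-signaling message, or fail loudly."""
    try:
        return BASIC_CALL_MAP[q931_message]
    except KeyError:
        raise ValueError(f"no basic-call mapping for {q931_message!r}") from None
```

Anything outside this table needs protocol-specific handling, which is why complete interworking is harder than basic calls.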

Revisiting the problems
• Can a SIP server provide carrier-grade reliability and scalability using commodity hardware? What affects the server performance?
  • Developed a two-stage scalable and reliable SIP server architecture with linear scaling; use an event-based design.
• How can we build a server-less, self-organizing peer-to-peer VoIP network? Can this be done in a standards-compliant way?
  • Developed the P2P-SIP architecture: SIP-using-P2P and P2P-over-SIP.
• Can communication be extended to multi-platform collaboration using existing protocols? How well does multi-party conferencing scale? How can SIP and H.323 interoperate?
  • Multi-platform collaboration using existing protocols and tools, unified messaging, centralized (cascaded) conferencing, and SIP-H.323 interworking.

My publications: conference, workshop, technical report, magazine/journal
• K. Singh and H. Schulzrinne, "Using an external DHT as a SIP location service", Columbia University Technical Report CUCS-007-06, NY, Feb 2006.
• K. Singh and H. Schulzrinne, "Peer-to-peer Internet telephony using SIP", NOSSDAV, Skamania, Washington, Jun 2005.
• K. Singh and H. Schulzrinne, "Peer-to-peer Internet Telephony using SIP", New York Metro Area Networking Workshop, CUNY, NY, Sep 2004.
• K. Singh and H. Schulzrinne, "Peer-to-peer Internet Telephony using SIP", Columbia University Technical Report CUCS-044-04, NY, Oct 2004.
• K. Singh and H. Schulzrinne, "Failover and load sharing in SIP telephony", SPECTS (Symposium on Performance Evaluation of Computer and Telecommunication Systems), Philadelphia, PA, Jul 2005.
• K. Singh and H. Schulzrinne, "Failover and Load Sharing in SIP Telephony", Columbia University Technical Report CUCS-011-04, NY, May 2004.
• H. Schulzrinne, K. Singh and X. Wu, "Programmable Conference Server", Columbia University Technical Report CUCS-040-04, NY, Oct 2004.
• K. Singh, X. Wu, J. Lennox and H. Schulzrinne, "Comprehensive Multi-platform Collaboration", MMCN 2004 (SPIE Conference on Multimedia Computing and Networking), Santa Clara, CA, Jan 2004.
• K. Singh, X. Wu, J. Lennox and H. Schulzrinne, "Comprehensive Multi-platform Collaboration", Columbia University Technical Report CUCS-027-03, NY, Nov 2003.
• M. Buddhikot, A. Hari, K. Singh and S. Miller, "MobileNAT: A new Technique for Mobility across Heterogeneous Address Spaces", ACM MONET journal, Mar 2005.
• M. Buddhikot, A. Hari, K. Singh and S. Miller, "MobileNAT: A new Technique for Mobility across Heterogeneous Address Spaces", WMASH 2003 (ACM International Workshop on Wireless Mobile Applications and Services on WLAN Hotspots), San Diego, CA, Sep 2003.
• K. Singh, A. Nambi and H. Schulzrinne, "Integrating VoiceXML with SIP services", ICC 2003 (Global Services and Infrastructure for Next Generation Networks), Anchorage, Alaska, May 2003.
• K. Singh, A. Nambi and H. Schulzrinne, "Integrating VoiceXML with SIP services", Second New York Metro Area Networking Workshop, Columbia University, NY, Sep 2002.
• K. Singh, W. Jiang, J. Lennox, S. Narayanan and H. Schulzrinne, "CINEMA: Columbia InterNet Extensible Multimedia Architecture", Columbia University Technical Report CUCS-011-02, NY, May 2002.
• W. Jiang, J. Lennox, H. Schulzrinne and K. Singh, "Towards Junking the PBX: Deploying IP Telephony", NOSSDAV 2001.
• W. Jiang, J. Lennox, S. Narayanan, H. Schulzrinne, K. Singh and X. Wu, "Integrating Internet Telephony Services", IEEE Internet Computing (magazine), Vol. 6, No. 3, May/June 2002.
• K. Singh, G. Nair and H. Schulzrinne, "Centralized Conferencing using SIP", 2nd IP-Telephony Workshop (IPTel 2001), Apr 2001.
• K. Singh and H. Schulzrinne, "Unified Messaging using SIP and RTSP", IP Telecom Services Workshop 2000, Atlanta, Georgia, USA, Sep 2000.
• K. Singh and H. Schulzrinne, "Unified Messaging using SIP and RTSP", Columbia University Technical Report CUCS-020-00, NY, Oct 2000.
• K. Singh and H. Schulzrinne, "Interworking Between SIP/SDP and H.323", 1st IP-Telephony Workshop (IPTel 2000), Apr 2000.
• K. Singh and H. Schulzrinne, "Interworking Between SIP/SDP and H.323", Columbia University Technical Report CUCS-015-00, NY, May 2000.

Backup slides

My research timeline (1997-2006)
[Figure: timeline from undergraduate studies at BITS, India, and work at Motorola, India, to an MS at Columbia (VoIP infrastructure) and a PhD at Columbia (reliability and scalability). Projects shown: H.323 client gateway, SIP-H.323 translator, SIP-RTSP voice mail, SIP conferencing, Libsip++ (SIP library), Mobile NAT, enterprise VoIP infrastructure, interactive voice response, CINEMA user interface, multimedia collaboration, SIP failover/load sharing, P2P VoIP using SIP, conference evaluation.]

VoIP infrastructure. CINEMA: multi-platform multimedia collaboration
[Figure: protocol stack. Applications: interactive voice response, Internet telephony, radio/TV, messaging and presence, unified messaging, video conferencing. Application-layer protocols: G.711, MPEG, SIP, SAP, SDP, RSVP, RTCP, RTSP, XML, text, with RTP for media transport, grouped into signaling, media transport and quality of service. Lower layers: transport (TCP, UDP), network (IPv4, IPv6), link layer, physical layer. Functions: program, voice, speech/DTMF, call routing, mixing.]
• Beyond voice: video, text, IM, presence, screen sharing, shared web browsing, …
• Beyond SIP phones: regular telephone, email, web, …
• Beyond synchronous communication: offline mails, discussion forum, file sharing, …

VoIP infrastructure: reliability and scalability
• Failover: redundancy (use DNS)
• Load sharing: scalability
  [Figure: P1 serves users a-h, P2 i-q and P3 r-z, each handling INVITE and REGISTER.]
• Combine the two in a two-stage architecture:
  • Scalability that grows linearly with the number of servers
  • High availability
