
SIP as infrastructure

SIP as infrastructure. Henning Schulzrinne Dept. of Computer Science, Columbia University, New York hgs@cs.columbia.edu SIP 2007 (upperside.fr) Paris, France February 2007. Outline. Scaling SIP to the real world: emergency calling Scaling SIP to very large deployments


Presentation Transcript


  1. SIP as infrastructure Henning Schulzrinne Dept. of Computer Science, Columbia University, New York hgs@cs.columbia.edu SIP 2007 (upperside.fr) Paris, France February 2007

  2. Outline • Scaling SIP to the real world: emergency calling • Scaling SIP to very large deployments • some measurements for designing large servers • congestion control and dealing with avalanche restart • P2P SIP • failure discovery • The state of SIP standardization, year 11 • developments in 2006 & upcoming highlights • trouble in standards land

  3. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  4. Evolution of VoIP • 1996-2000: “amazing – the phone rings” (long-distance calling, ca. 1930) • 2000-2003: “does it do call transfer?” (catching up with the digital PBX) • 2004-2005: “How can I make it stop ringing?” (going beyond the black phone) • 2006-: “Can it really replace the phone system?” (replacing the global phone system)

  5. IETF VoIP efforts (diagram labeled “IETF RAI area”; edges between groups are labeled “uses”, “may use”, “provides”, and “usually used with”) • SIP (protocol) • SIPPING (usage, requirements) • SIMPLE (presence) • XCON (conf. control) • ECRIT (emergency calling) • GEOPRIV (geo + privacy) • ENUM (E.164 translation) • SPEERMINT (peering) • IPTEL (tel URL) • SPEECHSC (speech services) • MMUSIC (SDP, RTSP, ICE) • AVT (RTP, SRTP, media) • SIGTRAN (signaling transport)

  6. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  7. VoIP emergency communications • Stages: now, transition, all IP • Contact well-known number or identifier • now: 112, 911 • transition: 112, 911 • all IP: 112, 911, urn:service:sos • Route call to location-appropriate PSAP • now: SR • transition: VPC • all IP: LoST • Deliver precise location to call taker to dispatch emergency help • now: phone number --> location (ALI lookup) • transition: in-band --> key --> location • all IP: in-band • Related functions shown: emergency call, dispatch, emergency alert (“inverse 911”), civic coordination

  8. IETF ECRIT working group • Emergency Contact Resolution with Internet Technologies • Solve four major pieces of the puzzle: • location conveyance (with SIP & GEOPRIV) • emergency call identification • mapping geo and civic caller locations to PSAP URI • discovery of local and visited emergency dial string • Not solving • location discovery --> GEOPRIV • inter-PSAP communication and coordination • citizen notification • Current status: • finishing general and security requirements • agreement on mapping protocol (LoST) and identifier (sos URN) • working on overall architecture and UA requirements

  9. ECRIT: Options for location delivery • GPS • L2: LLDP-MED (standardized version of CDP + location data) • periodic per-port broadcast of configuration information • currently implementing CDP • L3: DHCP for • geospatial (RFC 3825) • civic (RFC 4676) • L7: proposals for retrievals: HELD, RELO, LCP, SIP, … • for own IP address or by third party (e.g., ISP to infrastructure provider) • by IP address • by MAC address • by identifier (conveyed by DHCP or PPP) • HELD, RELO: both HTTP-based

  10. ECRIT: Finding the correct PSAP • Which PSAP should the e-call go to? • Usually to the PSAP that serves the geographic area • Sometimes to a backup PSAP • If no location, then ‘default’ PSAP • solved by LoST

  11. ECRIT: LoST Functionality • Civic as well as geospatial queries • civic address validation • Recursive and iterative resolution • Fully distributed and hierarchical deployment • can be split by any geographic or civic boundary • same civic region can span multiple LoST servers • Indicates errors in civic location data --> debugging • but provides best-effort resolution • Can be used for non-emergency services: • directory and information services • pizza delivery services, towing companies, … • Example query: <findService xmlns="urn:…:lost1"> <location profile="basic-civic"> <civicAddress> <country>Germany</country> <A1>Bavaria</A1> <A3>Munich</A3> <A6>Neu Perlach</A6> <HNO>96</HNO> </civicAddress> </location> <service>urn:service:sos.police</service> </findService>
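A query of the shape shown on this slide can be assembled programmatically. The following is a minimal sketch, not the LoST specification: the function name `build_find_service` is hypothetical, and the namespace string is kept exactly as abbreviated on the slide (the full URN is elided there and is not filled in here).

```python
import xml.etree.ElementTree as ET

# Namespace exactly as abbreviated on the slide; the elided part is
# intentionally left as-is rather than guessed.
LOST_NS = "urn:…:lost1"


def build_find_service(civic, service, ns=LOST_NS):
    """Build a findService query shaped like the slide's example.

    civic   -- dict of civic-address element names to values, in order
    service -- service URN, e.g. "urn:service:sos.police"
    """
    root = ET.Element("findService", {"xmlns": ns})
    location = ET.SubElement(root, "location", {"profile": "basic-civic"})
    address = ET.SubElement(location, "civicAddress")
    for tag, value in civic.items():      # dicts preserve insertion order
        ET.SubElement(address, tag).text = value
    ET.SubElement(root, "service").text = service
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    query = build_find_service(
        {"country": "Germany", "A1": "Bavaria", "A3": "Munich",
         "A6": "Neu Perlach", "HNO": "96"},
        "urn:service:sos.police",
    )
    print(query)
```

The civic field values are the ones from the slide; a real client would also have to handle the response and any error indications, which this sketch leaves out.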

  12. LoST: Location-to-URL Mapping (diagram) • clusters serving VSP1 and VSP2 replicate root information • LoST root nodes • regions shown: NJ US, NY US, Bergen County NJ US, Leonia NJ US • search for 123 Broad Ave, Leonia, Bergen County, NJ, US follows referrals to sip:psap@leonianj.gov
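The search-and-referral idea on this slide can be illustrated with a toy resolver. This is a sketch under stated assumptions, not the LoST protocol: the `LostServer` class, its `query` return values, and the tuple encoding of civic locations are all hypothetical; only the region names and the PSAP URI come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class LostServer:
    """Toy mapping server covering one civic region (hypothetical model)."""
    region: tuple                       # e.g. ("US", "NJ")
    psap_uri: str = None                # answer, if this server has one
    children: list = field(default_factory=list)

    def query(self, location):
        """Return ("referral", server), ("answer", uri) or ("none", None)."""
        for child in self.children:
            # Refer to a child whose region is a prefix of the location.
            if location[:len(child.region)] == child.region:
                return ("referral", child)
        if self.psap_uri is not None:
            return ("answer", self.psap_uri)
        return ("none", None)


def resolve(root, location, default_uri=None):
    """Iteratively follow referrals from the root to the most specific server."""
    server = root
    while True:
        kind, value = server.query(location)
        if kind == "referral":
            server = value
        elif kind == "answer":
            return value
        else:
            return default_uri          # no mapping found: fall back to default


# Regions taken from the slide; the tree shape is an illustrative guess.
leonia = LostServer(("US", "NJ", "Bergen County", "Leonia"),
                    psap_uri="sip:psap@leonianj.gov")
bergen = LostServer(("US", "NJ", "Bergen County"), children=[leonia])
nj = LostServer(("US", "NJ"), children=[bergen])
ny = LostServer(("US", "NY"))
root = LostServer(("US",), children=[nj, ny])

if __name__ == "__main__":
    caller = ("US", "NJ", "Bergen County", "Leonia", "Broad Ave", "123")
    print(resolve(root, caller))
```

The resolver here is iterative (the client follows each referral itself); a recursive variant would have each server forward the query on the client's behalf.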

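  13. LoST Architecture (diagram) • components: seeker, resolver, tree guides (G), trees • tree guides exchange tree coverage by broadcast (gossip), e.g., T1: .us, T2: .de • trees shown: T1 (.us), T2 (.de), T3 (.dk) • example: 313 Westview, Leonia, NJ, US: Leonia, NJ --> sip:psap@leonianj.gov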

  14. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  15. SIP server overload • Triggers: Springsteen tickets!!, earthquake, vote for your favorite… • Proxies will return 503 --> retry elsewhere • Just adds more load • Retransmissions exacerbate the problem • (diagram: INVITE answered with 503 by overloaded proxies)

  16. Avalanche restart • Large number of terminals all start at once • Typically, after power outage • Overwhelms registrar • Possible loss of registrations due to retransmission time-out • (diagram: terminals #1 through #300,000 reboot after power outage and send REGISTER)
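A back-of-the-envelope calculation shows why a simultaneous restart can cost registrations. This is a rough sketch: the registrar rate of 1,000 REGISTERs per second is a hypothetical figure, not a measurement from the talk, and the model assumes strict first-come-first-served processing and ignores the extra load from retransmissions (which would make matters worse). The 32-second limit is the default non-INVITE transaction timeout (Timer F = 64 × T1) from RFC 3261.

```python
T1 = 0.5                  # seconds, default SIP round-trip estimate (RFC 3261)
TIMER_F = 64 * T1         # non-INVITE transaction timeout: 32 s


def drain_time(terminals, registrar_rate):
    """Seconds needed to process one REGISTER from every terminal."""
    return terminals / registrar_rate


def fraction_timed_out(terminals, registrar_rate, timeout=TIMER_F):
    """Share of terminals whose REGISTER is not served before the timeout."""
    served_in_time = min(terminals, registrar_rate * timeout)
    return 1 - served_in_time / terminals


if __name__ == "__main__":
    terminals = 300_000           # from the slide
    rate = 1_000                  # REGISTER/s, hypothetical registrar capacity
    print(f"drain time: {drain_time(terminals, rate):.0f} s")
    print(f"timed out:  {fraction_timed_out(terminals, rate):.1%}")
```

Under these assumptions the backlog takes minutes to clear while each transaction gives up after 32 seconds, so most first attempts fail and are retried.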

  17. Overload control • Current discussion in design team • Feedback control: rate-based or window-based • Avoid congestion collapse • Deal with multiple upstream sources • (figure: goodput vs. offered load, with capacity marked)
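The shape of a goodput-versus-offered-load curve can be reproduced with a very simple model. This is a sketch under loudly stated assumptions, not the design team's mechanism: serving a request costs one unit of work, rejecting one (for example with a 503) costs `reject_cost` units, and with rate-based feedback the upstream sources never send more than the advertised capacity. The value 0.25 for `reject_cost` is hypothetical.

```python
def goodput_without_control(offered, capacity, reject_cost=0.25):
    """Requests/s actually served when every excess request must be rejected.

    Work budget: accepted * 1 + (offered - accepted) * reject_cost <= capacity.
    """
    if offered <= capacity:
        return float(offered)
    accepted = (capacity - reject_cost * offered) / (1 - reject_cost)
    return max(0.0, accepted)     # collapses to zero at offered = capacity / reject_cost


def goodput_with_feedback(offered, capacity):
    """Upstream throttles to the advertised rate, so no work is spent rejecting."""
    return float(min(offered, capacity))


if __name__ == "__main__":
    capacity = 100                # requests/s, hypothetical
    for offered in (50, 100, 200, 300, 400):
        print(offered,
              round(goodput_without_control(offered, capacity), 1),
              goodput_with_feedback(offered, capacity))
```

In this model the uncontrolled server's goodput falls steadily once offered load exceeds capacity, which is the congestion collapse the slide warns about; retries triggered by the rejections would push the offered load even higher.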

  18. Scaling servers & TCP • Need TCP • TLS support: customer privacy, theft of service, … • particularly for WiFi • many SIP messages now exceed reasonable UDP size (fragmentation) • e.g., INVITE for IMS: 1182 bytes • Concern: UA support • improving: 82% of systems at recent SIPit’19 had TCP support • only 45% support TLS • Concern: TCP (and TLS) much less efficient than UDP • running series of tests to identify differences • difference mainly in • connection setup cost • message splitting (may need pre-parsing or incremental parsers) • thread count (one per socket?) • Our model: • 300,000 customers/servers • 0.1 Erlang, 180 sec/call • 600,000 BHCA --> 167 req/sec • 300,000 registrations --> 83 req/sec • $0.001/subscriber
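The request rates in this traffic model follow from the Erlang definition. The sketch below redoes the arithmetic; the function name is hypothetical, and the one-registration-per-subscriber-per-hour refresh interval is an inference from the slide's 83 req/sec figure rather than something the slide states.

```python
def busy_hour_call_attempts(subscribers, erlang_per_subscriber, hold_time_s):
    """BHCA from offered traffic: Erlangs = calls per hour * hold time / 3600."""
    return subscribers * erlang_per_subscriber * 3600 / hold_time_s


subscribers = 300_000
bhca = busy_hour_call_attempts(subscribers, 0.1, 180)
calls_per_sec = bhca / 3600

# Assumption: every subscriber refreshes its registration once per hour.
registrations_per_sec = subscribers / 3600

if __name__ == "__main__":
    print(f"BHCA: {bhca:,.0f}")
    print(f"call requests/s: {calls_per_sec:.0f}")
    print(f"registrations/s: {registrations_per_sec:.0f}")
```

Each call involves more than one SIP transaction, so the per-second figure is best read as call attempts rather than total messages.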

  19. Performance evaluation results • echo server • Pentium 4 server, 3 GHz • 4 GB memory • Linux 2.6.16 • Kumiko Ono

  20. SIP server measurements • Initial INVITE measurements • OpenSER • 400 calls/sec for TCP • roughly 260 calls/sec for TLS • TCP sipd REGISTER test • Kumiko Ono, Charles Shen, Erich Nahum

  21. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  22. P2P SIP • Why? • no infrastructure available: emergency coordination • don’t want to set up infrastructure: small companies • Skype envy :-) • P2P technology for • user location • only modest impact on expenses • but makes signaling encryption cheap • NAT traversal • matters for relaying • services (conferencing, …) • how prevalent? • New IETF working group just formed • likely, multiple DHTs • common control and look-up protocol? • (diagram: generic DHT service; p2p network connecting P2P provider A, P2P provider B, a traditional provider, DNS, and a zeroconf LAN)

  23. P2P SIP -- components • Multicast-DNS (zeroconf) SIP enhancements for LAN • announce UAs and their capabilities • Client-P2P protocol • GET, PUT mappings • mapping: proxy or UA • P2P protocol • get routing table, join, leave, … • independent of DHT? • replaces DNS for SIP, not proxy
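The GET/PUT and join/leave operations listed here can be illustrated with a toy in-memory ring. This is a sketch, not any P2P SIP proposal: the `ToyDHT` class is hypothetical, it uses plain consistent hashing with a successor rule, and it keeps all nodes in one process so that no real network routing is involved.

```python
import bisect
import hashlib


def ring_id(name):
    """Hash a node name or key onto a 32-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


class ToyDHT:
    """Single-process stand-in for a DHT used for user location."""

    def __init__(self):
        self.ids = []         # sorted node identifiers
        self.stores = {}      # node identifier -> {key: value}

    def _owner(self, key):
        """Successor rule: first node id >= hash(key), wrapping around."""
        i = bisect.bisect_left(self.ids, ring_id(key))
        return self.ids[i % len(self.ids)]

    def join(self, node):
        """Add a node, then redistribute all mappings (simple, not efficient)."""
        nid = ring_id(node)
        bisect.insort(self.ids, nid)
        self.stores[nid] = {}
        items = [kv for s in self.stores.values() for kv in s.items()]
        for s in self.stores.values():
            s.clear()
        for key, value in items:
            self.stores[self._owner(key)][key] = value

    def leave(self, node):
        """Graceful leave: hand the node's mappings to their new owners."""
        nid = ring_id(node)
        orphaned = self.stores.pop(nid)
        self.ids.remove(nid)
        for key, value in orphaned.items():
            self.stores[self._owner(key)][key] = value

    def put(self, key, value):
        self.stores[self._owner(key)][key] = value

    def get(self, key):
        return self.stores[self._owner(key)].get(key)


if __name__ == "__main__":
    dht = ToyDHT()
    for node in ("node-a", "node-b", "node-c"):
        dht.join(node)
    dht.put("sip:alice@example.com", "192.0.2.10:5060")   # mapping: AOR -> contact
    dht.leave("node-b")
    print(dht.get("sip:alice@example.com"))
```

The stored value stands in for the "mapping: proxy or UA" on the slide; the example address and contact use reserved documentation values. A real overlay would also need replication to survive nodes that leave without handing off their data.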

  24. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  25. VoIP user experience • Only 95-99.5% call attempt success • “Keynote was able to complete VoIP calls 96.9% of the time, compared with 99.9% for calls made over the public network. Voice quality for VoIP calls on average was rated at 3.5 out of 5, compared with 3.9 for public-network calls and 3.6 for cellular phone calls. And the amount of delay the audio signals experienced was 295 milliseconds for VoIP calls, compared with 139 milliseconds for public-network calls.” (InformationWeek, July 11, 2005) • Mid-call disruptions common • Lots of knobs to turn • Separate problem: manual configuration

  26. Open issues: Configuration • Ideally, should only need a user name and some credential • password, USB key, host identity (MAC address), … • More than DHCP: device needs to get • SIP-level information (outbound proxy, timers) • policy information (“sorry, no video”) • Multiple sources of configuration information • local network (hotel proxy) • voice service provider (off-network) • Configuration information may change • Needs to allow no-touch deployment of thousands of devices • SIP configuration framework has been languishing for years • currently being rewritten to reduce complexity

  27. Circle of blame • parties: ISP, VSP, OS, app vendor • “probably packet loss in your Internet connection” --> reboot your DSL modem • “probably a gateway fault” --> choose us as provider • “must be a Windows registry problem” --> re-install Windows • “must be your software” --> upgrade

  28. Traditional network management model X SNMP “management from the center”

  29. Old assumptions, now wrong • Single provider (enterprise, carrier) • has access to most path elements • professionally managed • Problems are hard failures & elements operate correctly • element failures (“link dead”) • substantial packet loss • Mostly L2 and L3 elements • switches, routers • rarely 802.11 APs • Problems are specific to a protocol • “IP is not working” • Indirect detection • MIB variable vs. actual protocol performance • End systems don’t need management • DMI & SNMP never succeeded • each application does its own updates

  30. Management • what causes the most trouble? • network understanding • fault location • configuration • element inspection • diagram annotation on one of these areas: “we’ve only succeeded here”

  31. Managing the protocol stack • SIP: protocol problem, authorization, asymmetric conn (NAT) • media: echo, gain problems, VAD action • RTP: protocol problem, playout errors • UDP/TCP: TCP neg. failure, NAT time-out, firewall policy • IP: no route, packet loss

  32. Proposal: “Do You See What I See?” • Each node has a set of active and passive measurement tools • Use intercept (NDIS, pcap) • to detect problems automatically • e.g., no response to HTTP or DNS request • gather performance statistics (packet jitter) • capture RTCP and similar measurement packets • Nodes can ask others for their view • possibly also dedicated “weather stations” • Iterative process, leading to: • user indication of cause of failure • in some cases, work-around (application-layer routing) --> TURN server, use remote DNS servers • Nodes collect statistical information on failures and their likely causes
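The "ask others for their view" step can be sketched as a comparison between a node's own probe result and what its peers report for the same probe. This is an illustrative guess at the decision logic, not the authors' implementation; the function name `diagnose` and the result strings are hypothetical.

```python
def diagnose(local_ok, peer_views):
    """Classify a failed probe by comparing it with what peers observed.

    local_ok   -- True if this node's own probe (e.g., a DNS query) succeeded
    peer_views -- dict: peer name -> True/False for the same probe
    """
    if local_ok:
        return "no fault seen locally"
    if not peer_views:
        return "undetermined: no peer could be asked"
    failed = [peer for peer, ok in peer_views.items() if not ok]
    if not failed:
        return "likely local fault: only this node fails"
    if len(failed) == len(peer_views):
        return "likely shared or remote fault: all peers fail too"
    return "partial fault: also seen by " + ", ".join(sorted(failed))


if __name__ == "__main__":
    # This node's DNS query failed; two buddies report their own results.
    print(diagnose(False, {"buddy-1": True, "buddy-2": True}))
    print(diagnose(False, {"buddy-1": False, "buddy-2": False}))
```

Which peers are asked matters in practice: a peer on the same LAN and a peer at another provider rule out different causes, which is where the iterative process on the slide comes in.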

  33. Management architecture • “not working” (notification) • request diagnostics • orchestrate tests • contact others • inspect protocol requests (DNS, HTTP, RTCP, …) • ping 127.0.0.1 • can buddy reach our resolver? • “DNS failure for 15m” • notify admin (email, IM, SIP events, …)

  34. Roadmap • Introduction • Emergency calling • Server scaling • P2P SIP • End-to-end management • Standardization and interoperability

  35. SIP, SIPPING & SIMPLE -00 drafts • includes draft-ietf-*-00 and draft-personal-*-00

  36. RFC publication

  37. IETF WG: SIP in 2006 & 2007 • ~ 44 SIP-related RFCs published in 2006 • BFCP, conferencing • SDP revision • rich presence • Activities: • hitchhiker’s guide • infrastructure: • GRUUs (random identifiers) • URI lists • XCAP configuration • SIP MIB • services: • rejecting anonymous requests • consent framework • location conveyance • session policy • security: • end-to-middle security • certificates • SAML • sips clarification • NAT: • connection re-use • SIP outbound • ICE (in MMUSIC) • see http://tools.ietf.org/wg/sip/

  38. IETF WG: SIPPING • 31 RFCs published in 2006 • Policy • media policy • SBC functions • Services • service examples • call transfer • configuration framework • spam and spit • text-over-IP • transcoding • Testing and operations • IPv6 transition • race condition examples • IPv6 torture tests • SIP offer-answer examples • overload requirements • configuration • voice quality reporting

  39. Interoperability • Generally no interoperability problems for basic SIP functionality • basic call, digest registration, call transfer, voice mail • Weaker in advanced scenarios and backward compatibility • handling TCP, TLS • NAT support (symmetric RTP, ICE, STUN, ...) • multipart bodies • SIP torture tests • call transfer, call pick-up • video and voice codec interoperability (H.264, anything beyond G.711) • SIPit useful, but no equivalent of WiFi certification • most implementations still single-vendor (enterprise, carrier) or vendor-supplied (VSP) • SFTF (test framework) still limited • Need profiles to guide implementers

  40. Trouble in Standards Land • Proliferation of transition standards: 2.5G, 2.6G, 3.5G, … • true even for emergency calling… • Splintering of standardization efforts across SDOs • primary: • IEEE, IETF, W3C, OASIS, ISO • architectural: • PacketCable, ETSI, 3GPP, 3GPP2, OMA, UMA, ATIS, … • specialized: • NENA • operational, marketing: • SIP Forum, IPCC, … • (diagram labels: OASIS, W3C, ISO (MPEG): data formats, data exchange; IETF: L2.5-L7 protocols; IEEE: L1-L2; PacketCable, 3GPP)

  41. IETF issues • SIP WGs: small number (dozen?) of core authors (80/20) • some now becoming managers… • or moving to other topics • IETF: research --> engineering --> maintenance • many groups are essentially maintaining standards written a decade (or two) ago • DNS, IPv4, IPv6, BGP, DHCP; RTP, SIP, RTSP • constrained by design choices made long ago • often dealing with transition to hostile & “random” network • network ossification • Stale IETF leadership • often from core equipment vendors, not software vendors or carriers • fair amount of not-invented-here syndrome • late to recognize wide usage of XML and web standards • late to deal with NATs • security tends to be per-protocol (silo) • some efforts such as SAML and SASL • tendency to re-invent the wheel in each group

  42. IETF issue: timeliness • Most drafts spend lots of time in 90%-complete state • lack of energy (moved on to new -00) • optimizers vs. satisfiers • multiple choices that have non-commensurate trade-offs • Notorious examples: • SIP request history: Feb. 2002 – May 2005 (RFC 4244) • Session timers: Feb. 1999 – May 2005 (RFC 4028) • Resource priority: Feb. 2001 – Feb 2006 (RFC 4412) • New framework/requirements phase adds 1-2 years of delay • Three bursts of activity/year, with silence in-between • occasional interim meetings • IETF meetings are often not productive • most topics get 5-10 minutes --> lack context, focus on minutiae • no background --> same people as on mailing list • 5 people discuss, 195 people read email • No formal issue tracking • some WGs use tools, haphazardly • Gets worse over time: • dependencies increase, sometimes undiscovered • backwards compatibility issues • more background needed to contribute

  43. IETF issues: timeliness • WG chairs run meetings, but are not managing WG progress • very little control of deadlines • e.g., all SIMPLE deadlines are probably a year behind • little push to come to working group last call (WGLC) • limited timeliness accountability of authors and editors • chairs often provide limited editorial feedback • IESG review can get stuck in long feedback loop • author – AD – WG chairs • sometimes lack of accountability (AD-authored documents) • RFC editor often takes 6+ months to process document • dependencies; IANA; editor queue; author delays • e.g., session timer: Aug. 2004 – May 2005

  44. Conclusion • Moving from lab and trials to large-scale deployments • Planning horizon includes turning off circuit-switched phones • in large enterprises • in some carriers • From emphasis on features to global scale: • interoperation • configuration • peer-to-peer systems • emergency services • overload behavior • failure detection across networks and protocol layers • Integration of advanced features (IM, presence, video, programmable services) still lacking • Current standardization processes slow and complexity-inducing
