
Scaling SIP Servers



  1. Scaling SIP Servers • Sankaran Narayanan • Joint work with the CINEMA team • IRT Group Meeting – April 17, 2002

  2. Agenda • Introduction • Issues in scaling • Facets of sipd architecture • Some results • Conclusion and Future Work

  3. Introduction – SIP servers • SIP Signaling – Proxy, redirect • Proxies • Call routing by contact location • UDP/TCP/TLS • Stateful or stateless • Programmable scripts • User location – Registrars (backed by the SQL database)

  4. What is scale? • Large call volumes, commodity hardware [Schu0012:Industrial] • Response times (mean, deviation), turnaround time • Goals • Delay budget [SIPstone]: R1 < 500 ms, R2 < 2 s • Class-5 switches handle > 750K BHCA • [Call-flow figure: REGISTER → 200 OK measured as R1; INVITE → 180 → 200 → ACK measured as R2]
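
(As a rough point of reference, not on the slide: 750K BHCA is 750,000 / 3600 ≈ 208 call attempts per second, and at roughly 5–7 SIP messages per proxied call that is on the order of 1,000–1,500 messages per second.)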

  5. Limits to scaling • Not CPU bound • Network I/O – blocking • Wait for responses • Latency: contact, DNS lookups • OS resource limits • Open files (≤ 1024 by default on Unix; see the sketch below) • LWPs (Solaris) vs. user–kernel threads (Linux, Windows) • Try not to… • Customize and recompile the OS • Move (parts of) the server into the kernel (khttpd, AFPA, …)
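
A minimal sketch of working within the OS resource limits mentioned above rather than recompiling the kernel: a server can inspect and try to raise its own open-file limit at startup. This is generic POSIX code, not sipd's; raise_fd_limit and the 4096 target are illustrative.

    // Sketch: check and (attempt to) raise the per-process open-file limit
    // (RLIMIT_NOFILE) at server startup. Anything above the hard limit still
    // needs administrator action; this just avoids custom OS builds.
    #include <sys/resource.h>
    #include <cstdio>

    bool raise_fd_limit(rlim_t wanted) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return false;
        std::printf("soft=%llu hard=%llu\n",
                    (unsigned long long)rl.rlim_cur,
                    (unsigned long long)rl.rlim_max);
        if (rl.rlim_cur >= wanted) return true;        // already enough
        rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
        return setrlimit(RLIMIT_NOFILE, &rl) == 0;     // best effort
    }

    int main() {
        raise_fd_limit(4096);  // e.g. lift the default 1024-descriptor cap
    }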

  6. The problem • Scaling CPU-bound jobs (throughput = 1/delay) • Hardware: CPU speed, RAM, … • Software: better OS, scheduler, … • Algorithm: optimize protocol processing • Blocking (network, disk I/O) is expensive • Hypothesis • I/O-bound → CPU-bound; reduce blocking • Optimized resource usage – stability at high loads

  7. Facets of sipd architecture • Blocking • Process models • Socket management • Protocol processing

  8. Blocking • Mutex, event (socket, timeout), fread • Queue builds up • Potentially high variability • Tandem queue system • Easy to fix • Non-blocking calls (event driven, later!) • Move the queue to a different thread (lazy logger; sketch below) • Blocking call path on the slide: Logger { lock; write; unlock; }
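
A sketch of the lazy-logger idea, in C++ for concreteness: request threads only enqueue the log line under a short lock, and a dedicated thread does the blocking write, so nothing on the request path waits on disk I/O. The class and names are illustrative, not sipd's actual logger.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    class LazyLogger {
    public:
        explicit LazyLogger(std::FILE* out)
            : out_(out), writer_(&LazyLogger::drain, this) {}
        ~LazyLogger() {
            { std::lock_guard<std::mutex> g(m_); done_ = true; }
            cv_.notify_one();
            writer_.join();
        }
        void log(std::string line) {                    // called by request threads
            { std::lock_guard<std::mutex> g(m_); q_.push(std::move(line)); }
            cv_.notify_one();                           // lock held only for the push
        }
    private:
        void drain() {                                  // runs in the logger thread
            std::unique_lock<std::mutex> lk(m_);
            while (!done_ || !q_.empty()) {
                cv_.wait(lk, [&] { return done_ || !q_.empty(); });
                while (!q_.empty()) {
                    std::string line = std::move(q_.front()); q_.pop();
                    lk.unlock();                        // blocking write outside the lock
                    std::fprintf(out_, "%s\n", line.c_str());
                    lk.lock();
                }
            }
        }
        std::FILE* out_;
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<std::string> q_;
        bool done_ = false;
        std::thread writer_;
    };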

  9. Blocking (2) • Call routing involves (≥ 1) contact lookups • 10 ms per query (approx.) • Cache (sketch below) • Works well for sipd-style servers • Fetch-on-demand with replacement (harder) • Loading the entire database is easy • Need for refresh – long-lived servers • Potentially useful for DNS SRV lookups (?) • [Figure: in-memory cache with periodic refresh from the SQL database; cached lookups take < 1 ms]
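
The periodic-refresh cache can be sketched as below: the whole contact table is reloaded from SQL on a timer and routing threads read only an in-memory snapshot, which is where the sub-millisecond lookup comes from. ContactTable, load_contacts_from_sql(), and the refresh interval are placeholders, not sipd's real database layer.

    #include <chrono>
    #include <map>
    #include <memory>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <vector>

    using ContactTable = std::map<std::string, std::vector<std::string>>;  // AOR -> contact URIs

    // Placeholder for one full-table SELECT against the user-location database.
    ContactTable load_contacts_from_sql() { return {}; }

    class ContactCache {
    public:
        explicit ContactCache(std::chrono::seconds refresh)
            : table_(std::make_shared<const ContactTable>(load_contacts_from_sql())) {
            // Long-lived server: the refresher runs for the life of the process.
            std::thread([this, refresh] {
                for (;;) {
                    std::this_thread::sleep_for(refresh);
                    auto fresh = std::make_shared<const ContactTable>(load_contacts_from_sql());
                    std::lock_guard<std::mutex> g(m_);
                    table_ = std::move(fresh);          // swap in the new snapshot
                }
            }).detach();
        }
        // Cached lookup: no SQL round trip on the call-routing path.
        std::vector<std::string> lookup(const std::string& aor) const {
            std::shared_ptr<const ContactTable> snap;
            { std::lock_guard<std::mutex> g(m_); snap = table_; }
            auto it = snap->find(aor);
            return it == snap->end() ? std::vector<std::string>{} : it->second;
        }
    private:
        mutable std::mutex m_;
        std::shared_ptr<const ContactTable> table_;
    };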

  10. REGISTER performance • Single-CPU Sun Ultra10 • Response time is constant for the cache (FastSQL)

  11. Process models (1) – one thread per request (sketch below) • [Figure: incoming requests R1–R4 each spawn their own thread; throughput vs. load] • Doesn't scale • Too many threads over a short timescale • Stateless proxy: 2–4 threads per transaction • High load affects throughput
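
For contrast with the next slide, a bare-bones version of this model might look like the following; handle_request() stands in for the proxy work and is not a real sipd function.

    #include <string>
    #include <thread>

    // Stand-in for parsing, routing and forwarding one message.
    void handle_request(std::string /*msg*/) {}

    // One thread per request: trivial to write, but a traffic burst becomes a
    // thread burst, and a stateless proxy may end up with 2-4 of these per
    // transaction once retransmissions and responses are counted.
    void on_packet(std::string msg) {
        std::thread(handle_request, std::move(msg)).detach();
    }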

  12. Process models (2) – thread pool + queue (sketch below) • [Figure: incoming requests R1–R4 feed a queue served by a fixed number of threads; throughput vs. load] • Thread overhead less; more useful processing • Overload management • Drop requests over responses, drop tail • Not enough if holding time is high • Each request holds (blocks) a thread
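
A sketch of the thread-pool-plus-queue model with the overload policy named on the slide (shed new requests before responses, drop-tail). The classes and sizes are illustrative, not sipd's actual implementation.

    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <thread>

    struct Message { bool is_response; std::string raw; };

    class WorkerPool {
    public:
        WorkerPool(std::size_t nthreads, std::size_t max_queue) : max_queue_(max_queue) {
            // Long-lived server: workers run for the life of the process.
            for (std::size_t i = 0; i < nthreads; ++i)
                std::thread([this] { run(); }).detach();
        }
        // Returns false if the message was shed under overload.
        bool submit(Message m) {
            {
                std::lock_guard<std::mutex> g(m_);
                if (!m.is_response && q_.size() >= max_queue_)
                    return false;                  // drop-tail: shed new requests first
                q_.push_back(std::move(m));        // responses are always queued
            }
            cv_.notify_one();
            return true;
        }
    private:
        void run() {
            for (;;) {
                Message m;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return !q_.empty(); });
                    m = std::move(q_.front());
                    q_.pop_front();
                }
                process(m);                        // a message holds a thread only while it is processed
            }
        }
        void process(const Message&) {}            // stand-in for parse/route/forward

        std::size_t max_queue_;
        std::mutex m_;
        std::condition_variable cv_;
        std::deque<Message> q_;
    };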

  13. Stateless proxy (Solaris) • Turnaround time is almost constant for the stateless proxy • The sudden increase in response time is a client-side problem • UDP losses on the Ultra10 @ (120 * 6 * 500 * 8) bps ≈ 2.9 Mb/s

  14. Stateless proxy (Linux) • Request turnaround time breaks down • Response turnaround time is constant • Effect of high holding times and thread scheduling • How to set the queue size – investigate?

  15. Queue evolution for sipd • Number of requests (y-axis) waiting in the queue for a free thread on Solaris (left) and Linux (right) over server up-time (x-axis).

  16. Process models (3) • Blocking thread model needs “too many” threads • Stateful transaction stays for 30 s • Return thread to free pool instead of blocking • Event-driven architectures • State transition triggered by a global event scheduler • OnIncoming1xx(), OnInviteTimeout(), … • SIP-CGI: pre-forked multiple processes
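
A rough sketch of what the event-driven alternative might look like: the INVITE transaction becomes a small state object and a scheduler invokes callbacks as network or timer events arrive, so no thread sits blocked for the ~30 s lifetime of a stateful transaction. The callback names follow the slide; everything else is illustrative.

    #include <map>
    #include <string>

    struct InviteTransaction {
        enum class State { Calling, Proceeding, Completed } state = State::Calling;

        void OnIncoming1xx() {                 // provisional response arrived
            state = State::Proceeding;         // keep ringing; no thread is parked here
        }
        void OnIncoming2xx() { state = State::Completed; }
        void OnInviteTimeout() {               // timer fired without a final response
            state = State::Completed;          // e.g. generate a 408 upstream (not shown)
        }
    };

    // A much-simplified global scheduler: events carry a transaction id, the
    // dispatcher runs the matching callback, and the worker thread is free again.
    class Scheduler {
    public:
        void dispatch(const std::string& txn_id, const std::string& event) {
            InviteTransaction& t = txns_[txn_id];
            if (event == "1xx") t.OnIncoming1xx();
            else if (event == "2xx") t.OnIncoming2xx();
            else if (event == "timeout") t.OnInviteTimeout();
        }
    private:
        std::map<std::string, InviteTransaction> txns_;
    };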

  17. Socket management • Problem: open-socket limit (1024), "liveness" detection, retransmission • One socket per transaction does not scale • One global socket per destination while the downstream server is alive, kept as soft state – works for UDP (sketch below) • Hard for TCP/TLS – connections • Worse for Java servers – no select, poll
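
The per-destination socket with soft state could be sketched like this for UDP. open_udp_socket(), close_socket() and the 32-second idle timeout are placeholders; a real server would tie the timeout to its retransmission and liveness logic.

    #include <chrono>
    #include <map>
    #include <string>

    using Clock = std::chrono::steady_clock;

    int open_udp_socket(const std::string& /*host_port*/) { return -1; }  // placeholder
    void close_socket(int /*fd*/) {}                                      // placeholder

    class SocketTable {
    public:
        int get(const std::string& dest) {
            auto now = Clock::now();
            expire(now);
            auto it = table_.find(dest);
            if (it == table_.end())
                it = table_.emplace(dest, Entry{open_udp_socket(dest), now}).first;
            it->second.last_used = now;            // refresh the soft state
            return it->second.fd;
        }
    private:
        struct Entry { int fd; Clock::time_point last_used; };
        void expire(Clock::time_point now) {       // drop sockets idle too long (32 s is arbitrary here)
            for (auto it = table_.begin(); it != table_.end(); )
                if (now - it->second.last_used > std::chrono::seconds(32)) {
                    close_socket(it->second.fd);
                    it = table_.erase(it);
                } else ++it;
        }
        std::map<std::string, Entry> table_;
    };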

  18. Optimizing protocol processing • Not too useful if the CPU is not the bottleneck • Text protocol – parsing and formatting overheads • Order of headers matters (Via) • Other optimizations (parse-on-demand, date formatting, …) – see the sketch below
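
A small illustration of parse-on-demand (not sipd's actual parser): the message keeps its raw header lines and extracts a header value only the first time it is asked for, so a stateless proxy that mostly touches Via and Route never pays to parse the rest. Case-insensitive matching and compact header forms are omitted.

    #include <map>
    #include <string>
    #include <vector>

    class LazyMessage {
    public:
        explicit LazyMessage(std::vector<std::string> raw_headers)
            : raw_(std::move(raw_headers)) {}

        // Returns the first value of 'name', parsing lazily and caching the result.
        std::string header(const std::string& name) {
            auto it = parsed_.find(name);
            if (it != parsed_.end()) return it->second;
            for (const std::string& line : raw_) {
                auto colon = line.find(':');
                if (colon != std::string::npos && line.compare(0, colon, name) == 0) {
                    std::string value = line.substr(colon + 1);
                    value.erase(0, value.find_first_not_of(" \t"));   // trim leading space
                    parsed_[name] = value;
                    return value;
                }
            }
            return "";
        }
    private:
        std::vector<std::string> raw_;                // untouched header lines
        std::map<std::string, std::string> parsed_;   // only what was actually requested
    };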

  19. Conclusion • Unlike web servers: can be stateful, less disk I/O, smaller impact of TCP stack/behavior, … • Pros: UDP, stateless routing, load balancing using DNS, … • Challenges: scaling the state machine • Towards 2.5M BHCA (3600 messages/s) • Event-driven architecture (SEDA?) • Resource management (file limits, threads) • Tuning the operating system (scheduler, …)

  20. Future work • Stateful proxy performance • Evaluate the event-driven architecture • Effect of request forking (> 1 contact) on server behavior • Programmable scripts • Queue management and overload control • Other types of servers (conference servers, media servers, etc.)

  21. References • CINEMA web page: http://www.cs.columbia.edu/IRT/cinema • H. Schulzrinne, "Industrial strength internet telephony," presentation at the 6th SIP bakeoff, Dec. 2000. • H. Schulzrinne et al., "SIPstone – Benchmarking SIP server performance," CS technical report, Columbia University.
