
MBUF Problems and solutions on VxWorks



    1. MBUF Problems and solutions on VxWorks Dave Thompson and cast of many.

    2. MBUF Problems This is usually how it lands in my inbox:

       On Tue, 2003-05-06 at 20:38, Kay-Uwe Kasemir wrote:
       > Hi:
       >
       > Neither ics-accl-srv1 nor the CA gateway were able to get to dtl-hprf-ioc3.
       >
       > Via "cu", the IOC looked fine except for error messages
       > (CA_TCP): CAS: Client accept error was "S_errno_ENOBUFS"
       > (CA_online): ../online_notify.c: CA beacon error was "S_errno_ENOBUFS"

       This has been a problem since before our front end commissioning, even though we are using PowerPC IOCs and a fully switched, full duplex, 100 Mbps Cisco-based network infrastructure. The error is coming from the Channel Access Server.

    3. Contributing Circumstances (According to Jeff Hill)
       - The total number of connected clients is high.
       - The server's sustained (data) production rate is higher than the client's sustained consumption rate.
       - Clients subscribe for monitor events but do not call ca_pend_event() or ca_poll() to process their CA input queue.
       - The server does not get a chance to run.
       - The server has multiple stale connections.
       And also probably:
       - tNetTask does not get to run.

       From the Channel Access reference manual:

       Contributing Circumstances
       - The total number of connected clients is high. Each active socket requires dedicated mbufs for protocol control blocks, and for any data that might be pending in the operating system for transmission to Channel Access or to the network at a given instant. If you increase the vxWorks limit on the maximum number of file descriptors then it may also be necessary to increase the size of the mbuf pool.
       - The server has multiple connections where the server's sustained event (monitor subscription update) production rate is higher than the client's or the network's sustained event consumption rate. This ties up a per-socket quota of mbufs for data that are pending transmission to the client via the network. In particular, if there are multiple clients that subscribe for monitor events but do not call ca_pend_event() or ca_poll() to process their CA input queue, then a significant mbuf-consuming backlog can occur in the server.
       - The server does not get a chance to run (because some other higher priority thread is running) and the CA clients are sending a high volume of data over TCP or UDP. This ties up a quota of mbufs for each socket in the server that isn't being reduced by the server's socket read system calls.
       - The server has multiple stale connections. Stale connections occur when a client is abruptly turned off or disconnected from the network, and an internal "keepalive" timer has not yet expired for the virtual circuit in the operating system, and therefore mbufs may be dedicated to unused virtual circuits. This situation is made worse if there are active monitor subscriptions associated with stale connections, which will rapidly increase the number of dedicated mbufs to the quota available for each circuit.

       Related Diagnostics (a sample session follows this slide)
       - The EPICS command "casr [interest level]" displays information about the CA server and how many clients are connected.
       - The vxWorks command "inetstatShow" indicates how many bytes are pending in mbufs and, indirectly (based on the number of circuits listed), how many mbuf-based protocol control blocks have been consumed.
       - The vxWorks commands mbufShow, netStackSysPoolShow, and netStackDataPoolShow (availability depending on vxWorks version) indicate how much space remains in the mbuf pool.
       - The RTEMS command "netstat [interest level]" displays network information including mbuf consumption statistics.
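       For illustration, a minimal diagnostic pass typed at the vxWorks target shell might look like the sketch below (the target shell evaluates C expressions, so these are ordinary function calls). The interest level passed to casr is an arbitrary choice for this sketch; the netStack*Show routines are only present on vxWorks versions with the newer network stack, and casr() is provided by the EPICS IOC core.

           /* CA server report: connected clients, circuits, and any backlog
              of unsent subscription updates (higher level = more detail)    */
           casr (1);

           /* Bytes queued on each socket; data sitting in these queues, and
              each protocol control block listed, consumes mbufs             */
           inetstatShow ();

           /* Remaining space in the mbuf pools (availability depends on the
              vxWorks version)                                               */
           mbufShow ();
           netStackSysPoolShow ();      /* system pool: protocol control blocks */
           netStackDataPoolShow ();     /* data pool: packet data clusters      */

       A steadily shrinking data pool, or large send queues on circuits whose clients appear idle, points at the backlog and stale-connection cases described above.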

    4. Contributing Circumstances SNS now has a number of different IOCs:
       - 21 VxWorks IOCs
       - 21 +/- Windows IOCs
       - 1 Linux IOC
       - 4 OPIs in the control room and many others on site
       - Servers running CA clients, like the archiver
       - Users remotely logged in, running edm via ssh’s X tunnel
       - CA Gateway
       - Other IP clients and services running on vxWorks and servers
       - Other IP applications running on IOCs, such as log tasks, etherIP and serial devices running over IP

    5. Our experience to date
       - At SNS we have seen all of the contributing circumstances that Jeff mentions.
       - At BNL, Larry Hoff saw the problem on an IOC where the network tasks were being starved.
       - Many of our IOCs have heavy connection loads.
       - There are some CA client and Java CA client applications which need to be checked (see the client sketch below).
       - IOCs get hard reboots to fix problems and thus leave stale connections.
       - Other network problems have existed and been “fixed”, including CA gateway loopback.
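       The main thing to check in a CA client is that it actually processes its CA input queue. Below is a minimal sketch of a well-behaved monitor client using the R3.14-style C client API; the PV name "demo:pv", the DBR_DOUBLE type, and the 0.1 s polling period are placeholders. Without the periodic ca_pend_event() call, subscription updates back up in the server and tie up its per-circuit mbuf quota.

           #include <stdio.h>
           #include <cadef.h>          /* EPICS Channel Access client API */

           static void valueCallback (struct event_handler_args args)
           {
               if (args.status == ECA_NORMAL)
                   printf ("%s = %g\n", ca_name (args.chid),
                           *(const double *) args.dbr);
           }

           int main (void)
           {
               chid chan;

               SEVCHK (ca_context_create (ca_disable_preemptive_callback),
                       "ca_context_create");
               SEVCHK (ca_create_channel ("demo:pv", NULL, NULL, 0, &chan),
                       "ca_create_channel");
               SEVCHK (ca_pend_io (5.0), "connect");

               SEVCHK (ca_create_subscription (DBR_DOUBLE, 1, chan, DBE_VALUE,
                                               valueCallback, NULL, NULL),
                       "ca_create_subscription");

               for (;;)
                   ca_pend_event (0.1);   /* drain the CA input queue */
           }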

    6. Late breaking: Jeff Hill was at ORNL last week. One of the things he suspected was that noise on the Ethernet wiring causes the link to re-negotiate speed and full/half-duplex operation. He confirmed that the combination of the MV2100 and the Cisco switches is prone to frequent auto-negotiation, which shuts down Ethernet I/O on the IOC. This is not JUST a boot-up problem.

    7. What is an mbuf anyway?
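       In BSD-derived network stacks such as the one in vxWorks, an mbuf is a small fixed-size buffer that the stack chains together to hold packet data and protocol control blocks, with larger payloads placed in attached clusters. The structure below is an illustrative, simplified BSD-style sketch only, not the actual vxWorks declaration (the vxWorks SENS stack splits the same idea into the mBlk / clBlk / cluster triplet managed by netBufLib).

           /* Illustrative only: a simplified BSD-style view of an mbuf,
              not the actual vxWorks declaration.                          */
           struct mbuf_sketch {
               struct mbuf_sketch *m_next;     /* next buffer in this packet      */
               struct mbuf_sketch *m_nextpkt;  /* first buffer of the next packet */
               char               *m_data;     /* start of the valid data         */
               unsigned int        m_len;      /* bytes of valid data here        */
               short               m_type;     /* data, packet header, PCB, ...   */
               short               m_flags;    /* e.g. data held in a cluster     */
               /* small inline data area, or a reference to a larger cluster      */
           };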
