
CA integration tests


kaveri


Presentation Transcript


  1. CA integration tests
     • We need a way to run integration tests
       • Test IOCs -> CAJ -> pvmanager
       • Including disconnects due to power cycle and network downtime
       • Corner cases (e.g. different type at reconnect)
       • Ability to check server state (e.g. number of monitors open)
     • Ability to drop in and run the tests in a production environment (to check specific versions of EPICS and network configurations)
       • Start a script on the server side, start a script on the client side, come back in 15 minutes

  2. CA integration tests
     • Server side:
       • Requirements: EPICS base (softIoc), procserv
       • Start server script
         • Starts the 1st softIoc
         • Keeps listening on the “command” pv. Possible commands:
           • start IOCNAME NSEC – stops the current ioc, waits for NSEC, starts the ioc in the IOCNAME directory
           • netpause NSEC – brings down the network (ifconfig down) for NSEC
           • connections PVNAME – puts the number of current monitors (casr 2) on the PVNAME in the “output” pv
           • stop – stops the server side
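The command grammar above is small enough to sketch as a parser. The following is only an illustration of the four commands listed on the slide; the class name `ServerCommand`, its fields, and the choice of Java are assumptions, and the real server script may be written quite differently.

```java
/** Hypothetical parsed form of one string written to the "command" pv. */
final class ServerCommand {
    enum Kind { START, NETPAUSE, CONNECTIONS, STOP }

    final Kind kind;
    final String target;  // IOCNAME or PVNAME; null when the command has none
    final int nsec;       // NSEC argument; -1 when the command has none

    private ServerCommand(Kind kind, String target, int nsec) {
        this.kind = kind;
        this.target = target;
        this.nsec = nsec;
    }

    /** Parses one command line; throws IllegalArgumentException on anything else. */
    static ServerCommand parse(String text) {
        String[] parts = text.trim().split("\\s+");
        switch (parts[0]) {
            case "start":        // start IOCNAME NSEC
                requireLength(parts, 3, text);
                return new ServerCommand(Kind.START, parts[1], Integer.parseInt(parts[2]));
            case "netpause":     // netpause NSEC
                requireLength(parts, 2, text);
                return new ServerCommand(Kind.NETPAUSE, null, Integer.parseInt(parts[1]));
            case "connections":  // connections PVNAME
                requireLength(parts, 2, text);
                return new ServerCommand(Kind.CONNECTIONS, parts[1], -1);
            case "stop":         // stop
                requireLength(parts, 1, text);
                return new ServerCommand(Kind.STOP, null, -1);
            default:
                throw new IllegalArgumentException("Unknown command: " + text);
        }
    }

    private static void requireLength(String[] parts, int expected, String text) {
        if (parts.length != expected) {
            throw new IllegalArgumentException("Wrong number of arguments: " + text);
        }
    }
}
```

For example, `ServerCommand.parse("start typeChange2 5")` yields kind `START`, target `typeChange2` and an NSEC of 5, while an unrecognized string is rejected rather than silently ignored.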

  3. CA integration tests
     • Client side:
       • Library in pvmanager to make integration tests reasonable to write
       • Two phases
         • Run a series of tasks while recording all events that come out of pvmanager
         • Verify the order and number of events coming from pvmanager
       • If verification fails, you get a table with all the events gathered

  4.
     public final void run() throws Exception {
         init("typeChange1");
         addReader(PVManager.read(channel("double-to-i32")), TimeDuration.ofHertz(50));
         pause(1000);
         restart("typeChange2");
         pause(2000);
     }

     public final void verify(Log log) {
         // Check double
         log.matchConnections("double-to-i32", true, false, true);
         log.matchValues("double-to-i32", ALL_EXCEPT_TIME,
                 newVDouble(0.0, newAlarm(AlarmSeverity.INVALID, "UDF_ALARM"),
                         newTime(Timestamp.of(631152000, 0), null, false), displayNone()),
                 newVDouble(0.0, newAlarm(AlarmSeverity.UNDEFINED, "Disconnected"),
                         newTime(Timestamp.of(631152000, 0), null, false), displayNone()),
                 newVInt(0, newAlarm(AlarmSeverity.INVALID, "UDF_ALARM"),
                         newTime(Timestamp.of(631152000, 0), null, false), displayNone()));
     }
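To make the record-then-verify idea concrete, here is a minimal sketch of a log that stores connection events during the run phase and compares them against an expected sequence afterwards. `ConnectionLog` and its methods are hypothetical stand-ins written for this illustration; they are not the pvmanager test library, and the real `Log` class may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event log: records connection changes, then verifies their order. */
final class ConnectionLog {
    private static final class Event {
        final String pvName;
        final boolean connected;

        Event(String pvName, boolean connected) {
            this.pvName = pvName;
            this.connected = connected;
        }
    }

    private final List<Event> events = new ArrayList<>();

    /** Phase 1: called from the reader callback for every connection change. */
    synchronized void record(String pvName, boolean connected) {
        events.add(new Event(pvName, connected));
    }

    /** Phase 2: true only if this pv saw exactly the expected sequence, in order. */
    synchronized boolean matchConnections(String pvName, boolean... expected) {
        List<Boolean> actual = new ArrayList<>();
        for (Event e : events) {
            if (e.pvName.equals(pvName)) {
                actual.add(e.connected);
            }
        }
        if (actual.size() != expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (actual.get(i) != expected[i]) {
                return false;
            }
        }
        return true;
    }

    /** Tab-separated table of every recorded event, for printing when a check fails. */
    synchronized String dump() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < events.size(); i++) {
            Event e = events.get(i);
            sb.append(i).append('\t').append(e.pvName).append('\t')
              .append(e.connected ? "connected" : "disconnected").append('\n');
        }
        return sb.toString();
    }
}
```

Recording everything first and checking afterwards keeps the test body free of timing logic: the verification only looks at order and count, and the dump gives the full picture when they do not match.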

  5. CA integration tests
     • Covered
       • Simple reboot: connect pv, ioc down, ioc up, only 1 monitor open
       • Simple network outage: connect, network down, network up, only 1 monitor open
       • Multiple reboots: connect pv, ioc cycle 10 times
       • Type change: connect double pv, ioc cycle, pv becomes integer
       • Constant pv: connect to double/int/string/enum that do not change
       • Slow changing pv: connect to double updating at 1 Hz (same rate received)
       • Fast changing pv: connect to double updating at 100 Hz (reduced rate received)
       • Alarm changing pv: connect to double updating at 1 Hz for alarm only
       • Write pv: change value for double/int
     • Not yet covered
       • Add all remaining types for disconnection test
       • Add all types for type change
       • Add all types for slow changing pvs
       • Add all types for fast changing pvs
       • Add all types for alarm changing pvs
       • Add all types for write pvs
       • Add metadata changes
       • Add access control changes
       • Add multiple readers on a single pv (only 1 monitor open)
       • Add nanosec out of range for time
       • Old RTYP handling

  6. Review BOY connection layer
     • Review connection layer in BOY to:
       • Solve concurrency issues
         • Likely cause of missed events
       • Investigate performance problems
         • Background load
         • Slow to open some screens (>5 sec)
       • Find better ways to integrate pvmanager

  7. Review BOY connection layer
     • Findings:
       • State of widgets accessed/changed from different threads without synchronization
       • Simple.pv pvmanager implementation
         • Uses 4 different synchronization methods, not well coordinated, some unneeded
           • synchronized, volatile, Atomic variable, thread-safe collections
         • Simple.pv interface forces calls to be split and then re-merged
           • E.g. connection/value are one callback in pvmanager, split into two, later recombined
         • Sets the pvmanager rate throttling at 50 Hz and then does an additional throttling at 10 Hz
       • Script interface: utility.pv implementation provides all values; pvmanager implementation does not
       • Different widgets with different needs go through the same code path
         • E.g. all widgets create a writer, even if they are monitors. Same code for both widgets that need queuing and widgets that need caching
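The double throttling finding can be illustrated with a small offline model. The sketch below assumes a simple "keep the last value in each period window" throttle, which is a simplification chosen for this example and not how pvmanager or BOY actually schedule notifications; `ThrottleSketch` and its methods are hypothetical names. Under that assumption, a 50 Hz stage followed by a 10 Hz stage delivers the same events as a single 10 Hz stage, so the first stage only adds work.

```java
import java.util.ArrayList;
import java.util.List;

/** Offline model of rate throttling over event timestamps (in milliseconds). */
final class ThrottleSketch {

    /** Timestamps of a source updating every periodMs, from 0 up to (not including) durationMs. */
    static List<Long> source(long periodMs, long durationMs) {
        List<Long> times = new ArrayList<>();
        for (long t = 0; t < durationMs; t += periodMs) {
            times.add(t);
        }
        return times;
    }

    /** Keeps at most one event per window of periodMs: the last one seen in that window. */
    static List<Long> throttle(List<Long> timesMs, long periodMs) {
        List<Long> out = new ArrayList<>();
        long currentWindow = -1;
        Long last = null;
        for (long t : timesMs) {
            long window = t / periodMs;
            if (window != currentWindow && last != null) {
                out.add(last);  // the previous window is complete, emit its last event
            }
            currentWindow = window;
            last = t;
        }
        if (last != null) {
            out.add(last);      // emit the last event of the final window
        }
        return out;
    }
}
```

With a 100 Hz source over one second, `throttle(source(10, 1000), 20)` keeps 50 events, and feeding that result into `throttle(..., 100)` keeps 10, the same 10 that `throttle(source(10, 1000), 100)` keeps directly.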

  8. Review BOY connection layer
     • Changes on special branch:
       • Connecting BOY directly to pvmanager, skipping utility.pv
       • Making sure all events go on the UI thread
         • May solve missed events, but was never tested
       • Removed unnecessary context switches
       • Using pvmanager proper event throttling, removing EventBundlingThread
       • Added pause/resume when widgets out of screen
       • Script interface too problematic to touch
         • Hope was to re-implement rules on top of pvmanager
         • Can’t be done in general as rule user parameters are basically javascript pieces that are concatenated
         • No formal parsing or rule definition
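Two of the changes above, delivering every event on one thread and pausing widgets that are off screen, can be sketched together. This is an illustration under assumptions and not the code on the branch: `PausableDispatcher` is a hypothetical name, the `Executor` stands in for the UI thread (in an SWT application it could be something like `display::asyncExec`), and keeping only the latest value while paused is one possible policy.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical dispatcher: forwards values to a widget on a single executor, with pause/resume. */
final class PausableDispatcher<T> {
    private final Executor uiExecutor;  // stand-in for the UI thread
    private final Consumer<T> widget;   // what the widget does with a new value
    private boolean paused;
    private boolean hasPending;
    private T pending;                  // latest value seen while paused

    PausableDispatcher(Executor uiExecutor, Consumer<T> widget) {
        this.uiExecutor = uiExecutor;
        this.widget = widget;
    }

    /** Called from any thread when a new value arrives. */
    synchronized void publish(T value) {
        if (paused) {
            pending = value;            // remember only the most recent value
            hasPending = true;
            return;
        }
        deliver(value);
    }

    /** Called when the widget goes out of screen. */
    synchronized void pause() {
        paused = true;
    }

    /** Called when the widget is visible again; delivers the latest missed value, if any. */
    synchronized void resume() {
        paused = false;
        if (hasPending) {
            T value = pending;
            pending = null;
            hasPending = false;
            deliver(value);
        }
    }

    private void deliver(T value) {
        // The widget is only ever touched from the executor, never from the caller's thread.
        uiExecutor.execute(() -> widget.accept(value));
    }
}
```

Because the widget callback runs only on the executor, widget state is confined to one thread and needs no further synchronization; the lock here protects just the dispatcher's own paused/pending fields.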

  9. Review BOY connection layer
     • Background load
       • Sources of background load are different on different environments
         • On my development environment (Windows/Debian/Scientific Linux) the main source of load is SWT. Pause/Resume drops the load from 64% to 4% when the window is hidden.
         • On one BNL production machine, the main source of load seems to be the synchronization used in the thread pool used by pvmanager during the active scanning. Pause/Resume has no significant benefit.
         • On another BNL production machine, the main load was SWT, but Pause/Resume had no effect.
         • Not OS dependent. Maybe hardware or hardware + OS combinations.
     • Slow load
       • Traced back to use of rules. Each rule is a script. Each script starts a scripting environment. Each scripting environment seems to load a lot of classes (interaction between classloaders and OSGi?). Loading a screen with a large set of rules is stuck loading/unloading classes for several seconds.

  10. Review BOY connection layer
      • Takeaway:
        • Work that needs to be done in BOY
          • Finish proper pvmanager integration
          • Properly divide widget state (should all be in the model) so that real-time only updates that
          • Don’t just have one connection logic for all widget types
          • Understand how to implement rules on top of pvmanager (re-implement or migrate?)
        • Whoever does this work will not be able to do the testing himself; needs prompt support and feedback
          • Performance profile is significantly different
          • Concurrency issues are difficult to replicate

  11. Review BOY connection layer
      • Takeaway:
        • For pvmanager
          • Wrote 100 times on the blackboard: “My development environment is not a good approximation of all production environments”
          • Will prepare a performance benchmarking suite to gather data so I can keep track
          • Passive scanning got to the “toppish” part of the list. Also considering implementing a different ExecutorService
