
gLite Error Handling



  1. gLite Error Handling Steve Fisher for JRA1-UK

  2. Introduction
     • Good error handling is appreciated by users
     • Bad error handling incurs the wrath of Stephen Burke
     Errors - Brno

  3. Examples from Stephen B
     • From long experience I think there's a hierarchy of bad error messages:
       1) Crash, core dump etc.
       2) No error message, so you think it worked when it didn't
          • maybe worse than 1
       3) Something which might indicate an error or might not, e.g. “No results returned” from R-GMA.
       4) A catch-all error which translates to “something went wrong”, e.g. “ERROR: Failed to instantiate Consumer” from R-GMA.
       5) An error which assumes a particular cause when in fact there are many causes, e.g. “invalid argument” from the lcg-* tools.
       6) A message which can only be translated to the real cause by the initiated.

  4. Examples - continued
       7) A message which almost tells you what happened, but leaves out some vital information: “couldn't open file” - but which file?!
       8) A 50-line dump of everything the code can find, which has the real error buried somewhere in it, e.g. “expired host certificates” with GSI.
       9) A message which tells you what went wrong in a way which makes it clear that the code could have recovered itself but didn't bother, e.g. edg-rm giving up when the first replica fails even when there might be 30 others to try.
     • I would include 10), a helpful error message which tells you exactly what went wrong and what to do about it, but I don't think I've ever seen one of those ...

  5. So…
     • Good error handling is most important when one gLite component calls another
     • The error finally passed back to the user must be
       • Comprehensible
       • Comprehensive
     • It must be easy for the API user to take appropriate action
       • i.e. don’t expect the user to do pattern matching on an error message
     • 4 Areas
       • Internal to a service
       • The service interface (WSDL)
       • gLite API
       • Displayed by a gLite provided tool

  6. Internal to a service
     • There is no reason to suggest any rules
     • Services can preserve their autonomy
     • For R-GMA we use a moderately deep exception hierarchy

  7. In the WSDL
     • Use a small number of WSDL faults:

         <element name="UnknownResourceException" type="rgma:UnknownResourceException"/>
         <complexType name="UnknownResourceException">
           <sequence>
             <element name="errMsg" type="xsd:string" minOccurs="0"/>
             <element name="errNo" type="xsd:int"/>
           </sequence>
         </complexType>

         <wsdl:message name="UnknownResourceExceptionMessage">
           <wsdl:part name="fault" element="rgma:UnknownResourceException"/>
         </wsdl:message>

         <wsdl:operation name="setTerminationInterval" parameterOrder="resourceId terminationInterval">
           …
           <wsdl:fault name="UnknownResourceException" message="impl:UnknownResourceExceptionMessage">
           </wsdl:fault>
           …
         </wsdl:operation>

  8. R-GMA set of faults
     • RGMAException
       • xsd:string errMsg(0..1)
       • xsd:int errNo
       • xsd:string trace(0..1)
     • UnknownResourceException
       • xsd:string errMsg(0..1)
       • xsd:int errNo
     • RGMASecurityException
       • xsd:string errMsg(0..1)
       • xsd:int errNo

  9. Could generalise
     • ServiceException
       • xsd:string errorMessage(0..1)
       • xsd:int errorNumber
       • xsd:string trace(0..1)
     • UnknownResourceException
       • xsd:string errorMessage(0..1)
       • xsd:int errorNumber
     • AuthException
       • xsd:string errorMessage(0..1)
       • xsd:int errorNumber
     errorMessage is a free-format string
     errorNumber is a “small” integer
     trace is a free-format string
     Auth rather than Security because of clashes with java.lang.SecurityException
     If one service calls another which returns an exception, it is the responsibility of the caller to generate a decent message and error number. Information from the underlying problem can be added to the trace.
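One way the generalised fault set above might surface in a Python API is sketched below. The `Fault` base class, the snake_case attribute names, the `call_downstream` and `store_tuple` functions and the error numbers are all illustrative assumptions, not part of any gLite or R-GMA API. The sketch shows a caller replacing an underlying exception with its own message and error number while keeping the underlying details in the trace.

```python
import traceback


class Fault(Exception):
    """Hypothetical common base: every fault carries a message and a number."""

    def __init__(self, error_message, error_number):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number


class ServiceException(Fault):
    """The only fault in the generalised set that also carries a trace."""

    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message, error_number)
        self.trace = trace


class UnknownResourceException(Fault):
    pass


class AuthException(Fault):
    pass


ERR_STORAGE_UNAVAILABLE = 42  # hypothetical error number


def call_downstream():
    # Stand-in for a call to another service that fails.
    raise UnknownResourceException("resource 17 not found", 3)


def store_tuple():
    try:
        call_downstream()
    except UnknownResourceException as exc:
        # The caller writes its own message and number; the underlying
        # problem is preserved in the trace only.
        raise ServiceException(
            "Could not store tuple: the backing resource has gone away",
            ERR_STORAGE_UNAVAILABLE,
            trace=traceback.format_exc(),
        ) from exc
```

A user calling `store_tuple()` sees a message written in terms of the operation they asked for, while the downstream text is still available for debugging through `trace`.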

  10. API view
     • Errors get passed from the Service back to the user in a style appropriate to the language.
     • For Java, C++ and Python use Exceptions matching the WSDL
     • For C we use an object-like thing:

         if (RGMAPrimaryProducer_insert(pp, insert) != 0) {
             fprintf(stderr, "Failed to insert.\n");
             fprintf(stderr, "<%s>\n", RGMA_getException(pp)->errorMessage);
             exit(1);
         }

  11. API Errors
     • Additionally some errors can be generated by the API:
       • RemoteException
         • unable to contact the service
       • AuthException
         • same as service returns but this time due to authentication problem
       • ServiceException
         • user does not know what is in the API and what is in the service.
         • from a user perspective the API is the service
     • Each API should provide a set of symbolic constants for the error numbers.
       • Changing the error numbers introduces an incompatibility
       • No attempt should be made to interpret the value of the number
     • The error messages are for humans and are subject to change
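The advice on symbolic constants can be illustrated with a short Python sketch. The `ErrorNumber` names and values, the reduced `ServiceException` class and the `describe_action` function are hypothetical, chosen only for illustration. The point is that client code branches on a named constant and never on the wording of the message, which may change between releases.

```python
from enum import IntEnum


class ErrorNumber(IntEnum):
    """Hypothetical symbolic names; the numeric values carry no meaning."""
    TABLE_NOT_FOUND = 1
    INVALID_QUERY = 2
    SERVICE_BUSY = 3


class ServiceException(Exception):
    def __init__(self, error_message, error_number):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number


def describe_action(exc):
    """Choose an action from the error number, never from the message text."""
    if exc.error_number == ErrorNumber.SERVICE_BUSY:
        return "retry later"
    if exc.error_number == ErrorNumber.TABLE_NOT_FOUND:
        return "check the table name"
    return "report to support"
```

Because the decision depends only on `error_number`, rewording an error message does not break callers, whereas renumbering the constants would.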

  12. The 4 types of exception
     • AuthException
       • User should ensure that he is authenticated and has the right authorization. He should not get back much information.
     • RemoteException
       • Unable to contact the service. You might want to try again.
     • UnknownResourceException
       • Try remaking the resource – though you want to wait a little while first or limit the number of attempts
     • ServiceException
       • This may be an invalid interaction with the service or it could be a faulty service. Consult the error message.
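The reactions suggested for the four exception types could be combined in client code roughly as follows. This is a minimal sketch: the bare exception classes, `run_with_recovery`, its `recreate_resource` callback and the `max_attempts` and `delay_seconds` parameters are assumptions made for illustration, not an existing gLite interface.

```python
import time


class AuthException(Exception):
    pass


class RemoteException(Exception):
    pass


class UnknownResourceException(Exception):
    pass


class ServiceException(Exception):
    pass


def run_with_recovery(operation, recreate_resource, max_attempts=3, delay_seconds=1.0):
    """Call operation(), retrying only where a retry can plausibly help."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RemoteException:
            # Service unreachable: wait, then try again.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
        except UnknownResourceException:
            # Resource has gone: wait a little, remake it, then try again,
            # but only up to the attempt limit.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
            recreate_resource()
        # AuthException and ServiceException are deliberately not caught:
        # the user has to fix credentials or read the error message.
```

Only the two exceptions for which the slide suggests another attempt are retried; the other two propagate to the caller on the first occurrence.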

  13. CLI
     • The CLI will normally trap and handle errors
     • Unexpected errors should result in printing the error message but not the trace unless the CLI is being run in debug mode.
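The CLI rule above might look like this in a Python tool. The `report_error` function, its `debug` flag, the output format and the reduced `ServiceException` class are hypothetical, shown only to make the rule concrete.

```python
import sys


class ServiceException(Exception):
    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


def report_error(exc, debug=False, stream=sys.stderr):
    """Print the message; include the trace only when running in debug mode."""
    print(f"ERROR {exc.error_number}: {exc.error_message}", file=stream)
    if debug and exc.trace:
        print(exc.trace, file=stream)
    return 1  # non-zero exit status for the caller to pass to sys.exit
```

A tool's entry point would wrap its work in a `try` block and call `sys.exit(report_error(exc, debug=...))` for any error it did not expect, so that ordinary users see one clear line and developers can opt in to the full trace.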

  14. Conclusion
     • Most of the issues about errors are non-technical
     • Error handling needs to be taken seriously with full attention to the messages:
       • Comprehensibility
       • Comprehensiveness
     • We should try to agree upon the principles
