UK e-Science CA Resilience

Jens Jensen, STFC RAL. GridPP 22, UCL, 1-2 April 2009.

Presentation Transcript


  1. UK e-Science CA Resilience Jens Jensen, STFC RAL GridPP 22, UCL, 1-2 April 2009

  2. Disaster Planning • Like Graeme said (in a different context): Unfortunately not all theoretical • Updated planning in connection with R89 move (Nov-Dec. ’08) • Service goals: • availability (1-3), • integrity (1-3), • confidentiality (1-3), • ∏ (2-18)
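
The scoring on this slide can be illustrated with a short sketch. It assumes that each service goal is an integer score from 1 to 3 and that ∏ means the product of the three scores; the slide does not spell either point out. The class, the function, the service names and the scores are all invented for illustration, chosen so that the products fall in the 2-18 range quoted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceGoals:
    """Hypothetical per-service scores, each assumed to be 1, 2 or 3."""
    name: str
    availability: int
    integrity: int
    confidentiality: int

    def __post_init__(self):
        # Reject scores outside the 1-3 scale assumed from the slide.
        for goal in ("availability", "integrity", "confidentiality"):
            value = getattr(self, goal)
            if value not in (1, 2, 3):
                raise ValueError(f"{goal} must be 1, 2 or 3, got {value!r}")

    @property
    def product(self) -> int:
        # Assumed reading of the slide's product symbol.
        return self.availability * self.integrity * self.confidentiality


def rank_by_product(services):
    """Return services ordered from highest to lowest combined score."""
    return sorted(services, key=lambda s: s.product, reverse=True)


# Invented example services and scores, not taken from the talk.
services = [
    ServiceGoals("signing-key-store", 2, 3, 3),
    ServiceGoals("public-web-pages", 2, 1, 1),
    ServiceGoals("online-database", 3, 3, 1),
]

for service in rank_by_product(services):
    print(f"{service.name}: {service.product}")
```

A combined score like this only gives a rough ordering of which services to look at first; it says nothing about how likely a failure is.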

  3. Services • About 18 services constitute the CA (of which 4 are people) • Some very specialised stuff

  4. More testing needed • E.g. HSM resilience • Setting up NGS-CA-TAG for review • Set up to review R89 move plan prior to move • Taken some time to set up secure comms • Simplify

  5. Previous DP • HBI service review • Looks at infrastructure • But not CAologically • Probably too many staff involved at various levels? • Not as much a problem with responsibility • More that no one knows (enough) about all layers

  6. Coping with incidents • IGTF-RAT • International Grid Trust Federation • Risk Assessment Team • 2-4 Members from each PMA – 24 hr cover (ish) • Using GPG/PGP keys, secured email at NCSA • Assess incidents • E.g. MD5 “incident” 30-31 Dec. 2008 • E.g. DSA and ECDSA signatures in OpenSSL

  7. Thoughts on resilience • LOCKSS • Redundancy not always possible • Expensive OR complicated OR security risk… • Software needs attention • Make machines do tedious tasks • Machines implement redundancy • People are important (cf Raja’s talk) • Good stuff to set OGF-CAOPS best practice

  8. Thoughts on resilience • Hard to predict all the risks • Some are difficult to mitigate • Understanding services better will help • Trust machines and audit humans • Trust humans and make machines resilient • Complex software • “Who are General Status and Major Failure, and what are they doing on my system?”

  9. Thoughts on infraoperations • Online database • Backed up hourly • (Risk of backing up bad data?) • Timeliness of revocations • IGTF: Currently being discussed for MICS profile • IGTF: How well are the Classics doing this? • UK: Aim to bring signing online • UK: Direct revocation link for security officers
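
The question about backing up bad data can be made concrete with a small sketch. This is one possible mitigation, not a description of how the CA's backups work: it keeps several backup generations and refuses to create a new one when a validity check fails, so a bad copy never replaces the last good one. The function name, the `gen-` file naming, the `keep` parameter and the trivial "file is not empty" check are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def rotate_backup(source: Path, backup_dir: Path, is_valid, keep: int = 24) -> bool:
    """Store a new backup generation only if is_valid(source) is true.

    Older generations beyond `keep` are pruned. Returns True when a
    backup was written, False when the source failed validation.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not is_valid(source):
        return False  # leave the existing good generations untouched
    backup_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(backup_dir.glob("gen-*"))
    next_index = int(existing[-1].name.split("-")[1]) + 1 if existing else 0
    # Zero-padded names keep lexical order equal to numeric order.
    shutil.copy2(source, backup_dir / f"gen-{next_index:08d}")
    for old in sorted(backup_dir.glob("gen-*"))[:-keep]:
        old.unlink()
    return True


def not_empty(path: Path) -> bool:
    """Placeholder check; a real one would inspect the database contents."""
    return path.stat().st_size > 0


# Demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "ca.db"
    backups = Path(tmp) / "backups"
    results = []
    for content in ("v1", "", "v3", "v4"):  # the empty write stands in for bad data
        db.write_text(content)
        results.append(rotate_backup(db, backups, not_empty, keep=2))
    kept = sorted(p.name for p in backups.glob("gen-*"))
    latest = (backups / kept[-1]).read_text()
```

A check this simple catches only gross corruption; data that is wrong but well-formed would still be backed up, which is why keeping more than one generation matters.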
