
Recovering deleted files from HPSS



Presentation Transcript


  1. Recovering deleted files from HPSS (Pierre-Emmanuel Brinette)

  2. Context
     • HPSS is used as the MSS behind dCache for the LHC experiments.
     • In March 2011, LHCb accidentally deleted a dataset from the central database located at CERN.
     • The delete operation was propagated to all grid sites that held the data.
     • Many sites were impacted: CERN (CASTOR), RAL (CASTOR), GRID-KA (TSM-HSM), PIC (CASTOR).
     • Three days after the deletion, the LHCb experiment asked the sites to restore the deleted files.
     • All of these HSMs have ‘undelete’ features, and the sites were able to restore the files within 2-3 days.
     • HPSS has no undelete feature!

  3. Restore recipe
     • Set up a clone of the HPSS core server.
     • Restore the HPSS backup to a point in time before the deletion.
     • Identify the tapes holding the files.
     • Check whether the tapes have been altered in the production system (repacked or reclaimed).
     • Prepare the resources needed to recover the files.
     • Copy the files from the clone HPSS to the production HPSS.

  4. Set up an HPSS clone core server
     • Install a new core server
       • Clone the local users (same uid, gid, password, shadow)
       • Set up the same DB2 tablespace containers (raw devices, mount points, …)
       • Set up DB2 9.5
       • Compile HPSS (no installation)
     • Restore the production database
       • Restore HCFG and HSUBSYSx, and roll forward to the time before the deletion
     • Alter the configuration to change the hostname
       • Transfer /var/hpss/etc/* from the production server to the clone server
       • Change the hostname in the text configuration files
       • Recreate the HPSS Unix keytab
       • Recreate the HPSS mm keytab
       • DB2 rebind
     • Alter the HPSS metadata
       • Update EXECUTE_HOSTNAME in HCFG.HPSS.SERVER
       • Update SERV_DESC in HCFG.HPSS.SERVER for the SUD daemon and LOGC
     • Start SSM
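The restore and rollforward step on this slide could be prepared along the following lines. This is only a sketch: it builds DB2 command-line strings and does not run them. The function name, the backup directory, and both timestamps are hypothetical, and "HSUBSYS1" merely stands in for whichever HSUBSYSx databases exist at a given site. The exact command options should be checked against the DB2 9.5 documentation before use.

```python
from datetime import datetime


def db2_restore_commands(databases, backup_dir, backup_taken_at, recover_to):
    """Build (but do not run) DB2 CLP commands that restore each database
    from a backup image and roll it forward to a point in time.

    All arguments are illustrative assumptions, not values from the slides.
    """
    # RESTORE ... TAKEN AT expects a yyyymmddhhmmss timestamp.
    taken = backup_taken_at.strftime("%Y%m%d%H%M%S")
    # ROLLFORWARD ... TO expects an isotime: yyyy-mm-dd-hh.mm.ss.nnnnnn
    isotime = recover_to.strftime("%Y-%m-%d-%H.%M.%S.%f")

    commands = []
    for db in databases:
        commands.append(
            f"db2 restore database {db} from {backup_dir} taken at {taken}"
        )
        # Stop the rollforward just before the accidental deletion.
        commands.append(
            f'db2 "rollforward database {db} to {isotime} '
            f'using local time and stop"'
        )
    return commands


if __name__ == "__main__":
    for line in db2_restore_commands(
        ["HCFG", "HSUBSYS1"],            # HSUBSYS1 is a placeholder name
        "/backup/hpss",                  # hypothetical backup location
        datetime(2011, 3, 1, 2, 0, 0),   # hypothetical backup timestamp
        datetime(2011, 3, 10, 8, 30, 0), # hypothetical time before deletion
    ):
        print(line)
```

Generating the commands as text first makes it possible to review them by hand before anything touches the clone's databases.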

  5. Set up an HPSS clone core server
     • Avoid write operations
       • Disable MPS
       • Disable log archiving in HPSS (in LOGD)
       • Lock all drives
         • Disk volumes
         • Tape drives
         • Disk and tape movers
     • Start the components and disable all pending operations
       • Start LOGC, LOGD, PVL (NOT PVR), CORE
       • Cancel pvljob
       • Wait for the pending operations to time out

  6. Identify the tapes containing the erased files
     • Query the clone CORE server with the list of deleted files.
     • Get the list of tapes that contain the files.
     • Check that the tapes exist in the production environment.
     • Lock the tapes in the production environment (i.e. set VV Cond to Down).
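The bookkeeping behind these steps can be sketched as follows. The slides do not show how the clone CORE server is queried, so the sketch assumes the query result is already available as a mapping from file path to tape label. The function name, the input shapes, and the sample paths and tape labels are all hypothetical.

```python
from collections import defaultdict


def plan_tape_locks(deleted_files, file_to_tape, production_tapes):
    """Group deleted files by tape and decide which tapes to lock.

    deleted_files    -- iterable of deleted file paths
    file_to_tape     -- dict: path -> tape label (assumed clone query result)
    production_tapes -- set of tape labels still known to production
    """
    files_by_tape = defaultdict(list)
    not_found = []  # files the clone metadata does not know about
    for path in deleted_files:
        tape = file_to_tape.get(path)
        if tape is None:
            not_found.append(path)
        else:
            files_by_tape[tape].append(path)

    # Tapes that still exist in production should be locked there.
    to_lock = sorted(t for t in files_by_tape if t in production_tapes)
    # Tapes that no longer exist in production need manual investigation.
    missing = sorted(t for t in files_by_tape if t not in production_tapes)
    return dict(files_by_tape), to_lock, missing, not_found


if __name__ == "__main__":
    by_tape, to_lock, missing, not_found = plan_tape_locks(
        ["/lhcb/a", "/lhcb/b", "/lhcb/c", "/lhcb/d"],
        {"/lhcb/a": "T00001", "/lhcb/b": "T00002", "/lhcb/c": "T00001"},
        {"T00001"},
    )
    print("lock in production:", to_lock)
    print("not in production:", missing)
    print("unknown to clone:", not_found)
```

Separating "missing" tapes from lockable ones keeps the case of a repacked or reclaimed tape visible instead of silently dropping its files.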

  7. Restore procedure (diagram; recoverable labels: Purge; VV: Down; put; Locked; VV: Down; get; Locked; Cancel pvljob; Start PVR)

  8. Remarks
     • The clone server is set up from a “live” HPSS metadata backup
       • Operations are in progress
       • Disk & tape volumes are in use
     • Risk reduction:
       • Clone: Mark all movers & MPS “non executable” (with hpssadm)
       • Clone: Disable log archiving
       • Clone: Don’t start PVR
       • Clone: Cancel all PVL operations
       • Clone: Force dismount of tapes
       • Clone: Wait for timeout
       • Clone: Start/stop the Core server many times
       • Prod: Lock tape drives still displayed as “in use” in the clone
       • Clone: Start up PVR, restart PVL and wait for the PVR timeout
       • Clone: Shut down & restart CORE, PVL, PVR
     • Pay attention to all error messages on both systems
     • For recovery:
       • Stage files before transfers
       • Take the position of files on tape into account to optimize transfers
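The last remark, ordering transfers by the position of files on tape, can be illustrated with a small sketch. It assumes each file is described by a (tape label, position on tape, path) tuple. How the position is obtained from HPSS is not covered in the slides, and the function name and sample values are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def order_for_recall(entries):
    """Return {tape: [paths in on-tape order]} from (tape, position, path)
    tuples, so each tape is mounted once and read front to back.
    """
    # Sort by tape first, then by position on that tape.
    ordered = sorted(entries, key=itemgetter(0, 1))
    return {
        tape: [path for _, _, path in group]
        for tape, group in groupby(ordered, key=itemgetter(0))
    }


if __name__ == "__main__":
    plan = order_for_recall([
        ("T00002", 7, "/lhcb/x"),
        ("T00001", 42, "/lhcb/c"),
        ("T00001", 3, "/lhcb/a"),
        ("T00002", 1, "/lhcb/y"),
    ])
    for tape, paths in plan.items():
        print(tape, paths)
```

Staging in this order avoids remounting a tape and limits seeking back and forth along it.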
