
VNX Parts Replacements


Presentation Transcript


  1. VNX Parts Replacements Boris Sobolev Lennox Robin

  2. Parts replacement
     Each procedure consists of:
     1. Before you go onsite: ETAs / Warnings. Primus can be found here: http://csexplorer.isus.emc.com/eservice/iviewcs/ui/eserver.asp
     2. Collect environment configuration information: run SP collect using the available tools and review the SP collects.
     3. Handling FRUs.
     4. Part replacement specific information.

  3. Parts replacement – USM • Unisphere Service Manager (USM) can be downloaded from –

  4. Parts replacement – parts which can be replaced by user on Block

  5. Parts replacement – Other Hardware

  6. Parts replacement – Other Hardware https://support.emc.com/ https://mydocs.emc.com/VNX/

  7. Parts replacement – Other Hardware https://mydocs.emc.com/VNX/

  8. Parts replacement – Other Hardware https://mydocs.emc.com/VNX/

  9. Parts replacement – Other Hardware https://mydocs.emc.com/VNX/

  10. Parts replacement – Other Hardware https://mydocs.emc.com/VNX/

  11. Parts replacement – Common Tasks • Disable ConnectHome and Email notifications. • Enable ConnectHome and Email notifications. • Disable write cache. • Enable write cache. • Restore trespassed LUNs. • Check system state.

  12. Parts replacement – Common Tasks
      1. Disable ConnectHome and email notifications.
      Use a HyperTerminal session to disable ConnectHome; you must be logged in as the superuser (root):
      a. From the root directory, disable ConnectHome:
         # /nas/sbin/nas_connecthome -service stop
      b. Disable the email notification service:
         # /nas/bin/nas_emailuser -modify -enabled no
      c. Verify that the email notification service has stopped (is not enabled):
         # /nas/bin/nas_emailuser -info
      ConnectHome and email notifications are now disabled.

  13. Parts replacement – Common Tasks
      2. Enable ConnectHome and email notifications.
      a. From the root directory, clear any existing ConnectHome files and enable ConnectHome:
         # /nas/sbin/nas_connecthome -service start -clear
      b. From the ConnectHome configuration, determine the connections that are enabled:
         # /nas/sbin/nas_connecthome -i
      c. Verify that ConnectHome works by running the /nas/sbin/nas_connecthome -test connection_name command for each enabled connection. For example:
         # /nas/sbin/nas_connecthome -t -email_1
         or
         # /nas/sbin/nas_connecthome -t -email_2
         or
         # /nas/sbin/nas_connecthome -t -https
         or
         # /nas/sbin/nas_connecthome -t -modem_1
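Step c above repeats one test command per enabled connection. A minimal Python sketch of that loop follows; it only builds the argument lists and does not run anything. The helper name `connecthome_test_commands` is hypothetical, and the connection names must come from the `nas_connecthome -i` output on the actual system.

```python
NAS_CONNECTHOME = "/nas/sbin/nas_connecthome"


def connecthome_test_commands(enabled_connections):
    """Return one 'nas_connecthome -t -<connection>' argument list per connection.

    enabled_connections: names as used on the slide, e.g. "email_1", "https".
    """
    return [[NAS_CONNECTHOME, "-t", f"-{name}"] for name in enabled_connections]


# Example with two connection names taken from the slide.
for command in connecthome_test_commands(["email_1", "https"]):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run` on the Control Station; that step is left out here so the sketch stays safe to run anywhere.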

  14. Parts replacement – Common Tasks
      2. Enable ConnectHome and email notifications (continued).
      d. Verify whether email notifications are configured:
         # /nas/bin/nas_emailuser -info
         If the Recipient Address(es) field is empty, email notifications have not been configured and do not need to be enabled. If you want to configure email notifications, use the /nas/bin/nas_emailuser command or Unisphere.
         If the Recipient Address(es) field is populated, email notifications were enabled and must be re-enabled.
      e. Enable email notifications:
         # /nas/bin/nas_emailuser -modify -enabled yes
      f. Verify that email notifications are enabled:
         # /nas/bin/nas_emailuser -info
      g. Test the configuration and verify that the configured Recipient Address(es) received the test email:
         # /nas/bin/nas_emailuser -test
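The decision in step d depends only on whether the Recipient Address(es) field is populated. The Python sketch below shows one way to make that check from captured `nas_emailuser -info` output. The `Field = value` line layout is an assumption (the slides do not show the output), and the function names are hypothetical.

```python
def parse_info(text):
    """Parse 'Field = value' lines into a dict (assumed output layout)."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def should_reenable_email(info_text):
    """True when the Recipient Address(es) field is populated."""
    recipients = parse_info(info_text).get("Recipient Address(es)", "")
    return bool(recipients)


# Hypothetical sample output, for illustration only.
sample = "Service Enabled = No\nRecipient Address(es) = admin@example.com\n"
print(should_reenable_email(sample))
```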

  15. Parts replacement – Common Tasks
      3. Disable write cache.
      Display and record the current write cache settings:
         # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 getcache | grep "Cache Size"
      Disable and zero out the system write cache:
         # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wsz 0 -wc 0

  16. Parts replacement – Common Tasks
      4. Enable write cache.
      Using the open HyperTerminal session, set the write cache size to match the previously recorded setting. The SP IP addresses are listed in /etc/hosts:
         [root@VNX5700-CS0 nasadmin]# grep SP /etc/hosts
         10.5.22.160 A_APM00112800336 SPA # CLARiiON SP
         10.5.22.161 B_APM00112800336 SPB # CLARiiON SP
         # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wsz <write_cache_size>
      Enable the write cache:
         # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wc 1
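The disable and enable steps form a pair: the size recorded before the replacement is the size restored afterwards. A minimal Python sketch of that pairing is shown here. It only assembles the `naviseccli setcache` argument lists using the flags from the slides; the function names, the credentials and the recorded size are placeholders, not values from the source.

```python
NAVISECCLI = "/nas/sbin/naviseccli"


def setcache_command(sp_ip, user, password, wsz=None, wc=None):
    """Build a 'naviseccli ... setcache' argument list with the slide's flags."""
    command = [NAVISECCLI, "-h", sp_ip, "-user", user,
               "-password", password, "-scope", "0", "setcache"]
    if wsz is not None:
        command += ["-wsz", str(wsz)]
    if wc is not None:
        command += ["-wc", str(wc)]
    return command


def disable_write_cache(sp_ip, user, password):
    """One command: zero the write cache size and disable the cache."""
    return [setcache_command(sp_ip, user, password, wsz=0, wc=0)]


def restore_write_cache(sp_ip, user, password, recorded_size):
    """Two commands: restore the recorded size, then enable the cache."""
    return [
        setcache_command(sp_ip, user, password, wsz=recorded_size),
        setcache_command(sp_ip, user, password, wc=1),
    ]


# Placeholder credentials and size; 10.5.22.160 is SPA in the /etc/hosts example.
for command in restore_write_cache("10.5.22.160", "<name>", "<password>", 3000):
    print(" ".join(command))
```

Keeping the recorded size as an explicit argument makes it harder to re-enable the cache with a size that differs from the original setting.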

  17. Parts replacement – Common Tasks
      5. Restore trespassed LUNs.
      Using the CLI, do the following:
      a. Log in to the primary Control Station as nasadmin and change to the root user:
         $ su root
      b. Determine the storage-system serial number (storage-system ID):
         # nas_storage -list
         [nasadmin@VNX5700-CS0 ~]$ nas_storage -list
         id acl name serial_number
         0 APM00112800336 APM00112800336
      c. Restore the LUNs to the correct SP:
         # nas_storage -failback storage-system-name
         For example:
         # nas_storage -failback APM00070300923
         id = 1
         serial_number = APM00070300923
         name = APM00070300923
         acl = 0
         done
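Steps b and c can be chained: the name column of `nas_storage -list` feeds `nas_storage -failback`. The Python sketch below assumes the table layout shown on the slide (a header row starting with `id`, with the name in the second-to-last column); the sample row is assembled from the values in the slide's failback example, and the function names are hypothetical.

```python
def storage_system_names(list_output):
    """Extract the 'name' column from 'nas_storage -list' output.

    Assumes whitespace-separated columns ending in: name serial_number.
    """
    names = []
    for line in list_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "id":
            continue  # skip blank lines and the header row
        names.append(fields[-2])
    return names


def failback_commands(list_output):
    """Build one 'nas_storage -failback <name>' argument list per system."""
    return [["nas_storage", "-failback", name]
            for name in storage_system_names(list_output)]


sample = "id acl name serial_number\n1 0 APM00070300923 APM00070300923\n"
for command in failback_commands(sample):
    print(" ".join(command))
```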

  18. Parts replacement – Common Tasks
      5. Restore trespassed LUNs (continued).
      To restore all LUNs to the SP that owns them by default, use the "mine" option of the trespass command. It must be issued from the SP that the LUNs will trespass to.
         C:\>naviseccli -h 10.5.22.177 trespass mine
         C:\>naviseccli -h 10.5.22.177 getlun -default -owner
         LOGICAL UNIT NUMBER 29
         Default Owner: SP B
         Current owner: SP B
         LOGICAL UNIT NUMBER 18
         Default Owner: SP A
         Current owner: SP A
         LOGICAL UNIT NUMBER 17
         Default Owner: SP B
         Current owner: SP B
         LOGICAL UNIT NUMBER 20
         Default Owner: SP A
         Current owner: SP A

  19. Parts replacement – Common Tasks
      5. Restore trespassed LUNs (continued).
      For one LUN:
         C:\>naviseccli -h 10.5.22.193 getlun 2 -default -owner
         Default Owner: SP B
         Current owner: SP A
         C:\>naviseccli -h 10.5.22.193 trespass lun 2
         Error: trespass command failed
         This command must be issued from the SP that the LUN will trespass to
         C:\>naviseccli -h 10.5.22.194 trespass lun 2
         C:\>navicli -h 10.5.22.194 getlun 2 -default -owner
         Default Owner: SP B
         Current owner: SP B
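The `getlun -default -owner` output above is regular enough to scan for LUNs whose current owner differs from the default owner. The following Python sketch does that and builds the matching `trespass lun` command, addressed to the default owner's SP as the error message on the slide requires. The function names are hypothetical, and the SP-to-IP mapping in the example is inferred from the slide (the command failed on 10.5.22.193 and succeeded on 10.5.22.194 for a LUN owned by SP B).

```python
import re

# Matches one LUN record as printed by 'getlun -default -owner'.
LUN_RECORD = re.compile(
    r"LOGICAL UNIT NUMBER\s+(\d+)\s+"
    r"Default Owner:\s+SP\s+([AB])\s+"
    r"Current owner:\s+SP\s+([AB])",
    re.IGNORECASE,
)


def trespassed_luns(getlun_output):
    """Return (lun, default_sp, current_sp) for LUNs not on their default SP."""
    found = []
    for lun, default_sp, current_sp in LUN_RECORD.findall(getlun_output):
        default_sp, current_sp = default_sp.upper(), current_sp.upper()
        if default_sp != current_sp:
            found.append((int(lun), default_sp, current_sp))
    return found


def trespass_commands(getlun_output, sp_addresses):
    """Build 'trespass lun N' commands, each sent to the LUN's default SP."""
    return [
        ["naviseccli", "-h", sp_addresses[default_sp], "trespass", "lun", str(lun)]
        for lun, default_sp, _ in trespassed_luns(getlun_output)
    ]


sample = """LOGICAL UNIT NUMBER 2
Default Owner: SP B
Current owner: SP A
LOGICAL UNIT NUMBER 18
Default Owner: SP A
Current owner: SP A
"""
addresses = {"A": "10.5.22.193", "B": "10.5.22.194"}  # inferred mapping
for command in trespass_commands(sample, addresses):
    print(" ".join(command))
```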

  20. Parts replacement – Common Tasks
      6. Check system state.
      To view the system state, enter the following command:
         # /nas/bin/nas_checkup
      Example:
         # /nas/bin/nas_checkup
         Check Version: <NAS_version>
         Check Command: /nas/bin/nas_checkup
         Check Log : /nas/log/checkup-run.100128-181007.log
         -------------------------------------Checks-------------------------------------
         Control Station: Checking if file system usage is under limit.............. Pass
         Control Station: Checking if NAS Storage API is installed correctly........ Pass
      If the output of the nas_checkup command indicates any problems, correct the problems and rerun the command before continuing.

  21. Parts replacement – Common Tasks 6. Check system state (continued)
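Reviewing a long `nas_checkup` run by eye is error-prone, so a small filter for results other than Pass can help. The Python sketch below parses check lines in the layout shown in the example (component, description, a run of dots, result). Treating anything other than `Pass` as a problem is an assumption, as is the `Fail` label in the test data, and the function names are hypothetical.

```python
import re

# "<component>: <description>.......... <result>"
CHECK_LINE = re.compile(
    r"^(?P<component>[^:]+):\s+(?P<check>.+?)\.{2,}\s*(?P<result>\w+)\s*$"
)


def parse_checkup(output):
    """Return (component, check, result) tuples for every check line."""
    results = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results.append((
                match["component"].strip(),
                match["check"].strip(),
                match["result"],
            ))
    return results


def problems(output):
    """Return the checks whose result is anything other than 'Pass'."""
    return [entry for entry in parse_checkup(output) if entry[2] != "Pass"]


sample = (
    "Check Log : /nas/log/checkup-run.100128-181007.log\n"
    "Control Station: Checking if file system usage is under limit.............. Pass\n"
    "Control Station: Checking if NAS Storage API is installed correctly........ Pass\n"
)
print(problems(sample))
```

Header lines such as `Check Log` are skipped because they contain no run of dots before a final result word.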

  22. Parts replacement - Drive
      Summary
      1. Diagnose and identify the CRU to replace.
      2. Download and install the USM.
      3. Verify that you do not have a multiple-failure situation.
      4. Run the Disk Replacement wizard.
      5. Replace the drive.

  23. Parts replacement - Drive
      Diagnose and identify the CRU to replace:
      • The amber fault indicator is lit on the disk module.
      • The Unisphere "Fault report" indicates no problems other than a single disk failure.
      • The event log contains a "CRU removed" (920c, xx0d) message.
      • The event log shows no "error" or "critical error" events for other disks or other components.

  24. Parts replacement - Drive Diagnose and identify the CRU to replace

  25. Parts replacement - Drive Diagnose and identify the CRU to replace

  26. Parts replacement - Drive How do I know when a single disk module is faulted and should be replaced? When is it not OK to remove or replace a disk module? How should I proceed when more than a single drive indicates a fault?

  27. Parts replacement - Drive When is it not OK to remove or replace a disk module? • An NDU is in progress • The drive is protected by RAID type 0 • An array component in addition to a disk is indicating a fault • More than one disk is indicating a problem

  28. Parts replacement - Drive When is it not OK to remove or replace a disk module? [Diagram: disk enclosure with slots 0 to 14, LCC A and LCC B]

  29. Parts replacement - Drive When is it not OK to remove or replace a disk module? [Diagram: the same enclosure divided into RAID groups RG 1, RG 2 and RG 3; RAID types listed: RAID 1, RAID 1/0, RAID 3, RAID 5, RAID 6]

  30. Parts replacement - Drive When is it not OK to remove or replace a disk module? [Diagram: enclosure with RG 1, RG 2 and RG 3, showing parity and data drives] Another (previously replaced) disk is equalizing or rebuilding in the same RAID group. Remember that removing more than one drive from a RAID group will certainly cause loss of data availability, and may cause loss of data.
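The conditions on the preceding slides amount to a pre-replacement checklist. A minimal Python sketch of that checklist follows; the `DriveContext` type and its field names are hypothetical, and the inputs would have to be gathered from Unisphere and the event logs by the engineer.

```python
from dataclasses import dataclass


@dataclass
class DriveContext:
    """Facts collected before pulling a drive (all names are illustrative)."""
    ndu_in_progress: bool
    raid_type: str                    # e.g. "0", "1", "1/0", "3", "5", "6"
    other_component_faulted: bool
    faulted_disk_count: int
    rebuild_or_equalize_in_same_rg: bool


def blocking_reasons(ctx):
    """Return the reasons, if any, why the drive must not be replaced now."""
    reasons = []
    if ctx.ndu_in_progress:
        reasons.append("An NDU is in progress")
    if ctx.raid_type == "0":
        reasons.append("The drive is protected by RAID type 0")
    if ctx.other_component_faulted:
        reasons.append("An array component in addition to a disk is indicating a fault")
    if ctx.faulted_disk_count > 1:
        reasons.append("More than one disk is indicating a problem")
    if ctx.rebuild_or_equalize_in_same_rg:
        reasons.append("Another disk is equalizing or rebuilding in the same RAID group")
    return reasons


context = DriveContext(False, "5", False, 1, False)
print(blocking_reasons(context) or "No blocking condition found")
```

An empty list means none of the listed conditions applies; it does not replace the checks performed by the Disk Replacement wizard.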

  31. Parts replacement - Drive What happens when a drive fails? [Diagram: RAID group D1 D2 D3 P123 with hot spare H]

  32. Parts replacement - Drive What happens when a drive fails? [Diagram: one drive marked failed (X); D1 D2 D3 P123 H]

  33. Parts replacement - Drive What happens when a drive fails? The hot spare is invoked. [Diagram: X D1 D2 D3 P123 H]

  34. Parts replacement - Drive What happens when a drive fails? The hot spare is rebuilt using an XOR calculation. [Diagram: X D1 D2 D3 P123 H]
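The XOR rebuild can be illustrated with a few bytes. In the Python sketch below, the parity block P123 is the XOR of D1, D2 and D3, so XOR-ing the surviving blocks with the parity reproduces the missing block. This is a generic single-parity illustration with made-up data, not the array's actual rebuild code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(
        reduce(lambda left, right: left ^ right, column)
        for column in zip(*blocks)
    )


# Made-up data blocks standing in for D1, D2 and D3.
d1 = b"\x11\x22\x33\x44"
d2 = b"\x0f\xf0\x0f\xf0"
d3 = b"\xaa\x55\xaa\x55"

p123 = xor_blocks([d1, d2, d3])        # parity written while all drives are healthy
rebuilt = xor_blocks([d1, d2, p123])   # D3 has failed: rebuild it onto the hot spare

print(rebuilt == d3)
```

The same function serves for both parity generation and rebuild because XOR is its own inverse.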

  35. Parts replacement - Drive What happens when a drive fails? The hot spare assumes the personality of the failed drive. [Diagram: X D1 D2 D3 P123 D3]
      Use the event logs to verify that the rebuild has finished; look for:
         A 08/05/06 03:13:22 Bus2 Enc0 Dsk8 67d All rebuilds for a FRU have completed
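Searching a saved event log for the rebuild-complete entry can be scripted. The Python sketch below matches lines in the layout of the example above and reports whether a `67d` entry exists for a given bus, enclosure and disk. The field interpretation (leading SP letter, date, time, location, hex code, message) is inferred from that single example, and the function name is hypothetical.

```python
import re

EVENT_LINE = re.compile(
    r"^\s*(?P<sp>\S+)\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Bus(?P<bus>\d+)\s+Enc(?P<enc>\d+)\s+Dsk(?P<disk>\d+)\s+"
    r"(?P<code>[0-9a-fA-F]+)\s+(?P<message>.*)$"
)

REBUILD_COMPLETE_CODE = "67d"  # code shown on the slide for completed rebuilds


def rebuild_completed(log_text, bus, enc, disk):
    """True if the log holds a 67d entry for the given Bus/Enc/Dsk location."""
    for line in log_text.splitlines():
        match = EVENT_LINE.match(line)
        if not match:
            continue
        location = (int(match["bus"]), int(match["enc"]), int(match["disk"]))
        if location == (bus, enc, disk) and match["code"].lower() == REBUILD_COMPLETE_CODE:
            return True
    return False


log = "A 08/05/06 03:13:22 Bus2 Enc0 Dsk8 67d All rebuilds for a FRU have completed"
print(rebuild_completed(log, 2, 0, 8))
```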

  36. Parts replacement - Drive What happens when a bad drive is replaced? When the bad drive is replaced, the hot spare starts equalizing to the new drive. [Diagram: X D1 D2 P123 D3]

  37. Parts replacement - Drive What happens when a bad drive is replaced? When the hot spare has finished equalizing to the new drive, the hot spare becomes available to other drives in the CLARiiON. [Diagram: D1 D2 P123 H D3]

  38. Parts replacement - Drive How to find out which disks belong to the raid group?

  39. Parts replacement - Drive How to find out which disks belong to the raid group?

  40. Parts replacement - Drive How to find out which disks belong to the raid group?

  41. Parts replacement - Drive What should I do when more than a single drive is faulted in the same RG? • Identify which drive reported a failure. • Run the SP collect script on both of the storage processors • Escalate to the call center.

  42. Parts replacement - Drive USM

  43. Parts replacement - Drive Disk Replacement Wizard

  44. Parts replacement - Drive Disk Replacement Wizard

  45. Parts replacement - Drive Disk Replacement Wizard

  46. Parts replacement - Drive Disk Replacement Wizard

  47. Parts replacement - Drive Disk Replacement Wizard

  48. Parts replacement - Drive Disk Replacement Wizard
