How-to: Hot-Swap a HBA with RDAC Failover

I have had to perform this procedure on several occasions, so I thought it would be useful to share my experience with you all. The procedure is very well detailed in the Storage Manager Installation and Support Guide, which can be viewed in PDF format by following this IBM link, but I thought I would copy and publish the details here.

Before we start please be aware of the following:

Hot-Swap of a HBA is NOT supported in single HBA configurations
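
As a quick sanity check before starting, here is a minimal sketch that counts the fcs adapters reported as Available. The function name count_available_hbas is my own and not part of the IBM guide. Having two adapters listed does not prove that both have working paths to the storage, so treat this only as a first filter.

```shell
# Sketch: count the fcs adapters that lsdev reports as Available.
# Reads `lsdev -C` output on stdin, so it also works on saved output.
count_available_hbas() {
    awk '$1 ~ /^fcs[0-9]+$/ && $2 == "Available" { n++ } END { print n + 0 }'
}

# Hypothetical usage on the AIX host:
#   n=$(lsdev -C | count_available_hbas)
#   [ "$n" -ge 2 ] || echo "Only $n Available HBA(s); hot swap is not supported" >&2
```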

Complete the procedures in this section to prepare for the hot swap.

Collecting system data

In preparation for the hot swap procedure, complete the following steps to collect data from the system:

1. Type the following command:
# lsdev -C |grep fcs

The output is similar to the following example:
fcs0 Available 17-08 FC Adapter
fcs1 Available 1A-08 FC Adapter

2. Type the following command:
# lsdev -C |grep dac

The output is similar to the following example:
dac0 Available 17-08-02 1815 DS4800 Disk Array Controller
dac1 Available 1A-08-02 1815 DS4800 Disk Array Controller

3. Type the following command for each of the fcs devices:
# lscfg -vpl fcsX      where X is the number of the fcs device.

The output looks similar to the following example:

[Image: example output of lscfg -vpl fcs0]

4. Type the following command:
# lsdev -C |grep dar

The output looks similar to the following example:

dar0 Available 1815 DS4800 Disk Array Router
dar1 Available 1815 DS4800 Disk Array Router

5. Type the following command to list the attributes of each dar found on the system:
# lsattr -El darX         where X is the number of the dar.

The output looks similar to the following example:

[Image: example output of lsattr -El dar0]
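
The five collection steps above can be gathered into one file so that the data is at hand later in the procedure. This is only a sketch: the function name collect_hba_data and the default path /tmp/hba_preswap.txt are illustrative choices of mine, and the commands inside are the same ones listed in the steps.

```shell
# Sketch: save the pre-swap device data to one file for later comparison.
# The function name and default output path are illustrative only.
collect_hba_data() {
    out=${1:-/tmp/hba_preswap.txt}
    {
        echo "== lsdev fcs =="; lsdev -C | grep fcs
        echo "== lsdev dac =="; lsdev -C | grep dac
        echo "== lsdev dar =="; lsdev -C | grep dar
        # Detailed data (including Network Address) for every fcs adapter
        for fcs in $(lsdev -C | awk '$1 ~ /^fcs[0-9]+$/ { print $1 }'); do
            echo "== lscfg $fcs =="; lscfg -vpl "$fcs"
        done
        # Attributes of every dar
        for dar in $(lsdev -C | awk '$1 ~ /^dar[0-9]+$/ { print $1 }'); do
            echo "== lsattr $dar =="; lsattr -El "$dar"
        done
    } > "$out"
    # Print the file name so the caller knows where the data went
    echo "$out"
}
```

Calling collect_hba_data with no argument writes to the default path; pass a file name as the first argument to write somewhere else.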

Verifying that autorecovery is disabled

Before you perform the hot swap, you must complete the following steps to ensure that autorecovery is disabled on every dar that is involved with the HBA you want to hot swap:

1. Identify all the dac(s) that are involved with the HBA by typing the following command:
# lsdev -C |grep 11-08      where 11-08 is the location code of the HBA that you want to replace.

The output looks similar to the following example:

[Image: example output of lsdev -C |grep 11-08]

2. Consult the lsattr command output that you collected in step 5 of the procedure “Collecting system data”

In the lsattr output, identify the dar(s) that list the dacs you identified in step 1 of this procedure.

3. For each dar that you identified in step 2, type the following command:
# lsattr -El darX |grep autorecovery     where X is the number of the dar.

The output looks similar to the following example:
# lsattr -El dar0 |grep autorecovery
autorecovery    no            Autorecover after failure is corrected         True

4. In the lsattr command output, verify that the second word is no. If the second word is yes, autorecovery is currently enabled.
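
The lookups in steps 1 to 4 can be sketched as two small filters. Both function names are hypothetical helpers of mine. They assume the column layout shown in the example outputs (device name first, location code third for lsdev, value second for lsattr).

```shell
# Sketch: list the dac devices whose location code starts with the HBA's
# location code. Reads `lsdev -C` output on stdin.
dacs_at_location() {
    awk -v loc="$1" '$1 ~ /^dac[0-9]+$/ && index($3, loc) == 1 { print $1 }'
}

# Sketch: print the value column of the autorecovery attribute.
# Reads `lsattr -El darX` output on stdin.
autorecovery_value() {
    awk '$1 == "autorecovery" { print $2 }'
}

# Hypothetical usage on the AIX host (11-08 and dar0 are example values):
#   lsdev -C | dacs_at_location 11-08
#   lsattr -El dar0 | autorecovery_value
```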

Important: For each dar on which autorecovery is enabled, you must disable it by setting the autorecovery ODM attribute to no. Do not proceed with the hot swap procedure until you complete this step and verify that autorecovery is disabled.
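
The text above does not show the command for changing the attribute. A minimal sketch follows, assuming the attribute is changed with the standard AIX chdev command in the usual `chdev -l device -a attribute=value` form. The function name disable_autorecovery is my own, so check the guide before relying on it.

```shell
# Sketch: set autorecovery=no on each named dar and confirm the change.
# Assumes the attribute is changed with the standard AIX chdev command.
disable_autorecovery() {
    for dar in "$@"; do
        chdev -l "$dar" -a autorecovery=no || return 1
        # Re-read the attribute and stop if it is still not "no"
        val=$(lsattr -El "$dar" | awk '$1 == "autorecovery" { print $2 }')
        [ "$val" = "no" ] || { echo "$dar: autorecovery is '$val'" >&2; return 1; }
    done
}

# Hypothetical usage: disable_autorecovery dar0 dar1
```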

Replacing the hot swap HBA

Complete the following steps to replace the hot swap HBA:

1. Place the HBA that you want to replace into the Defined state by typing the following command:
# rmdev -Rl fcsX       where X is the number of the HBA.

The output is similar to the following example:

# rmdev -Rl fcs0

fcnet0 Defined
dac0 Defined
fscsi0 Defined
fcs0 Defined

2. In the AIX smit menu, initiate the process that is required for the HBA hot swap by selecting smit → Devices → PCI Hot Plug Manager → Replace/Remove a PCI Hot Plug Adapter.

3. In the Replace/Remove a PCI Hot Plug Adapter window, select the targeted HBA. A window displays that contains instructions for replacing the HBA.

4. Replace the HBA by following the smit instructions. Note: Do not reinstall the fibre channel cable at this time.

5. If the steps in this procedure are completed successfully up to this point, you obtain the following results:

  • The defective HBA is removed from the system.
  • The replacement FC HBA is powered on.
  • The associated fcsX device is in the Defined state.

Before continuing, verify that these results have been obtained.

6. Install the fibre channel loop back plug on the replacement HBA.

7. Place the HBA into the Available state by typing the following command:
# cfgmgr

Note: The new HBA is placed in the default group. If the default group has hdisks assigned to it then the HBA will generate a new dar and dac, which will cause a split. Issue the rmdev command to remove the new dar and dac after mapping the WWPN.

8. Verify that the fcs device is now available by typing the following command:
# lsdev -C |grep fcs

9. Verify or upgrade the firmware on the replacement HBA to the appropriate level by typing the following command:
# lscfg -vpl fcsX       where X is the number of the fcs.

Note: You can determine the HBA firmware level by referring to the fcsX device data that you gathered during the procedure at the start of this section, “Collecting system data”

10. Record the 16-digit number that is associated with Network Address, as displayed in the output of the command you used in step 9. You will use this Network Address in the next procedure, “Mapping the new WWPN to the DS4000 storage subsystem”, to manually map the replacement HBA’s WWPN to the storage subsystem(s).

11. Place the HBA back into the Defined state by typing the following command:
# rmdev -Rl fcsX
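
The Network Address from step 10 can also be pulled out of the lscfg output with a filter, which avoids copying 16 digits by hand. This is a sketch that assumes the usual "Network Address" label followed by dots and the hex value on one line. The function name network_address is hypothetical. Run it while the adapter is still Available, that is, before step 11.

```shell
# Sketch: extract the 16-hex-digit Network Address from `lscfg -vpl fcsX`
# output read on stdin. Assumes a "Network Address....<hex>" line layout.
network_address() {
    sed -n 's/.*Network Address\.*\([0-9A-Fa-f]\{16\}\).*/\1/p'
}

# Hypothetical usage on the AIX host, before step 11:
#   lscfg -vpl fcs0 | network_address
```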

When you have completed this procedure, continue to the next procedure.

Mapping the new WWPN to the DS4000 storage subsystem

For each DS4000 storage subsystem that is affected by the hot swap, complete the following steps to map the worldwide port name (WWPN) of the HBA to the storage subsystem:

1. Start DS4000 Storage Manager and open the Subsystem Management window.

2. In the Mapping View of the Subsystem Management window, select Mappings → Show All Host Port Information. The Host Port Information window displays.

3. Using the data that you collected during the procedure “Collecting system data”, find the entry in the Host Port Information window that matches the WWPN of the “defective” HBA (the HBA that you removed), and record the alias name. Then, close the Host Port Information window.

4. In the Mapping View, select the alias name of the HBA host port that you just recorded.

5. Select Mappings → Replace Host Port. The Replace Host Port window opens.

6. In the Replace Host Port window, verify that the current HBA Host Port Identifier, which is listed at the top of the window, exactly matches the WWPN of the HBA that you removed.

7. Type the 16-digit WWPN, without the : (colon), of the replacement HBA in the New Identifier field, and click OK.
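
Because step 7 wants the WWPN without colons, a small helper can strip them and confirm that exactly 16 hex digits remain before you type the value in. The function name normalize_wwpn is my own. Upper-casing is only a convenience for comparing against the lscfg output, not something the guide requires.

```shell
# Sketch: strip colons from a WWPN, upper-case it, and check that exactly
# 16 hex digits remain. Prints the cleaned value, or returns status 1.
normalize_wwpn() {
    w=$(printf '%s' "$1" | tr -d ':' | tr 'a-f' 'A-F')
    case "$w" in
        *[!0-9A-F]*) echo "not hex: $1" >&2; return 1 ;;
    esac
    [ "${#w}" -eq 16 ] || { echo "expected 16 hex digits: $1" >&2; return 1; }
    printf '%s\n' "$w"
}

# Hypothetical usage: normalize_wwpn "$(lscfg -vpl fcs0 | grep 'Network Address' | sed 's/.*\.//')"
```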

When you have completed these steps, continue to the next procedure.

Completing the HBA hot swap procedure

Complete the following steps to finish replacing the hot swap HBA:

1. Remove the fibre channel loop back plug, and insert the fibre channel cable that was previously attached to the HBA that you removed.

2. If the HBA is attached to a fibre channel switch, and the zoning is based on WWPN, modify the zoning information to replace the WWPN of the former HBA with the WWPN of the replacement HBA. (Run cfgmgr at this time to allow the HBA to register its WWPN in the fibre channel switch.)

Important: Skip this step if the HBA is directly attached to the DS4000 subsystem, or if the fibre channel switch zoning is based on port numbers instead of WWPNs. If you do need to modify the zoning, failure to correctly do so will prevent the HBA from accessing the storage subsystem.

3. Run the cfgmgr command.

4. Type the following commands to verify that the replaced fcsX device and its associated dac(s) are placed in the Available state:
# lsdev -C |grep fcs
# lsdev -C |grep dac

5. Type the following command to verify that no additional dar(s) have been created and that the expected dar(s) are in the Available state. (Refer to the data that you collected during the procedure “Collecting system data” to compare the original number of dar(s) to the number that is now reported by the system.)
# lsdev -C |grep dar

Caution: The presence of additional dar(s) in the lsdev output indicates a configuration problem. If this occurs, do not continue this procedure until you correct the problem. Loss of data availability can occur.

6. For each dar, type the following command to verify that affected dar attributes indicate the presence of two active dac(s):
# lsattr -El darX|grep act_controller  where X is the number of the dar.

The output looks similar to the following:
# lsattr -El dar0|grep act_controller
act_controller           dac0,dac2             Active Controllers                  False

Caution: If two dacs are not reported for each affected dar, loss of data availability can occur. Do not continue this procedure if two dac(s) are not reported for each dar. Correct the problem before continuing.

7. Using the Storage Manager, manually redistribute volumes to preferred paths.

8. Verify that disks stay on preferred path by using one or both of the following methods:

Using AIX system

Run the fget_config -Av command, and verify that the drives are on the expected path.

Using Storage Manager

In the Enterprise Management window, verify that the storage subsystem(s) are Optimal. If they are not Optimal, verify that any drives that are part of the subsystems involved with the hot swap process are not listed in the Recovery Guru.

9. If necessary, enable autorecovery of the affected dar(s) at this time.

Result: The fibre channel HBA hot swap is now complete.
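
The checks in steps 5 and 6 of this procedure can be sketched as two filters: one lists the dar names so they can be compared with the list taken before the swap, and one counts the dacs in act_controller. The function names and the file /tmp/dars_before.txt are hypothetical. They assume the output layouts shown in the examples above.

```shell
# Sketch: list dar names from `lsdev -C` output on stdin, sorted, one per line.
dar_names() {
    awk '$1 ~ /^dar[0-9]+$/ { print $1 }' | sort
}

# Sketch: count the dacs listed in the act_controller attribute.
# Reads `lsattr -El darX` output on stdin.
active_dac_count() {
    awk '$1 == "act_controller" { n = split($2, a, ",") } END { print n + 0 }'
}

# Hypothetical usage, assuming the pre-swap list was saved earlier with
#   lsdev -C | dar_names > /tmp/dars_before.txt
# After the swap:
#   lsdev -C | dar_names | diff /tmp/dars_before.txt - || echo "dar list changed" >&2
#   [ "$(lsattr -El dar0 | active_dac_count)" -eq 2 ] || echo "dar0: expected 2 dacs" >&2
```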

Tuesday, October 6th, 2009 | IBM / AIX, My Work
