IBM SAN Volume Controller and SRM issues

If anyone has had the pleasure of performing an SRM implementation with the IBM SAN Volume Controller (SVC) they will know what a gem the IBM product can be (specifically the SRA). I've run into several show-stopper bugs lately (a post is coming about the SRA nightmare in a few days), but I happened to run across another minor issue that may escape even the most careful eyes when working with the SVC.

If you are having issues running test failovers (much less an actual failover), you may notice that the SVC is showing the LUN mirror state as 'idle' and refuses to change state unless you manually break the mirror and re-synchronize. That can take a long time, especially with larger LUNs. This 'idle' state is the root of so many issues that you need to resolve it first, before checking whether anything else in the environment needs tweaking.

So, to avoid running into this problem, you want to take a look at two places and make sure your configuration is correct.

The first is the SVC SRA utility that IBM provides. Aside from filling in the appropriate values in the boxes, you want to make sure that the "Auto-Switch" replication box is checked. This is what reverses the mirror direction after a failover and prevents the LUN mirror state from going 'idle'. Do this on both the Protected and Recovery sites to ensure an easy fail-back.

The second area to look at is the copy direction and primary relationship of the LUN in question. The best way to head this off is to set these options when you first start the copy process right after creating the LUN (although you can also do it later if you need to restart the copy process for some reason). Make sure you set the copy direction to "Master -> Auxiliary (primary=master)". I believe the default is "Do not set or change the copy direction", and this can result in problems later on.
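If you would rather set the direction from the SVC command line than from the GUI, the same choice can be made when starting the relationship. Here is a minimal Python sketch that only builds the command string; it does not connect to anything. The relationship name `srm_lun01_rel` is made up, and the `svctask` prefix and `-primary` option are what I would expect on a 5.x code level, so check them against the CLI guide for your version before running anything.

```python
def start_relationship_cmd(relationship, primary="master"):
    """Build the SVC CLI command that starts a remote-copy relationship
    with an explicit copy direction (assumed syntax, verify for your level)."""
    if primary not in ("master", "aux"):
        raise ValueError("primary must be 'master' or 'aux'")
    # "master" here corresponds to "Master -> Auxiliary (primary=master)".
    return ["svctask", "startrcrelationship", "-primary", primary, relationship]


# Hypothetical relationship name, for illustration only.
print(" ".join(start_relationship_cmd("srm_lun01_rel")))
```

You would paste the printed command into an SSH session on the cluster that owns the master LUN.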

After double-checking those two areas, restart the copy process and let it complete. The LUN copy state should now show "Consistent Synchronized". This is good! Note that when you perform an actual failover, the primary relationship will flip from "primary = master" to "primary = auxiliary". This is how it is supposed to look. If you go back into SRM and set everything up to fail back, the relationship will flip back afterward.

The key thing to look for when you are getting errors with IBM SVC and SRM is what copy state the mirror is in. 'Idling' is not good and will give you nothing but grief. Ideally, the resting state should always show "Consistent Synchronized".
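If you have more than a handful of relationships, checking each one by eye gets tedious. Below is a rough sketch that scans colon-delimited output, of the kind I would expect from `svcinfo lsrcrelationship -delim :`, and lists every relationship that is not in the `consistent_synchronized` state. The sample text is made up and trimmed to a few columns, and the column names (`name`, `primary`, `state`) are assumptions, so compare them with what your cluster actually prints.

```python
import csv
import io

HEALTHY_STATE = "consistent_synchronized"


def find_unhealthy(cli_output, delim=":"):
    """Return (name, state, primary) for every relationship whose state
    is anything other than consistent_synchronized."""
    reader = csv.DictReader(io.StringIO(cli_output), delimiter=delim)
    return [
        (row["name"], row["state"], row["primary"])
        for row in reader
        if row["state"] != HEALTHY_STATE
    ]


# Made-up sample output; real output has more columns.
SAMPLE = """id:name:primary:state
0:srm_lun01_rel:master:consistent_synchronized
1:srm_lun02_rel::idling
"""

for name, state, primary in find_unhealthy(SAMPLE):
    print(f"{name}: state={state}, primary={primary or 'not set'}")
```

Anything this reports is worth fixing before you try another test failover in SRM.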

This was tested and confirmed with SRM version 4.1 (not 4.1.1) and IBM SVC version 5.1.0.2.

Hope this helps a few of you out there!

2 comments:

  1. What specific version of the SRA were you using? Was this 1.20.71310?
  2. I was actually using a custom version of the SRA that IBM had re-coded for this client due to some other issues. I don't remember the actual SRA number, but it may have been released as GA shortly after this.