High availability with “failure groups” in ASM

Today we would like to share an ASM feature that is not used very often but that can help us improve the availability of our environment.

The feature is ASM failure groups. Oracle defines a failure group as a set of disks that share a single point of failure, so we place in the same group the disks that depend on the same fibre channel card or the same storage array (cabinet).

In this example, we have a multi-node RAC that uses two different storage arrays in the same location. With failure groups we can, for example, update the firmware of one array while the database continues to be served from the second. A disk group that uses this feature can be created as follows:

-- One failure group per storage array; ASM mirrors data between the two groups.
CREATE DISKGROUP information_test NORMAL REDUNDANCY
  FAILGROUP failure_group_1 DISK
    'ORCL:DISK6' NAME diskcabina1_a1,
    'ORCL:DISK7' NAME diskcabina1_a2
  FAILGROUP failure_group_2 DISK
    'ORCL:DISK8' NAME diskcabina2_b1,
    'ORCL:DISK9' NAME diskcabina2_b2;

This example creates a disk group with normal redundancy and two disks in each storage array. Because ASM keeps the mirror copy of each extent in a different failure group, we can work on either array without causing an outage of the instance. How long can an array stay in maintenance? We control that with the disk group attribute disk_repair_time. An example of its use is as follows:

ALTER DISKGROUP information_test SET ATTRIBUTE 'disk_repair_time' = '10h';

The default value is 3.6 hours, which in some cases is not enough to solve the problem, so increasing it is recommended. This attribute lets the ASM Fast Mirror Resync feature update only the data that changed on the disks while they were unavailable, without having to recreate the failure group. Be careful not to raise it too much, though, because the space reserved in the disk group for this feature grows and takes space away from data.
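Before any maintenance it can help to confirm how the disks are distributed and which repair time is in effect. The following is only a sketch, run from the ASM instance; the disk group name INFORMATION_TEST is an assumption taken from the example above, so replace it with your own.

```sql
-- Disks of each failure group and their current status.
SELECT d.failgroup, d.name, d.path, d.mode_status, d.state
FROM   v$asm_disk d
JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
WHERE  g.name = 'INFORMATION_TEST'   -- assumed disk group name
ORDER  BY d.failgroup, d.name;

-- Current value of disk_repair_time for the same disk group.
SELECT g.name AS diskgroup, a.name AS attribute, a.value
FROM   v$asm_attribute a
JOIN   v$asm_diskgroup g ON g.group_number = a.group_number
WHERE  g.name = 'INFORMATION_TEST'   -- assumed disk group name
AND    a.name = 'disk_repair_time';
```

Each array should appear as its own failure group in the first result, and the second should show the value set with ALTER DISKGROUP.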

Once access to the disks has been restored, we tell ASM with the command:

ALTER DISKGROUP information_test ONLINE ALL;

This brings all the offline disks of the array in that disk group back online with a single command.
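If we prefer to bring back only the disks of the array we worked on, the ONLINE clause also accepts a failure group. This is a sketch under the assumption that the disk group and failure group are named as in the example above:

```sql
-- Bring online only the disks that belong to one failure group.
-- information_test and failure_group_1 are the names assumed from the example.
ALTER DISKGROUP information_test ONLINE DISKS IN FAILGROUP failure_group_1;
```

The effect is the same as ONLINE ALL when only one failure group is offline; the narrower form simply makes the intent explicit.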

Once we have checked that the disks are online, the data resynchronization starts and the disks are brought up to date shortly afterwards.
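A possible way to follow this step from the ASM instance is sketched below. The disk group name INFORMATION_TEST is an assumption; the views and columns are the standard V$ASM_DISK and V$ASM_OPERATION ones.

```sql
-- Status of every disk; repair_timer is the number of seconds left
-- before ASM drops a disk that is still offline.
SELECT d.failgroup, d.name, d.mode_status, d.state, d.repair_timer
FROM   v$asm_disk d
JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
WHERE  g.name = 'INFORMATION_TEST'   -- assumed disk group name
ORDER  BY d.failgroup, d.name;

-- Long-running ASM operations that are still in progress, with their estimated remaining time.
SELECT o.group_number, o.operation, o.state, o.sofar, o.est_work, o.est_minutes
FROM   v$asm_operation o;
```

When the second query returns no rows and every disk shows ONLINE, the synchronization work has finished.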

Be careful not to start the maintenance on the second array before the disks of the first one are back online, because we would then have an availability problem.
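A simple check before touching the second array could look like this. It is only a sketch and assumes the disk group is named INFORMATION_TEST:

```sql
-- offline_disks should be 0 before maintenance starts on the other array.
SELECT g.name, g.type, g.state, g.offline_disks
FROM   v$asm_diskgroup g
WHERE  g.name = 'INFORMATION_TEST';   -- assumed disk group name
```

If offline_disks is greater than zero, or V$ASM_OPERATION still lists work for this disk group, it is safer to wait before continuing.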

We hope you find it useful.

Regards,

DBA Team.