ASM, REPLACE DISK Command
The REPLACE DISK command was released in 12.1 and allows an online replacement of a failed disk. This command is important because it reduces the rebalance time, executing just the SYNC phase. Compared with the normal disk replacement (DROP and ADD in the same command), REPLACE does just a mirror resync.
Basically, when the REPLACE command is called, the rebalance just copies/syncs the data from the surviving disk (the partner disk of the mirror). It is faster because the previous drop/add approach executes a complete rebalance over all AUs of the diskgroup, running both the REBALANCE and SYNC phases.
The REPLACE DISK command is also important for the disk SWAP process on Exadata (where you add the new 14TB disks), since it makes the rebalance of the diskgroup faster.
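For reference, here is a minimal sketch of the two approaches side by side, using the disk names from the test environment described later in this post (the DROP/ADD variant is shown only for comparison):

SQL> -- Pre-12.1 approach: DROP and ADD in the same command.
SQL> -- This triggers a full rebalance (REBALANCE + SYNC phases) over the diskgroup.
SQL> alter diskgroup DATA
  2  drop disk DISK02
  3  add failgroup FAIL01 disk 'ORCL:DISK07'
  4  rebalance power 2;

SQL> -- 12.1+ approach: REPLACE DISK.
SQL> -- Only the extents of DISK02 are resynced from its mirror partners.
SQL> alter diskgroup DATA
  2  replace disk DISK02 with 'ORCL:DISK07'
  3  power 2;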
Below is one example of this behavior. Note that the AUs from DISK01 were SYNCED with the new disk:
And compare with the previous DROP/ADD disk approach, where all AUs from all disks were rebalanced:
Current Environment and Simulating the Failure
In this post, to simulate and show how the replace disk works, I have the DATA diskgroup with six disks (DISK01-06). DISK07 is not in use.
SQL> select NAME,FAILGROUP,LABEL,PATH from v$asm_disk order by FAILGROUP, label;
NAME FAILGROUP LABEL PATH
------------------------------ ------------------------------ ---------- -----------
DISK01 FAIL01 DISK01 ORCL:DISK01
DISK02 FAIL01 DISK02 ORCL:DISK02
DISK03 FAIL02 DISK03 ORCL:DISK03
DISK04 FAIL02 DISK04 ORCL:DISK04
DISK05 FAIL03 DISK05 ORCL:DISK05
DISK06 FAIL03 DISK06 ORCL:DISK06
RECI01 RECI01 RECI01 ORCL:RECI01
SYSTEMIDG01 SYSTEMIDG01 SYSI01 ORCL:SYSI01
DISK07 ORCL:DISK07
9 rows selected.
SQL>
And to simulate the error I disconnected the disk from the operating system (since I used iSCSI, I just logged off the target for DISK02):
[root@asmrec ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.eff4683320e8 -p 172.16.0.3:3260 -u
Logging out of session [sid: 41, target: iqn.2006-01.com.openfiler:tsn.eff4683320e8, portal: 172.16.0.3,3260]
Logout of [sid: 41, target: iqn.2006-01.com.openfiler:tsn.eff4683320e8, portal: 172.16.0.3,3260] successful.
[root@asmrec ~]#
At the same moment, the ASM alert log detected the error and reported that the mirror copy was found on another disk (DISK06):
2020-03-29T00:42:11.160695+01:00
WARNING: Read Failed. group:3 disk:1 AU:29 offset:0 size:4096
path:ORCL:DISK02
incarnation:0xf0f0c113 synchronous result:'I/O error'
subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x7f3df8db35b8 bufp:0x7f3df8c9c000 osderr1:0x3 osderr2:0x2e
IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: cache failed reading from group=3(DATA) fn=8 blk=0 count=1 from disk=1 (DISK02) mirror=0 kfkist=0x20 status=0x02 osderr=0x3 file=kfc.c line=13317
WARNING: cache succeeded reading from group=3(DATA) fn=8 blk=0 count=1 from disk=5 (DISK06) mirror=1 kfkist=0x20 status=0x01 osderr=0x0 file=kfc.c line=13366
So, at this moment DISK02 will not be removed instantly, but only after the disk_repair_time expires:
WARNING: Started Drop Disk Timeout for Disk 1 (DISK02) in group 3 with a value 43200
WARNING: Disk 1 (DISK02) in group 3 will be dropped in: (43200) secs on ASM inst 1
cluster guid (e4db41a22bd95fc6bf79d2e2c93360c7) generated for PST Hbeat for instance 1
If you want to check the full output from the ASM alert log, you can access it here: ASM-ALERTLOG-Output-Online-Disk-Error.txt
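The 43200 seconds above come from the disk_repair_time attribute of the diskgroup (the 12h default). If the replacement disk will take longer to arrive, you can check and extend it; a minimal sketch for the DATA diskgroup:

SQL> -- Check the current repair timer for the DATA diskgroup
SQL> select g.name diskgroup, a.name, a.value
  2  from v$asm_attribute a, v$asm_diskgroup g
  3  where a.group_number = g.group_number
  4  and g.name = 'DATA'
  5  and a.name = 'disk_repair_time';

SQL> -- Extend the timer before it expires and the disk is force dropped
SQL> alter diskgroup DATA set attribute 'disk_repair_time' = '24h';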
So, the current state of the diskgroup is:
SQL> select NAME,FAILGROUP,LABEL,PATH from v$asm_disk order by FAILGROUP, label;
NAME FAILGROUP LABEL PATH
------------------------------ ------------------------------ ---------- -----------
DISK01 FAIL01 DISK01 ORCL:DISK01
DISK02 FAIL01
DISK03 FAIL02 DISK03 ORCL:DISK03
DISK04 FAIL02 DISK04 ORCL:DISK04
DISK05 FAIL03 DISK05 ORCL:DISK05
DISK06 FAIL03 DISK06 ORCL:DISK06
RECI01 RECI01 RECI01 ORCL:RECI01
SYSTEMIDG01 SYSTEMIDG01 SYSI01 ORCL:SYSI01
DISK07 ORCL:DISK07
9 rows selected.
SQL>
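While the repair timer is running, you can also confirm that DISK02 is just offline and not dropped, with something like this (group 3 is DATA, as seen in the alert log):

SQL> select name, mode_status, state, repair_timer
  2  from v$asm_disk
  3  where group_number = 3;

Here DISK02 should show MODE_STATUS=OFFLINE with REPAIR_TIMER counting down, while the other disks remain ONLINE.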
REPLACE DISK
Since the old disk was lost (due to hardware failure or something similar), it is impossible to bring it online again. A new disk was attached to the server (DISK07 in this example), and this is the disk added to the diskgroup.
So, we just need to execute the REPLACE DISK command:
SQL> alter diskgroup DATA
2 REPLACE DISK DISK02 with 'ORCL:DISK07'
3 power 2;
Diskgroup altered.
SQL>
The command is simple: we replace the failed disk with the new disk path. It is also possible to replace more than one disk at the same time, and to specify the power of the rebalance, as shown below.
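For example, a hypothetical two-disk replacement in a single command (DISK08 and DISK09 do not exist in my environment; this only illustrates the syntax):

SQL> alter diskgroup DATA
  2  replace disk DISK03 with 'ORCL:DISK08',
  3               DISK04 with 'ORCL:DISK09'
  4  power 4;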
In the ASM alert log we can see a lot of messages about this replacement; note the resync of the disk. The full output can be found here: ASM-ALERTLOG-Output-Replace-Disk.txt
Some points to note here:
2020-03-29T00:44:31.602826+01:00
SQL> alter diskgroup DATA
replace disk DISK02 with 'ORCL:DISK07'
power 2
2020-03-29T00:44:31.741335+01:00
NOTE: cache closing disk 1 of grp 3: (not open) DISK02
2020-03-29T00:44:31.742068+01:00
NOTE: GroupBlock outside rolling migration privileged region
2020-03-29T00:44:31.742968+01:00
NOTE: client +ASM1:+ASM:asmrec no longer has group 3 (DATA) mounted
2020-03-29T00:44:31.746444+01:00
NOTE: Found ORCL:DISK07 for disk DISK02
NOTE: initiating resync of disk group 3 disks
DISK02 (1)
NOTE: process _user20831_+asm1 (20831) initiating offline of disk 1.4042309907 (DISK02) with mask 0x7e in group 3 (DATA) without client assisting
2020-03-29T00:44:31.747191+01:00
NOTE: sending set offline flag message (2044364809) to 1 disk(s) in group 3
…
…
2020-03-29T00:44:34.558097+01:00
NOTE: PST update grp = 3 completed successfully
2020-03-29T00:44:34.559806+01:00
SUCCESS: alter diskgroup DATA
replace disk DISK02 with 'ORCL:DISK07'
power 2
2020-03-29T00:44:36.805979+01:00
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
2020-03-29T00:44:36.820900+01:00
NOTE: starting rebalance of group 3/0xf99030d7 (DATA) at power 2
After that, we can see that the rebalance executes just the SYNC phase:
SQL> select * from gv$asm_operation;
INST_ID GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
---------- ------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
1 3 REBAL COMPACT WAIT 2 2 0 0 0 0 0
1 3 REBAL REBALANCE WAIT 2 2 0 0 0 0 0
1 3 REBAL REBUILD WAIT 2 2 0 0 0 0 0
1 3 REBAL RESYNC RUN 2 2 231 1350 513 2 0
SQL>
SQL> /
INST_ID GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
---------- ------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
1 3 REBAL COMPACT WAIT 2 2 0 0 0 0 0
1 3 REBAL REBALANCE WAIT 2 2 0 0 0 0 0
1 3 REBAL REBUILD WAIT 2 2 0 0 0 0 0
1 3 REBAL RESYNC RUN 2 2 373 1350 822 1 0
SQL>
SQL> /
INST_ID GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
---------- ------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
1 3 REBAL COMPACT REAP 2 2 0 0 0 0 0
1 3 REBAL REBALANCE DONE 2 2 0 0 0 0 0
1 3 REBAL REBUILD DONE 2 2 0 0 0 0 0
1 3 REBAL RESYNC DONE 2 2 1376 1350 0 0 0
SQL> /
no rows selected
SQL>
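If you prefer not to poll gv$asm_operation, the rebalance clause of REPLACE DISK also accepts WAIT, so the statement only returns after the resync finishes (the same command as above, just blocking):

SQL> alter diskgroup DATA
  2  replace disk DISK02 with 'ORCL:DISK07'
  3  power 2 wait;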
In the end, after the rebalance, we have:
SQL> select NAME,FAILGROUP,LABEL,PATH from v$asm_disk order by FAILGROUP, label;
NAME FAILGROUP LABEL PATH
------------------------------ ------------------------------ ---------- -----------
DISK01 FAIL01 DISK01 ORCL:DISK01
DISK02 FAIL01 DISK07 ORCL:DISK07
DISK03 FAIL02 DISK03 ORCL:DISK03
DISK04 FAIL02 DISK04 ORCL:DISK04
DISK05 FAIL03 DISK05 ORCL:DISK05
DISK06 FAIL03 DISK06 ORCL:DISK06
RECI01 RECI01 RECI01 ORCL:RECI01
SYSTEMIDG01 SYSTEMIDG01 SYSI01 ORCL:SYSI01
8 rows selected.
SQL>
An important detail is that the NAME of the disk does not change; it is impossible to change it using the REPLACE DISK command. As you can see above, the disk named DISK02 now has the label DISK07 (here the label comes from the ASMLib disk).
Known Issues
There is a known issue with REPLACE DISK for GI 18c and higher where the rebalance can take ages to finish. This occurs because, when replacing more than one disk at a time, it executes the SYNC disk by disk. As one example, on an Exadata the replace of a complete cell took more than 48 hours, while a DROP/ADD took just 12 hours for the same disks.
So, it is recommended to have the fixes for Bug 30582481 and Bug 31062010 applied. The detail is that patch 30582481 (Patch 30582481: ASM REPLACE DISK COMMAND EXECUTED ON ALL CELLDISKS OF A FAILGROUP, ASM RUNNING RESYNC ONE DISK AT A TIME) was withdrawn and replaced by bug/patch 31062010, which is not available at the moment I write this post (March 2020).
So, be careful when doing this on an engineered system or when you need to replace a lot of disks at the same time.
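Before relying on a multi-disk REPLACE on such systems, it is worth checking whether the fix is already present in the GI home, for example (output depends on your patch inventory):

[grid@asmrec ~]$ $ORACLE_HOME/OPatch/opatch lsinventory | grep -i 30582481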
Disclaimer: “The postings on this site are my own and don't necessarily represent my actual employer's positions, strategies, or opinions. The information here was edited to be useful for a general audience; specific data and identifications were removed to make it useful for the community.”