This post shows how to replace a failed disk that is part of a Solaris Volume Manager (SVM) root mirror. Hard errors on local disks are a common failure.
To avoid data loss in such cases, we maintain a mirror copy of the root disk.
When the root disk is under SVM, the submirrors on the failed disk must be detached and the disk unconfigured before it can be replaced.
In the example below, the server has two disks under SVM in a RAID 1 (mirrored) configuration.
root@mysrv1 # echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000875c9aa4,0
1. c1t1d0 <HITACHI-HUS1014FASUN146G-2A08 cyl 14087 alt 2 hd 24 sec 848>
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000875cbe35,0
Specify disk (enter its number): Specify disk (enter its number):
root@mysrv1 #
root@mysrv1 #
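We can check the disk's error counters with iostat -en. The columns are soft errors, hard errors, transport errors, total errors, and device name, so the output below shows 4 hard errors on c1t1d0.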
root@mysrv1 # iostat -en |grep -i c1t1d0
23 4 0 27 c1t1d0
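The hard-error column can also be pulled out with a small filter, which is handy when several disks need checking. This is only a sketch and not part of the original procedure: the function name hard_errors is made up for illustration, and it assumes a POSIX awk (on Solaris 10 that would be nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
# Print the hard-error count for one device from `iostat -en` output on stdin.
# Column order in that output: s/w  h/w  trn  tot  device.
# hard_errors is a hypothetical helper name, not a Solaris command.
hard_errors() {
    awk -v dev="$1" '$5 == dev { print $2 }'
}

# Sample line copied from the output above. On the server you would run:
#   iostat -en | hard_errors c1t1d0
echo "23 4 0 27 c1t1d0" | hard_errors c1t1d0    # prints 4
```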
To confirm the hard errors, check the dmesg output and the metastat output.
root@mysrv1 # metastat -c
d20 m 30GB d21 d22 (maint)
d21 s 30GB c1t0d0s1
d22 s 30GB c1t1d0s1 (maint)
d10 m 14GB d11 d12 (maint)
d11 s 14GB c1t0d0s0
d12 s 14GB c1t1d0s0 (maint) ----- Disk is in maintenance state.
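To see at a glance which submirrors have to be detached, the metastat -c output can be filtered for submirror lines (type "s") that carry the (maint) flag. A minimal sketch, assuming the output layout shown above and a POSIX awk; failed_submirrors is a hypothetical helper name.

```shell
# List submirrors flagged (maint) in `metastat -c` output, with their slice.
# failed_submirrors is a hypothetical helper name, not an SVM command.
failed_submirrors() {
    awk '$2 == "s" && /\(maint\)/ { print $1, $4 }'
}

# Sample data copied from the output above. On the server you would run:
#   metastat -c | failed_submirrors
failed_submirrors <<'EOF'
d20 m 30GB d21 d22 (maint)
d21 s 30GB c1t0d0s1
d22 s 30GB c1t1d0s1 (maint)
d10 m 14GB d11 d12 (maint)
d11 s 14GB c1t0d0s0
d12 s 14GB c1t1d0s0 (maint)
EOF
```

With the sample data this prints d22 c1t1d0s1 and d12 c1t1d0s0, both on the failed disk c1t1d0.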
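Now detach the failed submirrors from their mirrors, then clear those submirrors and delete the state database replicas (metadb) on the failed disk. If metadetach refuses because the submirror needs maintenance, the -f option forces the detach.
root@mysrv1 # metadetach d20 d22
root@mysrv1 # metadetach d10 d12
root@mysrv1 # metastat -c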
d20 m 30GB
d21 s 30GB c1t0d0s1
d10 m 14GB
d11 s 14GB c1t0d0s0
d22 s 30GB c1t1d0s1 (maint)
d12 s 14GB c1t1d0s0 (maint)
root@mysrv1 # metaclear d22 d12
root@mysrv1 # metadb -d /dev/dsk/c1t1d0s7
Next, list the attachment points with cfgadm -al to find the Ap_Id of the failed disk:
root@mysrv1 # cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::dsk/c1t1d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
root@mysrv1 #
root@mysrv1 #
root@mysrv1 # cfgadm
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c1 fc-private connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
root@mysrv1 #
root@mysrv1 #
Now unconfigure the failed disk and verify that its Occupant state changes to unconfigured:
root@mysrv1 # cfgadm -c unconfigure c1::dsk/c1t1d0
root@mysrv1 # cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::dsk/c1t1d0 disk connected unconfigured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
root@mysrv1 #
root@mysrv1 #
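Before pulling the disk, it is worth confirming that the Occupant column really reads unconfigured. The check can be scripted along these lines; this is a sketch that assumes the five-column cfgadm -al layout shown above and a POSIX awk, and occupant_state is a hypothetical helper name.

```shell
# Print the Occupant column for one attachment point from `cfgadm -al` output.
# occupant_state is a hypothetical helper name, not part of cfgadm.
occupant_state() {
    awk -v ap="$1" '$1 == ap { print $4 }'
}

# Sample lines copied from the output above. On the server you would run:
#   cfgadm -al | occupant_state c1::dsk/c1t1d0
occupant_state c1::dsk/c1t1d0 <<'EOF'
Ap_Id Type Receptacle Occupant Condition
c1 fc-private connected configured unknown
c1::dsk/c1t1d0 disk connected unconfigured unknown
EOF
```

The same filter can be used after the new disk is inserted and configured, when the state should read configured again.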
Remove the failed disk and insert the new disk. After inserting it, configure the new disk:
root@mysrv1 # cfgadm -c configure c1::dsk/c1t1d0
root@mysrv1 # cfgadm
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c1 fc-private connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
root@mysrv1 #
The disk is now ready. Copy the partition table (VTOC) from the primary disk to the new disk, then recreate the state database replicas and the submirrors on the new disk.
Finally, re-attach the new submirrors to their mirrors.
root@mysrv1 # prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
root@mysrv1 # metadb -a -c 3 c1t1d0s7
root@mysrv1 # metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a u 16 8192 /dev/dsk/c1t1d0s7
a u 8208 8192 /dev/dsk/c1t1d0s7
a u 16400 8192 /dev/dsk/c1t1d0s7
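The metadb listing should show the same number of replicas on each disk. Counting them per slice makes that easy to verify; the sketch below assumes the device path is the last field of each line, as in the output above, and uses a hypothetical helper name replica_counts.

```shell
# Count state database replicas per slice from `metadb` output on stdin.
# replica_counts is a hypothetical helper name, not an SVM command.
replica_counts() {
    awk '$NF ~ /^\/dev\/dsk\// { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Sample data copied from the output above. On the server you would run:
#   metadb | replica_counts
replica_counts <<'EOF'
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a u 16 8192 /dev/dsk/c1t1d0s7
a u 8208 8192 /dev/dsk/c1t1d0s7
a u 16400 8192 /dev/dsk/c1t1d0s7
EOF
```

For the sample data this reports 3 replicas on c1t0d0s7 and 3 on c1t1d0s7.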
root@mysrv1 # metainit d12 1 1 c1t1d0s0
root@mysrv1 # metainit d22 1 1 c1t1d0s1
root@mysrv1 # metattach d10 d12
root@mysrv1 # metattach d20 d22
root@mysrv1 # metastat -c
d20 m 30GB d21 d22 (resync-21%)
d21 s 30GB c1t0d0s1
d22 s 30GB c1t1d0s1
d10 m 14GB d12 d11 (resync-35%)
d11 s 14GB c1t0d0s0
d12 s 14GB c1t1d0s0
root@mysrv1 #
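The resync runs in the background, and the mirrors are not redundant again until it completes. The progress can be extracted from metastat -c with a filter like the one below. It is a sketch that assumes the (resync-NN%) notation shown above and a POSIX awk; resync_progress is a hypothetical helper name.

```shell
# Print "<mirror> <percent>" for every mirror still resyncing,
# reading `metastat -c` output on stdin. Prints nothing once all are done.
# resync_progress is a hypothetical helper name, not an SVM command.
resync_progress() {
    awk 'match($0, /resync-[0-9]+%/) {
             # skip the 7 characters of "resync-" and drop the trailing "%"
             print $1, substr($0, RSTART + 7, RLENGTH - 8)
         }'
}

# Sample data copied from the output above. On the server you would run:
#   metastat -c | resync_progress
resync_progress <<'EOF'
d20 m 30GB d21 d22 (resync-21%)
d21 s 30GB c1t0d0s1
d22 s 30GB c1t1d0s1
d10 m 14GB d12 d11 (resync-35%)
d11 s 14GB c1t0d0s0
d12 s 14GB c1t1d0s0
EOF
```

For the sample data this prints d20 21 and d10 35. Running it repeatedly until it prints nothing is one way to wait for the resync to finish.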
The failed disk has now been replaced, and the new disk is configured and attached. The mirrors return to full redundancy once the resync completes.
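One step the walkthrough does not show: fmthard copies only the partition table, so a replacement disk in a root mirror normally also needs a boot block before the server can boot from it. The sketch below only prints the installboot command rather than running it. It assumes a SPARC server with a UFS root file system on slice 0; the platform string is a made-up example (on the server it would come from uname -i), and bootblk_cmd is a hypothetical helper name.

```shell
# Print (do not run) the installboot command for a UFS root slice on SPARC.
#   $1  platform name, i.e. the output of `uname -i` on the server
#   $2  raw device of the root slice on the replacement disk
# bootblk_cmd is a hypothetical helper name used only for this illustration.
bootblk_cmd() {
    echo "installboot /usr/platform/$1/lib/fs/ufs/bootblk $2"
}

# "SUNW,Sun-Fire-V890" is an assumed platform name, not taken from this server.
bootblk_cmd "SUNW,Sun-Fire-V890" /dev/rdsk/c1t1d0s0
```

Review the printed command against the actual platform and root slice before running it as root.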
##################################################################################