Friday, 14 March 2014

Failed Disk Replacement !!!

In this post we will see how to replace a failed disk. Hard errors on local disks are a common occurrence.

To avoid data loss in such cases, we maintain a mirror copy of the root disk.

When the root disk is under SVM (Solaris Volume Manager), the failed mirror disk must be detached from its mirrors and unconfigured before it can be replaced.

In the example below, my server has two disks under SVM in a RAID 1 (mirroring) configuration.

mysrv1 # echo |format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000875c9aa4,0
       1. c1t1d0 <HITACHI-HUS1014FASUN146G-2A08 cyl 14087 alt 2 hd 24 sec 848>
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000875cbe35,0
Specify disk (enter its number): Specify disk (enter its number):
root@mysrv1 #
root@mysrv1 #

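We can check the error counters for the disk with iostat -en. The columns are soft errors, hard errors, transport errors and total errors, followed by the device name.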

root@mysrv1 # iostat -en |grep -i c1t1d0  
23   4   0  27 c1t1d0
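
The same check can be run across all disks at once. The following is a minimal sketch, not part of the original procedure: hard_error_disks is a hypothetical helper name, and it assumes the five-column layout (s/w h/w trn tot device) that iostat -en prints.

```shell
# hard_error_disks (hypothetical helper): read `iostat -en` output on stdin
# and print "<device> <hard-error-count>" for every device whose h/w
# counter is greater than zero.
# Assumes the five-column layout: s/w h/w trn tot device.
hard_error_disks() {
    # Header lines are skipped because their second field is not numeric.
    awk 'NF == 5 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $5, $2 }'
}
```

On the server it would be used as: iostat -en | hard_error_disks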

To confirm the hard errors, we can check the dmesg output and the metastat output.

root@mysrv1 # metastat -c
d20              m   30GB d21 d22 (maint)    
    d21          s   30GB c1t0d0s1
    d22          s   30GB c1t1d0s1 (maint)

d10              m   14GB d11 d12 (maint)    
    d11          s   14GB c1t0d0s0
    d12          s   14GB c1t1d0s0 (maint)           ----- Disk is in maintenance state.
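
To pick out the affected submirrors without reading the whole listing, the output can be filtered. This is a small sketch under the assumption that metastat -c prints submirror lines as "name s size slice state", as in the output above; maint_submirrors is a hypothetical name.

```shell
# maint_submirrors (hypothetical helper): read `metastat -c` output on stdin
# and print "<submirror> <slice>" for every simple metadevice (type "s")
# whose state field is "(maint)".
maint_submirrors() {
    awk '$2 == "s" && $5 == "(maint)" { print $1, $4 }'
}
```

On the server it would be used as: metastat -c | maint_submirrors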


Now we need to detach the failed submirrors from their mirrors, clear them, and delete the state database replicas (metadb) on the failed disk.

root@mysrv1#metadetach d20 d22
root@mysrv1 #
root@mysrv1 #metadetach d10 d12
root@mysrv1 #
root@mysrv1 #metastat -c
d20              m   30GB
    d21          s   30GB c1t0d0s1
d10              m   14GB
    d11          s   14GB c1t0d0s0
d22          s   30GB c1t1d0s1 (maint)
d12          s   14GB c1t1d0s0 (maint)

root@mysrv1 #
root@mysrv1 #
root@mysrv1 #metaclear d22 d12
root@mysrv1 #
root@mysrv1 #
root@mysrv1 #metadb -d /dev/dsk/c1t1d0s7
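
On a server with many mirrors it can help to generate the detach and clear commands from the metastat -c output and review them before running anything. The sketch below only prints commands and executes nothing; plan_detach is a hypothetical name, and it assumes that submirror lines are indented under their mirror line as in the output shown earlier.

```shell
# plan_detach (hypothetical helper): read `metastat -c` output taken BEFORE
# the detach and print the metadetach/metaclear commands for every submirror
# in "(maint)" state. It is a dry run: commands are printed, not executed.
plan_detach() {
    awk '
        # A non-indented line starts a new top-level metadevice;
        # remember its name only if it is a mirror (type "m").
        /^[^ \t]/ { mirror = ($2 == "m") ? $1 : "" }

        # An indented simple metadevice in maint state belongs to that mirror.
        /^[ \t]/ && $2 == "s" && $5 == "(maint)" && mirror != "" {
            print "metadetach " mirror " " $1
            print "metaclear " $1
        }
    '
}
```

On the server it would be used as: metastat -c | plan_detach. The printed lines can then be checked by hand and run one at a time.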

List the attachment points with cfgadm -al to find the failed disk:

root@mysrv1 # cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             fc-private   connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown

usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok

root@mysrv1 #
root@mysrv1 #
root@mysrv1 # cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c1                             fc-private   connected    configured   unknown

usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok

root@mysrv1 #
root@mysrv1 #

Now unconfigure the failed disk:

root@mysrv1 #
root@mysrv1 #cfgadm -c unconfigure c1::dsk/c1t1d0
root@mysrv1 #
root@mysrv1 #
root@mysrv1 # cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             fc-private   connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    unconfigured unknown

usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok

root@mysrv1 #
root@mysrv1 #
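
Before pulling the disk it is worth confirming that the Occupant column really shows unconfigured. A minimal sketch of that check follows; occupant_state is a hypothetical name, and it assumes the five-column cfgadm -al layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition).

```shell
# occupant_state (hypothetical helper): read `cfgadm -al` output on stdin and
# print the Occupant column (4th field) for the attachment point given as $1.
occupant_state() {
    awk -v id="$1" '$1 == id { print $4 }'
}
```

On the server it would be used as: cfgadm -al | occupant_state c1::dsk/c1t1d0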

Remove the failed disk and insert the new one. After inserting it, configure the new disk:

root@mysrv1 #cfgadm -c configure c1::dsk/c1t1d0
root@mysrv1 #
root@mysrv1 # cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c1                             fc-private   connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb0/3                         unknown      empty        unconfigured ok
usb0/4                         unknown      empty        unconfigured ok

root@mysrv1 #

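The disk is now ready. Copy the partition table (VTOC) from the primary disk to the new disk with prtvtoc and fmthard, then create the state database replicas and the submirror metadevices on it.
Finally, re-attach the new submirrors to their mirrors, which starts the resync.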

root@mysrv1 #
root@mysrv1 #prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
root@mysrv1 #
root@mysrv1 #metadb -a -c 3 c1t1d0s7
root@mysrv1 #
root@mysrv1 # metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
     a        u         16              8192            /dev/dsk/c1t1d0s7
     a        u         8208            8192            /dev/dsk/c1t1d0s7
     a        u         16400           8192            /dev/dsk/c1t1d0s7
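
To confirm that both disks again hold the same number of state database replicas, the metadb listing can be summarised per slice. This is a sketch only; replicas_per_slice is a hypothetical name, and it assumes each replica line ends with its /dev/dsk/ path as in the output above.

```shell
# replicas_per_slice (hypothetical helper): read `metadb` output on stdin and
# print "<slice> <replica-count>" for each slice, sorted by slice name.
replicas_per_slice() {
    awk '$NF ~ /^\/dev\/dsk\// { n[$NF]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

On the server it would be used as: metadb | replicas_per_slice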

root@mysrv1 #
root@mysrv1 #
root@mysrv1 #metainit d12 1 1 c1t1d0s0
root@mysrv1 #
root@mysrv1 #metainit d22 1 1 c1t1d0s1
root@mysrv1 #
root@mysrv1 #metattach d10 d12
root@mysrv1 #
root@mysrv1 #metattach d20 d22
root@mysrv1 #
root@mysrv1 # metastat -c
d20              m   30GB d21 d22 (resync-21%)
    d21          s   30GB c1t0d0s1
    d22          s   30GB c1t1d0s1

d10              m   14GB d12 d11 (resync-35%)
    d11          s   14GB c1t0d0s0
    d12          s   14GB c1t1d0s0

root@mysrv1 #
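
The resync takes a while, so it can be handy to pull just the percentages out of metastat -c. The sketch below assumes the "(resync-NN%)" marker appears on the mirror line exactly as in the output above; resync_progress is a hypothetical name.

```shell
# resync_progress (hypothetical helper): read `metastat -c` output on stdin
# and print "<mirror> <percent>" for every mirror that is resyncing.
resync_progress() {
    awk '$2 == "m" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^\(resync-[0-9]+%\)$/) {
                pct = $i
                gsub(/[^0-9]/, "", pct)   # keep only the digits
                print $1, pct
            }
    }'
}
```

On the server it would be used as: metastat -c | resync_progress. When it prints nothing, no mirror is reporting a resync.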

The failed disk has now been replaced, and the new disk is configured and attached to the mirrors.
