Thursday, 20 October 2016

Using ASR (Automatic System Recovery) to Enable/Disable Hardware

At the OBP (OpenBoot PROM) prompt, most of us know the regular commands such as probe-scsi, probe-scsi-all, reset-all, devalias, and show-devs. OBP also has commands for troubleshooting hardware issues. In this post we are going to discuss the ASR commands: .asr, asr-disable, and asr-enable.

ASR stands for Automatic System Recovery. Using the asr-disable and asr-enable commands, we can disable or re-enable hardware components such as CPU/memory modules, memory banks, and PCI slots directly from OBP.

In my scenario, one of our servers was rebooting continuously (panic loop), so I logged in to the console to start troubleshooting...

rsc>
rsc>
rsc> console
rsc>
rsc>
rsc>               AT THIS POINT THE SERVER IS PANICKING AND REBOOTING.....
rsc>
rsc>
rsc>
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x54d12ddd.0xde0bcc8 (0x963e9d9a84)
PLATFORM: SUNW,Sun-Fire-V490, CSN: -, HOSTNAME: SRVR
SOURCE: SunOS, REV: 5.10 Generic_147440-27
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

panic[cpu2]/thread=3000ccd7200: UE WDU Error(s)

!!!!!!!!!!!!!!!!!!!

000002a100f778d0 unix:pagefault+ac (1036e6000, 0, 2, 0, 60035099778, 2)
  %l0-3: 0000000000000004 0000000000000000 0000030003c36000 0000000000000000
  %l4-7: 000000000183e400 000000000183b400 0000000000000000 000006003b4d2cd0
000002a100f77990 unix:trap+d50 (2a100f77b90, 1036e64b9, 0, 2, ffffffff7826ff1c, 0)
  %l0-3: 0000000000000000 000006003b4d2cd0 0000000000010031 00000600353d2fa8
  %l4-7: 0000000000000000 0000000000010034 0000000000010000 000006003b4d2eb0

syncing file systems... [1] 112 [1] 74 [1] 4 [1] 4 [1] 4 [1] 4 [1] ...... 4 [1] 4 done (not all i/o completed)
ereport.cpu.ultraSPARC-IVplus.edu-st ena=963cb72b2c00801 detector=[ version=1
 scheme="cpu" cpuid=2 cpumask=22 serial="80001A58E75C3807" ] afsr=
 10000a00000003 afsr-ext=0 afar-status=1 afar=a3c9ea0250 pc=12a3870 tl=0 tt=63
 privileged=1 multiple=0 syndrome-status=1 syndrome=3 l3-cache-ways=4
 l3-cache-data=[...] l2-cache-ways=1 l2-cache-data=[...] dcache-ways=0
 icache-ways=0 resource=[ version=1 scheme="cpu" cpuid=2 cpumask=22 serial=
 "80001A58E75C3807" ]

...... OUTPUT TRUNCATED .........

ereport.cpu.ultraSPARC-IVplus.ce ena=963e9ce73804001 detector=[ version=1
 scheme="cpu" cpuid=10 cpumask=22 serial="80001A58C75C3807" ] afsr=
 100002000000b0 afsr-ext=0 afar-status=1 afar=a3e7122f40 pc=0 tl=0 tt=0
 privileged=1 multiple=0 syndrome-status=1 syndrome=b0 error-type="U"
 error-disposition=0 l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0 icache-ways=
 0 resource=[ version=0 scheme="mem" unum="Slot A: J8001" ]

dumping to /dev/md/dsk/d20, offset 6873219072, content: kernel
 0:11 100% done
100% done: 169129 pages dumped, dump succeeded
rebooting...

Resetting ...

!!!!!!!!!!!!!!!!!!

RSC Alert: Host System has Reset

<*>
Software Reset

@(#)OBP 4.30.4.c 2010/09/29 09:42 Sun Fire 4XX
Online:  CMP0 UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online:  CMP1 UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online:  CMP2 UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online: *CMP3 UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Skipping POST.
Enabling system bus....... CMP0 CMP1 CMP2 CMP3 Done
Init ICache/etc........... CMP0 CMP1 CMP2 CMP3 Done
Init ECache Tags.......... CMP0 CMP1 CMP2 CMP3 Done
Clearing TLBs............. CMP0 CMP1 CMP2 CMP3 Done
Setup I/DTLBs............. CMP0 CMP1 CMP2 CMP3 Done
Enabling Cache/MMUs....... CMP0 CMP1 CMP2 CMP3 Done
Init ECache Data.......... CMP0 CMP1 CMP2 CMP3 Done
Zeroing memory...Done
Copying FLASHRAM to memory...Verifying base 128KB...Done
Jumping into RAM (leaving slave CPUs in ROM)
RAM CRC = 0000.0000.b81b.5f23;  ROM CRC = 0000.0000.b81b.5f23
Dropping in...
Find dropin, Decompressing Done, Size 0000.0000.0007.fd30 (512KB)
Slave CPUs starting Forth at 0000.0000.f000.00e0
Boot  CPU3 starting Forth at 0000.0000.f000.00e0
Diagnostic console initialized
Configure root name: SUNW,Sun-Fire-V490
Probing system devices
(1500 MHz @ 10:1, 16 MB) /: gptwo at 0,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 1,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 2,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 3,0 cmp cpu cpu memory-controller
/: gptwo at 4,0 Nothing there
/: gptwo at 5,0 Nothing there
/: gptwo at 6,0 Nothing there
/: gptwo at 7,0 Nothing there
/: gptwo at 8,0 pci pci
/: gptwo at 9,0 pci pci
Loading Support Packages: obp-tftp kbd-translator SUNW,i2c-ram-device SUNW,fru-device
Loading onboard drivers: ebus
/pci@9,700000/ebus@1: flashprom bbc power i2c i2c rtc gpio pmc rsc-control rsc-console serial
/pci@9,700000/ebus@1/i2c@1,2e: fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru nvram idprom fru fru
/pci@9,700000/ebus@1/i2c@1,30: temperature temperature temperature ioexp ioexp ioexp temperature ioexp ioexp ioexp ioexp temperature-sensor fru fru fru fru fru rscrtc
/memory: CMP0 Bank0  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #0
/memory: CMP0 Bank1  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #2
/memory: CMP0 Bank2  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #4
/memory: CMP0 Bank3  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #6
/memory: CMP1 Bank0  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #0
/memory: CMP1 Bank1  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #2
/memory: CMP1 Bank2  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #4
/memory: CMP1 Bank3  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #6
/memory: CMP2 Bank0  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #1
/memory: CMP2 Bank1  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #3
/memory: CMP2 Bank2  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #5
/memory: CMP2 Bank3  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #7
/memory: CMP3 Bank0  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #1
/memory: CMP3 Bank1  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #3
/memory: CMP3 Bank2  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #5
/memory: CMP3 Bank3  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #7
ChassisSerialNumber 0412KM4616
Probing I/O buses
/pci@8,600000: Device 1 Nothing there
/pci@8,600000: Device 2 Nothing there
/pci@8,700000: Device 2 Nothing there
/pci@8,700000: Device 3 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device 4 Nothing there
/pci@8,700000: Device 5 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device 6 ide disk cdrom
/pci@9,600000: Device 1 network
/pci@9,600000: Device 2 SUNW,qlc fp disk
/pci@9,700000: Device 1 usb
/pci@9,700000: Device 2 network
Configure root name: SUNW,Sun-Fire-V490           

THIS OUTPUT IS NOT TRUNCATED BECAUSE IT IS NEEDED TO UNDERSTAND THE SYSTEM.....

Probing system devices
(1500 MHz @ 10:1, 16 MB) /: gptwo at 0,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 1,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 2,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16 MB) /: gptwo at 3,0 cmp cpu cpu memory-controller
/: gptwo at 4,0 Nothing there
/: gptwo at 5,0 Nothing there
/: gptwo at 6,0 Nothing there
/: gptwo at 7,0 Nothing there
/: gptwo at 8,0 pci pci
/: gptwo at 9,0 pci pci
Loading Support Packages: obp-tftp kbd-translator SUNW,i2c-ram-device SUNW,fru-device
Loading onboard drivers: ebus
/pci@9,700000/ebus@1: flashprom bbc power i2c i2c rtc gpio pmc rsc-control rsc-console serial
/pci@9,700000/ebus@1/i2c@1,2e: fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru nvram idprom fru fru
/pci@9,700000/ebus@1/i2c@1,30: temperature temperature temperature ioexp ioexp ioexp temperature ioexp ioexp ioexp ioexp temperature-sensor fru fru fru fru fru rscrtc
/memory: CMP0 Bank0  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #0
/memory: CMP0 Bank1  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #2
/memory: CMP0 Bank2  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #4
/memory: CMP0 Bank3  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #6
/memory: CMP1 Bank0  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #0
/memory: CMP1 Bank1  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #2
/memory: CMP1 Bank2  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #4
/memory: CMP1 Bank3  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #6
/memory: CMP2 Bank0  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #1
/memory: CMP2 Bank1  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #3
/memory: CMP2 Bank2  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #5
/memory: CMP2 Bank3  512 +  512 +  512 +  512 :    2 GB @  a000000000  8-way #7
/memory: CMP3 Bank0  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #1
/memory: CMP3 Bank1  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #3
/memory: CMP3 Bank2  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #5
/memory: CMP3 Bank3  512 +  512 +  512 +  512 :    2 GB @  b000000000  8-way #7
ChassisSerialNumber 0412KM4616
Probing I/O buses
/pci@8,600000: Device 1 Nothing there
/pci@8,600000: Device 2 Nothing there
/pci@8,700000: Device 2 Nothing there
/pci@8,700000: Device 3 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device 4 Nothing there
/pci@8,700000: Device 5 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device 6 ide disk cdrom
/pci@9,600000: Device 1 network
/pci@9,600000: Device 2 SUNW,qlc fp disk
/pci@9,700000: Device 1 usb
/pci@9,700000: Device 2 network

Sun Fire V490, No Keyboard
Copyright (c) 1998, 2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.30.4.c, 32768 MB memory installed, Serial #70989125.
Ethernet address 0:14:4f:3b:35:45, Host ID: 843b3545.

Creating CMP memory layout properties.

Reading temperature limits from FRUPROMs: CMP0/2 CMP1/3 BACKPLANE

Environmental monitor is ON
Rebooting with command: boot
Boot device: rootdisk  File and args:
SunOS Release 5.10 Version Generic_147440-27 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR

........... OUTPUT TRUNCATED ............

SRVR console login: root        ---------- AS YOU CAN SEE, IT PANICKED AND REBOOTED AGAIN BEFORE I COULD LOG IN ....

SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x54d12fbb.0x237c8904 (0x5af9a9771c)
PLATFORM: SUNW,Sun-Fire-V490, CSN: -, HOSTNAME: SRVR
SOURCE: SunOS, REV: 5.10 Generic_147440-27
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

...... OUTPUT TRUNCATED .........

panic[cpu19]/thread=2a100a33c80: Fatal PCI UE Error

000002a100a5bd40 pcisch:ecc_intr+84 (6003340b920, 4, 6003340b910, fffffffffffffff8, 6003340b910, 3000010de38)
  %l0-3: 0000000000000000 0000000000004000 0000000000000000 0000000000000004
  %l4-7: 000000007be12800 00000000018e8c00 ffffffffffffffff 000003000010de98
000002a100a5bf50 unix:current_thread+164 (2, 180c000, 180c000, 0, ffffffffffffffff, 0)
  %l0-3: 000000000100777c 000002a100a32fe1 000000000000000e 0000000070008100
  %l4-7: 0000000000000000 0000000000000000 0000000000000000 000002a100a33890
000002a100a33930 0 (30003c5c000, 184c9d0, 0, ffffffffffffffff, 420d, 1814800)
  %l0-3: 0000000000000000 0000000000000001 0000000000000001 0000000000000000
  %l4-7: 0000000001000000 0000000000000002 0000030003c5c178 000000000000e193
000002a100a339e0 unix:idle+d4 (1814800, 0, 30003c5c000, ffffffffffffffff, 8, 1813000)
  %l0-3: 000006003492eff8 000000000000001b 0000000000000000 ffffffffffffffff
  %l4-7: 000006003492eff8 ffffffffffffffff 000000000184c9d0 0000000001063a88

syncing file systems... [2] 205 [2] 164 [2] 127 [2] 127 [2] 127 [2] 127rsc>
rsc>
rsc>

AT THIS POINT THE ONLY OPTION LEFT IS TO SEND A BREAK TO THE SYSTEM, WHICH WE SHOULD NORMALLY AVOID.

rsc> break
rsc>
rsc>
rsc> break
rsc>
rsc>
rsc> console

{13} ok
{13} ok
{13} ok
{13} ok
panic[cpu19]/thread=2a100a33c80: panic sync timeout
ereport.io.pci.sta ena=5af9a4175404c01 detector=[ version=0 scheme="dev"
 device-path="/pci@9,600000" ] pci-status=aa0 pci-command=146 pci-pa=4000c0

!!!!!!!!!!!!!!!!

ereport.cpu.ultraSPARC-IVplus.ce ena=5af9a8d11800401 detector=[ version=1
 scheme="cpu" cpuid=1 cpumask=22 serial="80010220E95CB6CF" ] afsr=
 100002000000b0 afsr-ext=0 afar-status=1 afar=a3fd0d3940 pc=0 tl=0 tt=0
 privileged=1 multiple=0 syndrome-status=1 syndrome=b0 error-type="U"
 error-disposition=0 l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0 icache-ways=
 0 resource=[ version=0 scheme="mem" unum="Slot A: J8001" ]

dumping to /dev/md/dsk/d20, offset 6873219072, content: kernel

panic[cpu19]/thread=2a100a33c80: BAD TRAP: type=31 rp=fff53cd0 addr=d61c6500d00c650 mmu_fsr=0
dump aborted: please record the above information!
rebooting...
No space left in device
Resetting ...

...... OUTPUT TRUNCATED .........

RSC Alert: Host System has Reset

<*>
Software Reset

@(#)OBP 4.30.4.c 2010/09/29 09:42 Sun Fire 4XX

........... OUTPUT TRUNCATED ............

!!!!!!!!!!!!!!!!!!!!!!

Sun Fire V490, No Keyboard
Copyright (c) 1998, 2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.30.4.c, 32768 MB memory installed, Serial #70989125.
Ethernet address 0:14:4f:3b:35:45, Host ID: 843b3545.

Creating CMP memory layout properties.

Reading temperature limits from FRUPROMs: CMP0/2 CMP1/3 BACKPLANE

Environmental monitor is ON
{3} ok
{3} ok
{3} ok    ------ FINALLY AT THE OBP PROMPT ....
{3} ok
{3} ok
{3} ok
{3} ok .asr       ----- command to check status...

ASR Disablement Status
Component:     Status

CMP/Memory:    Enabled
IO-Bridge8:    Enabled
IO-Bridge9:    Enabled
GPTwo Slots:   Enabled
Onboard FCAL:  Enabled
Onboard Net1:  Enabled
Onboard Net0:  Enabled
Onboard IDE:   Enabled
PCI Slots:     Enabled

{3} ok
{3} ok asr-disable  ----- run without arguments, it prints the usage and the valid device labels...

Usage: asr-disable <dev-id>
Where <dev-id> is an absolute device path, a device alias, or a device label.
Valid device labels include:
    cmp3-bank3      cmp3-bank2      cmp3-bank1      cmp3-bank0
    cmp2-bank3      cmp2-bank2      cmp2-bank1      cmp2-bank0
    cmp1-bank3      cmp1-bank2      cmp1-bank1      cmp1-bank0
    cmp0-bank3      cmp0-bank2      cmp0-bank1      cmp0-bank0
    pci-slot5       pci-slot4       pci-slot3       pci-slot2
    pci-slot1       pci-slot0       gptwo-slotc     gptwo-slotb
    gptwo-slota     ob-ide          ob-net0         ob-net1
    ob-fcal         io-bridge9      io-bridge8      cmp3
    cmp2            cmp1            cmp0

{3} ok
{3} ok
{3} ok asr-enable ----- run without arguments, it prints the usage and the valid device labels...

Usage: asr-enable <dev-id>
Where <dev-id> is an absolute device path, a device alias, or a device label.
Valid device labels include:
    cmp3-bank3      cmp3-bank2      cmp3-bank1      cmp3-bank0
    cmp2-bank3      cmp2-bank2      cmp2-bank1      cmp2-bank0
    cmp1-bank3      cmp1-bank2      cmp1-bank1      cmp1-bank0
    cmp0-bank3      cmp0-bank2      cmp0-bank1      cmp0-bank0
    pci-slot5       pci-slot4       pci-slot3       pci-slot2
    pci-slot1       pci-slot0       gptwo-slotc     gptwo-slotb
    gptwo-slota     ob-ide          ob-net0         ob-net1
    ob-fcal         io-bridge9      io-bridge8      cmp3
    cmp2            cmp1            cmp0
    *               cmp3-bank*      cmp2-bank*      cmp1-bank*
    cmp0-bank*      pci*            pci-slot*       gptwo-slot*
    io-bridge*      cmp*

{3} ok
{3} ok
{3} ok asr-disable cmp0      ---- finally, disabling the CMPs on the faulted board (Slot A)...
{3} ok
{3} ok asr-disable cmp2
{3} ok
{3} ok
{3} ok .asr                ----- can check the status now...
ASR Disablement Status
Component:     Status

CMP0:          Disabled
Memory Bank0:  Enabled
Memory Bank1:  Enabled
Memory Bank2:  Enabled
Memory Bank3:  Enabled
CMP1/Memory:   Enabled
CMP2:          Disabled
Memory Bank0:  Enabled
Memory Bank1:  Enabled
Memory Bank2:  Enabled
Memory Bank3:  Enabled
CMP3/Memory:   Enabled
IO-Bridge8:    Enabled
IO-Bridge9:    Enabled
GPTwo Slots:   Enabled
Onboard FCAL:  Enabled
Onboard Net1:  Enabled
Onboard Net0:  Enabled
Onboard IDE:   Enabled
PCI Slots:     Enabled

{3} ok devalias
rootmirror               /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
rootdisk                 /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk1                    /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
disk0                    /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk                     /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
ide                      /pci@8,700000/ide@6
scsi                     /pci@9,600000/SUNW,qlc@2
cdrom                    /pci@8,700000/ide@6/cdrom@0,0:f
net                      /pci@9,700000/network@2
net1                     /pci@9,600000/network@1
net0                     /pci@9,700000/network@2
flash                    /pci@9,700000/ebus@1/flashprom@0,0
idprom                   /pci@9,700000/ebus@1/i2c@1,2e/idprom@4,a4
nvram                    /pci@9,700000/ebus@1/i2c@1,2e/nvram@4,a4
i2c1                     /pci@9,700000/ebus@1/i2c@1,30
i2c0                     /pci@9,700000/ebus@1/i2c@1,2e
bbc                      /pci@9,700000/ebus@1/bbc@1,0
rsc-console              /pci@9,700000/ebus@1/rsc-console@1,3083f8
rsc-control              /pci@9,700000/ebus@1/rsc-control@1,3062f8
ttya                     /pci@9,700000/ebus@1/serial@1,400000:a
pci9b                    /pci@9,700000
pci9a                    /pci@9,600000
pci8b                    /pci@8,700000
pci8a                    /pci@8,600000
ebus                     /pci@9,700000/ebus@1
name                     aliases
{3} ok
{3} ok
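When the `.asr` listing has to be checked on several machines, a captured console log can be scanned for disabled components. The following is only a minimal Python sketch, assuming the output has the same layout as the listing above; the function names `parse_asr_status` and `disabled_components` are hypothetical helpers, not part of any Sun/Oracle tool.

```python
import re

# Excerpt of the `.asr` output shown above (after the two asr-disable calls).
SAMPLE_ASR = """\
ASR Disablement Status
Component:     Status

CMP0:          Disabled
Memory Bank0:  Enabled
Memory Bank1:  Enabled
Memory Bank2:  Enabled
Memory Bank3:  Enabled
CMP1/Memory:   Enabled
CMP2:          Disabled
Memory Bank0:  Enabled
Memory Bank1:  Enabled
Memory Bank2:  Enabled
Memory Bank3:  Enabled
CMP3/Memory:   Enabled
IO-Bridge8:    Enabled
PCI Slots:     Enabled
"""

# A status line looks like "<component>:   Enabled" or "<component>:   Disabled".
STATUS_RE = re.compile(r"^\s*([^:]+):\s+(Enabled|Disabled)\s*$")


def parse_asr_status(text):
    """Return {component: status}; memory banks are prefixed with their CMP."""
    status = {}
    current_cmp = None
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if not match:
            continue  # banner, header row, or blank line
        name, state = match.group(1).strip(), match.group(2)
        if re.fullmatch(r"CMP\d+(/Memory)?", name):
            # Remember which CMP the following "Memory BankN" lines belong to.
            current_cmp = name.split("/")[0]
        elif name.startswith("Memory Bank") and current_cmp:
            name = f"{current_cmp} {name}"
        status[name] = state
    return status


def disabled_components(status):
    """List the components whose status is Disabled, in listing order."""
    return [name for name, state in status.items() if state == "Disabled"]


if __name__ == "__main__":
    print(disabled_components(parse_asr_status(SAMPLE_ASR)))
```

The bank lines repeat the same labels for every CMP, which is why the sketch prefixes them with the CMP they follow.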

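Finally, boot the system. Note that in this session the asr-disable settings only took effect at the next reset: the first boot below still panics once, and after the reset the firmware reports "Offlining/Disabling CMP0...and CMP2" and the system comes up.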

{3} ok boot rootdisk
Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0  File and args:
SunOS Release 5.10 Version Generic_147440-27 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR

VxVM sysboot INFO V-5-2-3409 starting in boot mode...
NOTICE: VxVM vxdmp V-5-0-34 added disk array DISKS, datype = Disk

NOTICE: VxVM vxdmp V-5-3-1700 dmpnode 341/0x0 has migrated from enclosure FAKE_ENCLR_SNO to enclosure DISKS

VxVM sysboot INFO V-5-2-3390 Starting restore daemon...
Feb  4 02:35:36 svc.startd[408]: restarting after interruption
WARNING: /pci@8,700000/ide@6/sd@0,0 (sd1):
        transport rejected bad packet

WARNING: /pci@8,700000/ide@6/sd@0,0 (sd1):
        transport rejected bad packet

!!!!!!!!!!!!!!!!!!!!!!

panic[cpu3]/thread=30003e6aa60: UE WDU Error(s)

000002a1013124e0 SUNW,UltraSPARC-IV+:cpu_deferred_error+56c (1, 3, 10000400000154, 400000000, 0, 0)
  %l0-3: 000000a3e4eb5c40 0000000000000001 0000000001851000 0000080c00000040
  %l4-7: 0000080c00000000 0000000000203000 0000000000000001 0000000000000000


syncing file systems... [1] 177 [1] 132 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 done (not all i/o completed)

ereport.cpu.ultraSPARC-IVplus.ce ena=1f6adb3c02c04801 detector=[ version=1
 scheme="cpu" cpuid=12 cpumask=22 serial="80001A58E75C3807" ] afsr=
 100002000001e4 afsr-ext=0 afar-status=1 afar=a3e5bdd040 pc=0 tl=0 tt=0
 privileged=1 multiple=0 syndrome-status=1 syndrome=1e4 error-type="U"
 error-disposition=2000000 l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0
 icache-ways=0 resource=[ version=0 scheme="mem" unum="Slot A: J8001" ]

dumping to /dev/md/dsk/d20, offset 6873219072, content: kernel
 0:10 100% done
100% done: 147791 pages dumped, dump succeeded
rebooting...

Resetting ...

!!!!!!!!!!!!!!!!!!!!!!

RSC Alert: Host System has Reset

<*>
Software Reset

Skipping POST.
WARNING: Offlining/Disabling CMP0...and CMP2...FRU bus access...Done.
Enabling system bus....... CMP1 CMP3 Done
Mungeing Memory...........Done
HiMem: 0000.00b0.0000.0000, size: 0000.0004.0000.0000
Configuring Memory........ CMP1 CMP3 Done
Init ICache/etc........... CMP1 CMP3 Done
Init ECache Tags.......... CMP1 CMP3 Done
Clearing TLBs............. CMP1 CMP3 Done
Setup I/DTLBs............. CMP1 CMP3 Done
Enabling Cache/MMUs....... CMP1 CMP3 Done
Init ECache Data.......... CMP1 CMP3 Done
Zeroing memory...Done
Copying FLASHRAM to memory...Verifying base 128KB...Done
Jumping into RAM (leaving slave CPUs in ROM)
RAM CRC = 0000.0000.b81b.5f23;  ROM CRC = 0000.0000.b81b.5f23
Dropping in...
Find dropin, Decompressing Done, Size 0000.0000.0007.fd30 (512KB)
Slave CPUs starting Forth at 0000.0000.f000.00e0
Boot  CPU3 starting Forth at 0000.0000.f000.00e0
Diagnostic console initialized

Creating CMP memory layout properties.

Reading temperature limits from FRUPROMs: CMP1/3 BACKPLANE

Environmental monitor is ON
Rebooting with command: boot
Boot device: rootdisk  File and args:
SunOS Release 5.10 Version Generic_147440-27 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR

................. OUTPUT TRUNCATED...........

SRVR console login: root
Password:
Last login: Wed Feb  4 01:47:48 on console
Feb  4 02:45:06 SRVR login: ROOT LOGIN /dev/console
You have new mail.
Sourcing /root/.profile-EIS.....
root@SRVR #
root@SRVR #
root@SRVR #

Using the fmadm faulty command, we can check which components are faulted:

root@SRVR # fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 19:42:56 5fa23e99-815a-e9bd-c8b1-929231753f98  SUN4U-8007-L3  Critical

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490        Chassis_id  :
Product_sn  :

Fault class : fault.memory.dimm-ue-imminent 95%
Affects     : mem:///unum=Slot,A:J8001
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J8001 95%
                  faulty
Serial ID.  : 887214

Description : A pattern of correctable errors has been observed suggesting the
              potential exists that an uncorrectable error may occur.
              Refer to http://sun.com/msg/SUN4U-8007-L3 for more information.

Response    : None at this time.

Impact      : None at this time. However, the potential uncorrectable error
              warrants proactive service action to avoid any unplanned system
              outages.

Action      : Schedule a repair procedure to replace the DIMM. Use fmadm faulty
              to identify the DIMM to replace.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 19:34:02 499d428d-84ad-cacd-bae8-e0b6258b0bc0  SUN4U-8000-35  Critical

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490        Chassis_id  :
Product_sn  :

Fault class : fault.memory.bank 95%
Affects     : mem:///unum=Slot,A:J7900,J7901,J8001,J8000
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J7900,J7901,J8001,J8000 95%
                  faulty
Serial ID.  : 887430
              887318
              887214
              688813

Description : The number of errors associated with this memory module has
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/SUN4U-8000-35 for more information.

Response    : Pages of memory associated with this memory module are being
              removed from service as errors are reported.

Impact      : Total system memory capacity will be reduced as pages are
              retired.

Action      : Schedule a repair procedure to replace the affected memory
              module. Use fmdump -v -u <EVENT_ID> to identify the module.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 20:02:00 1536fecc-7ee3-6e49-fb81-c2da0262733a  PCI-8000-AP    Major

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490        Chassis_id  :
Product_sn  :

Fault class : fault.io.pci.device-invreq 50%
              fault.io.pci.device-interr 50%
Affects     : dev:////pci@9,600000/SUNW,qlc@2
              dev:////pci@9,600000
                  faulted but still in service
FRU         : "MB" (hc://:product-id=SUNW,Sun-Fire-V490:server-id=SRVR/motherboard=0)
                  faulty

Description : Either the transmitting device sent an invalid request or the
              receiving device is reporting an internal fault.
              Refer to http://sun.com/msg/PCI-8000-AP for more information.

Response    : One or more device instances may be disabled

Impact      : Possible loss of services provided by the device instances
              associated with this fault

Action      : Ensure that the latest drivers and patches are installed.
              Otherwise schedule a repair procedure to replace the affected
              device(s).  Use fmadm faulty to identify the devices or contact
              Sun for support.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 02 17:36:45 50af5cf4-571a-cee2-b76b-85b4d0c0430b  SUN4U-8007-KY  Major

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490        Chassis_id  :
Product_sn  :

Fault class : fault.memory.dimm-page-retires-excessive 95%
Affects     : mem:///unum=Slot,A:J8001
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J8001 95%
                  faulty
Serial ID.  : 887214

Description : The number of correctable errors associated with this memory
              module has exceeded acceptable levels.
              Refer to http://sun.com/msg/SUN4U-8007-KY for more information.

Response    : Pages of memory associated with this memory module have been
              removed from service, up to a limit which has now been reached.

Impact      : Total system memory capacity has been reduced.

Action      : Schedule a repair procedure to replace the DIMM. Use fmadm faulty
              to identify the DIMM to replace.

root@SRVR #
root@SRVR #
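The fmadm faulty output above is long, so a one-line-per-fault summary can be handy. The sketch below is an assumption-laden example rather than an official parser: it expects the text layout shown in this post, and the function name `summarize_faults` is hypothetical. When a fault lists several classes, only the first "Fault class" line is kept.

```python
import re

# Trimmed copy of two entries from the fmadm faulty output above.
SAMPLE_FMADM = """\
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 19:42:56 5fa23e99-815a-e9bd-c8b1-929231753f98  SUN4U-8007-L3  Critical

Fault class : fault.memory.dimm-ue-imminent 95%
Affects     : mem:///unum=Slot,A:J8001
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J8001 95%
                  faulty

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 20:02:00 1536fecc-7ee3-6e49-fb81-c2da0262733a  PCI-8000-AP    Major

Fault class : fault.io.pci.device-invreq 50%
              fault.io.pci.device-interr 50%
FRU         : "MB" (hc://:product-id=SUNW,Sun-Fire-V490:server-id=SRVR/motherboard=0)
                  faulty
"""

# Event row: "Mon DD HH:MM:SS <uuid>  <msg-id>  <severity>"
EVENT_RE = re.compile(
    r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} ([0-9a-f-]{36})\s+(\S+)\s+(\S+)\s*$"
)
FIELD_RE = re.compile(r"^(Fault class|FRU)\s*:\s*(.+?)\s*$")


def summarize_faults(text):
    """Return one dict per fault with its message ID, severity, class and FRU."""
    faults = []
    for line in text.splitlines():
        event = EVENT_RE.match(line)
        if event:
            faults.append({
                "event_id": event.group(1),
                "msg_id": event.group(2),
                "severity": event.group(3),
                "fault_class": None,
                "fru": None,
            })
            continue
        field = FIELD_RE.match(line)
        if field and faults:
            key = "fault_class" if field.group(1) == "Fault class" else "fru"
            if faults[-1][key] is None:  # keep the first value only
                faults[-1][key] = field.group(2)
    return faults


if __name__ == "__main__":
    for fault in summarize_faults(SAMPLE_FMADM):
        print(fault["severity"], fault["msg_id"], fault["fru"])
```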

Now we can use the prtdiag command to check whether Slot A is disabled:

root@SRVR # prtdiag -v | grep -i mem
Memory size: 16384 Megabytes
========================= Memory Configuration ===============================
root@SRVR #
root@SRVR # prtdiag -v
System Configuration:  Oracle Corporation  sun4u Sun Fire V490
System clock frequency: 150 MHz
Memory size: 16384 Megabytes

========================= CPUs ===============================================

          Run   E$  CPU     CPU
Brd  CPU  MHz   MB  Impl.   Mask
--- ----- ---- ---- ------- ----
 B  1, 17 1500 32.0 US-IV+   2.2                       <----- board "A" is missing (CMP0 and CMP2 are disabled)
 B  3, 19 1500 32.0 US-IV+   2.2

========================= Memory Configuration ===============================

          Logical  Logical  Logical
     MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
Brd  ID   num      size     Status       Size    Factor      with
---  ---  ----     ------   -----------  ------  ----------  -----------
 B    1     0      2048MB   no_status    1024MB     8-way        0
 B    1     1      2048MB   no_status    1024MB     8-way        0
 B    1     2      2048MB   no_status    1024MB     8-way        0
 B    1     3      2048MB   no_status    1024MB     8-way        0
 B    3     0      2048MB   no_status    1024MB     8-way        0
 B    3     1      2048MB   no_status    1024MB     8-way        0
 B    3     2      2048MB   no_status    1024MB     8-way        0
 B    3     3      2048MB   no_status    1024MB     8-way        0

========================= IO Cards =========================

                    Bus  Max
 IO  Port Bus       Freq Bus  Dev,
Type  ID  Side Slot MHz  Freq Func State Name                              Model
---- ---- ---- ---- ---- ---- ---- ----- --------------------------------  ----------------------
PCI   8    B    3    33   33  3,0  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    3    33   33  3,1  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    5    33   33  5,0  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    5    33   33  5,1  ok    QLGC,qlc-pci1077,2312.1077.101.2+
.......
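The drop in memory size can be checked against the boot messages. The OBP probe lists four banks per CMP, each built from four 512 MB DIMMs, and the banner reports 32768 MB installed. The small Python sketch below assumes, as the prtdiag output suggests, that disabling a CMP also removes the memory attached to it; the function name `visible_memory_mb` is hypothetical.

```python
# Figures taken from the OBP probe output in this post:
# "/memory: CMPn BankM  512 + 512 + 512 + 512 : 2 GB", four banks per CMP.
DIMM_MB = 512
DIMMS_PER_BANK = 4
BANKS_PER_CMP = 4
ALL_CMPS = {"cmp0", "cmp1", "cmp2", "cmp3"}


def visible_memory_mb(disabled_cmps):
    """Memory (in MB) left after the given CMPs are disabled.

    Assumption: the memory behind a disabled CMP is not available to the OS.
    """
    active = ALL_CMPS - set(disabled_cmps)
    return len(active) * BANKS_PER_CMP * DIMMS_PER_BANK * DIMM_MB


if __name__ == "__main__":
    print(visible_memory_mb([]))                # all four CMPs enabled
    print(visible_memory_mb(["cmp0", "cmp2"]))  # state after asr-disable
```

With all CMPs enabled this gives the 32768 MB from the OpenBoot banner, and with cmp0 and cmp2 disabled it gives the 16384 MB that prtdiag reports.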

So we need to schedule downtime to replace the faulty memory (DIMM J8001 in Slot A, as reported by fmadm faulty).

##########################################################################################



