At the OBP (OpenBoot PROM) prompt, most of us know the regular commands such as probe-scsi, probe-scsi-all, reset-all,
devalias and show-devs. OBP also has some commands that are used to troubleshoot
hardware issues. In this post we are going to discuss the usage of the "asr"
commands.
We can disable or enable CPU/memory modules, memory banks, PCI slots and other components by using the "asr" commands (.asr, asr-disable and asr-enable).
ASR stands for Automatic System Recovery. Using these commands we can disable hardware directly
from the OBP.
In my scenario, one of our servers was continuously rebooting (panicking), so I logged in to the console to
start troubleshooting...
rsc>
rsc> console
THIS IS THE SCENARIO: THE SERVER IS BOOTING AND PANICS.....
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x54d12ddd.0xde0bcc8 (0x963e9d9a84)
PLATFORM: SUNW,Sun-Fire-V490, CSN: -, HOSTNAME: SRVR
SOURCE: SunOS, REV: 5.10 Generic_147440-27
DESC: Errors have been detected that require a reboot to ensure system
      integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved
panic[cpu2]/thread=3000ccd7200:
UE WDU Error(s)
!!!!!!!!!!!!!!!!!!!
000002a100f778d0
unix:pagefault+ac (1036e6000, 0, 2, 0, 60035099778, 2)
%l0-3:
0000000000000004 0000000000000000 0000030003c36000 0000000000000000
%l4-7:
000000000183e400 000000000183b400 0000000000000000 000006003b4d2cd0
000002a100f77990
unix:trap+d50 (2a100f77b90, 1036e64b9, 0, 2, ffffffff7826ff1c, 0)
%l0-3: 0000000000000000
000006003b4d2cd0 0000000000010031 00000600353d2fa8
%l4-7:
0000000000000000 0000000000010034 0000000000010000 000006003b4d2eb0
syncing file
systems... [1] 112 [1] 74 [1] 4 [1] 4 [1] 4 [1] 4 [1] ...... 4 [1] 4 done (not
all i/o completed)
ereport.cpu.ultraSPARC-IVplus.edu-st
ena=963cb72b2c00801 detector=[ version=1
scheme="cpu"
cpuid=2 cpumask=22 serial="80001A58E75C3807" ] afsr=
10000a00000003
afsr-ext=0 afar-status=1 afar=a3c9ea0250 pc=12a3870 tl=0 tt=63
privileged=1
multiple=0 syndrome-status=1 syndrome=3 l3-cache-ways=4
l3-cache-data=[...]
l2-cache-ways=1 l2-cache-data=[...] dcache-ways=0
icache-ways=0
resource=[ version=1 scheme="cpu" cpuid=2 cpumask=22 serial=
"80001A58E75C3807"
]
...... OUTPUT
TRUNCATED .........
ereport.cpu.ultraSPARC-IVplus.ce
ena=963e9ce73804001 detector=[ version=1
scheme="cpu"
cpuid=10 cpumask=22 serial="80001A58C75C3807" ] afsr=
100002000000b0
afsr-ext=0 afar-status=1 afar=a3e7122f40 pc=0 tl=0 tt=0
privileged=1
multiple=0 syndrome-status=1 syndrome=b0 error-type="U"
error-disposition=0
l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0 icache-ways=
0 resource=[
version=0 scheme="mem" unum="Slot A: J8001" ]
dumping to
/dev/md/dsk/d20, offset 6873219072, content: kernel
0:11 100% done
100% done: 169129
pages dumped, dump succeeded
rebooting...
Resetting ...
!!!!!!!!!!!!!!!!!!
RSC Alert: Host System
has Reset
<*>
Software Reset
@(#)OBP 4.30.4.c
2010/09/29 09:42 Sun Fire 4XX
Online: CMP0
UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online: CMP1
UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online: CMP2
UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Online: *CMP3
UltraSPARC IV+ (v2.2) 10:1 1500MHz 32MB 5:1 ECache
Skipping POST.
Enabling system
bus....... CMP0 CMP1 CMP2 CMP3 Done
Init
ICache/etc........... CMP0 CMP1 CMP2 CMP3 Done
Init ECache
Tags.......... CMP0 CMP1 CMP2 CMP3 Done
Clearing TLBs.............
CMP0 CMP1 CMP2 CMP3 Done
Setup
I/DTLBs............. CMP0 CMP1 CMP2 CMP3 Done
Enabling
Cache/MMUs....... CMP0 CMP1 CMP2 CMP3 Done
Init ECache
Data.......... CMP0 CMP1 CMP2 CMP3 Done
Zeroing memory...Done
Copying FLASHRAM to
memory...Verifying base 128KB...Done
Jumping into RAM
(leaving slave CPUs in ROM)
RAM CRC =
0000.0000.b81b.5f23; ROM CRC = 0000.0000.b81b.5f23
Dropping in...
Find dropin,
Decompressing Done, Size 0000.0000.0007.fd30 (512KB)
Slave CPUs starting
Forth at 0000.0000.f000.00e0
Boot CPU3
starting Forth at 0000.0000.f000.00e0
Diagnostic console
initialized
Configure root name:
SUNW,Sun-Fire-V490
Probing system devices
(1500 MHz @ 10:1, 16
MB) /: gptwo at 0,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 1,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 2,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 3,0 cmp cpu cpu memory-controller
/: gptwo at 4,0
Nothing there
/: gptwo at 5,0
Nothing there
/: gptwo at 6,0
Nothing there
/: gptwo at 7,0
Nothing there
/: gptwo at 8,0 pci
pci
/: gptwo at 9,0 pci
pci
Loading Support
Packages: obp-tftp kbd-translator SUNW,i2c-ram-device SUNW,fru-device
Loading onboard
drivers: ebus
/pci@9,700000/ebus@1:
flashprom bbc power i2c i2c rtc gpio pmc rsc-control rsc-console serial
/pci@9,700000/ebus@1/i2c@1,2e:
fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru
fru fru fru fru fru fru fru fru fru fru fru fru fru fru nvram idprom fru fru
/pci@9,700000/ebus@1/i2c@1,30: temperature temperature temperature ioexp ioexp
ioexp temperature ioexp ioexp ioexp ioexp temperature-sensor fru fru fru fru
fru rscrtc
/memory: CMP0 Bank0 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #0
/memory: CMP0 Bank1 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #2
/memory: CMP0 Bank2 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #4
/memory: CMP0 Bank3 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #6
/memory: CMP1 Bank0 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #0
/memory: CMP1 Bank1 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #2
/memory: CMP1 Bank2 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #4
/memory: CMP1 Bank3 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #6
/memory: CMP2 Bank0 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #1
/memory: CMP2 Bank1 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #3
/memory: CMP2 Bank2 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #5
/memory: CMP2 Bank3 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #7
/memory: CMP3 Bank0 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #1
/memory: CMP3 Bank1 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #3
/memory: CMP3 Bank2 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #5
/memory: CMP3 Bank3 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #7
ChassisSerialNumber 0412KM4616
Probing I/O buses
/pci@8,600000: Device
1 Nothing there
/pci@8,600000: Device
2 Nothing there
/pci@8,700000: Device
2 Nothing there
/pci@8,700000: Device
3 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device
4 Nothing there
/pci@8,700000: Device
5 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device
6 ide disk cdrom
/pci@9,600000: Device
1 network
/pci@9,600000: Device
2 SUNW,qlc fp disk
/pci@9,700000: Device
1 usb
/pci@9,700000: Device
2 network
Configure root name:
SUNW,Sun-Fire-V490
THIS OUTPUT IS NOT TRUNCATED, AS IT IS NEEDED TO UNDERSTAND THE CONFIGURATION.....
Probing system devices
(1500 MHz @ 10:1, 16 MB)
/: gptwo at 0,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 1,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 2,0 cmp cpu cpu memory-controller
(1500 MHz @ 10:1, 16
MB) /: gptwo at 3,0 cmp cpu cpu memory-controller
/: gptwo at 4,0
Nothing there
/: gptwo at 5,0
Nothing there
/: gptwo at 6,0
Nothing there
/: gptwo at 7,0
Nothing there
/: gptwo at 8,0 pci
pci
/: gptwo at 9,0 pci
pci
Loading Support
Packages: obp-tftp kbd-translator SUNW,i2c-ram-device SUNW,fru-device
Loading onboard
drivers: ebus
/pci@9,700000/ebus@1:
flashprom bbc power i2c i2c rtc gpio pmc rsc-control rsc-console serial
/pci@9,700000/ebus@1/i2c@1,2e:
fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru fru
fru fru fru fru fru fru fru fru fru fru fru fru fru fru nvram idprom fru fru
/pci@9,700000/ebus@1/i2c@1,30:
temperature temperature temperature ioexp ioexp ioexp temperature ioexp ioexp
ioexp ioexp temperature-sensor fru fru fru fru fru rscrtc
/memory: CMP0 Bank0 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #0
/memory: CMP0 Bank1 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #2
/memory: CMP0 Bank2 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #4
/memory: CMP0 Bank3 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #6
/memory: CMP1 Bank0 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #0
/memory: CMP1 Bank1 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #2
/memory: CMP1 Bank2 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #4
/memory: CMP1 Bank3 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #6
/memory: CMP2 Bank0 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #1
/memory: CMP2 Bank1 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #3
/memory: CMP2 Bank2 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #5
/memory: CMP2 Bank3 512 + 512 + 512 + 512 : 2 GB @ a000000000 8-way #7
/memory: CMP3 Bank0 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #1
/memory: CMP3 Bank1 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #3
/memory: CMP3 Bank2 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #5
/memory: CMP3 Bank3 512 + 512 + 512 + 512 : 2 GB @ b000000000 8-way #7
ChassisSerialNumber 0412KM4616
Probing I/O buses
/pci@8,600000: Device
1 Nothing there
/pci@8,600000: Device
2 Nothing there
/pci@8,700000: Device
2 Nothing there
/pci@8,700000: Device
3 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device
4 Nothing there
/pci@8,700000: Device
5 QLGC,qlc fp disk QLGC,qlc fp disk
/pci@8,700000: Device
6 ide disk cdrom
/pci@9,600000: Device
1 network
/pci@9,600000: Device
2 SUNW,qlc fp disk
/pci@9,700000: Device
1 usb
/pci@9,700000: Device
2 network
Sun Fire V490, No
Keyboard
Copyright (c) 1998,
2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.30.4.c,
32768 MB memory installed, Serial #70989125.
Ethernet address
0:14:4f:3b:35:45, Host ID: 843b3545.
Creating CMP memory layout
properties.
Reading temperature
limits from FRUPROMs: CMP0/2 CMP1/3 BACKPLANE
Environmental monitor
is ON
Rebooting with
command: boot
Boot device: rootdisk
File and args:
SunOS Release 5.10
Version Generic_147440-27 64-bit
Copyright (c) 1983,
2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR
........... OUTPUT TRUNCATED ............
SRVR console login: root
---------- AS YOU CAN SEE, BEFORE I COULD LOG IN, IT STARTED REBOOTING AGAIN ....
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x54d12fbb.0x237c8904 (0x5af9a9771c)
PLATFORM: SUNW,Sun-Fire-V490, CSN: -, HOSTNAME: SRVR
SOURCE: SunOS, REV: 5.10 Generic_147440-27
DESC: Errors have been detected that require a reboot to ensure system
      integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved
...... OUTPUT
TRUNCATED .........
panic[cpu19]/thread=2a100a33c80:
Fatal PCI UE Error
000002a100a5bd40
pcisch:ecc_intr+84 (6003340b920, 4, 6003340b910, fffffffffffffff8, 6003340b910,
3000010de38)
%l0-3:
0000000000000000 0000000000004000 0000000000000000 0000000000000004
%l4-7:
000000007be12800 00000000018e8c00 ffffffffffffffff 000003000010de98
000002a100a5bf50
unix:current_thread+164 (2, 180c000, 180c000, 0, ffffffffffffffff, 0)
%l0-3:
000000000100777c 000002a100a32fe1 000000000000000e 0000000070008100
%l4-7:
0000000000000000 0000000000000000 0000000000000000 000002a100a33890
000002a100a33930 0
(30003c5c000, 184c9d0, 0, ffffffffffffffff, 420d, 1814800)
%l0-3:
0000000000000000 0000000000000001 0000000000000001 0000000000000000
%l4-7:
0000000001000000 0000000000000002 0000030003c5c178 000000000000e193
000002a100a339e0
unix:idle+d4 (1814800, 0, 30003c5c000, ffffffffffffffff, 8, 1813000)
%l0-3: 000006003492eff8
000000000000001b 0000000000000000 ffffffffffffffff
%l4-7:
000006003492eff8 ffffffffffffffff 000000000184c9d0 0000000001063a88
syncing file systems... [2] 205 [2] 164 [2] 127 [2] 127 [2] 127 [2] 127
rsc>
rsc>
THIS TIME THE ONLY OPTION LEFT IS TO SEND A BREAK TO THE SYSTEM, WHICH WE SHOULD NORMALLY AVOID.
rsc> break
rsc>
rsc>
rsc> break
rsc>
rsc>
rsc> console
{13} ok
{13} ok
{13} ok
{13} ok
panic[cpu19]/thread=2a100a33c80:
panic sync timeout
ereport.io.pci.sta
ena=5af9a4175404c01 detector=[ version=0 scheme="dev"
device-path="/pci@9,600000"
] pci-status=aa0 pci-command=146 pci-pa=4000c0
!!!!!!!!!!!!!!!!
ereport.cpu.ultraSPARC-IVplus.ce
ena=5af9a8d11800401 detector=[ version=1
scheme="cpu"
cpuid=1 cpumask=22 serial="80010220E95CB6CF" ] afsr=
100002000000b0
afsr-ext=0 afar-status=1 afar=a3fd0d3940 pc=0 tl=0 tt=0
privileged=1
multiple=0 syndrome-status=1 syndrome=b0 error-type="U"
error-disposition=0
l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0 icache-ways=
0 resource=[
version=0 scheme="mem" unum="Slot A: J8001" ]
dumping to
/dev/md/dsk/d20, offset 6873219072, content: kernel
panic[cpu19]/thread=2a100a33c80:
BAD TRAP: type=31 rp=fff53cd0 addr=d61c6500d00c650 mmu_fsr=0
dump aborted: please
record the above information!
rebooting...
No space left in
device
Resetting ...
...... OUTPUT
TRUNCATED .........
RSC Alert: Host System
has Reset
<*>
Software Reset
@(#)OBP 4.30.4.c
2010/09/29 09:42 Sun Fire 4XX
........... OUTPUT
TRUNCATED ............
!!!!!!!!!!!!!!!!!!!!!!
Sun Fire V490, No
Keyboard
Copyright (c) 1998,
2010, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.30.4.c,
32768 MB memory installed, Serial #70989125.
Ethernet address
0:14:4f:3b:35:45, Host ID: 843b3545.
Creating CMP memory
layout properties.
Reading temperature
limits from FRUPROMs: CMP0/2 CMP1/3 BACKPLANE
Environmental monitor
is ON
{3} ok
------ FINALLY AT THE OBP PROMPT ....
{3} ok
{3} ok .asr
----- command to check the current status...
ASR Disablement Status
Component:       Status
CMP/Memory:      Enabled
IO-Bridge8:      Enabled
IO-Bridge9:      Enabled
GPTwo Slots:     Enabled
Onboard FCAL:    Enabled
Onboard Net1:    Enabled
Onboard Net0:    Enabled
Onboard IDE:     Enabled
PCI Slots:       Enabled
{3} ok
{3} ok asr-disable
----- "disable" command; run without arguments, it prints its usage and the valid labels...
Usage: asr-disable <dev-id>
Where <dev-id> is an absolute device path, a device alias, or a device label.
Valid device labels include:
cmp3-bank3    cmp3-bank2    cmp3-bank1    cmp3-bank0
cmp2-bank3    cmp2-bank2    cmp2-bank1    cmp2-bank0
cmp1-bank3    cmp1-bank2    cmp1-bank1    cmp1-bank0
cmp0-bank3    cmp0-bank2    cmp0-bank1    cmp0-bank0
pci-slot5     pci-slot4     pci-slot3     pci-slot2
pci-slot1     pci-slot0     gptwo-slotc   gptwo-slotb
gptwo-slota   ob-ide        ob-net0       ob-net1
ob-fcal       io-bridge9    io-bridge8    cmp3
cmp2          cmp1          cmp0
{3} ok
{3} ok
{3} ok asr-enable
----- "enable" command; run without arguments, it prints its usage and the valid labels (wildcards included)...
Usage: asr-enable <dev-id>
Where <dev-id> is an absolute device path, a device alias, or a device label.
Valid device labels include:
cmp3-bank3    cmp3-bank2    cmp3-bank1    cmp3-bank0
cmp2-bank3    cmp2-bank2    cmp2-bank1    cmp2-bank0
cmp1-bank3    cmp1-bank2    cmp1-bank1    cmp1-bank0
cmp0-bank3    cmp0-bank2    cmp0-bank1    cmp0-bank0
pci-slot5     pci-slot4     pci-slot3     pci-slot2
pci-slot1     pci-slot0     gptwo-slotc   gptwo-slotb
gptwo-slota   ob-ide        ob-net0       ob-net1
ob-fcal       io-bridge9    io-bridge8    cmp3
cmp2          cmp1          cmp0          *
cmp3-bank*    cmp2-bank*    cmp1-bank*    cmp0-bank*
pci*          pci-slot*     gptwo-slot*   io-bridge*
cmp*
{3} ok
{3} ok
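{3} ok asr-disable cmp0
---- Finally, disabling the faulted components. The errors point to DIMM J8001 in Slot A, the board that carries CMP0 and CMP2, so both are disabled...
{3} ok
{3} ok asr-disable cmp2
{3} ok
{3} ok
{3} ok .asr
----- we can check the status now...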
ASR Disablement Status
Component:         Status
CMP0:              Disabled
  Memory Bank0:    Enabled
  Memory Bank1:    Enabled
  Memory Bank2:    Enabled
  Memory Bank3:    Enabled
CMP1/Memory:       Enabled
CMP2:              Disabled
  Memory Bank0:    Enabled
  Memory Bank1:    Enabled
  Memory Bank2:    Enabled
  Memory Bank3:    Enabled
CMP3/Memory:       Enabled
IO-Bridge8:        Enabled
IO-Bridge9:        Enabled
GPTwo Slots:       Enabled
Onboard FCAL:      Enabled
Onboard Net1:      Enabled
Onboard Net0:      Enabled
Onboard IDE:       Enabled
PCI Slots:         Enabled
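If the console session is captured to a file, a status listing like the one above can be checked with a small script. The following is only a minimal Python sketch, not a Sun/Oracle tool: the function name parse_asr_status and the trimmed SAMPLE text are hypothetical, and it assumes the listing has been laid out with one "Component: Status" pair per line.

```python
def parse_asr_status(text):
    """Return {component: status} from '.asr' output, one pair per line."""
    result = {}
    parent = None  # last CMP entry seen, used to qualify its memory banks
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # title line or blank line
        name, state = (part.strip() for part in line.split(":", 1))
        if name == "Component" or not state:
            continue  # column header or a line without a status
        if name.startswith("Memory Bank") and parent:
            name = f"{parent}/{name}"  # e.g. "CMP0/Memory Bank0"
        elif name.startswith("CMP"):
            parent = name
        result[name] = state
    return result


# Trimmed sample, assumed to be copied from a captured console log.
SAMPLE = """\
ASR Disablement Status
Component:        Status
CMP0:             Disabled
  Memory Bank0:   Enabled
CMP1/Memory:      Enabled
CMP2:             Disabled
  Memory Bank0:   Enabled
CMP3/Memory:      Enabled
PCI Slots:        Enabled
"""

status = parse_asr_status(SAMPLE)
disabled = sorted(name for name, state in status.items() if state == "Disabled")
print(disabled)
```

The memory banks are keyed under their CMP because the same bank names repeat for every disabled CMP in the listing.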
{3} ok devalias
rootmirror      /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
rootdisk        /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk1           /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@1,0
disk0           /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
disk            /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0
ide             /pci@8,700000/ide@6
scsi            /pci@9,600000/SUNW,qlc@2
cdrom           /pci@8,700000/ide@6/cdrom@0,0:f
net             /pci@9,700000/network@2
net1            /pci@9,600000/network@1
net0            /pci@9,700000/network@2
flash           /pci@9,700000/ebus@1/flashprom@0,0
idprom          /pci@9,700000/ebus@1/i2c@1,2e/idprom@4,a4
nvram           /pci@9,700000/ebus@1/i2c@1,2e/nvram@4,a4
i2c1            /pci@9,700000/ebus@1/i2c@1,30
i2c0            /pci@9,700000/ebus@1/i2c@1,2e
bbc             /pci@9,700000/ebus@1/bbc@1,0
rsc-console     /pci@9,700000/ebus@1/rsc-console@1,3083f8
rsc-control     /pci@9,700000/ebus@1/rsc-control@1,3062f8
ttya            /pci@9,700000/ebus@1/serial@1,400000:a
pci9b           /pci@9,700000
pci9a           /pci@9,600000
pci8b           /pci@8,700000
pci8a           /pci@8,600000
ebus            /pci@9,700000/ebus@1
name            aliases
{3} ok
{3} ok
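Finally, boot the system. Note that in this session the disablement only took effect at the next reset: the first boot below panicked once more, and CMP0 and CMP2 were offlined during the reset that followed.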
{3} ok boot rootdisk
Boot device:
/pci@9,600000/SUNW,qlc@2/fp@0,0/disk@0,0 File and args:
SunOS Release 5.10
Version Generic_147440-27 64-bit
Copyright (c) 1983,
2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR
VxVM sysboot INFO
V-5-2-3409 starting in boot mode...
NOTICE: VxVM vxdmp
V-5-0-34 added disk array DISKS, datype = Disk
NOTICE: VxVM vxdmp
V-5-3-1700 dmpnode 341/0x0 has migrated from enclosure FAKE_ENCLR_SNO to
enclosure DISKS
VxVM sysboot INFO
V-5-2-3390 Starting restore daemon...
Feb 4 02:35:36
svc.startd[408]: restarting after interruption
WARNING:
/pci@8,700000/ide@6/sd@0,0 (sd1):
transport rejected bad packet
WARNING:
/pci@8,700000/ide@6/sd@0,0 (sd1):
transport rejected bad packet
!!!!!!!!!!!!!!!!!!!!!!
panic[cpu3]/thread=30003e6aa60:
UE WDU Error(s)
000002a1013124e0
SUNW,UltraSPARC-IV+:cpu_deferred_error+56c (1, 3, 10000400000154, 400000000, 0,
0)
%l0-3:
000000a3e4eb5c40 0000000000000001 0000000001851000 0000080c00000040
%l4-7:
0000080c00000000 0000000000203000 0000000000000001 0000000000000000
syncing file systems...
[1] 177 [1] 132 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 [1] 127 done
(not all i/o completed)
ereport.cpu.ultraSPARC-IVplus.ce
ena=1f6adb3c02c04801 detector=[ version=1
scheme="cpu"
cpuid=12 cpumask=22 serial="80001A58E75C3807" ] afsr=
100002000001e4
afsr-ext=0 afar-status=1 afar=a3e5bdd040 pc=0 tl=0 tt=0
privileged=1
multiple=0 syndrome-status=1 syndrome=1e4 error-type="U"
error-disposition=2000000
l3-cache-ways=0 l2-cache-ways=0 dcache-ways=0
icache-ways=0
resource=[ version=0 scheme="mem" unum="Slot A: J8001" ]
dumping to
/dev/md/dsk/d20, offset 6873219072, content: kernel
0:10 100% done
100% done: 147791
pages dumped, dump succeeded
rebooting...
Resetting ...
!!!!!!!!!!!!!!!!!!!!!!
RSC Alert: Host System
has Reset
<*>
Software Reset
Skipping POST.
WARNING:
Offlining/Disabling CMP0...and CMP2...FRU bus access...Done.
Enabling system
bus....... CMP1 CMP3 Done
Mungeing
Memory...........Done
HiMem:
0000.00b0.0000.0000, size: 0000.0004.0000.0000
Configuring
Memory........ CMP1 CMP3 Done
Init
ICache/etc........... CMP1 CMP3 Done
Init ECache
Tags.......... CMP1 CMP3 Done
Clearing
TLBs............. CMP1 CMP3 Done
Setup
I/DTLBs............. CMP1 CMP3 Done
Enabling
Cache/MMUs....... CMP1 CMP3 Done
Init ECache
Data.......... CMP1 CMP3 Done
Zeroing memory...Done
Copying FLASHRAM to
memory...Verifying base 128KB...Done
Jumping into RAM
(leaving slave CPUs in ROM)
RAM CRC =
0000.0000.b81b.5f23; ROM CRC = 0000.0000.b81b.5f23
Dropping in...
Find dropin,
Decompressing Done, Size 0000.0000.0007.fd30 (512KB)
Slave CPUs starting
Forth at 0000.0000.f000.00e0
Boot CPU3
starting Forth at 0000.0000.f000.00e0
Diagnostic console
initialized
Creating CMP memory
layout properties.
Reading temperature
limits from FRUPROMs: CMP1/3 BACKPLANE
Environmental monitor
is ON
Rebooting with
command: boot
Boot device: rootdisk
File and args:
SunOS Release 5.10
Version Generic_147440-27 64-bit
Copyright (c) 1983,
2012, Oracle and/or its affiliates. All rights reserved.
Hostname: SRVR
................. OUTPUT TRUNCATED...........
SRVR console login:
root
Password:
Last login: Wed Feb
4 01:47:48 on console
Feb 4 02:45:06
SRVR login: ROOT LOGIN /dev/console
You have new mail.
Sourcing
/root/.profile-EIS.....
root@SRVR #
root@SRVR #
root@SRVR #
Using the fmadm command we can check which components (system board, CPUs, memory) are faulted .....
root@SRVR # fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 19:42:56 5fa23e99-815a-e9bd-c8b1-929231753f98  SUN4U-8007-L3  Critical

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490  Chassis_id  :
Product_sn  :

Fault class : fault.memory.dimm-ue-imminent 95%
Affects     : mem:///unum=Slot,A:J8001
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J8001 95%
                  faulty
Serial ID.  : 887214

Description : A pattern of correctable errors has been observed suggesting the
              potential exists that an uncorrectable error may occur.
              Refer to http://sun.com/msg/SUN4U-8007-L3 for more information.

Response    : None at this time.

Impact      : None at this time. However, the potential uncorrectable error
              warrants proactive service action to avoid any unplanned system
              outages.

Action      : Schedule a repair procedure to replace the DIMM. Use fmadm faulty
              to identify the DIMM to replace.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 19:34:02 499d428d-84ad-cacd-bae8-e0b6258b0bc0  SUN4U-8000-35  Critical

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490  Chassis_id  :
Product_sn  :

Fault class : fault.memory.bank 95%
Affects     : mem:///unum=Slot,A:J7900,J7901,J8001,J8000
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J7900,J7901,J8001,J8000 95%
                  faulty
Serial ID.  : 887430
              887318
              887214
              688813

Description : The number of errors associated with this memory module has
              exceeded acceptable levels. Refer to
              http://sun.com/msg/SUN4U-8000-35 for more information.

Response    : Pages of memory associated with this memory module are being
              removed from service as errors are reported.

Impact      : Total system memory capacity will be reduced as pages are
              retired.

Action      : Schedule a repair procedure to replace the affected memory
              module. Use fmdump -v -u <EVENT_ID> to identify the module.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 03 20:02:00 1536fecc-7ee3-6e49-fb81-c2da0262733a  PCI-8000-AP    Major

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490  Chassis_id  :
Product_sn  :

Fault class : fault.io.pci.device-invreq 50%
              fault.io.pci.device-interr 50%
Affects     : dev:////pci@9,600000/SUNW,qlc@2
              dev:////pci@9,600000
                  faulted but still in service
FRU         : "MB" (hc://:product-id=SUNW,Sun-Fire-V490:server-id=SRVR/motherboard=0)
                  faulty

Description : Either the transmitting device sent an invalid request or the
              receiving device is reporting an internal fault.
              Refer to http://sun.com/msg/PCI-8000-AP for more information.

Response    : One or more device instances may be disabled

Impact      : Possible loss of services provided by the device instances
              associated with this fault

Action      : Ensure that the latest drivers and patches are installed.
              Otherwise schedule a repair procedure to replace the affected
              device(s). Use fmadm faulty to identify the devices or contact
              Sun for support.
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 02 17:36:45 50af5cf4-571a-cee2-b76b-85b4d0c0430b  SUN4U-8007-KY  Major

Host        : SRVR
Platform    : SUNW,Sun-Fire-V490  Chassis_id  :
Product_sn  :

Fault class : fault.memory.dimm-page-retires-excessive 95%
Affects     : mem:///unum=Slot,A:J8001
                  faulted but still in service
FRU         : mem:///unum=Slot,A:J8001 95%
                  faulty
Serial ID.  : 887214

Description : The number of correctable errors associated with this memory
              module has exceeded acceptable levels.
              Refer to http://sun.com/msg/SUN4U-8007-KY for more information.

Response    : Pages of memory associated with this memory module have been
              removed from service, up to a limit which has now been reached.

Impact      : Total system memory capacity has been reduced.

Action      : Schedule a repair procedure to replace the DIMM. Use fmadm faulty
              to identify the DIMM to replace.
root@SRVR #
root@SRVR #
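The memory faults in the fmadm output above all name the same board. As a rough illustration, here is a minimal Python sketch that pulls the board letter and DIMM labels out of the mem:/// FMRI strings. The function names faulted_unums and board_and_dimms are hypothetical, the SAMPLE lines are copied from the output above, and the sketch assumes the "Slot,<board>:<dimm>,<dimm>..." layout seen in this post.

```python
import re


def faulted_unums(text):
    """Collect the distinct unum values from mem:/// FMRIs, in order."""
    seen = []
    for unum in re.findall(r"mem:///unum=(\S+)", text):
        if unum not in seen:
            seen.append(unum)
    return seen


def board_and_dimms(unum):
    """Split 'Slot,A:J7900,J7901' into ('A', ['J7900', 'J7901'])."""
    slot, _, dimms = unum.partition(":")
    board = slot.split(",")[-1]
    return board, dimms.split(",")


# Lines copied from the fmadm faulty output in this post.
SAMPLE = """\
Affects     : mem:///unum=Slot,A:J8001
FRU         : mem:///unum=Slot,A:J8001 95%
Affects     : mem:///unum=Slot,A:J7900,J7901,J8001,J8000
FRU         : mem:///unum=Slot,A:J7900,J7901,J8001,J8000 95%
"""

unums = faulted_unums(SAMPLE)
boards = sorted({board_and_dimms(u)[0] for u in unums})
dimms = sorted({d for u in unums for d in board_and_dimms(u)[1]})
print(boards)
print(dimms)
```

Every memory fault resolves to Slot A here, which is why the whole board was taken out of service until the DIMMs can be replaced.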
Now we can check with the "prtdiag" command whether Slot A is disabled or not...
root@SRVR # prtdiag -v | grep -i mem
Memory size: 16384 Megabytes
========================= Memory Configuration ===============================
root@SRVR #
root@SRVR # prtdiag -v
System Configuration: Oracle Corporation sun4u Sun Fire V490
System clock frequency: 150 MHz
Memory size: 16384 Megabytes
========================= CPUs ===============================================
          Run   E$    CPU     CPU
Brd  CPU  MHz   MB   Impl.   Mask
--- ----- ---- ---- ------- ----
 B  1, 17 1500 32.0 US-IV+   2.2
 B  3, 19 1500 32.0 US-IV+   2.2
BOARD "A" IS MISSING, ONLY BOARD "B" IS LISTED.....
========================= Memory Configuration ===============================
          Logical  Logical  Logical
      MC  Bank     Bank     Bank         DIMM    Interleave  Interleaved
Brd   ID  num      size     Status       Size    Factor      with
---  ---  ----     ------   -----------  ------  ----------  -----------
 B     1    0      2048MB   no_status    1024MB  8-way        0
 B     1    1      2048MB   no_status    1024MB  8-way        0
 B     1    2      2048MB   no_status    1024MB  8-way        0
 B     1    3      2048MB   no_status    1024MB  8-way        0
 B     3    0      2048MB   no_status    1024MB  8-way        0
 B     3    1      2048MB   no_status    1024MB  8-way        0
 B     3    2      2048MB   no_status    1024MB  8-way        0
 B     3    3      2048MB   no_status    1024MB  8-way        0
========================= IO Cards =========================
                         Bus  Max
      IO  Port Bus       Freq Bus  Dev,
Type  ID  Side Slot MHz  Freq Func State Name                              Model
---- ---- ---- ---- ---- ---- ---- ----- --------------------------------  ----------------------
PCI   8    B    3    33   33  3,0  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    3    33   33  3,1  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    5    33   33  5,0  ok    QLGC,qlc-pci1077,2312.1077.101.2+
PCI   8    B    5    33   33  5,1  ok    QLGC,qlc-pci1077,2312.1077.101.2+
.......
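The memory figures in this post can be cross-checked with simple arithmetic: OBP reported 32768 MB with all four CMPs online, and prtdiag reports 16384 MB with only CMP1 and CMP3 left. A minimal Python sketch of that check follows. The constant and function names are hypothetical, and the values (4 banks per CMP, 2048 MB per bank) are taken from the OBP and prtdiag listings above.

```python
BANK_MB = 2048        # logical bank size shown by prtdiag
BANKS_PER_CMP = 4     # Bank0..Bank3 per CMP in the OBP memory probe


def visible_memory_mb(active_cmps):
    """Memory visible to the OS when only `active_cmps` CMPs are online."""
    return active_cmps * BANKS_PER_CMP * BANK_MB


all_online = visible_memory_mb(4)     # before asr-disable
after_disable = visible_memory_mb(2)  # CMP0 and CMP2 disabled
print(all_online, after_disable)      # prints "32768 16384"
```

The halved figure confirms that the memory attached to CMP0 and CMP2 went offline together with the processors.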
So we need to schedule downtime for the memory replacement....