We currently have a recurring, annoying error on one of our UCS domains used for VDI.
Symptoms:
1.1 Main error message
Affected object: sys/mgmt-entity-A
Description: device chassis num., error accessing shared-storage
Type: management
Cause: device-shared-storage-error
Highest severity: Warning
The fault appeared and then cleared after a while.
1.2 show cluster state output
I checked the cluster state of the UCS domain and found errors for the chassis:
FI-A#sh cluster state
Chassis x, serial:xxxxxxxx, state:active with errors
Fabric A, chassis-seeprom local IO failure:
xxxxxxxx READ_FAILED, error: TIMEOUT, error code:10, error count:4
Warning: there are pending management I/O errors on one or more devices, failover may not complete
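If you need to check several domains, the READ_FAILED lines can be pulled out of saved `show cluster state` output with a small script. This is a minimal Python sketch that assumes the line format shown above (one `<device> READ_FAILED, error: ..., error code:N, error count:N` line per failure); the function names are hypothetical, and the script only parses text you have already captured. It does not connect to the fabric interconnect.

```python
import re

# Assumed line format, taken from the output above:
# "xxxxxxxx READ_FAILED, error: TIMEOUT, error code:10, error count:4"
IO_FAILURE_RE = re.compile(
    r"(?P<device>\S+)\s+READ_FAILED,\s*error:\s*(?P<error>\w+),"
    r"\s*error code:\s*(?P<code>\d+),\s*error count:\s*(?P<count>\d+)"
)


def find_seeprom_read_failures(output: str) -> list[dict]:
    """Return one dict per READ_FAILED line found in 'show cluster state' output."""
    failures = []
    for line in output.splitlines():
        match = IO_FAILURE_RE.search(line)
        if match:
            failures.append({
                "device": match.group("device"),
                "error": match.group("error"),
                "code": int(match.group("code")),
                "count": int(match.group("count")),
            })
    return failures


def has_pending_mgmt_io_errors(output: str) -> bool:
    """True if the 'failover may not complete' warning is present."""
    return "pending management I/O errors" in output


sample = """Chassis x, serial:xxxxxxxx, state:active with errors
Fabric A, chassis-seeprom local IO failure:
xxxxxxxx READ_FAILED, error: TIMEOUT, error code:10, error count:4
Warning: there are pending management I/O errors on one or more devices, failover may not complete"""

print(find_seeprom_read_failures(sample))
# [{'device': 'xxxxxxxx', 'error': 'TIMEOUT', 'code': 10, 'count': 4}]
```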
Workaround:
2.1 Reseat the IO module
The error message indicates that the IOM had a problem accessing the SEEPROM on the chassis backplane.
This is caused by a hardware limitation in keeping communication between the IOM and the backplane SEEPROM reliable over the i2c bus (i2c delay).
(1) Unplug the IO module.
(2) Reinsert the IO module. Make sure the module is seated firmly against the backplane.
(3) Reboot the IO module.
2.2 Check the fault state before acting
A) If the "error accessing shared-storage" fault is currently cleared and is not raised again, do not apply the workaround. No action is needed.
B) If the "error accessing shared-storage" fault is raised and never clears, or keeps flapping (raised, then cleared, then raised again), try the following:
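The A/B rule above can be written down as a small decision function. This is a minimal Python sketch that assumes you have the fault's history as a chronological list of raised/cleared transitions (for example, noted from the UCSM fault history). `FaultEvent`, `should_apply_workaround` and the "raised twice or more counts as flapping" threshold are hypothetical choices made for illustration, not values documented by Cisco.

```python
from enum import Enum


class FaultEvent(Enum):
    RAISED = "raised"
    CLEARED = "cleared"


def should_apply_workaround(events: list[FaultEvent], flap_threshold: int = 2) -> bool:
    """Return True for case B (apply the workaround), False for case A (do nothing).

    events: chronological transitions of the 'error accessing shared-storage'
            fault (hypothetical input format).
    flap_threshold: number of raises treated as flapping (assumed value).
    """
    if not events:
        return False  # fault never seen: nothing to do
    still_raised = events[-1] is FaultEvent.RAISED              # B: raised, never cleared
    flapping = events.count(FaultEvent.RAISED) >= flap_threshold  # B: keeps coming and going
    return still_raised or flapping                             # otherwise A: leave it alone


# The symptom from 1.1: raised once, then cleared -> case A
print(should_apply_workaround([FaultEvent.RAISED, FaultEvent.CLEARED]))  # False
```

Counting raises rather than timing them keeps the sketch simple; if you want "flapping" to mean several raises within a time window, the events would need timestamps.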
2.2.1 SSH to the fabric interconnect.
2.2.2 FI-A# connect local-mgmt
2.2.3 FI-A(local-mgmt)# connect iom x   (x = chassis number)
2.2.4 fex-1#show platform soft cmc thermal status | grep status
status: #ACTIVE
9(i1), 13(peer_therm_status),1(not_applicable), 0 "PEER_IOM_THERM"
=> If the status is PASSIVE, restart this IOM (IOM 1 on fabric A) via UCSM.
If it is ACTIVE, the IOM on the other side (IOM 2 on fabric B) needs
to be reset.
In our case the status was ACTIVE, so IOM 2 on fabric B needed to be reset.
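The ACTIVE/PASSIVE rule from step 2.2.4 can be sketched as a function that reads the grep output and returns the IOM to reset. This is a minimal Python sketch that assumes the command was run on IOM 1 on fabric A, as in this example. `iom_to_reset` and its `local_iom`/`peer_iom` parameters are hypothetical names; the actual reset is still done in UCSM.

```python
import re


def iom_to_reset(thermal_output: str,
                 local_iom: str = "IOM 1 (fabric A)",
                 peer_iom: str = "IOM 2 (fabric B)") -> str:
    """Pick the IOM to reset from the 'status:' line of
    'show platform soft cmc thermal status | grep status'.

    local_iom is the IOM the command was run on and peer_iom is the other side
    (defaults assume the IOM 1 / fabric A session used in this post).
    """
    match = re.search(r"status:\s*#?(ACTIVE|PASSIVE)", thermal_output)
    if match is None:
        raise ValueError("no ACTIVE/PASSIVE status line found")
    # PASSIVE: reset the IOM you are connected to.
    # ACTIVE: reset the IOM on the other fabric.
    return local_iom if match.group(1) == "PASSIVE" else peer_iom


sample = 'status: #ACTIVE\n9(i1), 13(peer_therm_status),1(not_applicable), 0 "PEER_IOM_THERM"'
print(iom_to_reset(sample))  # IOM 2 (fabric B)
```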
2.2.5 To reset it in UCSM, go to Equipment -> IO Modules -> IO Module 2 -> Reset IO Module.
2.2.6 If the reset does not solve the problem, perform the following full reseat procedure:
Remove PSU1, leave it out for 2 minutes, reinsert it, wait 10 seconds and confirm PSU1 has power, then move to PSU2.
Remove PSU2, leave it out for 2 minutes, reinsert it, wait 10 seconds and confirm PSU2 has power, then move to PSU3.
Remove PSU3, leave it out for 2 minutes, reinsert it, wait 10 seconds and confirm PSU3 has power, then move to PSU4.
Remove PSU4, leave it out for 2 minutes, reinsert it, wait 10 seconds and confirm PSU4 has power, then move to Fan1.
Remove Fan1, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan1 has power, then move to Fan2.
Remove Fan2, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan2 has power, then move to Fan3.
Remove Fan3, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan3 has power, then move to Fan4.
Remove Fan4, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan4 has power, then move to Fan5.
Remove Fan5, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan5 has power, then move to Fan6.
Remove Fan6, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan6 has power, then move to Fan7.
Remove Fan7, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan7 has power, then move to Fan8.
Remove Fan8, leave it out for 30 seconds, reinsert it, wait 10 seconds and confirm Fan8 has power, then move to IO Module 1.
Remove IO Module 1, leave it out for 5 minutes, reinsert it, and confirm that it is up and running before you reseat IO Module 2.
Once IO Module 1 is up and running, finally remove IO Module 2, leave it out for 5 minutes, and place it back into the chassis.
This is the complete reseat process to clear the i2c bus.
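The sequence is easy to lose track of on a live chassis. This minimal Python sketch encodes the order and wait times from the list above as a checklist and adds up the minimum time spent waiting. It excludes handling time and the time an IOM takes to come back up, which the procedure does not fix. `ReseatStep`, `build_reseat_plan` and `minimum_wait_seconds` are hypothetical names, and the script does not talk to UCSM.

```python
from dataclasses import dataclass


@dataclass
class ReseatStep:
    component: str
    out_seconds: int     # how long the part stays removed
    settle_seconds: int  # wait after reinsertion before confirming power


def build_reseat_plan() -> list[ReseatStep]:
    """Order and timings taken from the reseat procedure above."""
    plan = [ReseatStep(f"PSU{i}", 120, 10) for i in range(1, 5)]
    plan += [ReseatStep(f"Fan{i}", 30, 10) for i in range(1, 9)]
    # IO modules: 5 minutes out. The time until an IOM is "up and running"
    # is not specified, so it is left out (settle_seconds = 0).
    plan += [ReseatStep(f"IO Module {i}", 300, 0) for i in (1, 2)]
    return plan


def minimum_wait_seconds(plan: list[ReseatStep]) -> int:
    """Sum of removal and settle waits, excluding handling and IOM boot time."""
    return sum(step.out_seconds + step.settle_seconds for step in plan)


plan = build_reseat_plan()
for number, step in enumerate(plan, start=1):
    print(f"{number:2}. {step.component}: out {step.out_seconds}s, settle {step.settle_seconds}s")
print(f"minimum waiting: {minimum_wait_seconds(plan)}s")  # minimum waiting: 1440s
```

The arithmetic is 4 × (120 + 10) s for the PSUs, 8 × (30 + 10) s for the fans and 2 × 300 s for the IO modules: 520 + 320 + 600 = 1440 s, so plan for at least 24 minutes of waiting plus IOM boot time.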