Hi,
Your main module (MM1) crashed at 02-01-24 02:10:23.
In my experience, the only way to learn the cause of a crash is to send the core dump file to support. The answer has always been to upgrade to the version in which the bug is fixed, so upgrading to the latest release when one is available could short-circuit the process.
One thing to add: I also have a crash indicator on every module in every 5406. I think the indicators stay forever, so they will be historical, especially when PoE is involved.
If the crash happens again, you might need to raise a support call with the info you posted to see if there is a clue. The version and error code are sometimes enough.
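To check which saved crash records are merely historical, here is a minimal Python sketch that parses pasted `show boot` text and keeps only the records dated close to the reboot. The function names, the 5-minute window, and the idea of working from a pasted excerpt are my own assumptions for illustration, not anything the switch or support provides. The date format is taken to be MM/DD/YY, which the 05/28/22 entry implies.

```python
import re
from datetime import datetime, timedelta

# Excerpt of the "show boot" output posted in this thread (register lines trimmed).
SHOW_BOOT = """\
Mgmt Module 1 -- Saved Crash Information (most recent first):
=============================================================
ID: e9d62a4e
Active system went down: 02/01/24 02:10:57 KB.16.02.0013 528
Health Monitor: Restr Mem Access
Mgmt Module 2 -- Saved Crash Information (most recent first):
=============================================================
Slot D -- Saved Crash Information (most recent first):
=======================================================
ID: 85c85f55
Slot D subsystem went down: 05/28/22 03:38:42 KB.16.02.0013 562
Software exception in ISR at interrupts_ahs.c:5247
"""

SECTION_RE = re.compile(r"^(.+?) -- Saved Crash Information")
DOWN_RE = re.compile(r"went down:\s+(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)")


def parse_crash_records(text):
    """Return one dict per 'went down' line, tagged with its section header."""
    records = []
    section = None
    lines = text.splitlines()
    for i, line in enumerate(lines):
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        down = DOWN_RE.search(line)
        if down and section is not None:
            # Assumption: timestamps are MM/DD/YY HH:MM:SS.
            when = datetime.strptime(down.group(1), "%m/%d/%y %H:%M:%S")
            # Assumption: the line after 'went down' carries the crash reason.
            reason = lines[i + 1].strip() if i + 1 < len(lines) else ""
            records.append({
                "section": section,
                "when": when,
                "version": down.group(2),
                "reason": reason,
            })
    return records


def records_near(records, reboot_time, window=timedelta(minutes=5)):
    """Keep only records whose timestamp falls within the window of the reboot."""
    return [r for r in records if abs(r["when"] - reboot_time) <= window]


if __name__ == "__main__":
    reboot = datetime(2024, 2, 1, 2, 10, 58)  # time of the 'Self reset' log entry
    for rec in records_near(parse_crash_records(SHOW_BOOT), reboot):
        print(rec["section"], rec["when"], rec["reason"])
```

With the excerpt above, only the Mgmt Module 1 record falls inside the window. The Slot D record is dated 05/28/22, so it is a leftover from an earlier event rather than part of the 02/01/24 restart.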
Original Message:
Sent: 2/28/2024 1:37:00 PM
From: Reed_Krueger
Subject: RE: Switch Self Restarts With No Crash Logs
Thank you so much for replying! Here is my show boot:
hostname# sh boot
Mgmt Module 1 -- Saved Crash Information (most recent first):
=============================================================
ID: e9d62a4e
Active system went down: 02/01/24 02:10:57 KB.16.02.0013 528
Health Monitor: Restr Mem Access
HW Addr=0x00000000 IP=0xf679c7c Task='mIpPktRecv' Task ID=0x3fb02640
sp:0x1e0578b8 lr:0xf4e1424
msr: 0x02029200 xer: 0x00000000 cr: 0x24000400
Mgmt Module 2 -- Saved Crash Information (most recent first):
=============================================================
Slot D -- Saved Crash Information (most recent first):
=======================================================
ID: 85c85f55
Slot D subsystem went down: 05/28/22 03:38:42 KB.16.02.0013 562
Software exception in ISR at interrupts_ahs.c:5247
-> FR Int ERROR_STATUS =0x18000020
------------------------------------------------------------
MM:Management Module, IM: Interface Module
------------------------------------------------------------
TimeStamp Type Core Dump File Name Build Version
----------------- -------- ---------------------------- -----------------
02-01-24 02:10:23 MM1 M_SG71G4C0KC.cor KB.16.02.0013
04-01-91 14:36:10 Slot D I_SG7297R01N.cor KB.16.02.0013
So if I'm reading this output correctly, there was a memory issue which caused Module D to have an issue, which in turn caused the crash? sh mod indicates there is a core dump on every line card, but that's true for all my modules onsite, and unfortunately, for security reasons, I can't tftp my files anywhere.
Original Message:
Sent: Feb 23, 2024 02:41 AM
From: IanNightingale
Subject: Switch Self Restarts With No Crash Logs
Hi, I can confirm this switch will crash but then show messages that suggest it was a power cycle. If you have two power supplies then that rules out a very local issue (plug not fitting right) and you've confirmed the rack had power.
Two things to look at:
1) show boot.
switch12# show boot
Mgmt Module 1 -- Saved Crash Information (most recent first):
=============================================================
ID: 49747532
Active system went down: 02/19/15 11:54:57 K.15.10.0015m 616
Operator warm reload.
No Core-dump Files Present.
Note how long the crash info is saved for.
2) show modules - This has a column showing whether an individual module (line card) has a core dump. I've often seen a crash happen when a 'hot swap' module was inserted. Nobody may have been on site at the time in your case, but it might give a hint.
Also, try copying the crash-logs (copy crash-files mm-active tftp...) to see if there is anything in them.
If you see nothing in all of the above I would put this down to a hardware failure.
Original Message:
Sent: Feb 22, 2024 06:23 PM
From: Reed_Krueger
Subject: Switch Self Restarts With No Crash Logs
Hi Everyone!
I got an alarm from IMC stating one of my 5412Rzl2 J9851A switches had a cold start. This was odd, as I was under the impression that cold starts can only happen after power loss and the other switch in the same rack/power circuit did not lose power. Our EC verified there were no power events on that rack (they have power monitors). Digging into the logs, I saw this:
I 02/01/24 02:10:58 03006 system: AM1: Reason for system reboot: Self reset
M 02/01/24 02:10:58 00064 system: AM1: Health Monitor: Restr Mem Access
I 02/01/24 02:10:58 00063 system: AM1: System went down: 02/01/24 02:10:57
I 02/01/24 02:10:58 00061 system: AM1: -----------------------------------------
I 02/01/24 02:10:55 03803 chassis: xM1: System Self test completed on Master
I feel like this switch crashed but it does not show any reason why it self reset. I've been poking around in various forums looking for information but I can't find anything. I'm running KB.16.02.0013 and didn't see any major issues relating to this problem. Has anyone else experienced this?
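For anyone sifting through a longer event log, here is a minimal Python sketch that pulls the reboot-related entries out of pasted log text. It assumes each line follows the layout seen above (severity letter, date, time, five-digit event ID, subsystem, message); the function names and the returned keys are hypothetical and exist only for this illustration.

```python
import re

# The log lines posted above, used as sample input.
LOG = """\
I 02/01/24 02:10:58 03006 system: AM1: Reason for system reboot: Self reset
M 02/01/24 02:10:58 00064 system: AM1: Health Monitor: Restr Mem Access
I 02/01/24 02:10:58 00063 system: AM1: System went down: 02/01/24 02:10:57
I 02/01/24 02:10:58 00061 system: AM1: -----------------------------------------
I 02/01/24 02:10:55 03803 chassis: xM1: System Self test completed on Master
"""

# Assumed layout: <severity> <date> <time> <event id> <subsystem>: <message>
LINE_RE = re.compile(
    r"^(?P<sev>[A-Z]) (?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<event_id>\d{5}) (?P<subsystem>\w+): (?P<message>.*)$"
)


def parse_log(text):
    """Split each matching log line into its named fields."""
    return [m.groupdict() for line in text.splitlines() if (m := LINE_RE.match(line))]


def reboot_summary(entries):
    """Collect the fields that describe why and when the system went down."""
    markers = {
        "Reason for system reboot:": "reboot_reason",
        "Health Monitor:": "health_monitor",
        "System went down:": "went_down",
    }
    summary = {}
    for entry in entries:
        for marker, key in markers.items():
            if marker in entry["message"]:
                summary[key] = entry["message"].split(marker, 1)[1].strip()
    return summary


if __name__ == "__main__":
    for key, value in reboot_summary(parse_log(LOG)).items():
        print(f"{key}: {value}")
```

Run against the five lines above, the summary pairs the "Self reset" reason with the "Restr Mem Access" Health Monitor entry and the 02/01/24 02:10:57 down time, which are the same details that appear in the saved crash record from show boot.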