Symptoms: all interfaces on one of the line cards went down and came back up a short time later. If satellites are connected to that card, their interfaces also go down.
Check the current state; we are interested in slot 0/2:
RP/0/RSP0/CPU0:PE1# admin show platform 0/2/CPU0
Node            Type                      State            Config State
-----------------------------------------------------------------------------
0/2/CPU0        A9K-2x100GE-TR            IOS XR RUN       PWR,NSHUT,MON
In the logs it looks like this:
RP/0/RSP0/CPU0:PE1# show log | i Card reset
LC/0/2/CPU0:Aug 16 16:20:15.398 MSK: pfm_node_lc[294]: %PLATFORM-PFM-0-CARD_RESET_REQ : Card reset requested by: Process ID: 192582 (fialc), Target node: 0/2/CPU0, CondID: 9048, Fault Reason: Egress Mux FPGA has encountered uncorrectable error. Reloading LC to recover
LC/0/2/CPU0:Aug 16 16:20:15.398 MSK: syslog_dev[90]: pfm_node_lc[294] PID-192587: Request Graceful Reboot via Sysmgr: Reason: Card reset requested by: Process ID: 192582 (fialc), Target node: 0/2/CPU0, CondID: 9048, Fault Reason: Egress Mux FPGA has encountered uncorrectable error. Reloading LC to recover
LC/0/2/CPU0:Aug 16 16:20:15.450 MSK: syslog_dev[90]: pfm_node_lc[294] PID-192587: reboot internal : cause code 671088667 cause Card reset requested by: Process ID: 192582 (fialc), Target node: 0/2/CPU0, CondID: 9048, Fault Reason: Egress Mux FPGA has encountered uncorrectable error. Reloading LC to recover
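When many such messages have to be checked, the fields of the CARD_RESET_REQ line can be extracted with a regular expression. The following is a minimal Python sketch, not an official tool: the pattern is derived only from the single log line shown above, and the function name `parse_card_reset` is hypothetical. Other IOS XR releases may word the message differently.

```python
import re

# Sample line copied from the log output above.
LINE = (
    "LC/0/2/CPU0:Aug 16 16:20:15.398 MSK: pfm_node_lc[294]: "
    "%PLATFORM-PFM-0-CARD_RESET_REQ : Card reset requested by: "
    "Process ID: 192582 (fialc), Target node: 0/2/CPU0, CondID: 9048, "
    "Fault Reason: Egress Mux FPGA has encountered uncorrectable error. "
    "Reloading LC to recover"
)

# Assumption: the message layout matches the sample exactly.
PATTERN = re.compile(
    r"%PLATFORM-PFM-0-CARD_RESET_REQ : Card reset requested by: "
    r"Process ID: (?P<pid>\d+) \((?P<process>[^)]+)\), "
    r"Target node: (?P<node>[^,]+), "
    r"CondID: (?P<cond_id>\d+), "
    r"Fault Reason: (?P<reason>.+)$"
)


def parse_card_reset(line):
    """Return the reset request fields as a dict, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["cond_id"] = int(fields["cond_id"])
    return fields


info = parse_card_reset(LINE)
print(info["node"], info["process"], info["cond_id"])
print(info["reason"])
```

The requesting process (`fialc`) and the fault reason are the two fields worth quoting in a support case.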
The card's uptime and the reason for the last reset can be checked with:
RP/0/RSP0/CPU0:PE1# admin show logging onboard uptime location 0/2/CPU0
-------------------------------------------------------------------------------
UPTIME CONTINUOUS INFORMATION (Node: node0_2_CPU0)
-------------------------------------------------------------------------------
Current reset reason : 0x05
Current uptime       : 0 years 0 weeks 0 days 0 hours 8 minutes
-------------------------------------------------------------------------------
Reset reason codes (taken from here); the 0x05 reported above corresponds to CPU_RESET_POR:
CPU_RESET_UNKNOWN = 1, (CBC was reset after CPU was reset. So, CBC doesn’t know)
CPU_RESET_OIR_POR = 2, (Board was plugged-in and CBC powered-on board by default)
CPU_RESET_SRESET = 3, (CBC received a CAN message to S-Reset CPU)
CPU_RESET_HRESET = 4, (CBC received a CAN message to H-Reset CPU)
CPU_RESET_POR = 5, (CBC received a CAN message to Power-Off or Power-Cycle CPU)
CPU_RESET_WDOG_SRESET = 6, (Watchdog expired and CBC S-Reset CPU so CPU can collect core-dump)
CPU_RESET_WDOG_HRESET = 7, (Watchdog expired and CBC H-Reset CPU)
CPU_RESET_WDOG_POR = 8, (Watchdog expired and CBC power-cycled board)
CPU_RESET_PSEQFAIL_POR = 9, (CBC power-cycled board following power-sequencer failure)
CPU_RESET_PWR_OFF = 10, (Board powered-off)
CPU_RESET_PLDREQ_SRESET = 11, (Lance / Mace S-Reset CPU)
CPU_RESET_PLDREQ_HRESET = 12, (Lance / Mace H-Reset CPU)
CPU_RESET_AUTO_RESET = 13, (CPU reset autonomously without informing CBC)
CPU_RESET_MCLR_PROLONGED_HOLD = 14, (CPU held in reset for several minutes, typically during PLD upgrade)
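To avoid looking the code up by hand, the list above can be turned into a small lookup table. This is a minimal Python sketch. It assumes that the "Current reset reason" field of `admin show logging onboard uptime` is an index into this list, which is how the post uses it. The function name `decode_reset_reason` is hypothetical.

```python
# Codes and names copied from the list above.
RESET_REASONS = {
    1: "CPU_RESET_UNKNOWN",
    2: "CPU_RESET_OIR_POR",
    3: "CPU_RESET_SRESET",
    4: "CPU_RESET_HRESET",
    5: "CPU_RESET_POR",
    6: "CPU_RESET_WDOG_SRESET",
    7: "CPU_RESET_WDOG_HRESET",
    8: "CPU_RESET_WDOG_POR",
    9: "CPU_RESET_PSEQFAIL_POR",
    10: "CPU_RESET_PWR_OFF",
    11: "CPU_RESET_PLDREQ_SRESET",
    12: "CPU_RESET_PLDREQ_HRESET",
    13: "CPU_RESET_AUTO_RESET",
    14: "CPU_RESET_MCLR_PROLONGED_HOLD",
}


def decode_reset_reason(value):
    """Map a hex string such as '0x05' to its reset reason name."""
    code = int(value, 16)  # base 16 accepts an optional '0x' prefix
    return RESET_REASONS.get(code, f"unrecognized code {code}")


print(decode_reset_reason("0x05"))
```

For the card in this example the result is CPU_RESET_POR: the CBC was told to power-off or power-cycle the CPU.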
If the logs have already been overwritten, check the card's reboot history:
RP/0/RSP0/CPU0:PE1# show reboot history location 0/2/CPU0
No  Time                      Cause Code  Reason
--------------------------------------------------------------------------------
01  Fri Aug 16 16:20:15 2019  0x2c00001b  Cause: Card reset requested by: Proces
                                          s ID: 192582 (fialc), Target node: 0/2
                                          /CPU0, CondID: 9048, Fa
02  Tue Jan 15 09:06:39 2019  0x2c00001b  Cause: Card reset requested by: Proces
                                          s ID: 528468 (prm_server_ty), Target n
                                          ode: 0/2/CPU0, CondID:
03  Mon Dec 10 22:23:03 2018  0x04000043  Cause: Reloading managed node
                                          Process: insthelper
04  Sat Jul 16 16:58:32 2016  0x2c00001b  Cause: pfm_dev_sm_perform_recovery_act
                                          ion, Card reset requested by: Process
                                          ID: 172100 (fialc), Fau
05  Tue Oct 13 22:10:17 2015  0x04000043  Cause: Reloading managed node
                                          Process: insthelper
06  Tue Oct 13 21:59:38 2015  0x04000043  Cause: Reloading managed node
                                          Process: insthelper
07  Wed Sep 10 21:23:43 2014  0x04000043  Cause: Reloading managed node
                                          Process: insthelper
08  Wed Jan  1 02:58:10 2014  0x2c00001b  Cause: pfm_dev_sm_perform_recovery_act
                                          ion, Card reset requested by: Process
                                          ID: 172100 (fialc), Fau
09  Wed Jan  1 02:58:10 2014  0x2c00001b  Cause: pfm_dev_sm_perform_recovery_act
                                          ion, Card reset requested by: Process
                                          ID: 172100 (fialc), Fau
10  Thu Nov 21 05:46:28 2013  0x0400004f  Cause: MBI-HELLO reloading node on rec
                                          eiving reload notification
                                          Process: mbi-hello
11  Tue Nov  5 21:02:23 2013  0x0400004f  Cause: MBI-HELLO reloading node on rec
                                          eiving reload notification
                                          Process: mbi-hello
12  Tue Nov  5 20:46:16 2013  0x0400004f  Cause: MBI-HELLO reloading node on rec
                                          eiving reload notification
                                          Process: mbi-hello
13  Tue Nov  5 20:18:10 2013  0x0400004f  Cause: MBI-HELLO reloading node on rec
                                          eiving reload notification
                                          Process: mbi-hello
14  Tue Jul  9 16:48:39 2013  0x2c000007  Cause: INIT: respawn 'instsetup' disab
                                          led, exit_code 139, INIT_MAX_SPAWN rea
                                          ched
                                          Process: init
15  Tue Jul  9 16:48:39 2013  0x2c000007  Cause: INIT: respawn 'instsetup' disab
                                          led, exit_code 139, INIT_MAX_SPAWN rea
                                          ched
                                          Process: init
The repeated "Cause Code" 0x2c00001b entries show that the card has been reset several times because of hardware problems, so an RMA can be opened.
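Counting how often each cause code appears makes the pattern easier to see on a long history. The following is a minimal Python sketch. It assumes the output has been saved as text in the column layout that `show reboot history` prints, with each entry starting with its number, timestamp and cause code. The sample contains only the first five entries (first line of each) from the output above, and `parse_reboot_history` is a hypothetical helper name.

```python
import re
from collections import Counter

# First line of entries 01-05 from the reboot history above.
SAMPLE = """\
01  Fri Aug 16 16:20:15 2019  0x2c00001b  Cause: Card reset requested by: Proces
02  Tue Jan 15 09:06:39 2019  0x2c00001b  Cause: Card reset requested by: Proces
03  Mon Dec 10 22:23:03 2018  0x04000043  Cause: Reloading managed node
04  Sat Jul 16 16:58:32 2016  0x2c00001b  Cause: pfm_dev_sm_perform_recovery_act
05  Tue Oct 13 22:10:17 2015  0x04000043  Cause: Reloading managed node
"""

# An entry line: number, ctime-style timestamp, hex cause code.
ENTRY = re.compile(
    r"^\s*\d+\s+"
    r"(?P<time>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<code>0x[0-9a-fA-F]+)",
    re.MULTILINE,
)


def parse_reboot_history(text):
    """Return a list of (timestamp, cause_code) tuples, newest first."""
    return [(m.group("time"), m.group("code")) for m in ENTRY.finditer(text)]


entries = parse_reboot_history(SAMPLE)
counts = Counter(code for _, code in entries)
for code, count in counts.most_common():
    print(code, count)
```

Wrapped continuation lines of the Reason column are skipped because they do not start with an entry number and timestamp.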
Core files:
RP/0/RSP0/CPU0:PE1# dir harddisk:/dumper

Directory of harddisk:/dumper

25243  -rwx   80813  Tue Jan 15 09:06:39 2019  LC2.190115-060639.crashinfo.by.pfm_node_lc
25244  -rwx  851968  Tue Jan 15 09:06:39 2019  LC2.190115-060639.pcds
25248  -rwx   83870  Fri Aug 16 16:20:15 2019  LC2.190816-132015.crashinfo.by.pfm_node_lc
25281  -rwx  851968  Fri Aug 16 16:20:15 2019  LC2.190816-132015.pcds

6576652288 bytes total (6552807424 bytes free)
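The dump file names appear to embed a YYMMDD-HHMMSS timestamp, which helps match a file to a reboot history entry. This reading is inferred from the listing above and is not taken from Cisco documentation. The minimal Python sketch below parses that stamp and compares it with the listed file time. The three-hour difference is consistent with the file name being in UTC and the listing in MSK, but that is an assumption. `dump_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def dump_timestamp(filename):
    """Parse the assumed YYMMDD-HHMMSS stamp in the second dot-separated field."""
    stamp = filename.split(".")[1]
    return datetime.strptime(stamp, "%y%m%d-%H%M%S")


name = "LC2.190816-132015.crashinfo.by.pfm_node_lc"
from_name = dump_timestamp(name)

# File time as shown by "dir harddisk:/dumper" (router local time, MSK).
listed = datetime(2019, 8, 16, 16, 20, 15)

print(from_name)
print(listed - from_name)
```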