Wired Intelligent Edge

  • 1.  8360 and removal of fans...

    Posted Sep 12, 2024 04:01 AM

    Hello,

    I just ran a "p**p-into-fan" test on our core stack (which was, by the way, successful) and noticed that if I remove two of the three fan modules from an 8360-32Y4C, the switch reboots itself... Is that the expected behavior when more than one fan module is removed?

    I think it should be enough anyway to run a single fan module at 100% RPM and use gaffer tape to cover the open slots until replacement fans arrive?!? The noise is terrible, but better a noisy running switch than no switch...



    ------------------------------
    Jori Luoto
    AV-IT Specialist
    ------------------------------


  • 2.  RE: 8360 and removal of fans...

    Posted Sep 12, 2024 06:14 PM

    Hi Jori, AFAIK an Aruba CX 8360-32Y4C runs with 3 Fan modules, FM1, FM2 and FM3 (three "red" JL714A if Port-to-Power airflow or three "blue" JL715A if Power-to-Port airflow) and those three Fan modules use the 2+1 redundancy logic for hot swapping. Documentation reports:

    "Three fans are required for normal operation. During normal operation fans operate at reduced speed. If a fan fails, the remaining fans will be boosted to 100% speed indefinitely. The fan module removal and replacement can be done without a tool. If a fan fails leave installed in switch so airflow is not disturbed by open slot."

    I'm fairly sure that a single Fan Module failure shouldn't cause any operational issue for the Switch, provided that a spare Fan Module is readily available and installed as soon as possible, since the two remaining Fan Modules run at 100% RPM to keep the Switch cool in the meantime.

    The same can't be said for two concurrent Fan Module failures (two out of three). In that case (your case) I believe the single remaining Fan Module can't sustain the cooling requirements of the entire Switch for long (minutes? hours?) or, at worst, not even right after the failure. I admit I have never tried it, so I don't know what happens when two Fan Modules are removed: does that force an immediate Switch reboot, or is there some sort of graceful delay (say, a few minutes, enough to fit replacement Fan Modules by hand) before the Switch reboots on its own? From what you wrote, the Switch reboots immediately after you remove the second Fan Module, doesn't it?




  • 3.  RE: 8360 and removal of fans...

    Posted Sep 13, 2024 08:32 AM

    Hello,

    Now that I see that text, I remember reading it in the documentation (my memory is sooo great :D)

    I think that in normal operation the fans run at well under 20%, and with one fan at 100% RPM the air pressure from the front is so high that the switch would need a very high load to overheat... But as you said: what are the odds that two fan modules break?

    The log entries from my testing (listed newest first, sorry...) say there should be a 3-minute grace period before shutdown, but I maintain that the ports were shut down quite quickly after I took out the second fan... (Anyway, this setup seems very fault tolerant: a 3 Gbps video stream had maybe a 500 ms blackout when it switched over to another member of the stack.)

    2024-09-12T09:54:07.535114+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/2 airflow is front-to-back.
    2024-09-12T09:54:06.548960+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/3 airflow is front-to-back.
    2024-09-12T09:54:05.568667+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 airflow is front-to-back.
    2024-09-12T09:54:05.536166+00:00 fin-017d-core2-15 fand[1047]: Event|207|LOG_INFO|AMM|1/1|Fan module PSU-1/1/1 was inserted.
    2024-09-12T09:54:05.533742+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 powered on.
    2024-09-12T09:54:04.602551+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to OK
    2024-09-12T09:53:51.261532+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 powered off.
    2024-09-12T09:53:50.213878+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/2 airflow is front-to-back.
    2024-09-12T09:53:49.564552+00:00 fin-017d-core2-15 powerd[1048]: Event|304|LOG_ERR|||PSU 1/1 faulted. Total fault count: 2
    2024-09-12T09:53:49.564450+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to Input Fault
    2024-09-12T09:53:49.227718+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/3 airflow is front-to-back.
    2024-09-12T09:53:48.246649+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 airflow is front-to-back.
    2024-09-12T09:53:48.229380+00:00 fin-017d-core2-15 fand[1047]: Event|207|LOG_INFO|AMM|1/1|Fan module PSU-1/2/1 was inserted.
    2024-09-12T09:53:48.225521+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/2 powered on.
    2024-09-12T09:53:47.944405+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/2 changed state to OK
    2024-09-12T09:53:38.047159+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/2 powered off.
    2024-09-12T09:53:37.970807+00:00 fin-017d-core2-15 hpe-restd[1504]: Event|4657|LOG_INFO|AMM|-|User admin logged out of REST session from 172.20.15.9
    2024-09-12T09:53:37.969640+00:00 fin-017d-core2-15 hpe-restd[1504]: Event|4608|LOG_INFO|AMM|-|Authorization allowed for user admin, for resource SessionMgmt, with action POST
    2024-09-12T09:53:37.313773+00:00 fin-017d-core2-15 powerd[1048]: Event|304|LOG_ERR|||PSU 1/2 faulted. Total fault count: 1
    2024-09-12T09:53:37.313671+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/2 changed state to Input Fault
    2024-09-12T09:53:29.883780+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/2 airflow is front-to-back.
    2024-09-12T09:53:28.897918+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/3 airflow is front-to-back.
    2024-09-12T09:53:27.916087+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 airflow is front-to-back.
    2024-09-12T09:53:27.882350+00:00 fin-017d-core2-15 fand[1047]: Event|207|LOG_INFO|AMM|1/1|Fan module PSU-1/1/1 was inserted.
    2024-09-12T09:53:27.879478+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 powered on.
    2024-09-12T09:53:27.007625+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to OK
    2024-09-12T09:53:16.611277+00:00 fin-017d-core2-15 fand[1047]: Event|205|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 was inserted.
    2024-09-12T09:53:16.040934+00:00 fin-017d-core2-15 powerd[1048]: Event|304|LOG_ERR|||PSU 1/1 faulted. Total fault count: 1
    2024-09-12T09:53:16.040801+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to Input Fault
    2024-09-12T09:52:58.254404+00:00 fin-017d-core2-15 fand[1047]: Event|204|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 was removed.
    2024-09-12T09:52:57.306870+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to Absent
    2024-09-12T09:52:56.166090+00:00 fin-017d-core2-15 fand[1047]: Event|219|LOG_INFO|AMM|1/1|Fan tray PSU-1/1 powered off.
    2024-09-12T09:52:55.150513+00:00 fin-017d-core2-15 powerd[1048]: Event|304|LOG_ERR|||PSU 1/1 faulted. Total fault count: 1
    2024-09-12T09:52:55.150409+00:00 fin-017d-core2-15 powerd[1048]: Event|301|LOG_INFO|||PSU 1/1 changed state to Input Fault
    2024-09-12T09:52:33.743596+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/2 airflow is front-to-back.
    2024-09-12T09:52:32.756741+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/3 airflow is front-to-back.
    2024-09-12T09:52:31.776125+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 airflow is front-to-back.
    2024-09-12T09:52:31.775853+00:00 fin-017d-core2-15 fand[1047]: Event|212|LOG_INFO|AMM|1/1|System shutdown timer is cancelled.
    2024-09-12T09:52:31.714026+00:00 fin-017d-core2-15 fand[1047]: Event|207|LOG_INFO|AMM|1/1|Fan module Tray-1/1/2 was inserted.
    2024-09-12T09:52:31.208311+00:00 fin-017d-core2-15 fand[1047]: Event|207|LOG_INFO|AMM|1/1|Fan module Tray-1/1/1 was inserted.
    2024-09-12T09:52:30.710817+00:00 fin-017d-core2-15 fand[1047]: Event|205|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 was inserted.
    2024-09-12T09:52:30.710675+00:00 fin-017d-core2-15 fand[1047]: Event|220|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 airflow is front-to-back.
    2024-09-12T09:52:27.693245+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 98 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:52:22.627651+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 103 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:52:17.562304+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 108 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:52:12.488185+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 113 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:52:07.423000+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 118 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:52:02.355211+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 123 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:57.289276+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 128 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:52.222617+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 133 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:47.156762+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 138 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:42.091592+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 143 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:37.026645+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 148 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:31.961693+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 154 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:26.897294+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 159 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:21.832151+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 164 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:15.851596+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 170 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:10.785277+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 175 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:05.718763+00:00 fin-017d-core2-15 fand[1047]: Event|218|LOG_INFO|AMM|1/1|Fan speed index for thermal zone 0 is at maximum.
    2024-09-12T09:51:05.718558+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 180 seconds because 1 missing system fan tray(s) exceeds limit of 0
    2024-09-12T09:51:05.661258+00:00 fin-017d-core2-15 fand[1047]: Event|204|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 was removed.
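
    Since the log is listed newest first, the countdown is easier to check after sorting it chronologically. Below is a minimal Python sketch, not an official tool: it assumes the line layout seen in the entries above, and the names `LINE_RE`, `parse` and `shutdown_countdown` are made up for this illustration. The sample lines are copied from the log; event IDs 211 and 212 are the countdown and cancel messages shown there.

```python
import re
from datetime import datetime

# Four lines copied verbatim from the log above (newest first, as posted).
SAMPLE = """\
2024-09-12T09:52:31.775853+00:00 fin-017d-core2-15 fand[1047]: Event|212|LOG_INFO|AMM|1/1|System shutdown timer is cancelled.
2024-09-12T09:52:27.693245+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 98 seconds because 1 missing system fan tray(s) exceeds limit of 0
2024-09-12T09:51:05.718558+00:00 fin-017d-core2-15 fand[1047]: Event|211|LOG_ALERT|AMM|1/1|Shutting down system in 180 seconds because 1 missing system fan tray(s) exceeds limit of 0
2024-09-12T09:51:05.661258+00:00 fin-017d-core2-15 fand[1047]: Event|204|LOG_INFO|AMM|1/1|Fan tray Tray-1/1 was removed.
"""

# Assumed layout: <timestamp> <host> <process>[pid]: Event|<id>|<severity>|<f1>|<f2>|<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ [\w-]+\[\d+\]: "
    r"Event\|(?P<event>\d+)\|\w+\|[^|]*\|[^|]*\|(?P<msg>.*)$"
)
COUNTDOWN_RE = re.compile(r"Shutting down system in (\d+) seconds")


def parse(text):
    """Return (timestamp, event_id, message) tuples in chronological order."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append((datetime.fromisoformat(m["ts"]), int(m["event"]), m["msg"]))
    return sorted(events)


def shutdown_countdown(events):
    """Summarise the first shutdown countdown found in the events, if any."""
    countdown = []
    for ts, event_id, msg in events:
        m = COUNTDOWN_RE.search(msg)
        if event_id == 211 and m:
            countdown.append((ts, int(m.group(1))))
    if not countdown:
        return None
    first_ts, initial = countdown[0]
    # First "timer is cancelled" event (212) at or after the countdown start.
    cancel_ts = next(
        (ts for ts, event_id, _ in events if event_id == 212 and ts >= first_ts),
        None,
    )
    return {
        "initial_seconds": initial,
        "last_reported_seconds": countdown[-1][1],
        "elapsed_before_cancel": (
            (cancel_ts - first_ts).total_seconds() if cancel_ts else None
        ),
    }


if __name__ == "__main__":
    print(shutdown_countdown(parse(SAMPLE)))
```

    On these sample lines the countdown starts at 180 seconds and is cancelled roughly 86 seconds later, which matches the last reported value of 98 seconds remaining. Feeding in the full log instead of `SAMPLE` should work the same way, as long as every line follows the assumed layout.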



    ------------------------------
    Jori Luoto
    AV-IT Specialist
    ------------------------------



  • 4.  RE: 8360 and removal of fans...

    Posted Sep 16, 2024 03:51 PM

    Yeah, I have tested this on the CX switches (6000/8000), and the way Aruba has programmed this is kinda dumb.  If more than 1 fan fails, the switch starts a shutdown timer, and it ends up in a perpetual loop of booting back up only to reboot again 180 seconds later.

    The "more than 1 fan" failure is important to note here.

    CX6300:
    2 fan trays - this is 4 fans in total.
    Pull out one fan tray and it will start the shutdown timer because "more than 1 fan" has failed (two fans in each tray).
    Unplug one fan from each fan tray so that each tray has only 1 working fan in it - the system will start a reboot timer since two fans have failed.

    I can power up a CX6300 with one fan tray and 1 power supply.  If one of the fan trays fails (two fans) it should just turn off one PSU and not reboot the switch.  The logic for the reboot seems really basic and not well thought out.

    CX8325 48port - 6 fans - Remove two of them and it will start the reboot timer.
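
    The rule described in this post can be written down as a small toy model. This is only a sketch of the behavior as observed by the poster, not Aruba's actual firmware logic; the limit of one failed fan and the function names are assumptions made for illustration.

```python
# Toy model of the rule described in the post above. This is NOT firmware
# logic: FAILED_FAN_LIMIT and both function names are assumptions.
FAILED_FAN_LIMIT = 1    # "more than 1 fan" failing starts the timer
SHUTDOWN_DELAY_S = 180  # delay quoted in the post and in the log countdown


def failed_fans(pulled_trays: int, fans_per_tray: int, dead_single_fans: int = 0) -> int:
    """Count failed fans: every fan in a pulled tray plus individually dead fans."""
    return pulled_trays * fans_per_tray + dead_single_fans


def shutdown_timer_starts(failed: int) -> bool:
    """True when the number of failed fans exceeds the assumed limit."""
    return failed > FAILED_FAN_LIMIT


if __name__ == "__main__":
    # CX6300 cases from the post: two trays with two fans each.
    print(shutdown_timer_starts(failed_fans(1, 2)))     # one tray pulled
    print(shutdown_timer_starts(failed_fans(0, 2, 2)))  # one fan unplugged per tray
    print(shutdown_timer_starts(failed_fans(0, 2, 1)))  # a single dead fan
```

    Under this model, pulling one two-fan tray and unplugging one fan in each tray both count as two failed fans, so both start the timer, while a single dead fan does not.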





  • 5.  RE: 8360 and removal of fans...

    Posted Sep 17, 2024 07:55 AM
    Edited by parnassus Sep 17, 2024 08:21 AM

    Hi, are we saying here that the Switch treats the failure of a single Fan inside a single Fan Tray (a Fan Tray is a pair of Fans working in push/pull mode) the same as the failure of an entire Fan Tray, thus triggering the shutdown timer? ...or are we saying that the Switch reacts to the failure of a single Fan Tray (no matter which Fan failed inside that particular Fan Tray) and so triggers the shutdown timer?

    In any case, that would be quite strange...if either of the above scenarios is true, then I would expect the shutdown timer to trigger on our Aruba 8320 even with the failure of just one of its ten Fans (the Switch works with five Fan Trays...so five Fan pairs...so ten Fans in total). If that is true, it would be very problematic IMHO, because the failure of one Fan Tray, of one Fan in a Fan Tray, or of two Fans at once (whether in the same Fan Tray or in two different Fan Trays) could be a fairly common event over the Switch's lifecycle.

    Maybe the failure of a single Fan makes its whole Fan Tray unusable (I really doubt it, since each Fan unit is RPM controlled and so precisely managed by the Switch Hardware/Software logic)...could it be?

    The Aruba 8320 documentation reports: "System can sustain 4 fans only for a short period of time during fan replacement. 5 fans should be used in normal operation. Each unit is a push/pull" so, given that "5 fans" are cited, I understand that a "fan" there is really a Fan Tray (so 2 Fans)...and so the Switch should be able to sustain operation (for a limited amount of time: 3 minutes?) with only 4 Fan Trays (8 Fans).

    Looking at the logs you initially posted above, it looks like you had two sets of failures (Fan Tray removal and PSU removal). Did you test the sequential removal of one Fan Tray (2 Fans) at a time, without touching the PSUs?




  • 6.  RE: 8360 and removal of fans...

    Posted Sep 18, 2024 05:41 AM
    Hello,

    I haven't tested breaking a single fan inside a fan module, but I assume that if a fan inside a module breaks, it renders the module non-functional (pure speculation though!!)

    That log contains a few different tests, and I think the PSU removal had no effect on the fan case. The "issue" came up when I took out the second fan module. As far as I understood the documentation, the system behaved almost as described. One thing I did not catch from the documentation was that the switch reboots immediately and after that keeps rebooting at 180-minute intervals until at least 50% of the fan modules are functional (the 6300 series has two modules and runs OK with one fan module)


    Yours sincerely

    Jori Luoto

    Technical Specialist
    Audico Systems Oy
    Olarinluoma 12
    FI-02200 Espoo
    T: +358 20 767 9 429 
    @: Jori.Luoto@audico.fi
    www.audicosystems.com








  • 7.  RE: 8360 and removal of fans...

    Posted Sep 18, 2024 09:05 AM

    Hello Jori,

    "I haven't tested breaking a single fan inside a fan module, but I assume that if a fan inside a module breaks, it renders the module non-functional (pure speculation though!!)"

    Yes, me too...pure speculation. We can't (well, we don't want to) run tests against our production VSX Clusters just to see what happens when a Fan Tray or a pair of Fan Trays is removed; OTOH we have no staging switches to safely play with, and I personally love to keep our switches as "up&running" as I can (apart from the normal reboots during scheduled updates/upgrades, they have suffered zero Hardware/Software issues over the last six years...that's a very good track record IMHO).

    "That log contains a few different tests, and I think the PSU removal had no effect on the fan case. The "issue" came up when I took out the second fan module. As far as I understood the documentation, the system behaved almost as described. One thing I did not catch from the documentation was that the switch reboots immediately and after that keeps rebooting at 180-minute intervals until at least 50% of the fan modules are functional (the 6300 series has two modules and runs OK with one fan module)"

    OK so the Fan Tray(s) various requirements reasonably adhere to Aruba known documentation (180 seconds - 3 minutes, not 180 minutes). The more Fan Trays a switch owns the better...




  • 8.  RE: 8360 and removal of fans...

    Posted Sep 12, 2024 06:35 PM

    Here we go: