AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

  • 1.  AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 23 days ago

    Hi,

    I want to automate the configuration of some 8360-32Y4C switches using the following approach:

    • based on ansible (using ansible.builtin.uri)
    • create the complete text configuration for each switch
    • replace the running configuration via the following api calls:
      1. upload & validate config: /rest/v10.13/fullconfigs/startup-config?from=tftp://xxx.xxx.xxx.xxx/config.txt
      2. replace running config: /rest/v10.13/fullconfigs/running-config?from=/rest/v10.13/fullconfigs/startup-config
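
    The two calls above can be sketched as a small URL builder. This is only an illustration: the `https` scheme, the helper name `fullconfig_urls` and the example host and TFTP address are assumptions, the HTTP method (PUT) is taken from a later reply in this thread, and authentication is left out entirely.

```python
from urllib.parse import quote


def fullconfig_urls(switch, tftp_url, api_version="v10.13"):
    """Return the two request URLs from the steps above, in order.

    Hypothetical helper: it only builds the URLs, it does not send anything.
    """
    base = f"https://{switch}/rest/{api_version}/fullconfigs"
    # Keep ':' and '/' unescaped so the query value looks like the calls above.
    source = quote(tftp_url, safe=":/")
    # Step 1: upload and validate the text config as startup-config.
    startup = f"{base}/startup-config?from={source}"
    # Step 2: replace the running config from the startup config.
    running = (
        f"{base}/running-config"
        f"?from=/rest/{api_version}/fullconfigs/startup-config"
    )
    return [startup, running]


if __name__ == "__main__":
    # Example values (assumptions, not taken from the real setup).
    for url in fullconfig_urls("switch.example.net", "tftp://192.0.2.10/config.txt"):
        print("PUT", url)
```

    In an Ansible play each URL would be one `ansible.builtin.uri` task, with the second task running only if the first one succeeded.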

    This works almost smoothly in the lab, but after moving the switches to the data center I hit a really weird issue: the first API call fails with an HTTP 422 error code:

    fatal: [switch -> localhost]: FAILED! => {"cache_control": "no-cache, no-store", "changed": true, "connection": "close", "content": "Error in config operation. unknown\n", "content_length": "35", ... "status": 422, ...

    The switch is a router with a dual-stack OSPF configuration, and it seems that the first API call triggers the issue if there are established OSPF sessions (we have AOS-CX, Cisco and BIRD neighbours). This is really weird, since replacing the startup config should never conflict with the running config or with any process state.

    Doing the same via cli also triggers an error:

    switch# copy tftp://xxx.xxx.xxx.xxx/config.txt startup-config vrf mgmt
    Copying configuration: [/]   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  6169  100  6169    0     0  1506k      0 --:--:-- --:--:-- --:--:-- 1506k
    100  6169  100  6169    0     0  1506k      0 --:--:-- --:--:-- --:--:-- 1506k
    Copying configuration: [Failure]
    remove copy operation to startup config failed.

    2024-06-27T13:20:50.985780+0000 hpe-config[87535] <INFO> Event|6801|LOG_INFO|AMM|-|Copying configs from: URL: tftp://xxx.xxx.xxx.xxx/config.txt, using vrf: mgmt to: startup-config

    Sadly, I was not able to find any further debugging details. The problem occurs at least on LL.10.13.1020 and LL.10.14.0001.

    I would be grateful for any hints as to what the problem might be.



  • 2.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 23 days ago

    After some more debugging I've found the trigger for this issue 😅 in the local syslog file /var/log/messages (access via start-shell):

    2024-06-27T14:06:16.359435+00:00 switch hpe-config-ckptpostcfg[3339]: 2024/06/27 14:06:16 Transaction Failed due to an error: constraint violation, details: Transaction causes multiple rows in "Route_Resolution" table to have identical values (ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and []) for index on columns "vrf", "origin", "address", and "port".  First row, with UUID b173fcc4-d383-4a41-a43b-f2d5ebcb424a, had the following index values before the transaction: ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and 30bde81c-c719-426e-be59-eca418912ebe.  Second row, with UUID 55af9777-6b5d-4f56-8c1a-49785c61b9a2, had the following index values before the transaction: ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and 1e27d3d0-ff6e-416a-b547-66229b5ebd19. in operation: commit, table: , transaction info: 

    So it seems there is an index violation in a routing-related table, triggered by the current BFD state! It only seems to be triggered if some routes are learned from OSPF neighbors with an established BFD session.
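
    To spot this kind of failure quickly in /var/log/messages, the constraint-violation line can be picked apart with a few regular expressions. This is a rough sketch that assumes the message wording stays exactly as in the log line above; the function name is made up, and `SAMPLE` is a shortened copy of that line.

```python
import re

# Shortened copy of the log message above (addresses masked as in the post).
SAMPLE = (
    'Transaction causes multiple rows in "Route_Resolution" table to have '
    'identical values (ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, '
    '"fe80::xxxx:xxff:fexx:xxxx", and []) for index on columns "vrf", '
    '"origin", "address", and "port".  First row, with UUID '
    'b173fcc4-d383-4a41-a43b-f2d5ebcb424a, had the following index values. '
    'Second row, with UUID 55af9777-6b5d-4f56-8c1a-49785c61b9a2, had the '
    'following index values.'
)


def parse_constraint_violation(line):
    """Extract table, index columns and row UUIDs, or None if no match."""
    table = re.search(r'multiple rows in "([^"]+)" table', line)
    if table is None:
        return None
    # The column list ends at the first full stop followed by whitespace.
    cols = re.search(r"for index on columns (.+?)\.\s", line)
    columns = re.findall(r'"([^"]+)"', cols.group(1)) if cols else []
    rows = re.findall(r"row, with UUID ([0-9a-f-]{36})", line)
    return {"table": table.group(1), "columns": columns, "rows": rows}


if __name__ == "__main__":
    print(parse_constraint_violation(SAMPLE))
```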




  • 3.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    Follow-up: when enabling BFD for OSPFv3, the switch logs that BFD echo is not supported on IPv6 sessions:

    2024-06-28T06:52:27.298857+0000 bfdd[1637] <WARN> Event|7317|LOG_WARN|AMM|1/1|BFD echo is not supported on IPv6 sessions

    You can still configure it and it seems to work... as long as you do not try to replace your running config. There seem to be some limitations on BFD and IPv6 in AOS-CX: VRRP only supports BFD for IPv4 😞




  • 4.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    Question regarding the found issue - are the commands/configurations you're applying valid if you were to just enter through CLI and not apply through REST/TFTP transfer?



    ------------------------------
    Ti Chiapuzio-Wong (they/them)
    HPE Aruba Networking
    ------------------------------



  • 5.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    It does not matter at all whether the CLI or the REST API is used; I can reproduce it both ways.

    The bug triggers when replacing the running-config with the startup-config (or any other checkpoint, I guess), but only if BFD is already active. I assume it requires more than one OSPFv3 adjacency to another router while ECMP is enabled. Our setup has two broadcast domains to which each router is attached, and the bug only triggered with OSPFv3 adjacencies on both simultaneously.




  • 6.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    Hi, are you creating JSON configs? I just generate CLI-based configs for ZTP and use the Ansible collection for Aruba CX to make changes.

    My observation is that CLI-based configs act as a merge, not a replace. The only way I have been able to replace configs is to use a backup of the config or a snapshot, both in JSON format.



    ------------------------------
    Arne Opdal
    ------------------------------



  • 7.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    Hi Arne,

    we always build the complete switch configuration from scratch, using Ansible as the single source of truth. When a CLI-based config is uploaded via `/rest/v10.13/fullconfigs/startup-config` (or copy REMOTE_URL startup-config on the CLI), the complete startup-config is replaced.

    The only caveat is that you need to pull the config from a remote URL and cannot upload it directly via the REST API. Thereafter the configuration can be activated with another PUT.

    The source parameter is a little bit weird (/rest/'version'/fullconfigs/startup-config); it looks like the switch pulls the startup-config as JSON via a local API call.

    We had one problem when defining multiple keyrings for OSPF: their order in the text configuration does not seem to be predictable. The configuration still worked, but comparing the configs always reported a required change. As a workaround we moved the keys into the interface's OSPF config.

    HTH,

    Thomas




  • 8.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    When you say you move the switches to the data center, is it physically or via config?



    ------------------------------
    Ti Chiapuzio-Wong (they/them)
    HPE Aruba Networking
    ------------------------------



  • 9.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 22 days ago

    Both ;) 

    The lab environment was much simpler, so the bug did not trigger. With the move to the data center we switched to a more complex PoC setup, which still allows us to debug this issue.




  • 10.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 11 days ago

    Doing some more debugging I think I found the trigger of the issue:

    router# show bfd | incl fe80::208:e3ff:feff:fc1c
    54      lag33.32  default                          fe80::d4e0:53c0:209c:3e40               fe80::208:e3ff:feff:fc1c                N/A      up           ospfv3      
    82      lag31.31  default                          fe80::d4e0:53c0:1f9c:3e40               fe80::208:e3ff:feff:fc1c                N/A      down         ospfv3      

    This neighbor is one of our Cisco switches. They use the same MAC address on all virtual interfaces and the same IPv6 link-local address on all of them (this cannot be changed). This seems to break the BFD feature for IPv6 OSPF on Aruba CX (both the copy command and BFD itself).
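
    A quick way to check other routers for the same situation is to group the `show bfd` lines by VRF and peer address and list every peer that shows up on more than one interface. This is a sketch only: the column order (session id, interface, VRF, source, destination, echo, state, application) is inferred from the two lines above, and the function name is invented.

```python
from collections import defaultdict

# The two lines from the output above.
SAMPLE = """\
54      lag33.32  default   fe80::d4e0:53c0:209c:3e40   fe80::208:e3ff:feff:fc1c   N/A   up     ospfv3
82      lag31.31  default   fe80::d4e0:53c0:1f9c:3e40   fe80::208:e3ff:feff:fc1c   N/A   down   ospfv3
"""


def shared_bfd_peers(show_bfd_output):
    """Map (vrf, peer address) to its sessions if seen on several interfaces."""
    peers = defaultdict(list)
    for line in show_bfd_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a session row (assumed layout).
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        _session, interface, vrf, _source, peer, _echo, state = fields[:7]
        peers[(vrf, peer)].append((interface, state))
    # Only peers reachable through more than one interface are interesting.
    return {key: value for key, value in peers.items() if len(value) > 1}


if __name__ == "__main__":
    for (vrf, peer), sessions in shared_bfd_peers(SAMPLE).items():
        print(vrf, peer, sessions)
```

    The grouping key mirrors the "vrf" and "address" columns of the index from the constraint violation, which is why a repeated link-local peer address is the interesting case.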




  • 11.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 3 days ago

    Hi @fiasko ! Thank you for putting in all that labor to find the root cause. Is this something you can log with a TAC case or with your SE? It seems to be an issue with the CX switch itself and not so much with the development/API side.



    ------------------------------
    Ti Chiapuzio-Wong (they/them)
    HPE Aruba Networking
    ------------------------------



  • 12.  RE: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)

    Posted 3 days ago

    Hi,

    I have already opened a TAC case and it is currently being processed by the ERT. I will post an update here as soon as there is a useful result.

    Regards,

    Thomas