The bug triggers when replacing the running-config with the startup-config (or, I guess, any other checkpoint), but only if bfd is already active. I assume it also requires more than one ospfv3 adjacency to the same router with ecmp enabled. Our setup has two broadcast domains to which every router is attached, and the bug only triggered with ospfv3 adjacencies on both simultaneously.
Original Message:
Sent: Jun 28, 2024 12:48 PM
From: Tiffany.Chiapuzio-Wong
Subject: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)
Question regarding the issue you found: are the commands/configurations you're applying valid if you enter them directly through the CLI instead of applying them through REST/TFTP transfer?
------------------------------
Ti Chiapuzio-Wong (they/them)
HPE Aruba Networking
Original Message:
Sent: Jun 28, 2024 02:58 AM
From: fiasko
Subject: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)
Follow-up: when enabling bfd for ospfv3 the switch logs that it is not supported:
2024-06-28T06:52:27.298857+0000 bfdd[1637] <WARN> Event|7317|LOG_WARN|AMM|1/1|BFD echo is not supported on IPv6 sessions
You can still configure it and it seems to work... as long as you do not try to replace your running config. There seem to be some limitations on bfd with ipv6 in AOSCX: vrrp only supports bfd for ipv4 😞
Original Message:
Sent: Jun 27, 2024 03:13 PM
From: fiasko
Subject: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)
After some more debugging I've found the trigger for this issue 😅 in the local syslog file /var/log/messages (access via start-shell):
2024-06-27T14:06:16.359435+00:00 switch hpe-config-ckptpostcfg[3339]: 2024/06/27 14:06:16 Transaction Failed due to an error: constraint violation, details: Transaction causes multiple rows in "Route_Resolution" table to have identical values (ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and []) for index on columns "vrf", "origin", "address", and "port". First row, with UUID b173fcc4-d383-4a41-a43b-f2d5ebcb424a, had the following index values before the transaction: ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and 30bde81c-c719-426e-be59-eca418912ebe. Second row, with UUID 55af9777-6b5d-4f56-8c1a-49785c61b9a2, had the following index values before the transaction: ad98438f-42a0-4fbf-bbb3-1a065ef4bf72, bfd, "fe80::xxxx:xxff:fexx:xxxx", and 1e27d3d0-ff6e-416a-b547-66229b5ebd19. in operation: commit, table: , transaction info:
So it seems there is an index violation in a routing-related table, triggered by the current bfd state! It seems to be triggered only if there are routes learned from ospf neighbors with an established bfd session.
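To illustrate how I read that message, here is a minimal Python sketch (not AOS-CX code). It takes the two rows from the log, which differ only in their "port" value before the transaction, and assumes the transaction sets "port" to [] on both of them. That assumption is inferred from the colliding values in the log, not confirmed by HPE, and the helper name find_index_collisions is made up for this example.

```python
from collections import defaultdict

# Index columns named in the log message.
INDEX_COLUMNS = ("vrf", "origin", "address", "port")


def find_index_collisions(rows):
    """Return sorted lists of row UUIDs that share identical index values."""
    groups = defaultdict(list)
    for uuid, row in rows.items():
        # Lists are not hashable, so an empty port set [] becomes ().
        key = tuple(
            tuple(row[col]) if isinstance(row[col], list) else row[col]
            for col in INDEX_COLUMNS
        )
        groups[key].append(uuid)
    return [sorted(uuids) for uuids in groups.values() if len(uuids) > 1]


VRF = "ad98438f-42a0-4fbf-bbb3-1a065ef4bf72"
ADDRESS = "fe80::xxxx:xxff:fexx:xxxx"  # masked exactly as in the log

# State before the transaction, copied from the log message.
before = {
    "b173fcc4-d383-4a41-a43b-f2d5ebcb424a": {
        "vrf": VRF, "origin": "bfd", "address": ADDRESS,
        "port": "30bde81c-c719-426e-be59-eca418912ebe",
    },
    "55af9777-6b5d-4f56-8c1a-49785c61b9a2": {
        "vrf": VRF, "origin": "bfd", "address": ADDRESS,
        "port": "1e27d3d0-ff6e-416a-b547-66229b5ebd19",
    },
}

# Assumption: the config replace clears the port reference on both rows.
after = {uuid: {**row, "port": []} for uuid, row in before.items()}

print(find_index_collisions(before))  # no collision while the ports differ
print(find_index_collisions(after))   # both rows now share one index key
```

This would also fit the observation that the same link-local neighbor address has to be reachable over two ports: with a single adjacency there is only one such row and nothing to collide with.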
Original Message:
Sent: Jun 27, 2024 09:44 AM
From: fiasko
Subject: AOSCX 8360 REST: declarative switch configuration fail triggered by ospf sessions (?)
Hi,
I want to automate the configuration of some 8360-32Y4C switches using the following approach:
- based on ansible (using ansible.builtin.uri)
- create the complete text configuration for each switch
- replace the running configuration via the following api calls:
  - upload & validate config: /rest/v10.13/fullconfigs/startup-config?from=tftp://xxx.xxx.xxx.xxx/config.txt
  - replace running config: /rest/v10.13/fullconfigs/running-config?from=/rest/v10.13/fullconfigs/startup-config
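For anyone who wants to reproduce the two calls outside of ansible, here is a rough Python sketch that only builds the two request URLs in the order listed above. The host name switch.example.net and the TFTP address 192.0.2.10 are placeholders, the function names are made up, and percent-encoding the from value is my assumption. The HTTP method, login and session handling are deliberately left out, so check the REST API reference of your firmware version for those.

```python
from urllib.parse import quote

API_VERSION = "v10.13"  # REST API version used in the calls above


def fullconfig_copy_url(switch, target, source):
    """Build https://<switch>/rest/<version>/fullconfigs/<target>?from=<source>."""
    base = f"https://{switch}/rest/{API_VERSION}/fullconfigs/{target}"
    # Assumption: the value of "from" is percent-encoded in full.
    return f"{base}?from={quote(source, safe='')}"


def replace_running_config_urls(switch, tftp_source):
    """Return the URLs of both steps, in the order they have to be called."""
    startup_path = f"/rest/{API_VERSION}/fullconfigs/startup-config"
    return [
        # Step 1: upload & validate the new config as startup-config.
        fullconfig_copy_url(switch, "startup-config", tftp_source),
        # Step 2: replace the running-config with that startup-config.
        fullconfig_copy_url(switch, "running-config", startup_path),
    ]


if __name__ == "__main__":
    # Placeholder host and TFTP server, not the real ones.
    for url in replace_running_config_urls(
        "switch.example.net", "tftp://192.0.2.10/config.txt"
    ):
        print(url)
```

Keeping the URL construction separate from the actual request makes it easy to log exactly what was sent when the first call fails.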
This works almost smoothly in the lab, but after moving the switches to the data center I hit a really weird issue: the first api call fails with an HTTP 422 error code:
fatal: [switch -> localhost]: FAILED! => {"cache_control": "no-cache, no-store", "changed": true, "connection": "close", "content": "Error in config operation. unknown\n", "content_length": "35", ... "status": 422, ...
The switch is a router with a dual-stack ospf configuration, and it seems that the first api call triggers the issue if there are established ospf sessions (we have aoscx, cisco and bird neighbours). This is really weird, since replacing the startup config should never conflict with the running config or with any process state.
Doing the same via cli also triggers an error:
switch# copy tftp://xxx.xxx.xxx.xxx/config.txt startup-config vrf mgmt
Copying configuration: [/] % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 6169 100 6169 0 0 1506k 0 --:--:-- --:--:-- --:--:-- 1506k
100 6169 100 6169 0 0 1506k 0 --:--:-- --:--:-- --:--:-- 1506k
Copying configuration: [Failure]
remove copy operation to startup config failed.
2024-06-27T13:20:50.985780+0000 hpe-config[87535] <INFO> Event|6801|LOG_INFO|AMM|-|Copying configs from: URL: tftp://xxx.xxx.xxx.xxx/config.txt, using vrf: mgmt to: startup-config
Sadly I was not able to find any more debugging details. The problem occurs at least on LL.10.13.1020 and LL.10.14.0001.
I am grateful for any hints as to what the problem might be.