ArubaOS and Controllers

Occasional Contributor II

Mesh: error has occurred at file meshd_cfg

Hello, we have an Aruba 6000 controller running software version 3.4.2.3.

We are using AP-60s to create a mesh between 36 buses and our garage. There are 4 Mesh Portals in the garage, and the Mesh Points are on the buses. When a bus with a Mesh Point comes within range, it connects to a Mesh Portal, and devices on the bus can communicate with our in-house servers, etc.

After a firmware update to 3.4.2.3 we had to pull all of the Mesh Points and re-provision them, which we expected because the software upgrade failed across the bridge. About 3 weeks later we started seeing Mesh Points drop off our AP list and show up as unprovisioned. We pulled all of those out again and re-provisioned them. Another 2-3 weeks passed without any problems, and then a bunch of APs dropped off again. This time we saw some meshd_cfg errors coming from the Mesh Portals, and we re-provisioned the Mesh Portals again.

Now the problem is that our log still shows errors from the Mesh Points, and I'm wondering whether this is what causes the Mesh Points to lose their configuration and become unprovisioned. Here is the log:

Jun 16 09:00:44 :326085: |AP MPoint.14@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:03:36 :399803: |AP MPoint.28@10.X.XX.XXX meshd| An internal system error has occurred at file meshd_cfg.c function meshd_set_parent line 2205 error meshd_set_parent: WARNING, setting new parent 00:0b:86:33:e8:40, but old non-NULL: 00:0b:86:33:ed:a0!.
Jun 16 09:07:16 :326085: |AP MPoint.29@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:10:41 :326085: |AP MPoint.2@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:24:32 :326085: |AP MPoint.35@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:24:39 :326085: |AP MPoint.24@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:35:18 :399803: |AP MPoint.24@10.X.XX.XXX meshd| An internal system error has occurred at file meshd_cfg.c function meshd_set_parent line 2205 error meshd_set_parent: WARNING, setting new parent 00:0b:86:3b:e9:20, but old non-NULL: 00:0b:86:33:ed:a0!.
Jun 16 09:43:03 :303022: |AP MPortal.3@10.X.XX.XXX nanny| Reboot Reason: Reboot caused by kernel page fault at virtual address 00000000, epc == 00000000, ra == 80126420
Jun 16 09:44:51 :326085: |AP MPoint.7@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:46:51 :326085: |AP MPortal.3@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:47:06 :326085: |AP MPoint.14@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:54:20 :326085: |AP MPoint.14@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled
Jun 16 09:56:57 :326085: |AP MPoint.33@10.X.XX.XXX sapd| AM: Setting collection of statistics to : enabled


The error we see most often is: An internal system error has occurred at file meshd_cfg.c function meshd_set_parent line 2205 error meshd_set_parent: WARNING, setting new parent 00:0b:86:3b:e9:20, but old non-NULL: 00:0b:86:33:ed:a0!.

This excerpt shows only a couple, but usually there is one every 2-3 minutes, for different Mesh Points as they come within Portal range.
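To see how often each Mesh Point hits this warning and which parents it flips between, the syslog lines can be filtered mechanically. Below is a minimal Python sketch, assuming the log has been exported as plain text lines in the same layout as the excerpt above; the regular expression is modeled on those lines only and is not an official ArubaOS format definition, and `parent_warnings`/`count_by_ap` are hypothetical helper names.

```python
import re
from collections import Counter

# Hypothetical pattern modeled on the meshd_set_parent lines in the log above.
# The exact ArubaOS syslog layout may vary between releases, so treat it as an
# assumption and adjust it against your own log output.
PARENT_WARNING = re.compile(
    r"\|AP (?P<ap>[^@|]+)@\S+ meshd\|.*meshd_set_parent: WARNING, "
    r"setting new parent (?P<new>[0-9a-f:]{17}), "
    r"but old non-NULL: (?P<old>[0-9a-f:]{17})"
)


def parent_warnings(lines):
    """Yield (ap_name, old_parent, new_parent) for each meshd_set_parent warning."""
    for line in lines:
        match = PARENT_WARNING.search(line)
        if match:
            yield match.group("ap"), match.group("old"), match.group("new")


def count_by_ap(lines):
    """Count meshd_set_parent warnings per AP name; other log lines are ignored."""
    return Counter(ap for ap, _old, _new in parent_warnings(lines))


if __name__ == "__main__":
    # Two lines taken from the log excerpt above (one warning, one sapd line).
    sample = [
        "Jun 16 09:03:36 :399803: |AP MPoint.28@10.X.XX.XXX meshd| An internal "
        "system error has occurred at file meshd_cfg.c function meshd_set_parent "
        "line 2205 error meshd_set_parent: WARNING, setting new parent "
        "00:0b:86:33:e8:40, but old non-NULL: 00:0b:86:33:ed:a0!.",
        "Jun 16 09:07:16 :326085: |AP MPoint.29@10.X.XX.XXX sapd| AM: Setting "
        "collection of statistics to : enabled",
    ]
    for ap, old, new in parent_warnings(sample):
        print(f"{ap}: parent {old} -> {new}")
    print(count_by_ap(sample))
```

Matching on the "meshd|" process tag first keeps unrelated sapd and nanny lines out of the count; a per-AP tally makes it easier to tell whether the warnings cluster on the same buses that later show up as unprovisioned.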

I understand that Mesh Points and Portals are meant to stay within range of each other, acting as a "bridge" across gaps that cannot be reached by wire. For the most part, this solution for connecting the buses to the network has worked well, even though the buses move in and out of Mesh Portal range.
Aruba Employee

Re: Mesh: error has occurred at file meshd_cfg

The WARNING flags a program inconsistency (setting a parent when the current parent is not NULL) that your use case may be exposing. It should not cause any problems, as the program proceeds to set the new parent.

The bigger concern is why your mesh points show as unprovisioned. There are two reasons this would happen: (1) the mesh point is provisioned with a group that does not exist on the controller, or (2) the mesh point is in recovery because it cannot connect using its provisioned cluster(s).

You can check which reason applies with 'show ap database unprovisioned'. For example, for an AP provisioned with a non-existent group:

(ad-sw-3200) (AP provisioning) #show ap database unprovisioned

AP Database
-----------
Name      Group    AP Type  IP Address    Status      Flags  Switch IP
----      -----    -------  ----------    ------      -----  ---------
ad-65-11  nogroup  65       10.3.129.178  Up 14m:23s  UGI    10.3.129.232

Flags: U = Unprovisioned; N = Duplicate name; G = No such group; L = Unlicensed
I = Inactive; H = Using 802.11n license
X = Maintenance Mode; P = PPPoE AP; B = Built-in AP
R = Remote AP; R- = Remote AP requires Auth; C = Cellular RAP; c = CERT-based RAP
M = Mesh node; Y = Mesh Recovery

Total APs:1

Note the 'UG' flags indicating "Unprovisioned" and "No such group". For the recovery case you would see 'UY'.
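Reading the Flags column against the legend can be done mechanically. Below is a minimal Python sketch, assuming the Flags field has been copied out of the CLI output as a plain string; the table mirrors the legend printed above, and `decode_flags`/`unprovisioned_reason` are hypothetical helpers that only encode the two causes described in this reply, not an ArubaOS feature.

```python
# Flag legend copied from the 'show ap database' output above.
FLAG_MEANINGS = {
    "U": "Unprovisioned",
    "N": "Duplicate name",
    "G": "No such group",
    "L": "Unlicensed",
    "I": "Inactive",
    "H": "Using 802.11n license",
    "X": "Maintenance Mode",
    "P": "PPPoE AP",
    "B": "Built-in AP",
    "R": "Remote AP",
    "R-": "Remote AP requires Auth",
    "C": "Cellular RAP",
    "c": "CERT-based RAP",
    "M": "Mesh node",
    "Y": "Mesh Recovery",
}


def decode_flags(flags: str) -> list[str]:
    """Split a Flags field such as 'UGI' into the meaning of each flag."""
    meanings = []
    i = 0
    while i < len(flags):
        # 'R-' is the only two-character flag in the legend.
        token = "R-" if flags.startswith("R-", i) else flags[i]
        meanings.append(FLAG_MEANINGS.get(token, f"unknown flag {token!r}"))
        i += len(token)
    return meanings


def unprovisioned_reason(flags: str) -> str:
    """Map the flags onto the two unprovisioned causes described in the reply."""
    if "U" not in flags:
        return "provisioned"
    if "G" in flags:
        return "provisioned group does not exist on the controller"
    if "Y" in flags:
        return "mesh recovery (cannot connect using provisioned clusters)"
    return "unprovisioned, cause not indicated by flags"


if __name__ == "__main__":
    for field in ("UGI", "UY", "U"):
        print(field, decode_flags(field), unprovisioned_reason(field))
```

A plain 'U' with neither 'G' nor 'Y' falls through to the last branch, which is exactly the case these two checks cannot explain on their own.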

As you have already reprovisioned the mesh points, it is likely that they are going into recovery; the CLI command above will show whether this is happening. In addition, 'show ap mesh active' would show a recovery cluster name. If you can ping the mesh points from the controller, you should be able to do any reconfiguration/reprovisioning from the controller without physically recovering the mesh points.

If the mesh points are going into recovery, you can check whether the provisioning of the mesh points and portals agrees with these commands:
'encrypt disable'
'show ap mesh debug provisioned-clusters '
'encrypt enable'

This should show where the mismatch is, and you can fix it by reprovisioning as appropriate. As you have already reprovisioned the mesh points, reprovisioning the portals may fix things.

It's not clear what went wrong during your upgrade. If you upgraded from a pre-3.4 release, an upgrade script migrates mesh data to new groups, e.g. from mesh_grp3.3 to mesh_grp3.3_MeSh (the new 3.4+ group gets a '_MeSh' suffix). Before any reprovisioning, it would be helpful to save the output of 'show ap mesh tech-support' for a portal and for one of the unprovisioned mesh points and pass it to Aruba technical support. If the solution suggested here does not cover your case, please open a support ticket.

Hope this helps,
Aidan
Occasional Contributor II

Re: Mesh: error has occurred at file meshd_cfg

Hi Aidan,

Thanks for the feedback. That is all very good information. I will have to wait for a week or two to see if the points drop off again.

Fernando
Contributor I

Re: Mesh: error has occurred at file meshd_cfg

Hi,

We came across a similar issue, and it was confirmed to be a bug in 3.4.x.
You can also check the meshd logs to see whether you get the following notification:

show ap mesh debug meshd-log ap-name

2010-06-11 21:48:03.693: 2 errors found in configuration file '/aruba/bin/mesh_psk.conf'

This is caused by a space in the mesh cluster name. The fix has been made available in 5.x, but a fix for 3.4 is still pending.
The symptoms included the mesh cluster moving to its recovery profile (where it showed up as unprovisioned).
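Since the trigger described here is a space in the cluster name, a quick pre-check of your cluster names is easy to script. Below is a minimal Python sketch, assuming you have collected the cluster names from your mesh cluster profiles into a list yourself; the names shown and the `names_with_spaces` helper are hypothetical and illustrative only.

```python
def names_with_spaces(cluster_names):
    """Return the cluster names that contain any whitespace character."""
    return [name for name in cluster_names if any(ch.isspace() for ch in name)]


if __name__ == "__main__":
    # Hypothetical cluster names for illustration; substitute the names from
    # your own mesh cluster profiles.
    for name in names_with_spaces(["garage mesh", "garage_mesh", "Bus-Yard"]):
        print(f"cluster name {name!r} contains a space")
```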
Guru Elite

Different Issue

That is a different issue. In your case, if there were a space in the Mesh Profile name AND you were using WPA for the mesh, it would not come up at all. The issues in this thread appear to be different.


Colin Joseph
Aruba Customer Engineering

Looking for an Answer? Search the Community Knowledge Base Here: Community Knowledge Base

Occasional Contributor II

Re: Mesh: error has occurred at file meshd_cfg

UPDATE: One of our Mesh Portals was spewing out these errors and filled up the log:

Jul 6 08:42:24 :399803: |AP MPortal.9@10.X.XX.XXX meshd| An internal system error has occurred at file meshd_cfg.c function meshd_sysctl_read_param line 533 error meshd_sysctl_read_param: Error opening /proc/sys/net/aruba_asap/mtu : Too many open files.

I rebooted Mesh Portal 9 and it works fine for now. However, about 8 Mesh Points have now gone into the Unprovisioned state. I ran the 'show ap database unprovisioned' command, and the unprovisioned mesh points are in the default group. They show only a 'U' in the Flags field, with no 'UY' or anything else.


AP Database
-----------
Name               Group    AP Type  IP Address   Status  Flags  Switch IP
----               -----    -------  ----------   ------  -----  ---------
00:0b:86:ca:0d:24  default  60       10.X.XX.XXX  Down    U      10.X.XX.XXX


We are planning to replace the Mesh Portal that was causing the errors just to be sure, but I'm not convinced the Mesh Portal is what's causing this problem. Any suggestions?
Guru Elite

Case

If you haven't opened a case, please open one so that support can collect ALL the information and get you some help.


Colin Joseph
Aruba Customer Engineering