SD-WAN

  • 1.  controller lose connection to switch

    Posted Jul 27, 2015 01:58 AM

    Hi

     

    I have a strange problem in which the controller loses its connection to the switch (a 2920-24G).

    This is a test environment with minimal load.

     

    I am using HP Protector 1.3.13.458, HP SDN Controller 2.5.15.1175, and our custom-built app.

     

    Can anyone provide a pointer on how to troubleshoot the problem?

     

    Thanks

     

    [2015-07-27 01:36:59.077] ERROR http-bio-8443-exec-508       hp.keystone                                                       Failed to validate token 2250accef6414d17b97de9f6e41be635 due to com.hp.api.auth.AuthenticationException: Validation error code 404
    [2015-07-27 01:36:59.092] WARN  of-io-74-thread-6            hp.of.ctl                                                         Datapath REVOKED: 10:00:40:a8:f0:ce:86:40, neg=V_1_3, ip=192.168.10.101
    [2015-07-27 01:36:59.092] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I SwitchListener event DATAPATH_REVOKED
    [2015-07-27 01:36:59.092] ERROR of-io-74-thread-6            hp.of.ctl                                                         Intercepted unexpected exception: java.lang.IllegalStateException: Main connection already established for dpid 10:00:40:a8:f0:ce:86:40
      com.hp.of.ctl.impl.OpenflowController.panic(OpenflowController.java:1151)
      com.hp.of.ctl.impl.OpenflowController.newMainConnectionReady(OpenflowController.java:859)
      com.hp.of.ctl.impl.OpenflowController.handshakeComplete(OpenflowController.java:848)
      com.hp.of.ctl.impl.OpenflowMessageBuffer.handshakeComplete(OpenflowMessageBuffer.java:234)
      com.hp.of.ctl.impl.OpenflowConnection.inBoundFeaturesReply(OpenflowConnection.java:256)
      ...
    [2015-07-27 01:37:00.080] ERROR http-bio-8443-exec-527       hp.keystone                                                       Failed to validate token 2250accef6414d17b97de9f6e41be635 due to com.hp.api.auth.AuthenticationException: Validation error code 404

    [2015-07-27 01:37:08.490] WARN  of-idle-timer                hp.of.ctl                                                         Closing unresponsive connection from 192.168.10.101
    [2015-07-27 01:37:08.492] INFO  of-idle-timer                hp.of.ctl                                                         Datapath removed: 10:00:40:a8:f0:ce:86:40, neg=V_1_3, ip=192.168.10.101
    [2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Device Owner event type: OWNERSHIP_LOST. Datapath ID: 10:00:40:a8:f0:ce:86:40
    [2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 New owner: null
    [2015-07-27 01:37:08.492] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I SwitchListener event DATAPATH_DISCONNECTED
    [2015-07-27 01:37:08.492] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I Remove switch 10:00:40:a8:f0:ce:86:40 Clear client lists
    [2015-07-27 01:37:08.493] INFO  DpQPool-9-thread-6           com.mimos.nc.impl.NetworkAccessControlSwitchManager              DE0005I --[NetworkAccessControlSwitchManager removeSwitch] Removed switch 10:00:40:a8:f0:ce:86:40
    [2015-07-27 01:37:08.493] INFO  MsgDispatcher-5932-thread-1  com.hp.magellan.devicethrottling.ThrottlerServiceImpl            DE0005I Stopping device counter for dpid 10:00:40:a8:f0:ce:86:40.
    [2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Previous owner: null
    [2015-07-27 01:37:08.493] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Is the controller master for the dpid 10:00:40:a8:f0:ce:86:40? true
    [2015-07-27 01:37:08.493] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Stopping device actor for dpid 10:00:40:a8:f0:ce:86:40.
    [2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-24 com.hp.magellan.ha.actor.ControllerActor                          Received device message: {dpid: 10:00:40:a8:f0:ce:86:40, request: STOP}
    [2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-24 com.hp.magellan.ha.actor.ControllerActor                          Stoping device actor for dpid: 10:00:40:a8:f0:ce:86:40
    [2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-23 com.hp.magellan.bandwidthmonitor.actor.BandwidthControllerActor   Stoping device bandwidth actor for dpid: 10:00:40:a8:f0:ce:86:40
    [2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-23 System.out                                                        [ERROR] [07/27/2015 01:37:08.494] [mysystem-akka.actor.default-dispatcher-23] [akka://mysystem/user/bandwidthcontrolleractor/bandwidthactor10:00:40:a8:f0:ce:86:40] Kill (akka.actor.ActorKilledException)
    [2015-07-27 01:37:08.495] INFO  .actor.default-dispatcher-23 System.out                                                        [ERROR] [07/27/2015 01:37:08.494] [mysystem-akka.actor.default-dispatcher-24] [akka://mysystem/user/controlleractor/deviceactor10:00:40:a8:f0:ce:86:40] Kill (akka.actor.ActorKilledException)
    [2015-07-27 01:37:08.550] INFO  sdn-topo-12-thread-1         hp.sdn.net.topo                                                  DE0005I Received new topology: DefaultTopology{supplierId=com.hp.sdn.topo.compute, activeAt=2015-07-27T05:37:08.550Z, deviceCount=0, clusterCount=0, linkCount=0, data=DefaultTopologyData{ts=418637434599668, computeTime=28us}} due to: DEVICE_AVAILABILITY_CHANGED:1
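
    When troubleshooting a log like the one above, it can help to look at one logger at a time, for example only the hp.of.ctl lines. The following is a minimal sketch, assuming the line layout seen in this log (bracketed timestamp, level, thread, logger, message). The class and method names are hypothetical and not part of the HP SDN controller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for filtering controller log lines by logger name. */
final class LogLineFilter {
    // Assumed layout: [timestamp] LEVEL thread logger message
    private static final Pattern LINE =
        Pattern.compile("^\\[(.+?)\\]\\s+(\\w+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)$");

    private LogLineFilter() {}

    /** Returns the logger column, or null if the line does not match the layout. */
    static String loggerOf(String line) {
        Matcher m = LINE.matcher(line.trim());
        return m.matches() ? m.group(4) : null;
    }

    /** True for lines written by the OpenFlow controller logger (hp.of.ctl). */
    static boolean isOpenflowEvent(String line) {
        return "hp.of.ctl".equals(loggerOf(line));
    }
}
```

    Stack-trace lines do not start with a bracketed timestamp, so loggerOf returns null for them and they are filtered out unless handled separately.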



  • 2.  RE: controller lose connection to switch

    Posted Jul 28, 2015 09:34 AM

    sllow,

     

    I am not sure whether you have seen this article, but the two issues seem to be related:

     

    http://h30499.www3.hp.com/t5/SDN-Discussions/Keystone-Validation-Error-Code-404/td-p/6646356#.VbeDafnOd1A

     

    If you still have issues then let us know and we will try to help.

     

    Best Regards,

     

    Carlos

    CoE SDN Team



  • 3.  RE: controller lose connection to switch

    Posted Jul 28, 2015 11:20 PM

    Thanks

    In my case, I realized that this error

    [2015-07-28 23:10:28.008] ERROR http-bio-8443-exec-1005      hp.keystone                                                       Failed to validate token 0937a63c15484b1abf2152f3cabe1ee1 due to com.hp.api.auth.AuthenticationException: Validation error code 404

    happens when you log in to, say, https://172.16.4.8:8443/sdn/ui/ and then leave the page open and idle instead of closing it. Once the page shows "expired", the controller logs the error above; closing the page in your web browser makes the error go away.


    Anyway, my actual problem is more closely related to this:

    [2015-07-27 01:36:59.092] ERROR of-io-74-thread-6            hp.of.ctl                                                         Intercepted unexpected exception: java.lang.IllegalStateException: Main connection already established for dpid 10:00:40:a8:f0:ce:86:40
      com.hp.of.ctl.impl.OpenflowController.panic(OpenflowController.java:1151)
      com.hp.of.ctl.impl.OpenflowController.newMainConnectionReady(OpenflowController.java:859)
      com.hp.of.ctl.impl.OpenflowController.handshakeComplete(OpenflowController.java:848)
      com.hp.of.ctl.impl.OpenflowMessageBuffer.handshakeComplete(OpenflowMessageBuffer.java:234)
      com.hp.of.ctl.impl.OpenflowConnection.inBoundFeaturesReply(OpenflowConnection.java:256)
     

     

    After some investigation, it seems to be related to a random deadlock that occurs in our application's event-processing thread.

    Whenever the deadlock occurs, it blocks the event-processing thread and triggers the problem above. I suspect it causes the controller to get stuck and then panic; once it panics, it re-creates the connection to the switch.

     

    Once the deadlock was fixed, the problem went away.
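
    For anyone chasing a similar deadlock in an application's event-processing thread, the JVM can report lock cycles directly. The following is a minimal sketch using the standard java.lang.management API. The class name DeadlockProbe is hypothetical, and it is an assumption that the probe runs inside the same JVM as the application (for example from a periodic task).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the HP SDN controller API. */
final class DeadlockProbe {
    private DeadlockProbe() {}

    /** Returns one description per thread caught in a lock cycle; empty if none. */
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Checks object monitors and java.util.concurrent ownable synchronizers.
        long[] ids = mx.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report; // null means no deadlock was detected
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }
}
```

    Logging the result of DeadlockProbe.findDeadlocks() every few seconds, or taking a thread dump with jstack when the switch disconnects, should show which threads and locks are involved.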

     

    Thanks for your help !!

     



  • 4.  RE: controller lose connection to switch

    Posted Jul 31, 2015 09:55 AM

    Sllow,

     

    Could you provide us with more information on what application is causing the controller to panic?

     

    Best Regards,

     

    Carlos