TransformationSampleJapiSpec failed
on 2012-12-20 10:38 *
By bjorn.antonsson@typesafe.com
Assigned to set to bjorn.antonsson@typesafe.com
Status changed from New to Accepted
So from looking at the logs, the whole system went out for a coffee break for about 10 seconds in the middle of the test, and we got back a negative time to wait for a message. Writing this off as the system misbehaving.
Will close as invalid.
I wish that was true, but it failed again (not sure if it's the same thing).
https://jenkins.akka.io:8498/job/akka-local-jdk7/112/consoleFull
on 2012-12-21 07:09 *
By bjorn.antonsson@typesafe.com
Oh yes, it looks very much like the same thing. My previous comment doesn't hold; I misread some of the test code. I've been running that test with debug logging the whole night (60 times) and not a single failure. :(
I just noticed that they both happened on a2 (might be a coincidence), but I've changed my job to only run on that machine. Let's see what happens.
on 2012-12-21 13:05 *
By bjorn.antonsson@typesafe.com
So we have a failure with debug output:
https://jenkins.akka.io:8498/job/repeat-akka-local-jdk7/99/consoleFull
What happens is that JVM-2, which is the second frontend (the one that times out), only receives its first BackendRegistration message after 15 seconds.
[JVM-2] [DEBUG] [12/21/2012 11:31:30.285] [main] [ActorSystem(TransformationSampleJapiSpec)] passed barrier all-started
<Lots of chatter here>
[JVM-3] [INFO] [12/21/2012 11:31:41.178] [TransformationSampleJapiSpec-akka.actor.default-dispatcher-4] [akka://TransformationSampleJapiSpec/system/cluster/core] Cluster Node [tcp.akka://TransformationSampleJapiSpec@localhost:34720] - Leader is moving node [tcp.akka://TransformationSampleJapiSpec@localhost:50062] from JOINING to UP
<Some more chatter here>
[JVM-2] [DEBUG] [12/21/2012 11:31:45.170] [TransformationSampleJapiSpec-akka.actor.default-dispatcher-3] [akka://TransformationSampleJapiSpec/system/endpointManager/endpointWriter-tcp.akka%3A%2F%2FTransformationSampleJapiSpec%40localhost%3A40530-2] received local message RemoteMessage: [BackendRegistration] to [Actor[akka://TransformationSampleJapiSpec/user/frontend]]<+[akka://TransformationSampleJapiSpec/user/frontend] from [Actor[tcp.akka://TransformationSampleJapiSpec@localhost:40530/user/backend]]
I think that it's simply a matter of joining taking a while, of the gossip randomness not hitting the right nodes, and of "convergence" on the backend nodes being too slow, hence the MemberUp event not being published on any of the backends in a timely manner.
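For reference, the delays can be read off the three timestamps in the log lines quoted above. A minimal sketch in Python, assuming the `MM/DD/YYYY HH:MM:SS.mmm` timestamp format seen in those lines; the helper name `log_time` and the variable names are made up for illustration and are not part of the test or of Akka:

```python
from datetime import datetime

# Timestamp format as it appears in the quoted log lines (assumed from those lines).
LOG_TIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def log_time(stamp: str) -> datetime:
    """Parse a log timestamp such as '12/21/2012 11:31:30.285' (hypothetical helper)."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Timestamps copied from the three log lines above.
barrier_passed = log_time("12/21/2012 11:31:30.285")      # JVM-2: passed barrier all-started
moved_to_up = log_time("12/21/2012 11:31:41.178")         # JVM-3: leader moves a node from JOINING to UP
first_registration = log_time("12/21/2012 11:31:45.170")  # JVM-2: first BackendRegistration received

until_up = (moved_to_up - barrier_passed).total_seconds()
until_registration = (first_registration - barrier_passed).total_seconds()

print(f"barrier -> leader moves node to UP:   {until_up:.3f} s")
print(f"barrier -> first BackendRegistration: {until_registration:.3f} s")
```

That works out to roughly 10.9 seconds from the barrier until the leader moves a node to UP, and roughly 14.9 seconds until the first BackendRegistration reaches JVM-2, which matches the 15 seconds mentioned above.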
on 2012-12-21 13:08 *
By Patrik Nordwall
Yes, sounds reasonable; publishing on convergence takes longer than before.