RemoteDeathWatchSpec failure on jenkins.typesafe.com
This failed on jenkins.typesafe.com
https://jenkins.typesafe.com/job/akka/1461
I'm not certain that there isn't a race in the test code. How do we know that we have killed the actor, since PoisonPill is a normal message and Watch is a system message?
Maybe we have been testing the wrong thing for a while?
https://jenkins.typesafe.com/job/akka/1461
[info] RemoteDeathWatchSpec:
[info] The Death Watch
[info] - must notify with one Terminated message when an Actor is stopped (105 milliseconds)
[info] - notify with one Terminated message when an Actor is already dead *** FAILED *** (25 seconds, 6 milliseconds)
[info] java.lang.AssertionError: assertion failed: timeout (25 seconds) during expectMsg: Actor[akka://RemoteDeathWatchSpec/user/$b]: Stopped or Already terminated when linking
[info] at scala.Predef$.assert(Predef.scala:173)
[info] at akka.testkit.TestKitBase$class.expectMsgPF(TestKit.scala:313)
[info] at akka.testkit.TestKit.expectMsgPF(TestKit.scala:643)
[info] at akka.actor.DeathWatchSpec$$anonfun$2.akka$actor$DeathWatchSpec$class$$anonfun$$expectTerminationOf$1(DeathWatchSpec.scala:42)
[info] at akka.actor.DeathWatchSpec$$anonfun$2$$anonfun$apply$mcV$sp$2.apply$mcV$sp(DeathWatchSpec.scala:62)
[info] at akka.actor.DeathWatchSpec$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(DeathWatchSpec.scala:56)
[info] at akka.actor.DeathWatchSpec$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(DeathWatchSpec.scala:56)
[info] at org.scalatest.WordSpec$$anon$2.apply(WordSpec.scala:2179)
[info] at org.scalatest.Suite$class.withFixture(Suite.scala:1974)
[info] at akka.testkit.AkkaSpec.withFixture(AkkaSpec.scala:54)
[info] at org.scalatest.WordSpec$class.invokeWithFixture$1(WordSpec.scala:2176)
[info] at org.scalatest.WordSpec$$anonfun$runTest$1.apply(WordSpec.scala:2185)
[info] at org.scalatest.WordSpec$$anonfun$runTest$1.apply(WordSpec.scala:2185)
[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:198)
[info] at org.scalatest.WordSpec$class.runTest(WordSpec.scala:2185)
[info] at akka.testkit.AkkaSpec.runTest(AkkaSpec.scala:54)
[info] at org.scalatest.WordSpec$$anonfun$runTests$1.apply(WordSpec.scala:2250)
[info] at org.scalatest.WordSpec$$anonfun$runTests$1.apply(WordSpec.scala:2250)
[info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:260)
[info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
[info] at scala.collection.immutable.List.foreach(List.scala:309)
[info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
[info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:265)
[info] at org.scalatest.SuperEngine$$anonfun$org$scalatest$SuperEngine$$runTestsInBranch$1.apply(Engine.scala:249)
[info] at scala.collection.immutable.List.foreach(List.scala:309)
[info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:249)
[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:326)
[info] at org.scalatest.WordSpec$class.runTests(WordSpec.scala:2250)
[info] at akka.testkit.AkkaSpec.runTests(AkkaSpec.scala:54)
[info] at org.scalatest.Suite$class.run(Suite.scala:2303)
[info] at akka.testkit.AkkaSpec.org$scalatest$WordSpec$$super$run(AkkaSpec.scala:54)
[info] at org.scalatest.WordSpec$$anonfun$run$1.apply(WordSpec.scala:2297)
[info] at org.scalatest.WordSpec$$anonfun$run$1.apply(WordSpec.scala:2297)
[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:362)
[info] at org.scalatest.WordSpec$class.run(WordSpec.scala:2297)
[info] at akka.testkit.AkkaSpec.org$scalatest$BeforeAndAfterAll$$super$run(AkkaSpec.scala:54)
[info] at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:213)
[info] at akka.testkit.AkkaSpec.run(AkkaSpec.scala:54)
[info] at org.scalatest.tools.ScalaTestFramework$ScalaTestRunner.run(ScalaTestFramework.scala:180)
[info] at org.scalatools.testing.Runner2.run(Runner2.java:16)
[info] at sbt.TestRunner.delegateRun(TestFramework.scala:57)
[info] at sbt.TestRunner.run(TestFramework.scala:51)
[info] at sbt.TestRunner.runTest$1(TestFramework.scala:71)
[info] at sbt.TestRunner.run(TestFramework.scala:80)
[info] at sbt.TestFramework$$anonfun$6$$anonfun$apply$8$$anonfun$7$$anonfun$apply$9.apply(TestFramework.scala:178)
[info] at sbt.TestFramework$$anonfun$6$$anonfun$apply$8$$anonfun$7$$anonfun$apply$9.apply(TestFramework.scala:178)
[info] at sbt.TestFramework$.sbt$TestFramework$$withContextLoader(TestFramework.scala:190)
[info] at sbt.TestFramework$$anonfun$6$$anonfun$apply$8$$anonfun$7.apply(TestFramework.scala:178)
[info] at sbt.TestFramework$$anonfun$6$$anonfun$apply$8$$anonfun$7.apply(TestFramework.scala:178)
[info] at sbt.Tests$$anonfun$makeSerial$1$$anonfun$apply$8.apply(Tests.scala:121)
[info] at sbt.Tests$$anonfun$makeSerial$1$$anonfun$apply$8.apply(Tests.scala:121)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233)
[info] at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
[info] at scala.collection.immutable.List.foreach(List.scala:76)
[info] at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
[info] at scala.collection.immutable.List.map(List.scala:76)
[info] at sbt.Tests$$anonfun$makeSerial$1.apply(Tests.scala:121)
[info] at sbt.Tests$$anonfun$makeSerial$1.apply(Tests.scala:121)
[info] at sbt.std.Transform$$anon$3$$anonfun$apply$2.apply(System.scala:47)
[info] at sbt.std.Transform$$anon$3$$anonfun$apply$2.apply(System.scala:47)
[info] at sbt.std.Transform$$anon$5.work(System.scala:71)
[info] at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
[info] at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:232)
[info] at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
[info] at sbt.Execute.work(Execute.scala:238)
[info] at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
[info] at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:232)
[info] at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
[info] at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
[info] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[info] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
[info] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[info] at java.lang.Thread.run(Thread.java:662)
I'm not certain that there isn't a race in the test code. How do we know that we have killed the actor, since PoisonPill is a normal message and Watch is a system message?
Maybe we have been testing the wrong thing for a while?
Leave a comment
on 2012-10-01 12:44 *
By bjorn.antonsson@typesafe.com
Assigned to set to bjorn.antonsson@typesafe.com
on 2012-10-02 06:32 *
By Patrik Nordwall
I agree that it could be so that the actor is not stopped when adding the watch, but in that case the test should become the same as the one above, i.e. "notify with one Terminated message when an Actor is stopped"
on 2012-10-02 13:14 *
By bjorn.antonsson@typesafe.com
Yes, if the actor is properly and fully watched before the PoisonPill is processed it will be the same as the test before it.. Maybe there is a small race there that makes us loose the message.
Can't find anything inherently wrong with the test. Either it tests watching after termination or watching before termination.
So after wading through the watch, dispatch and system message mailbox code and inserting delays and and message reordering trying to provoke things I still can neither reproduce the bug or find a hole in the watch/terminate code.
Will close this for now.
So after wading through the watch, dispatch and system message mailbox code and inserting delays and and message reordering trying to provoke things I still can neither reproduce the bug or find a hole in the watch/terminate code.
Will close this for now.
on 2012-10-08 07:50 *
By Patrik Nordwall
Assigned to changed from bjorn.antonsson@typesafe.com to -none-
Status changed from Invalid to New
Failed again: https://jenkins.typesafe.com/job/akka/1487/consoleText
on 2012-10-08 23:13 *
By Patrik Nordwall
Assigned to set to Patrik Nordwall
Status changed from New to Accepted
on 2012-10-09 05:23 *
By bjorn.antonsson@typesafe.com
So after a great team effort we have now found the problem.
These are the steps:
These are the steps:
- Actor A is constructed by actorOf and its name is reserved in the children container, and the actor instance is created but we haven't yet run intiChild, that puts the real actor into the children container.
- In Actor As constructor we send Watch to Actor B on a remote system, but Actor B is dead so we get a terminated sent to Actor A
- The remoting tries to look up Actor A but finds a reserved name in the children container and returns Nobody which gives us an EmptyLocalActorRef
- The message is dropped on the floor. Hard!
on 2012-10-09 05:28 *
By Patrik Nordwall
Assigned to changed from Patrik Nordwall to bjorn.antonsson@typesafe.com
on 2012-10-09 18:34 *
By viktorklang
Looking forward to the PR!