Reliable remote supervision and death watch
Supervision and detection of termination of remote children need a watertight solution. Currently we have the cluster add-on via RemoteDeploymentWatcher in ClusterActorRefProvider, which uses watch and generates ChildTerminated messages.
This is not perfect, since ChildTerminated is not idempotent and there is a potential race when creating a new child with the same name.
Explore the possibility to fold RemoteDeploymentWatcher functionality into the ActorCell so that the parent takes care of AddressTerminated itself.
Remember that Terminated can also be generated via deadLetters from connection failures when sending Watch.
A complete re-design together with reliable system messages might be considered. See #1478
Also, remove the note in cluster documentation: ".. note:: Creating a remote deployed child actor with the same name as the terminated
actor is not fully supported..."
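To illustrate the race described above, here is a minimal, language-neutral sketch in Python. It is not Akka code and not a proposed implementation. It only models a parent that receives a termination notification twice (for example once via AddressTerminated and once via deadLetters) after a child with the same name has been re-created. The `uid` field and the `Parent` class are hypothetical names introduced for this example; the assumption is that tagging each incarnation with a unique id would let the parent ignore stale or duplicate notifications.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class ChildTerminated:
    """Model of a termination notification (hypothetical shape)."""
    name: str
    uid: int  # assumed per-incarnation id; not taken from the ticket


@dataclass
class Parent:
    """Tracks the live incarnation of each child by name."""
    children: dict = field(default_factory=dict)  # name -> uid
    _uids: count = field(default_factory=lambda: count(1))

    def spawn(self, name: str) -> int:
        if name in self.children:
            raise ValueError(f"child name {name!r} is still in use")
        uid = next(self._uids)
        self.children[name] = uid
        return uid

    def on_child_terminated(self, msg: ChildTerminated) -> bool:
        # Idempotent handling: act only if the notification refers to
        # the incarnation that is currently registered under that name.
        if self.children.get(msg.name) != msg.uid:
            return False  # duplicate or stale notification, ignored
        del self.children[msg.name]
        return True


parent = Parent()
old = parent.spawn("worker")
parent.on_child_terminated(ChildTerminated("worker", old))  # first notification
new = parent.spawn("worker")                                # same name re-used
parent.on_child_terminated(ChildTerminated("worker", old))  # late duplicate
assert parent.children == {"worker": new}                   # new child survives
```

If the notification carried only the name, the late duplicate in the last step would remove the newly created child, which is the failure mode this ticket is about.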
on 2013-01-31 10:34 *
By Patrik Nordwall
Description changed from Supervision and detection o... to Supervision and detection o...
on 2013-02-22 09:21 *
By Patrik Nordwall
Component changed from None to actor
Summary changed from Reliable remote supervision to Discuss reliable remote supervision
on 2013-04-04 18:54 *
By Jonas Bonér
This is a feature that we cannot ship 2.2 without.
What is the status?
I'd like to see it done pretty soon so we can test the crap out of it.
on 2013-04-08 12:13 *
By Patrik Nordwall
Estimate changed from Small to Large
Sum of child estimates changed from 1.0 to 7.0
on 2013-04-08 12:13 *
By Patrik Nordwall
Summary changed from Reliable remote supervision to Reliable remote supervision and death watch
on 2013-04-09 06:19 *
By Patrik Nordwall
I will take a first stab at the FD and heartbeating for remote watch.
Then we need to integrate with #2594 and friends.
on 2013-04-09 14:39 *
By Patrik Nordwall
Assigned to set to Patrik Nordwall
Status changed from New to Accepted