REMOTE: Support trap exit of remote actor crash generically
Currently we only track trap exit when sending the reply message after a ? message.
Proposed solution:
A clustered actor has two parts:
- The client/ClusterActorRef on the "client" cluster node
- The remote instance(s) (that the client talks to) on another cluster node
Part 1 - The remote instance side of things
1. Create a hidden "system" Supervisor actor on the cluster node
2. Link the remote actor instance to this Supervisor
3. ZK is used to set a Watch on an ephemeral node for the linked actor
4. The path for this Watch is sent back to the ClusterActorRef that represents this instance (see Part 2 for how the ClusterActorRef manages it)
5. Now, when the remote actor fails, its ephemeral node should be removed, firing the Watch (which will trigger the ClusterActorRef on the client side)
6. ... wait for the ClusterActorRef to do its job
7. When the Supervisor receives a Restart message from the ClusterActorRef, it sends an Exit message to its linked actor to restart the remote actor.
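A minimal Scala sketch of Part 1, under stated assumptions: `InMemoryZk` is a hypothetical in-memory stand-in for a real ZooKeeper client, the path layout `/cluster/<node>/actors/<id>` is invented for illustration, and `SystemSupervisor`, `RemoteActorInstance` and `Restart` are illustrative names, not Akka API. It shows the ephemeral node being created when the actor is linked, removed when the actor fails (which fires any watch on it), and the actor being restarted when a Restart arrives.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for ZK: tracks ephemeral nodes and fires
// the watches registered on a path once, when that node is deleted.
final class InMemoryZk {
  private val nodes   = mutable.Set.empty[String]
  private val watches = mutable.Map.empty[String, List[String => Unit]]

  def createEphemeral(path: String): String = { nodes += path; path }
  def exists(path: String): Boolean = nodes.contains(path)
  def watch(path: String)(onDeleted: String => Unit): Unit =
    watches(path) = onDeleted :: watches.getOrElse(path, Nil)
  def delete(path: String): Unit =
    if (nodes.remove(path)) watches.remove(path).getOrElse(Nil).foreach(w => w(path))
}

// Hypothetical remote actor instance: only tracks running state and restarts.
final class RemoteActorInstance(val id: String) {
  var running  = true
  var restarts = 0
}

// Hypothetical "system" message sent by the ClusterActorRef (Part 2).
final case class Restart(actorId: String)

// Hidden "system" supervisor on the remote cluster node (step 1).
final class SystemSupervisor(zk: InMemoryZk, node: String) {
  private val linked = mutable.Map.empty[String, RemoteActorInstance]
  private def pathFor(actorId: String) = s"/cluster/$node/actors/$actorId"

  // Steps 2-4: link the actor, create its ephemeral node and return the
  // path that is sent back to the ClusterActorRef.
  def link(actor: RemoteActorInstance): String = {
    linked(actor.id) = actor
    zk.createEphemeral(pathFor(actor.id))
  }

  // Step 5: when the actor fails, its ephemeral node is removed, which
  // fires every watch registered on that path.
  def actorFailed(actorId: String): Unit = {
    linked.get(actorId).foreach(a => a.running = false)
    zk.delete(pathFor(actorId))
  }

  // Step 7: a Restart from the ClusterActorRef restarts the linked actor
  // (the ticket does this by sending the actor an Exit message).
  def receive(msg: Restart): Unit =
    linked.get(msg.actorId).foreach { a =>
      a.running = true
      a.restarts += 1
    }
}
```

Using one ephemeral node per linked actor keeps failure detection in ZK itself, so the client only has to watch a path rather than poll the remote node.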
Part 2 - The client/ClusterActorRef side of things
1. The ClusterActorRef receives the path to the Watch from the remote instance
2. It then adds a Listener to this path (Watch)
3. When the Watch is triggered and notifies our Listener, we send a Restart message for the remote actor to the "system" Supervisor on the remote actor's node (go to Part 1, item 7)
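The client side can be sketched the same way. This is a sketch under stated assumptions: `InMemoryZk` is again a hypothetical in-memory stand-in for ZooKeeper with one-shot watches, `ClusterActorRefWatcher` is an illustrative name for the watch-handling part of a ClusterActorRef, and `sendToSystemSupervisor` is a plain function standing in for the RemoteDaemonActor channel.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for ZK: watches fire once on node deletion.
final class InMemoryZk {
  private val nodes   = mutable.Set.empty[String]
  private val watches = mutable.Map.empty[String, List[String => Unit]]

  def createEphemeral(path: String): String = { nodes += path; path }
  def watch(path: String)(onDeleted: String => Unit): Unit =
    watches(path) = onDeleted :: watches.getOrElse(path, Nil)
  def delete(path: String): Unit =
    if (nodes.remove(path)) watches.remove(path).getOrElse(Nil).foreach(w => w(path))
}

// Hypothetical "system" message for the remote "system" Supervisor.
final case class Restart(actorId: String)

// Watch-handling part of a ClusterActorRef (Part 2, steps 1-3).
final class ClusterActorRefWatcher(
    zk: InMemoryZk,
    actorId: String,
    sendToSystemSupervisor: Restart => Unit) {

  // Step 1: called with the Watch path received from the remote instance.
  def onWatchPath(path: String): Unit =
    // Step 2: add a Listener to that path.
    zk.watch(path) { _ =>
      // Step 3: the Watch fired, so ask the remote "system" Supervisor to
      // restart the actor (back to Part 1, item 7).
      sendToSystemSupervisor(Restart(actorId))
    }
}
```

Because the stand-in's watches are one-shot (as ZooKeeper watches are), a real implementation would have to re-register the Listener once the restarted actor publishes a new path.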
Use the RemoteDaemonActor channel (in ClusterNode) for all "system" communication. Add the necessary commands to ClusterProtocol.proto.
All life-cycle messages (Restart, Exit, etc.) should be sent to the EventHandler.
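A sketch of how the "system" commands and the EventHandler rule could fit together, under stated assumptions: `WatchPath` and `RestartActor` are hypothetical Scala counterparts of the commands that would be added to ClusterProtocol.proto, `SystemDaemon` stands in for the RemoteDaemonActor, and `EventHandlerStub` is a plain listener list, not Akka's EventHandler. Every life-cycle message produced while handling a command is published to the handler.

```scala
import scala.collection.mutable

// Hypothetical life-cycle messages; the ticket names Restart and Exit.
sealed trait LifeCycleEvent
final case class Restart(actorId: String) extends LifeCycleEvent
final case class Exit(actorId: String, reason: String) extends LifeCycleEvent

// Hypothetical "system" commands carried over the daemon channel; in the
// real design these would be new messages in ClusterProtocol.proto.
sealed trait SystemCommand
final case class WatchPath(actorId: String, path: String) extends SystemCommand
final case class RestartActor(actorId: String) extends SystemCommand

// Stand-in for the EventHandler: every subscriber receives every event.
final class EventHandlerStub {
  private val listeners = mutable.Buffer.empty[LifeCycleEvent => Unit]
  def subscribe(l: LifeCycleEvent => Unit): Unit = listeners += l
  def publish(e: LifeCycleEvent): Unit = listeners.foreach(l => l(e))
}

// Stand-in for the RemoteDaemonActor: routes system commands to handlers
// and publishes the resulting life-cycle messages.
final class SystemDaemon(
    events: EventHandlerStub,
    restart: String => Unit,
    addWatch: (String, String) => Unit) {

  def handle(cmd: SystemCommand): Unit = cmd match {
    case WatchPath(id, path) => addWatch(id, path)
    case RestartActor(id) =>
      // Assumption: the Exit sent to the linked actor and the subsequent
      // Restart are both reported to the event handler.
      events.publish(Exit(id, "restart requested"))
      restart(id)
      events.publish(Restart(id))
  }
}
```

Keeping all commands in one sealed hierarchy means the compiler flags any command the daemon forgets to handle, which mirrors adding each command to the protocol definition in one place.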
on 2010-09-02 14:03 *
By Jonas Bonér
Description changed from Currently we only track tra... to Currently we only track tra...
Priority changed from Normal (3) to Highest (1)
on 2010-09-23 13:53 *
By viktorklang
I'm guessing that the FIXME comment in this code is linked to this ticket :-)
    private def notifySupervisorWithMessage(notification: LifeCycleMessage) = {
      // FIXME to fix supervisor restart of remote actor for oneway calls, inject a supervisor proxy that can send notification back to client
      _supervisor.foreach { sup =>
        if (sup.isShutdown) { // if supervisor is shut down, game over for all linked actors
          shutdownLinkedActors
          stop
        } else sup ! notification // else notify supervisor
      }
    }
Couldn't it be solved by implementing "notifySupervisorWithMessage" differently in RemoteActorRef (which I assume _supervisor is transformed into)?
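A minimal sketch of that suggestion, assuming a remote instance is given a hypothetical `RemoteSupervisorProxy` as its `_supervisor`, which forwards every life-cycle message back to the client through a stand-in `sendToClient` function. `SupervisorRef`, `RemoteActorStub` and the `Exit` case class are illustrative, not the actual ActorRef internals; the notification logic is the same as in the snippet above.

```scala
// Hypothetical stand-in for Akka's LifeCycleMessage hierarchy.
sealed trait LifeCycleMessage
final case class Exit(actorId: String, reason: String) extends LifeCycleMessage

// Minimal supervisor abstraction used by notifySupervisorWithMessage.
trait SupervisorRef {
  def isShutdown: Boolean
  def !(msg: LifeCycleMessage): Unit
}

// The "supervisor proxy" from the FIXME: it lives with the remote instance
// and sends every notification back to the client-side supervisor.
final class RemoteSupervisorProxy(
    sendToClient: LifeCycleMessage => Unit,
    clientSupervisorIsShutdown: () => Boolean) extends SupervisorRef {
  def isShutdown: Boolean = clientSupervisorIsShutdown()
  def !(msg: LifeCycleMessage): Unit = sendToClient(msg)
}

// Same logic as the snippet above; with the proxy as _supervisor, failures
// during oneway (!) calls also reach the client.
final class RemoteActorStub(_supervisor: Option[SupervisorRef]) {
  var stopped = false
  private def shutdownLinkedActors(): Unit = ()
  private def stop(): Unit = stopped = true

  def notifySupervisorWithMessage(notification: LifeCycleMessage): Unit =
    _supervisor.foreach { sup =>
      if (sup.isShutdown) { shutdownLinkedActors(); stop() } // game over for linked actors
      else sup ! notification                                 // forward to the client
    }
}
```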
This is highest prio IMHO.
on 2010-10-28 14:54 *
By viktorklang
I'd like some assistance with this: what's needed, and which solution is most viable?
on 2011-02-23 15:28 *
By Jonas Bonér
Assigned to changed from viktorklang to -none-
Milestone changed from 1.1 to 1.2
Priority changed from Highest (1) to Normal (3)
on 2011-07-06 08:47 *
By Jonas Bonér
Description changed from Currently we only track tra... to Currently we only track tra...
on 2011-07-06 09:03 *
By Jonas Bonér
Description changed from Currently we only track tra... to Currently we only track tra...
on 2011-10-07 14:13 *
By viktorklang
Summary changed from Support trap exit of remote actor crash generically to REMOTE: Support trap exit of remote actor crash generically
This is superseded by the Remote DeathWatch.