Improve efficiency of the FD heartbeat

Instead of letting every node send heartbeats to every other node, we could let each node send heartbeats only to the N nodes that follow it in the node ring. Since every node does this, the whole cluster is still covered and monitored: the monitored window shifts one step around the ring per node, including across any jumps between racks or data centers.
If a node is marked as unreachable, that information is gossiped out to the whole cluster anyway through the regular gossip protocol.
I can't see why this scheme would be any less effective at detecting failures, and it would be a lot more efficient in terms of resources.
Thoughts?
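
To make the proposal concrete, here is a minimal sketch in Python (not Akka code). The names `heartbeat_targets` and `monitors_of`, and the representation of the ring as an ordered list of node names, are assumptions made for illustration only.

```python
def heartbeat_targets(ring, node, n):
    """Return up to n nodes that follow `node` in the ring, wrapping around."""
    i = ring.index(node)  # raises ValueError if the node is not a member
    count = min(n, len(ring) - 1)  # never heartbeat to yourself
    return [ring[(i + k) % len(ring)] for k in range(1, count + 1)]


def monitors_of(ring, node, n):
    """Return the nodes that send heartbeats to `node` (its monitors)."""
    return [m for m in ring if node in heartbeat_targets(ring, m, n)]


ring = ["a", "b", "c", "d", "e"]
n = 2

print(heartbeat_targets(ring, "d", n))  # ['e', 'a']
print(monitors_of(ring, "a", n))        # ['d', 'e']

# Heartbeat messages per round: everyone-to-everyone versus ring neighbours.
all_to_all = len(ring) * (len(ring) - 1)
ring_only = sum(len(heartbeat_targets(ring, m, n)) for m in ring)
print(all_to_all, ring_only)            # 20 10
```

With everyone-to-everyone heartbeats the message count per round grows quadratically with cluster size, while with N ring neighbours it grows linearly, and every node still has N monitors.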

on 2012-06-28 08:02 *

By Patrik Nordwall

That is the same idea that I had with ticket #2283.
Perhaps we should also use the deputy nodes in the algorithm, since they are typically in different datacenters.

When rebalancing (changing buddies), a node needs to tell its monitor that it will stop heartbeating. Buddies shouldn't be changed too often, because each change resets the heartbeat history.
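
A rough Python sketch of that rebalancing step, under stated assumptions: `plan_rebalance`, `HeartbeatMonitor`, and the `end_heartbeat` notification are hypothetical names for illustration, not the actual cluster implementation.

```python
def plan_rebalance(old_targets, new_targets):
    """Return (to_end, to_start): buddies to notify that heartbeats will stop,
    and buddies that will start receiving heartbeats."""
    old, new = set(old_targets), set(new_targets)
    return sorted(old - new), sorted(new - old)


class HeartbeatMonitor:
    """Receiving side: keeps a per-sender heartbeat history (hypothetical)."""

    def __init__(self):
        self.history = {}

    def heartbeat(self, sender, timestamp):
        self.history.setdefault(sender, []).append(timestamp)

    def end_heartbeat(self, sender):
        # The sender announced it will stop, so silence is not a failure.
        # The collected history is dropped, which is the cost of rebalancing.
        self.history.pop(sender, None)


to_end, to_start = plan_rebalance(["b", "c"], ["c", "d"])
print(to_end, to_start)  # ['b'] ['d']

monitor_b = HeartbeatMonitor()
monitor_b.heartbeat("a", 1.0)
monitor_b.heartbeat("a", 2.0)
monitor_b.end_heartbeat("a")
print(monitor_b.history)  # {}
```

Buddies that stay in the set ("c" here) keep their history, so only the changed part of the assignment pays the reset cost.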

on 2012-06-28 08:17 *

By Patrik Nordwall

Priority changed from Normal (3) to Low (4)

on 2012-06-28 08:17 *

By Jonas Bonér

Funny, since the idea came to me while walking back from lunch, before I saw your ticket.

on 2012-10-07 23:48 *

By Patrik Nordwall

Assigned to set to Patrik Nordwall

Status changed from New to Test

Since this is the number one known scalability bottleneck I took a stab at it. I use consistent hashing instead of ring order, since this will have better re-balancing characteristics. https://github.com/akka/akka/pull/787
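
For readers unfamiliar with the approach, the following Python sketch shows one way heartbeat targets could be picked by consistent hashing. It is an illustration under assumptions, not the code from the linked pull request: the function names, the use of SHA-1, and the single hash position per node are all choices made here.

```python
import bisect
import hashlib


def position(node):
    """Map a node name to a position on the hash ring (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(node.encode("utf-8")).digest(), "big")


def hashed_targets(nodes, node, n):
    """Pick up to n targets: the nodes that follow `node` clockwise on the hash ring."""
    ring = sorted((position(m), m) for m in nodes if m != node)
    if not ring:
        return []
    start = bisect.bisect_right(ring, (position(node), node))
    count = min(n, len(ring))
    return [ring[(start + k) % len(ring)][1] for k in range(count)]


nodes = ["node-%d" % i for i in range(10)]
before = {m: hashed_targets(nodes, m, 3) for m in nodes}
after = {m: hashed_targets(nodes + ["node-new"], m, 3) for m in nodes}

# Count how many existing nodes had to drop a buddy when one node joined.
changed = [m for m in nodes if set(before[m]) != set(after[m])]
print(len(changed), "of", len(nodes), "nodes changed a heartbeat target")
```

In this sketch a joining node displaces at most one existing target per sender, and senders whose clockwise window does not contain the new node are unaffected, so few heartbeat histories are reset.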

on 2012-10-07 23:48 *

By Patrik Nordwall

Priority changed from Low (4) to Normal (3)

on 2012-10-07 23:48 *

By Patrik Nordwall

Component changed from None to cluster

on 2012-10-15 02:54 *

By Patrik Nordwall

Milestone changed from Coltrane to 2.1-RC1

Status changed from Test to Fixed
