Auto-down threshold
To avoid auto-downing a small group of nodes in case of a network partition (see #2265), we need a configurable threshold that disables auto-down.
The configuration property could be a percentage of all nodes.
E.g. disable auto-down if unreachable / (unreachable + liveMembers) > 70%,
or enable auto-down only when unreachable / (unreachable + liveMembers) < 30%.
Any personal preferences?
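A minimal sketch of the two proposed checks, assuming Python and hypothetical function names (this is not an Akka API). It also shows that the two formulations are not equivalent: with 70% and 30% as thresholds they disagree whenever the unreachable fraction lies between 30% and 70%.

```python
def unreachable_ratio(unreachable: int, live_members: int) -> float:
    """Fraction of all known nodes that are unreachable (hypothetical helper)."""
    total = unreachable + live_members
    # With no known nodes there is nothing unreachable to down.
    return unreachable / total if total else 0.0


def allowed_disable_above(unreachable: int, live_members: int,
                          threshold: float = 0.70) -> bool:
    """Variant 1: disable auto-down if the ratio exceeds `threshold`."""
    return unreachable_ratio(unreachable, live_members) <= threshold


def allowed_enable_below(unreachable: int, live_members: int,
                         threshold: float = 0.30) -> bool:
    """Variant 2: enable auto-down only when the ratio is below `threshold`."""
    return unreachable_ratio(unreachable, live_members) < threshold


# 10-node cluster: 2 unreachable (20%), 5 unreachable (50%), 8 unreachable (80%).
for u in (2, 5, 8):
    print(u, allowed_disable_above(u, 10 - u), allowed_enable_below(u, 10 - u))
```

For the 50% case the first variant still allows auto-down while the second does not, so choosing between them is a real policy decision rather than a matter of phrasing.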
I think we need to introduce some delay before the auto-down action; otherwise it will not have the full picture of unreachable nodes when making this decision.
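One way to read the delay suggestion is a stability window: only decide once the set of unreachable nodes has stopped changing for some time, then apply the threshold. The sketch below assumes that interpretation; the class name, `stable_after`, and passing `now` explicitly (to keep it deterministic) are all hypothetical choices, not part of any Akka API.

```python
from typing import FrozenSet, Iterable, Optional, Set


class DelayedAutoDown:
    """Hypothetical sketch: defer auto-down until the unreachable set has been
    unchanged for `stable_after` seconds, then apply a ratio threshold."""

    def __init__(self, stable_after: float, threshold: float = 0.70):
        self.stable_after = stable_after
        self.threshold = threshold
        self._unreachable: FrozenSet[str] = frozenset()
        self._changed_at: Optional[float] = None

    def observe(self, unreachable: Iterable[str], now: float) -> None:
        """Record the current unreachable set; restart the timer if it changed."""
        current = frozenset(unreachable)
        if current != self._unreachable:
            self._unreachable = current
            self._changed_at = now

    def nodes_to_down(self, live_members: int, now: float) -> Set[str]:
        """Return the nodes to auto-down, or an empty set if we should wait."""
        if not self._unreachable or self._changed_at is None:
            return set()
        if now - self._changed_at < self.stable_after:
            return set()  # picture of unreachable nodes not yet stable
        total = len(self._unreachable) + live_members
        if len(self._unreachable) / total > self.threshold:
            return set()  # too large a fraction unreachable: likely a partition
        return set(self._unreachable)
```

The delay trades detection speed for a more complete view: a node that becomes unreachable just after another one resets the timer, so both are judged together against the threshold.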
on 2012-06-25 15:39
By Peter Vlugter
It might be interesting to see how Riak decided to handle auto-down. If I remember correctly, only explicit user-down was supported there when we were looking into this.
on 2012-06-25 16:17
By Jonas Bonér
Yeah. The safest is to only allow user-down, but that would be limiting, I think. If we can get auto-down to work, that would be great. We need more tests to verify it.
on 2012-06-26 11:41
By Patrik Nordwall
We redefined how network partitions should be handled. Is this still something we need?
Changing to lower prio, and we can think about it.
This is probably important. We should discuss the split brain semantics with the whole team.
on 2012-06-27 04:49
By Jonas Bonér
Let's discuss tomorrow.
We discussed how to handle network partitions again, and the conclusion is that the current implementation is fine. Changing priority to low; a threshold for auto-down is something we can add later.