Exclude unreachable observations from downed node in convergence check
Investigate whether unreachable observations from a downed node can be excluded from the convergence check.
Discussed in context of the failure in SurviveNetworkInstabilitySpec #3871
on 2014-02-15 03:01 *
By Patrik Nordwall
I took a deep dive into this. I wrote a failing test with partly disconnected nodes, which I then got passing. One scary implication is that I must remove two checks when receiving gossip:
- "Ignoring received gossip with myself as unreachable"
- "Ignoring received gossip from unreachable"
From the user's perspective, nodes should still not become reachable just because downing has started, so if the above checks must be removed, we must at least filter out Unreachable/Reachable events for selfAddress.
I like the idea, and we should explore it further (but it is too scary to change for 2.3.0 in my opinion).
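To make the proposal concrete, here is a minimal sketch of a convergence check that ignores unreachable observations reported by downed nodes. It is a simplified model in Python, not the actual Akka code: `Observation`, `GossipState`, `convergence` and the `exclude_downed_observers` flag are all hypothetical names, and the convergence rule itself is an assumption (no blocking unreachable observations, and every non-downed member has seen the gossip).

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model; these are NOT Akka's Gossip/Reachability classes.
@dataclass(frozen=True)
class Observation:
    observer: str  # node that reported the subject as unreachable
    subject: str   # node that was reported as unreachable


@dataclass
class GossipState:
    members: set   # addresses of all members
    downed: set    # members that have been marked Down
    seen_by: set   # members that have seen this gossip version
    unreachable: list = field(default_factory=list)  # list of Observation


def convergence(state, exclude_downed_observers=True):
    """Return True if the (assumed) convergence conditions hold."""
    observations = state.unreachable
    if exclude_downed_observers:
        # The proposal: drop observations whose observer is itself Down.
        observations = [o for o in observations if o.observer not in state.downed]
    # Observations about a downed subject do not block convergence.
    blocking = [o for o in observations if o.subject not in state.downed]
    if blocking:
        return False
    # Every member that is not Down must have seen this gossip version.
    return (state.members - state.downed) <= state.seen_by
```

In a partly disconnected cluster where downed node `c` had reported healthy node `a` as unreachable, the check only passes when that stale observation is excluded:

```python
state = GossipState(
    members={"a", "b", "c"},
    downed={"c"},
    seen_by={"a", "b"},
    unreachable=[Observation(observer="c", subject="a")],
)
print(convergence(state, exclude_downed_observers=True))
print(convergence(state, exclude_downed_observers=False))
```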