Replace the PhiAccrualDetector with a simple timeout detector
The phi accrual detector does not bring anything to the table:
- according to measurements, heartbeat delays cannot in general be assumed to be normally distributed
- most of our advice boils down to tweaking the acceptable-heartbeat-pause parameter
- the original promise of the paper (that probabilities are easier to use than plain timeouts) does not hold. Most people are not interested in whether it is 99.66% probable that the system is up; they are interested in how long it takes for the system to respond to requests
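For readers unfamiliar with the phi accrual idea, here is a minimal sketch of how a suspicion level is derived from heartbeat history under the normal-distribution assumption mentioned in the first point. It is not Akka's PhiAccrualDetector implementation. The function name `phi` and the `acceptable_pause_ms` knob (which shifts the expected interval, loosely in the spirit of acceptable-heartbeat-pause) are illustrative assumptions.

```python
import math
import statistics


def phi(elapsed_ms, intervals_ms, acceptable_pause_ms=0.0):
    """Phi suspicion level under a normal model of heartbeat inter-arrival times.

    phi = -log10(P(next heartbeat arrives later than elapsed_ms)).
    `acceptable_pause_ms` is a hypothetical knob that shifts the mean,
    loosely mirroring an acceptable-heartbeat-pause setting.
    """
    mean = statistics.mean(intervals_ms) + acceptable_pause_ms
    std = max(statistics.stdev(intervals_ms), 1e-3)  # guard against zero spread
    z = (elapsed_ms - mean) / std
    # Survival function of the standard normal: P(X > z)
    p_later = 0.5 * math.erfc(z / math.sqrt(2))
    if p_later <= 0.0:
        return math.inf  # probability underflowed: maximal suspicion
    return -math.log10(p_later)


intervals = [1000, 1010, 990, 1005, 995]  # illustrative history, in ms
for elapsed in (1000, 1020, 1050):
    print(elapsed, round(phi(elapsed, intervals), 2))
```

Under this model, any phi threshold simply corresponds to an elapsed time of some number of standard deviations past the (shifted) mean, which is why users end up reasoning about it in milliseconds anyway.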
I recommend using a simple timeout-based detector, which reduces the number of tunable parameters to one and thereby reduces configuration confusion. Since failure detectors (FDs) are pluggable, people can drop in whatever they want for custom behavior.
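A minimal sketch of what such a single-parameter detector could look like. The class name, its methods, and the injectable clock are hypothetical illustrations, not an existing Akka API; how it would be wired into the pluggable failure detector configuration is not shown.

```python
import time


class TimeoutFailureDetector:
    """Hypothetical timeout-based failure detector with one tunable: timeout_s.

    A monitored node is considered available as long as its last heartbeat
    arrived within the timeout window.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so tests can control time
        self._last_heartbeat = None  # no heartbeat seen yet

    def heartbeat(self):
        """Record that a heartbeat was received now."""
        self._last_heartbeat = self._clock()

    def is_available(self):
        """True if a heartbeat was seen within the last timeout_s seconds.

        Before the first heartbeat the node is treated as available
        (a design choice; a stricter detector could return False).
        """
        if self._last_heartbeat is None:
            return True
        return self._clock() - self._last_heartbeat <= self.timeout_s


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
fd = TimeoutFailureDetector(timeout_s=3.0, clock=lambda: now[0])
fd.heartbeat()            # heartbeat at t = 0.0
now[0] = 2.5
print(fd.is_available())  # True: 2.5 s since the last heartbeat
now[0] = 3.5
print(fd.is_available())  # False: 3.5 s exceeds the 3.0 s timeout
```

The only knob is the timeout itself, expressed directly in units of time, which is the quantity users actually care about.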
on 2013-11-22 07:36
By Patrik Nordwall
I don't really disagree, but a possible use for **some** users of the cluster FD, now that nodes can come back from unreachable, would be to set a low (or zero) acceptable-heartbeat-pause so that failures are detected quickly, with acceptable false positives.
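To make this trade-off concrete, the sketch below uses a deliberately simplified model (an assumption, not how Akka's cluster failure detector decides): a node is flagged unreachable once the time since its last heartbeat exceeds the expected interval plus the pause. Every flagged gap that is followed by another heartbeat is a transient false positive from which the node comes back. The function names and the heartbeat trace are hypothetical, not measured data.

```python
def unreachable_episodes(arrivals_ms, interval_ms, pause_ms):
    """Count gaps in which a live node would be flagged unreachable
    and then come back when the next heartbeat arrives.

    Simplified model: flagged once the gap exceeds interval_ms + pause_ms.
    """
    threshold = interval_ms + pause_ms
    return sum(
        1
        for prev, nxt in zip(arrivals_ms, arrivals_ms[1:])
        if nxt - prev > threshold
    )


def detection_delay_ms(interval_ms, pause_ms):
    """Time after the last heartbeat before a crashed node is flagged."""
    return interval_ms + pause_ms


# Made-up heartbeat arrival times (ms) with two irregular gaps: 1300 and 1700.
arrivals = [0, 1000, 2000, 3300, 4300, 5300, 7000, 8000]
for pause in (0, 500, 1000):
    print(pause, unreachable_episodes(arrivals, 1000, pause),
          detection_delay_ms(1000, pause))
```

On this trace a zero pause yields two transient episodes but flags a crash after 1000 ms, while a 1000 ms pause yields none but takes 2000 ms to flag a crash.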
on 2013-12-03 10:43
By viktorklang
I think the accrual FD will become more interesting once we mix trust into the picture.
Since FDs are pluggable, users can choose whichever FD they want.
Yes, the question here is the recommended default. Since most real-world use of the phi accrual failure detector boils down to tuning acceptable-heartbeat-pause, we can make the default configuration easier by using a simple threshold detector as the default. I don't see this ticket as invalid.
on 2014-02-28 06:46
By viktorklang
It's outside the scope of what we are going to deliver out of the box right now.
Feel free to open it and move it to community contributions.