ConvergenceSpec failed
http://217.142.157.108:8080/job/akka-multi-node/95/
Might this be because fourth becomes leader and moves itself to Up without waiting for convergence?
on 2012-06-08 06:42
By Jonas Bonér
Fails because nodes are not marked as unreachable by the failure detector (FD). Awaiting Patrik's fix.
on 2012-06-08 07:20
By Patrik Nordwall
Should be fixed now, then.
on 2012-06-14 01:52
By Patrik Nordwall
This failed when I ran akka.cluster.ConvergenceWithFailureDetectorPuppet. I have a debug log (attached). It's in there somewhere...
on 2012-06-14 02:25
By Patrik Nordwall
I think this failure is also due to wrong gossip merging.
fourth is first a singleton cluster of its own, i.e. Up.
fourth then joins first.
fourth receives conflicting gossip:
[fourth] [INFO] [06/14/2012 08:19:18.973] [ConvergenceSpec-akka.actor.default-dispatcher-3] [Node(akka://ConvergenceSpec)] Can't establish a causal relationship between
"remote" gossip [Gossip(overview = GossipOverview(seen = [akka://ConvergenceSpec@third -> VectorClock(Node(266d4125d05511b21390a70cf021ffc5) -> 00000137e9a55b5a, Node(84eaec785751a55240075d71f7e29938) -> 00000137e9a5588c, Node(c747d1bd83d00e993fcd418f2e67a10f) -> 00000137e9a55919), akka://ConvergenceSpec@second -> VectorClock(Node(266d4125d05511b21390a70cf021ffc5) -> 00000137e9a56098, Node(84eaec785751a55240075d71f7e29938) -> 00000137e9a5588c, Node(c747d1bd83d00e993fcd418f2e67a10f) -> 00000137e9a55919), akka://ConvergenceSpec@first -> VectorClock(Node(266d4125d05511b21390a70cf021ffc5) -> 00000137e9a56098, Node(84eaec785751a55240075d71f7e29938) -> 00000137e9a5588c, Node(c747d1bd83d00e993fcd418f2e67a10f) -> 00000137e9a55919)], unreachable = [Member(address = akka://ConvergenceSpec@third, status = Up)]), members = [Member(address = akka://ConvergenceSpec@first, status = Up), Member(address = akka://ConvergenceSpec@second, status = Up), Member(address = akka://ConvergenceSpec@fourth, status = Joining)], meta = [], version = VectorClock(Node(266d4125d05511b21390a70cf021ffc5) -> 00000137e9a56098, Node(84eaec785751a55240075d71f7e29938) -> 00000137e9a5588c, Node(c747d1bd83d00e993fcd418f2e67a10f) -> 00000137e9a55919))] and
"local" gossip [Gossip(overview = GossipOverview(seen = [akka://ConvergenceSpec@fourth -> VectorClock(Node(78b82d21ce185801aa210dc681a06b00) -> 00000137e9a5619a)], unreachable = []), members = [Member(address = akka://ConvergenceSpec@fourth, status = Up)], meta = [], version = VectorClock(Node(78b82d21ce185801aa210dc681a06b00) -> 00000137e9a5619a))] -
merging them into [Gossip(overview = GossipOverview(seen = [], unreachable = [Member(address = akka://ConvergenceSpec@third, status = Up)]), members = [Member(address = akka://ConvergenceSpec@first, status = Up), Member(address = akka://ConvergenceSpec@second, status = Up), Member(address = akka://ConvergenceSpec@fourth, status = Up)], meta = [], version = VectorClock(Node(266d4125d05511b21390a70cf021ffc5) -> 00000137e9a56098, Node(84eaec785751a55240075d71f7e29938) -> 00000137e9a5588c, Node(78b82d21ce185801aa210dc681a06b00) -> 00000137e9a5623b, Node(c747d1bd83d00e993fcd418f2e67a10f) -> 00000137e9a55919))]
As you can see, the resulting gossip contains fourth as a member with status Up, i.e. fourth becomes a member without passing through the ordinary leader transition phases.
Let's discuss...
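The "Can't establish a causal relationship" message means that neither gossip version dominates the other, so the two gossips are concurrent and must be merged. A minimal sketch of that check, assuming a simplified clock (a plain map from node id to counter, with missing entries counted as 0) rather than Akka's own `VectorClock` class; the object and method names are hypothetical, and the node ids are shortened from the log above:

```scala
// Simplified vector-clock model, NOT Akka's VectorClock implementation.
// A clock maps node id -> counter; a node missing from a clock counts as 0.
object VectorClockSketch {
  type VClock = Map[String, Long]

  sealed trait Causality
  case object Same extends Causality
  case object Before extends Causality
  case object After extends Causality
  case object Concurrent extends Causality

  /** Compares two clocks entry by entry over the union of their node ids. */
  def compare(a: VClock, b: VClock): Causality = {
    val nodes = a.keySet ++ b.keySet
    val aNotAfterB = nodes.forall(n => a.getOrElse(n, 0L) <= b.getOrElse(n, 0L))
    val bNotAfterA = nodes.forall(n => b.getOrElse(n, 0L) <= a.getOrElse(n, 0L))
    (aNotAfterB, bNotAfterA) match {
      case (true, true)   => Same
      case (true, false)  => Before
      case (false, true)  => After
      case (false, false) => Concurrent // neither dominates: conflicting gossip
    }
  }

  /** Pointwise maximum of two clocks, used when merging concurrent versions. */
  def merge(a: VClock, b: VClock): VClock =
    (a.keySet ++ b.keySet).map(n => n -> math.max(a.getOrElse(n, 0L), b.getOrElse(n, 0L))).toMap

  // Versions taken from the "remote" and "local" gossip in the log.
  val remote: VClock = Map("266d" -> 0x137e9a56098L, "84ea" -> 0x137e9a5588cL, "c747" -> 0x137e9a55919L)
  val local: VClock = Map("78b8" -> 0x137e9a5619aL)

  def main(args: Array[String]): Unit = {
    println(compare(remote, local))    // Concurrent
    println(merge(remote, local).size) // 4
  }
}
```

In the log, the merged version's entry for fourth's node (00000137e9a5623b) is later than its local entry, so the real merge presumably also advanced the merging node's own entry; the sketch shows only the pointwise maximum.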
on 2012-06-14 02:44
By Jonas Bonér
Very good analysis, Patrik. You found it. Now let's nail the sucker.
on 2012-06-15 06:46
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: wip-2223-step-by-step-patriknw
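The fix changes only which status wins when two gossips disagree about the same member. A hypothetical sketch of such a priority-based merge, modeling only Joining and Up with made-up numeric priorities (`mergeMembers`, `priorityBeforeFix` and `priorityAfterFix` are illustrative names, not Akka's API); the member sets are reduced from the conflicting gossip logged in this ticket:

```scala
// Hypothetical model of member-status merging. Only Joining and Up are
// modeled, and the numeric priorities are illustrative, not Akka's.
object StatusMergeSketch {
  sealed trait MemberStatus
  case object Joining extends MemberStatus
  case object Up extends MemberStatus

  type Members = Map[String, MemberStatus] // address -> status

  // Ordering implied by the log before the fix: Up beats Joining.
  val priorityBeforeFix: MemberStatus => Int = {
    case Up      => 2
    case Joining => 1
  }

  // Ordering after the fix: Joining beats Up, so the merged gossip keeps the
  // node Joining and the leader still has to move it to Up.
  val priorityAfterFix: MemberStatus => Int = {
    case Joining => 2
    case Up      => 1
  }

  /** Union of both member sets; on a conflict the higher-priority status wins. */
  def mergeMembers(a: Members, b: Members, priority: MemberStatus => Int): Members =
    (a.keySet ++ b.keySet).map { addr =>
      val candidates = a.get(addr).toList ++ b.get(addr).toList
      addr -> candidates.maxBy(priority)
    }.toMap

  // Reachable members from the logged gossips (third, which is unreachable, is omitted).
  val remote: Members = Map("first" -> Up, "second" -> Up, "fourth" -> Joining)
  val local: Members = Map("fourth" -> Up)

  def main(args: Array[String]): Unit = {
    println(mergeMembers(remote, local, priorityBeforeFix)("fourth")) // Up
    println(mergeMembers(remote, local, priorityAfterFix)("fourth"))  // Joining
  }
}
```

With the pre-fix ordering, fourth comes out Up, matching the merged gossip in the log; with the swapped ordering it stays Joining until the leader performs the usual transition.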
on 2012-06-15 10:13
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: master
on 2012-06-15 10:46
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: wip-2201-cache-node-lookup-patriknw
on 2012-06-15 17:52
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: wip-2162-redesign-of-management-of-the-exiting-to-removed-life-cycle-jboner
on 2012-06-19 09:34
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: wip-2218-test-conductor-barrier-timeouts
on 2012-06-20 04:31
By Patrik Nordwall
(In revision:f7a01505baedf47be473874097bc8f995ba9311b) Correction of gossip merge when joining, see #2204
The problem:
- Node that is Up joins a cluster and becomes Joining in that cluster
- The joining node receives gossip, which results in conflict,
- It became Up in the new cluster without passing the ordinary leader
The solution:
- Change priority order of Up and Joining so that Joining is used when
Branch: wip-scala210M4-√
Updating tickets (#620, #679, #725, #750, #752, #753, #754, #763, #789, #870, #893, #922, #953, #954, #971, #977, #983, #985, #987, #991, #1026, #1045, #1051, #1060, #1061, #1084, #1098, #1099, #1133, #1134, #1135, #1136, #1137, #1194, #1225, #1226, #1243, #1245, #1247, #1248, #1254, #1261, #1300, #1317, #1391, #1412, #1791, #1793, #1901, #1908, #1911, #1912, #1913, #1914, #1915, #1916, #1917, #1922, #1983, #1987, #1996, #1997, #1998, #2066, #2077, #2105, #2117, #2133, #2143, #2149, #2151, #2152, #2153, #2155, #2157, #2158, #2159, #2160, #2161, #2162, #2163, #2164, #2165, #2167, #2171, #2175, #2176, #2177, #2180, #2182, #2184, #2185, #2193, #2199, #2202, #2204, #2206, #2207, #2209, #2210)