Add a way of transporting relevant metrics info out to the different metrics-dependent routers
Dependent on #939
1. Use a scheme similar to the AccrualFailureDetector to detect anomalies in the node regarding the different stats to track (CPU, MEMORY, MBOX etc).
2. Only gossip out new information if an anomaly has been detected on the local node
3. The info gossiped out should NOT be the stats but only a FLAG on what info has been detected out of the ordinary
4. The nodes that are running actors on the node that has gossiped out the anomaly will now pull the stats from that node and take action
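A minimal sketch of points 1 to 3, under stated assumptions: it uses a plain z-score over a sliding window as a simpler stand-in for the phi calculation in AccrualFailureDetector, and the names `MetricFlag`, `Window` and `AnomalyDetector` are hypothetical, not part of Akka. It only shows how a node could decide which FLAG, if any, to gossip out.

```scala
// Hypothetical flag names; the ticket only lists CPU, MEMORY, MBOX as examples.
sealed trait MetricFlag
case object Cpu extends MetricFlag
case object Memory extends MetricFlag
case object Mbox extends MetricFlag

// Immutable sliding window of recent samples for one metric.
final case class Window(samples: Vector[Double] = Vector.empty, maxSize: Int = 100) {
  def add(value: Double): Window =
    copy(samples = (samples :+ value).takeRight(maxSize))

  def mean: Double = samples.sum / samples.size

  def stdDev: Double = {
    val m = mean
    math.sqrt(samples.map(x => (x - m) * (x - m)).sum / samples.size)
  }

  // True when `value` deviates from the history by more than `threshold`
  // standard deviations. Never fires before `minSamples` have been seen.
  def isAnomalous(value: Double, threshold: Double, minSamples: Int): Boolean =
    samples.size >= minSamples && {
      val sd = math.max(stdDev, 1e-9) // avoid division by zero on flat history
      math.abs(value - mean) / sd > threshold
    }
}

// Assumption: one detector per node, called from a single thread.
final class AnomalyDetector(threshold: Double = 3.0, minSamples: Int = 10) {
  private var windows = Map.empty[MetricFlag, Window]

  // Records the sample and returns Some(flag) only when the sample is out of
  // the ordinary. The stats themselves stay local; only the flag is returned.
  def record(flag: MetricFlag, value: Double): Option[MetricFlag] = {
    val window = windows.getOrElse(flag, Window())
    val anomalous = window.isAnomalous(value, threshold, minSamples)
    windows = windows.updated(flag, window.add(value))
    if (anomalous) Some(flag) else None
  }
}
```

A caller would gossip only when `record` returns `Some(flag)`. Interested nodes would then pull the actual stats from the flagged node, as described in point 4.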
on 2011-09-22 18:52 *
By vasil.remeniuk
Draft version of LeastMem, LeastCPU and LeastMessages router + tests is ready: https://github.com/jboner/akka/tree/wip-940-remeniuk. Haven't rebased from master yet, since akka-cluster module is temporarily set on hold in master (currently it doesn't compile due to extraction of akka-remote, and some other changes).
on 2011-09-23 06:31 *
By Patrik Nordwall
Isn't it more relevant to use "load average for the past 1 minute", which is available in Sigar API with
sigar.getLoadAverage()(0)
instead of using
sigar.getCpuPerc.getCombined
?
https://github.com/jboner/akka/blob/wip-940-remeniuk/akka-cluster/src/main/scala/akka/cluster/metrics/MetricsProvider.scala#L131
on 2011-09-23 08:03 *
By vasil.remeniuk
Sigar#getLoadAverage uses JMX behind the scenes. I don't remember the details, but it either throws an exception on Windows, or returns -1.
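One way to handle both failure modes mentioned here (an exception or a negative sentinel) is to wrap the load-average read and fall back to a CPU-usage reading. This is only a sketch: `LoadProbe`, `CpuReading` and the function parameters are hypothetical, and the JDK's `OperatingSystemMXBean` is used as a stand-in source instead of Sigar, since its `getSystemLoadAverage` is documented to return a negative value when the load average is unavailable.

```scala
import java.lang.management.ManagementFactory
import scala.util.Try

// Tagged result so callers know which kind of number they received;
// load average and CPU percentage are on different scales.
sealed trait CpuReading
final case class LoadAverage(value: Double) extends CpuReading
final case class CpuCombined(value: Double) extends CpuReading

object LoadProbe {
  // Returns None if the source throws, or yields NaN or a negative sentinel.
  def safeLoadAverage(read: () => Double): Option[Double] =
    Try(read()).toOption.filter(v => !v.isNaN && v >= 0.0)

  // Prefers the load average; falls back to the CPU-usage source otherwise.
  def read(loadAverage: () => Double, cpuCombined: () => Double): CpuReading =
    safeLoadAverage(loadAverage) match {
      case Some(load) => LoadAverage(load)
      case None       => CpuCombined(cpuCombined())
    }

  // Stand-in source from the JDK (not Sigar): negative when unavailable.
  def jmxLoadAverage(): Double =
    ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage()
}
```

With Sigar on the classpath, the two sources would presumably be `() => sigar.getLoadAverage()(0)` and `() => sigar.getCpuPerc.getCombined`, as quoted in the comment above.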
on 2012-04-24 12:35 *
By Jonas Bonér
Assigned to changed from vasil.remeniuk to -none-
Status changed from Fixed to New
Needs to be changed to use the gossiping protocol (Gossip.meta map).
on 2012-06-18 09:35 *
By Jonas Bonér
Description changed from Dependent on #939
* P... to Dependent on #939
1. Use a...
on 2012-07-07 20:18 *
By Helena Edelson
(Comment removed)
Modifying the old design from ZooKeeper to the new cluster and Gossip protocol
on 2012-07-13 15:48 *
By Helena Edelson
Question on point 2. Only gossip out new information if an anomaly has been detected on the local node:
Who should we gossip to? Are all members supposed to receive the metrics meta?
on 2012-07-13 16:11 *
By Patrik Nordwall
It doesn't matter who you gossip to. It will eventually be spread to everyone. I thought all nodes might be interested in the metrics of all other nodes.
on 2012-07-18 18:46 *
By Helena Edelson
In reviewing the reqs and an email from Jonas, any metrics anomalies detected on a node should be added to gossip.meta, correct?
Currently, this is meta: Map[String, Array[Byte]]
It's not an ideal way to store metrics meta (the node address the anomalies were found on with the anomaly type identifiers) but can work abstractly to keep it light.
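To illustrate how light this could be, here is a sketch that packs the anomaly type identifiers for one node into a single byte stored under a per-node key. Everything in it is an assumption: `MetricsMeta`, the bit assignments and the `metrics-anomaly:` key prefix are hypothetical and not part of the Gossip API; only the `Map[String, Array[Byte]]` shape comes from the current `meta` type.

```scala
object MetricsMeta {
  // Hypothetical bit assignments for the anomaly type identifiers.
  val Cpu: Int = 1 << 0
  val Memory: Int = 1 << 1
  val Mbox: Int = 1 << 2

  // Hypothetical key convention: one entry per node address.
  private val KeyPrefix = "metrics-anomaly:"

  // Returns a new meta map with the given flags recorded for the node.
  def flag(meta: Map[String, Array[Byte]],
           nodeAddress: String,
           flags: Int): Map[String, Array[Byte]] =
    meta.updated(KeyPrefix + nodeAddress, Array(flags.toByte))

  // Reads the flags for a node; 0 means no anomaly has been recorded.
  def flagsFor(meta: Map[String, Array[Byte]], nodeAddress: String): Int =
    meta.get(KeyPrefix + nodeAddress)
      .flatMap(_.headOption)
      .map(_ & 0xff)
      .getOrElse(0)

  // Checks whether a particular anomaly type is set.
  def has(flags: Int, flag: Int): Boolean = (flags & flag) != 0
}
```

A single byte leaves room for eight anomaly types per node, and a receiving node only needs `flagsFor` and `has` to decide whether to pull the full stats.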
on 2012-07-18 19:07 *
By Helena Edelson
(Comment removed)
on 2012-07-18 20:43 *
By Patrik Nordwall
Yes, separate metrics task is better.
/Patrik
on 2012-09-24 23:35 *
By Helena Edelson
The transport mechanism is included in the merge for the cluster metrics api.
I would suggest decoupling these router tickets, closing this one, and creating a new cluster LB/metric router parent ticket.
on 2012-09-24 23:36 *
By Helena Edelson
also much of the description for this ticket is outdated :-)
Updating tickets (#939, #940, #1941, #2213, #2214, #2215, #2219, #2222, #2223, #2239, #2240, #2249, #2250, #2252, #2253, #2254, #2256, #2259, #2263, #2264, #2265, #2267, #2270, #2271, #2275, #2277, #2286, #2287, #2289, #2290, #2303, #2304, #2308, #2310, #2311, #2317, #2323, #2331, #2374, #2392, #2405, #2423, #2425, #2440, #2444, #2445, #2453, #2456, #2459, #2473, #2477, #2491, #2495, #2523, #2534, #2541, #2544, #2545, #2549, #2582, #2583, #2589, #2626)
Updating tickets (#939, #940, #1941, #2081, #2126, #2213, #2214, #2215, #2219, #2222, #2223, #2239, #2240, #2249, #2250, #2252, #2253, #2254, #2256, #2259, #2263, #2264, #2265, #2267, #2270, #2271, #2275, #2277, #2286, #2287, #2289, #2290, #2303, #2304, #2308, #2310, #2311, #2317, #2323, #2331, #2374, #2392, #2394, #2405, #2408, #2423, #2424, #2425, #2440, #2444, #2445, #2449, #2453, #2456, #2459, #2461, #2473, #2477, #2485, #2491, #2495, #2498, #2501, #2505, #2515, #2517, #2523, #2534, #2541, #2544, #2545, #2549, #2582, #2583, #2588, #2589, #2598, #2599, #2618, #2623, #2626, #2627, #2630, #2631, #2633, #2634, #2635, #2637, #2638, #2642, #2643, #2646, #2647, #2648, #2649, #2650, #2653, #2655, #2657, #2658)