Create LeastCPU router

The load average tells us whether we can trust the CPU usage figure and whether the machine is overloaded, and also whether a shortage of CPU is likely to hurt the application.

If the load average is higher than the number of cores/processors, we should expect to see near-100% CPU utilization; the further the load average falls below the number of processors, the more untapped CPU capacity there is. But see http://www.linuxjournal.com/article/9001?page=0,1 for caveats.

From https://github.com/akka/akka/blob/master/akka-cluster/src/main/scala/akka/cluster/ClusterReadView.scala, via https://github.com/akka/akka/blob/master/akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala, we get the following latest sampled metrics for the node. (We do not track any of these data streams for volatility modeling, because they are either finite or already averages.)

Metric("system-load-average"...)
SIGAR or JMX - the OS-specific system load average on the CPUs over the past minute. Note: on some systems the JMX OS system load average may not be available, in which case the metric value is None, but on the next sampling it could be Some.

Metric("processors"...)
JMX - the total number of processors

Metric("total-cores"...)
SIGAR - the total number of cores.

Metric("cpu-combined"...)
SIGAR - the combined CPU sum of User + Sys + Nice + Wait, as a percentage. This metric describes how much of the sampling interval the CPU spent executing code, and therefore how much more it could theoretically take on. Note that 99% CPU utilization can be either optimal or a sign of failure. In the data stream this metric sometimes arrives with a valid value and sometimes as NaN or Infinite; see the documented bug https://bugzilla.redhat.com/show_bug.cgi?id=749121, among several others.
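Because `system-load-average` may be None on one sample and Some on the next, and `cpu-combined` can arrive as NaN or Infinite, a router would presumably have to reject unusable readings before comparing nodes. A minimal sketch, assuming each sampled value has already been extracted as an `Option[Double]`; the `MetricSanitizer` object and the "keep the last usable reading" policy are hypothetical, not part of Akka's metrics API:

```scala
object MetricSanitizer {
  // A reading is usable only if present and finite: None, NaN and +/-Infinity
  // (as sometimes reported for cpu-combined) are all rejected.
  def usable(sample: Option[Double]): Option[Double] =
    sample.filter(v => java.lang.Double.isFinite(v))

  // Hypothetical policy: when the newest sample is unusable, keep the most
  // recent usable reading instead of treating the node as having no metric.
  def latestUsable(samples: Seq[Option[Double]]): Option[Double] =
    samples.foldLeft(Option.empty[Double]) { (last, s) => usable(s).orElse(last) }
}
```

Falling back to the last usable reading trades freshness for availability; a real router might also want to age out readings that have been stale for too many sampling intervals.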

A few possibly misleading cases:
- If there is one CPU on a VM and the one-minute load average is 1.00, the VM has been utilizing its processor perfectly for the last 60 seconds.
- If there are four CPUs on a VM and the one-minute load average is 4.00, the VM has been utilizing its processors perfectly for the last 60 seconds.
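Both cases reduce to the same ratio once the one-minute load average is divided by the number of cores: 1.00/1 and 4.00/4 are both 1.0, i.e. full utilization. A minimal sketch of that normalization; the `LoadRatio` object and its `headroom` definition are illustrative assumptions, not an existing Akka API:

```scala
object LoadRatio {
  // Load average per core: 1.0 means every core was busy on average, values
  // above 1.0 mean there was more work than cores, values below 1.0 mean idle capacity.
  def perCore(loadAverage: Double, cores: Int): Double = {
    require(cores > 0, "core count must be positive")
    loadAverage / cores
  }

  // Hypothetical "untapped capacity" in [0, 1]: how far the load falls below
  // the core count, clamped at 0 when the machine is overloaded.
  def headroom(loadAverage: Double, cores: Int): Double =
    math.max(0.0, 1.0 - perCore(loadAverage, cores))
}
```

For example, a load average of 2.00 on four cores gives a per-core load of 0.5 and a headroom of 0.5, while 6.00 on four cores is clamped to a headroom of 0.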

Old description:
Dependent on #940

Base it on internal API for metrics
Use Hyperic's Sigar

Check out: https://github.com/akka/akka/commit/3cee2fc8ec18f6e5aa61e714083b033c5ac21d38

on 2011-06-16 12:41

By Jonas Bonér

Description changed from Dependent on #940 * Base i... to Dependent on #940 * Base i...

on 2011-06-25 16:41

By Jonas Bonér

Updating tickets (#769, #774, #875, #889, #917, #920, #928, #929, #930, #931, #932, #933, #934, #935, #936, #938, #939, #940, #941, #942, #943, #944, #951, #958, #959, #960, #962, #630, #870, #891, #895)

on 2011-06-25 17:02

By Jonas Bonér

Milestone changed from 2.0 to 2.1

on 2011-07-06 12:16

By Jonas Bonér

Assigned to changed from pveentjer to -none-

Updating tickets (#87, #620, #644, #679, #750, #752, #753, #754, #764, #875, #876, #929, #938, #939, #940, #941, #942, #943, #944, #953, #954, #977, #983, #987, #996, #630, #643, #725, #892, #893)

on 2011-10-03 14:57

By Jonas Bonér

Status changed from New to Fixed

Updating tickets (#941, #942, #943)

on 2012-04-24 14:38

By Jonas Bonér

Status changed from Fixed to New

Needs to be retrieved from the Git history (if it is of any use; otherwise reimplemented from scratch) and adapted to the new clustering.

on 2012-06-12 12:31

By Jonas Bonér

Description changed from Dependent on #940 * Base i... to Dependent on #940 * Base i...

on 2012-06-12 12:34

By Patrik Nordwall

We have similar functionality in Atmos, so we will double-check it against the implementation in the Git history.

on 2012-06-12 12:35

By Jonas Bonér

Sounds good. Please double check and comment on the parent task #940.

on 2012-06-13 14:23

By Helena Edelson

Assigned to set to login

on 2012-09-28 13:22

By Patrik Nordwall

Priority changed from Normal (3) to High (2)

on 2012-09-30 19:18

By Helena Edelson

Description changed from Dependent on #940 * Base i... to In general, the idea of loa...

on 2012-09-30 19:24

By Helena Edelson

Description changed from In general, the idea of loa... to Load Average tells us if we...

on 2012-09-30 19:37

By Helena Edelson

Description changed from Load Average tells us if we... to Load Average tells us if we...

on 2012-09-30 19:37

By Helena Edelson

Description changed from Load Average tells us if we... to Load Average tells us if we...

on 2012-09-30 19:42

By Helena Edelson

Description changed from Load Average tells us if we... to Load Average tells us if we...

on 2012-10-18 20:49

By Helena Edelson

Status changed from New to Test

Pushed to https://github.com/helena/akka/commit/a6bf53df3ecca31fbb57b2c1ab1716f31f043a4e
If you are interested in it, I can open a PR.

All preliminary metric work is completed:
- creation of NodeMetricsComparator for ordering the (Address, Long/Double) values in question, to iterate through the nodes based on available routees (see the load balancing router) and return the address with the min or max value, depending on the criterion
- creation of sealed trait MetricValues and its implementations HeapMemory, NetworkLatency and CPU, for clean extraction (conversion) of a particular node metric (heap memory used, system load average, etc.) and delegation to the cluster metrics API rather than exposing it in the cluster router API
- creation of MetricsAwareClusterNodeSelector for evaluating and extracting metrics and getting the address of the node that fulfils the criteria of the load balancing router implementation in question
- the above was created as a trait to allow for the new ClusterAdaptiveMetricsLoadBalancingRouter, which will be able to balance by all metrics rather than just one
- creation of MetricsAwareClusterNodeSelector for extraction of the data without metric logic in the router package
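The comparator-based selection described above (order `(Address, value)` pairs for the nodes that host routees and return the address with the min or max value) could look roughly like the following. This is a sketch under stated assumptions: `Address` is a stand-in case class rather than `akka.actor.Address`, the `MetricValues` hierarchy only mirrors the names in the comment (its fields are hypothetical), and `NodeSelection.select` is an illustrative name, not the actual NodeMetricsComparator:

```scala
// Hypothetical stand-in for akka.actor.Address, to keep the sketch self-contained.
final case class Address(host: String, port: Int)

// Mirrors the MetricValues idea: one typed wrapper per metric family, each
// exposing a single comparable number. Field names are assumptions.
sealed trait MetricValues { def value: Double }
final case class HeapMemory(usedBytes: Double) extends MetricValues { def value: Double = usedBytes }
final case class NetworkLatency(millis: Double) extends MetricValues { def value: Double = millis }
final case class CPU(loadPerCore: Double) extends MetricValues { def value: Double = loadPerCore }

object NodeSelection {
  // Considers only nodes that currently host routees, then returns the address
  // with the minimum value (or the maximum when pickMax is true).
  def select(
      metrics: Map[Address, MetricValues],
      routeeNodes: Set[Address],
      pickMax: Boolean
  ): Option[Address] = {
    val candidates = metrics.toSeq.filter { case (addr, _) => routeeNodes.contains(addr) }
    if (candidates.isEmpty) None
    else {
      val chosen =
        if (pickMax) candidates.maxBy(_._2.value)
        else candidates.minBy(_._2.value)
      Some(chosen._1)
    }
  }
}
```

Keeping the min/max choice as a parameter lets the same selection serve metrics where lower is healthier (heap used, latency, load) and any where higher is healthier.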

Creation of the following router and router implementations:
- ClusterAdaptiveLoadBalancingRouterLike extends RoundRobinLike with LoadBalancer
Status: complete the strategy and getNext() algorithm for round robin selection based on healthiest node
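One reading of "round robin selection based on healthiest node" is: pick the healthiest node from the latest metrics, then rotate through the routees deployed on that node with a shared counter, much as a plain round-robin router rotates through all routees. A minimal sketch under that assumption; `HealthiestRoundRobin`, the `String` node identifiers and the health map are hypothetical and are not the `RoundRobinLike` API:

```scala
import java.util.concurrent.atomic.AtomicLong

// Routees are grouped by the node (here just a String address) they run on.
final class HealthiestRoundRobin[R](routeesByNode: Map[String, IndexedSeq[R]]) {
  private val next = new AtomicLong(0L)

  // health: lower is healthier (e.g. load average per core). Nodes without
  // routees or without a health reading are skipped.
  def select(health: Map[String, Double]): Option[R] = {
    val eligible = routeesByNode.filter { case (node, rs) => rs.nonEmpty && health.contains(node) }
    if (eligible.isEmpty) None
    else {
      val (_, routees) = eligible.minBy { case (node, _) => health(node) }
      // Rotate among the chosen node's routees; floorMod keeps the index
      // non-negative even after the counter overflows.
      val i = java.lang.Math.floorMod(next.getAndIncrement(), routees.size.toLong).toInt
      Some(routees(i))
    }
  }
}
```

Sending everything to a single node until its metrics change can overload it between samples; a weighted variant that spreads messages in proportion to spare capacity is a common alternative.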

Well stubbed out in MetricsAwareClusterNodeSelector:
- created CpuLoadBalancer - algorithm implementation still needed to combine systemLoadAverage / combinedCPU, processors and cores into the address of the healthiest node by CPU - see ClusterAdaptiveLoadBalancingRouterLike
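The missing CpuLoadBalancer algorithm has to turn system load average, combined CPU, processors and cores into a single choice of node. One possible approach, stated as an assumption rather than the intended design: prefer load average per core when both values are usable, fall back to `cpu-combined` (a percentage, rescaled to 0..1) otherwise, and skip nodes with neither. `CpuSample` and `CpuScoring` are hypothetical names:

```scala
// One node's latest CPU-related samples; any of them may be missing.
final case class CpuSample(
    node: String,
    systemLoadAverage: Option[Double],  // SIGAR or JMX, may be None
    cpuCombinedPercent: Option[Double], // SIGAR, assumed 0..100, may be NaN/Infinite
    totalCores: Option[Int],            // SIGAR
    processors: Option[Int]             // JMX
)

object CpuScoring {
  private def finite(v: Option[Double]): Option[Double] =
    v.filter(x => java.lang.Double.isFinite(x))

  // Lower score = more spare CPU. Load per core is preferred; cpu-combined,
  // rescaled to a 0..1 fraction, is the fallback.
  def score(s: CpuSample): Option[Double] = {
    val cores = s.totalCores.orElse(s.processors).filter(_ > 0)
    val loadPerCore = for {
      load <- finite(s.systemLoadAverage)
      n <- cores
    } yield load / n
    loadPerCore.orElse(finite(s.cpuCombinedPercent).map(_ / 100.0))
  }

  // Address of the node with the lowest usable score, if any.
  def healthiest(samples: Seq[CpuSample]): Option[String] = {
    val scored = samples.flatMap(s => score(s).map(sc => (s.node, sc)))
    if (scored.isEmpty) None else Some(scored.minBy(_._2)._1)
  }
}
```

Ranking one node by load per core and another by combined CPU mixes two different measures on a roughly similar 0..1 scale; it is an approximation, and load per core can exceed 1.0 on an overloaded node.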

on 2012-10-18 20:49

By Helena Edelson

Assigned to changed from Helena Edelson to -none-

Unassigning myself; I will not have time to complete this soon enough.

on 2012-10-19 09:19

By Patrik Nordwall

Assigned to set to Patrik Nordwall

Status changed from Test to Accepted

on 2012-11-15 07:52

By Patrik Nordwall

Status changed from Accepted to Test

on 2012-12-01 08:38