Create LeastCPU router
Load average tells us three things: whether we can trust the reported CPU usage, whether the machine is overloaded, and whether a shortage of CPU might impact the application negatively.
If the load average is higher than the number of cores/processors, we should see near 100% CPU utilization; the further it falls below the number of processors, the more untapped CPU capacity there is. But see http://www.linuxjournal.com/article/9001?page=0,1
From https://github.com/akka/akka/blob/master/akka-cluster/src/main/scala/akka/cluster/ClusterReadView.scala via https://github.com/akka/akka/blob/master/akka-cluster/src/main/scala/akka/cluster/ClusterMetricsCollector.scala we get the following latest sampled metrics on the node (we do not track any of these data streams for volatility modeling because they are finite or already averages):
Metric("system-load-average"...)
SIGAR or JMX - the OS-specific system load average on the CPUs for the past 1 minute. Note: On some systems the JMX OS system load average may not be available, in which case the metric value is a None, but on the next sampling it could be a Some.
Metric("processors"...)
JMX - the total number of processors
Metric("total-cores"...)
SIGAR - the total number of cores.
Metric("cpu-combined"...)
SIGAR - the combined CPU sum of User + Sys + Nice + Wait, as a percentage. This metric describes how much time the CPU spent executing code during the sampling interval and how much more it theoretically could have. Note that 99% CPU utilization can be optimal or indicative of failure. In the data stream, this sometimes returns a valid metric value and sometimes NaN or Infinite. Documented bug https://bugzilla.redhat.com/show_bug.cgi?id=749121 and several others.
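Because the load average can be a None and cpu-combined can come back as NaN or Infinite, a router probably wants to discard such samples before comparing nodes. A minimal sketch of that filtering in Python follows. The function names `usable_metric` and `latest_usable` are made up for illustration and are not part of the Akka metrics API, and falling back to an older sample is an assumption about the desired behaviour.

```python
import math
from typing import Optional, Sequence


def usable_metric(value: Optional[float]) -> Optional[float]:
    """Return the sample if it is a finite number, otherwise None."""
    if value is None:
        return None
    if math.isnan(value) or math.isinf(value):
        return None
    return value


def latest_usable(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Return the most recent usable sample, scanning from newest to oldest.

    Assumes `samples` is ordered oldest first (hypothetical layout).
    """
    for sample in reversed(samples):
        cleaned = usable_metric(sample)
        if cleaned is not None:
            return cleaned
    return None
```

With this, a node whose last sample was NaN is judged on its previous valid sample, and a node with no valid sample at all yields None and can be skipped.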
A few possibly misleading cases (the load average only means something relative to the number of CPUs):
- If there is one CPU on a VM and the one-minute load average is 1.00, the VM has been utilizing its processor perfectly for the last 60 seconds.
- If there are four CPUs on a VM and the one-minute load average is 4.00, the VM has been utilizing its processors perfectly for the last 60 seconds.
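The two cases above become comparable once the load average is divided by the processor count. A rough sketch in Python, with the hypothetical helpers `load_per_core` and `untapped_capacity` (not taken from the Akka code base):

```python
def load_per_core(load_average: float, processors: int) -> float:
    """Normalise the one-minute load average by the number of processors."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return load_average / processors


def untapped_capacity(load_average: float, processors: int) -> float:
    """Fraction of CPU capacity the load average suggests is still free.

    Clamped at 0.0 when the load exceeds the processor count.
    """
    return max(0.0, 1.0 - load_per_core(load_average, processors))
```

Both a load of 1.00 on one CPU and a load of 4.00 on four CPUs give a per-core load of 1.0, while a load of 1.00 on four CPUs gives 0.25 and leaves roughly three quarters of the capacity unused.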
Old description:
Dependent on #940
- Base it on internal API for metrics
- Use Hyperic's Sigar
on 2011-06-16 12:41 *
By Jonas Bonér
Description changed from Dependent on #940
* Base i... to Dependent on #940
* Base i...
Needs to be retrieved from Git history (if it is of any use, else reimplemented from scratch) and adapted to new clustering.
on 2012-06-12 12:31 *
By Jonas Bonér
Description changed from Dependent on #940
* Base i... to Dependent on #940
* Base i...
on 2012-06-12 12:34 *
By Patrik Nordwall
We have similar things in Atmos, so we will double check that with the impl in git history.
on 2012-06-12 12:35 *
By Jonas Bonér
Sounds good. Please double check and comment on the parent task #940.
on 2012-09-30 19:18 *
By Helena Edelson
Description changed from Dependent on #940
* Base i... to In general, the idea of loa...
on 2012-09-30 19:24 *
By Helena Edelson
Description changed from In general, the idea of loa... to Load Average tells us if we...
on 2012-09-30 19:37 *
By Helena Edelson
Description changed from Load Average tells us if we... to Load Average tells us if we...
on 2012-09-30 19:37 *
By Helena Edelson
Description changed from Load Average tells us if we... to Load Average tells us if we...
on 2012-09-30 19:42 *
By Helena Edelson
Description changed from Load Average tells us if we... to Load Average tells us if we...
Pushed to https://github.com/helena/akka/commit/a6bf53df3ecca31fbb57b2c1ab1716f31f043a4e
If you are interested in it I can do a PR.
All preliminary metric work is completed
- creation of NodeMetricsComparator for ordering the (Address, Long/Double) values in question, to iterate through the nodes based on available routees (see the load balancing router) and return the address with min/max depending
- creation of sealed trait MetricValues and its impls HeapMemory, NetworkLatency and CPU, for clean extraction (conversion) of a particular metric from node.metric (heap mem used, system load average, etc.) and delegation to the cluster metrics API rather than exposing it in the cluster router API
- creation of MetricsAwareClusterNodeSelector for evaluating, extracting and returning the address of the node that fulfils the criteria of the load balancing router implementation in question.
- the above was created as a trait to allow for the created ClusterAdaptiveMetricsLoadBalancingRouter, which will be able to balance on all metrics rather than just one.
- creation of MetricsAwareClusterNodeSelector for extraction of the data without metric logic in the router package
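The ordering idea behind NodeMetricsComparator, picking an address by min or max over (address, value) pairs, can be sketched roughly as follows. This is Python rather than the Scala in the commit, and `select_address` and the node names are invented for illustration only.

```python
from typing import Iterable, Optional, Tuple

NodeValue = Tuple[str, float]  # (address, metric value)


def select_address(pairs: Iterable[NodeValue],
                   prefer_lowest: bool = True) -> Optional[str]:
    """Return the address with the lowest (or highest) metric value.

    Returns None when there are no candidates. On ties the first
    candidate in iteration order wins.
    """
    candidates = list(pairs)
    if not candidates:
        return None
    pick = min if prefer_lowest else max
    address, _ = pick(candidates, key=lambda pair: pair[1])
    return address
```

A CPU-based router would presumably ask for the lowest value, while a router based on available heap would ask for the highest.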
Creation of the following Router and Router Impls
- ClusterAdaptiveLoadBalancingRouterLike extends RoundRobinLike with LoadBalancer
Status: complete the strategy and getNext() algorithm for round robin selection based on healthiest node
Well stubbed out in MetricsAwareClusterNodeSelector:
- created CpuLoadBalancer - algorithm impl still needed that uses systemLoadAverage / combinedCPU, processors and cores to produce the address of the healthiest node by CPU - see ClusterAdaptiveLoadBalancingRouterLike
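One possible shape for the missing CPU algorithm, sketched in Python under loud assumptions: `NodeCpu`, `cpu_score` and `least_cpu_address` are hypothetical names, not the CpuLoadBalancer from the commit, and putting load-per-core and the cpu-combined percentage on one 0..1 scale is a simplification rather than an agreed design.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeCpu:
    """Hypothetical snapshot of the CPU metrics sampled for one node."""
    address: str
    processors: int
    system_load_average: Optional[float] = None
    cpu_combined: Optional[float] = None  # percentage, 0..100


def _finite(value: Optional[float]) -> Optional[float]:
    """Drop missing, NaN and infinite samples."""
    if value is None or not math.isfinite(value):
        return None
    return value


def cpu_score(node: NodeCpu) -> Optional[float]:
    """Lower is healthier. Prefer load per processor, fall back to cpu-combined."""
    load = _finite(node.system_load_average)
    if load is not None and node.processors > 0:
        return load / node.processors
    combined = _finite(node.cpu_combined)
    if combined is not None:
        return combined / 100.0
    return None  # no usable CPU metric for this node


def least_cpu_address(nodes: Iterable[NodeCpu]) -> Optional[str]:
    """Return the address of the node with the lowest score, if any."""
    scored = [(cpu_score(node), node.address) for node in nodes]
    usable = [(score, address) for score, address in scored if score is not None]
    if not usable:
        return None
    return min(usable, key=lambda pair: pair[0])[1]
```

Nodes without any usable CPU sample are left out of the selection here; whether they should instead be treated as worst case is an open design question.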
Unassigning myself, will not have time to complete soon enough
on 2012-10-19 09:19 *
By Patrik Nordwall
Assigned to set to Patrik Nordwall
Status changed from Test to Accepted