Create topology-aware routers
Similar to Voldemort's Zones: https://github.com/voldemort/voldemort/wiki/Topology-awareness-capability
Would allow spreading the risk by routing to different instances on different racks or datacenters.
Also see Cassandra's RackAwareStrategy: http://wiki.apache.org/cassandra/Operations#Replication
Cassandra's config is like this, with host:port mapped to 'Data Center:Rack':
#Cassandra Node IP:Port=Data Center:Rack
192.168.1.200\:7000=dc1:r1
192.168.2.300\:7000=dc2:rA
Parsed by: http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.6.1/contrib/property_snitch/src/java/org/apache/cassandra/locator/PropertyFileEndPointSnitch.java
But I think I prefer Voldemort's scheme. Let's think about what is most useful in our environment and come up with our own.
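To make the idea concrete, here is a minimal sketch of zone-aware routee selection in the Voldemort style. All names (Routee, ZoneAwareSelector) are hypothetical, not any real Akka or Voldemort API: routees carry a zone tag, and the selector prefers routees in the caller's own zone, falling back to remote ones only when no local routee exists.

```scala
// Hypothetical sketch of zone-aware routing; Routee and ZoneAwareSelector
// are illustrative names, not part of any real API.
case class Routee(id: String, zone: String)

class ZoneAwareSelector(localZone: String, routees: Seq[Routee]) {
  // Split routees into same-zone (rack/datacenter) and remote ones.
  private val (local, remote) = routees.partition(_.zone == localZone)
  private var i = -1

  // Round-robin over the local zone; fall back to remote zones only
  // when no local routee is available.
  def select(): Routee = {
    val pool = if (local.nonEmpty) local else remote
    i = (i + 1) % pool.size
    pool(i)
  }
}
```

With routees in dc1 and dc2 and a local zone of dc1, every selection lands on a dc1 routee until dc1 has none left, which is the "optimize for the closest zones" behavior the Voldemort wiki describes.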
on 2012-06-16 11:17 *
By Jonas Bonér
Description changed from Similar to Voldermort's Zon... to Similar to Voldermort's Zon...
on 2012-06-16 12:50 *
By Jonas Bonér
Description changed from Similar to Voldermort's Zon... to Similar to Voldermort's Zon...
on 2012-06-16 15:34 *
By Helena Edelson
I'd like to do this feature, if I can steal it from Patrick and work with him.
Our use case relates to global network topology and availability zones (AZs). I've reviewed the linked resources above.
on 2013-07-14 18:20 *
By Helena Edelson
Started a Google Doc to outline an initial proposal - will share the link with Jonas/Patrik/Roland/Viktor on Monday, once general ideas from the linked resources and some real-world use cases are incorporated.
on 2013-07-14 19:26 *
By Helena Edelson
Jonas - here's the most recent parser version of the above snitch link:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/PropertyFileSnitch.java
I recommend considering these carefully at the conceptual level:
Cassandra's Ec2MultiRegionSnitch, which gossips the EC2 region:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/Ec2MultiRegionSnitch.java
Voldemort's zone routing strategy:
https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/routing/ZoneRoutingStrategy.java
Cassandra's network topology snitch, which routes requests more efficiently by taking data center and rack into consideration:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/AbstractNetworkTopologySnitch.java
on 2013-07-15 07:29 *
By Patrik Nordwall
Sounds good
on 2013-07-15 14:34 *
By Helena Edelson
Assigned to set to Helena Edelson
Status changed from New to Accepted
on 2013-07-16 20:27 *
By Helena Edelson
I can start this in 2 weeks, will that work?
on 2013-08-10 17:37 *
By Helena Edelson
I finally have a moment free to start this. Here is an initial framework concept stubbed out, with no topology logic yet, just the general collection / gossip / delivery / router structures and selectors: https://github.com/helena/akka/commit/f9fc258da3c221f0f161f6a79eff70aba2ba280d
Let me know if this is the right strategy for the routers WRT holding the logic necessary.
- Helena
on 2013-08-10 17:42 *
By Patrik Nordwall
Great, I will look at it next week
on 2013-08-13 09:22 *
By Patrik Nordwall
Wrote some comments/questions in https://github.com/helena/akka/commit/f9fc258da3c221f0f161f6a79eff70aba2ba280d
on 2013-08-13 11:45 *
By Helena Edelson
(Comment removed)
on 2013-08-29 19:27 *
By Helena Edelson
(Comment removed)
on 2013-08-30 16:03 *
By Helena Edelson
(Comment removed)
on 2013-09-06 00:33 *
By Helena Edelson
This is the initial concept describing the Network Topology API, which will be used by the TopologyAwareRouter to select nearest neighbors.
https://github.com/helena/akka/commit/45956d1d4737cc75ced6b9301d71a4ee4e5adc9d
on 2013-09-09 10:44 *
By Patrik Nordwall
The two typical use cases for the router, stolen from Voldemort (https://github.com/voldemort/voldemort/wiki/Topology-awareness-capability):
1) "Now due to possible high latency between these zones we want the routing strategy in the client to optimize for the closest zones."
2) "Now since these zones may be geographically distributed we can expect network partitions to be very common. Hence we would like every store to have some amount of redundancy in every zone."
I looked at the new suggestion, and talked a bit with the other guys at the office.
I'm not sure what the best way is to define these things in the configuration.
I think we have seen two alternatives so far:
1) flat structure of zones with proximity-list
2) hierarchical structure, like EC2
I'm not sure which one is best. Perhaps the hierarchical is more convenient after all, but then it should not be fixed to 2 levels (regions, availability zones).
In addition to the host name (wildcard) matching we could later add matching of cluster node role, which makes it possible at startup of a node to "place" it in a zone without depending on a specific hostname schema.
This feature should be born in the contrib area, since we want to be able to evolve it more easily based on usage feedback.
http://doc.akka.io/docs/akka/2.2.1/contrib/index.html
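To make the two alternatives concrete, here are hedged config sketches. The key names are hypothetical, not an agreed format:

# 1) Flat structure of zones with proximity-list
zones {
  dc1 { proximal-to = [dc2, dc3] }
  dc2 { proximal-to = [dc1, dc3] }
  dc3 { proximal-to = [dc2, dc1] }
}

# 2) Hierarchical structure, like EC2, but not fixed to 2 levels
regions {
  eu-west {
    zones {
      eu-west-1a { }
      eu-west-1b { }
    }
  }
}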
on 2013-09-09 14:07 *
By Helena Edelson
Yes:
1) high latency between zones - awareness of nearest-neighbor (data center, availability zones..)
2) redundancy in every zone - this part is done in terms of failover: instances per data center, data centers per geographical region
What it defines now is, per geographical partition: Region > DataCenters and their proximal DCs > for each DC, its deployed instances (so that we can map an instance (cluster node) to its DataCenter). I added this yesterday but it is not ready to push.
This also allows mapping only those instances per DataCenter that are up, applying the failure detector strategy to know when a data center goes down, removing it from the proximal DCs for a given DC, and periodically checking for when it comes back online.
That all said, this has to cover cloud, rack, and neither of the two user types - i.e., the non-cloud, single-location deployment in one data center.
The latter case means I could add a configuration to describe a flat topology, but even so that could fall into what we have thus far:
topology {
  # One region (geographical partition) - all local to a NOC
  paris-dc-1 {
    zone-id = 0
    # List these in order of proximity as this implicitly becomes their weight.
    # Defaults to proximal-to = []
    proximal-to = [1]
    instances = [....]
  }
  # Same location, multiple DCs
  paris-dc-2 {
    zone-id = 1
    proximal-to = [0]
    instances = [....]
  }
}
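A minimal sketch of how the proximal-to ordering could drive failover, assuming the config above has been parsed into values. DataCenter and selectTarget are illustrative names, not an agreed API:

```scala
// Hypothetical failover selection over a proximal-to ordering like the
// config above; the ordering is treated as the implicit weight.
case class DataCenter(zoneId: Int, proximalTo: List[Int], up: Boolean)

object Failover {
  // Prefer the local DC if it is up; otherwise walk its proximal-to list
  // in order and take the first DC that is up.
  def selectTarget(local: DataCenter, all: Map[Int, DataCenter]): Option[DataCenter] =
    if (local.up) Some(local)
    else local.proximalTo.flatMap(all.get).find(_.up)
}
```

When a DC is marked down by the failure detector, callers simply see it skipped in favor of the next proximal DC; when it comes back online, flipping its up flag restores the original preference without any reconfiguration.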
If just one datacenter, failovers are to deployed instances which we can map to in the cluster for those that have joined.
Distilled, this ultimately needs to be flexible but hierarchical IMO.
Optimally, it would be nice to offer a configurable strategy type - by data center or by rack, but i think the light partition wrapper around these of by region is important and will become more so over time.
This could simply be:
config match {
  case "rack"       => RackStrategy
  case "dataCenter" => DataCenterStrategy
}
on 2013-09-09 14:08 *
By Helena Edelson
I see that contrib has the dep for cluster, I will move it there.
on 2013-09-09 14:35 *
By Helena Edelson
I see common logic for node awareness between the new ClusterNetworkTopology and the existing ClusterMetricsCollector actors.
I'm going to push to GitHub an idea that describes a new ClusterMemberAware base trait for actors to mix in, where the watching of Up/Downed etc. node logic would reside (what is currently in the ClusterMetricsCollector), so that any actor can leverage it without duplication.
The ClusterNetworkTopology would use this to track nodes that are up in its DC and pass them to the topology-aware router.
This would provide a flexible topology vs a static one.
Mapping to node roles is actually almost done in my branch.
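The mixin idea can be sketched without the real Akka cluster event types. The trait and method names here are hypothetical, and members are reduced to plain address strings:

```scala
// Hypothetical sketch of a ClusterMemberAware mixin: it only tracks which
// member addresses are up, so topology or metrics logic can reuse it.
// This deliberately avoids the real Akka cluster API.
trait ClusterMemberAware {
  private var upMembers = Set.empty[String]

  def memberUp(address: String): Unit = {
    upMembers += address
    onMembersChanged(upMembers)
  }

  def memberDowned(address: String): Unit = {
    upMembers -= address
    onMembersChanged(upMembers)
  }

  // Hook for mixers such as a ClusterNetworkTopology to react to changes.
  def onMembersChanged(up: Set[String]): Unit = ()
}

// Example mixer: collects the up nodes to hand to a topology-aware router.
class NetworkTopologyTracker extends ClusterMemberAware {
  var current = Set.empty[String]
  override def onMembersChanged(up: Set[String]): Unit = current = up
}
```

The point of the trait is that the up/down bookkeeping lives in one place, and both the metrics collector and the topology actor only override the change hook.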
- Helena
on 2013-09-10 15:50 *
By Patrik Nordwall
Sounds like you have a good plan.
on 2013-12-10 16:40 *
By Patrik Nordwall
Moving back to middle column due to inactivity.