Cluster usage of SecureRandom is slow
Change to ThreadLocalRandom
on 2012-05-30 20:53 *
By Jonas Bonér
It actually is thread safe (java.util.Random as well).
But using ThreadLocalRandom is better anyway since it reduces contention.
Good.
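To illustrate the contention point: a single java.util.Random shared by all threads is safe, but every call updates the same seed, whereas ThreadLocalRandom.current() hands each thread its own generator. The class and method names below are hypothetical and are not taken from the Akka code base; this is only a minimal sketch of the two styles.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, for illustration only (not Akka code).
final class RandomChoice {
    private RandomChoice() {}

    // One instance shared by all threads: thread safe, but every call
    // updates the same seed, so threads compete with each other.
    private static final Random SHARED = new Random();

    // Returns a value in [0, bound) from the shared generator.
    static int sharedNext(int bound) {
        return SHARED.nextInt(bound);
    }

    // Returns a value in [0, bound) from the calling thread's own generator,
    // so there is no shared state to contend on.
    static int perThreadNext(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }
}
```

Both methods return the same kind of result; the difference only shows up under concurrent load.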
on 2012-05-30 23:08 *
By Patrik Nordwall
Ah, that is true. Then I'm puzzled: I changed to ThreadLocalRandom and it solved the gossiping failure that I have been debugging on scalable1 (the 20 second delay that I showed you). Perhaps it's a heisenbug, but the tests run fine on scalable1 now with ThreadLocalRandom. I have started a script that runs all remote and cluster tests many times in a row; we'll see the result later.
on 2012-05-31 00:17 *
By Patrik Nordwall
The problem is totally reproducible on scalable1 (Linux). I verified again that it is the change to ThreadLocalRandom, commit cd8e0ab3, that solves it. I think the cause is that SHA1PRNG is very slow on Linux, and in tests we do frequent gossiping. See for example http://stackoverflow.com/questions/137212/how-to-solve-performance-problem-with-java-securerandom
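As a rough sketch of why the choice of generator matters for frequent gossiping: each gossip round has to pick a random peer, so the generator sits on a hot path. The example below assumes a plain list of peers and uses hypothetical names (GossipPeerSelector, selectRandomPeer); it is not the actual cluster code from the commit.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of random peer selection, not the Akka implementation.
final class GossipPeerSelector {
    private GossipPeerSelector() {}

    // Picks one peer uniformly at random using the calling thread's generator.
    // No cryptographic strength is needed here, so SecureRandom is not used.
    static <T> T selectRandomPeer(List<T> peers) {
        if (peers.isEmpty()) {
            throw new IllegalArgumentException("peers must not be empty");
        }
        int index = ThreadLocalRandom.current().nextInt(peers.size());
        return peers.get(index);
    }
}
```

A call such as GossipPeerSelector.selectRandomPeer(java.util.Arrays.asList("node1", "node2", "node3")) returns one of the three names without ever touching the system entropy pool.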
on 2012-05-31 00:17 *
By Patrik Nordwall
Summary changed from Cluster usage of SecureRandom is not thread safe to Cluster usage of SecureRandom is slow
on 2012-05-31 00:34 *
By Jonas Bonér
Ah, damn.
This is what happens with bad memory. I'm getting old. I ran into the exact same problem some time ago, debugged it and fixed it.
See this commit message:
commit ec7772b7869a492f9e37d4fa0298de81a504a5cd
Author: Jonas Bonér <jonas@jonasboner.com>
Date: Fri Feb 3 14:55:16 2012 +0100
Fixes bug in RandomRouter.
Fixes an interesting "bug" in RandomRouter. Tests failed on my 12 core Linux box. After some investigation I found that it hanged randomly inside the SecureRandom seed generator.
[JVM-Node4] "main" prio=10 tid=0x0000000001701000 nid=0x1942 runnable [0x00007fee631dc000]
[JVM-Node4] java.lang.Thread.State: RUNNABLE
[JVM-Node4] at java.io.FileInputStream.readBytes(Native Method)
[JVM-Node4] at java.io.FileInputStream.read(FileInputStream.java:236)
[JVM-Node4] at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedBytes(SeedGenerator.java:509)
[JVM-Node4] at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
[JVM-Node4] at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
[JVM-Node4] at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
[JVM-Node4] - locked <0x00000007c3d84130> (a sun.security.provider.SecureRandom)
[JVM-Node4] at java.security.SecureRandom.nextBytes(SecureRandom.java:450)
[JVM-Node4] - locked <0x00000007c3d843d0> (a java.security.SecureRandom)
[JVM-Node4] at java.security.SecureRandom.next(SecureRandom.java:472)
[JVM-Node4] at java.util.Random.nextInt(Random.java:272)
[JVM-Node4] at akka.routing.RandomLike$class.getNext$2(Routing.scala:466)
Puzzled at first I Googled the problem and found this bug report: http://bugs.sun.com/view_bug.do?bug_id=6521844
In short it is designed to block on /dev/random (on Linux) when the entropy pool is empty until some "environmental noise is gathered".
From the Linux manual:
"Hanging at generateSeed is not a bug, since that's what was designed:
When the entropy pool is empty, reads from /dev/random will block until
additional environmental noise is gathered.
(Source: Linux Programmer's Manual, section 4)"
Fix was to switch to java.util.Random.
Fun one
on 2012-05-31 02:42 *
By viktorklang
You still look like you're 25 though!
on 2012-05-31 18:45 *
By Jonas Bonér
Ah, that warms my heart.
Liar.
on 2012-06-04 17:28 *
By Patrik Nordwall
(In revision:cba5c9b27a2076bb3909d5f057863bd2abb1a042) Merge pull request #497 from akka/wip-2123-cluster-random-patriknw
Cluster usage of SecureRandom is slow, see #2153
Branch: wip-2134-deathwatch2.0-√
Updating tickets (#620, #679, #725, #750, #752, #753, #754, #763, #789, #870, #893, #922, #953, #954, #971, #977, #983, #985, #987, #991, #1026, #1045, #1051, #1060, #1061, #1084, #1098, #1099, #1133, #1134, #1135, #1136, #1137, #1194, #1225, #1226, #1243, #1245, #1247, #1248, #1254, #1261, #1300, #1317, #1391, #1412, #1791, #1793, #1901, #1908, #1911, #1912, #1913, #1914, #1915, #1916, #1917, #1922, #1983, #1987, #1996, #1997, #1998, #2066, #2077, #2105, #2117, #2133, #2143, #2149, #2151, #2152, #2153, #2155, #2157, #2158, #2159, #2160, #2161, #2162, #2163, #2164, #2165, #2167, #2171, #2175, #2176, #2177, #2180, #2182, #2184, #2185, #2193, #2199, #2202, #2204, #2206, #2207, #2209, #2210)