Skip to content

Add a fast path for hashing numerics and immediates#9102

Merged
headius merged 3 commits intojruby:10.1-devfrom
headius:fast-path-hashing
Dec 2, 2025
Merged

Add a fast path for hashing numerics and immediates#9102
headius merged 3 commits intojruby:10.1-devfrom
headius:fast-path-hashing

Conversation

@headius
Copy link
Copy Markdown
Member

@headius headius commented Dec 1, 2025

See #9086

This is an attempt to eliminate dynamic hash calls for all numeric and immediate values (nil, booleans). Additional changes to support this:

  • The per-runtime hash randomizer becomes per-process.
  • safeHash logic will no longer ever dispatch to the hash method for the fast path SimpleHash subtypes.

@headius headius force-pushed the fast-path-hashing branch 6 times, most recently from 3930eb9 to 3837ce6 Compare December 2, 2025 03:51
@headius
Copy link
Copy Markdown
Member Author

headius commented Dec 2, 2025

This is basically done. Several core types that can't normally be extended have been set up to do "simple hashing", and the safeHash logic detects that to avoid constructing unnecessary intermediate objects. SimpleHash objects encountered during this process will not be entered into the safeRecurse identity hashes and will provide their long hash values directly.

The following types implement SimpleHash:

  • Both RubyInteger subtypes, RubyFixnum and RubyBignum
  • RubyFloat
  • RubyBigDecimal
  • RubyNil
  • RubyBoolean, for true and false immediates

Any types that used safeHash now will try first to use simple hashing via safeHashLong, including:

  • RubyArray
  • RubyHash
  • RubyArithmeticSequence

To support randomized hashing, the seed has been promoted to classloader-global from runtime-global. Stable hashing can still be enabled globally, although it was not configurable on a per-runtime basis before. All runtimes in a given JVM will use the same randomized seed. This could be a larger concern, so I am not merging yet.

headius added a commit to headius/jruby that referenced this pull request Dec 2, 2025
See jruby#9086

This is an attempt to eliminate dynamic hash calls for all numeric
and immediate values (nil, booleans).

See jruby#9102 for more details on this implementation.
@headius
Copy link
Copy Markdown
Member Author

headius commented Dec 2, 2025

A quick benchmark shows the benefit of this change:

BEFORE:

$ jruby -rbenchmark -e "ARY = (1000..2000).to_a; loop { puts Benchmark.measure { i = 0; while i < 100_000; ARY.hash + ARY.hash + ARY.hash; i+=1; end } }"
 10.250000   0.320000  10.570000 ( 10.591724)
  7.280000   0.100000   7.380000 (  7.422632)
  5.760000   0.050000   5.810000 (  5.746896)
  5.380000   0.040000   5.420000 (  5.382854)
  5.300000   0.040000   5.340000 (  5.295419)

AFTER:

$ jruby -rbenchmark -e "ARY = (1000..2000).to_a; loop { puts Benchmark.measure { i = 0; while i < 100_000; ARY.hash + ARY.hash + ARY.hash; i+=1; end } }"              
  1.180000   0.010000   1.190000 (  1.120627)
  1.080000   0.000000   1.080000 (  1.062609)
  1.060000   0.010000   1.070000 (  1.058566)
  1.140000   0.010000   1.150000 (  1.062300)
  1.060000   0.000000   1.060000 (  1.058508)

This is very nearly as fast as CRuby with YJIT, but the Fixnum construction cost and other missing optimizations are likely to blame.

Further improvements could be done here by making more core types be simple hashable if their hash methods have not been overridden. Most of the core types could be optimized in this way to do no dynamic dispatch at all for hashing.

@headius
Copy link
Copy Markdown
Member Author

headius commented Dec 2, 2025

A naive implementation of fast-path hashing for String:

diff --git a/core/src/main/java/org/jruby/runtime/Helpers.java b/core/src/main/java/org/jruby/runtime/Helpers.java
index 5d39c27df1..bd252c6172 100644
--- a/core/src/main/java/org/jruby/runtime/Helpers.java
+++ b/core/src/main/java/org/jruby/runtime/Helpers.java
@@ -3123,8 +3123,12 @@ public class Helpers {
 
     // MRI: rb_hash but fast path for simple hashable objects
     public static long safeHashLong(final ThreadContext context, IRubyObject obj) {
-        if (obj instanceof SimpleHash) {
-            return safeHashLong((SimpleHash) obj);
+        if (obj instanceof SimpleHash simpleHash) {
+            return safeHashLong(simpleHash);
+        } else if (obj instanceof RubyString) {
+            if (sites(context).hash.isBuiltin(obj)) {
+                return obj.hashCode();
+            }
         }
 
         return safeHash(context, obj).getLongValue();

This improves performance of a string-based array hash benchmark many times and beats YJIT:

BEFORE:

$ jruby -rbenchmark -e "ARY = (1000..2000).to_a; loop { puts Benchmark.measure { i = 0; while i < 100_000; ARY.hash + ARY.hash + ARY.hash; i+=1; end } }"
 10.250000   0.320000  10.570000 ( 10.591724)
  7.280000   0.100000   7.380000 (  7.422632)
  5.760000   0.050000   5.810000 (  5.746896)
  5.380000   0.040000   5.420000 (  5.382854)
  5.300000   0.040000   5.340000 (  5.295419)

AFTER:

$ jruby -rbenchmark -e "ARY = (1000..2000).map(&:to_s).to_a; loop { puts Benchmark.measure { i = 0; while i < 100_000; ARY.hash + ARY.hash + ARY.hash; i+=1; end } }"
  2.040000   0.010000   2.050000 (  1.947957)
  2.630000   0.020000   2.650000 (  2.621692)
  2.390000   0.010000   2.400000 (  2.418628)
  2.010000   0.010000   2.020000 (  2.039710)
  2.010000   0.000000   2.010000 (  1.998742)
  2.000000   0.010000   2.010000 (  2.011736)
  2.000000   0.000000   2.000000 (  1.997365)
  1.970000   0.010000   1.980000 (  1.992914)
  2.000000   0.070000   2.070000 (  1.980395)
  1.830000   0.000000   1.830000 (  1.830372)
  1.800000   0.000000   1.800000 (  1.803007)
  1.800000   0.010000   1.810000 (  1.809211)
  1.810000   0.000000   1.810000 (  1.810937)
  1.850000   0.010000   1.860000 (  1.819400)

YJIT:

$ cx ruby-master ruby --yjit -rbenchmark -e "ARY = (1000..2000).map(&:to_s).to_a; loop { puts Benchmark.measure { i = 0; while i < 100_000; ARY.hash + ARY.hash + ARY.hash; i+=1; end } }"
  1.919594   0.006731   1.926325 (  1.947849)
  1.929497   0.003139   1.932636 (  1.938189)
  1.924403   0.002287   1.926690 (  1.930522)
  1.906535   0.002409   1.908944 (  1.913716)
  1.907520   0.005783   1.913303 (  1.926194)
  1.900510   0.004719   1.905229 (  1.911048)

(YJIT is not super relevant here... CRuby has similar optimizations in their core classes that we've simply never implemented.)

@headius
Copy link
Copy Markdown
Member Author

headius commented Dec 2, 2025

The logic in CRuby that bypasses dynamic calls to hash for several other core types:

https://github.com/ruby/ruby/blob/6014ed99680fd3beb013dd8564f81f6c2f2a9f34/hash.c#L170-L210

@headius
Copy link
Copy Markdown
Member Author

headius commented Dec 2, 2025

So it turns out CRuby doesn't dispatch to hash for either Symbol or String, even if you extend String and define your own hash method. This appears to have been the behavior for a very long time. Patch incoming.

@headius headius changed the base branch from master to 10.1-dev December 2, 2025 16:57
See jruby#9086

This is an attempt to eliminate dynamic hash calls for all numeric
and immediate values (nil, booleans).

See jruby#9102 for more details on this implementation.
CRuby appears to skip calling `hash` for Symbol and String in all
cases, including subclasses of String that replace the `hash`
method.
@headius headius marked this pull request as ready for review December 2, 2025 18:09
@headius headius merged commit b012f56 into jruby:10.1-dev Dec 2, 2025
77 checks passed
@headius headius deleted the fast-path-hashing branch December 2, 2025 18:13
@headius headius added this to the JRuby 10.1.0.0 milestone Dec 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant