Fast builtin validation experiment#9123

Closed
headius wants to merge 2 commits intojruby:10.1-devfrom
headius:fast_builtin_validation

@headius
Member

@headius headius commented Dec 8, 2025

Experimental implementation of fast built-in method invalidation bits, similar to CRuby's implementation.

See #9119.
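
For readers new to the idea, here is a minimal sketch of invalidation bits, in the spirit of the redefinition flags CRuby consults before taking an optimized path. The class, constant, and method names are hypothetical and are not taken from this PR: each guarded built-in gets one bit, redefining the method sets that bit, and a fast path runs only while the bits it depends on are clear.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of invalidation bits for built-in methods.
 * Names and layout are illustrative, not JRuby's actual implementation.
 */
final class BuiltinBitsSketch {
    // One bit per (class, method) pair we want to guard.
    static final int INTEGER_PLUS  = 1 << 0;
    static final int INTEGER_EQ    = 1 << 1;
    static final int ARRAY_INCLUDE = 1 << 2;

    // A set bit means "this built-in has been redefined".
    private final AtomicInteger redefined = new AtomicInteger();

    // Counts how often the slow path ran, so the effect is observable.
    int slowCalls = 0;

    /** Would be called from the method-definition path when a guarded method is replaced. */
    void invalidate(int bit) {
        redefined.getAndUpdate(bits -> bits | bit);
    }

    /** True while every method in the mask still has its built-in definition. */
    boolean isBuiltin(int mask) {
        return (redefined.get() & mask) == 0;
    }

    /** Example fast path: skip dynamic dispatch while Integer#+ is untouched. */
    long add(long a, long b) {
        if (isBuiltin(INTEGER_PLUS)) {
            return a + b;      // direct arithmetic
        }
        return slowAdd(a, b);  // stand-in for a full dynamic call
    }

    private long slowAdd(long a, long b) {
        slowCalls++;
        return a + b;
    }
}
```

The check is a single load and mask, which is what makes it cheaper than looking up a method and asking whether it is still the built-in one.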

@headius headius added this to the JRuby 10.1.0.0 milestone Dec 8, 2025
@headius headius changed the base branch from master to 10.1-dev December 8, 2025 16:56
@CufeHaco
Contributor

CufeHaco commented Dec 9, 2025

First run on my modified code of your builtin speed up experiment @headius.

I haven't updated JRuby on my workstation yet, and I'm also not running on a beefy setup.

The benchmark run also measures the overhead of running under WSL Ubuntu on Windows 11 Pro, so the results can be calibrated against a native Linux machine.

Screenshot 2025-12-08 194121.png

Screenshot 2025-12-08 194136.png

Core changes:

Builtins.java (new): Core flags, mappings, and check methods. Static init for class/method maps. Expanded with all CRuby ops + Range ext. Added grouped checks for Integer/Float/String/Array/Hash/Range/etc.

RubyRange.java (modified):

  • include_p/member?: If checkRangeInclude(), try fastIncludeCheck (pure arith/comp for int/float ranges, using checkIntegerCompare/checkFloatCompare). Fallback to original impl.

  • cover_p: Similar fast path with direct comps (strings too, via checkStringEquals).

  • op_eqq (===): Reuses fast include for case-when opts.

  • min/max: Fast return begin/end if checkRangeMin/Max (handles exclusive ints with checkIntegerMinus).

Integration.java (snippets for existing files):

  • Ruby.java: Add builtinBits field + getBuiltinBits()/invalidateBuiltin() (calls Builtins.invalidate).

  • ThreadContext.java: Add public final builtinBits ref (init from runtime).

  • RubyModule.java: Call invalidateBuiltin in putMethod (post-profile).

  • RubyObject.java: Swap old isBuiltin() to new checks in fastNumEqualInternal.

Benchmark_test.rb (new): Full script for repro. Includes baselines (int arith/method calls/array access) for WSL/native calibration, various include? cases, monkey-patch test, and raw comp baseline.
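
As a rough illustration of the guarded fast path described for include?, the sketch below checks that Range#include? and the Integer comparisons it relies on are all still built-in before doing plain comparisons. The flag names, the static `redefined` field, and `slowInclude` are assumptions for this example only. They are not the checkRangeInclude/fastIncludeCheck code from the linked repository.

```java
/**
 * Hypothetical sketch of a guarded Range#include? fast path for integer ranges.
 * Flag names and the slow-path hook are assumptions for illustration.
 */
final class RangeIncludeSketch {
    static final int INTEGER_LT    = 1 << 0; // Integer#<
    static final int INTEGER_LE    = 1 << 1; // Integer#<=
    static final int RANGE_INCLUDE = 1 << 2; // Range#include?

    // Bits set here mark redefined built-ins; volatile so readers see updates.
    static volatile int redefined = 0;

    // Counts slow-path calls so the fallback is observable.
    static int slowCalls = 0;

    static boolean allBuiltin(int mask) {
        return (redefined & mask) == 0;
    }

    /** include? for begin..end, or begin...end when exclusive is true. */
    static boolean include(long begin, long end, boolean exclusive, long value) {
        int needed = RANGE_INCLUDE | INTEGER_LT | INTEGER_LE;
        if (allBuiltin(needed)) {
            // Pure comparisons; no method dispatch needed.
            return begin <= value && (exclusive ? value < end : value <= end);
        }
        return slowInclude(begin, end, exclusive, value);
    }

    /** Stand-in for the original implementation that dispatches dynamically. */
    static boolean slowInclude(long begin, long end, boolean exclusive, long value) {
        slowCalls++;
        return begin <= value && (exclusive ? value < end : value <= end);
    }
}
```

Guarding on the dependencies as well as on include? itself matches the edge case noted above: redefining Integer#< must also disable the fast path.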

Testing/Verification

  • Benchmark confirms opts engage (times drop) and disengage on monkey-patch (via alias/redefine).

  • Specs: Core suite passes (ran locally). Added flag shouldn't break anything—it's opt-in per-method.

  • Edge: Handles exclusive ranges, floats/strings (in cover?), non-numeric (fallback), redefs on deps (e.g., Integer#< redefined trips compare checks).

I can drop the new code in a new repo for you to clone so you can test and evaluate between machine and env differences if you're satisfied with these results.

@headius
Member Author

headius commented Dec 9, 2025

@CufeHaco Yeah put the code somewhere and I'll have a look. I'm sure there's lots of places that CRuby has added these built-in method checks over the years, where JRuby only has them in a few specific places. Any places where we need to make dynamic calls to standard core methods would be candidates.

@CufeHaco
Contributor

CufeHaco commented Dec 9, 2025

I just dropped the code into a new repo. Give me just a min and I'll post the link.

I also started a pure-JRuby Win32 and .NET compatibility layer last night, if you want to start a new thread for it. It should take away the headache of dealing with Windows for JRuby. It shouldn't affect the core internals at all, just JRuby <-> Windows API.

@CufeHaco
Contributor

CufeHaco commented Dec 9, 2025

@headius here's the link

https://github.com/CufeHaco/Jruby10-builtin-test

The benchmark.rb is ready to go.

@CufeHaco
Contributor

CufeHaco commented Dec 9, 2025

OH! My bad, you may need to comment out the WSL benchmark. I don't remember if I added a condition handler to it; I don't think I did. Otherwise it's good to go.

@headius I apologize for that oversight. Dad life and all.

@CufeHaco
Contributor

CufeHaco commented Dec 9, 2025

I'll start looking in

RubyFixnum.java
RubyArray.java
RubyString.java
RubyHash.java
RubyObject.java

And start searching this evening and see if I can get a punch out list for a debug report.

@headius
Member Author

headius commented Dec 10, 2025

@CufeHaco The implementation looks about right, but is there a reason you haven't done it as a diff against JRuby directly? I believe you are using some LLM tooling for this (which is fine), but eventually it will need to be a PR against JRuby. Might as well start from a PR and we can refine it from there. This is a pretty benign change to introduce, since it's mostly new code and the existing checks can be migrated incrementally.

The implementation of Builtins does seem to have expanded like I expected, so that much is probably good to go.

A good way to find places we're using the existing CallSite-based check would be to look for calls to DynamicMethod.isBuiltin.

@CufeHaco
Contributor

@headius Just to clarify, the repo at https://github.com/CufeHaco/jrubytest-jruby-jep380-prototype/tree/CufeHaco-JEP-380-full-prototype is already a fork of jruby/jruby with the changes applied to the actual source files (DynamicMethod.java, InterpretedIRMethod.java, etc.).

I'm still learning Java and GitHub's PR workflow isn't intuitive for me yet, so I've been working in an isolated fork where you can review the complete diff. The code is structured to drop into JRuby with minimal changes.

If the implementation looks right, I can work on opening a formal PR against jruby/jruby main - just need some guidance on the GitHub mechanics to make sure I do it correctly.

@CufeHaco
Contributor

Same with the jruby10 repo. I write it, hook it into JRuby for testing and benchmarks, then drop a repo (because I know I can do that) for you to review. That way I can get feedback on what and where to improve, code- and skill-wise. I'm learning this stack as I go @headius

@CufeHaco
Contributor

@headius the past couple of days I've been coding a benchmark scraper for all the core Java files and builtins. So far I've got around 969 total points for optimizing. Here are the ones that I wanted you to see:

  • dig_misc - 188K ops/sec <- biggest win potential
  • write - 270K ops/sec
  • op_and - 423K ops/sec
  • each - 475K ops/sec
  • to_s - 487K ops/sec

The scraper works just like rubian and the rubytk patch. It dynamically goes through the core files, runs benchmarks on everything, and saves the data.

I did a couple just like the original to get some numbers, and we are matching and/or exceeding in some cases, with arrays, hashes, and strings.

Instead of hard-coding and having to write code in at each optimization, we could do just one Java file that hooks in.

What are your ideas and how would you like to proceed on the matter? I'm sorting through the data I have now so you can review and compare.

Screenshot 2025-12-11 144449.png

Screenshot 2025-12-11 144541.png

Screenshot 2025-12-11 144612.png

@CufeHaco
Contributor

@headius the scraper works great, but I’m running into some friction getting the Java syntax exactly right in the codegen phase.

Conceptually I’m using the same base I sent you before, but expanding it via Ruby’s file handling and string interpolation. Ruby is only being used to author Java source. The runtime still sees normal static code.

For example, generating builtin flags like:

File.write(path, %Q^
    public static final int #{const_name} = 1 << #{bit};
^, mode: "a")

The idea is to avoid hand-coding or relying on LLMs: let Ruby scrape JRuby core, then expand the builtin speedups mechanically wherever they apply. The only tricky part is having the right Java templates so the generated code lands in the correct places.
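
To make the template idea concrete, here is a small sketch of the same generation step written in Java, so the expected output shape is easy to see. The generator class, the output class name, and the constant names are hypothetical. One caveat worth stating: with `int` flags only 32 bits are available, and in Java `1 << 32` wraps around to `1`, so a generator has to reject or split larger sets instead of silently reusing bits.

```java
import java.util.List;

/**
 * Hypothetical generator that emits Java flag constants from scraped names.
 * The class name, constant names, and output layout are illustrative only.
 */
final class BuiltinFlagGenerator {
    /** Builds the source of a constants class, assigning one bit per name. */
    static String generate(String className, List<String> constNames) {
        if (constNames.size() > 32) {
            // An int holds 32 flags; int shifts use only the low 5 bits of the
            // shift count, so "1 << 32" would collide with "1 << 0".
            throw new IllegalArgumentException("too many flags for one int");
        }
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        for (int bit = 0; bit < constNames.size(); bit++) {
            src.append("    public static final int ")
               .append(constNames.get(bit))
               .append(" = 1 << ").append(bit).append(";\n");
        }
        src.append("}\n");
        return src.toString();
    }
}
```

Calling `BuiltinFlagGenerator.generate("GeneratedBuiltins", List.of("INTEGER_EQ", "INTEGER_PLUS"))` returns a complete class body with one constant per line, which avoids appending fragments to a file and hoping they land inside the right braces.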

After that, the remaining work is just peppering:

Builtins.invalidate(Builtins.INTEGER_EQ)

where appropriate.

I may not be explaining it perfectly. I’ve been pretty deep in JRuby internals and a bit burned out, but the intent is straightforward: Ruby writes the Java from a template we provide, filling in the real JRuby constant and method names from the scraper, and JRuby runs the result as static code. We can make this surgical.

@CufeHaco
Contributor

@headius Last night I watched one of your older presentations for ruby 9000 to give me a break from looking at code. It was the one where you were talking about JVM pressure points and issues you've had, and then you talked about the byte arrays. That's when it hit me.

We can directly manipulate the byte arrays dynamically, and cache the patterns. This should also help with boot time, if I'm correct. This is where I'm definitely going to need your expertise on the matter. This morning I dug into it more, and threw my idea between some LLMs to get a concept going. The code here is pure LLM-generated, as a blueprint of what I'm thinking. I just need to know the best way to implement the idea with the current builtin experiment as well as future experiments. It's based on my own personal algorithms I've been building, so I understand the logic; I just need to know where to use it. Again, I'm using this code example to try to convey my idea, as well as a personal blueprint to understand Java and the JVM better.

If you like the idea, would you mind coding a skeleton script? My biggest hurdle is writing Java syntax itself. I can fill in the logic if I have a blueprint to go by. Because this is LLM-generated, I can't trust the alignment with this ongoing PR. It's just me trying to deliver an idea. RBM is just a shorthand I use for recursive bit/binary mapping. Even if it doesn't fit the current builtin experiment, I think it's something worth exploring down the road.

package org.jruby.util;

import org.jruby.util.ByteList;

import java.util.concurrent.ConcurrentHashMap;

/**
 * Experimental byte-array pattern engine — build once, match many.
 * Core idea: precompute pattern structures from ByteList bytes for fast reuse.
 * Designed for boot workloads where the same patterns are checked repeatedly.
 */
public class ByteArrayPatternEngine {

    public static final int MAX_PATTERN_LENGTH = 62; // For bitap include?; longer fallback

    // Cache built patterns (immutable key)
    private static final class PatternKey {
        final byte[] bytes;
        final int hash;

        PatternKey(ByteList bl) {
            int len = bl.getRealSize();
            this.bytes = new byte[len];
            System.arraycopy(bl.getUnsafeBytes(), bl.getBegin(), bytes, 0, len);
            this.hash = java.util.Arrays.hashCode(bytes);
        }

        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof PatternKey)) return false;
            return java.util.Arrays.equals(bytes, ((PatternKey)o).bytes);
        }
    }

    // Built pattern structure
    private static class BuiltPattern {
        final byte[] bytes;
        final long[] bitapMask; // null if too long or not needed
        final int length;

        BuiltPattern(ByteList bl) {
            this.length = bl.getRealSize();
            this.bytes = new byte[length];
            System.arraycopy(bl.getUnsafeBytes(), bl.getBegin(), bytes, 0, length);

            if (length <= MAX_PATTERN_LENGTH && length > 0) {
                this.bitapMask = computeBitapMask(bytes);
            } else {
                this.bitapMask = null;
            }
        }
    }

    private static final ConcurrentHashMap<PatternKey, BuiltPattern> PATTERN_CACHE = new ConcurrentHashMap<>();

    // Build or retrieve precomputed pattern
    private static BuiltPattern getBuilt(ByteList pattern) {
        if (pattern.getRealSize() == 0) return null; // special case
        PatternKey key = new PatternKey(pattern);
        return PATTERN_CACHE.computeIfAbsent(key, k -> new BuiltPattern(pattern));
    }

    // --- start_with? using built byte array ---
    public static boolean startsWith(ByteList haystack, ByteList prefix) {
        BuiltPattern built = getBuilt(prefix);
        if (built == null) return true;
        if (built.length > haystack.getRealSize()) return false;

        byte[] hayBytes = haystack.getUnsafeBytes();
        int hayBegin = haystack.getBegin();

        for (int i = 0; i < built.length; i++) {
            if (hayBytes[hayBegin + i] != built.bytes[i]) return false;
        }
        return true;
    }

    // --- include? using built bitap mask (RBM core) ---
    public static boolean includes(ByteList haystack, ByteList needle) {
        BuiltPattern built = getBuilt(needle);
        if (built == null) return true;
        if (built.bitapMask == null) {
            // Fallback for long patterns
            return haystack.indexOf(needle) != -1;
        }

        // Shift-or bitap: bit i of state is 0 when pattern[0..i] matches
        // the haystack bytes ending at the current position.
        long state = ~0L;
        long matchBit = 1L << (built.length - 1);

        byte[] bytes = haystack.getUnsafeBytes();
        int begin = haystack.getBegin();
        int end = begin + haystack.getRealSize();

        for (int i = begin; i < end; i++) {
            int b = bytes[i] & 0xFF;
            state = (state << 1) | built.bitapMask[b];
            // A zero at the highest pattern bit means the whole needle matched.
            if ((state & matchBit) == 0) return true;
        }
        return false;
    }

    private static long[] computeBitapMask(byte[] pattern) {
        int len = pattern.length;
        long[] mask = new long[256];
        java.util.Arrays.fill(mask, ~0L);

        for (int i = 0; i < len; i++) {
            int b = pattern[i] & 0xFF;
            mask[b] &= ~(1L << i);
        }
        return mask;
    }

    public static void clearCache() { PATTERN_CACHE.clear(); }
    public static int cacheSize() { return PATTERN_CACHE.size(); }
}
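
For anyone who wants to experiment with the matching idea outside JRuby, the following is a standalone sketch of shift-or (bitap) substring search over plain `byte[]` arrays. It does not use ByteList or any caching, and the class and method names are made up for this example. The key invariant is that bit `i` of `state` is zero exactly when the first `i + 1` needle bytes match the haystack bytes ending at the current position, so a match is signalled by the highest needle bit, not bit 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Standalone shift-or (bitap) substring search over plain byte arrays.
 * A sketch for experimenting outside JRuby; it does not use ByteList.
 */
final class ShiftOrSketch {
    /** Returns true if needle occurs in haystack; needle may be 0..64 bytes. */
    static boolean contains(byte[] haystack, byte[] needle) {
        int m = needle.length;
        if (m == 0) return true;
        if (m > 64) throw new IllegalArgumentException("needle too long for one long");

        // mask[b] has bit i cleared when needle[i] == b.
        long[] mask = new long[256];
        Arrays.fill(mask, ~0L);
        for (int i = 0; i < m; i++) {
            mask[needle[i] & 0xFF] &= ~(1L << i);
        }

        long state = ~0L;              // all ones: no prefix matched yet
        long matchBit = 1L << (m - 1); // cleared when the full needle matches
        for (byte b : haystack) {
            state = (state << 1) | mask[b & 0xFF];
            if ((state & matchBit) == 0) return true;
        }
        return false;
    }

    /** Helper for examples: one byte per character. */
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

There is deliberately no early exit when `state` returns to all ones: that only means no partial match is in progress, and a later byte can still start a new one.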

@headius
Member Author

headius commented Jan 8, 2026

@CufeHaco Your experiments are interesting, but I really need to see some working code in a PR to evaluate it. I'm going to go ahead with a basic implementation of these builtin checks for now but I'd love to see a PR from you with improvements and perhaps your benchmarking rig!

@CufeHaco
Contributor

CufeHaco commented Jan 8, 2026

@headius I've been working hard to get you some code I'm satisfied with. I've been experimenting with it as well as the other prototypes I've done in my personal microkernel project, and I'm rather delighted at the first benchmarks, even from a crude, thrown-together draft.

I'm in the middle of the second revision, so I don't have stats on it yet. This gives me a chance to experiment with my personal project, as well as a chance to see how it works under load and how it should flow. It also gives me a chance to see how everything works together.

Redroot microkernel rubian shell Performance vs Bash:

  • File.read: 28-33x faster than bash cat
  • Dir.entries: 33-35x faster than bash ls
  • Ruby pattern matching: 4-5x faster than bash grep
  • Dir.glob: 3.9-4.5x faster than bash find
  • Lines.count: 19-20x faster than bash wc

JRuby Builtin Performance (Java NIO):

  • File I/O with Java NIO: 1.11-1.58x faster than standard Ruby File.read
  • Array operations: 0.95-1.53x faster (competitive with Java's Arrays.sort)

IPC Event Bus: (JEP-380 prototype)

  • 27K ops/sec publish rate
  • 15-20K ops/sec message send rate
  • 36-77μs average latency

ByteMatcher Pattern Matching:

  • 1.3-1.88x faster than String matching (these are the initial improvements; like the JVM, it improves the longer it runs)

System Stats:

  • Boot time: ~300ms (1504-1704ms actual)
  • Running on $200 Kamrui mini PC (Intel Celeron N5105, quad core, 7.68GB RAM)
  • JRuby 10.0.2.0, Ruby 3.4.2, JVM 21.0.9

Native Threading (without GVL):

  • 2.21-2.68x parallelism factor (true concurrent execution)

These numbers are from it running on 1 core, the microkernel bugged out and only was picking up 1 instead of all 4.

I need to finish up some work at my in-laws' store in the morning, but I will try to get some code for you to review afterwards. I'm sure you'll be able to point me at what needs to be tweaked.

@headius
Member Author

headius commented Jan 8, 2026

> @headius I've been working hard to get you some code I'm satisfied with. I've been experimenting with it as well as the other prototypes I've done in my personal microkernel project, and I'm rather delighted at the first benchmarks, even from a crude, thrown-together draft.

Sounds good! I'm going to proceed to land a basic version of the builtin validation, since that particular case has stood out in recent benchmarks. You may be able to find more places to apply those builtin checks once the basic framework is in place (other places we dynamically call core methods, etc).

> Redroot microkernel rubian shell Performance vs Bash:

These are interesting cases that likely all need optimization in the current JRuby logic, so I'm looking forward to seeing what you come up with.

> JRuby Builtin Performance (Java NIO):
>
>   • File I/O with Java NIO: 1.11-1.58x faster than standard Ruby File.read
>   • Array operations: 0.95-1.53x faster (competitive with Java's Arrays.sort)

Of course any changes in core classes must still be compatible with Ruby, but there's probably many ways we can find fast(er) paths that avoid unnecessary overhead. File and other IO is particularly heavy due to emulating CRuby's buffering and character transcoding logic. For many cases that just need to read bytes fast, that's pure overhead.
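
As a point of reference for the "just read bytes fast" case, this is roughly what the unbuffered, untranscoded path looks like in plain Java. It is only a sketch of the baseline being compared against, with a made-up class name. It says nothing about how JRuby's File.read would decide when such a path is safe to take.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: read a file's raw bytes with no decoding or newline handling. */
final class RawReadSketch {
    /** Returns the file contents exactly as stored on disk. */
    static byte[] readBytes(Path path) throws IOException {
        return Files.readAllBytes(path);
    }
}
```

A compatible fast path would still have to produce the same String and encoding a Ruby caller expects, so this only applies where no transcoding or text-mode handling is requested.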

> IPC Event Bus: (JEP-380 prototype)

I think these numbers look good but I have no comparison 🙂

Replacing our current UNIXSocket implementation with Ruby code calling JEP-380 would be acceptable, again as long as it is compatible (and being more compatible than what we have shouldn't be hard).
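
For context, JEP 380 added Unix-domain support to `SocketChannel` and `ServerSocketChannel` in Java 16. The sketch below is a minimal echo round trip over a socket file in a temporary directory. The class name and the 1 KiB message limit are choices made for this example, and it is not a proposed UNIXSocket replacement.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: one echo round trip over a Unix-domain socket (Java 16+). */
final class UnixSocketEchoSketch {
    /** Sends message (under 1 KiB) to an in-process echo server and returns the reply. */
    static String roundTrip(String message) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("uds");
        Path socketPath = dir.resolve("echo.sock");
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address);
            Thread echo = new Thread(() -> echoOnce(server));
            echo.start();
            try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                client.connect(address);
                ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (out.hasRemaining()) client.write(out);
                client.shutdownOutput(); // tell the server the request is complete
                ByteBuffer reply = readUntilClosed(client);
                echo.join();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            // The socket file is not removed automatically when the channel closes.
            Files.deleteIfExists(socketPath);
            Files.deleteIfExists(dir);
        }
    }

    /** Accepts one connection and writes back whatever it received. */
    private static void echoOnce(ServerSocketChannel server) {
        try (SocketChannel peer = server.accept()) {
            ByteBuffer request = readUntilClosed(peer);
            while (request.hasRemaining()) peer.write(request);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads up to 1 KiB until the peer closes its side; returns a flipped buffer. */
    private static ByteBuffer readUntilClosed(SocketChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (buf.hasRemaining() && channel.read(buf) != -1) {
            // keep reading until end of stream or the buffer is full
        }
        buf.flip();
        return buf;
    }
}
```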

> ByteMatcher Pattern Matching:

Also good numbers here... one big thing we are missing is recent updates to the "Onigmo" regex engine in CRuby that make many patterns linear time and avoid ReDoS situations. Those changes (which would go into https://github.com/jruby/joni) aren't huge but they're a C-to-Java porting exercise I have not had time for.

A couple PRs I found related to the linear-time optimizations in CRuby's Regexp:

> I need to finish up some work at my in-laws' store in the morning, but I will try to get some code for you to review afterwards. I'm sure you'll be able to point me at what needs to be tweaked.

I should have the Builtins stuff landed some time this morning US time and we can chat when you're available later.

This includes all the methods and classes in CRuby's version of the
builtin checks, with a few additional classes and methods. I have
generated all combinations here, though class+method combinations
that don't make sense should be removed eventually.

This commit also replaces most fast-path uses of `isBuiltin`
methods throughout JRuby with the equivalent builtin check, but
there are many that have been added to CRuby in recent years.
Future commits will add those additional fast paths.
@CufeHaco
Contributor

CufeHaco commented Jan 8, 2026

@headius I'm back at the house. Would you mind giving me a rundown on how to do a proper PR for you? It can be here or on X. GitHub's UI confuses me, and I don't want an AI to do something I can't audit.

I'm working on prepping the code base so I can let you review what I have so far.

@headius
Member Author

headius commented Jan 8, 2026

@CufeHaco There are better guides online (and your LLM may be able to guide you), but basically you commit the changes directly against a clone of the JRuby repository (on a branch, ideally as small commits), push that branch to your GitHub fork of the repository (fork to CufeHaco/jruby), and then use the GH UI or command-line tool to open a pull request from that branch.

This would allow me and other contributors to view your changes in situ, directly atop the current codebase, with features for auditing and commenting on those patches line by line. It would also run your changes through our full CI suite, so we know that we're maintaining compatibility.

Maybe try with something small to get familiar with it?

@headius
Member Author

headius commented Jan 8, 2026

I have pushed the first pass at implementing builtin method checks, replacing all existing checks with the new utility. It appears to be green, which is nice to see.

I've also filed a series of issues related to this, based on a quick audit of CRuby's use of builtin method checking. There's cases dating back to the late 2000s that we've never done. Some may be unnecessary given invokedynamic and the JVM's excellent inlining, but others are more recent and based on real-world use cases (like the [1,2].include?(x) optimization from #9166). Obviously, most of those optimizations will depend on this PR.

@CufeHaco
Contributor

CufeHaco commented Jan 8, 2026

Awesome. Yeah, I will do some practice PRs on my own repos so I can get the hang of it. I'll stick to what I've been doing for you to keep it simple, and for the sake of getting you the code you need. Once I get everything ready, I'll post the repo/fork link so you can have a look.

I appreciate your patience with me on the learning curve I'm dealing with when it comes to GitHub and Java.

headius added a commit that referenced this pull request Jan 23, 2026
This includes all the methods and classes in CRuby's version of the
builtin checks, with a few additional classes and methods. I have
generated all combinations here, though class+method combinations
that don't make sense should be removed eventually.

This commit also replaces most fast-path uses of `isBuiltin`
methods throughout JRuby with the equivalent builtin check, but
there are many that have been added to CRuby in recent years.
Future commits will add those additional fast paths.

Based off work by @CufeHaco in #9174 merged with my
extensions and integration from #9123.

See #9116 and #9119
@headius
Member Author

headius commented Jan 23, 2026

I've merged my expansion of classes and methods plus first stages of integration with @CufeHaco's version from #9174 and pushed that directly to 10.1-dev.

See a322ab5.
