
@jorendorff (Contributor) commented Jan 15, 2026

Description

For pathological repositories, like git bombs with deeply nested tree structures, Rugged's count_recursive method can hang indefinitely even with a file limit, because it has to walk into every subtree to find and count the blobs. See libgit2/rugged#995.

That in turn causes RuggedRepository#get_tree_size to hang, which means git-linguist stats can hang too.
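
The hang described above can be reproduced in miniature with a pure-Ruby model. This is a hypothetical sketch, not Rugged or libgit2 code: `build_bomb`, `unique_tree_count`, and `count_blobs` are illustrative names, and it assumes a bomb whose bottom tree is empty, so there are no blobs for a file limit to count. Each level stores one tree object, yet a walk has to open every entry that points at it.

```ruby
# Hypothetical in-memory model, not Rugged: a tree is an Array of
# [name, type, subtree] entries. Reusing one Array at every level mimics a
# git bomb, where many entries share the same tree OID. The bottom tree is
# empty (an assumption), so the structure contains no blobs at all.
def build_bomb(depth)
  tree = []
  depth.times { tree = [["a", :tree, tree], ["b", :tree, tree]] }
  tree
end

# Distinct tree objects, compared by identity the way git identifies trees by OID.
def unique_tree_count(root)
  seen = {}.compare_by_identity
  stack = [root]
  until stack.empty?
    tree = stack.pop
    next if seen.key?(tree)
    seen[tree] = true
    tree.each { |_name, type, subtree| stack.push(subtree) if type == :tree }
  end
  seen.size
end

# Counts blobs and stops once `file_limit` blobs are found, but has to open
# every subtree entry to look for them. Returns [blobs, tree_entries_visited].
def count_blobs(root, file_limit)
  blobs = 0
  trees = 0
  stack = [root]
  until stack.empty?
    stack.pop.each do |_name, type, subtree|
      return [blobs, trees] if blobs >= file_limit
      if type == :blob
        blobs += 1
      else
        trees += 1
        stack.push(subtree)
      end
    end
  end
  [blobs, trees]
end

bomb = build_bomb(10)
p unique_tree_count(bomb) # => 11
p count_blobs(bomb, 100)  # => [0, 2046]
```

With `depth` levels the walk visits 2 + 4 + ... + 2**depth = 2**(depth + 1) - 2 tree entries while only `depth + 1` distinct trees exist, so at depth 32 it would open 2**33 - 2 entries and the file limit would never trigger.
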

This PR works around that issue by using a custom counting method that also applies a tree limit, so it terminates promptly regardless of repository structure. For normal repositories the return value is still the blob count, preserving existing behavior.
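
A minimal sketch of the idea, using a hypothetical in-memory model (plain Arrays instead of Rugged objects; `build_bomb` and `limited_blob_count` are illustrative names, not the PR's code): count tree entries alongside blobs and stop when either count reaches the limit.

```ruby
# Hypothetical model, not Rugged: a tree is an Array of [name, type, subtree]
# entries; the bomb reuses one Array per level and ends in an empty tree.
def build_bomb(depth)
  tree = []
  depth.times { tree = [["a", :tree, tree], ["b", :tree, tree]] }
  tree
end

# Counts blobs but also counts tree entries, and gives up as soon as either
# count reaches `limit`, returning `limit` in that case (mirroring the
# approach discussed in this PR). Below the limit it returns the exact count.
def limited_blob_count(root, limit)
  blobs = 0
  trees = 0
  stack = [root]
  until stack.empty?
    stack.pop.each do |_name, type, subtree|
      return limit if blobs >= limit || trees >= limit
      if type == :blob
        blobs += 1
      else
        trees += 1
        stack.push(subtree)
      end
    end
  end
  blobs
end

normal = [["README", :blob, nil], ["lib", :tree, [["a.rb", :blob, nil], ["b.rb", :blob, nil]]]]
p limited_blob_count(normal, 100)          # => 3
p limited_blob_count(build_bomb(32), 1000) # => 1000
```

Because the tree count grows on every subtree entry, the walk stops after roughly `limit` entries no matter how the trees are nested, while a repository under the limit still gets its exact blob count. The visit order is a simple stack-based depth-first walk rather than Rugged's `:preorder`, which does not change the result.
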

Checklist:

  • I am adding new or changing current functionality
    • I have added or updated the tests for the new or changed functionality.

@jorendorff jorendorff requested a review from a team as a code owner January 15, 2026 20:43
@jorendorff jorendorff changed the title "Jorendorff/git bomb" to "Make get_tree_count robust against git bombs" Jan 15, 2026
@jorendorff jorendorff changed the title "Make get_tree_count robust against git bombs" to "Make get_tree_size robust against git bombs" Jan 15, 2026
@carlosmn

This reimplements count_recursive, adding the tree limit and the trick of returning the limit if we see too many trees. Ideally I think the tree limit would be a different value, and we could also just return 0 or some value that means "unknown" when there are too many trees, but I don't know everywhere this ends up getting used.

        def get_tree_size(commit_id, limit)
          tree_count = 0
          blob_count = 0
          get_tree(commit_id).walk(:preorder) do |root, entry|
            if blob_count >= limit || tree_count >= limit
              blob_count = limit # If we have too many trees we return the limit
              raise StopIteration
            end

            case entry[:type]
            when :blob
              blob_count += 1
            when :tree
              tree_count += 1
            end

            true # go into a tree if that's what we were given
          end

          blob_count
        rescue StopIteration
          blob_count
        end
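
The alternative floated in the comment, a separate tree limit plus a value meaning "unknown", could look roughly like this. It is a hypothetical sketch in a pure-Ruby model (Arrays of `[name, type, subtree]` entries, not Rugged objects); the method name and keyword arguments are illustrative.

```ruby
# Hypothetical variant: separate blob and tree limits, and nil meaning
# "unknown" once the tree limit is exceeded. Not Rugged code.
def blob_count_or_unknown(root, blob_limit:, tree_limit:)
  blobs = 0
  trees = 0
  stack = [root]
  until stack.empty?
    stack.pop.each do |_name, type, subtree|
      if type == :blob
        blobs += 1
        return blob_limit if blobs >= blob_limit # enough files: cap as before
      else
        trees += 1
        return nil if trees > tree_limit # too many trees: size unknown
        stack.push(subtree)
      end
    end
  end
  blobs
end

# A 40-level bomb: each level's two entries share the tree below.
bomb = []
40.times { bomb = [["a", :tree, bomb], ["b", :tree, bomb]] }
p blob_count_or_unknown(bomb, blob_limit: 1000, tree_limit: 500) # => nil
```

The trade-off is that every caller would have to handle `nil`, which is exactly the open question of where this value ends up being used.
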

The previous test created 1000 git trees, taking 11s. This creates 32 in <0.5s.

This now creates an actual git bomb: if you cloned the repo, you'd get 2^32 - 1 directories.
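
The arithmetic behind "32 trees, 2^32 - 1 directories" can be checked without git or Rugged. This is a hypothetical sketch and not the PR's test fixture, which isn't shown here: it assumes the bottom tree holds one small file and each level above has entries `a` and `b` pointing at the same tree below, and it hashes objects the way git does (SHA-1 over `<type> <size>\0<content>`). `object_oid`, `tree_content`, `build_bomb`, and `expanded_dirs` are illustrative names.

```ruby
require "digest/sha1"

# SHA-1 of a git object: "<type> <size>\0" followed by the raw content.
def object_oid(type, content)
  Digest::SHA1.digest("#{type} #{content.bytesize}\0".b + content)
end

# Tree content: for each entry, "<mode> <name>\0" plus the raw 20-byte OID.
# Entries must already be in git's sort order ("a" < "b" here).
def tree_content(entries)
  body = String.new(encoding: Encoding::BINARY)
  entries.each { |mode, name, oid| body << "#{mode} #{name}\0".b << oid }
  body
end

# Build `levels` tree objects: the bottom one holds one file, and every level
# above has entries "a" and "b" pointing at the same tree below.
def build_bomb(levels)
  blob = object_oid("blob", "x\n".b)
  trees = {} # raw OID => entries, standing in for the object database
  oid = nil
  levels.times do |i|
    entries =
      if i.zero?
        [["100644", "file", blob]]
      else
        [["40000", "a", oid], ["40000", "b", oid]]
      end
    oid = object_oid("tree", tree_content(entries))
    trees[oid] = entries
  end
  [oid, trees]
end

# Directories in the fully expanded tree, counting the top level. Memoizing
# by OID keeps this linear in distinct trees although the result is exponential.
def expanded_dirs(oid, trees, memo = {})
  memo[oid] ||= 1 + trees[oid].sum do |mode, _name, child|
    mode == "40000" ? expanded_dirs(child, trees, memo) : 0
  end
end

root, trees = build_bomb(32)
p trees.size                 # => 32
p expanded_dirs(root, trees) # => 4294967295 (2**32 - 1)
```

Only 32 tree objects are stored, yet expanding them yields 2**32 - 1 directories. Memoizing by OID is what keeps this count cheap; a walk that yields every path gets no such shortcut.
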
@jorendorff (Contributor, Author)

I ended up with this. This is now ready for review.
