Skip to content

Duplicate entry hardening#2275

Open
newren wants to merge 4 commits intogit:masterfrom
newren:duplicate-entry-hardening
Open

Duplicate entry hardening#2275
newren wants to merge 4 commits intogit:masterfrom
newren:duplicate-entry-hardening

Conversation

@newren
Copy link
Copy Markdown
Contributor

@newren newren commented Apr 19, 2026

We had some corrupt trees with duplicate entries in real world repositories, which triggered an assertion failure in merge-ort. Further, the corrupt tree creation in the third party tool would have been avoided had verify_cache() correctly checked for D/F conflicts. Provide fixes for both issues, including a preparatory strmap API fix needed for the merge-ort fix.

newren added 4 commits April 20, 2026 07:56
traverse_trees_wrapper() saves entries from a first pass through
traverse_trees() and then replays them through the real callback
(collect_merge_info_callback).  However, the replay loop silently
discards the callback return value.  This means any error reported by
the callback during replay -- including a future check for malformed
trees -- would be ignored, allowing the merge to proceed with corrupt
state.

Capture the return value, stop the loop on negative (error) returns,
and propagate the error to the caller.  Note that the callback returns
a positive mask value on success, so we normalize non-negative returns
to 0 for the caller.

Signed-off-by: Elijah Newren <newren@gmail.com>
…e_info()

collect_merge_info() has set info.show_all_errors = 1 since
d2bc199 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13).  This setting was copied from unpack-trees.c where it
controls batching of error messages for porcelain display, but
merge-ort has no such error-batching logic and never needed it.

With show_all_errors set, traverse_trees() captures a negative callback
return but continues processing remaining entries rather than stopping
immediately.  Removing the setting restores the default behavior where
a negative return from collect_merge_info_callback() breaks out of the
traversal loop right away, allowing a future commit to exit early when
a corrupt tree is detected.

Signed-off-by: Elijah Newren <newren@gmail.com>
Trees with duplicate entries are malformed; fsck reports "contains
duplicate file entries" for them.  merge-ort has from the beginning
assumed that we would never hit such trees.  It was written with the
assumption that traverse_trees() calls collect_merge_info_callback() at
most once per path.  The "sanity checks" in that callback (added in
d2bc199 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13)) verify properties of each individual call but not that
invariant.  The strmap_put() in setup_path_info() silently overwrites
the entry from any prior call for the same path, because it assumed
there would be no other path.  Unfortunately, supplemental data
structures for various optimizations could still be tweaked before the
extra paths were overwritten, and those data structures not matching
expected state could trip various assertions.

Change the return type of setup_path_info() from void to int to allow us
to detect this case, and abort the merge with a clear error message when
it occurs.

Signed-off-by: Elijah Newren <newren@gmail.com>
verify_cache() checks that the index does not contain both "path" and
"path/file" before writing a tree.  It does this by comparing only
adjacent entries, relying on the assumption that "path/file" would
immediately follow "path" in sorted order.  Unfortunately, this
assumption does not always hold.  For example:

    docs                     <-- submodule entry
    docs-internal/README.md  <-- intervening entry
    docs/requirements.txt    <-- D/F conflict, NOT adjacent to "docs"

When this happens, verify_cache() silently misses the D/F conflict and
write-tree produces a corrupt tree object containing duplicate entries
(one for the submodule "docs" and one for the tree "docs").

I could not find any caller in current git that both allows the index to
get into this state and then tries to write it out without doing other
checks beyond the verify_cache() call in cache_tree_update(), but
verify_cache() is documented as a safety net for preventing corrupt
trees and should actually provide that guarantee.  A downstream consumer
that relied solely on cache_tree_update()'s internal checking via
verify_cache() to prevent duplicate tree entries was bitten by the gap.

Add a test that constructs a corrupt index directly (bypassing the D/F
checks in add_index_entry) and verifies that write-tree now rejects it.

Signed-off-by: Elijah Newren <newren@gmail.com>
@newren newren force-pushed the duplicate-entry-hardening branch from 94facb1 to fc565ed Compare April 20, 2026 15:39
@gitgitgadget-git
Copy link
Copy Markdown

There is an issue in commit aff1fdc:
merge-ort: drop unnecessary show_all_errors setting from collect_merge_info()

  • First line of commit message is too long (> 76 columns)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant