@kabir
Collaborator

@kabir kabir commented Oct 20, 2025

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @kabir, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the event queue replication system, focusing on reliability and correctness in distributed and asynchronous scenarios. It addresses a critical bug that caused event loss in blocking API calls by ensuring continuous background consumption. A new transaction-aware "poison pill" mechanism guarantees that distributed queues are gracefully terminated only after task finalization is durably committed, preventing race conditions. The core event queue logic has been upgraded with an EventQueueItem abstraction, allowing for smarter handling of local versus replicated events and enabling more flexible queue lifecycles that support fire-and-forget patterns and late resubscriptions. These changes are thoroughly validated with new multi-instance integration tests using Testcontainers, ensuring robust behavior across a cluster.
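The blocking-call fix mentioned above can be pictured with a small plain-Java sketch. It is only an illustration under stated assumptions: `BlockingCallSketch`, the `END` marker, and the in-memory `taskStore` list are hypothetical names invented here, not taken from the a2a-java code base. The point it shows is that the caller gets the first event immediately while a background consumer keeps draining the queue, so later events still reach the store.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class BlockingCallSketch {
    // Hypothetical end-of-stream marker; the real project may signal completion differently.
    static final String END = "<end>";

    // Stand-in for the task store: records every event that was consumed.
    final List<String> taskStore = new CopyOnWriteArrayList<>();

    // Completes once the background consumer has seen END.
    CompletableFuture<Void> drained = CompletableFuture.completedFuture(null);

    /** Returns the first event to the caller, then keeps consuming in the background. */
    String blockingCall(BlockingQueue<String> queue) {
        String first = take(queue);
        if (END.equals(first)) {
            return null; // nothing was produced before the stream ended
        }
        taskStore.add(first);
        // Without this step, events produced after the first one would never be persisted.
        drained = CompletableFuture.runAsync(() -> {
            for (String event = take(queue); !END.equals(event); event = take(queue)) {
                taskStore.add(event);
            }
        });
        return first;
    }

    private static String take(BlockingQueue<String> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an event", e);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("submitted");
        queue.add("working");
        queue.add("completed");
        queue.add(END);

        BlockingCallSketch sketch = new BlockingCallSketch();
        String first = sketch.blockingCall(queue);
        sketch.drained.join(); // wait for the background consumer (demo only)
        System.out.println("returned: " + first);
        System.out.println("persisted: " + sketch.taskStore);
    }
}
```

A real server would likely run the background consumer on a managed executor rather than the common pool used by `CompletableFuture.runAsync`; that choice is left out of the sketch.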

Highlights

  • Robust Event Queue Lifecycle Management: Introduced EventQueueItem to differentiate local and replicated events, preventing replication loops and ensuring only local events trigger database writes. The MainQueue now intelligently manages its lifecycle, staying open for non-finalized tasks even after all child consumers disconnect, enabling fire-and-forget patterns and late resubscriptions.
  • Transaction-Aware "Poison Pill" for Distributed Cleanup: Implemented a new mechanism to send QueueClosedEvent (a "poison pill") only after a task's final state is durably committed to the database. This uses CDI events with TransactionPhase.AFTER_SUCCESS, eliminating race conditions and ensuring deterministic, reliable distributed queue termination.
  • Fix for Lost Events in Blocking Calls: Corrected a bug where events produced after the initial response in a blocking call were lost. Background consumption now continues after the first event is returned, ensuring all subsequent events are processed and persisted to the task store.
  • Comprehensive Multi-Instance Integration Tests: Added a new suite of integration tests using Testcontainers to simulate a multi-node cluster with Kafka and PostgreSQL. These tests validate the end-to-end replication, poison pill, and intelligent queue lifecycle behaviors in a distributed environment.
  • Improved SSE Client Completion Handling: Enhanced the SSE client transport to explicitly signal normal stream completion via sseEventListener.onComplete(), providing clearer differentiation from error-based stream termination.
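As a rough illustration of the first highlight, the sketch below shows how tagging each queued item with its origin can stop a replication loop. Only the name `EventQueueItem` comes from this pull request; its shape here (a string payload plus a `replicated` flag), the `SketchQueue` class, and the two in-memory lists are assumptions made for the example and may differ from the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplicationLoopSketch {

    /** Wraps an event with its origin. The fields are assumed, not the project's real API. */
    static final class EventQueueItem {
        final String event;
        final boolean replicated;

        private EventQueueItem(String event, boolean replicated) {
            this.event = event;
            this.replicated = replicated;
        }

        /** An event produced on this node. */
        static EventQueueItem local(String event) {
            return new EventQueueItem(event, false);
        }

        /** An event received from another node. */
        static EventQueueItem fromReplication(String event) {
            return new EventQueueItem(event, true);
        }
    }

    /** Hypothetical queue: delivers every item locally, forwards only local ones. */
    static final class SketchQueue {
        final List<String> delivered = new ArrayList<>();
        final List<String> sentToReplication = new ArrayList<>();

        void enqueue(EventQueueItem item) {
            delivered.add(item.event); // every consumer on this node sees the event
            if (!item.replicated) {
                // Only locally produced events are forwarded; re-sending a replicated
                // event would bounce it between nodes indefinitely.
                sentToReplication.add(item.event);
            }
        }
    }

    public static void main(String[] args) {
        SketchQueue queue = new SketchQueue();
        queue.enqueue(EventQueueItem.local("task-1: working"));
        queue.enqueue(EventQueueItem.fromReplication("task-1: artifact"));
        System.out.println("delivered=" + queue.delivered);
        System.out.println("forwarded=" + queue.sentToReplication);
    }
}
```

Carrying the origin on the item itself, rather than in thread-local state, keeps the decision correct even when the item is handed to another thread.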
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@kabir kabir force-pushed the event-queues-replicated-tests branch from ee415af to 80eedbe Compare October 20, 2025 19:52
@kabir kabir changed the title from "Event queues replicated tests" to "feat: Add multi-server test using the replicated queue manager." Oct 20, 2025
@kabir
Collaborator Author

kabir commented Oct 20, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a robust and well-designed mechanism for event replication and queue management in a distributed environment. The implementation of a transaction-aware "poison pill" using CDI events is an excellent solution to prevent race conditions during task finalization. The refactoring from a ThreadLocal to an EventQueueItem interface to handle replication loops is a significant improvement in clarity and correctness. Furthermore, the addition of comprehensive multi-instance integration tests using Testcontainers provides high confidence in the new replication logic. The changes to the queue lifecycle, allowing queues for non-finalized tasks to persist, correctly support fire-and-forget patterns and late resubscriptions. Overall, this is a very strong set of changes that greatly enhances the distributed capabilities of the system.
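The ordering guarantee behind the transaction-aware "poison pill" can be sketched without any container. In CDI this kind of callback is typically declared on an observer method with `@Observes(during = TransactionPhase.AFTER_SUCCESS)`; the plain-Java example below replaces that with a toy transaction so it can run standalone. `ToyTransaction`, `finalizeTask`, and the in-memory lists are hypothetical stand-ins, and only the `QueueClosedEvent` name is taken from the pull request.

```java
import java.util.ArrayList;
import java.util.List;

final class AfterCommitSketch {

    /** Toy transaction: a stand-in for a real transaction with after-success callbacks. */
    static final class ToyTransaction {
        private final List<Runnable> writes = new ArrayList<>();
        private final List<Runnable> afterSuccess = new ArrayList<>();

        void write(Runnable write) {
            writes.add(write);
        }

        void afterSuccess(Runnable callback) {
            afterSuccess.add(callback);
        }

        /** Applies the pending writes first, then runs the after-success callbacks. */
        void commit() {
            writes.forEach(Runnable::run);
            afterSuccess.forEach(Runnable::run);
            writes.clear();
            afterSuccess.clear();
        }

        /** Discards pending writes and callbacks; nothing becomes visible. */
        void rollback() {
            writes.clear();
            afterSuccess.clear();
        }
    }

    // Stand-ins for the task store and the replication channel.
    final List<String> taskStore = new ArrayList<>();
    final List<String> replicationChannel = new ArrayList<>();

    /** Registers the final-state write and the poison pill that must follow it. */
    void finalizeTask(ToyTransaction tx, String taskId) {
        tx.write(() -> taskStore.add(taskId + "=COMPLETED"));
        // The poison pill is sent only if the write above was committed.
        tx.afterSuccess(() -> replicationChannel.add("QueueClosedEvent(" + taskId + ")"));
    }

    public static void main(String[] args) {
        AfterCommitSketch sketch = new AfterCommitSketch();

        ToyTransaction committed = new ToyTransaction();
        sketch.finalizeTask(committed, "task-1");
        committed.commit();

        ToyTransaction rolledBack = new ToyTransaction();
        sketch.finalizeTask(rolledBack, "task-2");
        rolledBack.rollback();

        System.out.println("store=" + sketch.taskStore);
        System.out.println("channel=" + sketch.replicationChannel);
    }
}
```

Because the pill is tied to commit success, other nodes never close their queues for a task whose final state failed to persist.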

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive multi-server integration test for the replicated queue manager using Testcontainers. The setup with two Quarkus applications, a shared database, and Kafka is well-structured. The test logic correctly validates event replication between instances. I've added a few comments to improve code quality and maintainability, mainly regarding logging practices in the new test class and a design improvement for a utility class. Overall, this is a great addition for ensuring the robustness of the replication feature.

* - A2A Client instances to interact with both applications
*/
@Testcontainers
public class MultiInstanceReplicationTest {
Contributor

medium

This test class uses System.out.println, System.err.println, and e.printStackTrace() for logging. It is better practice to use a dedicated logging framework like SLF4J, which is already a dependency in this module. A logger gives better control over log levels and formatting, and can be configured for different environments. It also ensures that exception stack traces are logged correctly.

You could introduce a logger like this:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// ...

public class MultiInstanceReplicationTest {
    private static final Logger logger = LoggerFactory.getLogger(MultiInstanceReplicationTest.class);
    // ...
}

Then, you can replace calls like:

  • System.out.println("...") with logger.info("...") or logger.debug("...").
  • System.err.println("...") with logger.error("...").
  • e.printStackTrace() with logger.error("An error occurred", e);.

Collaborator Author

I have removed most of the printlns, but I think the error ones are good to keep for diagnosing problems.

@kabir kabir merged commit 5f4a277 into a2aproject:main Oct 21, 2025
8 checks passed
kabir added a commit to kabir/a2a-java that referenced this pull request Oct 28, 2025
…nd improve CI diagnostics

  - Filter HTTP/2 stream cancellation errors in KafkaReplicationIntegrationTest (same fix as PR a2aproject#380)
  - Add surefire reports and build logs upload to build-and-test workflow
  - Enhance TCK workflow to capture test output, server logs, and compliance reports

  Verified with 20/20 stress test passes.
fjuma pushed a commit that referenced this pull request Oct 28, 2025
#398)

…nd improve CI diagnostics

- Filter HTTP/2 stream cancellation errors in
KafkaReplicationIntegrationTest (same fix as PR #380)
- Add surefire reports and build logs upload to build-and-test workflow
- Enhance TCK workflow to capture test output, server logs, and
compliance reports
@kabir kabir deleted the event-queues-replicated-tests branch November 3, 2025 13:02
kabir added a commit to kabir/a2a-java that referenced this pull request Dec 23, 2025
a2aproject#398)

…nd improve CI diagnostics

- Filter HTTP/2 stream cancellation errors in
KafkaReplicationIntegrationTest (same fix as PR a2aproject#380)
- Add surefire reports and build logs upload to build-and-test workflow
- Enhance TCK workflow to capture test output, server logs, and
compliance reports