@kabir
Collaborator

@kabir commented Oct 23, 2025

Add Kubernetes deployment example demonstrating multi-instance A2A agents with database persistence and Kafka-based event replication.

Fixes #281 and #373

@gemini-code-assist
Contributor

Summary of Changes

Hello @kabir, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a robust cloud deployment example for A2A agents, showcasing a scalable and resilient architecture on Kubernetes. It integrates PostgreSQL for task persistence and Kafka for real-time event replication, allowing multiple agent instances to operate cohesively. The example comes with comprehensive deployment scripts and a verification client, making it easy to set up and test a distributed A2A agent environment. This directly addresses issues #281 and #373 by providing a concrete implementation for cloud-native A2A agents.

Highlights

  • Cloud Deployment Example: Introduced a new, comprehensive example for deploying A2A agents in a Kubernetes environment using Kind.
  • Multi-Instance Agents: Demonstrates how to run A2A agents with multiple pods for load balancing and high availability.
  • Database Persistence: Configures PostgreSQL for persistent task storage, ensuring task state consistency across agent instances.
  • Kafka Event Replication: Implements Kafka-based event replication to synchronize events across different agent pods in real-time using Strimzi.
  • Automated Deployment & Verification: Provides detailed Kubernetes YAMLs and shell scripts for automated deployment, verification, and cleanup of the entire stack.
  • Test Client: Includes a Java test client to validate the multi-pod behavior, load balancing, and event replication.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/cloud-deployment-example.yml

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a cloud deployment example for the A2A agent using Kubernetes, PostgreSQL, and Kafka. It includes deployment scripts, Kubernetes configurations, and a test client to demonstrate multi-pod behavior with load balancing and event replication. The changes involve adding new files for the deployment setup and modifying the pom.xml to include the new module. The review focuses on ensuring the deployment scripts are robust, the Kubernetes configurations are secure and efficient, and the documentation is clear and comprehensive.

Comment on lines +172 to +181
if ! kubectl get crd kafkas.kafka.strimzi.io > /dev/null 2>&1; then
    echo "Installing Strimzi operator..."
    kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
Contributor

Priority: high

The script is installing the Strimzi operator without checking if the kafka namespace already exists. This can lead to errors if the namespace was previously created with different configurations. It's recommended to check for the namespace's existence before attempting to create it.

Suggested change
-if ! kubectl get crd kafkas.kafka.strimzi.io > /dev/null 2>&1; then
-    echo "Installing Strimzi operator..."
-    kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
+if ! kubectl get namespace kafka > /dev/null 2>&1; then
+    echo "Creating kafka namespace..."
+    kubectl create namespace kafka
+fi

Collaborator Author

We actually have this earlier, on lines 167-170. Also it looks like your suggestion skips installing the operator :-D
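For reference, a minimal sketch that keeps both steps in one idempotent function: it creates the namespace only when it is missing and still installs the Strimzi operator when the `kafkas.kafka.strimzi.io` CRD is absent. The function name `ensure_strimzi` is hypothetical; the `kubectl` commands and the install URL are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# ensure_strimzi [NAMESPACE]   (hypothetical helper, not from the PR's script)
# Assumes kubectl is on PATH and pointed at the target (e.g. Kind) cluster.
ensure_strimzi() {
  local ns="${1:-kafka}"

  # Step 1: create the namespace only when it does not exist yet.
  if ! kubectl get namespace "$ns" > /dev/null 2>&1; then
    echo "Creating $ns namespace..."
    kubectl create namespace "$ns" || return 1
  fi

  # Step 2: install the operator only when its Kafka CRD is missing.
  if ! kubectl get crd kafkas.kafka.strimzi.io > /dev/null 2>&1; then
    echo "Installing Strimzi operator..."
    kubectl create -f "https://strimzi.io/install/latest?namespace=${ns}" -n "$ns" || return 1
  fi
}
```

Running it twice is safe: on the second run both `kubectl get` checks succeed and nothing is created.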

Comment on lines +215 to +241
echo "Waiting for Kafka to be ready (using KRaft mode, typically 2-3 minutes)..."

# Monitor progress while waiting
for i in {1..60}; do
    echo "Checking Kafka status (attempt $i/60)..."
    kubectl get kafka -n kafka -o wide 2>/dev/null || true
    kubectl get pods -n kafka -l strimzi.io/cluster=a2a-kafka 2>/dev/null || true

    if kubectl wait --for=condition=Ready kafka/a2a-kafka -n kafka --timeout=10s 2>/dev/null; then
        echo -e "${GREEN}✓ Kafka deployed${NC}"
        break
    fi

    if [ $i -eq 60 ]; then
        echo -e "${RED}ERROR: Timeout waiting for Kafka${NC}"
        kubectl describe kafka/a2a-kafka -n kafka
        kubectl get events -n kafka --sort-by='.lastTimestamp'
        exit 1
    fi
done
Contributor

Priority: high

The script waits for Kafka to be ready but doesn't handle potential errors during the Kafka deployment process effectively. If Kafka fails to deploy, the script continues, leading to subsequent failures. Consider adding more robust error handling and logging to diagnose Kafka deployment issues.

Collaborator Author

  • We loop 60 times (10 minutes total)
  • We display status on each attempt
  • On timeout, we run kubectl describe and show recent events
  • We exit with error code 1

We DO exit on timeout (line 232: exit 1). This is already robust.
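The bounded wait described above (60 attempts, each a `kubectl wait` with a 10-second timeout, so at most about 60 × 10 s = 600 s, or 10 minutes, plus the time spent in the status commands) can also be factored into a reusable helper. A hypothetical sketch: `retry_until` and its argument order are illustrative, not part of the PR's script.

```shell
#!/usr/bin/env bash
# retry_until MAX_ATTEMPTS SLEEP_SECS CMD [ARGS...]   (hypothetical helper)
# Runs CMD up to MAX_ATTEMPTS times, sleeping SLEEP_SECS between failed attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails, so the caller
# decides what diagnostics to print and whether to exit.
retry_until() {
  local max="$1" pause="$2" i=1
  shift 2
  while [ "$i" -le "$max" ]; do
    echo "Attempt $i/$max: $*"
    if "$@"; then
      return 0
    fi
    # No sleep after the final attempt.
    if [ "$i" -lt "$max" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage mirroring the Kafka wait above (resource names taken from the quoted script):
# if ! retry_until 60 0 kubectl wait --for=condition=Ready kafka/a2a-kafka -n kafka --timeout=10s; then
#   kubectl describe kafka/a2a-kafka -n kafka
#   kubectl get events -n kafka --sort-by='.lastTimestamp'
#   exit 1
# fi
```

Passing a sleep of 0 keeps the same pacing as the original loop, where the 10-second `--timeout` on `kubectl wait` already acts as the delay between attempts.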

Comment on lines 54 to 66
if [ "$($CONTAINER_TOOL inspect -f '{{.State.Running}}' "${REG_NAME}" 2>/dev/null || true)" != 'true' ]; then
    echo "Creating registry container..."
    $CONTAINER_TOOL run \
        -d --restart=always -p "127.0.0.1:${REG_PORT}:5000" --network bridge --name "${REG_NAME}" \
        registry:2
    echo -e "${GREEN}✓ Registry container created${NC}"
Contributor

Priority: high

The script checks if the registry container is running, but it doesn't verify if the registry is healthy and accessible before pushing images. This can lead to image push failures if the registry is in a bad state. Add a check to verify the registry's health before pushing images.

Suggested change
-if [ "$($CONTAINER_TOOL inspect -f '{{.State.Running}}' "${REG_NAME}" 2>/dev/null || true)" != 'true' ]; then
-    echo "Creating registry container..."
-    $CONTAINER_TOOL run \
-        -d --restart=always -p "127.0.0.1:${REG_PORT}:5000" --network bridge --name "${REG_NAME}" \
-        registry:2
-    echo -e "${GREEN}✓ Registry container created${NC}"
+if [ "$($CONTAINER_TOOL inspect -f '{{.State.Running}}' "${REG_NAME}" 2>/dev/null || true)" != 'true' ]; then
+    echo "Creating registry container..."
+    $CONTAINER_TOOL run \
+        -d --restart=always -p "127.0.0.1:${REG_PORT}:5000" --network bridge --name "${REG_NAME}" \
+        registry:2
+    echo -e "${GREEN}✓ Registry container created${NC}"
+else
+    echo -e "${GREEN}✓ Registry container already running${NC}"
+    # Verify registry is accessible
+    if ! curl -s http://localhost:${REG_PORT}/v2/ > /dev/null 2>&1; then
+        echo -e "${RED}ERROR: Registry not accessible${NC}"
+        exit 1
+    fi
+fi

Collaborator Author

We check this on lines 127-135
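One detail in the suggested change is worth noting: `curl -s` on its own exits with status 0 whenever the server sends any HTTP response, including a 5xx error, so that check only catches connection failures. Adding `-f` (`--fail`) makes curl exit non-zero for HTTP status 400 or greater. A minimal sketch, assuming the registry is published on `127.0.0.1:${REG_PORT}` as in the quoted `run` command; the function name `registry_ready` and the default of 10 attempts are hypothetical.

```shell
#!/usr/bin/env bash
# registry_ready PORT [ATTEMPTS]   (hypothetical helper)
# Polls the registry's /v2/ endpoint until it returns a non-error status.
# -f makes curl exit non-zero (code 22) on HTTP status >= 400; --max-time
# bounds each probe so a hung registry cannot stall the script.
registry_ready() {
  local port="$1" attempts="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf --max-time 2 "http://127.0.0.1:${port}/v2/" > /dev/null; then
      return 0
    fi
    # Pause between probes, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (REG_PORT, RED and NC are defined in the quoted script):
# registry_ready "${REG_PORT}" || { echo -e "${RED}ERROR: Registry not accessible${NC}"; exit 1; }
```

Probing `127.0.0.1` rather than `localhost` matches the address the registry port is bound to and avoids depending on how `localhost` resolves.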

Comment on lines +230 to +239
kubectl describe kafka/a2a-kafka -n kafka
kubectl get events -n kafka --sort-by='.lastTimestamp'
exit 1
Contributor

Priority: medium

Consider adding a check to ensure that the Kafka topic is created successfully before proceeding. This could prevent issues later in the deployment process if the topic creation fails.

Collaborator Author

We check this on line 241
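A topic check like the one referenced can follow the same pattern as the `Kafka` wait, because Strimzi's `KafkaTopic` custom resource reports a `Ready` condition once the Topic Operator has reconciled it. A minimal sketch, assuming the topic is managed as a `KafkaTopic` resource in the `kafka` namespace; the function name `wait_for_topic` and the `TOPIC_NAME` placeholder are hypothetical, since the topic name is not shown in this thread.

```shell
#!/usr/bin/env bash
# wait_for_topic NAME [NAMESPACE] [TIMEOUT]   (hypothetical helper)
# Blocks until the Strimzi KafkaTopic resource reports Ready; on failure it
# prints the resource description to stderr and returns 1 so the caller can exit.
wait_for_topic() {
  local topic="$1" ns="${2:-kafka}" wait_timeout="${3:-120s}"
  if kubectl wait --for=condition=Ready "kafkatopic/${topic}" -n "$ns" --timeout="$wait_timeout"; then
    return 0
  fi
  echo "ERROR: KafkaTopic ${topic} not ready after ${wait_timeout}" >&2
  kubectl describe "kafkatopic/${topic}" -n "$ns" >&2
  return 1
}

# Usage with a placeholder topic name:
# wait_for_topic "${TOPIC_NAME}" kafka 120s || exit 1
```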

@kabir force-pushed the event-queues-replicated-example-kind branch from 3b51b16 to bd665fd on October 24, 2025 13:43
@kabir force-pushed the event-queues-replicated-example-kind branch from bd665fd to 78f807d on October 24, 2025 15:04
@fjuma merged commit 6f1b5b2 into a2aproject:main on Oct 27, 2025
11 of 12 checks passed
kabir added a commit to kabir/a2a-java that referenced this pull request Dec 23, 2025
…#389)

Add Kubernetes deployment example demonstrating multi-instance A2A
agents with database persistence and Kafka-based event replication.

Fixes a2aproject#281 and
a2aproject#373

---------

Co-authored-by: Farah Juma <fjuma@redhat.com>

Development

Successfully merging this pull request may close these issues.

Experiment with deploying an A2A server agent on k8s to identify any issues/gaps with being able to scale agents

3 participants