u-sheikh commented on Dec 28, 2025

This fix addresses a fork-safety issue that caused the Feast feature server to crash when launched with multiple Gunicorn workers and a SQL Registry backend.

Root Cause:

- SQLAlchemy's connection pools are not fork-safe
- Background refresh threads inherited from the parent process become invalid
- Forked workers share corrupted connection pool state

Solution:

- Add an `on_worker_init()` hook to `BaseRegistry` for post-fork cleanup
- Add `reinitialize_engines()` to `SqlRegistry` for engine disposal and recreation
- Implement a Gunicorn `post_fork` hook in `FeastServeApplication` (see the sketch below)
- Each worker now properly reinitializes its DB connections after fork
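
For context, here is a minimal sketch of what the `post_fork` wiring could look like. The Gunicorn API calls (`BaseApplication`, `cfg.set`, the `post_fork` setting) are real; the class internals and the `get_app` import are assumptions about this PR, not its exact code:

```python
import gunicorn.app.base

from feast.feature_server import get_app  # app factory; exact import path may differ


class FeastServeApplication(gunicorn.app.base.BaseApplication):
    def __init__(self, store, options=None):
        self._store = store
        self._options = options or {}
        super().__init__()

    def load_config(self):
        for key, value in self._options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)
        # Gunicorn invokes this callable in every worker right after fork.
        self.cfg.set("post_fork", self._post_fork)

    def _post_fork(self, server, worker):
        # Dispose inherited SQLAlchemy engines and restart the cache
        # refresh thread inside the freshly forked worker process.
        self._store.registry.on_worker_init()

    def load(self):
        return get_app(self._store)
```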

Changes:

- `sdk/python/feast/infra/registry/base_registry.py`: Add `on_worker_init()`
- `sdk/python/feast/infra/registry/caching_registry.py`: Override to restart the refresh thread
- `sdk/python/feast/infra/registry/sql.py`: Add `reinitialize_engines()` and `on_worker_init()` (sketched below)
- `sdk/python/feast/feature_server.py`: Add a `post_fork` hook to the Gunicorn app
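
A rough sketch of the registry-side hooks, assuming the engine lives on `self.engine` and is rebuilt from a stored SQLAlchemy URL (`registry_url` is a hypothetical attribute; the real `sql.py` may differ). `dispose(close=False)` is SQLAlchemy's documented pattern for discarding a pool inherited across a fork:

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


class SqlRegistry:
    engine: Engine
    registry_url: str  # hypothetical: the SQLAlchemy URL used at construction

    def reinitialize_engines(self) -> None:
        # Abandon pooled connections inherited from the parent process
        # (close=False leaves the parent's sockets alone), then rebuild.
        self.engine.dispose(close=False)
        self.engine = create_engine(self.registry_url)

    def on_worker_init(self) -> None:
        # Invoked from the Gunicorn post_fork hook in each worker.
        self.reinitialize_engines()
```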

Fixes: #5784


u-sheikh requested a review from a team as a code owner on December 28, 2025 at 14:34
…dev#5597)

This adds support for configuring DuckDB's HTTPFS extension S3 settings, enabling compatibility with S3-compatible storage providers such as MinIO, LocalStack, and others that require path-style URLs.

New configuration options in DuckDBOfflineStoreConfig:
- s3_url_style: 'path' or 'vhost' - URL style for S3 requests
- s3_endpoint: Custom S3 endpoint URL
- s3_access_key_id: AWS access key ID
- s3_secret_access_key: AWS secret access key
- s3_region: AWS region
- s3_use_ssl: Whether to use SSL for S3 connections

Example usage in feature_store.yaml:
```yaml
offline_store:
    type: duckdb
    s3_url_style: path
    s3_endpoint: localhost:9000
    s3_access_key_id: minioadmin
    s3_secret_access_key: minioadmin
    s3_region: us-east-1
    s3_use_ssl: false
```

Implementation details:
- S3 configuration is applied via DuckDB's HTTPFS extension
- Settings are cached to avoid reconfiguring on each operation (see the sketch below)
- Graceful error handling if configuration fails
- Type validation ensures only valid values are accepted
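
A rough sketch of how these details could fit together; `_configure_duckdb_for_s3` and `_s3_configured` are the names referenced in the review below, `con` is assumed to be an ibis DuckDB backend exposing `raw_sql()`, and the body is illustrative rather than the PR's exact code:

```python
import logging

logger = logging.getLogger(__name__)

_s3_configured = False  # module-level cache flag (see the review below)


def _configure_duckdb_for_s3(con, config):
    """Apply S3 settings to the DuckDB connection via the HTTPFS extension."""
    global _s3_configured
    if _s3_configured:  # cached so later operations skip reconfiguration
        return
    try:
        con.raw_sql("INSTALL httpfs")
        con.raw_sql("LOAD httpfs")
        for key in (
            "s3_url_style",
            "s3_endpoint",
            "s3_access_key_id",
            "s3_secret_access_key",
            "s3_region",
            "s3_use_ssl",
        ):
            value = getattr(config, key, None)
            if value is None:
                continue
            if isinstance(value, bool):
                value = "true" if value else "false"  # DuckDB boolean literal
            con.raw_sql(f"SET {key}='{value}'")
        _s3_configured = True
    except Exception as exc:
        # Graceful error handling: log and continue rather than crash.
        logger.warning("Failed to configure DuckDB S3 settings: %s", exc)
```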

Fixes: feast-dev#5597
devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View issue and 5 additional flags in Devin Review.


Comment on lines +63 to +64
```python
if _s3_configured:
    return
```

🔴 Global _s3_configured flag is not fork-safe, causing S3 failures in Gunicorn workers

The global _s3_configured flag prevents S3 reconfiguration after a process fork, causing S3 operations to fail in multi-worker deployments.


How the bug is triggered

1. When Gunicorn's master process starts, it may call `_configure_duckdb_for_s3()`, which sets `_s3_configured = True` at line 100.
2. When Gunicorn forks worker processes, each worker inherits `_s3_configured = True` from the parent's memory.
3. However, each worker gets a fresh DuckDB connection via `ibis.get_backend()`; the connection is process-local.
4. When a worker calls `_configure_duckdb_for_s3()`, the check at lines 63-64 returns early because `_s3_configured` is `True`:

   ```python
   if _s3_configured:
       return
   ```

5. The worker's new DuckDB connection never gets its S3 settings configured.
6. S3 operations fail in worker processes.

Actual vs Expected Behavior

Actual: Workers inherit _s3_configured = True but have unconfigured DuckDB connections, causing S3 operations to fail.

Expected: Each worker should configure its own DuckDB connection with S3 settings after fork.

Impact

This bug causes S3 data access to fail in production multi-worker Gunicorn deployments when using DuckDB offline store with S3-compatible storage (MinIO, LocalStack, AWS S3, etc.).

Recommendation: The `_s3_configured` flag should either be removed (configuring on every call is safe, since DuckDB's `SET` operations are idempotent) or be reset in forked processes. Consider adding a mechanism similar to `on_worker_init()` that resets `_s3_configured = False` after fork, or use a process-local approach instead of a global flag (sketched below).
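
A minimal sketch of the process-local approach (an illustration of the recommendation, not code from the PR):

```python
import os

_s3_configured_pid = None  # PID of the process that ran the configuration


def _configure_duckdb_for_s3(con, config):
    global _s3_configured_pid
    if _s3_configured_pid == os.getpid():
        return  # already configured in *this* process
    # ... INSTALL/LOAD httpfs and apply the SET statements here ...
    # A forked child inherits the parent's PID value, so the check above
    # fails in the child and it reconfigures its own fresh connection.
    _s3_configured_pid = os.getpid()
```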

