u-sheikh commented on Dec 28, 2025

This fix addresses a fork-safety issue that caused the Feast feature server to crash when launched with multiple Gunicorn workers and a SQL Registry backend.

Root Cause:

- SQLAlchemy's connection pools are not fork-safe
- Background refresh threads inherited from the parent process become invalid
- Forked workers share corrupted connection pool state

Solution:

- Add an `on_worker_init()` hook to `BaseRegistry` for post-fork cleanup
- Add `reinitialize_engines()` to `SqlRegistry` for engine disposal and recreation
- Implement a Gunicorn `post_fork` hook in `FeastServeApplication` (see the sketch below)
- Each worker now properly reinitializes its DB connections after fork
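
For context, here is a minimal sketch of what the `post_fork` wiring could look like. The Gunicorn API calls (`BaseApplication`, `cfg.set`, the `post_fork` setting) are real; the class internals and the `get_app` import are assumptions about this PR, not its exact code:

```python
import gunicorn.app.base

from feast.feature_server import get_app  # app factory; exact import path may differ


class FeastServeApplication(gunicorn.app.base.BaseApplication):
    def __init__(self, store, options=None):
        self._store = store
        self._options = options or {}
        super().__init__()

    def load_config(self):
        for key, value in self._options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)
        # Gunicorn invokes this callable in every worker right after fork.
        self.cfg.set("post_fork", self._post_fork)

    def _post_fork(self, server, worker):
        # Dispose inherited SQLAlchemy engines and restart the cache
        # refresh thread inside the freshly forked worker process.
        self._store.registry.on_worker_init()

    def load(self):
        return get_app(self._store)
```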

Changes:

- `sdk/python/feast/infra/registry/base_registry.py`: Add `on_worker_init()`
- `sdk/python/feast/infra/registry/caching_registry.py`: Override to restart the refresh thread
- `sdk/python/feast/infra/registry/sql.py`: Add `reinitialize_engines()` and `on_worker_init()` (sketched below)
- `sdk/python/feast/feature_server.py`: Add a `post_fork` hook to the Gunicorn app
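
A rough sketch of the registry-side hooks, assuming the engine lives on `self.engine` and is rebuilt from a stored SQLAlchemy URL (`registry_url` is a hypothetical attribute; the real `sql.py` may differ). `dispose(close=False)` is SQLAlchemy's documented pattern for discarding a pool inherited across a fork:

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


class SqlRegistry:
    engine: Engine
    registry_url: str  # hypothetical: the SQLAlchemy URL used at construction

    def reinitialize_engines(self) -> None:
        # Abandon pooled connections inherited from the parent process
        # (close=False leaves the parent's sockets alone), then rebuild.
        self.engine.dispose(close=False)
        self.engine = create_engine(self.registry_url)

    def on_worker_init(self) -> None:
        # Invoked from the Gunicorn post_fork hook in each worker.
        self.reinitialize_engines()
```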

Fixes: #5784


u-sheikh requested a review from a team as a code owner on December 28, 2025 at 14:34
…dev#5597)

This adds support for configuring DuckDB's HTTPFS extension S3 settings, enabling compatibility with S3-compatible storage providers such as MinIO, LocalStack, and others that require path-style URLs.

New configuration options in DuckDBOfflineStoreConfig:
- s3_url_style: 'path' or 'vhost' - URL style for S3 requests
- s3_endpoint: Custom S3 endpoint URL
- s3_access_key_id: AWS access key ID
- s3_secret_access_key: AWS secret access key
- s3_region: AWS region
- s3_use_ssl: Whether to use SSL for S3 connections

Example usage in feature_store.yaml:
```yaml
offline_store:
    type: duckdb
    s3_url_style: path
    s3_endpoint: localhost:9000
    s3_access_key_id: minioadmin
    s3_secret_access_key: minioadmin
    s3_region: us-east-1
    s3_use_ssl: false
```

Implementation details:
- S3 configuration is applied via DuckDB's HTTPFS extension
- Settings are cached to avoid reconfiguring on each operation (see the sketch below)
- Graceful error handling if configuration fails
- Type validation ensures only valid values are accepted
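
A rough sketch of how these details could fit together; `_configure_duckdb_for_s3` and `_s3_configured` are the names referenced in the review below, `con` is assumed to be an ibis DuckDB backend exposing `raw_sql()`, and the body is illustrative rather than the PR's exact code:

```python
import logging

logger = logging.getLogger(__name__)

_s3_configured = False  # module-level cache flag (see the review below)


def _configure_duckdb_for_s3(con, config):
    """Apply S3 settings to the DuckDB connection via the HTTPFS extension."""
    global _s3_configured
    if _s3_configured:  # cached so later operations skip reconfiguration
        return
    try:
        con.raw_sql("INSTALL httpfs")
        con.raw_sql("LOAD httpfs")
        for key in (
            "s3_url_style",
            "s3_endpoint",
            "s3_access_key_id",
            "s3_secret_access_key",
            "s3_region",
            "s3_use_ssl",
        ):
            value = getattr(config, key, None)
            if value is None:
                continue
            if isinstance(value, bool):
                value = "true" if value else "false"  # DuckDB boolean literal
            con.raw_sql(f"SET {key}='{value}'")
        _s3_configured = True
    except Exception as exc:
        # Graceful error handling: log and continue rather than crash.
        logger.warning("Failed to configure DuckDB S3 settings: %s", exc)
```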

Fixes: feast-dev#5597
devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View issue and 5 additional flags in Devin Review.


Comment on lines +63 to +64
```python
if _s3_configured:
    return
```

🔴 Global _s3_configured flag is not fork-safe, causing S3 failures in Gunicorn workers

The global _s3_configured flag prevents S3 reconfiguration after a process fork, causing S3 operations to fail in multi-worker deployments.


How the bug is triggered

1. When Gunicorn's master process starts, it may call `_configure_duckdb_for_s3()`, which sets `_s3_configured = True` at line 100.
2. When Gunicorn forks worker processes, each worker inherits `_s3_configured = True` from the parent's memory.
3. However, each worker gets a fresh DuckDB connection via `ibis.get_backend()`; the connection is process-local.
4. When a worker calls `_configure_duckdb_for_s3()`, the check at lines 63-64 returns early because `_s3_configured` is `True`:

   ```python
   if _s3_configured:
       return
   ```

5. The worker's new DuckDB connection never gets its S3 settings configured.
6. S3 operations fail in worker processes.

Actual vs Expected Behavior

Actual: Workers inherit _s3_configured = True but have unconfigured DuckDB connections, causing S3 operations to fail.

Expected: Each worker should configure its own DuckDB connection with S3 settings after fork.

Impact

This bug causes S3 data access to fail in production multi-worker Gunicorn deployments when using DuckDB offline store with S3-compatible storage (MinIO, LocalStack, AWS S3, etc.).

Recommendation: The `_s3_configured` flag should either be removed (configuring on every call is safe, since DuckDB's `SET` operations are idempotent) or be reset in forked processes. Consider adding a mechanism similar to `on_worker_init()` that resets `_s3_configured = False` after fork, or use a process-local approach instead of a global flag (sketched below).
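
A minimal sketch of the process-local approach (an illustration of the recommendation, not code from the PR):

```python
import os

_s3_configured_pid = None  # PID of the process that ran the configuration


def _configure_duckdb_for_s3(con, config):
    global _s3_configured_pid
    if _s3_configured_pid == os.getpid():
        return  # already configured in *this* process
    # ... INSTALL/LOAD httpfs and apply the SET statements here ...
    # A forked child inherits the parent's PID value, so the check above
    # fails in the child and it reconfigures its own fresh connection.
    _s3_configured_pid = os.getpid()
```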

