fix: Resolve multi-worker crash with SQL Registry (#5784) #5795
base: master
Conversation
This fix addresses a fork-safety issue that caused the Feast feature server to crash when launched with multiple Gunicorn workers and a SQL Registry backend.

Root Cause:
- SQLAlchemy's connection pools are not fork-safe
- Background refresh threads inherited from the parent process become invalid
- Forked workers share corrupted connection pool state

Solution:
- Add `on_worker_init()` hook to BaseRegistry for post-fork cleanup
- Add `reinitialize_engines()` to SqlRegistry for engine disposal/recreation
- Implement Gunicorn `post_fork` hook in FeastServeApplication
- Each worker now properly reinitializes its DB connections after fork (see the sketch below)

Changes:
- `sdk/python/feast/infra/registry/base_registry.py`: Add `on_worker_init()`
- `sdk/python/feast/infra/registry/caching_registry.py`: Override to restart the refresh thread
- `sdk/python/feast/infra/registry/sql.py`: Add `reinitialize_engines()` and `on_worker_init()`
- `sdk/python/feast/feature_server.py`: Add `post_fork` hook to the Gunicorn app

Fixes: feast-dev#5784
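As a rough illustration of the pattern, here is a minimal sketch using the hook names from this description (`on_worker_init()`, `reinitialize_engines()`) together with SQLAlchemy's `engine.dispose()` and Gunicorn's `post_fork` server hook. The actual Feast classes carry more state, so treat this as a sketch rather than the shipped code:

```python
from sqlalchemy import create_engine


class SqlRegistry:
    def __init__(self, db_url: str):
        self._db_url = db_url
        self.engine = create_engine(db_url)

    def reinitialize_engines(self) -> None:
        # Dispose of the connection pool inherited from the parent process,
        # then build a fresh engine so this worker never reuses forked sockets.
        self.engine.dispose()
        self.engine = create_engine(self._db_url)

    def on_worker_init(self) -> None:
        self.reinitialize_engines()


registry = SqlRegistry("sqlite:///registry.db")


def post_fork(server, worker):
    # Gunicorn calls this hook in every worker immediately after fork,
    # giving each process its own fresh connection pool.
    registry.on_worker_init()
```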
…dev#5597)

This adds support for configuring DuckDB's HTTPFS extension S3 settings, enabling compatibility with S3-compatible storage providers like MinIO, LocalStack, and other providers that require path-style URLs.

New configuration options in `DuckDBOfflineStoreConfig`:
- `s3_url_style`: `'path'` or `'vhost'` - URL style for S3 requests
- `s3_endpoint`: Custom S3 endpoint URL
- `s3_access_key_id`: AWS access key ID
- `s3_secret_access_key`: AWS secret access key
- `s3_region`: AWS region
- `s3_use_ssl`: Whether to use SSL for S3 connections

Example usage in `feature_store.yaml`:

```yaml
offline_store:
  type: duckdb
  s3_url_style: path
  s3_endpoint: localhost:9000
  s3_access_key_id: minioadmin
  s3_secret_access_key: minioadmin
  s3_region: us-east-1
  s3_use_ssl: false
```

Implementation details:
- S3 configuration is applied via DuckDB's HTTPFS extension (sketched below)
- Settings are cached to avoid reconfiguring on each operation
- Graceful error handling if configuration fails
- Type validation ensures only valid values are accepted

Fixes: feast-dev#5597
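For context, here is a minimal sketch of how such settings can be applied through DuckDB's HTTPFS extension directly. The real offline store goes through ibis, so the exact code path differs; the `SET` options themselves are standard DuckDB httpfs settings:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")

# Apply the S3 options from feature_store.yaml as DuckDB session settings.
con.execute("SET s3_url_style = 'path';")
con.execute("SET s3_endpoint = 'localhost:9000';")
con.execute("SET s3_access_key_id = 'minioadmin';")
con.execute("SET s3_secret_access_key = 'minioadmin';")
con.execute("SET s3_region = 'us-east-1';")
con.execute("SET s3_use_ssl = false;")
```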
```python
if _s3_configured:
    return
```
🔴 Global `_s3_configured` flag is not fork-safe, causing S3 failures in Gunicorn workers
The global `_s3_configured` flag prevents S3 reconfiguration after a process fork, causing S3 operations to fail in multi-worker deployments.
How the bug is triggered
- When Gunicorn's master process starts, it may call `_configure_duckdb_for_s3()`, which sets `_s3_configured = True` at line 100
- When Gunicorn forks worker processes, each worker inherits `_s3_configured = True` from the parent's memory
- However, each worker gets a fresh DuckDB connection via `ibis.get_backend()` - the connection is process-local
- When a worker calls `_configure_duckdb_for_s3()`, the check at lines 63-64 returns early because `_s3_configured` is `True`: `if _s3_configured: return`
- The worker's new DuckDB connection never gets S3 settings configured
- S3 operations fail in worker processes
Actual vs Expected Behavior
Actual: Workers inherit `_s3_configured = True` but have unconfigured DuckDB connections, causing S3 operations to fail.
Expected: Each worker should configure its own DuckDB connection with S3 settings after fork.
Impact
This bug causes S3 data access to fail in production multi-worker Gunicorn deployments when using the DuckDB offline store with S3-compatible storage (MinIO, LocalStack, AWS S3, etc.).
Recommendation: The `_s3_configured` flag should either be removed (configuring on every call is safe, since DuckDB handles `SET` operations idempotently) or be reset in forked processes. Consider adding a mechanism similar to `on_worker_init()` that resets `_s3_configured = False` after fork, or use a process-local approach instead of a global flag; see the sketch below.
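A minimal sketch of the process-local approach suggested above, using a hypothetical `_s3_configured_pid` variable in place of the current boolean flag. Recording the PID means a forked worker, which has a new PID, reconfigures its own fresh DuckDB connection instead of trusting the flag inherited from the parent:

```python
import os

_s3_configured_pid = None  # hypothetical replacement for _s3_configured


def _configure_duckdb_for_s3(con) -> None:
    global _s3_configured_pid
    if _s3_configured_pid == os.getpid():
        return  # already configured in *this* process, skip safely
    con.execute("SET s3_url_style = 'path';")  # remaining SET calls elided
    _s3_configured_pid = os.getpid()
```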