[v3-2-test] Include TI UUID in scheduler, DAG processor, triggerer, and worker logs (#65458)#65476
Draft
github-actions[bot] wants to merge 20 commits into v3-2-test from
Conversation
…#63826) (#64723) * Load hook metadata from YAML without importing Hook class * Add hook-name to all provider.yaml connection-types * Add hook-name to connection types and regenerate get_provider_info.py * Fix ruff import order in connections.py * fix: import ProvidersManager at top level per review * Fix provider connection hook display names * Add iter_connection_type_hook_ui_metadata for connection UI hook metadata (cherry picked from commit c4a209b) Co-authored-by: Yuseok Jo <yuseok89@gmail.com>
…evant dirs (#64927) (#64930) Replace two rglob calls with a single os.walk that prunes node_modules and hidden directories (e.g. .git, .venv) in-place, avoiding unnecessary traversal of large directory trees that never contain relevant .pyc files. (cherry picked from commit 27258d5) Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
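The pruning described above relies on `os.walk` letting callers mutate the directory list in place to skip whole subtrees. A minimal sketch of that pattern (the function name and `.pyc` target are taken from the commit message; the exact directory predicate is an assumption):

```python
import os

def find_pyc_files(root: str) -> list[str]:
    """Walk `root` for .pyc files, pruning directories that can never
    contain relevant ones (node_modules and hidden dirs such as .git,
    .venv) so large trees are never descended into."""
    matches: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place tells os.walk not to descend into
        # the pruned directories at all, unlike filtering results after
        # a full rglob traversal.
        dirnames[:] = [
            d for d in dirnames
            if d != "node_modules" and not d.startswith(".")
        ]
        matches.extend(
            os.path.join(dirpath, f) for f in filenames if f.endswith(".pyc")
        )
    return matches
```

A single walk with in-place pruning replaces the two `rglob` calls because the skip decision is made before entering each subtree, not after enumerating it.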
#65016) * Expose queueing time in the Gantt Chart * Also expose scheduled_dttm in the Gantt Chart * Simplify Gantt tooltip and ensure minimum bar visibility for short segments * Null safety for dayjs calls and add tests for timing segments (cherry picked from commit cd85164) Co-authored-by: Saumyajit Chowdhury <77187489+smyjt@users.noreply.github.com>
…group (#65150) (#65160) Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script). Updates `actions/github-script` from 8.0.0 to 9.0.0 - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@ed59741...3a2844b) (cherry picked from commit e5a047c) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: 9.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions-updates ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…5118) (#65242) * Move release calendar verification to its own scheduled workflow Run dev/verify_release_calendar.py from a dedicated daily scheduled workflow instead of as a canary job in the main CI pipeline, and notify the #release-management Slack channel when the check fails so the issue is surfaced to release managers directly. * Include wiki and calendar links in release calendar Slack alert (cherry picked from commit 048e9a1)
…#63994) (#65226) * Add API check to ensure multi team is enabled when team_name is provided * remove unnecessary arguments in added tests * add variable tests and add slight change to other tests to align with variables test file * Change error message, Modify tests, Add bulk tests, Fix CI issues (cherry picked from commit 6271189) Co-authored-by: ahilashsasidharan <79016853+ahilashsasidharan@users.noreply.github.com>
* Add dag runs filters (Consuming Asset) * Fix: correct consuming asset filter setup using association_table * Trigger CI rebuild * Rename consuming_asset filter to consuming_asset_pattern with database icon * Rename consuming_asset filter to consuming_asset_pattern with database icon * Trigger CI rebuild * Fix consuming_asset_pattern naming * Fix: rename consuming_asset to consuming_asset_pattern * Fix: rename consuming_asset to consuming_asset_pattern * Fix: Resolve PostgreSQL JSON comparison error in _ConsumingAssetFilter * Rebase and fix _ConsumingAssetFilter * Trigger CI * add consumingAsset and filters.searchAsset to en/common.json --------- (cherry picked from commit 5245419) Co-authored-by: fat-catTW <124506982+fat-catTW@users.noreply.github.com> Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…tance (#63923) (#65304) After clearing a task instance, the TaskInstances list page was not refreshing to show the updated state. This was because `useClearTaskInstances` was missing `[useTaskInstanceServiceGetTaskInstancesKey]` in the list of query keys to invalidate on success. Both `useClearRun` and `usePatchTaskInstance` correctly invalidate this query — this change brings `useClearTaskInstances` in line with them. Fixes: #60703 (cherry picked from commit f47038e) Co-authored-by: nagasrisai <59650078+nagasrisai@users.noreply.github.com>
The sphinx_airflow_theme default navbar_links includes Registry, but the docs override navbar_links in get_html_theme_options(), so the theme default never applies. (cherry picked from commit d988f75) Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
…e_fileloc + bundle (#65329) (#65343) The public Import Errors API used to match ParseImportError.filename against DagModel.fileloc. In real deployments ``fileloc`` is an absolute path while ``filename`` is relative, so the file-to-DAG resolution often came back empty and the single endpoint fell through to returning the raw error. The list endpoint had a related gap: its CTE was pre-filtered by the caller-visible subset of DAGs, so the per-file authorization check only ever saw the DAGs the caller could already read -- a file containing a mix of readable and unreadable DAGs passed the check on the readable subset alone. * The single endpoint now matches ParseImportError.filename against DagModel.relative_fileloc + DagModel.bundle_name, which is the same key the list endpoint already uses for its join. When the resolved DAG set is empty (parse failed before any DAG was defined, or the name keys did not resolve), the stacktrace is now redacted rather than returned verbatim. * The list endpoint splits the previous ``visible_files_cte`` into two CTEs: ``readable_files_cte`` enumerates the ``(relative_fileloc, bundle_name)`` pairs where the caller can read at least one DAG, and ``file_dags_cte`` enumerates the full ``(relative_fileloc, dag_id, bundle_name)`` set for those files. The per-file authorization check in the groupby loop now receives the complete DAG set for each file and can correctly detect co-located DAGs outside the caller's scope. * The same fall-through in the list endpoint (file has no matching DAGs in DagModel) now redacts the stacktrace before appending. Add a test class that exercises the fix with distinct ``fileloc`` (absolute) and ``relative_fileloc`` (relative) string values, closing the test-fixture gap where both columns previously held the same relative string and the absolute-vs-relative mismatch could not manifest. One existing single-endpoint test documenting the previous fall-through behaviour is updated to assert the new redaction. 
(cherry picked from commit eba9b65) Generated-by: Claude Opus 4.6 (1M context) following the guidelines at https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…se log folder (#65325) (#65345) * Refuse to follow log symlinks that resolve outside the base log folder FileTaskHandler._read_from_local used to open every file that matched the task's log glob pattern, including symlinks whose real path was outside the configured base_log_folder. On deployments where worker logs are accessible from the api-server, that meant the log viewer could end up streaming content from files outside the configured log tree whenever a symlink in the task log directory happened to match the glob pattern. Canonicalise self.local_base once via os.path.realpath and, for every glob hit, resolve the path with os.path.realpath and skip it if the resolved form is not contained in the canonicalised base log folder (using os.path.commonpath, with a ValueError fallback for the different-drive case on Windows). Open the resolved path rather than the original glob hit so the file we open is the one we just validated. Append to sources only after a successful open so sources and log_streams stay aligned. Drop the @staticmethod decorator so the method can read self.local_base; existing call sites already invoke it via self. Add a test class covering: regular-file-inside-base is still streamed; a symlink whose real path is outside base_log_folder is skipped; a symlink that stays inside base_log_folder is followed (legitimate rotation case); and base_log_folder itself being a symlink works. Generated-by: Claude Opus 4.6 (1M context) following the guidelines at https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions * Fix test__read_from_local to use valid base_log_folder The existing test passed an empty string as base_log_folder, which after the containment check resolves to CWD via os.path.realpath(""), causing all files under tmp_path to be rejected. Use tmp_path instead. (cherry picked from commit 3eda845) Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
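The containment check described above can be sketched as a small helper (the function name and arguments are illustrative, not the actual `FileTaskHandler` signature):

```python
import os

def iter_safe_log_paths(base_log_folder: str, candidate_paths):
    """Yield only candidates whose *resolved* path stays inside the
    canonicalised base log folder; symlinks escaping it are skipped."""
    # Canonicalise the base once, so a base that is itself a symlink
    # still works.
    real_base = os.path.realpath(base_log_folder)
    for path in candidate_paths:
        resolved = os.path.realpath(path)
        try:
            inside = os.path.commonpath([real_base, resolved]) == real_base
        except ValueError:
            # commonpath raises ValueError for paths on different drives
            # (Windows): definitely not inside the base.
            inside = False
        if inside:
            # Yield the resolved path, not the original glob hit, so the
            # file actually opened is the one just validated.
            yield resolved
```

Opening the resolved path closes the gap where a symlink could be swapped between validation and open of the original name.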
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…65348) (#65363) JWTRefreshMiddleware derived the cookie Secure flag from the local api.ssl_cert config only. Deployments with TLS terminated at a reverse proxy (no local SSL cert on the Airflow process) therefore received the JWT refresh cookie without the Secure flag. Match the pattern already used by every other cookie-setting location in the codebase (auth.py, simple/routes/login.py, FAB and Keycloak login routes): treat secure as True when either the request came in over HTTPS or a local ssl_cert is configured. (cherry picked from commit 60db83f) Generated-by: Claude Opus 4.6 (1M context) following the guidelines at https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…_id_pattern (#65309) * [v3-2-test] Bump actions/github-script in the github-actions-updates group (#65150) (#65160) Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script). Updates `actions/github-script` from 8.0.0 to 9.0.0 - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@ed59741...3a2844b) (cherry picked from commit e5a047c) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: 9.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions-updates ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [v3-2-test] Added breeze generate issue content for airflow-ctl (#65042) (#65241) * Add breeze generate issue content for airflow-ctl * add new command to doc (cherry picked from commit b24538b) Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com> * [v3-2-test] Run release calendar verification on its own schedule (#65118) (#65242) * Move release calendar verification to its own scheduled workflow Run dev/verify_release_calendar.py from a dedicated daily scheduled workflow instead of as a canary job in the main CI pipeline, and notify the #release-management Slack channel when the check fails so the issue is surfaced to release managers directly. 
* Include wiki and calendar links in release calendar Slack alert (cherry picked from commit 048e9a1) * Fix: PATCH /dags pagination bug and document wildcard dag_id_pattern (#63665) * fixed pagination bug and updated docstring to clarify dag_id_pattern wildcard usage * removed batch loop to update all dags in one shot and added additional test case * Fixed MySQL subquery issue (cherry picked from commit 9504886) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>
…nection. (#65231) (#65368) This was discovered by running with a custom External DB manager that had some gnarly queries that ended up being locked behind this transaction. `_single_connection_pool` replaces `settings.engine` with a SingletonThreadPool engine. But `work_session` was created before that — it still holds an internal reference to the old engine object. _single_connection_pool has no way to rebind work_session. So when _get_current_revision(session=work_session) runs on line 1203 — inside the _single_connection_pool() block — it calls session.connection() which goes through the old engine, not the SingletonThreadPool. The old engine's pool was disposed and recreated empty by engine.dispose(), so it creates a brand new connection. That connection is completely outside _single_connection_pool's control. _single_connection_pool guarantees one connection on the new engine. It can't prevent work_session from creating connections on the old one. The name is a bit of a lie — it's really "single connection pool for new code that uses settings.engine", not "single connection total." (cherry picked from commit e3fea3a) (cherry picked from commit f8e0876) Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
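The stale-engine behaviour described above is plain SQLAlchemy semantics and can be reproduced in isolation (this is a generic sketch, not the Airflow code; SQLite in-memory engines stand in for `settings.engine`):

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

# The module-level engine, analogous to settings.engine.
engine = create_engine("sqlite://")

# A session created *before* the engine is swapped captures a reference
# to the original engine object.
work_session = Session(bind=engine)

# Rebinding the module-level name (as _single_connection_pool does with
# its SingletonThreadPool engine) does not rebind existing sessions...
engine = create_engine("sqlite://")

# ...so work_session.connection() still goes through the old engine and
# will create fresh connections entirely outside the new pool's control.
assert work_session.get_bind() is not engine
```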
…5167) (#65321) * [v3-2-test] Bump actions/github-script in the github-actions-updates group (#65150) (#65160) Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script). Updates `actions/github-script` from 8.0.0 to 9.0.0 - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@ed59741...3a2844b) (cherry picked from commit e5a047c) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: 9.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions-updates ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [v3-2-test] Added breeze generate issue content for airflow-ctl (#65042) (#65241) * Add breeze generate issue content for airflow-ctl * add new command to doc (cherry picked from commit b24538b) Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com> * [v3-2-test] Run release calendar verification on its own schedule (#65118) (#65242) * Move release calendar verification to its own scheduled workflow Run dev/verify_release_calendar.py from a dedicated daily scheduled workflow instead of as a canary job in the main CI pipeline, and notify the #release-management Slack channel when the check fails so the issue is surfaced to release managers directly. * Include wiki and calendar links in release calendar Slack alert (cherry picked from commit 048e9a1) * [v3-2-test] fix(ui): register trigger and sensor graph node types (#65167) * fix(ui): register trigger and sensor graph node types Adds missing Graph node type mappings for trigger/sensor and includes a focused unit test to prevent regressions where dependency graph rendering breaks for those node kinds. 
* docs(ui): add graph screenshot showing sensor and trigger nodes * chore(ui): keep PR scoped to graphTypes.ts only --------- (cherry picked from commit e0ed795) Co-authored-by: Windro.xd <88357206+windro-xdd@users.noreply.github.com> Co-authored-by: Kripa Dev <dev@kripa-car-care.local> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com> Co-authored-by: Windro.xd <88357206+windro-xdd@users.noreply.github.com> Co-authored-by: Kripa Dev <dev@kripa-car-care.local>
…#65326) (#65334) Mypy checks for non-provider projects now synchronize the local virtualenv with uv.lock (uv sync --frozen) before running, so contributors see the same dependency set CI uses and avoid results that drift from CI. The update-uv-lock prek hook now runs with --frozen, so pyproject.toml changes that would touch uv.lock fail the hook and require an explicit uv lock + commit instead of silently rewriting the lock during a commit. (cherry picked from commit 9b08d05) Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…64863) (#65473) * CI: Avoid false recovery alerts when failed job lookup fails * Potential fix for pull request finding --------- (cherry picked from commit b41b11d) Co-authored-by: Henry Chen <henryhenry0512@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…nd worker logs (#65458) Support engineers could not reconstruct a task's full lifecycle from logs because only the Execution API emitted the TaskInstance UUID consistently. Adding ti_id to log lines across the other components makes 'grep ti_id=X' surface every log touching that task, from scheduling through completion. - Worker: bind ti_id to structlog context at startup(). Fresh process per TI means no cross-task leak risk. - Triggerer: extend existing bind_log_contextvars at trigger start. The asyncio.create_task context copy scopes the binding per coroutine. - Scheduler: add ti_id=%s to eight TI-touching log calls across _enqueue_task_instances_with_queued_state, process_executor_events, and _maybe_requeue_stuck_ti. Explicit positional args avoid the contextvar leak a bind+unbind pattern would introduce on exception paths. - DAG processor: add ti_id to callback-processing log lines in _execute_callbacks and _execute_task_callbacks. * Move ti_id into TaskInstance.__repr__; revert redundant log-line additions Addresses review feedback from @jedcunningham on #65458: instead of sprinkling ti_id=%s onto individual scheduler log lines, put the UUID in TaskInstance.__repr__ once and let every %s-formatted TI log line inherit it for free. Strictly better: covers log lines this PR didn't touch and lines added by future PRs without further plumbing. Net diff vs main goes from +61/-16 to +51/-14. Changes: - TaskInstance.__repr__ now appends `ti_id={self.id}` before the closing bracket (matches the existing TaskInstanceNote repr precedent). - Reverted 10 log-line ti_id additions in scheduler_job_runner.py where the existing `%s` format arg was a TaskInstance; repr now supplies ti_id. - Kept the explicit `ti_id=%s` in the "TaskInstance Finished" msg: it formats individual fields (dag_id, task_id, etc.), not %s on the TI, so the repr shortcut does not apply. - Kept DAG processor structlog-kwargs ti_id additions: those go through structlog's kwargs path, not __repr__. 
- Updated one test assertion in test_scheduler_job.py that hardcoded the exact TaskInstance repr string. * Update test_not_enough_pool_slots for new TaskInstance repr After adding ti_id to TaskInstance.__repr__, test_not_enough_pool_slots needs to include ti_id in the expected log substring. Same fix pattern as test_process_executor_events_with_callback at line 695. * Fix test_not_enough_pool_slots ordering assumption on MySQL dr.task_instances[0] can return can_run first on MySQL (alphabetical default ordering) instead of cannot_run, so the expected ti_id used in the "Not executing" assertion grabbed the wrong task's UUID and the substring check failed on MySQL CI even though it passed on SQLite. Look up the TI by task_id instead to make the assertion order-independent. (cherry picked from commit 1a0efe7) Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Force-pushed from 376e74f to 085504d
Support engineers could not reconstruct a task's full lifecycle from logs
because only the Execution API emitted the TaskInstance UUID consistently.
Adding ti_id to log lines across the other components makes 'grep ti_id=X'
surface every log touching that task, from scheduling through completion.
- Worker: bind ti_id to the structlog context at startup(). A fresh process per TI means no cross-task leak risk.
- Triggerer: extend the existing bind_log_contextvars at trigger start. The asyncio.create_task context copy scopes the binding per coroutine.
- Scheduler: add ti_id=%s to eight TI-touching log calls across _enqueue_task_instances_with_queued_state, process_executor_events, and _maybe_requeue_stuck_ti. Explicit positional args avoid the contextvar leak a bind+unbind pattern would introduce on exception paths.
- DAG processor: add ti_id to the callback-processing log lines in _execute_callbacks and _execute_task_callbacks.
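The per-coroutine scoping the triggerer relies on is standard contextvars behaviour: `asyncio` tasks run in a copy of the caller's context. A stand-in sketch using the stdlib directly rather than structlog's contextvar bindings:

```python
import asyncio
import contextvars

# Stand-in for structlog's contextvar-backed binding: a value set inside
# a coroutine lives only in that task's copy of the context.
ti_id: contextvars.ContextVar[str] = contextvars.ContextVar("ti_id", default="unset")

async def run_trigger(current_ti_id: str) -> str:
    ti_id.set(current_ti_id)  # visible only inside this task's context copy
    await asyncio.sleep(0)    # yield to another trigger; no cross-task bleed
    return ti_id.get()

async def main() -> list[str]:
    # gather/create_task copy the current context per coroutine, so
    # concurrent triggers keep independent ti_id bindings.
    return list(await asyncio.gather(run_trigger("ti-a"), run_trigger("ti-b")))
```

Each coroutine reads back the value it set, even though both mutate the same ContextVar concurrently.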
Move ti_id into TaskInstance.__repr__; revert redundant log-line additions

Addresses review feedback from @jedcunningham on #65458: instead of sprinkling ti_id=%s onto individual scheduler log lines, put the UUID in TaskInstance.__repr__ once and let every %s-formatted TI log line inherit it for free. Strictly better: it covers log lines this PR didn't touch, and lines added by future PRs, without further plumbing. Net diff vs main goes from +61/-16 to +51/-14.
Changes:
- TaskInstance.__repr__ now appends `ti_id={self.id}` before the closing bracket (matching the existing TaskInstanceNote repr precedent).
- Reverted 10 log-line ti_id additions in scheduler_job_runner.py where the existing `%s` format arg was a TaskInstance; the repr now supplies ti_id.
- Kept the explicit `ti_id=%s` in the "TaskInstance Finished" msg: it formats individual fields (dag_id, task_id, etc.), not `%s` on the TI, so the repr shortcut does not apply.
- Kept the DAG processor structlog-kwargs ti_id additions: those go through structlog's kwargs path, not `__repr__`.
- Updated one test assertion in test_scheduler_job.py that hardcoded the exact TaskInstance repr string.
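The repr approach can be sketched on a minimal stand-in model (the real TaskInstance has many more fields; only the ti_id-in-repr idea is taken from the PR):

```python
import uuid
from dataclasses import dataclass, field

@dataclass(repr=False)
class TaskInstance:
    """Minimal stand-in for the ORM model, for illustration only."""
    dag_id: str
    task_id: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __repr__(self) -> str:
        # Appending ti_id before the closing bracket means every
        # "%s"-formatted log line that passes the TI object itself picks
        # up the UUID with no per-call-site plumbing.
        return f"<TaskInstance: {self.dag_id}.{self.task_id} ti_id={self.id}>"
```

With this in place, an existing call like `log.info("Setting task instance to queued: %s", ti)` emits the UUID automatically, which is why the individual `ti_id=%s` additions could be reverted.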
Update test_not_enough_pool_slots for new TaskInstance repr

After adding ti_id to TaskInstance.__repr__, test_not_enough_pool_slots needs to include ti_id in the expected log substring. Same fix pattern as test_process_executor_events_with_callback at line 695.
Fix test_not_enough_pool_slots ordering assumption on MySQL

dr.task_instances[0] can return can_run first on MySQL (alphabetical default ordering) instead of cannot_run, so the expected ti_id used in the "Not executing" assertion grabbed the wrong task's UUID, and the substring check failed on MySQL CI even though it passed on SQLite. Look up the TI by task_id instead to make the assertion order-independent.
(cherry picked from commit 1a0efe7)
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>