
[v3-2-test] Include TI UUID in scheduler, DAG processor, triggerer, and worker logs (#65458)#65476

Draft
github-actions[bot] wants to merge 20 commits into v3-2-test from backport-1a0efe7-v3-2-test

Conversation

@github-actions
Contributor

Support engineers could not reconstruct a task's full lifecycle from logs
because only the Execution API emitted the TaskInstance UUID consistently.
Adding ti_id to log lines across the other components makes 'grep ti_id=X'
surface every log touching that task, from scheduling through completion.

  • Worker: bind ti_id to structlog context at startup(). Fresh process per
    TI means no cross-task leak risk.
  • Triggerer: extend existing bind_log_contextvars at trigger start. The
    asyncio.create_task context copy scopes the binding per coroutine.
  • Scheduler: add ti_id=%s to eight TI-touching log calls across
    _enqueue_task_instances_with_queued_state, process_executor_events,
    and _maybe_requeue_stuck_ti. Explicit positional args avoid the
    contextvar leak a bind+unbind pattern would introduce on exception paths.
  • DAG processor: add ti_id to callback-processing log lines in
    _execute_callbacks and _execute_task_callbacks.
  • Move ti_id into TaskInstance.__repr__; revert redundant log-line additions
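The per-coroutine scoping the Triggerer bullet relies on can be seen with plain stdlib contextvars (a sketch of the mechanism, not Airflow's code; structlog's bind_contextvars builds on the same machinery):

```python
import asyncio
import contextvars

# Stand-in for the value bind_log_contextvars would bind at trigger start.
ti_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("ti_id", default="unset")

seen: dict[str, str] = {}

async def run_trigger(name: str, ti_id: str) -> None:
    ti_id_var.set(ti_id)           # analogous to binding ti_id at trigger start
    await asyncio.sleep(0)         # yield so the coroutines interleave
    seen[name] = ti_id_var.get()   # each task still sees only its own binding

async def main() -> None:
    # asyncio.create_task copies the current Context, so the set() above
    # cannot leak between sibling triggers.
    await asyncio.gather(
        asyncio.create_task(run_trigger("trigger_a", "uuid-a")),
        asyncio.create_task(run_trigger("trigger_b", "uuid-b")),
    )

asyncio.run(main())
```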

Addresses review feedback from @jedcunningham on #65458: instead of
sprinkling ti_id=%s onto individual scheduler log lines, put the UUID in
TaskInstance.__repr__ once and let every %s-formatted TI log line inherit
it for free. Strictly better: covers log lines this PR didn't touch and
lines added by future PRs without further plumbing.
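A hypothetical mini-model of that repr-based approach (field names, UUID, and log format are illustrative, not Airflow's actual TaskInstance):

```python
import io
import logging

class TaskInstance:
    def __init__(self, dag_id: str, task_id: str, ti_id: str) -> None:
        self.dag_id, self.task_id, self.id = dag_id, task_id, ti_id

    def __repr__(self) -> str:
        # Append ti_id just before the closing bracket; every existing
        # "%s"-formatted log line now carries it with no call-site changes.
        return f"<TaskInstance: {self.dag_id}.{self.task_id} ti_id={self.id}>"

buf = io.StringIO()
logging.basicConfig(stream=buf, level=logging.INFO, format="%(message)s", force=True)
log = logging.getLogger("scheduler")

ti = TaskInstance("example_dag", "extract", "0195-abc")  # hypothetical UUID
log.info("Setting %s to queued", ti)  # untouched log line, ti_id included
```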

Net diff vs main goes from +61/-16 to +51/-14.

Changes:

  • TaskInstance.__repr__ now appends ti_id={self.id} before the closing
    bracket (matches the existing TaskInstanceNote repr precedent).
  • Reverted 10 log-line ti_id additions in scheduler_job_runner.py where
    the existing %s format arg was a TaskInstance; repr now supplies ti_id.
  • Kept the explicit ti_id=%s in the "TaskInstance Finished" msg: it
    formats individual fields (dag_id, task_id, etc.), not %s on the TI,
    so the repr shortcut does not apply.
  • Kept DAG processor structlog-kwargs ti_id additions: those go through
    structlog's kwargs path, not __repr__.
  • Updated one test assertion in test_scheduler_job.py that hardcoded the
    exact TaskInstance repr string.
  • Update test_not_enough_pool_slots for new TaskInstance repr

After adding ti_id to TaskInstance.__repr__, test_not_enough_pool_slots
needs to include ti_id in the expected log substring. Same fix pattern
as test_process_executor_events_with_callback at line 695.

  • Fix test_not_enough_pool_slots ordering assumption on MySQL

dr.task_instances[0] can return can_run first on MySQL (alphabetical
default ordering) instead of cannot_run, so the expected ti_id used
in the "Not executing" assertion grabbed the wrong task's UUID and
the substring check failed on MySQL CI even though it passed on SQLite.

Look up the TI by task_id instead to make the assertion order-independent.
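The order-independent lookup pattern, sketched with stand-in objects (SimpleNamespace rows are hypothetical, not the ORM model):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for dr.task_instances rows; MySQL and SQLite may
# return them in different orders, so index [0] is not reliable.
tis = [
    SimpleNamespace(task_id="can_run", id="uuid-can"),
    SimpleNamespace(task_id="cannot_run", id="uuid-cannot"),
]

# Pick the TI by task_id instead of position: same result on every backend.
expected_ti = next(ti for ti in tis if ti.task_id == "cannot_run")
```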
(cherry picked from commit 1a0efe7)

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

github-actions bot and others added 20 commits April 17, 2026 22:07
…#63826) (#64723)

* Load hook metadata from YAML without importing Hook class

* Add hook-name to all provider.yaml connection-types

* Add hook-name to connection types and regenerate get_provider_info.py

* Fix ruff import order in connections.py

* fix: import ProvidersManager at top level per review

* Fix provider connection hook display names

* Add iter_connection_type_hook_ui_metadata for connection UI hook metadata
(cherry picked from commit c4a209b)

Co-authored-by: Yuseok Jo <yuseok89@gmail.com>
…evant dirs (#64927) (#64930)

Replace two rglob calls with a single os.walk that prunes node_modules
and hidden directories (e.g. .git, .venv) in-place, avoiding unnecessary
traversal of large directory trees that never contain relevant .pyc files.
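An illustrative version of that pruning walk (the function name and exact prune list are assumptions, not the code merged in #64927):

```python
import os
import tempfile

def find_pyc_files(root: str) -> list[str]:
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place prunes the traversal itself; rglob
        # would still descend into every directory and filter afterwards.
        dirnames[:] = [d for d in dirnames if d != "node_modules" and not d.startswith(".")]
        found.extend(os.path.join(dirpath, f) for f in filenames if f.endswith(".pyc"))
    return found

# Demonstrate the pruning on a throwaway tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg"))
os.makedirs(os.path.join(root, "node_modules", "dep"))
os.makedirs(os.path.join(root, ".venv"))
for rel in ("pkg/a.pyc", "node_modules/dep/b.pyc", ".venv/c.pyc"):
    open(os.path.join(root, *rel.split("/")), "w").close()

found = find_pyc_files(root)  # only pkg/a.pyc survives the pruning
```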
(cherry picked from commit 27258d5)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
#65016)

* Expose queueing time in the Gantt Chart

* Also expose scheduled_dttm in the Gantt Chart

* Simplify Gantt tooltip and ensure minimum bar visibility for short segments

* Null safety for dayjs calls and add tests for timing segments
(cherry picked from commit cd85164)

Co-authored-by: Saumyajit Chowdhury <77187489+smyjt@users.noreply.github.com>
…group (#65150) (#65160)

Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script).

Updates `actions/github-script` from 8.0.0 to 9.0.0
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@ed59741...3a2844b)
(cherry picked from commit e5a047c)



---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 9.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…) (#65241)

* Add breeze generate issue content for airflow-ctl

* add new command to doc
(cherry picked from commit b24538b)

Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>
…5118) (#65242)

* Move release calendar verification to its own scheduled workflow

Run dev/verify_release_calendar.py from a dedicated daily scheduled
workflow instead of as a canary job in the main CI pipeline, and
notify the #release-management Slack channel when the check fails so
the issue is surfaced to release managers directly.

* Include wiki and calendar links in release calendar Slack alert
(cherry picked from commit 048e9a1)
…#63994) (#65226)

* Add API check to ensure multi team is enabled when team_name is provided

* remove unnecessary arguments in added tests

* add variable tests and add slight change to other tests to align with variables test file

* Change error message, Modify tests, Add bulk tests, Fix CI issues
(cherry picked from commit 6271189)

Co-authored-by: ahilashsasidharan <79016853+ahilashsasidharan@users.noreply.github.com>
* Add dag runs filters (Consuming Asset)

* Fix: correct consuming asset filter setup using association_table

* Trigger CI rebuild

* Rename consuming_asset filter to consuming_asset_pattern with database icon

* Rename consuming_asset filter to consuming_asset_pattern with database icon

* Trigger CI rebuild

* Fix consuming_asset_pattern naming

* Fix: rename consuming_asset to consuming_asset_pattern

* Fix: rename consuming_asset to consuming_asset_pattern

* Fix: Resolve PostgreSQL JSON comparison error in _ConsumingAssetFilter

* Rebase and fix _ConsumingAssetFilter

* Trigger CI

* add consumingAsset and filters.searchAsset to en/common.json

---------



(cherry picked from commit 5245419)

Co-authored-by: fat-catTW <124506982+fat-catTW@users.noreply.github.com>
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…tance (#63923) (#65304)

After clearing a task instance, the TaskInstances list page was not
refreshing to show the updated state. This was because `useClearTaskInstances`
was missing `[useTaskInstanceServiceGetTaskInstancesKey]` in the list of
query keys to invalidate on success.

Both `useClearRun` and `usePatchTaskInstance` correctly invalidate this
query — this change brings `useClearTaskInstances` in line with them.

Fixes: #60703
(cherry picked from commit f47038e)

Co-authored-by: nagasrisai <59650078+nagasrisai@users.noreply.github.com>
The sphinx_airflow_theme default navbar_links includes Registry,
but the docs override navbar_links in get_html_theme_options(),
so the theme default never applies.
(cherry picked from commit d988f75)

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
…e_fileloc + bundle (#65329) (#65343)

The public Import Errors API used to match ParseImportError.filename
against DagModel.fileloc. In real deployments ``fileloc`` is an
absolute path while ``filename`` is relative, so the file-to-DAG
resolution often came back empty and the single endpoint fell through
to returning the raw error. The list endpoint had a related gap: its
CTE was pre-filtered by the caller-visible subset of DAGs, so the
per-file authorization check only ever saw the DAGs the caller could
already read -- a file containing a mix of readable and unreadable
DAGs passed the check on the readable subset alone.

* The single endpoint now matches ParseImportError.filename against
  DagModel.relative_fileloc + DagModel.bundle_name, which is the same
  key the list endpoint already uses for its join. When the resolved
  DAG set is empty (parse failed before any DAG was defined, or the
  name keys did not resolve), the stacktrace is now redacted rather
  than returned verbatim.

* The list endpoint splits the previous ``visible_files_cte`` into
  two CTEs: ``readable_files_cte`` enumerates the ``(relative_fileloc,
  bundle_name)`` pairs where the caller can read at least one DAG,
  and ``file_dags_cte`` enumerates the full ``(relative_fileloc,
  dag_id, bundle_name)`` set for those files. The per-file
  authorization check in the groupby loop now receives the complete
  DAG set for each file and can correctly detect co-located DAGs
  outside the caller's scope.

* The same fall-through in the list endpoint (file has no matching
  DAGs in DagModel) now redacts the stacktrace before appending.

Add a test class that exercises the fix with distinct ``fileloc``
(absolute) and ``relative_fileloc`` (relative) string values, closing
the test-fixture gap where both columns previously held the same
relative string and the absolute-vs-relative mismatch could not
manifest. One existing single-endpoint test documenting the previous
fall-through behaviour is updated to assert the new redaction.
(cherry picked from commit eba9b65)


Generated-by: Claude Opus 4.6 (1M context) following the guidelines at
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…se log folder (#65325) (#65345)

* Refuse to follow log symlinks that resolve outside the base log folder

FileTaskHandler._read_from_local used to open every file that matched
the task's log glob pattern, including symlinks whose real path was
outside the configured base_log_folder. On deployments where worker
logs are accessible from the api-server, that meant the log viewer
could end up streaming content from files outside the configured log
tree whenever a symlink in the task log directory happened to match
the glob pattern.

Canonicalise self.local_base once via os.path.realpath and, for every
glob hit, resolve the path with os.path.realpath and skip it if the
resolved form is not contained in the canonicalised base log folder
(using os.path.commonpath, with a ValueError fallback for the
different-drive case on Windows). Open the resolved path rather than
the original glob hit so the file we open is the one we just
validated. Append to sources only after a successful open so sources
and log_streams stay aligned.
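The containment check as described can be sketched like this (a simplified standalone function, not FileTaskHandler's exact code):

```python
import os
import tempfile

def is_within_base(candidate: str, base_log_folder: str) -> bool:
    # Canonicalise both sides, then require the resolved candidate to live
    # under the canonicalised base.
    base = os.path.realpath(base_log_folder)
    resolved = os.path.realpath(candidate)
    try:
        return os.path.commonpath([base, resolved]) == base
    except ValueError:
        # Paths on different Windows drives share no common path.
        return False

# Demo: a real log inside the base passes; a symlink escaping it is skipped.
base = tempfile.mkdtemp()
outside = tempfile.mkdtemp()
open(os.path.join(base, "task.log"), "w").close()
open(os.path.join(outside, "secret.txt"), "w").close()
os.symlink(os.path.join(outside, "secret.txt"), os.path.join(base, "escape.log"))

inside_ok = is_within_base(os.path.join(base, "task.log"), base)
escape_ok = is_within_base(os.path.join(base, "escape.log"), base)
```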

Drop the @staticmethod decorator so the method can read
self.local_base; existing call sites already invoke it via self.

Add a test class covering: regular-file-inside-base is still streamed;
a symlink whose real path is outside base_log_folder is skipped; a
symlink that stays inside base_log_folder is followed (legitimate
rotation case); and base_log_folder itself being a symlink works.

Generated-by: Claude Opus 4.6 (1M context) following the guidelines at
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions

* Fix test__read_from_local to use valid base_log_folder

The existing test passed an empty string as base_log_folder, which
after the containment check resolves to CWD via os.path.realpath(""),
causing all files under tmp_path to be rejected. Use tmp_path instead.
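The pitfall behind that test fix is plain stdlib behaviour and can be reproduced directly:

```python
import os

# An empty base_log_folder silently resolves to the current working
# directory, so the containment check rejects everything under an
# unrelated tmp_path.
resolved_base = os.path.realpath("")
```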
(cherry picked from commit 3eda845)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…65348) (#65363)

JWTRefreshMiddleware derived the cookie Secure flag from the local
api.ssl_cert config only. Deployments with TLS terminated at a
reverse proxy (no local SSL cert on the Airflow process) therefore
received the JWT refresh cookie without the Secure flag.

Match the pattern already used by every other cookie-setting
location in the codebase (auth.py, simple/routes/login.py, FAB and
Keycloak login routes): treat secure as True when either the
request came in over HTTPS or a local ssl_cert is configured.
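The rule as described reduces to a one-line predicate; this standalone sketch (function and parameter names are assumptions, not the middleware's code) makes the truth table explicit:

```python
def cookie_secure(request_scheme: str, ssl_cert_configured: bool) -> bool:
    # Secure when TLS is terminated anywhere: locally (ssl_cert configured)
    # or upstream at a reverse proxy (request arrived over https).
    return request_scheme == "https" or ssl_cert_configured
```

With only the old check, the reverse-proxy case (`"https"` scheme, no local cert) would have produced an insecure cookie.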
(cherry picked from commit 60db83f)


Generated-by: Claude Opus 4.6 (1M context) following the guidelines at
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…_id_pattern (#65309)

* [v3-2-test] Bump actions/github-script in the github-actions-updates group (#65150) (#65160)

Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script).

Updates `actions/github-script` from 8.0.0 to 9.0.0
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@ed59741...3a2844b)
(cherry picked from commit e5a047c)



---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 9.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [v3-2-test] Added breeze generate issue content for airflow-ctl (#65042) (#65241)

* Add breeze generate issue content for airflow-ctl

* add new command to doc
(cherry picked from commit b24538b)

Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>

* [v3-2-test] Run release calendar verification on its own schedule (#65118) (#65242)

* Move release calendar verification to its own scheduled workflow

Run dev/verify_release_calendar.py from a dedicated daily scheduled
workflow instead of as a canary job in the main CI pipeline, and
notify the #release-management Slack channel when the check fails so
the issue is surfaced to release managers directly.

* Include wiki and calendar links in release calendar Slack alert
(cherry picked from commit 048e9a1)

* Fix: PATCH /dags pagination bug and document wildcard dag_id_pattern (#63665)

* fixed pagination bug and updated docstring to clarify dag_id_pattern wildcard usage

* removed batch loop to update all dags in one shot and added additional test case

* Fixed MySQL subquery issue

(cherry picked from commit 9504886)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>
…nection. (#65231) (#65368)

This was discovered by running with a custom External DB manager that had some
gnarly queries that ended up being locked behind this transaction.

`_single_connection_pool` replaces `settings.engine` with a
SingletonThreadPool engine. But `work_session` was created before that — it
still holds an internal reference to the old engine object.
_single_connection_pool has no way to rebind work_session.

So when _get_current_revision(session=work_session) runs on line 1203 — inside
the _single_connection_pool() block — it calls session.connection() which goes
through the old engine, not the SingletonThreadPool. The old engine's pool was
disposed and recreated empty by engine.dispose(), so it creates a brand new
connection. That connection is completely outside _single_connection_pool's
control.

_single_connection_pool guarantees one connection on the new engine. It can't
prevent work_session from creating connections on the old one. The name is a
bit of a lie — it's really "single connection pool for new code that uses
settings.engine", not "single connection total."
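A dependency-free mini-model of that stale reference (no SQLAlchemy; class names are illustrative):

```python
import types

class Engine:
    def __init__(self, name: str) -> None:
        self.name = name

class Session:
    def __init__(self, engine: Engine) -> None:
        self._engine = engine              # captured once, at creation time

    def connection(self) -> str:
        return self._engine.name           # always goes through the captured engine

settings = types.SimpleNamespace(engine=Engine("old"))
work_session = Session(settings.engine)    # created before the swap

# _single_connection_pool-style swap: only the module-level reference changes.
settings.engine = Engine("singleton-pool")

# work_session never sees the new engine; nothing rebinds it.
result = work_session.connection()
```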
(cherry picked from commit e3fea3a)
(cherry picked from commit f8e0876)

Co-authored-by: Ash Berlin-Taylor <ash@apache.org>
…5167) (#65321)

* [v3-2-test] Bump actions/github-script in the github-actions-updates group (#65150) (#65160)

Bumps the github-actions-updates group with 1 update: [actions/github-script](https://github.com/actions/github-script).

Updates `actions/github-script` from 8.0.0 to 9.0.0
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@ed59741...3a2844b)
(cherry picked from commit e5a047c)



---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 9.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [v3-2-test] Added breeze generate issue content for airflow-ctl (#65042) (#65241)

* Add breeze generate issue content for airflow-ctl

* add new command to doc
(cherry picked from commit b24538b)

Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>

* [v3-2-test] Run release calendar verification on its own schedule (#65118) (#65242)

* Move release calendar verification to its own scheduled workflow

Run dev/verify_release_calendar.py from a dedicated daily scheduled
workflow instead of as a canary job in the main CI pipeline, and
notify the #release-management Slack channel when the check fails so
the issue is surfaced to release managers directly.

* Include wiki and calendar links in release calendar Slack alert
(cherry picked from commit 048e9a1)

* [v3-2-test] fix(ui): register trigger and sensor graph node types (#65167)

* fix(ui): register trigger and sensor graph node types

Adds missing Graph node type mappings for trigger/sensor and includes a focused unit test to prevent regressions where dependency graph rendering breaks for those node kinds.

* docs(ui): add graph screenshot showing sensor and trigger nodes

* chore(ui): keep PR scoped to graphTypes.ts only

---------
(cherry picked from commit e0ed795)

Co-authored-by: Windro.xd <88357206+windro-xdd@users.noreply.github.com>
Co-authored-by: Kripa Dev <dev@kripa-car-care.local>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Justin Pakzad <114518232+justinpakzad@users.noreply.github.com>
Co-authored-by: Windro.xd <88357206+windro-xdd@users.noreply.github.com>
Co-authored-by: Kripa Dev <dev@kripa-car-care.local>
…#65326) (#65334)

Mypy checks for non-provider projects now synchronize the local
virtualenv with uv.lock (uv sync --frozen) before running, so contributors
see the same dependency set CI uses and avoid results that drift from CI.

The update-uv-lock prek hook now runs with --frozen, so pyproject.toml
changes that would touch uv.lock fail the hook and require an explicit
uv lock + commit instead of silently rewriting the lock during a commit.
(cherry picked from commit 9b08d05)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
…64863) (#65473)

* CI: Avoid false recovery alerts when failed job lookup fails

* Potential fix for pull request finding



---------
(cherry picked from commit b41b11d)

Co-authored-by: Henry Chen <henryhenry0512@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…nd worker logs (#65458)


Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>