@SteveSandersonMS SteveSandersonMS commented Jan 21, 2026

Fixes the skills E2E tests so that they share snapshots, and unskips those that work.

Previously, tests would pass or fail depending on the order in which they ran. I've tracked this down and am fairly confident there is a bug in the underlying "resume with skills" feature, perhaps two bugs.

  1. The "should apply skill on session resume with skillDirectories" test fails if it runs on its own, but passes if it runs in the same run as one of the other skills tests.

    • Repro: unskip that test, make sure there's no snapshot for it on disk, and run it without running any other tests at the same time. It fails because it doesn't load the skill.
    • It passes if and only if it runs in the same run as another skills test (which means it shares a Client instance). So there must be some kind of state leakage: either there's a bug in the CLI, or the same bug exists in all four language SDKs.
  2. If you unskip "should apply skill on session resume with skillDirectories" and it runs before "should not apply skill when disabled via disabledSkills" in the same run, the disabledSkills test fails.

    • Again, this strongly suggests either cross-session state leakage or a bug in all four SDKs.
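
To make the first symptom concrete, here is a toy model of how state shared through one Client instance could produce exactly this order dependence. Everything in it is hypothetical: `ToyClient`, `ToySession`, and the "resume ignores its own `skill_directories`" behavior are assumptions made for illustration, not the actual CLI or SDK code, and the real cause has not been confirmed.

```python
class ToySession:
    """Hypothetical session: just records which skill directories are active."""

    def __init__(self, active_skills):
        self.active_skills = frozenset(active_skills)


class ToyClient:
    """Hypothetical stand-in for a shared client. NOT the real SDK or CLI."""

    def __init__(self):
        # State that outlives any single session: the source of the leak.
        self._loaded = set()

    def create_session(self, skill_directories=()):
        self._loaded.update(skill_directories)
        return ToySession(self._loaded)

    def resume_session(self, skill_directories=()):
        # Assumed bug: the argument is ignored, so a resumed session only
        # sees skills that some earlier session already loaded.
        return ToySession(self._loaded)


def resume_applies_skill(client):
    """Mimics the resume test: does the resumed session see the skill?"""
    session = client.resume_session(skill_directories=[".test_skills"])
    return ".test_skills" in session.active_skills


# Run alone, on a fresh client: the skill is never loaded.
alone = resume_applies_skill(ToyClient())

# Run after another skills test that shares the same client: it "works".
shared = ToyClient()
shared.create_session(skill_directories=[".test_skills"])
after_other = resume_applies_skill(shared)

print(alone, after_other)
```

In this model the resume check only succeeds when an earlier session has already populated the shared client, which matches the "passes if and only if it shares a run" observation. It does not attempt to model the second symptom.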

There's no .NET-specific bug that I'm aware of. The only .NET issues were the ones identified yesterday (snapshots could not be replayed because of mismatches in paths and line endings), and both are fixed now.
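
As an illustration of why path and line-ending mismatches block snapshot replay, the sketch below normalizes text before it would be compared against a stored snapshot. The function name `normalize_for_snapshot` and the `${workdir}` placeholder are assumptions for this example only; the PR does not show how the test harness performs its normalization.

```python
import re


def normalize_for_snapshot(text, work_dir, placeholder="${workdir}"):
    """Hypothetical helper: make text comparable across OSes and checkouts."""
    if not work_dir:
        raise ValueError("work_dir must be a non-empty path")

    # 1. Line endings: CRLF and bare CR both become LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # 2. Absolute paths: replace the working directory, in either slash
    #    style, with a fixed placeholder.
    variants = {
        work_dir,
        work_dir.replace("\\", "/"),
        work_dir.replace("/", "\\"),
    }
    for variant in sorted(variants):
        text = text.replace(variant, placeholder)

    # 3. Separators: use "/" in the path tail that follows the placeholder.
    tail = re.compile(re.escape(placeholder) + r"[\\/\w.\-]*")
    return tail.sub(lambda m: m.group(0).replace("\\", "/"), text)


windows = normalize_for_snapshot("Read C:\\repo\\skill\r\n", "C:\\repo")
posix = normalize_for_snapshot("Read /home/me/repo/skill\n", "/home/me/repo")
print(windows == posix)
```

With this kind of normalization, output recorded on one machine can match output produced on another, regardless of checkout location or operating system.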

@SteveSandersonMS SteveSandersonMS requested a review from a team as a code owner January 21, 2026 13:39
Copilot AI review requested due to automatic review settings January 21, 2026 13:39

Copilot AI left a comment

Pull request overview

This PR fixes the skills E2E tests to enable snapshot sharing across test runs and unskips the tests that work correctly. The changes address flakiness issues caused by unique directory names per test run, which prevented snapshot reuse.

Changes:

  • Added cleanup logic to ensure each test starts with a fresh skills directory
  • Standardized skill directory creation to use a single .test_skills directory instead of unique per-run directories
  • Fixed line ending issues in Python and .NET for cross-platform compatibility
  • Unskipped two working tests: "load and apply skill from skillDirectories" and "not apply skill when disabled via disabledSkills"
  • Added comprehensive comments explaining why the session resume test remains skipped
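
The first three changes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from `python/e2e/test_skills.py`: the helper names `fresh_skills_dir` and `write_skill`, the `SKILL.md` file name, and the skill name `test-skill` are all hypothetical. Only the `.test_skills` directory name comes from the PR.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_skills_dir(work_dir, name=".test_skills"):
    """Delete and recreate a fixed-name skills directory.

    A fixed name keeps recorded paths identical from run to run (so
    snapshots can be reused), and deleting it first means no test sees
    files left behind by a previous one.
    """
    skills_dir = Path(work_dir) / name
    shutil.rmtree(skills_dir, ignore_errors=True)
    skills_dir.mkdir(parents=True)
    return skills_dir


def write_skill(skills_dir, skill_name, body):
    """Write a skill file with LF line endings on every platform."""
    skill_subdir = skills_dir / skill_name
    skill_subdir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_subdir / "SKILL.md"  # hypothetical file name
    # newline="\n" stops Python translating "\n" to "\r\n" on Windows.
    with open(skill_file, "w", encoding="utf-8", newline="\n") as f:
        f.write(body)
    return skill_file


with tempfile.TemporaryDirectory() as tmp:
    first = fresh_skills_dir(tmp)
    (first / "stale.txt").write_text("left over from a previous test")

    second = fresh_skills_dir(tmp)  # same path, but emptied
    same_path = first == second
    leftovers = [p.name for p in second.iterdir()]

    skill_file = write_skill(second, "test-skill", "line one\nline two\n")
    raw = skill_file.read_bytes()
```

In a pytest suite, `fresh_skills_dir` would typically be called from a fixture so that the cleanup runs before every test.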

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/snapshots/skills/should_not_apply_skill_when_disabled_via_disabledskills.yaml | Added snapshot showing skill disabled behavior (no marker in response) |
| test/snapshots/skills/should_load_and_apply_skill_from_skilldirectories.yaml | Added snapshot showing successful skill loading and application |
| python/e2e/test_skills.py | Added cleanup fixture, standardized directory names, fixed line endings, unskipped working tests |
| nodejs/test/e2e/skills.test.ts | Added beforeEach cleanup, standardized directory names, unskipped suite with detailed skip comment for problematic test |
| go/e2e/skills_test.go | Added cleanup function, standardized directory names, renamed test function, unskipped working tests |
| dotnet/test/SkillsTests.cs | Added constructor cleanup, standardized directory names, fixed line endings, unskipped working tests |


@SteveSandersonMS SteveSandersonMS added this pull request to the merge queue Jan 21, 2026
Merged via the queue into main with commit 5731c68 Jan 21, 2026
22 checks passed
@SteveSandersonMS SteveSandersonMS deleted the stevesa/e2e-skills-tests branch January 21, 2026 18:42

5 participants