[SearchQADataLoader] train=167 val=5 test=5 (from /project/SkillOpt-main/data/searchqa_text)
[model config] backend=azure_openai optimizer=Qwen/Qwen3.5-27B (openai_chat) target=Qwen/Qwen3.5-27B (openai_chat) reasoning=medium
[initial skill] /project/SkillOpt-main/skillopt/envs/searchqa/skills/initial.md (481 chars)
[config] epochs=4 steps/epoch=5 (auto) accum=1 batch_size=40
[config] train_size=167
[config] batches/epoch=5 total_steps=20 games/epoch=167
[config] lr_scheduler=cosine edit_budget=4 min_edit_budget=2
[config] skill_update_mode=patch lr_control_mode=fixed rewrite_reasoning_effort=off rewrite_max_completion_tokens=64000 max_analyst_rounds=3
[config] longitudinal_pair_policy=mixed
[config] base_seeds=[43, 44, 45, 46, 47]
============================================================
BASELINE — evaluate initial skill on Selection set (valid_seen)
Selection items: 5
AzureCliCredential.get_token_info failed: Azure CLI not found on path
[rollout] 1/5 (acc=0.000) id=val_item_005 hard=0
[rollout] 2/5 (acc=0.000) id=val_item_001 hard=0
[rollout] 3/5 (acc=0.000) id=val_item_002 hard=0
[rollout] 4/5 (acc=0.000) id=val_item_004 hard=0
[rollout] 5/5 (acc=0.000) id=val_item_003 hard=0
[baseline result] selection hard=0.0000 soft=0.0000
[EPOCH 1/4] shuffled_seeds=[1043, 1044, 1045, 1046, 1047]
[STEP 1/20] epoch=1 step_in_epoch=0 ==============================
[1/6 ROLLOUT] train items=40 (from pool, batch_seed=1043)
AzureCliCredential.get_token_info failed: Azure CLI not found on path