
feat(search-optimization): cache tool catalog and parallelize per-account MCP fetches#173

Merged
shashi-stackone merged 4 commits into main from ENG-12639 on Apr 14, 2026
Conversation

Contributor

@shashi-stackone shashi-stackone commented Apr 13, 2026

Problem

tool_search latency in agent loops was ~3 seconds per call in setups with multiple linked accounts, while
a direct hit to /actions/search via Postman returns in milliseconds. Investigation showed the SDK was
doing the same catalog-fetch and index-build work on every single call:

  1. search_tools() unconditionally called fetch_tools() with no cache on the search path (the
    existing _tools_cache only covered the meta-tool execute path).
  2. fetch_tools() opened a fresh MCP session per linked account, sequentially; in a demo with 7
    accounts, that's 7 serial round-trips before any search logic ran.
  3. ToolIndex (BM25 + TF-IDF over ~662 tools) was reconstructed on every local-search fallback call,
    with no caching.

Fix

Internal memoization only, no public API signature changes, no behavioral changes in the happy path.

  • Catalog cache: StackOneToolSet._catalog_cache memoizes fetch_tools() results keyed by
    (sorted(account_scope), providers, actions). Shared across search_tools, tool_execute, openai(),
    langchain(), and search_action_names(); they all benefit transparently.
  • Parallel per-account fetch: replaced the sequential for account in account_scope: loop with a
    ThreadPoolExecutor fan-out (capped at 10 workers), mirroring the existing semantic-search parallelism
    pattern in the same file (toolset.py:920-927). Single-account path preserved as-is — no threading overhead
    for the common case.
  • ToolIndex cache: memoized on the toolset instance by Tools identity; rebuilt only when the
    underlying catalog changes.
  • New public method: clear_catalog_cache() for users rotating accounts mid-session.
  • Invalidation: set_accounts() now calls clear_catalog_cache() automatically.
  • Dropped the redundant _ExecuteTool._cached_tools local cache; it now delegates to the shared
    toolset cache.

Expected latency change

  • Warm tool_search: ~3s → ~200ms (cache hit, roughly the latency of /actions/search alone).
  • Cold tool_search with N accounts: drops from ~N × per-account latency to ~max(per-account) + small
    overhead.
  • Subsequent fetch_tools / openai() / langchain() calls on the same toolset are near-instant.
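The cold-path bound holds because the fan-out waits on the slowest account instead of summing round-trips. A minimal sketch of that pattern, assuming a hypothetical fetch_for_account callable (the real loop lives in toolset.py):

```python
import concurrent.futures


def fetch_all_accounts(account_scope, fetch_for_account, max_workers=10):
    # Single-account fast path: no executor, no threading overhead.
    if len(account_scope) == 1:
        return list(fetch_for_account(account_scope[0]))

    all_tools = []
    workers = min(max_workers, len(account_scope))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_for_account, acc) for acc in account_scope]
        # Wall time is bounded by the slowest account, not the sum of all.
        for future in concurrent.futures.as_completed(futures):
            all_tools.extend(future.result())
    return all_tools
```

Note that as_completed aggregates results in finish order, a point the review below picks up on.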

Summary by cubic

Cache the MCP tool catalog and local search index, and fetch per-account catalogs in parallel to cut tool_search latency in multi-account setups. Addresses ENG-12639.

  • Performance

    • Catalog cache in StackOneToolSet._catalog_cache keyed by accounts/providers/actions (accounts order-insensitive; providers lowercased); used by search_tools, tool_execute, openai(), langchain(), search_action_names().
    • Per-account MCP fetches run in parallel (up to 10 workers); single-account path stays non-threaded.
    • Local ToolIndex is cached by tools identity and rebuilt only when the catalog changes; invalidated via clear_catalog_cache().
    • Added clear_catalog_cache(); set_accounts() now clears the cache and local index. _ExecuteTool always defers to toolset.fetch_tools to use the shared cache.
    • Latency: warm tool_search ~3s → ~200ms; cold calls across N accounts now bounded by the slowest account, not N.
  • New Features

    • Added examples/benchmark_search.py to measure cold vs warm latency for fetch_tools, local search, and semantic search.

Written for commit 1c834b1. Summary will update on new commits.

Copilot AI review requested due to automatic review settings April 13, 2026 20:25
Contributor

Copilot AI left a comment


Pull request overview

Improves tool_search and related paths by memoizing the fetched tool catalog, parallelizing per-account MCP catalog fetches, and caching the local-search ToolIndex to avoid repeated rebuilds.

Changes:

  • Added an instance _catalog_cache for fetch_tools() results with explicit invalidation via clear_catalog_cache() and automatic invalidation on set_accounts().
  • Parallelized multi-account MCP catalog fetches using ThreadPoolExecutor (single-account path remains sequential).
  • Cached the local-search ToolIndex per tool catalog identity; updated/added tests covering cache behavior and parallel fetch semantics.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • stackone_ai/toolset.py: Adds catalog + ToolIndex memoization, parallel per-account MCP fetch, and cache invalidation hooks.
  • tests/test_fetch_tools.py: Adds tests for catalog cache keys/invalidation, parallel fetch behavior, and ToolIndex reuse/rebuild.
  • tests/test_agent_tools.py: Updates expectations to reflect removal of _ExecuteTool's local cache and delegation to toolset caching.


Comment thread stackone_ai/toolset.py
Comment on lines +1208 to +1210
futures = [pool.submit(_fetch_for_account, acc) for acc in account_scope]
for future in concurrent.futures.as_completed(futures):
all_tools.extend(future.result())

Copilot AI Apr 13, 2026


The parallel fetch aggregates results via as_completed, which makes all_tools order nondeterministic across runs. Since Tools builds _tool_map = {tool.name: tool for tool in tools}, any duplicate tool names across accounts will be resolved by “last one wins” in completion order, potentially changing which account’s tool get_tool()/tool_execute selects compared to the previous deterministic per-account loop. Consider preserving a deterministic account ordering while still fetching in parallel (e.g., use executor.map over an ordered account list, or collect results keyed by account and extend in that order; ideally use the same ordering as the cache key).

Suggested change
futures = [pool.submit(_fetch_for_account, acc) for acc in account_scope]
for future in concurrent.futures.as_completed(futures):
all_tools.extend(future.result())
for account_tools in pool.map(_fetch_for_account, account_scope):
all_tools.extend(account_tools)
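A standalone demonstration of why the suggested pool.map variant restores determinism: map yields results in submission order even when the first task finishes last (toy fetch function and account names are hypothetical):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(acc):
    # Simulate the first account being the slowest responder.
    time.sleep(0.2 if acc == "acct_a" else 0.01)
    return [f"{acc}:shared_tool"]


accounts = ["acct_a", "acct_b", "acct_c"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in submission order regardless of finish order,
    # so any "last one wins" name collision resolves deterministically.
    ordered = [tool for tools in pool.map(fetch, accounts) for tool in tools]

print(ordered)  # acct_a's tools always come first, despite finishing last
```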

Comment thread stackone_ai/toolset.py Outdated
Comment on lines +1186 to +1188
cache_key = (
tuple(sorted(account_scope, key=lambda a: (a is None, a))),
tuple(sorted(providers)) if providers else None,

Copilot AI Apr 13, 2026


providers filtering is documented/implemented as case-insensitive (_filter_by_provider lowercases providers), but the catalog cache_key uses the raw providers values. Calls like fetch_tools(providers=["Hibob"]) vs fetch_tools(providers=["hibob"]) will create separate cache entries even though they return identical results. Normalize providers (e.g., lowercase + sort) when building the cache key to avoid redundant cache misses and growth.

Suggested change
cache_key = (
tuple(sorted(account_scope, key=lambda a: (a is None, a))),
tuple(sorted(providers)) if providers else None,
normalized_providers = (
tuple(sorted(dict.fromkeys(provider.lower() for provider in providers)))
if providers
else None
)
cache_key = (
tuple(sorted(account_scope, key=lambda a: (a is None, a))),
normalized_providers,

Comment thread stackone_ai/toolset.py
Comment on lines +1191 to +1193
cached = self._catalog_cache.get(cache_key)
if cached is not None:
return cached

Copilot AI Apr 13, 2026


On cache hits this returns the same cached Tools instance. Tools and StackOneTool are mutable (e.g., Tools.set_account_id() mutates the contained tools), so a consumer mutating the returned object will implicitly mutate the toolset’s cache and affect subsequent fetch_tools()/search_tools() calls. If you want memoization to be internal-only/no behavioral change, consider returning a defensive copy (or caching an immutable representation and constructing new tool objects per call) so external mutations can’t leak back into the cache.
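One shape the defensive-copy suggestion could take, sketched with a toy cache. The deepcopy-per-call trades CPU for isolation, which is why a cache might deliberately not do this; all names here are illustrative, not the SDK's:

```python
import copy


class CacheWithCopies:
    """The cache keeps a pristine value and hands each caller a deep copy,
    so caller mutations cannot leak back into cached state."""

    def __init__(self) -> None:
        self._cache = {}

    def get_or_build(self, key, builder):
        if key not in self._cache:
            self._cache[key] = builder()
        return copy.deepcopy(self._cache[key])


cache = CacheWithCopies()
tools = cache.get_or_build("k", lambda: [{"name": "hris_get_employee", "account_id": None}])
tools[0]["account_id"] = "acct_123"          # caller mutation on the copy...
fresh = cache.get_or_build("k", lambda: [])  # ...does not taint the cache
print(fresh[0]["account_id"])  # None
```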


@cubic-dev-ai cubic-dev-ai bot left a comment


cubic analysis

2 issues found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:1188">
P2: Cache key includes `providers` without case normalization, but provider filtering is case-insensitive. Calls like `fetch_tools(providers=['HiBob'])` vs `fetch_tools(providers=['hibob'])` will miss the cache despite producing identical results.</violation>

<violation number="2" location="stackone_ai/toolset.py:1209">
P2: `as_completed` yields futures in non-deterministic finish order, so the tool list order now varies between runs for multi-account fetches. Iterating `futures` directly (instead of `as_completed`) still parallelizes the HTTP calls but collects results in the original account order.</violation>
</file>

Linked issue analysis

Linked issue: ENG-12639: Investigate Performance of Search on SDK

Acceptance criteria (with notes):

  • Add a shared catalog cache so search_tools/tool_execute/openai/langchain/search_action_names reuse fetch_tools results. Note: added _catalog_cache; fetch_tools returns cached Tools.
  • Parallelize per-account MCP fetches (fan-out) capped at 10 workers; preserve single-account path. Note: replaced loop with ThreadPoolExecutor; single-account branch preserved.
  • Memoize ToolIndex on toolset instance and rebuild only when underlying catalog changes. Note: added _tool_index_cache; reuses ToolIndex when id(all_tools) unchanged.
  • Provide a new public method clear_catalog_cache() for explicit invalidation. Note: clear_catalog_cache() was added to invalidate caches.
  • Ensure set_accounts() calls clear_catalog_cache() automatically. Note: set_accounts now calls self.clear_catalog_cache().
  • Drop redundant _ExecuteTool._cached_tools and delegate to toolset cache. Note: removed _cached_tools; _ExecuteTool.execute calls toolset.fetch_tools().
  • Cache key must include sorted account_scope (order-invariant), providers, and actions. Note: cache key built from sorted account_scope and sorted providers/actions tuples.
  • Include tests verifying catalog memoization, account-order invariance, provider/action key participation, clear_catalog_cache, and set_accounts invalidation. Note: comprehensive tests added to tests/test_fetch_tools.py.
  • Include tests verifying parallel per-account fetch behavior, bounding, and correctness. Note: parallel fetch tests check elapsed time and tool aggregation.
  • Preserve single-account fast path (no threading overhead). Note: single-account branch calls _fetch_for_account directly without executor.
  • No breaking public API signature changes; internal memoization only. Note: existing method signatures unchanged; clear_catalog_cache added non-breaking.
  • ⚠️ Achieve the expected latency improvements (warm: ~3s→~200ms; cold: N×→max+overhead). Note: parallelism tests measure faster fetches, but exact latency targets are not validated.

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread stackone_ai/toolset.py Outdated
Comment thread stackone_ai/toolset.py Outdated
cubic-dev-ai[bot]
cubic-dev-ai bot previously approved these changes Apr 14, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


0 issues found across 1 file (changes from recent commits).

Auto-approved: Performance optimization adding internal catalog and index caching and parallelized fetches. Safe as it uses standard patterns, adds no public API changes, and includes extensive tests.

glebedel
glebedel previously approved these changes Apr 14, 2026
Contributor

@glebedel glebedel left a comment


LGTM


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="examples/benchmark_search.py">

<violation number="1" location="examples/benchmark_search.py:61">
P2: Validate `--iterations` to require a positive integer; `0` or negative values currently crash at `times[0]` in `bench()`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
parser.add_argument("--iterations", "-n", type=int, default=100, help="iterations per benchmark (default 100)")
args = parser.parse_args()
n = args.iterations

@cubic-dev-ai cubic-dev-ai bot Apr 14, 2026


P2: Validate --iterations to require a positive integer; 0 or negative values currently crash at times[0] in bench().

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At examples/benchmark_search.py, line 61:

<comment>Validate `--iterations` to require a positive integer; `0` or negative values currently crash at `times[0]` in `bench()`.</comment>

<file context>
@@ -0,0 +1,131 @@
+    parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
+    parser.add_argument("--iterations", "-n", type=int, default=100, help="iterations per benchmark (default 100)")
+    args = parser.parse_args()
+    n = args.iterations
+
+    api_key = os.getenv("STACKONE_API_KEY")
</file context>
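One way to implement the suggested validation is an argparse type callable, which rejects bad values at parse time with a proper usage error. This is a sketch; positive_int is a hypothetical helper, and the surrounding parser lines mirror the quoted context:

```python
import argparse


def positive_int(value: str) -> int:
    # Reject 0 and negatives so bench() never indexes an empty times list.
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError(f"iterations must be >= 1, got {n}")
    return n


parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
parser.add_argument("--iterations", "-n", type=positive_int, default=100,
                    help="iterations per benchmark (default 100)")
```

argparse converts an ArgumentTypeError into the standard "invalid positive_int value" usage message and a non-zero exit, so callers get a clear error instead of a crash inside bench().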


@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Requires human review: Auto-approval blocked by 1 unresolved issue from previous reviews.

Comment on lines +87 to +91
def next_query() -> str:
nonlocal query_idx
q = QUERIES[query_idx % len(QUERIES)]
query_idx += 1
return q


feel like we don't need a global variable for a simple iteration on a function??
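A closure-free alternative the cleanup could use is itertools.cycle, which replaces the query_idx/nonlocal bookkeeping entirely (sketch; the QUERIES values here are placeholders, not the benchmark's actual list):

```python
import itertools

QUERIES = ["create employee", "list candidates", "send message"]

# cycle() yields the next query on each next() call and wraps around
# indefinitely, with no counter or nonlocal state.
queries = itertools.cycle(QUERIES)
sample = [next(queries) for _ in range(5)]
print(sample)  # wraps back to the start after the third query
```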

Contributor Author


Agree, it's trivial to fix, but I think we can clean this up in the Examples cleanup. Or I can do it now?


@willleeney willleeney left a comment


lgtm

@shashi-stackone shashi-stackone merged commit cd635e6 into main Apr 14, 2026
16 checks passed
@shashi-stackone shashi-stackone deleted the ENG-12639 branch April 14, 2026 16:16