feat(search-optimization): cache tool catalog and parallelize per-account MCP fetches #173

shashi-stackone merged 4 commits into `main`
Conversation
Pull request overview
Improves `tool_search` and related paths by memoizing the fetched tool catalog, parallelizing per-account MCP catalog fetches, and caching the local-search `ToolIndex` to avoid repeated rebuilds.
Changes:
- Added an instance `_catalog_cache` for `fetch_tools()` results, with explicit invalidation via `clear_catalog_cache()` and automatic invalidation on `set_accounts()`.
- Parallelized multi-account MCP catalog fetches using `ThreadPoolExecutor` (single-account path remains sequential).
- Cached the local-search `ToolIndex` per tool catalog identity; updated/added tests covering cache behavior and parallel fetch semantics.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| stackone_ai/toolset.py | Adds catalog + ToolIndex memoization, parallel per-account MCP fetch, and cache invalidation hooks. |
| tests/test_fetch_tools.py | Adds tests for catalog cache keys/invalidation, parallel fetch behavior, and ToolIndex reuse/rebuild. |
| tests/test_agent_tools.py | Updates expectations to reflect removal of _ExecuteTool’s local cache and delegation to toolset caching. |
```python
futures = [pool.submit(_fetch_for_account, acc) for acc in account_scope]
for future in concurrent.futures.as_completed(futures):
    all_tools.extend(future.result())
```
The parallel fetch aggregates results via `as_completed`, which makes `all_tools` order nondeterministic across runs. Since `Tools` builds `_tool_map = {tool.name: tool for tool in tools}`, any duplicate tool names across accounts will be resolved by "last one wins" in completion order, potentially changing which account's tool `get_tool()`/`tool_execute` selects compared to the previous deterministic per-account loop. Consider preserving a deterministic account ordering while still fetching in parallel (e.g., use `executor.map` over an ordered account list, or collect results keyed by account and extend in that order; ideally use the same ordering as the cache key).
Suggested change:

```diff
-futures = [pool.submit(_fetch_for_account, acc) for acc in account_scope]
-for future in concurrent.futures.as_completed(futures):
-    all_tools.extend(future.result())
+for account_tools in pool.map(_fetch_for_account, account_scope):
+    all_tools.extend(account_tools)
```
```python
cache_key = (
    tuple(sorted(account_scope, key=lambda a: (a is None, a))),
    tuple(sorted(providers)) if providers else None,
```
`providers` filtering is documented/implemented as case-insensitive (`_filter_by_provider` lowercases providers), but the catalog `cache_key` uses the raw `providers` values. Calls like `fetch_tools(providers=["Hibob"])` vs `fetch_tools(providers=["hibob"])` will create separate cache entries even though they return identical results. Normalize providers (e.g., lowercase + sort) when building the cache key to avoid redundant cache misses and growth.
Suggested change:

```diff
-cache_key = (
-    tuple(sorted(account_scope, key=lambda a: (a is None, a))),
-    tuple(sorted(providers)) if providers else None,
+normalized_providers = (
+    tuple(sorted(dict.fromkeys(provider.lower() for provider in providers)))
+    if providers
+    else None
+)
+cache_key = (
+    tuple(sorted(account_scope, key=lambda a: (a is None, a))),
+    normalized_providers,
```
```python
cached = self._catalog_cache.get(cache_key)
if cached is not None:
    return cached
```
On cache hits this returns the same cached `Tools` instance. `Tools` and `StackOneTool` are mutable (e.g., `Tools.set_account_id()` mutates the contained tools), so a consumer mutating the returned object will implicitly mutate the toolset's cache and affect subsequent `fetch_tools()`/`search_tools()` calls. If you want memoization to be internal-only with no behavioral change, consider returning a defensive copy (or caching an immutable representation and constructing new tool objects per call) so external mutations can't leak back into the cache.
cubic analysis
2 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
```xml
<file name="stackone_ai/toolset.py">
<violation number="1" location="stackone_ai/toolset.py:1188">
P2: Cache key includes `providers` without case normalization, but provider filtering is case-insensitive. Calls like `fetch_tools(providers=['HiBob'])` vs `fetch_tools(providers=['hibob'])` will miss the cache despite producing identical results.</violation>
<violation number="2" location="stackone_ai/toolset.py:1209">
P2: `as_completed` yields futures in non-deterministic finish order, so the tool list order now varies between runs for multi-account fetches. Iterating `futures` directly (instead of `as_completed`) still parallelizes the HTTP calls but collects results in the original account order.</violation>
</file>
```
Linked issue analysis
Linked issue: ENG-12639: Investigate Performance of Search on SDK
| Status | Acceptance criteria | Notes |
|---|---|---|
| ✅ | Add a shared catalog cache so search_tools/tool_execute/openai/langchain/search_action_names reuse fetch_tools results | Added _catalog_cache and fetch_tools returns cached Tools |
| ✅ | Parallelize per-account MCP fetches (fan-out) capped at 10 workers; preserve single-account path | Replaced loop with ThreadPoolExecutor, single-account branch preserved |
| ✅ | Memoize ToolIndex on toolset instance and rebuild only when underlying catalog changes | Added _tool_index_cache and reuses ToolIndex when id(all_tools) unchanged |
| ✅ | Provide a new public method clear_catalog_cache() for explicit invalidation | clear_catalog_cache() was added to invalidate caches |
| ✅ | Ensure set_accounts() calls clear_catalog_cache() automatically | set_accounts now calls self.clear_catalog_cache() |
| ✅ | Drop redundant _ExecuteTool._cached_tools and delegate to toolset cache | Removed _cached_tools and _ExecuteTool.execute calls toolset.fetch_tools() |
| ✅ | Cache key must include sorted account_scope (order-invariant), providers, and actions | Cache key built from sorted account_scope, sorted providers/actions tuples |
| ✅ | Include tests verifying catalog memoization, account-order invariance, provider/action key participation, clear_catalog_cache, and set_accounts invalidation | Comprehensive tests added to tests/test_fetch_tools.py |
| ✅ | Include tests verifying parallel per-account fetch behavior, bounding, and correctness | Parallel fetch tests check elapsed time and tool aggregation |
| ✅ | Preserve single-account fast path (no threading overhead) | Single-account branch calls _fetch_for_account directly without executor |
| ✅ | No breaking public API signature changes; internal memoization only | Existing method signatures unchanged; clear_catalog_cache added non-breaking |
| | Achieve the expected latency improvements (warm: ~3s→~200ms; cold: N×→max+overhead) | Parallelism tests measure faster fetches but exact latency targets not validated |
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
0 issues found across 1 file (changes from recent commits).
Auto-approved: Performance optimization adding internal catalog and index caching and parallelized fetches. Safe as it uses standard patterns, adds no public API changes, and includes extensive tests.
982165d
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
```xml
<file name="examples/benchmark_search.py">
<violation number="1" location="examples/benchmark_search.py:61">
P2: Validate `--iterations` to require a positive integer; `0` or negative values currently crash at `times[0]` in `bench()`.</violation>
</file>
```
```python
parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
parser.add_argument("--iterations", "-n", type=int, default=100, help="iterations per benchmark (default 100)")
args = parser.parse_args()
n = args.iterations
```
P2: Validate `--iterations` to require a positive integer; `0` or negative values currently crash at `times[0]` in `bench()`.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At examples/benchmark_search.py, line 61:
<comment>Validate `--iterations` to require a positive integer; `0` or negative values currently crash at `times[0]` in `bench()`.</comment>
<file context>

```diff
@@ -0,0 +1,131 @@
+    parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
+    parser.add_argument("--iterations", "-n", type=int, default=100, help="iterations per benchmark (default 100)")
+    args = parser.parse_args()
+    n = args.iterations
+
+    api_key = os.getenv("STACKONE_API_KEY")
```

</file context>
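One way to address this review comment, as a sketch (the `positive_int` helper is an assumption, not code from the PR): argparse's `type=` callback can reject non-positive values at parse time with a clear error, instead of crashing later in `bench()`.

```python
# Sketch: validate --iterations at parse time via a custom argparse type.
# positive_int is an illustrative helper, not part of the PR.
import argparse

def positive_int(value: str) -> int:
    n = int(value)
    if n <= 0:
        raise argparse.ArgumentTypeError(f"must be a positive integer, got {value!r}")
    return n

parser = argparse.ArgumentParser(description="Benchmark SDK search latency")
parser.add_argument("--iterations", "-n", type=positive_int, default=100,
                    help="iterations per benchmark (default 100)")

print(parser.parse_args(["-n", "5"]).iterations)  # 5
```

With this, `-n 0` exits with a usage error naming the offending value rather than raising an `IndexError` deep inside the benchmark loop.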
```python
def next_query() -> str:
    nonlocal query_idx
    q = QUERIES[query_idx % len(QUERIES)]
    query_idx += 1
    return q
```
Feels like we don't need a nonlocal counter for a simple iteration in a function?
Agree, it's trivial to fix, but I think we can clean this up in the Examples cleanup. Or I can do it now?
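One way to drop the counter entirely, as a sketch (the `QUERIES` values here are placeholders): `itertools.cycle` round-robins over the list with no mutable index at all.

```python
# Sketch: round-robin over queries without a nonlocal counter.
# QUERIES values are placeholders for the benchmark's real query list.
from itertools import cycle

QUERIES = ["list employees", "create ticket", "send message"]
_query_iter = cycle(QUERIES)

def next_query() -> str:
    # cycle() keeps its own position and wraps around automatically.
    return next(_query_iter)

print([next_query() for _ in range(4)])
# ['list employees', 'create ticket', 'send message', 'list employees']
```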
Problem

`tool_search` latency in agent loops was ~3 seconds per call in setups with multiple linked accounts, while a direct hit to `/actions/search` via Postman returns in milliseconds. Investigation showed the SDK was doing the same catalog-fetch and index-build work on every single call:

- `search_tools()` unconditionally called `fetch_tools()` with no cache on the search path (the existing `_tools_cache` only covered the meta-tool execute path).
- `fetch_tools()` opened a fresh MCP session per linked account, sequentially; in a demo with 7 accounts, that's 7 serial round-trips before any search logic ran.
- `ToolIndex` (BM25 + TF-IDF over ~662 tools) was reconstructed on every local-search fallback call, with no caching.
Fix

Internal memoization only: no public API signature changes, no behavioral changes in the happy path.

- `StackOneToolSet._catalog_cache` memoizes `fetch_tools()` results keyed by `(sorted(account_scope), providers, actions)`. Shared across `search_tools`, `tool_execute`, `openai()`, `langchain()`, and `search_action_names()`, so they all benefit transparently.
- Replaced the sequential `for account in account_scope:` loop with a `ThreadPoolExecutor` fan-out (capped at 10 workers), mirroring the existing semantic-search parallelism pattern in the same file (`toolset.py:920-927`). The single-account path is preserved as-is: no threading overhead for the common case.
- `ToolIndex` is memoized per `Tools` identity and rebuilt only when the underlying catalog changes.
- New public `clear_catalog_cache()` for users rotating accounts mid-session; `set_accounts()` now calls `clear_catalog_cache()` automatically.
- Removed the `_ExecuteTool._cached_tools` local cache; it now delegates to the shared toolset cache.
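The caching scheme described above can be sketched roughly as follows. This is a simplified stand-in, assuming class shape, key construction, and a `_fetch_remote_catalog` stub that are not the SDK's real internals: an order-insensitive, case-normalized key, a hit-before-fetch path, and invalidation wired into `set_accounts()`.

```python
# Minimal sketch of the catalog memoization; all names besides the public
# methods mentioned in the PR are illustrative stand-ins.
class StackOneToolSet:
    def __init__(self) -> None:
        self._catalog_cache: dict[tuple, list[str]] = {}
        self.fetch_count = 0  # instrumentation for this sketch only

    def _fetch_remote_catalog(self, account_scope, providers) -> list[str]:
        self.fetch_count += 1  # stands in for the expensive MCP round-trips
        return [f"{acc}:{p}" for acc in account_scope
                for p in (providers or ["all"])]

    def fetch_tools(self, account_scope, providers=None) -> list[str]:
        key = (
            tuple(sorted(account_scope, key=lambda a: (a is None, a))),
            tuple(sorted(p.lower() for p in providers)) if providers else None,
        )
        cached = self._catalog_cache.get(key)
        if cached is not None:
            return cached
        result = self._fetch_remote_catalog(account_scope, providers)
        self._catalog_cache[key] = result
        return result

    def clear_catalog_cache(self) -> None:
        self._catalog_cache.clear()

    def set_accounts(self, accounts) -> None:
        self.accounts = accounts
        self.clear_catalog_cache()  # rotating accounts invalidates the cache

ts = StackOneToolSet()
ts.fetch_tools(["a", "b"])
ts.fetch_tools(["b", "a"])   # order-insensitive key: cache hit
print(ts.fetch_count)        # 1
ts.set_accounts(["a"])       # invalidates
ts.fetch_tools(["a", "b"])
print(ts.fetch_count)        # 2
```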
Expected latency change

- Warm `tool_search`: ~3s → ~200ms (cache hit; roughly the latency of `/actions/search` alone).
- Cold `tool_search` with N accounts: drops from ~N × per-account latency to ~max(per-account) plus small overhead.
- Repeated `fetch_tools`/`openai()`/`langchain()` calls on the same toolset are near-instant.

Summary by cubic
Cache the MCP tool catalog and the local search index, and fetch per-account catalogs in parallel to cut `tool_search` latency in multi-account setups. Addresses ENG-12639.

Performance

- `StackOneToolSet._catalog_cache` keyed by accounts/providers/actions (accounts order-insensitive; providers lowercased); used by `search_tools`, `tool_execute`, `openai()`, `langchain()`, and `search_action_names()`.
- `ToolIndex` is cached by tools identity and rebuilt only when the catalog changes; invalidated via `clear_catalog_cache()`.
- New `clear_catalog_cache()`; `set_accounts()` now clears the cache and local index.
- `_ExecuteTool` always defers to `toolset.fetch_tools` to use the shared cache.
- Warm `tool_search` ~3s → ~200ms; cold calls across N accounts are now bounded by the slowest account, not N.

New Features

- `examples/benchmark_search.py` to measure cold vs warm latency for `fetch_tools`, local search, and semantic search.

Written for commit 1c834b1. Summary will update on new commits.