eval

Retrieval evaluation harness

yarn eval --collection <uuid> runs the hybrid retriever against a fixture of golden queries and prints precision@k / recall@k / MRR.

Fixture format

eval/fixtures.json is a JSON array of rows:

[
  {
    "query": "What is the SLA on response time?",
    "expectedDocumentIds": ["uuid-1", "uuid-2"]
  }
]

Each row's expectedDocumentIds is the set of documents that should appear in the top-k retrieval — any chunk from any of those documents counts as a hit.

Scoring

Precision@k: fraction of retrieved chunks whose document is in the expected set.
Recall@k: fraction of expected documents that appear at least once in the retrieved chunks.
MRR (Mean Reciprocal Rank): 1 / (rank of first hit), averaged across queries; 0 if no hit.

Adding your own fixtures

Upload your documents to a collection.
Ask the questions you'd want the system to answer; note the document IDs that should ground each answer.
Drop them into eval/fixtures.json (or pass --fixture path for a custom file).
Run yarn eval --collection <collection-uuid>.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
fixtures.example.json		fixtures.example.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Retrieval evaluation harness

Fixture format

Scoring

Adding your own fixtures

FilesExpand file tree

eval

Directory actions

More options

Directory actions

More options

Latest commit

History

eval

Folders and files

parent directory

README.md

Retrieval evaluation harness

Fixture format

Scoring

Adding your own fixtures