# CI Benchmarks

This directory contains benchmarks that run in CI and report results to bencher.dev.

## Structure

```
ci_benchmarks/
├── benchmarks/          # Benchmark tests
│   ├── test_scan.py
│   ├── test_search.py
│   └── test_random_access.py
├── datagen/             # Dataset generation scripts
│   ├── gen_all.py       # Generate all datasets
│   ├── basic.py         # 10M row dataset
│   └── lineitems.py     # TPC-H lineitem dataset
├── benchmark.py         # IO/memory benchmark infrastructure
├── conftest.py          # Pytest configuration
└── datasets.py          # Dataset URI resolver (local vs GCS)
```
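As a rough illustration, a resolver like `datasets.py` can pick a local path by default and a GCS URI when running in CI. This is a hedged sketch, not the actual implementation: the `DATASETS` mapping, the file names, and the `LANCE_CI_GCS_BUCKET` environment variable are all hypothetical; only `get_dataset_uri` and the `~/lance-benchmarks-ci-datasets/` directory appear elsewhere in this README.

```python
# Hypothetical sketch of a local-vs-GCS dataset URI resolver.
# DATASETS keys/values and LANCE_CI_GCS_BUCKET are illustrative names.
import os
from pathlib import Path

DATASETS = {
    "basic": "basic.lance",
    "lineitems": "lineitems.lance",
}


def get_dataset_uri(name: str) -> str:
    """Resolve a dataset name to a local path, or a GCS URI in CI."""
    rel = DATASETS[name]
    bucket = os.environ.get("LANCE_CI_GCS_BUCKET")  # assumed CI-only variable
    if bucket:
        return f"gs://{bucket}/{rel}"
    # Default local location matches the datagen output directory.
    return str(Path.home() / "lance-benchmarks-ci-datasets" / rel)
```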

## Running Benchmarks Locally

### 1. Generate test datasets

```sh
python python/ci_benchmarks/datagen/gen_all.py
```

This creates datasets in `~/lance-benchmarks-ci-datasets/`.

### 2. Run pytest-benchmark tests

```sh
pytest python/ci_benchmarks/ --benchmark-only
```

To save timing results as JSON:

```sh
pytest python/ci_benchmarks/ --benchmark-json results.json
```
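The saved `results.json` follows pytest-benchmark's report layout: a top-level `benchmarks` list where each entry has a `name` and a `stats` object (`mean`, `min`, `max`, and so on, in seconds). A small sketch for pulling mean timings out of it (the `summarize` helper is our own, not part of this repo):

```python
# Sketch: extract mean timings from a pytest-benchmark JSON report.
# Assumes the standard pytest-benchmark layout: {"benchmarks": [...]}.
import json


def summarize(path: str) -> dict[str, float]:
    """Map benchmark name -> mean runtime in seconds."""
    with open(path) as f:
        report = json.load(f)
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


# Usage, e.g.:
#   for name, mean in summarize("results.json").items():
#       print(f"{name}: {mean * 1e3:.2f} ms")
```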

## IO/Memory Benchmarks

The `io_memory_benchmark` marker provides benchmarks that track both IO statistics and memory allocations during benchmark execution (not setup/teardown).

### Writing IO/Memory Benchmarks

```python
@pytest.mark.io_memory_benchmark()
def test_full_scan(io_mem_benchmark):
    dataset_uri = get_dataset_uri("basic")
    ds = lance.dataset(dataset_uri)

    def bench(dataset):
        dataset.to_table()

    io_mem_benchmark(bench, ds)
```

The `io_mem_benchmark` fixture:

- Runs an optional warmup iteration (not measured)
- Tracks IO stats via `dataset.io_stats_incremental()`
- Optionally tracks memory via `lance-memtest` if preloaded
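The measurement loop behind those bullets can be sketched roughly as follows. This is a hedged simplification, not the real `conftest.py` fixture; it assumes `io_stats_incremental()` returns IO stats accumulated since its previous call, so calling it once before the measured pass discards stats from setup and warmup.

```python
# Hedged sketch of an io_mem_benchmark-style measurement loop; the real
# fixture in conftest.py may differ (e.g. it also handles memory tracking).
def run_io_mem_benchmark(bench_fn, dataset, warmup=True):
    if warmup:
        bench_fn(dataset)  # warmup iteration, not measured
    # Assumption: io_stats_incremental() returns stats since its last call,
    # so this call drains anything accumulated during setup/warmup.
    dataset.io_stats_incremental()
    bench_fn(dataset)  # measured iteration
    return dataset.io_stats_incremental()  # stats for the measured pass only
```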

### Running IO/Memory Benchmarks

Without memory tracking:

```sh
pytest python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search -v
```

With memory tracking (Linux only):

```sh
LD_PRELOAD=$(lance-memtest) pytest python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search -v
```

### Output

Terminal output shows a summary table:

```
======================== IO/Memory Benchmark Statistics ========================
Test                                     Peak Mem      Allocs   Read IOPS    Read Bytes
---------------------------------------------------------------------------------------
test_io_mem_basic_btree_search[...]        3.6 MB     135,387           2        1.8 MB
```

To save results as JSON (Bencher Metric Format):

```sh
pytest ... --benchmark-stats-json stats.json
```
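Bencher Metric Format is a JSON object mapping each benchmark name to its measures, each holding at least a `value` field. A hedged sketch for reading one measure back out of `stats.json` (the `metric_values` helper and the `peak-memory` measure slug are illustrative assumptions, not names from this repo):

```python
# Sketch: read one measure per benchmark from a Bencher Metric Format file,
# i.e. {"benchmark_name": {"measure_slug": {"value": ...}, ...}, ...}.
# The measure slugs used by these benchmarks may differ.
import json


def metric_values(path: str, measure: str) -> dict[str, float]:
    """Map benchmark name -> value for one measure, skipping benchmarks
    that did not report it."""
    with open(path) as f:
        report = json.load(f)
    return {
        name: measures[measure]["value"]
        for name, measures in report.items()
        if measure in measures
    }
```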

## Investigating memory use for a particular benchmark

To investigate memory use for a particular benchmark, you can use the bytehound memory profiler. After installing it, run a benchmark with profiling enabled:

```sh
LD_PRELOAD=/usr/local/lib/libbytehound.so \
    pytest 'python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search[small_strings-equal]' -v
```

Then use the bytehound server to visualize the profiling data:

```sh
bytehound server memory-profiling_*.dat
```

You can apply time filters on the allocations view to inspect memory allocations at a specific point in time, which helps filter out allocations made during setup. Once the filters are in place, use the Flamegraph view (available from the menu in the upper right corner) to get a flamegraph of the memory allocations in that time range.