This directory contains benchmarks that run in CI and report results to bencher.dev.
```
ci_benchmarks/
├── benchmarks/        # Benchmark tests
│   ├── test_scan.py
│   ├── test_search.py
│   └── test_random_access.py
├── datagen/           # Dataset generation scripts
│   ├── gen_all.py     # Generate all datasets
│   ├── basic.py       # 10M row dataset
│   └── lineitems.py   # TPC-H lineitem dataset
├── benchmark.py       # IO/memory benchmark infrastructure
├── conftest.py        # Pytest configuration
└── datasets.py        # Dataset URI resolver (local vs GCS)
```
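As an illustration of what a local-vs-GCS resolver like `datasets.py` might look like, here is a hedged sketch; the `LANCE_CI_BUCKET` environment variable and the function body are assumptions for illustration, not the actual implementation:

```python
import os

# Hypothetical resolver in the spirit of datasets.py; the env var name
# and bucket layout are illustrative assumptions.
LOCAL_ROOT = os.path.expanduser("~/lance-benchmarks-ci-datasets")

def get_dataset_uri(name: str) -> str:
    """Return a GCS URI when a CI bucket is configured, else a local path."""
    bucket = os.environ.get("LANCE_CI_BUCKET")  # e.g. "gs://my-ci-bucket"
    if bucket:
        return f"{bucket}/{name}.lance"
    return os.path.join(LOCAL_ROOT, f"{name}.lance")
```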
To generate the datasets:

```sh
python python/ci_benchmarks/datagen/gen_all.py
```

This creates datasets in `~/lance-benchmarks-ci-datasets/`.
To run the benchmarks:

```sh
pytest python/ci_benchmarks/ --benchmark-only
```

To save timing results as JSON:
```sh
pytest python/ci_benchmarks/ --benchmark-json results.json
```

The `io_memory_benchmark` marker provides benchmarks that track both IO statistics and memory allocations during benchmark execution (not setup/teardown).
```python
@pytest.mark.io_memory_benchmark()
def test_full_scan(io_mem_benchmark):
    dataset_uri = get_dataset_uri("basic")
    ds = lance.dataset(dataset_uri)

    def bench(dataset):
        dataset.to_table()

    io_mem_benchmark(bench, ds)
```

The `io_mem_benchmark` fixture:
- Runs an optional warmup iteration (not measured)
- Tracks IO stats via `dataset.io_stats_incremental()`
- Optionally tracks memory via `lance-memtest` if preloaded
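The measurement loop behind the fixture can be sketched roughly as follows; `run_io_benchmark` is a hypothetical stand-in for the real fixture in `conftest.py`, and memory tracking is omitted:

```python
# Hypothetical sketch of the io_mem_benchmark measurement loop, assuming
# dataset.io_stats_incremental() returns the IO counters accumulated since
# the previous call (per the fixture description above). Memory tracking
# via lance-memtest is omitted here.
def run_io_benchmark(bench_fn, dataset, warmup=True):
    if warmup:
        bench_fn(dataset)            # warmup iteration: not measured
    dataset.io_stats_incremental()   # drain counters so setup IO is excluded
    bench_fn(dataset)                # measured iteration
    return dataset.io_stats_incremental()  # IO done during the benchmark only
```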
Without memory tracking:

```sh
pytest python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search -v
```

With memory tracking (Linux only):

```sh
LD_PRELOAD=$(lance-memtest) pytest python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search -v
```

Terminal output shows a summary table:
```
======================== IO/Memory Benchmark Statistics ========================
Test                                   Peak Mem   Allocs    Read IOPS  Read Bytes
---------------------------------------------------------------------------------------
test_io_mem_basic_btree_search[...]    3.6 MB     135,387   2          1.8 MB
```
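The sizes in the table are human-formatted byte counts. A small helper along these lines produces such strings (illustrative only, assuming decimal units; the actual reporting code may differ):

```python
def format_bytes(n: float) -> str:
    """Format a byte count like the summary table, e.g. 1.8 MB."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000:
            # Whole bytes need no decimal point
            return f"{n} {unit}" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} PB"
```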
To save results as JSON (Bencher Metric Format):
```sh
pytest ... --benchmark-stats-json stats.json
```

To investigate memory use for a particular benchmark, you can use the bytehound library.
After installing it, you can run a benchmark with memory profiling enabled:
```sh
LD_PRELOAD=/usr/local/lib/libbytehound.so \
    pytest 'python/ci_benchmarks/benchmarks/test_search.py::test_io_mem_basic_btree_search[small_strings-equal]' -v
```

Then use the bytehound server to visualize the memory profiling data:
```sh
bytehound server memory-profiling_*.dat
```

Use time filters on the allocations view to inspect memory allocations at a specific point in time; this helps exclude allocations made during setup. Once filters are in place, open the Flamegraph view (from the menu in the upper right corner) to get a flamegraph of the memory allocations in that time range.