Tags: withcatai/node-llama-cpp
feat: automatic checkpoints for models that need it (#573)

* feat: automatic checkpoints for models that need it (such as Qwen 3.5, due to its hybrid architecture)
* feat(`QwenChatWrapper`): Qwen 3.5 support
* feat(`inspect gpu` command): detect and report missing prebuilt binary modules and custom npm registry
* feat: initial disk cache dir option for future optimizations (disabled for now)
* fix: Qwen 3.5 memory estimation
* fix: grammar use with `HarmonyChatWrapper`
* fix: add Mistral think segment detection
* fix: compress excessively long segments from the current response on context shift instead of throwing an error
* fix: default the thinking budget to 75% of the context size to prevent low-quality responses
* fix: bugs
feat(`getLlama`): `build: "autoAttempt"` (#564)

* feat(`getLlama`): `build: "autoAttempt"`
* feat: get rid of `octokit`
* fix(CLI): disable Direct I/O by default
* fix: Bun segmentation fault on process exit with an undisposed `Llama` instance
* fix: detect glibc inside Nix
* fix: stricter CI build flag
* chore: update `simple-git`
* chore: switch off of deprecated `tsconfig.json` configs
* docs: clarify `getLlama`'s `build` option logic
feat: Exclude Top Choices (XTC) (#553)

* feat: Exclude Top Choices (XTC) support
* feat: DRY (Don't Repeat Yourself) repeat penalty support
* feat: Tiny Aya support
* fix: adjust the default VRAM padding config to reserve enough memory for compute buffers
* fix: adapt to breaking `llama.cpp` changes
* fix: support function call syntax with an optional whitespace prefix
* fix: find the provided `cmake` path
* fix: change the default value of `useDirectIo` to `false`
* fix: Vulkan device dedupe
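To illustrate what the XTC sampler does, here is a minimal, self-contained sketch following the common `llama.cpp` XTC semantics: with some probability, all tokens whose probability is at or above a threshold are removed except the least likely of them, steering generation toward mid-probability tokens. The function and parameter names below are illustrative, not the library's API.

```typescript
type TokenProb = {token: number, prob: number};

// Sketch of XTC (Exclude Top Choices) filtering.
// `sortedProbs` must be sorted by probability, descending.
function applyXtc(
    sortedProbs: TokenProb[],
    xtcThreshold: number,   // tokens with prob >= this are "top choices"
    xtcProbability: number, // chance that the filter is applied at this step
    random: () => number = Math.random
): TokenProb[] {
    // The filter is only applied on a fraction of sampling steps
    if (random() >= xtcProbability)
        return sortedProbs;

    // Count how many tokens are at or above the threshold
    let aboveCount = 0;
    for (const {prob} of sortedProbs) {
        if (prob >= xtcThreshold)
            aboveCount++;
        else
            break;
    }

    // Need at least two qualifying tokens; remove all of them
    // except the last (least likely) one
    if (aboveCount < 2)
        return sortedProbs;

    return sortedProbs.slice(aboveCount - 1);
}
```

With a threshold of `0.1`, a distribution like `[0.5, 0.3, 0.15, 0.05]` keeps only the `0.15` and `0.05` tokens when the filter fires, which is why XTC tends to reduce repetitive, overly "safe" continuations.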
feat(`LlamaCompletion`): `stopOnAbortSignal` (#538)

* feat(`LlamaCompletion`): `stopOnAbortSignal`
* feat(`LlamaModel`): `useDirectIo`
* fix: support new CUDA 13.1 archs
* fix: build the prebuilt binaries with CUDA 13.1 instead of 13.0
* docs: stopping a text completion generation
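The intent behind a `stopOnAbortSignal`-style option can be sketched without the library: when the abort signal fires mid-generation, the loop stops gracefully and returns the text produced so far instead of surfacing the abort as an error. This is a conceptual sketch, not the library's implementation; `generateUntilAborted` and `nextToken` are hypothetical names.

```typescript
// Conceptual sketch of graceful stopping on an AbortSignal.
async function generateUntilAborted(
    nextToken: () => Promise<string | null>, // hypothetical token source; null = end of generation
    signal: AbortSignal,
    stopOnAbortSignal: boolean
): Promise<string> {
    let text = "";

    for (let token = await nextToken(); token != null; token = await nextToken()) {
        if (signal.aborted) {
            if (stopOnAbortSignal)
                return text; // graceful stop: keep the partial result

            throw new Error("Generation aborted");
        }

        text += token;
    }

    return text;
}
```

The design tradeoff: throwing on abort makes cancellation unambiguous to the caller, while stopping gracefully lets the caller keep and display the partial completion, which is usually what a UI wants.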