Tags: withcatai/node-llama-cpp
feat: automatic checkpoints for models that need it (#573)

* feat: automatic checkpoints for models that need it (such as Qwen 3.5, due to its hybrid architecture)
* feat(`QwenChatWrapper`): Qwen 3.5 support
* feat(`inspect gpu` command): detect and report missing prebuilt binary modules and custom npm registry
* feat: initial disk cache dir option for future optimizations (disabled for now)
* fix: Qwen 3.5 memory estimation
* fix: grammar use with `HarmonyChatWrapper`
* fix: add Mistral think segment detection
* fix: compress excessively long segments from the current response on context shift instead of throwing an error
* fix: default the thinking budget to 75% of the context size to prevent low-quality responses
* fix: bugs
feat(`getLlama`): `build: "autoAttempt"` (#564)

* feat(`getLlama`): `build: "autoAttempt"`
* feat: get rid of `octokit`
* fix(CLI): disable Direct I/O by default
* fix: Bun segmentation fault on process exit with an undisposed `Llama` instance
* fix: detect glibc inside Nix
* fix: stricter CI build flag
* chore: update `simple-git`
* chore: switch off of deprecated `tsconfig.json` configs
* docs: clarify `getLlama`'s `build` option logic
feat: Exclude Top Choices (XTC) (#553)

* feat: Exclude Top Choices (XTC) support
* feat: DRY (Don't Repeat Yourself) repeat penalty support
* feat: Tiny Aya support
* fix: adjust the default VRAM padding config to reserve enough memory for compute buffers
* fix: adapt to breaking `llama.cpp` changes
* fix: support function call syntax with an optional whitespace prefix
* fix: find the provided `cmake` path
* fix: change the default value of `useDirectIo` to `false`
* fix: Vulkan device dedupe
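To illustrate what the XTC sampler does, here is a minimal, self-contained sketch following the common `llama.cpp` XTC semantics: with some probability, all tokens whose probability is at or above a threshold are removed except the least likely of them, steering generation toward mid-probability tokens. The function and parameter names below are illustrative, not the library's API.

```typescript
type TokenProb = {token: number, prob: number};

// Sketch of XTC (Exclude Top Choices) filtering.
// `sortedProbs` must be sorted by probability, descending.
function applyXtc(
    sortedProbs: TokenProb[],
    xtcThreshold: number,   // tokens with prob >= this are "top choices"
    xtcProbability: number, // chance that the filter is applied at this step
    random: () => number = Math.random
): TokenProb[] {
    // The filter is only applied on a fraction of sampling steps
    if (random() >= xtcProbability)
        return sortedProbs;

    // Count how many tokens are at or above the threshold
    let aboveCount = 0;
    for (const {prob} of sortedProbs) {
        if (prob >= xtcThreshold)
            aboveCount++;
        else
            break;
    }

    // Need at least two qualifying tokens; remove all of them
    // except the last (least likely) one
    if (aboveCount < 2)
        return sortedProbs;

    return sortedProbs.slice(aboveCount - 1);
}
```

With a threshold of `0.1`, a distribution like `[0.5, 0.3, 0.15, 0.05]` keeps only the `0.15` and `0.05` tokens when the filter fires, which is why XTC tends to reduce repetitive, overly "safe" continuations.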
feat(`LlamaCompletion`): `stopOnAbortSignal` (#538)

* feat(`LlamaCompletion`): `stopOnAbortSignal`
* feat(`LlamaModel`): `useDirectIo`
* fix: support new CUDA 13.1 archs
* fix: build the prebuilt binaries with CUDA 13.1 instead of 13.0
* docs: stopping a text completion generation
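The intent behind a `stopOnAbortSignal`-style option can be sketched without the library: when the abort signal fires mid-generation, the loop stops gracefully and returns the text produced so far instead of surfacing the abort as an error. This is a conceptual sketch, not the library's implementation; `generateUntilAborted` and `nextToken` are hypothetical names.

```typescript
// Conceptual sketch of graceful stopping on an AbortSignal.
async function generateUntilAborted(
    nextToken: () => Promise<string | null>, // hypothetical token source; null = end of generation
    signal: AbortSignal,
    stopOnAbortSignal: boolean
): Promise<string> {
    let text = "";

    for (let token = await nextToken(); token != null; token = await nextToken()) {
        if (signal.aborted) {
            if (stopOnAbortSignal)
                return text; // graceful stop: keep the partial result

            throw new Error("Generation aborted");
        }

        text += token;
    }

    return text;
}
```

The design tradeoff: throwing on abort makes cancellation unambiguous to the caller, while stopping gracefully lets the caller keep and display the partial completion, which is usually what a UI wants.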