Tags · ggml-org/llama.cpp

b8854

server : refactor "use checkpoint" logic (#22114)

Apr 20, 2026
de71b5f

b8853

[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)

* [SYCL] Fix reorder MMVQ assert on unaligned vocab sizes

The reorder mul_mat_vec_q dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K
asserted that block_num_y was a multiple of 16 subgroups. Models with
a vocab size not divisible by 16 (for example HY-MT at 120818) aborted
on model load when the output projection tripped the assert.

I replaced the assert with padding: block_num_y now rounds up to a
whole number of subgroup-sized workgroups. The kernel already has the
row bounds check (`if (row >= nrows) return;`) so the extra padded
threads early-exit cleanly. Row values are uniform across a subgroup
so the collective reduce stays safe.

For aligned vocab sizes the padded block_num_y equals the old value,
so the kernel launch is identical and there is no regression.

Thanks to @arthw for flagging the relationship to #21527.

Fixes #22020.

AI-assisted coding, tested on Intel B70 hardware.

* sycl: use WARP_SIZE for num_subgroups in reorder MMVQ launches

Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec
launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). Compile-time no-op on the Intel
target where WARP_SIZE is 16, but makes the relationship to subgroup
size explicit. Per review by @NeoZhangJianyu on #22035.

Assisted by Claude.

Apr 20, 2026
788fcbc

b8852

server: rename --clear-idle to --cache-idle-slots (#21741)

Apr 20, 2026
9d49acb

b8851

vendor : update cpp-httplib to 0.42.0 (#21781)

Apr 19, 2026
e365e65

b8850

CUDA: refactor mma data loading for AMD (#22051)

* CUDA: refactor mma data loading for AMD

* fix CDNA MMQ occupancy

* fix CDNA3 mma

* fix RDNA3 compile

Apr 19, 2026
4eac5b4

b8849

common/autoparser : allow space after tool call (#22073)

Apr 19, 2026
d5b780a

b8848

HIP: Remove unnecessary NCCL_CHECK (#21914)

Apr 19, 2026
471540a

b8847

mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082)

* mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos

* fix build

Apr 19, 2026
1912407

b8846

ggml : reduce CPU overhead in meta backend (#22041)

* cache subgraph splits when cgraph is unchanged

Skip per-call subgraph construction in ggml_backend_meta_graph_compute when the same ggml_cgraph is used consecutively.

Assign uid to every sub-graph so that CUDA's fast uid check path hits too.

* Address review comments

* Keep the scope as is

* Rename last_uid and last_n_subgraphs field. Remove last_max_tmp_size field. Refactor code.

* Address review comments

* Update ggml/src/ggml-backend-meta.cpp

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-backend-meta.cpp

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

Apr 19, 2026
bcdcc10

b8843

cmake: remove CMP0194 policy to restore MSVC builds (#21934)

#21630 added the CMP0194 NEW policy to silence a CMake warning, but on Windows runners it caused CMake to prefer the MinGW toolchain for ASM and broke MSVC builds.

Reverting only that policy block restores the previous working behavior. The CMake 4.1+ warning comes back, but that is cosmetic and does not break any platform.

Reported-by: oobabooga

Refs: #21630

Co-authored-by: texasich <texasich@users.noreply.github.com>

Apr 19, 2026
09b4efa
