Tags: ggml-org/llama.cpp

b8854

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
server : refactor "use checkpoint" logic (#22114)

b8853

[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)

* [SYCL] Fix reorder MMVQ assert on unaligned vocab sizes

The reorder mul_mat_vec_q dispatchers for Q4_0, Q8_0, Q4_K, and Q6_K
asserted that block_num_y was a multiple of 16 subgroups. Models with
a vocab size not divisible by 16 (for example HY-MT at 120818) aborted
on model load when the output projection tripped the assert.

I replaced the assert with padding: block_num_y now rounds up to a
whole number of subgroup-sized workgroups. The kernel already has the
row bounds check (`if (row >= nrows) return;`) so the extra padded
threads early-exit cleanly. Row values are uniform across a subgroup
so the collective reduce stays safe.

For aligned vocab sizes the padded block_num_y equals the old value,
so the kernel launch is identical and there is no regression.
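
The padding arithmetic described above can be sketched on the host side as follows. This is a minimal illustration, not the actual SYCL dispatcher code: `kNumSubgroups`, `ceil_div`, `padded_rows`, and `count_active_rows` are hypothetical names, and the loop only emulates the kernel's `if (row >= nrows) return;` bounds check.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 16 subgroups per workgroup, matching the value the commit
// message gives for the Intel target. The name is hypothetical.
constexpr int64_t kNumSubgroups = 16;

// Integer division rounding up; both arguments must be positive.
constexpr int64_t ceil_div(int64_t a, int64_t b) {
    return (a + b - 1) / b;
}

// Round the row count up to a whole number of workgroups instead of
// asserting that it is already a multiple of kNumSubgroups.
constexpr int64_t padded_rows(int64_t nrows) {
    return ceil_div(nrows, kNumSubgroups) * kNumSubgroups;
}

// Emulates the launch: every padded row index is visited, but rows past
// nrows exit early, mirroring the kernel's bounds check.
int64_t count_active_rows(int64_t nrows) {
    assert(nrows > 0);
    int64_t active = 0;
    for (int64_t row = 0; row < padded_rows(nrows); ++row) {
        if (row >= nrows) {
            continue;  // padded thread: does no work
        }
        ++active;
    }
    return active;
}
```

With the vocab size from the report, `padded_rows(120818)` rounds up to the next multiple of 16, while an already aligned size is returned unchanged, which is why aligned models see the same launch as before.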

Thanks to @arthw for flagging the relationship to #21527.

Fixes #22020.

AI assisted coding, tested on Intel B70 hardware.

* sycl: use WARP_SIZE for num_subgroups in reorder MMVQ launches

Replaces the hardcoded 16 with WARP_SIZE in the four reorder_mul_mat_vec
launch helpers (Q4_0, Q8_0, Q4_K, Q6_K). Compile-time no-op on the Intel
target where WARP_SIZE is 16, but makes the relationship to subgroup
size explicit. Per review by @NeoZhangJianyu on #22035.

Assisted by Claude.

b8852

server: rename --clear-idle to --cache-idle-slots (#21741)

b8851

vendor : update cpp-httplib to 0.42.0 (#21781)

b8850

CUDA: refactor mma data loading for AMD (#22051)

* CUDA: refactor mma data loading for AMD

* fix CDNA MMQ occupancy

* fix CDNA3 mma

* fix RDNA3 compile

b8849

common/autoparser : allow space after tool call (#22073)

b8848

HIP: Remove unnecessary NCCL_CHECK (#21914)

b8847

mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082)

* mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos

* fix build

b8846

ggml : reduce CPU overhead in meta backend (#22041)

* cache subgraph splits when cgraph is unchanged

Skip per-call subgraph construction in ggml_backend_meta_graph_compute when the same ggml_cgraph is used consecutively.

Assign uid to every sub-graph so that CUDA's fast uid check path hits too.
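
The caching idea can be illustrated with a small sketch. It assumes each graph carries a nonzero `uid` that changes whenever the graph changes; `Graph`, `SplitCache`, and the placeholder split rule are hypothetical and do not reflect the real `ggml_backend_meta_graph_compute` internals. Only the `last_uid` field name is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a compute graph with a unique id.
struct Graph {
    uint64_t uid;      // assumption: 0 means "no uid assigned"
    int      n_nodes;
};

// Caches the subgraph splits of the most recently seen graph.
struct SplitCache {
    uint64_t         last_uid = 0;  // uid of the graph the cache was built for
    std::vector<int> splits;        // start index of each subgraph
    int              rebuilds = 0;  // how many times the splits were rebuilt

    const std::vector<int> & get(const Graph & g) {
        assert(g.n_nodes >= 0);
        // Fast path: same graph as last call, reuse the cached splits.
        if (g.uid != 0 && g.uid == last_uid) {
            return splits;
        }
        // Slow path: rebuild. The rule here (one split every two nodes)
        // is only a placeholder for the real construction.
        splits.clear();
        for (int i = 0; i < g.n_nodes; i += 2) {
            splits.push_back(i);
        }
        last_uid = g.uid;
        ++rebuilds;
        return splits;
    }
};
```

Calling `get` repeatedly with the same graph rebuilds the splits only once; a graph with a different `uid` invalidates the cache and triggers one more rebuild.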

* Address review comments

* Keep the scope as is

* Rename last_uid and last_n_subgraphs field. Remove last_max_tmp_size field. Refactor code.

* Address review comments

* Update ggml/src/ggml-backend-meta.cpp

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-backend-meta.cpp

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

b8843

cmake: remove CMP0194 policy to restore MSVC builds (#21934)

#21630 added the CMP0194 NEW policy to silence a CMake warning, but on Windows runners it caused CMake to prefer the MinGW toolchain for ASM and broke MSVC builds.

Reverting only that policy block restores the previous working behavior. The CMake 4.1+ warning comes back, but that is cosmetic and does not break any platform.

Reported-by: oobabooga

Refs: #21630

Co-authored-by: texasich <texasich@users.noreply.github.com>