[FEEDBACK] Better packaging for llama.cpp to support downstream consumers 🤗 #15313
Replies: 62 comments 100 replies
-
From this, IMO the only thing missing is a Linux+CUDA bundle to make it download-and-use. If we want better packaging on Linux, we could also work on a snap or a bash installer on top of the pre-built packages.
-
It's high time Hugging Face copied Ollama's packaging and GTM strategy, but this time gave credit to llama.cpp. Ideally, llama.cpp should remain the core component.
-
Is the barrier the installation process, or the need to use a complex command line to launch llama.cpp?
-
For me the biggest thing is I'd love to see more emphasis placed on […]. My ideal would be for the […]. Maybe include systray integration and a simple UI for selecting and downloading models too. At that point […]
-
It would be cool if 'llama-server' had an auto-configuration option for the machine/model, like 'ollama' does.
-
For Windows, maybe Chocolatey and the Microsoft Store would be a good idea? 🤔
-
I created an RPM spec to manage installation, though I think Flatpaks might be more user-friendly and distribution-agnostic.
-
The released Windows builds are available via Scoop, and updates happen automatically. Old installed versions are kept, and the current one is symlinked into a "current" folder, which puts the executables on the PATH.
-
Is it feasible to have a single release per OS that includes all the backends?
-
For Linux I just install the Vulkan binaries and run the server from there. Maybe we can have an install script like Ollama's that detects the system and launches the server, which can be controlled from an app as well as the CLI? The user then gets basic command-line utilities like run, start, stop, load, list, etc.
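An Ollama-style install script like the one suggested above would need a backend-detection step first. Here's a minimal sketch, assuming that probing for `nvidia-smi` and `vulkaninfo` is an acceptable heuristic; the backend names echoed below are illustrative labels, not actual release asset names:

```shell
#!/bin/sh
# Hypothetical sketch: pick the best available llama.cpp backend by
# probing for vendor tools on the PATH. Falls back to CPU-only.
detect_backend() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cuda
  elif command -v vulkaninfo >/dev/null 2>&1; then
    echo vulkan
  else
    echo cpu
  fi
}

backend=$(detect_backend)
echo "selected backend: $backend"
```

A real installer would then map the detected backend to the matching pre-built archive for the host OS and architecture, which is exactly the part that needs the Linux+CUDA bundles discussed earlier in the thread.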
-
On Mac, the easiest way (also arguably the safest way) from a user's perspective is to find it in the App Store and install it from there. Because App Store apps run in a sandbox, installing or uninstalling is simple and clean from a user's point of view. Creating a build and passing App Store review might take some effort (due to the sandbox constraints), but it should be a one-time thing.
-
It's my understanding that none of the automated installs support GPU acceleration. I might be wrong, but it's definitely the case for Windows, which makes it useless to install via winget.
-
To me, the biggest advantage Ollama currently has is that the optimal settings for a model are bundled. The GGUF spec is versatile enough to allow this too, as a metadata field inside the model: people could load the settings from a GGUF, and frontends could extract and adapt them as they see fit. I think that part is going to be more valuable than obtaining the binary, since downloading the binary from GitHub is not that hard.
-
My personal wishlist: […]
-
I think delegating installation to external projects is entirely legitimate. The end "product" of llama.cpp being the C headers + implementation is an acceptable place for this project to be. I would much rather have the maintainers of llama.cpp not have to deal with the deluge of installation and packaging idiosyncrasies. I understand a lot of people who are posting issues on this project are users of the server or CLI, but I think there's a sampling bias where most users of this repo are actually through llama-cpp-python, ollama, or some other wrapper (my own being llama-cpp-rs). I do not think this should be considered a problem to be solved.
-
Hi everyone! I've just done a major refactor of the build process and published all the work here: https://github.com/angt/installama.sh. All fixes and new features (like the new downloader or the OpenSSL/BoringSSL backend) are either merged or on their way upstream. Let's see how far we can go in this direction :) For now, let's use this new repo to track issues and feature requests.
-
Hey everyone, just want to make you aware that I started adding llama.cpp to Spack. See spack/spack-packages#2437
-
Great to see the continued activity on this topic. As an outside observer / newcomer, llama.cpp simply feels to me like hard-core development software. People like me who come from a development background but haven't seen a Makefile in decades are forced into LM Studio / Ollama because it takes way too long to retrain atrophied muscles, even when we prefer the elegance of minimally necessary software solutions (i.e. no shim layers). We also run into continued misinformation. For example, this is what Google returns when I search for "how to install llama.cpp with gpu support" (excerpted): […]
Even in this thread, a recent commenter thought (erroneously) that "none of the automated installs support GPU acceleration". Reading through this thread and others (like #8188), it seems like llama.cpp is already most of the way to a decent packaging strategy. From what I can tell, a few added lines in install.md plus some documentation restructuring would go a long way.
-
I've hated on Ollama for as long as it has existed, but let's be frank here: there's a reason Ollama caught on and llama.cpp/LM Studio/etc. didn't. I would never install Ollama on my machines for a variety of reasons, but I also find it insane that llama.cpp devs, despite being great at C++ and hardcore stuff, have not shown any sign of wanting to make the server accessible to more groups. It's been 2 hours since I started the build process of the current commit on my 40-core workstation (dual GPU) because […]
-
install-llama-cpp.zip
-
*** EDITED after reviewing previous comments regarding CLI-based installs. I fully agree with @ibehnam's comment. This is my main beef with llama.cpp as well. To install Ollama, all you need to do is go to ollama.com and click the "Download" button at the top right. If you are somewhat technical and don't want an app install, the docs page has clear and concise information for the major methods, most of which are about a page long with a LOT of whitespace. The longest is Linux (no surprise), but even that is about 3 pages. To my knowledge, all support GPU. IMHO this is the #1 reason why Ollama gets mindshare over llama.cpp. I believe llama.cpp is close with the existing methods, plus installama.sh or some other CLI-based method, but it requires testing and documentation, especially for the corner cases, rather than fragmented methods that can only be found in a 150-item-long thread. Hopefully this doesn't sound harsh. It comes from a strong desire to see mainstream adoption for all the good work the devs have put in, which I assume we all want.
-
Compiling llama.cpp with CUDA takes forever if you don't use a multithreaded build with -j.
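For reference, a parallel CUDA build looks roughly like this; `-DGGML_CUDA=ON` is the upstream CMake flag for the CUDA backend, and restricting `CMAKE_CUDA_ARCHITECTURES` to your GPU's architecture can also cut build time substantially. Wrapped in a function here so the heavy build only runs when invoked:

```shell
# Clone and build llama.cpp with the CUDA backend, using all CPU cores
# for the compile step (-j"$(nproc)").
build_llama_cuda() {
  git clone https://github.com/ggml-org/llama.cpp &&
  cd llama.cpp &&
  cmake -B build -DGGML_CUDA=ON &&
  cmake --build build --config Release -j"$(nproc)"
}
```

On a many-core machine the difference between this and a default single-job build is the difference between minutes and hours.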
-
We have had a GH build setup as part of https://github.com/hybridgroup/llama-cpp-builder for Ubuntu CUDA and Vulkan binaries for a couple of months now, if this is of any use to the community.
-
Ideally, llama.cpp should use libggml from its standalone repository, so distributions can ship libggml as one package and llama.cpp as another. Currently, llama.cpp needs to bundle its own version of libggml (in /usr/lib/llama.cpp), whisper.cpp bundles its own, and shipping a standalone libggml is kinda pointless since it's mostly unused. Would using libggml as a submodule not make your lives simpler too (i.e. no more syncing between repos)?
-
I don't think it's reasonable to expect end users to compile this software in order to install or use it. If llama.cpp can prepare more builds (e.g. CUDA on Linux), then more third-party packagers (homebrew, mise, aqua, asdf, etc.) can add a plugin to download and install them. I'm preparing an aqua plugin now so that I can […]
As far as release cadence and versioning go, fast development is fine, but it's not a great user experience. I think the vast majority of users want an occasional update with sane versioning. Development/alpha/beta/rc channels are the usual place for constant updates.
-
Hi! We've started a new discussion specifically for Debian and Ubuntu packaging. Testers and feedback welcome.
-
How can this problem be solved? I'm using llama-b8254-bin-910b-openEuler-aarch64-aclgraph.tar.gz with CANN 8.5.1.
-
I'm working on this; it's currently just a demo in development. Note that it heavily utilizes AI coding with human review, and it currently aims to provide edge inference for all platforms. So far it only supports Windows and has not been officially packaged yet. Anyone interested is welcome to participate.
-
compared to […]
-
llama.cpp as a project has made LLMs accessible to countless developers and consumers, including me. The project has also consistently become faster over time, as has its coverage beyond LLMs to VLMs, AudioLMs, and more.
One piece of feedback we keep getting from the community is how difficult it is to use llama.cpp directly. Oftentimes users end up using Ollama or GUIs like LM Studio or Jan (there are many more that I'm missing). However, it'd be great to offer a friendlier, easier path to use llama.cpp for end consumers too.
Currently, if someone were to use llama.cpp directly:
- brew install llama.cpp works […]

This adds a barrier for non-technically-inclined people, especially since with all the above methods users would have to reinstall llama.cpp to get upgrades (and llama.cpp makes releases per commit: not a bad thing in itself, but it becomes an issue since you need to upgrade more frequently).
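On the upgrade-friction point: with Homebrew at least, staying current is a one-liner, though the user still has to remember to run it. A tiny sketch, assuming Homebrew is installed (the formula name llama.cpp is the real one; the wrapper name is made up):

```shell
# Hypothetical helper: upgrade llama.cpp if it's installed via brew,
# otherwise install it fresh. Fails cleanly when brew is absent.
update_llama() {
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew not found" >&2
    return 1
  fi
  brew upgrade llama.cpp 2>/dev/null || brew install llama.cpp
}
```

An installer that self-updates (the way Ollama's app does) would remove even this manual step, which is the gap the per-commit release cadence makes painful.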
Opening this issue to discuss what could be done to package llama.cpp better and allow users to maybe download an executable and be on their way.
More so, are there people in the community interested in taking this up?