Generate with chat template #64
kherud merged 2 commits into kherud:master from lesters:generate-with-chat-template
Conversation
Hey @lesters, thanks for the pull request! This was indeed not possible, so it's a nice addition. In general, it's best to align the C++ code as closely as possible with the llama.cpp server code, since that makes it easier to maintain in the long term. I think there the chat template is loaded once when initializing a model (so via …). However, the server has a separate endpoint for chat completions, which the Java binding doesn't have. So to still be able to choose for each inference whether to use the chat template, I think using something like your … I will review the PR in more detail tomorrow.
Thanks, @kherud. I think you are right, this is probably better as a part of …
Looks good to me, thanks for the work. |
This adds the option of applying a chat template, such as those found in GGUF files, to the prompt before generating tokens. This is a suggestion, as I couldn't find a way to do this in the code today.
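For context, a chat template rewrites a list of (role, content) messages into the single prompt string the model was trained on, before any tokens are generated. Below is a minimal sketch of that transformation, assuming a ChatML-style template; the class and method names (`ChatTemplateSketch`, `applyChatTemplate`) are hypothetical illustrations, not this PR's actual API or the template stored in any particular GGUF file.

```java
// Hypothetical sketch: approximates what applying a ChatML-style chat
// template does to a conversation before token generation. Real GGUF chat
// templates are Jinja-like strings embedded in the model file; this only
// illustrates the shape of the resulting prompt.
public class ChatTemplateSketch {

    // Render (role, content) pairs into a single prompt string, then append
    // the assistant header so generation continues from there.
    static String applyChatTemplate(String[][] messages) {
        StringBuilder prompt = new StringBuilder();
        for (String[] message : messages) {
            String role = message[0];
            String content = message[1];
            prompt.append("<|im_start|>").append(role).append('\n')
                  .append(content).append("<|im_end|>\n");
        }
        prompt.append("<|im_start|>assistant\n");
        return prompt.toString();
    }

    public static void main(String[] args) {
        String[][] messages = {
            {"system", "You are a helpful assistant."},
            {"user", "What is a chat template?"}
        };
        System.out.print(applyChatTemplate(messages));
    }
}
```

With an option like the one this PR proposes, the binding would apply such a template internally, so callers could pass plain messages per inference instead of hand-building the formatted prompt.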