Description
I've stumbled upon dynatemp and have a question/proposal.
I believe that what was missed during the dynatemp implementation is the underlying concept of what it's actually needed for.
Prompts may require two types of replies, deterministic and creative, and these call for opposite sampling approaches.
A deterministic approach is needed, for example, for programming and for answering knowledge-related questions; there you want the LLM to provide the most probable tokens.
A creative approach is needed for writing stories and for general conversation with LLMs.
For example, we all know the parasite phrases of LLMs, like the "Maniacally laughing" and "Ahahahaha" that Llama 3 inserts into nearly every reply. The tokens forming these are super probable, so using dynatemp here only increases the chances of getting "ahahahahahahaha" instead of "ahaha", and that's exactly what I saw in my tests :).
Meanwhile, for creative tasks the situation is the opposite of the deterministic one: we need to skip these "overfit" tokens and instead flatten the rest of the distribution to walk around the dead ends.
So we need the exact opposite of min_p and dynatemp. I actually thought I could use negative values for dynatemp, but it turned out that in the code we have:
```cpp
case llama_sampler_type::TEMPERATURE:
    if (dynatemp_range > 0) {
        float dynatemp_min = std::max(0.0f, temp - dynatemp_range);
        float dynatemp_max = std::max(0.0f, temp + dynatemp_range);
        llama_sample_entropy(ctx_main, &cur_p, dynatemp_min, dynatemp_max, dynatemp_exponent);
    } else {
        llama_sample_temp(ctx_main, &cur_p, temp);
    }
```
which makes that impossible, even though it could easily be allowed :).
The question is obvious: shouldn't we patch it to allow negative dynatemp? It would make perfect sense and would help produce more creative replies, just as positive values produce more deterministic ones.
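For illustration, here is a minimal sketch of the patch I have in mind. It assumes llama_sample_entropy keeps its current linear mapping, dyn_temp = min + (max - min) * normalized_entropy^exponent, so that passing min > max simply runs the mapping in reverse:

```cpp
case llama_sampler_type::TEMPERATURE:
    if (dynatemp_range != 0) { // was: dynatemp_range > 0
        // With a negative range, dynatemp_min lands ABOVE temp and
        // dynatemp_max BELOW it, so the entropy-to-temperature mapping is
        // inverted: low-entropy (overconfident) distributions get sampled
        // at the HIGH temperature. The std::max clamps still only keep
        // both bounds non-negative.
        float dynatemp_min = std::max(0.0f, temp - dynatemp_range);
        float dynatemp_max = std::max(0.0f, temp + dynatemp_range);
        llama_sample_entropy(ctx_main, &cur_p, dynatemp_min, dynatemp_max, dynatemp_exponent);
    } else {
        llama_sample_temp(ctx_main, &cur_p, temp);
    }
```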
And we need something like max_p to exclude the super-probable tokens that are otherwise chosen every time with no alternatives.
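To make the idea concrete, here is a rough sketch of what such a sampler could look like, written in the style of the existing llama.cpp samplers. The name llama_sample_max_p and its parameters are hypothetical, nothing like it exists upstream; it just drops the candidates whose probability exceeds the cap, while always keeping at least min_keep of them so the list never empties:

```cpp
#include <cstring> // memmove

// Hypothetical sketch, not an existing llama.cpp API: remove candidates whose
// probability is above max_p, forcing the model onto the alternatives.
static void llama_sample_max_p(struct llama_context * ctx, llama_token_data_array * candidates, float max_p, size_t min_keep) {
    if (min_keep == 0) {
        min_keep = 1; // never empty the candidate list entirely
    }
    if (max_p >= 1.0f || candidates->size <= min_keep) {
        return; // nothing to cap, or too few tokens to safely drop any
    }

    // Compute probabilities and sort candidates by probability, descending.
    llama_sample_softmax(ctx, candidates);

    // Skip past the over-probable tokens at the front, but never drop so
    // many that fewer than min_keep candidates remain.
    size_t first_kept = 0;
    while (first_kept < candidates->size - min_keep && candidates->data[first_kept].p > max_p) {
        ++first_kept;
    }

    // Shift the surviving candidates to the front of the array.
    if (first_kept > 0) {
        memmove(candidates->data, candidates->data + first_kept,
                (candidates->size - first_kept) * sizeof(llama_token_data));
        candidates->size -= first_kept;
    }
}
```

The remaining probabilities no longer sum to 1 after the cut, but as far as I can tell that's fine, since the later sampling stages re-normalize anyway.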