Open
Description
I think a bigger advantage would come from refactoring the existing examples and "hiding" some of the sampling and KV cache management state that we currently expose, putting it behind the common API.
- Hide some of the sampling state (unsure what this means in practice)
- Key-value (KV) cache management should be abstracted away behind the common API
- @ngxson suggests restructuring gpt_params_print_usage() to make the help message clearer
If any other aspect of the examples is currently a pain point for developers trying to grok them, feel free to suggest it so it can be added here.
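
To make the first two items more concrete, here is a minimal sketch of one possible reading of "hiding" sampling and KV cache state. Every name in it (`example_session`, `feed`, `sample`, `n_cached`) is hypothetical and does not exist in llama.cpp, and the model is replaced by a uniform random draw so the snippet stands alone. It only illustrates the shape of the interface: the caller passes tokens in and gets tokens out, while the RNG and the context bookkeeping stay private.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <random>
#include <vector>

// Hypothetical sketch: none of these names are part of llama.cpp.
using token_id = std::int32_t;

class example_session {
public:
    example_session(std::size_t n_ctx, std::size_t n_vocab, std::uint32_t seed)
        : n_ctx_(n_ctx), n_vocab_(n_vocab), rng_(seed) {
        assert(n_vocab_ > 0 && "vocabulary must not be empty");
    }

    // Append tokens to the context. When the context is full, the oldest
    // token is dropped. The caller never sees positions or cache cells.
    void feed(const std::vector<token_id> & tokens) {
        if (n_ctx_ == 0) {
            return; // nothing can be cached
        }
        for (token_id t : tokens) {
            if (cache_.size() == n_ctx_) {
                cache_.pop_front();
            }
            cache_.push_back(t);
        }
    }

    // Draw the next token and record it in the context.
    // Stand-in for a real model: a uniform draw over the vocabulary.
    token_id sample() {
        std::uniform_int_distribution<token_id> dist(
            0, static_cast<token_id>(n_vocab_ - 1));
        const token_id t = dist(rng_);
        feed({t});
        return t;
    }

    // Number of tokens currently held in the (mock) cache.
    std::size_t n_cached() const { return cache_.size(); }

private:
    std::size_t          n_ctx_;   // maximum number of cached tokens
    std::size_t          n_vocab_; // vocabulary size
    std::mt19937         rng_;     // sampling state, hidden from the caller
    std::deque<token_id> cache_;   // mock stand-in for the KV cache
};
```

With an interface of this kind, an example program would reduce to constructing a session, calling `feed` with the prompt tokens, and calling `sample` in a loop. Whether the real common API should own the context-shifting policy in this way is an open design question for this issue.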