capture llama_print_timings without Low Level API? #1109
Replies: 1 comment
-
Huh, I guess we both needed the same thing; I just posted about this as well. Take a look here: #1124. My version of the code allows capturing the output into existing buffers:

```python
with store_stdout_stderr() as (outbuff, errbuff):
    # Normal Python output is captured
    print("something")
    # Output from C streams is captured too
    llm = Llama(model_path="./models/7B/llama-model.gguf", verbose=True)

print(outbuff.getvalue())  # "something\n"
print(errbuff.getvalue())  # Model information, layers, etc.
```
-
Hello, I am enjoying using llama-cpp-python via the High Level API. However, on the terminal I constantly see logs like llama_print_timings; instead, it would be nice to capture these values in my code. Is that possible without dropping down to the Low Level API?