Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Current Behavior
```python
from icecream import ic
from llama_cpp import Llama
from llama_cpp import ChatCompletionMessage

llm = Llama(
    model_path="/opt/models/WizardCoder-Python-34B-V1.0/wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,
    # n_gpu_layers=0,
)


def print_chat_streaming(output, debug_p=True):
    """
    Process and print out chat completions from a model when stream=True.

    Args:
        output (iterable): The output from the model with stream=True.
        debug_p (bool): If True, dump the last received chunk with icecream.
    """
    for r in output:
        delta = r["choices"][0]['delta']
        if 'role' in delta:
            print(f"\n{delta['role']}: ", end='')
        if 'content' in delta:
            print(f"{delta['content']}", end='')
    print("\n")
    if debug_p:
        ic(r)


output = llm.create_chat_completion(
    messages=[
        ChatCompletionMessage(
            # role="user",
            role="system",
            content=r"""You're a helpful programming assistant who answers the questions the user asks of you concisely and accurately. As you're a senior engineer working at Google with a PhD in distributed systems, you're extremely smart. You take a deep breath before answering the question and solve the question step by step.""",
        ),
        ChatCompletionMessage(
            role="user",
            content=r"""List groups my linux user is in""",
        ),
    ],
    max_tokens=256,
    stop=[],
    temperature=0,
    stream=True,
)
print_chat_streaming(output)
```
```
Llama.generate: prefix-match hit
assistant: To list all the groups that your Linux user belongs to, run the following command:
...
This will display a space-separated list of all the groups that you belong to.
ic| r: {'choices': [{'delta': {'content': ' '}, 'finish_reason': None, 'index': 0}],
'created': 1695045448,
'id': 'chatcmpl-4f8489ed-f56a-4fa9-b42f-cdc753de93b8',
'model': '/opt/models/WizardCoder-Python-34B-V1.0/wizardcoder-python-34b-v1.0.Q4_K_M.gguf',
'object': 'chat.completion.chunk'}
llama_print_timings: load time = 458.52 ms
llama_print_timings: sample time = 42.46 ms / 63 runs ( 0.67 ms per token, 1483.61 tokens per second)
llama_print_timings: prompt eval time = 411.41 ms / 12 tokens ( 34.28 ms per token, 29.17 tokens per second)
llama_print_timings: eval time = 2738.43 ms / 62 runs ( 44.17 ms per token, 22.64 tokens per second)
llama_print_timings: total time = 3349.54 ms
```
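For reference, the streamed chunks shown above can also be accumulated into a single message instead of being printed incrementally. The sketch below assumes only the chunk structure visible in the ic dump (`collect_chat_streaming` is an illustrative helper, not part of llama_cpp):

```python
def collect_chat_streaming(output):
    """Accumulate streamed chat-completion chunks into one message.

    Assumes each chunk looks like the ic dump above:
    {"choices": [{"delta": {...}, "finish_reason": ..., "index": 0}], ...}
    """
    role = None
    parts = []
    finish_reason = None
    for r in output:
        choice = r["choices"][0]
        delta = choice["delta"]
        if "role" in delta:
            role = delta["role"]
        if "content" in delta:
            parts.append(delta["content"])
        finish_reason = choice["finish_reason"]
    return {"role": role, "content": "".join(parts), "finish_reason": finish_reason}
```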
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
Stepping: 7
BogoMIPS: 4190.15
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 64 MiB (64 instances)
L3 cache: 55 MiB (2 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
- Operating System, e.g. for Linux:
$ uname -a
Linux gpu7 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.12
$ make --version
GNU Make 4.4.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.