Commit 0ec6bce
Merge branch 'master' into xsn/vision_2
2 parents: fa55281 + 80c41dd

File tree: 276 files changed, +15618 additions, -5510 deletions

.devops/cuda.Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG CUDA_VERSION=12.6.0
+ARG CUDA_VERSION=12.4.0
 # Target the CUDA build image
 ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
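
Aside: the `ARG` pin above only changes the image's default; it remains overridable at build time. A minimal sketch, assuming a build from the repository root (restoring 12.6.0 is purely illustrative):

    # build with the new default CUDA toolkit version
    docker build -f .devops/cuda.Dockerfile .
    # or override the pinned default explicitly
    docker build --build-arg CUDA_VERSION=12.6.0 -f .devops/cuda.Dockerfile .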

.devops/llama-cpp-cuda.srpm.spec

Lines changed: 2 additions & 2 deletions

@@ -17,10 +17,10 @@ Version: %( date "+%%Y%%m%%d" )
 Release: 1%{?dist}
 Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
 License: MIT
-Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
+Source0: https://github.com/ggml-org/llama.cpp/archive/refs/heads/master.tar.gz
 BuildRequires: coreutils make gcc-c++ git cuda-toolkit
 Requires: cuda-toolkit
-URL: https://github.com/ggerganov/llama.cpp
+URL: https://github.com/ggml-org/llama.cpp
 
 %define debug_package %{nil}
 %define source_date_epoch_from_changelog 0

.devops/llama-cpp.srpm.spec

Lines changed: 2 additions & 2 deletions

@@ -18,10 +18,10 @@ Version: %( date "+%%Y%%m%%d" )
 Release: 1%{?dist}
 Summary: CPU Inference of LLaMA model in pure C/C++ (no CUDA/OpenCL)
 License: MIT
-Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
+Source0: https://github.com/ggml-org/llama.cpp/archive/refs/heads/master.tar.gz
 BuildRequires: coreutils make gcc-c++ git libstdc++-devel
 Requires: libstdc++
-URL: https://github.com/ggerganov/llama.cpp
+URL: https://github.com/ggml-org/llama.cpp
 
 %define debug_package %{nil}
 %define source_date_epoch_from_changelog 0

.devops/musa.Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=22.04
 # This needs to generally match the container host's environment.
-ARG MUSA_VERSION=rc3.1.0
+ARG MUSA_VERSION=rc3.1.1
 # Target the MUSA build image
 ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

.devops/nix/package.nix

Lines changed: 3 additions & 3 deletions

@@ -133,12 +133,12 @@ effectiveStdenv.mkDerivation (finalAttrs: {
       --replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
   '';
 
-  # With PR#6015 https://github.com/ggerganov/llama.cpp/pull/6015,
+  # With PR#6015 https://github.com/ggml-org/llama.cpp/pull/6015,
   # `default.metallib` may be compiled with Metal compiler from XCode
   # and we need to escape sandbox on MacOS to access Metal compiler.
   # `xcrun` is used find the path of the Metal compiler, which is varible
   # and not on $PATH
-  # see https://github.com/ggerganov/llama.cpp/pull/6118 for discussion
+  # see https://github.com/ggml-org/llama.cpp/pull/6118 for discussion
   __noChroot = effectiveStdenv.isDarwin && useMetalKit && precompileMetalShaders;
 
   nativeBuildInputs =
@@ -220,7 +220,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
     broken = (useMetalKit && !effectiveStdenv.isDarwin);
 
     description = "Inference of LLaMA model in pure C/C++${descriptionSuffix}";
-    homepage = "https://github.com/ggerganov/llama.cpp/";
+    homepage = "https://github.com/ggml-org/llama.cpp/";
     license = lib.licenses.mit;
 
     # Accommodates `nix run` and `lib.getExe`

.devops/rocm.Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-co
 FROM ${BASE_ROCM_DEV_CONTAINER} AS build
 
 # Unless otherwise specified, we make a fat build.
-# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
+# List from https://github.com/ggml-org/llama.cpp/pull/1087#issuecomment-1682807878
 # This is mostly tied to rocBLAS supported archs.
 # gfx803, gfx900, gfx1032, gfx1101, gfx1102,not officialy supported
 # gfx906 is deprecated

.github/ISSUE_TEMPLATE/020-enhancement.yml

Lines changed: 3 additions & 3 deletions

@@ -6,7 +6,7 @@ body:
   - type: markdown
     attributes:
       value: |
-        [Please post your idea first in Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed needs to be implemented.](https://github.com/ggerganov/llama.cpp/discussions/categories/ideas)
+        [Please post your idea first in Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed needs to be implemented.](https://github.com/ggml-org/llama.cpp/discussions/categories/ideas)
 
   - type: checkboxes
     id: prerequisites
@@ -16,11 +16,11 @@ body:
       options:
         - label: I am running the latest code. Mention the version if possible as well.
          required: true
-        - label: I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
+        - label: I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
          required: true
         - label: I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
          required: true
-        - label: I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new and useful enhancement to share.
+        - label: I reviewed the [Discussions](https://github.com/ggml-org/llama.cpp/discussions), and have a new and useful enhancement to share.
          required: true
 
   - type: textarea

.github/ISSUE_TEMPLATE/030-research.yml

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ body:
   - type: markdown
     attributes:
       value: |
-        Don't forget to check for any [duplicate research issue tickets](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)
+        Don't forget to check for any [duplicate research issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)
 
   - type: checkboxes
     id: research-stage

.github/ISSUE_TEMPLATE/040-refactor.yml

Lines changed: 2 additions & 2 deletions

@@ -6,8 +6,8 @@ body:
   - type: markdown
     attributes:
       value: |
-        Don't forget to [check for existing refactor issue tickets](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
-        Also you may want to check [Pull request refactor label as well](https://github.com/ggerganov/llama.cpp/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates too.
+        Don't forget to [check for existing refactor issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
+        Also you may want to check [Pull request refactor label as well](https://github.com/ggml-org/llama.cpp/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates too.
 
   - type: textarea
     id: background-description

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 3 additions & 3 deletions

@@ -1,11 +1,11 @@
 blank_issues_enabled: true
 contact_links:
   - name: Got an idea?
-    url: https://github.com/ggerganov/llama.cpp/discussions/categories/ideas
+    url: https://github.com/ggml-org/llama.cpp/discussions/categories/ideas
     about: Pop it there. It may then become an enhancement ticket.
   - name: Got a question?
-    url: https://github.com/ggerganov/llama.cpp/discussions/categories/q-a
+    url: https://github.com/ggml-org/llama.cpp/discussions/categories/q-a
     about: Ask a question there!
   - name: Want to contribute?
-    url: https://github.com/ggerganov/llama.cpp/wiki/contribute
+    url: https://github.com/ggml-org/llama.cpp/wiki/contribute
     about: Head to the contribution guide page of the wiki for areas you can help with

.github/pull_request_template.md

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-*Make sure to read the [contributing guidelines](https://github.com/ggerganov/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*
+*Make sure to read the [contributing guidelines](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md) before submitting a PR*

.github/workflows/bench.yml.disabled

Lines changed: 1 addition & 11 deletions

@@ -1,5 +1,5 @@
 # TODO: there have been some issues with the workflow, so disabling for now
-#       https://github.com/ggerganov/llama.cpp/issues/7893
+#       https://github.com/ggml-org/llama.cpp/issues/7893
 #
 # Benchmark
 name: Benchmark
@@ -57,17 +57,7 @@ jobs:
 
     if: |
       inputs.gpu-series == 'Standard_NC4as_T4_v3'
-      || (
-        github.event_name == 'schedule'
-        && github.ref_name == 'master'
-        && github.repository_owner == 'ggerganov'
-      )
       || github.event_name == 'pull_request_target'
-      || (
-        github.event_name == 'push'
-        && github.event.ref == 'refs/heads/master'
-        && github.repository_owner == 'ggerganov'
-      )
     steps:
       - name: Clone
         id: checkout

.github/workflows/build.yml

Lines changed: 47 additions & 7 deletions

@@ -129,7 +129,7 @@ jobs:
        run: |
          sysctl -a
          # Metal is disabled due to intermittent failures with Github runners not having a GPU:
-         # https://github.com/ggerganov/llama.cpp/actions/runs/8635935781/job/23674807267#step:5:2313
+         # https://github.com/ggml-org/llama.cpp/actions/runs/8635935781/job/23674807267#step:5:2313
          cmake -B build \
            -DCMAKE_BUILD_RPATH="@loader_path" \
            -DLLAMA_FATAL_WARNINGS=ON \
@@ -173,7 +173,15 @@
          name: llama-bin-macos-x64.zip
 
  ubuntu-cpu-cmake:
-   runs-on: ubuntu-22.04
+   strategy:
+     matrix:
+       include:
+         - build: 'x64'
+           os: ubuntu-22.04
+         - build: 'arm64'
+           os: ubuntu-22.04-arm
+
+   runs-on: ${{ matrix.os }}
 
    steps:
      - name: Clone
@@ -239,14 +247,14 @@
        run: |
          cp LICENSE ./build/bin/
          cp examples/run/linenoise.cpp/LICENSE ./build/bin/LICENSE.linenoise.cpp
-         zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-x64.zip ./build/bin/*
+         zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-${{ matrix.build }}.zip ./build/bin/*
 
      - name: Upload artifacts
        if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
        uses: actions/upload-artifact@v4
        with:
-         path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-x64.zip
-         name: llama-bin-ubuntu-x64.zip
+         path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-${{ matrix.build }}.zip
+         name: llama-bin-ubuntu-${{ matrix.build }}.zip
 
  ubuntu-latest-cmake-sanitizer:
    runs-on: ubuntu-latest
@@ -374,6 +382,8 @@
      - name: Clone
        id: checkout
        uses: actions/checkout@v4
+       with:
+         fetch-depth: 0
 
      - name: ccache
        uses: hendrikmuhs/ccache-action@v1.2.16
@@ -401,7 +411,35 @@
        run: |
          cd build
          # This is using llvmpipe and runs slower than other backends
-         ctest -L main --verbose --timeout 1800
+         ctest -L main --verbose --timeout 2700
+
+     - name: Determine tag name
+       id: tag
+       shell: bash
+       run: |
+         BUILD_NUMBER="$(git rev-list --count HEAD)"
+         SHORT_HASH="$(git rev-parse --short=7 HEAD)"
+         if [[ "${{ env.BRANCH_NAME }}" == "master" ]]; then
+           echo "name=b${BUILD_NUMBER}" >> $GITHUB_OUTPUT
+         else
+           SAFE_NAME=$(echo "${{ env.BRANCH_NAME }}" | tr '/' '-')
+           echo "name=${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}" >> $GITHUB_OUTPUT
+         fi
+
+     - name: Pack artifacts
+       id: pack_artifacts
+       if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
+       run: |
+         cp LICENSE ./build/bin/
+         cp examples/run/linenoise.cpp/LICENSE ./build/bin/LICENSE.linenoise.cpp
+         zip -r llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.zip ./build/bin/*
+
+     - name: Upload artifacts
+       if: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/master' ) || github.event.inputs.create_release == 'true' }}
+       uses: actions/upload-artifact@v4
+       with:
+         path: llama-${{ steps.tag.outputs.name }}-bin-ubuntu-vulkan-x64.zip
+         name: llama-bin-ubuntu-vulkan-x64.zip
 
  ubuntu-22-cmake-hip:
    runs-on: ubuntu-22.04
@@ -443,7 +481,7 @@
 
  ubuntu-22-cmake-musa:
    runs-on: ubuntu-22.04
-   container: mthreads/musa:rc3.1.0-devel-ubuntu22.04
+   container: mthreads/musa:rc3.1.1-devel-ubuntu22.04
 
    steps:
      - name: Clone
@@ -1345,8 +1383,10 @@
 
    needs:
      - ubuntu-cpu-cmake
+     - ubuntu-22-cmake-vulkan
      - windows-latest-cmake
      - windows-2019-cmake-cuda
+     - windows-latest-cmake-sycl
      - windows-latest-cmake-hip-release
      - macOS-latest-cmake-arm64
      - macOS-latest-cmake-x64
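
Aside: the new "Determine tag name" step works together with the `fetch-depth: 0` added to the checkout above: `git rev-list --count HEAD` only yields the true build number on a full clone, since a shallow checkout would undercount it. A local sketch of the same naming scheme (here `BRANCH_NAME` is derived from git, whereas the workflow takes it from its env; the example outputs are illustrative):

    BRANCH_NAME=$(git rev-parse --abbrev-ref HEAD)
    BUILD_NUMBER=$(git rev-list --count HEAD)   # needs full history, hence fetch-depth: 0
    SHORT_HASH=$(git rev-parse --short=7 HEAD)
    if [[ "$BRANCH_NAME" == "master" ]]; then
        echo "b${BUILD_NUMBER}"                              # e.g. b4651
    else
        SAFE_NAME=$(echo "$BRANCH_NAME" | tr '/' '-')
        echo "${SAFE_NAME}-b${BUILD_NUMBER}-${SHORT_HASH}"   # e.g. xsn-vision_2-b4651-0ec6bce
    fi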

.github/workflows/docker.yml

Lines changed: 2 additions & 0 deletions

@@ -51,6 +51,8 @@ jobs:
 
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
+       with:
+         image: tonistiigi/binfmt:qemu-v7.0.0-28
 
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
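
Aside: pinning `tonistiigi/binfmt` to `qemu-v7.0.0-28` fixes the QEMU version used for cross-architecture docker builds instead of floating on the action's default image. A sketch of roughly what the action does under the hood (the sanity check is only an illustration):

    # register QEMU binfmt handlers using the pinned image
    docker run --privileged --rm tonistiigi/binfmt:qemu-v7.0.0-28 --install all
    # sanity check: a foreign-architecture image should now run on this host
    docker run --rm --platform linux/arm64 alpine uname -m   # expect: aarch64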

.github/workflows/labeler.yml

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ jobs:
    steps:
      - uses: actions/checkout@v4
        with:
-         repository: "ggerganov/llama.cpp"
+         repository: "ggml-org/llama.cpp"
      - uses: actions/labeler@v5
        with:
          configuration-path: '.github/labeler.yml'

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -45,6 +45,8 @@ lcov-report/
 tags
 .build/
 build*
+release
+debug
 !build-info.cmake
 !build-info.cpp.in
 !build-info.sh
@@ -98,6 +100,7 @@ examples/server/*.css.hpp
 examples/server/*.html.hpp
 examples/server/*.js.hpp
 examples/server/*.mjs.hpp
+examples/server/*.gz.hpp
 !build_64.sh
 !examples/*.bat
 !examples/*/*.kts

CONTRIBUTING.md

Lines changed: 7 additions & 5 deletions

@@ -1,18 +1,20 @@
 # Pull requests (for contributors)
 
+- llama.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
 - Test your changes:
   - Execute [the full CI locally on your machine](ci/README.md) before publishing
   - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
   - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
   - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
+- Create separate PRs for each feature or fix. Avoid combining unrelated changes in a single PR
 - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments
 
 # Pull requests (for collaborators)
 
 - Squash-merge PRs
 - Use the following format for the squashed commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)`
-- Optionally pick a `<module>` from here: https://github.com/ggerganov/llama.cpp/wiki/Modules
+- Optionally pick a `<module>` from here: https://github.com/ggml-org/llama.cpp/wiki/Modules
 - Consider adding yourself to [CODEOWNERS](CODEOWNERS)
 
 # Coding guidelines
@@ -37,17 +39,17 @@
 
 _(NOTE: this guideline is yet to be applied to the `llama.cpp` codebase. New code should follow this guideline.)_
 
-- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` to format the added code
+- Try to follow the existing patterns in the code (indentation, spaces, etc.). In case of doubt use `clang-format` (from clang-tools v15+) to format the added code
 - For anything not covered in the current guidelines, refer to the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
 - Tensors store data in row-major order. We refer to dimension 0 as columns, 1 as rows, 2 as matrices
-- Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggerganov/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
+- Matrix multiplication is unconventional: [`C = ggml_mul_mat(ctx, A, B)`](https://github.com/ggml-org/llama.cpp/blob/880e352277fc017df4d5794f0c21c44e1eae2b84/ggml.h#L1058-L1064) means $C^T = A B^T \Leftrightarrow C = B A^T.$
 
 ![matmul](media/matmul.png)
 
 # Naming guidelines
 
 - Use `snake_case` for function, variable and type names
-- Naming usually optimizes for longest common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
+- Naming usually optimizes for longest common prefix (see https://github.com/ggml-org/ggml/pull/302#discussion_r1243240963)
 
 ```cpp
 // not OK
@@ -122,4 +124,4 @@
 
 The Github issues, PRs and discussions contain a lot of information that can be useful to get familiar with the codebase. For convenience, some of the more important information is referenced from Github projects:
 
-https://github.com/ggerganov/llama.cpp/projects
+https://github.com/ggml-org/llama.cpp/projects
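
Aside: the `ggml_mul_mat` convention touched above is easy to trip over, so a worked shape check may help (a sketch with illustrative shapes, not part of the original guideline). Storing row-major with dimension 0 as columns, take $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{n \times k}$, i.e. both with row length $k$. Then

$$C = \texttt{ggml\_mul\_mat}(ctx, A, B) \quad\text{satisfies}\quad C^T = A B^T \;\Leftrightarrow\; C = B A^T \in \mathbb{R}^{n \times m}.$$

Concretely, for $m = 2$, $k = 3$, $n = 4$: $A$ is $2 \times 3$, $B$ is $4 \times 3$, and the result $C = B A^T$ is $4 \times 2$, i.e. dimension 0 (columns) has size $2$ and dimension 1 (rows) has size $4$.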
