Commit b919787 (merge, 2 parents: 312c5e5 + 5a5c719)

Updated verilog_eval. Included verilog eval/descriptions. Included example. Updated LICENSE and README.

21 files changed: +894 −121 lines

Dockerfile

Lines changed: 19 additions & 0 deletions
```dockerfile
FROM nvcr.io/nvidia/pytorch:22.08-py3
LABEL maintainer="Mingjie Liu <mingjiel@nvidia.com>"
RUN echo "alias python=python3" >> ~/.bashrc \
    && echo "alias pip=pip3" >> ~/.bashrc
RUN apt-get -y update \
    && apt-get -y install vim
RUN apt-get install -y wget
RUN apt-get install -y autoconf gperf flex bison screen
RUN python -m pip install --upgrade pip
RUN python -m pip install deepspeed scikit-learn pandas numpy scipy wandb
# Quote version specifiers so the shell does not treat ">=" as a redirection.
RUN python -m pip install "accelerate>=0.12.0" "torch>=1.3" "datasets>=1.8.0" "sentencepiece!=0.1.92" protobuf evaluate
RUN python -m pip install git+https://github.com/huggingface/transformers/
RUN git clone https://github.com/steveicarus/iverilog.git && cd iverilog \
    && git checkout 01441687235135d1c12eeef920f75d97995da333 \
    && sh ./autoconf.sh && ./configure && make -j4 \
    && make install
RUN python -m pip install jupyterlab
RUN python -m pip install openai tiktoken
ENV SHELL=/bin/bash
```
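The image above can be built and entered with standard Docker commands. The tag `verilog-eval` and the bind mount are illustrative choices, not part of the repository; `--gpus all` additionally assumes the NVIDIA Container Toolkit is installed:

```
$ docker build -t verilog-eval .
$ docker run --gpus all -it --rm -v "$(pwd)":/workspace verilog-eval
```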

LICENSE

Lines changed: 27 additions & 0 deletions
```diff
@@ -1,3 +1,28 @@
+MIT License
+
+Copyright (c) 2023 NVIDIA Research Projects
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+
+This project contains code from human-eval (https://github.com/openai/human-eval/).
+
 The MIT License
 
 Copyright (c) OpenAI (https://openai.com)
@@ -19,3 +44,5 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
+
+
```

README.md

Lines changed: 65 additions & 54 deletions
````diff
@@ -1,28 +1,48 @@
-# HumanEval: Hand-Written Evaluation Set
+# VerilogEval: Evaluating Large Language Models for Verilog Code Generation
 
-This is an evaluation harness for the HumanEval problem solving dataset
-described in the paper "[Evaluating Large Language Models Trained on
-Code](https://arxiv.org/abs/2107.03374)".
+This is an evaluation harness for the VerilogEval problem-solving dataset
+described in the paper "[VerilogEval: Evaluating Large
+Language Models for Verilog Code Generation](https://arxiv.org/abs/2309.07544)".
+
+This evaluation dataset consists of 156 problems from the Verilog
+instructional website [HDLBits](https://hdlbits.01xz.net/wiki/Problem_sets).
+We provide two sets of problem descriptions: machine generated, and manually
+converted to a text-only format.
 
 ## Installation
 
+We closely follow the guidance from [HumanEval](https://github.com/openai/human-eval/tree/master).
+
 Make sure to use python 3.7 or later:
 ```
 $ conda create -n codex python=3.7
 $ conda activate codex
 ```
 
+Install [ICARUS Verilog](https://github.com/steveicarus/iverilog):
+```
+$ git clone https://github.com/steveicarus/iverilog.git && cd iverilog \
+  && git checkout 01441687235135d1c12eeef920f75d97995da333 \
+  && sh ./autoconf.sh && ./configure && make -j4 \
+  && make install
+```
+
+We recommend using the provided [Dockerfile](https://github.com/NVlabs/verilog-eval/Dockerfile),
+which comes with the ICARUS Verilog simulator pre-installed. When using the
+Docker container, you still need to complete the following step.
+
 Check out and install this repository:
 ```
-$ git clone https://github.com/openai/human-eval
-$ pip install -e human-eval
+$ git clone https://github.com/NVlabs/verilog-eval
+$ pip install -e verilog-eval
 ```
 
 ## Usage
 
-**This program exists to run untrusted model-generated code. Users are strongly
+**This program makes system calls to *iverilog* and *vvp* to simulate
+untrusted model-generated code. Users are strongly
 encouraged not to do so outside of a robust security sandbox. The [execution
-call](https://github.com/openai/human-eval/blob/master/human_eval/execution.py#L48-L58)
+call](https://github.com/NVlabs/verilog-eval/blob/main/verilog_eval/execution.py#L79-L112)
 in `execution.py` is deliberately commented out to ensure users read this
 disclaimer before running code in a potentially unsafe manner. See the comment in
 `execution.py` for more information and instructions.**
````
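The warning above concerns subprocess calls into *iverilog* and *vvp*. As a rough sketch of that flow (the function and file names here are illustrative, not the harness's actual API), compilation and simulation look like this:

```python
import shutil
import subprocess
from pathlib import Path

def build_sim_commands(sources, vvp_binary="test.vvp"):
    """Illustrative only: compile Verilog sources with iverilog,
    then run the compiled design with vvp."""
    compile_cmd = ["iverilog", "-o", vvp_binary, *sources]
    run_cmd = ["vvp", vvp_binary]
    return compile_cmd, run_cmd

# Hypothetical file names: a model completion joined with its testbench.
compile_cmd, run_cmd = build_sim_commands(["completion.v", "testbench.v"])
print(compile_cmd)  # ['iverilog', '-o', 'test.vvp', 'completion.v', 'testbench.v']
print(run_cmd)      # ['vvp', 'test.vvp']

# Only attempt simulation when the simulator and sources actually exist.
if shutil.which("iverilog") and shutil.which("vvp") and Path("completion.v").exists():
    subprocess.run(compile_cmd, check=True, timeout=30)
    result = subprocess.run(run_cmd, capture_output=True, text=True, timeout=30)
    print(result.stdout)
```

Because these are real system calls executing untrusted code, the sandboxing disclaimer above applies to any variant of this pattern.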
````diff
@@ -31,54 +51,46 @@ After following the above instructions to enable execution, generate samples
 and save them in the following JSON Lines (jsonl) format, where each sample is
 formatted into a single line like so:
 ```
-{"task_id": "Corresponding HumanEval task ID", "completion": "Completion only without the prompt"}
-```
-We provide `example_problem.jsonl` and `example_solutions.jsonl` under `data`
-to illustrate the format and help with debugging.
-
-Here is nearly functional example code (you just have to provide
-`generate_one_completion` to make it work) that saves generated completions to
-`samples.jsonl`.
-```
-from human_eval.data import write_jsonl, read_problems
-
-problems = read_problems()
-
-num_samples_per_task = 200
-samples = [
-    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
-    for task_id in problems
-    for _ in range(num_samples_per_task)
-]
-write_jsonl("samples.jsonl", samples)
+{"task_id": "Corresponding VerilogEval task ID", "completion": "Completion only without the prompt"}
 ```
+We provide examples under `data/example` to illustrate the format and help with debugging.
 
 To evaluate the samples, run
 ```
-$ evaluate_functional_correctness samples.jsonl
+$ evaluate_functional_correctness samples.jsonl --problem_file data/VerilogEval_Human.jsonl
 Reading samples...
-32800it [00:01, 23787.50it/s]
+3120it [00:00, 16077.44it/s]
 Running test suites...
-100%|...| 32800/32800 [16:11<00:00, 33.76it/s]
+100%|...| 3120/3120 [00:32<00:00, 97.47it/s]
+Killing all hanging simulation process.
 Writing results to samples.jsonl_results.jsonl...
-100%|...| 32800/32800 [00:00<00:00, 42876.84it/s]
-{'pass@1': ..., 'pass@10': ..., 'pass@100': ...}
+100%|...| 3120/3120 [00:00<00:00, 30608.13it/s]
+{'pass@1': ..., 'pass@5': ..., 'pass@10': ...}
 ```
+
+The user must specify the `--problem_file` argument. We provide two sets of problem
+evaluations, `data/VerilogEval_Machine.jsonl` and `data/VerilogEval_Human.jsonl`.
+We also provide the problem description files used to sample Verilog code completions
+in the `descriptions` directory.
+
 This script provides more fine-grained information in a new file ending in
 `<input_path>_results.jsonl`. Each row now contains whether the completion
 `passed` along with the execution `result` which is one of "passed", "timed
 out", or "failed".
 
-As a quick sanity-check, the example samples should yield 0.5 pass@1.
+As a quick sanity check, the example samples should yield 0.5 pass@1. The results can be
+verified against the provided reference output
+in `data/example/ExampleSolution.jsonl_reference.jsonl`.
 ```
-$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
+$ evaluate_functional_correctness data/example/ExampleSolution.jsonl --problem_file=data/example/ExampleEval.jsonl
 Reading samples...
-6it [00:00, 3397.11it/s]
+6it [00:00, 221.60it/s]
 Running example suites...
-100%|...| 6/6 [00:03<00:00, 1.96it/s]
-Writing results to data/example_samples.jsonl_results.jsonl...
-100%|...| 6/6 [00:00<00:00, 6148.50it/s]
-{'pass@1': 0.4999999999999999}
+100%|...| 6/6 [00:00<00:00, 142.09it/s]
+Killing all hanging simulation process.
+Writing results to data/example/ExampleSolution.jsonl_results.jsonl...
+100%|...| 6/6 [00:00<00:00, 19941.22it/s]
+{'pass@1': 0.5}
 ```
 
 Because there is no unbiased way of estimating pass@k when there are fewer
````
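The sample-generation snippet removed from the README above generalizes directly to this harness. Here is a stdlib-only sketch that writes completions in the required jsonl format; `generate_one_completion`, the task id, and the prompt are placeholders you would replace with your own model call and the tasks loaded from the real problem file (e.g. `data/VerilogEval_Human.jsonl`):

```python
import json

def generate_one_completion(prompt):
    # Placeholder: call your model here and return only the completion
    # text, without repeating the prompt.
    return "  assign out = a & b;\nendmodule"

# Hypothetical task set; real task_ids come from the problem file.
problems = {"and_gate": {"prompt": "module top_module(input a, input b, output out);"}}

num_samples_per_task = 20
samples = [
    {"task_id": task_id,
     "completion": generate_one_completion(problems[task_id]["prompt"])}
    for task_id in problems
    for _ in range(num_samples_per_task)
]

# One JSON object per line, matching the format the evaluator expects.
with open("samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

The resulting `samples.jsonl` can then be passed to `evaluate_functional_correctness` together with `--problem_file`.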
````diff
@@ -90,26 +102,25 @@ $ evaluate_functional_correctness --help
 ```
 However, we recommend that you use the default values for the rest.
 
-## Known Issues
+## Issues
+Problem descriptions in `descriptions/VerilogDescriptions_Machine.jsonl` are machine
+generated, and we cannot guarantee the absence of ambiguity and errors. We do not plan
+to maintain description correctness.
+
+Functional correctness is evaluated by comparing simulation outputs using
+[ICARUS Verilog](https://github.com/steveicarus/iverilog). The evaluation of Verilog syntax is limited by the simulator, which might not include all features of the Verilog HDL
+IEEE-1364 standard.
 
-While evaluation uses very little memory, you might see the following error
-message when the system is running out of RAM. Since this may cause some
-correct programs to fail, we recommend that you free some memory and try again.
-```
-malloc: can't allocate region
-```
 
 ## Citation
 
 Please cite using the following bibtex entry:
 
 ```
-@article{chen2021codex,
-  title={Evaluating Large Language Models Trained on Code},
-  author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
-  year={2021},
-  eprint={2107.03374},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG}
+@inproceedings{liu2023verilogeval,
+  title={{VerilogEval:} Evaluating Large Language Models for Verilog Code Generation},
+  author={Liu, Mingjie and Pinckney, Nathaniel and Khailany, Brucek and Ren, Haoxing},
+  booktitle={2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)},
+  year={2023}
 }
 ```
````
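The README's pass@k reporting relies on the unbiased estimator introduced in the Codex paper and inherited from human-eval. As a stdlib-only sketch (not the repository's exact implementation), it can be computed stably as 1 − C(n−c, k)/C(n, k):

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k).

    n: total samples generated for a problem
    c: number of samples that passed the testbench
    k: the k in pass@k
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    prod = 1.0
    for j in range(n - c + 1, n + 1):
        prod *= 1.0 - k / j
    return 1.0 - prod

# If half of 20 samples pass, pass@1 is 0.5 (matching the 0.5 sanity check).
print(round(pass_at_k(20, 10, 1), 12))  # 0.5
```

The product form avoids the overflow that direct binomial coefficients would cause for large n.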

data/VerilogEval_Human.jsonl

Lines changed: 156 additions & 0 deletions
Large diffs are not rendered by default.
