
Commit abbf596

Scripts for jlearch (#279)
Add scripts for jlearch
1 parent 48e5641 commit abbf596

File tree: 15 files changed, +790 -7 lines changed
Lines changed: 37 additions & 0 deletions

# Pipeline diagram

```mermaid
graph TD
    Projects --> ContestEstimator
    Selectors --> ContestEstimator
    subgraph FeatureGeneration
        ContestEstimator --> Tests
        Tests --> Features
        Tests --> Rewards
    end

    Features --> Data
    Rewards --> Data

    Data --> Models
    Models --> NNRewardGuidedSelector --> UsualTestGeneration
```

# Training

Briefly:

* Get dataset `D` by running `ContestEstimator` on several projects using several selectors.
* Train `model_0` using `D`.
* For several `iterations` repeat (assume we are on the `i`-th step; a sketch of this loop follows the list):
  * Get dataset `D'` by running `ContestEstimator` on several projects using `NNRewardGuidedSelector`, which will use `model_i`.
  * $$D = D \cup D'$$
  * Train `model_{i+1}` using `D`.
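
A minimal shell sketch of this loop, for orientation only: `run_estimator.sh` and all of the flags below are assumptions, not the real interfaces; the actual loop lives in `scripts/train_iteratively.sh` with the arguments described in `docs/jlearch/scripts.md`.

```bash
#!/bin/bash
# Illustrative sketch only: run_estimator.sh and its flags are hypothetical;
# see scripts/train_iteratively.sh for the real implementation.
ITERATIONS=2
FEATURES_DIR=eval/features
MODELS_DIR=models

# Dataset D from the hand-written selectors, then the initial model_0.
./scripts/run_estimator.sh --selectors scripts/selector_list --features "$FEATURES_DIR"
python3 scripts/train.py --features "$FEATURES_DIR" --output "$MODELS_DIR/0"

for ((i = 0; i < ITERATIONS; i++)); do
  # D' comes from NNRewardGuidedSelector driven by model_i; features
  # accumulate in FEATURES_DIR, which realizes D = D ∪ D'.
  ./scripts/run_estimator.sh --selectors "NN_REWARD_GUIDED_SELECTOR $MODELS_DIR/$i" --features "$FEATURES_DIR"
  python3 scripts/train.py --features "$FEATURES_DIR" --output "$MODELS_DIR/$((i + 1))"
done
```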

To do this, you should:

* Make sure the `java` command runs `Java 8` and set `JAVA_HOME` to a `Java 8` installation.
* Put the projects you want to learn on into the `contest_input/projects` folder, then list the classes you want to learn on in `contest_input/classes/<project name>/list` (if this file is empty, all classes from the project jar are taken).
* Run `pip install -r scripts/requirements.txt`. Whether to do this in a virtual environment is up to you.
* List selectors in `scripts/selector_list` and projects in `scripts/prog_list`.
* Run `./scripts/train_iteratively.sh` (see its arguments in `docs/jlearch/scripts.md` below).

docs/jlearch/scripts.md

Lines changed: 65 additions & 0 deletions

# How to use scripts

For each scenario, go to the root of the `UTBotJava` repository; it is `WORKDIR`.

The `PATH_SELECTOR` argument has the form `"PATH_SELECTOR_TYPE [PATH_SELECTOR_PATH for NN] [IS_COMBINED (false by default)] [ITERATIONS]"`.
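
For example, an NN-based selector passes the model path as the second token, with quotes keeping the whole thing one argument (`models/nn32/0` follows the model layout described in `docs/jlearch/setup.md`):

```bash
PATH_SELECTOR="NN_REWARD_GUIDED_SELECTOR models/nn32/0"
```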

Before starting work, run:
```bash
./scripts/prepare.sh
```

It copies the contest resources into the `contest_input` folder and builds the project. The scripts run from jars, so if you want to change something in the code and re-run the scripts, you should run:
```bash
./gradlew clean build -x test
```

## To train a few iterations of your models

By default, the features directory is `eval/features`; it must already exist. To change it, edit the source code of the scripts.

List the projects and selectors you want to train on in `scripts/prog_list` and `scripts/selector_list`, for example as shown below. Training uses all methods of all classes from `contest_input/classes/<project name>/list`.
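
For example, a `scripts/prog_list` with two projects, assuming the one-name-per-line format (this commit's `prog_list` contains `antlr`; `guava-26.0` appears in `docs/jlearch/setup.md`):

```
antlr
guava-26.0
```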

Then just run:
```bash
./scripts/train_iteratively.sh <time_limit> <iterations> <output_dir> <python_command>
```

`<python_command>` is your command for python3. At the end of execution you will get the models of all iterations in the `<output_dir>` folder, plus the features for each selector and project: in `<features_dir>/<selector>/<project>` for each `selector` from `selector_list`, and in `<features_dir>/jlearch/<selector>/<prog>` for the models.
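
For example, the invocation used in `docs/jlearch/setup.md` trains 2 iterations with a time limit of 30 and writes models to the `models` folder, assuming plain `python3` as the python command:

```bash
./scripts/train_iteratively.sh 30 2 models python3
```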

## To run Contest Estimator with coverage

Check that a `srcTestDir` for your project exists in the `build.gradle` of `utbot-junit-contest`. If it does not, add `build/output/test/<project>`.

Then just run:
```bash
./scripts/run_with_coverage.sh <project> <time_limit> <path_selector> <selector_alias>
```

At the end of execution you will get a jacoco report in the `eval/jacoco/<project>/<selector_alias>/` folder.
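
For example, the run suggested in `docs/jlearch/setup.md`, which evaluates a trained model on `guava-26.0` with a time limit of 30:

```bash
./scripts/run_with_coverage.sh guava-26.0 30 "NN_REWARD_GUIDED_SELECTOR models/nn32/0" some_alias
```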

## To estimate quality

Just run:
```bash
./scripts/quality_analysis.sh <project> <selector_aliases, separated by comma>
```

It takes the coverage reports from the corresponding report folders (at `eval/jacoco/<project>/<alias>`) and generates charts in `$outputDir/<project>/<timestamp>.html`.
`outputDir` can be changed in `QualityAnalysisConfig`. The result file contains information about 3 metrics:
* average per-class instruction coverage: $\frac{\sum_{c \in classSet} instCoverage(c)}{|classSet|}$
* overall instruction coverage: $\frac{\sum_{c \in classSet} coveredInstructions(c)}{\sum_{c \in classSet} allInstructions(c)}$
* average per-class branch coverage: $\frac{\sum_{c \in classSet} branchCoverage(c)}{|classSet|}$

For each metric and each selector you will get:
* the value of the metric
* a chart with the median, $q_1$, $q_3$ and so on

## To scrape solution classes from codeforces

Note: you can't scrape many classes at once, because the codeforces api has a request limit.

This can be useful if you want to train Jlearch on classes that usually have no virtual functions but contain many algorithms, and hence loops and conditions.

Just run:
```bash
python3 path/to/codeforces_scrapper.py --problem_count <val> --submission_count <val> --min_rating <val> --max_rating <val> --output_dir <val>
```

All arguments are optional. The default values are `100`, `10`, `0`, `1500` and `.`, respectively.
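
Given those defaults, the following fully spelled-out invocation is equivalent to running the scrapper with no arguments (keeping the doc's `path/to/` placeholder):

```bash
python3 path/to/codeforces_scrapper.py --problem_count 100 --submission_count 10 \
    --min_rating 0 --max_rating 1500 --output_dir .
```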

At the end you should get `submission_count` classes for each of `problem_count` problems with a rating between `min_rating` and `max_rating` in `output_dir`. Each class has the package `p<contest_id>.p<submission_id>`.

docs/jlearch/setup.md

Lines changed: 35 additions & 0 deletions

# How to set up the environment for experiments on Linux

* Clone the repository and go to its root.
* `chmod +x ./scripts/*` and `chmod +x gradlew`.
* Set `Java 8` as the default and set `JAVA_HOME` to this `Java`. For example:
  * Follow [this](https://sdkman.io/install) up to the `Windows installation` section
  * `sdk list java`
  * Find any `Java 8`
  * `sdk install <this java>`
  * `sdk use <this java>`
  * Check `java -version`
* `mkdir -p eval/features`
* `mkdir models`
* Set up an environment for `Python`. For example:
  * `python3 -m venv /path/to/new/virtual/environment`
  * `source /path/to/venv/bin/activate`
  * Check `which python3`; it should be somewhere in the `path/to/env` folder.
  * `pip install -r scripts/requirements.txt`
* `./scripts/prepare.sh`
* Change `scripts/prog_list` to run on a smaller project, or delete some classes from `contest_input/classes/<project>/list`.

# Default settings and how to change them

* You can reduce the number of models via the `models` variable in `scripts/train_iteratively.sh`
* You can change the amount of required RAM in `run_contest_estimator.sh`: `16 gb` by default
* You can change `batch_size` or `device` in `train.py`: `4096` and `gpu` by default
* If you are completing the setup on a server, you will need to uncomment the tmp directory option in `run_contest_estimator.sh`

# Continue setup

* `scripts/train_iteratively.sh 30 2 models <your python3 command>`
* You should get models in `models/`.
* `mkdir eval/jacoco`
* `./scripts/run_with_coverage.sh <any project (guava-26.0, for example)> 30 "NN_REWARD_GUIDED_SELECTOR path/to/model" some_alias`. `path/to/model` should be something like `models/nn32/0`, where `nn32` is the type of model and `0` is the iteration number.
* You should get a jacoco report in `eval/jacoco/guava-26.0/some_alias/`
scripts/codeforces_scrapper.py

Lines changed: 120 additions & 0 deletions

import argparse
import json
import os.path
import time

import bs4
import javalang
from urllib import request

from codeforces import CodeforcesAPI


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--problem_count", dest='problem_count', type=int, default=100)
    parser.add_argument("--submission_count", dest='submission_count', type=int, default=10)
    parser.add_argument("--min_rating", dest='min_rating', type=int, default=0)
    parser.add_argument("--max_rating", dest="max_rating", type=int, default=1500)
    parser.add_argument("--output_dir", dest="output_dir", type=str, default=".")
    return parser.parse_args()


args = get_args()


def check_json(answer):
    values = json.loads(answer)

    if values['status'] == 'OK':
        return values['result']


def get_main_name(tree):
    """
    :return: name of the class with a public static main method
    """
    return next(klass.name for klass in tree.types
                if isinstance(klass, javalang.tree.ClassDeclaration)
                for m in klass.methods
                if m.name == 'main' and m.modifiers.issuperset({'public', 'static'}))


def save_source_code(contest_id, submission_id):
    """
    Parse the html page to find the source code of the submission and save it
    in a unique package p${contest_id}.p${submission_id}.
    If we reach the api request bound, no source code is found and we sleep
    for 5 minutes.
    """
    url = request.Request(f"http://codeforces.com/contest/{contest_id}/submission/{submission_id}")
    with request.urlopen(url) as req:
        soup = bs4.BeautifulSoup(req.read(), "html.parser")
        path = os.path.join(args.output_dir, f"p{contest_id}", f"p{submission_id}")
        if not os.path.exists(path):
            os.makedirs(path)
        code = ""
        for p in soup.find_all("pre", {"class": "program-source"}):
            code += p.get_text()
        tree = javalang.parse.parse(code)
        try:
            name = get_main_name(tree)
            with open(os.path.join(path, f"{name}.java"), 'w') as f:
                print(f"package p{contest_id}.p{submission_id};", file=f)
                f.write(code)
        except StopIteration:
            print("Sleeping, because we reached the request bound")
            time.sleep(300)


def main():
    codeforces = "http://codeforces.com/api/"
    api = CodeforcesAPI()

    with request.urlopen(f"{codeforces}problemset.problems") as req:
        all_problems = check_json(req.read().decode('utf-8'))

    # Take the first problem_count problems whose rating lies in [min_rating, max_rating].
    problems = []
    cur_problem = 0
    for p in all_problems['problems']:
        if cur_problem >= args.problem_count:
            break
        if p.get('rating') is None:
            continue
        if p['rating'] < args.min_rating or p['rating'] > args.max_rating:
            continue
        cur_problem += 1
        problems.append({'contest_id': p['contestId'], 'index': p['index']})

    print(f"Got {len(problems)} problems: {problems[0]}")

    # For each problem try to take submission_count accepted Java 8 submissions,
    # paging through the contest status in chunks of page_size.
    all_submission = 0
    for p in problems:
        cur_submission = 0
        iteration = 0
        page_size = 1000
        while cur_submission < args.submission_count:
            length = 0
            for s in api.contest_status(contest_id=p['contest_id'], from_=page_size * iteration + 1, count=page_size):
                if cur_submission >= args.submission_count:
                    break
                length += 1
                if s.problem.contest_id != p['contest_id'] or s.problem.index != p['index']:
                    continue
                if s.programming_language != "Java 8":
                    continue
                if s.verdict.name != "ok":
                    continue
                save_source_code(p['contest_id'], s.id)
                cur_submission += 1
                all_submission += 1
                print(f"Saved {all_submission} programs so far")
            iteration += 1
            if length == 0:
                break


if __name__ == "__main__":
    main()

scripts/prepare.sh

Lines changed: 9 additions & 0 deletions

#!/bin/bash

./gradlew clean build -x test

INPUT_FOLDER=contest_input

# Copy the contest resources into a separate folder so that other scripts
# can place a specific project into the contest resources folder.
mkdir -p $INPUT_FOLDER
cp -r utbot-junit-contest/src/main/resources/* $INPUT_FOLDER

scripts/prog_list

Lines changed: 1 addition & 0 deletions
antlr

scripts/quality_analysis.sh

Lines changed: 32 additions & 0 deletions

#!/bin/bash

PROJECT=${1}
SELECTORS=${2}
STMT_COVERAGE=${3}
WORKDIR="."

# We configure QualityAnalysisConfig through a properties file
SETTING_PROPERTIES_FILE="$WORKDIR/utbot-analytics/src/main/resources/config.properties"
touch $SETTING_PROPERTIES_FILE
echo "project=$PROJECT" > "$SETTING_PROPERTIES_FILE"
echo "selectors=$SELECTORS" >> "$SETTING_PROPERTIES_FILE"

JAR_TYPE="utbot-analytics"
echo "JAR_TYPE: $JAR_TYPE"
LIBS_DIR=utbot-analytics/build/libs/
UTBOT_JAR="$LIBS_DIR$(ls -l $LIBS_DIR | grep $JAR_TYPE | awk '{print $9}')"
echo $UTBOT_JAR
MAIN_CLASS="org.utbot.QualityAnalysisKt"

if [[ -n $STMT_COVERAGE ]]; then
    MAIN_CLASS="org.utbot.StmtCoverageReportKt"
fi

# Running the jar
COMMAND_LINE="java $JVM_OPTS -cp $UTBOT_JAR $MAIN_CLASS"

echo "COMMAND=$COMMAND_LINE"

$COMMAND_LINE

scripts/requirements.txt

Lines changed: 7 additions & 0 deletions

beautifulsoup4
javalang
numpy
pandas
requests
scikit_learn
torch
