docs/Fuzzing Platform.md
+50 -41 (50 additions & 41 deletions)
@@ -2,21 +2,22 @@
2
2
3
3
**Problem:** fuzzing is a versatile technique for generating values to be used as method arguments. Normally,
4
4
to generate values, one needs information on a method signature, or rather on the parameter types (if a fuzzer is
5
-
able to "understand" them). _White-box_ approach also requires AST, and _grey-box_ approach needs coverage
5
+
able to "understand" them).
6
+
The _white-box_ approach also requires AST, and the _grey-box_ approach needs coverage
6
7
information. To generate values that may serve as method arguments, the fuzzer uses generators, mutators, and
7
8
predefined values.
8
9
9
10
*_Generators_ yield concrete objects created by descriptions. The basic description for creating objects is _type_.
10
-
Constants, regular expressions, and other structured object specifications (e.g. in HTML) may be also used as
11
+
Constants, regular expressions, and other structured object specifications (e.g. in HTML) may also be used as
11
12
descriptions.
12
13
13
14
*_Mutators_ modify the object in accordance with some logic that usually means random changes. To get better
14
15
results, mutators obtain feedback (information on coverage and the inner state of the
15
16
program) during method call.
16
17
17
-
*_Predefined values_ work well for known problems, e.g. incorrect symbol sequences. To discover potential problems one can analyze parameter names as well as the specific constructs or method calls inside the method body.
18
+
*_Predefined values_ work well for known problems, e.g. incorrect symbol sequences. To discover potential problems, one can analyze parameter names as well as the specific constructs or method calls inside the method body.
18
19
19
-
General API for using fuzzer looks like this:
20
+
The general API for using the fuzzer looks like this:
20
21
21
22
```
22
23
fuzz(
@@ -29,9 +30,14 @@ fuzz(
29
30
}
30
31
```
31
32
32
-
Fuzzer accepts list of types which can be provided in different formats: string, object or Class<*> in Java. Then seed
33
-
generator accepts these types and produces seeds which are used as base objects for value generation and mutations.
34
-
Fuzzing logic about how to choose, combine and mutate values from seed set is only fuzzing responsibility. API should not provide such abilities except general fuzzing configuring.
33
+
The fuzzer gets the list of types,
34
+
which can be provided in different formats: as a string, an object, or a Class<*> in Java.
35
+
The seed generator accepts these types and produces seeds.
36
+
The seeds are base objects for value generation and mutations.
37
+
38
+
It is the fuzzer that is responsible for choosing, combining, and mutating values from the seed set.
39
+
The fuzzer API should not provide access to the inner fuzzing logic.
40
+
Only general configuration is available.
35
41
36
42
## Parameters
37
43
@@ -42,26 +48,34 @@ The general fuzzing process gets the list of parameter descriptions as input and
42
48
```
43
49
44
50
In this particular case, the fuzzing process can generate the set of all the pairs having an integer as the first value
45
-
and `true` or `false` as the second one. If values `-3, 0, 10` are generated to be the `Int` values, the set of all the possible combinations has six items: `(-3, false), (0, false), (10, false), (-3, true), (0, true), (10, true)`. Depending on the programming language, one may use interface descriptions or annotations (type hints) instead of defining the specific type. Fuzzing platform (FP) is not able to create the concrete objects as it does not deal with the specific languages. It still can convert the descriptions to the known constructs it can work with.
46
-
47
-
Say, in most of the programming languages, any integer may be represented as a bit array, and fuzzer can construct and
48
-
modify bit arrays. So, in general case, the boundary values for the integer are these bit arrays:
49
-
50
-
*[0, 0, 0, ..., 0] - null
51
-
*[1, 0, 0, ..., 0] - minimum value
52
-
*[0, 1, 1, ..., 1] - maximum value
53
-
*[0, 0, ..., 0, 1] - plus 1
54
-
*[1, 1, 1, ..., 1] - minus 1
51
+
and `true` or `false` as the second one.
52
+
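If values `-3, 0, 10` are generated to be the `Int` values, the set of all the possible combinations has six items:
`(-3, false), (0, false), (10, false), (-3, true), (0, true), (10, true)`. Depending on the programming language,
one may use interface descriptions or annotations (type hints) instead of defining the specific type.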
56
+
The fuzzing platform (FP) cannot create concrete objects, as it does not deal with specific languages.
57
+
It can still convert the descriptions to the known constructs it can work with.
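The six combinations listed above are the Cartesian product of the candidate values for each parameter. A minimal Java sketch of that enumeration follows; the class and method names (`ParameterCombinations`, `cartesianProduct`) are hypothetical and are not part of the fuzzing platform API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the fuzzing platform API.
final class ParameterCombinations {
    // Builds every combination that takes one value from each candidate list,
    // keeping the parameter order.
    static List<List<Object>> cartesianProduct(List<List<Object>> candidates) {
        List<List<Object>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with a single empty combination
        for (List<Object> values : candidates) {
            List<List<Object>> extended = new ArrayList<>();
            for (List<Object> prefix : result) {
                for (Object value : values) {
                    List<Object> combination = new ArrayList<>(prefix);
                    combination.add(value);
                    extended.add(combination);
                }
            }
            result = extended;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> pairs = cartesianProduct(Arrays.asList(
                Arrays.<Object>asList(-3, 0, 10),
                Arrays.<Object>asList(false, true)));
        System.out.println(pairs.size() + " combinations: " + pairs);
    }
}
```

The number of combinations grows as the product of the candidate counts, so a real fuzzer would likely sample from this set rather than enumerate it in full.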
58
+
59
+
For example, in most programming languages, any integer may be represented as a bit array, and the fuzzer can construct and
60
+
modify bit arrays. So, in the general case, the boundary values for the integer are these bit arrays:
61
+
62
+
*[0, 0, 0, ..., 0] — zero
63
+
*[1, 0, 0, ..., 0] — minimum value
64
+
*[0, 1, 1, ..., 1] — maximum value
65
+
*[0, 0, ..., 0, 1] — plus 1
66
+
*[1, 1, 1, ..., 1] — minus 1
55
67
56
68
One can correctly use this representation for unsigned integers as well:
57
69
58
-
*[0, 0, 0, ..., 0]- null (minimum value)
59
-
*[1, 0, 0, ..., 0]- maximum value / 2
60
-
*[0, 1, 1, ..., 1]- maximum value / 2 + 1
61
-
*[0, 0, ..., 0, 1]- plus 1
62
-
*[1, 1, 1, ..., 1]- maximum value
70
+
*[0, 0, 0, ..., 0]— zero (minimum value)
71
+
*[1, 0, 0, ..., 0]— maximum value / 2 + 1 (integer division)
72
+
*[0, 1, 1, ..., 1]— maximum value / 2 (integer division)
73
+
*[0, 0, ..., 0, 1]— plus 1
74
+
*[1, 1, 1, ..., 1]— maximum value
63
75
64
-
Thus, FP interprets the _Byte_ and _Unsigned Byte_ descriptions in different ways: in the former case, the maximum value is [0, 1, 1, 1, 1, 1, 1, 1], while in the latter case it is [1, 1, 1, 1, 1, 1, 1, 1]. FP types are described in details further.
76
+
Thus, FP interprets the _Byte_ and _Unsigned Byte_ descriptions in different ways: in the former case,
77
+
the maximum value is [0, 1, 1, 1, 1, 1, 1, 1], while in the latter case it is [1, 1, 1, 1, 1, 1, 1, 1].
78
+
FP types are described in detail below.
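To make the two readings of the same bit arrays concrete, here is a small Java sketch that builds the five boundary patterns for a given width and interprets each one as signed and as unsigned. It assumes two's complement for the signed case, and the names (`BoundaryBits`, `patterns`, `asSigned`, `asUnsigned`) are illustrative only.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch; assumes two's complement for signed values.
final class BoundaryBits {
    // Returns the five boundary patterns for the given width (at least 2 bits),
    // written most significant bit first.
    static Map<String, String> patterns(int width) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("all zeros", repeat('0', width));
        result.put("highest bit only", "1" + repeat('0', width - 1));
        result.put("all but highest bit", "0" + repeat('1', width - 1));
        result.put("lowest bit only", repeat('0', width - 1) + "1");
        result.put("all ones", repeat('1', width));
        return result;
    }

    // Reads the bits as an unsigned number.
    static BigInteger asUnsigned(String bits) {
        return new BigInteger(bits, 2);
    }

    // Reads the bits as a two's complement signed number.
    static BigInteger asSigned(String bits) {
        BigInteger value = new BigInteger(bits, 2);
        if (bits.charAt(0) == '1') {
            value = value.subtract(BigInteger.ONE.shiftLeft(bits.length()));
        }
        return value;
    }

    private static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : patterns(8).entrySet()) {
            String bits = e.getValue();
            System.out.println(bits + " (" + e.getKey() + "): signed "
                    + asSigned(bits) + ", unsigned " + asUnsigned(bits));
        }
    }
}
```

For a width of 8, the pattern `10000000` reads as -128 when signed and 128 when unsigned, while `11111111` reads as -1 and 255 respectively.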
65
79
66
80
## Refined parameter description
67
81
@@ -79,19 +93,21 @@ public boolean isNaN(Number n) {
79
93
In the above example, let the parameter be `Integer`. Considering the feedback, the fuzzer suggests that nothing but `Double` might increase coverage, so the type may be downcast to `Double`. This allows for filtering out a priori unfitting values.
80
94
81
95
## Statically and dynamically generated values
82
-
Predefined, or _statically_ generated, values help to define the initial range of values, which could be used as method arguments. These values allow us to:
96
+
Predefined, or _statically_ generated, values help to define the initial range of values, which could be used as method arguments.
83
97
84
-
* check if it is possible to call the given method with at least some set of values as arguments,
85
-
* gather statistics on executing the program,
98
+
These values allow us to:
99
+
* check if it is possible to call the given method with at least some set of values as arguments;
100
+
* gather statistics on executing the program;
86
101
* refine the parameter description.
87
102
88
103
_Dynamic_ values are generated in two ways:
89
-
90
-
* internally — via mutating the existing values, successfully performed as method arguments (i.e. seeds);
91
-
* externally — via obtaining feedback that can return not only the statistics on the execution (the paths explored,
104
+
* internally, by mutating the existing values that were successfully used as method arguments (i.e. seeds);
105
+
* externally, via obtaining feedback that can return not only the statistics on the execution (the paths explored,
92
106
the time spent, etc.) but also the set of new values to be blended with the values already in use.
93
107
94
-
Dynamic values should have the higher priority for a sample, that's why they should be chosen either first or at least more likely than the statically generated ones. In general, the algorithm that guides the fuzzing process looks like this:
108
+
Dynamic values should have a higher priority for a sample;
109
+
that is why they should be chosen either first or at least with a higher probability than the statically generated ones.
110
+
In general, the algorithm that guides the fuzzing process looks like this:
95
111
96
112
```
97
113
# dynamic values are stored with respect to their return priority
@@ -135,7 +151,6 @@ Sometimes it is reasonable to modify the source code so that it makes applying f
135
151
## Generators
136
152
137
153
There are two types of generators:
138
-
139
154
* yielding values of primitive data types: integers, strings, booleans
140
155
* yielding values of recursive data types: objects, lists
141
156
@@ -146,39 +161,33 @@ three
146
161
modifications for it using `put(key, value)`. For this purpose, you may ask the fuzzer to process six
147
162
parameters `(key, value, key, value, key, value)` and get the necessary modified values.
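As an illustration of the `(key, value, key, value, key, value)` idea, the sketch below turns a flat list of values into consecutive `put(key, value)` calls on a map. The hard-coded list in `main` only stands in for values a fuzzer would return, and `MapFromFuzzedValues` and `applyPuts` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the fuzzing platform API.
final class MapFromFuzzedValues {
    // Interprets a flat list (key, value, key, value, ...) as consecutive put(key, value) calls.
    static Map<Object, Object> applyPuts(Map<Object, Object> target, List<Object> flatValues) {
        if (flatValues.size() % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of values (key/value pairs)");
        }
        for (int i = 0; i < flatValues.size(); i += 2) {
            target.put(flatValues.get(i), flatValues.get(i + 1));
        }
        return target;
    }

    public static void main(String[] args) {
        // Six values standing in for the fuzzer output: three key/value pairs.
        List<Object> fuzzed = Arrays.<Object>asList("a", 1, "b", 2, "a", 3);
        Map<Object, Object> map = applyPuts(new LinkedHashMap<Object, Object>(), fuzzed);
        System.out.println(map);
    }
}
```

Repeated keys are allowed on purpose: a later `put` overwrites an earlier one, which is itself a modification worth exercising.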
148
163
149
-
Primitive type generators allow for yielding
150
-
164
+
Primitive type generators allow for yielding:
151
165
1. Signed integers of a given size (8, 16, 32, and 64 bits, usually)
152
166
2. Unsigned integers of a given size
153
167
3. Floating-point numbers with a given size of significand and exponent according to IEEE 754
154
168
4. Booleans: _True_ and _False_
155
169
5. Characters (in UTF-16 format)
156
170
6. Strings (consisting of UTF-16 characters)
157
171
158
-
Fuzzer should be able to provide out-of-the-box support for these types — be able to create, modify, and process
159
-
them. To work with multiple languages it is enough to specify the possible type size and to describe and create the
172
+
The fuzzer should be able to provide out-of-the-box support for these types — be able to create, modify, and process
173
+
them.
174
+
To work with multiple languages, it is enough to specify the possible type size and to describe and create
160
175
concrete objects based on the FP-generated values.
161
176
162
177
The recursive types include two categories:
163
-
164
178
* Collections (arrays and lists)
165
179
* Objects
166
180
167
181
Collections may be nested and have _n_ dimensions (one, two, three, or more).
168
182
169
183
Collections may be:
170
-
171
184
* of a fixed size (e.g., arrays)
172
185
* of a variable size (e.g., lists and dictionaries)
173
186
174
187
Objects may have:
175
-
176
188
1. Constructors with parameters
177
-
178
189
2. Modifiable inner fields
179
-
180
190
3. Modifiable global values (the static ones)
181
-
182
191
4. Calls for modifying methods
183
192
184
193
FP should be able to create and describe such objects in the form of a tree. The semantics of the actual modifications is the responsibility of the programming language.
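One possible shape for such a tree is sketched below in Java. Each node carries a type description, optional constructor arguments, and a list of modifications (field writes, static field writes, or method calls) whose arguments are nodes themselves. All names here (`ValueNode`, `Modification`, `Kind`) are assumptions made for illustration and do not reflect the platform's actual data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative data model only; the real FP representation may differ.
final class ValueNode {
    enum Kind { FIELD, STATIC_FIELD, METHOD_CALL }

    static final class Modification {
        final Kind kind;
        final String name;
        final List<ValueNode> args;

        Modification(Kind kind, String name, List<ValueNode> args) {
            this.kind = kind;
            this.name = name;
            this.args = args;
        }
    }

    final String type;                     // type description, e.g. "Int"
    final Object literal;                  // set for primitive leaves, null for objects
    final List<ValueNode> constructorArgs; // children passed to the constructor
    final List<Modification> modifications = new ArrayList<>();

    private ValueNode(String type, Object literal, List<ValueNode> constructorArgs) {
        this.type = type;
        this.literal = literal;
        this.constructorArgs = constructorArgs;
    }

    static ValueNode leaf(String type, Object literal) {
        return new ValueNode(type, literal, new ArrayList<ValueNode>());
    }

    static ValueNode object(String type, ValueNode... constructorArgs) {
        return new ValueNode(type, null, Arrays.asList(constructorArgs));
    }

    // Records a modification to apply after construction and returns this node for chaining.
    ValueNode modify(Kind kind, String name, ValueNode... args) {
        modifications.add(new Modification(kind, name, Arrays.asList(args)));
        return this;
    }

    // Counts this node plus every node reachable through constructor arguments and modifications.
    int size() {
        int total = 1;
        for (ValueNode arg : constructorArgs) {
            total += arg.size();
        }
        for (Modification m : modifications) {
            for (ValueNode arg : m.args) {
                total += arg.size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        ValueNode list = ValueNode.object("java.util.ArrayList", ValueNode.leaf("Int", 16))
                .modify(Kind.METHOD_CALL, "add", ValueNode.leaf("String", "a"))
                .modify(Kind.METHOD_CALL, "add", ValueNode.leaf("String", "b"));
        System.out.println("Nodes in the description tree: " + list.size());
    }
}
```

The tree only describes what to construct and in which order to modify it; turning the description into a live object is left to the language-specific side.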
docs/OverallArchitecture.md
+22 -7 (22 additions & 7 deletions)
@@ -112,7 +112,7 @@ sequenceDiagram
112
112
The plugin provides
113
113
* a UI for the IntelliJ-based IDEs to use UnitTestBot directly from source code,
114
114
* the linkage between IntelliJ Platform API and UnitTestBot API,
115
-
* support for the most popular programming languages and frameworks for end users (the plugin and its optional dependencies are described in [plugin.xml](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/resources/META-INF/plugin.xml) and nearby, in the [`META-INF`](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij/src/main/resources/META-INF) folder.
115
+
* support for the most popular programming languages and frameworks for end users (the plugin and its optional dependencies are described in [plugin.xml](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/resources/META-INF/plugin.xml) and nearby, in the [`META-INF`](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij/src/main/resources/META-INF) folder).
116
116
117
117
The main plugin module is [utbot-intellij](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij), providing support for Java and Kotlin.
118
118
Also, there is an auxiliary [utbot-ui-commons](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-ui-commons) module to support providers for other languages.
@@ -124,7 +124,7 @@ As for the UI, there are two entry points:
124
124
The main plugin-specific features are:
125
125
* A common action for generating tests right from the editor or a project tree — with a generation scope from a single method up to the whole source root. See [GenerateTestsAction](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/ui/actions/GenerateTestsAction.kt) — the same for all supported languages.
126
126
* Auto-installation of the user-chosen testing framework as a project library dependency (JUnit 4, JUnit 5, and TestNG are supported). See [UtIdeaProjectModelModifier](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/UtIdeaProjectModelModifier.kt) and the Maven-specific version: [UtMavenProjectModelModifier](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/UtMavenProjectModelModifier.kt).
127
-
* Suggesting the location for a test source root and auto-generating the `utbot_tests` folder there, providing users with a sandbox in their codespace.
127
+
* Suggesting the location for a test source root and auto-generating the `utbot_tests` folder there, providing users with a sandbox in their code space.
128
128
* Optimizing generated code with IDE-provided intentions (experimental). See [IntentionHelper](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/generator/IntentionHelper.kt) for details.
129
129
* An option for distributing generation time between symbolic execution and fuzzing explicitly.
130
130
* Running generated tests while showing coverage with the IDE-provided measurement tools. See [RunConfigurationHelper](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/RunConfigurationHelper.kt) for implementation.
@@ -241,7 +241,10 @@ The main instrumentation of UnitTestBot is [UtExecutionInstrumentation](https://
241
241
### Code generator
242
242
243
243
Code generation and rendering are a part of the test generation process in UnitTestBot.
244
-
UnitTestBot gets the synthetic representation of generated test cases from the fuzzer or the symbolic engine. This representation, or model, is implemented in the `UtExecution` class. The `codegen` module generates the real test code based on this `UtExecution` model and renders it in a human-readable form.
244
+
UnitTestBot gets the synthetic representation of generated test cases from the fuzzer or the symbolic engine.
245
+
This representation (or model) is implemented in the `UtExecution` class.
246
+
The `codegen` module generates the real test code based on this `UtExecution` model
247
+
and renders it in a human-readable form.
245
248
246
249
The `codegen` module
247
250
- converts `UtExecution` test information into an Abstract Syntax Tree (AST) representation using `CodeGenerator`,
@@ -287,7 +290,7 @@ To minimize the number of executions in a group, we use a simple greedy algorith
287
290
2. Add this execution to the final suite and mark new lines as covered.
288
291
3. Repeat the first step and continue as long as there are executions containing uncovered lines.
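The greedy steps above can be sketched in Java as follows. Step 1 is not visible in this hunk, so the sketch assumes it selects the execution that covers the largest number of not-yet-covered lines. `GreedyMinimizer` and `minimize` are hypothetical names and are not taken from the `org.utbot.framework.minimization` package.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of greedy suite minimization, not the UnitTestBot implementation.
final class GreedyMinimizer {
    // executions maps an execution id to the set of line numbers it covers.
    static List<String> minimize(Map<String, Set<Integer>> executions) {
        Set<Integer> covered = new HashSet<>();
        List<String> suite = new ArrayList<>();
        while (true) {
            // Assumed step 1: pick the execution that adds the most uncovered lines.
            String best = null;
            int bestGain = 0;
            for (Map.Entry<String, Set<Integer>> e : executions.entrySet()) {
                Set<Integer> fresh = new HashSet<>(e.getValue());
                fresh.removeAll(covered);
                if (fresh.size() > bestGain) {
                    best = e.getKey();
                    bestGain = fresh.size();
                }
            }
            if (best == null) {
                return suite; // no remaining execution covers a new line
            }
            // Step 2: add it to the final suite and mark its lines as covered.
            suite.add(best);
            covered.addAll(executions.get(best));
        }
    }
}
```

With executions `a` covering lines 1 to 3, `b` covering lines 3 and 4, and `c` covering lines 1 and 2, this sketch keeps `a` and `b` and drops `c`, since `c` adds no new lines. Like any greedy set-cover heuristic, it does not guarantee the smallest possible suite.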
289
292
290
-
The whole minimization procedure is located in the [org.utbopt.framework.minimization](utbot-framework/src/main/kotlin/org/utbot/framework/minimization) package inside the [utbot-framework](../utbot-framework) module.
293
+
The whole minimization procedure is located in the [org.utbot.framework.minimization](../utbot-framework/src/main/kotlin/org/utbot/framework/minimization) package inside the [utbot-framework](../utbot-framework) module.
291
294
292
295
### Summarization module
293
296
@@ -309,7 +312,7 @@ For detailed information, please refer to the Summarization architecture design
309
312
310
313
### SARIF report generator
311
314
312
-
SARIF (Static Analysis Results Interchange Format) is a JSON–based format for displaying static analysis results.
315
+
SARIF (Static Analysis Results Interchange Format) is a JSON-based format for displaying static analysis results.
313
316
314
317
All the necessary information about the format and its usage can be found
315
318
in the [official documentation](https://github.com/microsoft/sarif-tutorials/blob/main/README.md)
@@ -346,7 +349,8 @@ UnitTestBot consists of three processes (according to the execution order):
346
349
347
350
These processes are built on top of the [Reactive distributed communication framework (Rd)](https://github.com/JetBrains/rd) developed by JetBrains.
348
351
349
-
One of the main Rd concepts is _Lifetime_ — it helps to release shared resources upon the object's termination. You can find the Rd basic ideas and UnitTestBot implementation details in the [Multiprocess architecture](https://github.com/UnitTestBot/UTBotJava/blob/main/docs/RD%20for%20UnitTestBot.md) design doc.
352
+
One of the main Rd concepts is a _Lifetime_ — it helps to release shared resources upon the object's termination.
353
+
You can find the Rd basic ideas and UnitTestBot implementation details in the [Multiprocess architecture](https://github.com/UnitTestBot/UTBotJava/blob/main/docs/RD%20for%20UnitTestBot.md) design doc.
350
354
351
355
### Settings
352
356
@@ -362,4 +366,15 @@ The end user has three places to change UnitTestBot behavior:
362
366
3. Controls in the **Generate Tests with UnitTestBot** dialog — for per-generation settings.
363
367
364
368
### Logging
365
-
TODO
369
+
370
+
The UnitTestBot Java logging system is implemented across the IDE process, the Engine process, and the Instrumented process.
371
+
372
+
UnitTestBot Java logging relies on the `log4j2` library.
373
+
The custom Rd logging system is recommended as the default one for the Instrumented process.
374
+
375
+
In the [Logging](../docs/contributing/InterProcessLogging.md) document,
376
+
you can find how to configure the logging system when UnitTestBot Java is used
377
+
* as an IntelliJ IDEA plugin,
378
+
* as a Contest estimator or a Gradle/Maven plugin, via CLI, or during CI test runs.
379
+
380
+
Implementation details, log level and performance questions are also addressed [here](../docs/contributing/InterProcessLogging.md).