Commit 1b38315

authored
Clean up OverallArchitecture, Fuzzing and Logging docs (#1873)
* Minor fixes: wording, formatting, punctuation
* Replaced null with zero
1 parent f833be0 commit 1b38315

5 files changed

+322
-298
lines changed

docs/Fuzzing Platform.md

Lines changed: 50 additions & 41 deletions
@@ -2,21 +2,22 @@
22

33
**Problem:** fuzzing is a versatile technique for generating values to be used as method arguments. Normally,
44
to generate values, one needs information on a method signature, or rather on the parameter types (if a fuzzer is
5-
able to "understand" them). _White-box_ approach also requires AST, and _grey-box_ approach needs coverage
5+
able to "understand" them).
6+
The _white-box_ approach also requires AST, and the _grey-box_ approach needs coverage
67
information. To generate values that may serve as method arguments, the fuzzer uses generators, mutators, and
78
predefined values.
89

910
* _Generators_ yield concrete objects created by descriptions. The basic description for creating objects is _type_.
10-
Constants, regular expressions, and other structured object specifications (e.g. in HTML) may be also used as
11+
Constants, regular expressions, and other structured object specifications (e.g. in HTML) may also be used as
1112
descriptions.
1213

1314
* _Mutators_ modify the object in accordance with some logic that usually means random changes. To get better
1415
results, mutators obtain feedback (information on coverage and the inner state of the
1516
program) during method call.
1617

17-
* _Predefined values_ work well for known problems, e.g. incorrect symbol sequences. To discover potential problems one can analyze parameter names as well as the specific constructs or method calls inside the method body.
18+
* _Predefined values_ work well for known problems, e.g. incorrect symbol sequences. To discover potential problems, one can analyze parameter names as well as the specific constructs or method calls inside the method body.
1819

19-
General API for using fuzzer looks like this:
20+
The general API for using the fuzzer looks like this:
2021

2122
```
2223
fuzz(
@@ -29,9 +30,14 @@ fuzz(
2930
}
3031
```
3132

32-
Fuzzer accepts list of types which can be provided in different formats: string, object or Class<*> in Java. Then seed
33-
generator accepts these types and produces seeds which are used as base objects for value generation and mutations.
34-
Fuzzing logic about how to choose, combine and mutate values from seed set is only fuzzing responsibility. API should not provide such abilities except general fuzzing configuring.
33+
The fuzzer gets the list of types,
34+
which can be provided in different formats: as a string, an object, or a Class<*> in Java.
35+
The seed generator accepts these types and produces seeds.
36+
The seeds are base objects for value generation and mutations.
37+
38+
It is the fuzzer, which is responsible for choosing, combining and mutating values from the seed set.
39+
The fuzzer API should not provide access to the inner fuzzing logic.
40+
Only general configuration is available.
3541

3642
## Parameters
3743

@@ -42,26 +48,34 @@ The general fuzzing process gets the list of parameter descriptions as input and
4248
```
4349

4450
In this particular case, the fuzzing process can generate the set of all the pairs having integer as the first value
45-
and `true` or `false` as the second one. If values `-3, 0, 10` are generated to be the `Int` values, the set of all the possible combinations has six items: `(-3, false), (0, false), (10, false), (-3, true), (0, true), (10, true)`. Depending on the programming language, one may use interface descriptions or annotations (type hints) instead of defining the specific type. Fuzzing platform (FP) is not able to create the concrete objects as it does not deal with the specific languages. It still can convert the descriptions to the known constructs it can work with.
46-
47-
Say, in most of the programming languages, any integer may be represented as a bit array, and fuzzer can construct and
48-
modify bit arrays. So, in general case, the boundary values for the integer are these bit arrays:
49-
50-
* [0, 0, 0, ..., 0] - null
51-
* [1, 0, 0, ..., 0] - minimum value
52-
* [0, 1, 1, ..., 1] - maximum value
53-
* [0, 0, ..., 0, 1] - plus 1
54-
* [1, 1, 1, ..., 1] - minus 1
51+
and `true` or `false` as the second one.
52+
If values `-3, 0, 10` are generated to be the `Int` values, the set of all the possible combinations has six items:
53+
`(-3, false), (0, false), (10, false), (-3, true), (0, true), (10, true)`.
54+
Depending on the programming language,
55+
one may use interface descriptions or annotations (type hints) instead of defining the specific type.
56+
Fuzzing platform (FP) is not able to create the concrete objects as it does not deal with the specific languages.
57+
It can still convert the descriptions to the known constructs it can work with.
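
The six combinations listed above are the Cartesian product of the generated `Int` values and the two boolean values. A minimal sketch in Java, assuming a hypothetical helper `combine` that is not part of the fuzzing platform API, could enumerate them like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from the fuzzing platform.
final class ParameterCombinations {
    /** Returns every (int, boolean) pair, grouped by the boolean value. */
    static List<String> combine(int[] ints, boolean[] flags) {
        List<String> result = new ArrayList<>();
        for (boolean flag : flags) {
            for (int value : ints) {
                result.add("(" + value + ", " + flag + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same sample values as in the text: -3, 0, 10 and false/true.
        System.out.println(combine(new int[] {-3, 0, 10}, new boolean[] {false, true}));
    }
}
```

The number of combinations is the product of the sizes of the per-parameter value sets, which is why a fuzzer usually samples from this set rather than enumerating it for many parameters.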
58+
59+
Say, in most of the programming languages, any integer may be represented as a bit array, and the fuzzer can construct and
60+
modify bit arrays. So, in the general case, the boundary values for the integer are these bit arrays:
61+
62+
* [0, 0, 0, ..., 0] — zero
63+
* [1, 0, 0, ..., 0] — minimum value
64+
* [0, 1, 1, ..., 1] — maximum value
65+
* [0, 0, ..., 0, 1] — plus 1
66+
* [1, 1, 1, ..., 1] — minus 1
5567

5668
One can correctly use this representation for unsigned integers as well:
5769

58-
* [0, 0, 0, ..., 0] - null (minimum value)
59-
* [1, 0, 0, ..., 0] - maximum value / 2
60-
* [0, 1, 1, ..., 1] - maximum value / 2 + 1
61-
* [0, 0, ..., 0, 1] - plus 1
62-
* [1, 1, 1, ..., 1] - maximum value
70+
* [0, 0, 0, ..., 0] — zero (minimum value)
71+
* [1, 0, 0, ..., 0] maximum value / 2
72+
* [0, 1, 1, ..., 1] maximum value / 2 + 1
73+
* [0, 0, ..., 0, 1] plus 1
74+
* [1, 1, 1, ..., 1] maximum value
6375

64-
Thus, FP interprets the _Byte_ and _Unsigned Byte_ descriptions in different ways: in the former case, the maximum value is [0, 1, 1, 1, 1, 1, 1, 1], while in the latter case it is [1, 1, 1, 1, 1, 1, 1, 1]. FP types are described in details further.
76+
Thus, FP interprets the _Byte_ and _Unsigned Byte_ descriptions in different ways: in the former case,
77+
the maximum value is [0, 1, 1, 1, 1, 1, 1, 1], while in the latter case it is [1, 1, 1, 1, 1, 1, 1, 1].
78+
FP types are described in detail further.
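
To make the difference between the two interpretations concrete, the following sketch reads the same 8-bit patterns once as a signed (two's complement) byte and once as an unsigned byte. The class and method names are hypothetical and only serve the illustration; the bit patterns are the five boundary arrays from the lists above, written most significant bit first.

```java
// Hypothetical illustration only; not a class from the fuzzing platform.
final class BoundaryBits {
    /** Interprets the low 8 bits as a signed (two's complement) byte. */
    static int asSigned(int bits) {
        return (byte) bits;
    }

    /** Interprets the low 8 bits as an unsigned byte. */
    static int asUnsigned(int bits) {
        return bits & 0xFF;
    }

    public static void main(String[] args) {
        int[] patterns = {0b0000_0000, 0b1000_0000, 0b0111_1111, 0b0000_0001, 0b1111_1111};
        for (int p : patterns) {
            // Pad the binary string to 8 digits so every pattern lines up.
            String bits = String.format("%8s", Integer.toBinaryString(p)).replace(' ', '0');
            System.out.println(bits + " signed=" + asSigned(p) + " unsigned=" + asUnsigned(p));
        }
    }
}
```

For example, `10000000` is -128 as a signed byte but 128 as an unsigned one, and `11111111` is -1 signed but 255 unsigned.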
6579

6680
## Refined parameter description
6781

@@ -79,19 +93,21 @@ public boolean isNaN(Number n) {
7993
In the above example, let the parameter be `Integer`. Considering the feedback, the fuzzer suggests that nothing but `Double` might increase coverage, so the type may be downcasted to `Double`. This allows for filtering out a priori unfitting values.
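
The body of `isNaN` is not shown in this hunk, so the sketch below uses a hypothetical body purely to illustrate why feedback can narrow the parameter type: only a `Double` argument can reach the branch that returns `true`.

```java
// Hypothetical illustration only; the method body is an assumption.
final class RefinedParameterDemo {
    static boolean isNaN(Number n) {
        if (n instanceof Double) {
            // Only Double values can reach this branch.
            return ((Double) n).isNaN();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNaN(42));         // an Integer never enters the Double branch
        System.out.println(isNaN(Double.NaN)); // a Double can enter it
    }
}
```

Under this assumption, generating further `Integer` values cannot increase coverage, which is the kind of observation that lets the fuzzer refine the description to `Double`.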
8094

8195
## Statically and dynamically generated values
82-
Predefined, or _statically_ generated, values help to define the initial range of values, which could be used as method arguments. These values allow us to:
96+
Predefined, or _statically_ generated, values help to define the initial range of values, which could be used as method arguments.
8397

84-
* check if it is possible to call the given method with at least some set of values as arguments,
85-
* gather statistics on executing the program,
98+
These values allow us to:
99+
* check if it is possible to call the given method with at least some set of values as arguments;
100+
* gather statistics on executing the program;
86101
* refine the parameter description.
87102

88103
_Dynamic_ values are generated in two ways:
89-
90-
* internally — via mutating the existing values, successfully performed as method arguments (i.e. seeds);
91-
* externally — via obtaining feedback that can return not only the statistics on the execution (the paths explored,
104+
* internally, via mutating the existing values, successfully performed as method arguments (i.e. seeds);
105+
* externally, via obtaining feedback that can return not only the statistics on the execution (the paths explored,
92106
the time spent, etc.) but also the set of new values to be blended with the values already in use.
93107

94-
Dynamic values should have the higher priority for a sample, that's why they should be chosen either first or at least more likely than the statically generated ones. In general, the algorithm that guides the fuzzing process looks like this:
108+
Dynamic values should have a higher priority for a sample;
109+
that is why they should be chosen either first or at least more likely than the statically generated ones.
110+
In general, the algorithm that guides the fuzzing process looks like this:
95111

96112
```
97113
# dynamic values are stored with respect to their return priority
@@ -135,7 +151,6 @@ Sometimes it is reasonable to modify the source code so that it makes applying f
135151
## Generators
136152

137153
There are two types of generators:
138-
139154
* yielding values of primitive data types: integers, strings, booleans
140155
* yielding values of recursive data types: objects, lists
141156

@@ -146,39 +161,33 @@ three
146161
modifications for it using `put(key, value)`. For this purpose, you may request for applying the fuzzer to six
147162
parameters `(key, value, key, value, key, value)` and get the necessary modified values.
148163

149-
Primitive type generators allow for yielding
150-
164+
Primitive type generators allow for yielding:
151165
1. Signed integers of a given size (8, 16, 32, and 64 bits, usually)
152166
2. Unsigned integers of a given size
153167
3. Floating-point numbers with a given size of significand and exponent according to IEEE 754
154168
4. Booleans: _True_ and _False_
155169
5. Characters (in UTF-16 format)
156170
6. Strings (consisting of UTF-16 characters)
157171

158-
Fuzzer should be able to provide out-of-the-box support for these types — be able to create, modify, and process
159-
them. To work with multiple languages it is enough to specify the possible type size and to describe and create the
172+
The fuzzer should be able to provide out-of-the-box support for these types — be able to create, modify, and process
173+
them.
174+
To work with multiple languages, it is enough to specify the possible type size and to describe and create
160175
concrete objects based on the FP-generated values.
161176

162177
The recursive types include two categories:
163-
164178
* Collections (arrays and lists)
165179
* Objects
166180

167181
Collections may be nested and have _n_ dimensions (one, two, three, or more).
168182

169183
Collections may be:
170-
171184
* of a fixed size (e.g., arrays)
172185
* of a variable size (e.g., lists and dictionaries)
173186

174187
Objects may have:
175-
176188
1. Constructors with parameters
177-
178189
2. Modifiable inner fields
179-
180190
3. Modifiable global values (the static ones)
181-
182191
4. Calls for modifying methods
183192

184193
FP should be able to create and describe such objects in the form of a tree. The semantics of actual modifications is under the responsibility of a programming language.

docs/OverallArchitecture.md

Lines changed: 22 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -112,7 +112,7 @@ sequenceDiagram
112112
The plugin provides
113113
* a UI for the IntelliJ-based IDEs to use UnitTestBot directly from source code,
114114
* the linkage between IntelliJ Platform API and UnitTestBot API,
115-
* support for the most popular programming languages and frameworks for end users (the plugin and its optional dependencies are described in [plugin.xml](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/resources/META-INF/plugin.xml) and nearby, in the [`META-INF`](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij/src/main/resources/META-INF) folder.
115+
* support for the most popular programming languages and frameworks for end users (the plugin and its optional dependencies are described in [plugin.xml](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/resources/META-INF/plugin.xml) and nearby, in the [`META-INF`](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij/src/main/resources/META-INF) folder).
116116

117117
The main plugin module is [utbot-intellij](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-intellij), providing support for Java and Kotlin.
118118
Also, there is an auxiliary [utbot-ui-commons](https://github.com/UnitTestBot/UTBotJava/tree/main/utbot-ui-commons) module to support providers for other languages.
@@ -124,7 +124,7 @@ As for the UI, there are two entry points:
124124
The main plugin-specific features are:
125125
* A common action for generating tests right from the editor or a project tree — with a generation scope from a single method up to the whole source root. See [GenerateTestAction](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/ui/actions/GenerateTestsAction.kt) — the same for all supported languages.
126126
* Auto-installation of the user-chosen testing framework as a project library dependency (JUnit 4, JUnit 5, and TestNG are supported). See [UtIdeaProjectModelModifier](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/UtIdeaProjectModelModifier.kt) and the Maven-specific version: [UtMavenProjectModelModifier](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/UtMavenProjectModelModifier.kt).
127-
* Suggesting the location for a test source root and auto-generating the `utbot_tests` folder there, providing users with a sandbox in their codespace.
127+
* Suggesting the location for a test source root and auto-generating the `utbot_tests` folder there, providing users with a sandbox in their code space.
128128
* Optimizing generated code with IDE-provided intentions (experimental). See [IntentionHelper](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/generator/IntentionHelper.kt) for details.
129129
* An option for distributing generation time between symbolic execution and fuzzing explicitly.
130130
* Running generated tests while showing coverage with the IDE-provided measurement tools. See [RunConfigurationHelper](https://github.com/UnitTestBot/UTBotJava/blob/main/utbot-intellij/src/main/kotlin/org/utbot/intellij/plugin/util/RunConfigurationHelper.kt) for implementation.
@@ -241,7 +241,10 @@ The main instrumentation of UnitTestBot is [UtExecutionInstrumentation](https://
241241
### Code generator
242242

243243
Code generation and rendering are a part of the test generation process in UnitTestBot.
244-
UnitTestBot gets the synthetic representation of generated test cases from the fuzzer or the symbolic engine. This representation, or model, is implemented in the `UtExecution` class. The `codegen` module generates the real test code based on this `UtExecution` model and renders it in a human-readable form.
244+
UnitTestBot gets the synthetic representation of generated test cases from the fuzzer or the symbolic engine.
245+
This representation (or model) is implemented in the `UtExecution` class.
246+
The `codegen` module generates the real test code based on this `UtExecution` model
247+
and renders it in a human-readable form.
245248

246249
The `codegen` module
247250
- converts `UtExecution` test information into an Abstract Syntax Tree (AST) representation using `CodeGenerator`,
@@ -287,7 +290,7 @@ To minimize the number of executions in a group, we use a simple greedy algorith
287290
2. Add this execution to the final suite and mark new lines as covered.
288291
3. Repeat the first step and continue till there are executions containing uncovered lines.
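
The greedy steps above can be sketched as follows. This is a minimal illustration, not the code from the minimization package: it assumes that the first step (not shown in this hunk) picks the execution covering the most still-uncovered lines, and it models an execution simply as the set of line numbers it covers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the UnitTestBot implementation.
final class GreedyMinimizer {
    /**
     * Returns the indices of the selected executions.
     * coverage.get(i) is the set of line numbers covered by execution i.
     */
    static List<Integer> minimize(List<Set<Integer>> coverage) {
        Set<Integer> covered = new HashSet<>();
        List<Integer> suite = new ArrayList<>();
        while (true) {
            int best = -1;
            int bestGain = 0;
            // Assumed step 1: find the execution that adds the most uncovered lines.
            for (int i = 0; i < coverage.size(); i++) {
                int gain = 0;
                for (int line : coverage.get(i)) {
                    if (!covered.contains(line)) {
                        gain++;
                    }
                }
                if (gain > bestGain) {
                    best = i;
                    bestGain = gain;
                }
            }
            // Step 3: stop when no execution contains uncovered lines.
            if (best < 0) {
                return suite;
            }
            // Step 2: add the execution to the suite and mark its lines as covered.
            suite.add(best);
            covered.addAll(coverage.get(best));
        }
    }

    public static void main(String[] args) {
        List<Set<Integer>> coverage = List.of(
            Set.of(1, 2), Set.of(1, 2, 3, 4), Set.of(4, 5), Set.of(3));
        System.out.println(minimize(coverage));
    }
}
```

With the sample data in `main`, the second execution is chosen first because it covers four lines, then the third one because it adds line 5; the remaining executions add nothing and are dropped.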
289292

290-
The whole minimization procedure is located in the [org.utbopt.framework.minimization](utbot-framework/src/main/kotlin/org/utbot/framework/minimization) package inside the [utbot-framework](../utbot-framework) module.
293+
The whole minimization procedure is located in the [org.utbot.framework.minimization](../utbot-framework/src/main/kotlin/org/utbot/framework/minimization) package inside the [utbot-framework](../utbot-framework) module.
291294

292295
### Summarization module
293296

@@ -309,7 +312,7 @@ For detailed information, please refer to the Summarization architecture design
309312

310313
### SARIF report generator
311314

312-
SARIF (Static Analysis Results Interchange Format) is a JSONbased format for displaying static analysis results.
315+
SARIF (Static Analysis Results Interchange Format) is a JSON-based format for displaying static analysis results.
313316

314317
All the necessary information about the format and its usage can be found
315318
in the [official documentation](https://github.com/microsoft/sarif-tutorials/blob/main/README.md)
@@ -346,7 +349,8 @@ UnitTestBot consists of three processes (according to the execution order):
346349

347350
These processes are built on top of the [Reactive distributed communication framework (Rd)](https://github.com/JetBrains/rd) developed by JetBrains.
348351

349-
One of the main Rd concepts is _Lifetime_ — it helps to release shared resources upon the object's termination. You can find the Rd basic ideas and UnitTestBot implementation details in the [Multiprocess architecture](https://github.com/UnitTestBot/UTBotJava/blob/main/docs/RD%20for%20UnitTestBot.md) design doc.
352+
One of the main Rd concepts is a _Lifetime_ — it helps to release shared resources upon the object's termination.
353+
You can find the Rd basic ideas and UnitTestBot implementation details in the [Multiprocess architecture](https://github.com/UnitTestBot/UTBotJava/blob/main/docs/RD%20for%20UnitTestBot.md) design doc.
350354

351355
### Settings
352356

@@ -362,4 +366,15 @@ The end user has three places to change UnitTestBot behavior:
362366
3. Controls in the **Generate Tests with UnitTestBot window** dialog — for per-generation settings.
363367

364368
### Logging
365-
TODO
369+
370+
The UnitTestBot Java logging system is implemented across the IDE process, the Engine process, and the Instrumented process.
371+
372+
UnitTestBot Java logging relies on `log4j2` library.
373+
The custom Rd logging system is recommended as the default one for the Instrumented process.
374+
375+
In the [Logging](../docs/contributing/InterProcessLogging.md) document,
376+
you can find how to configure the logging system when UnitTestBot Java is used
377+
* as an IntelliJ IDEA plugin,
378+
* as Contest estimator or the Gradle/Maven plugins, via CLI or during the CI test runs.
379+
380+
Implementation details, log level and performance questions are also addressed [here](../docs/contributing/InterProcessLogging.md).

docs/RD for UnitTestBot.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -77,7 +77,7 @@ executing all the callbacks because some other thread executes them.
7777
Rd is a lightweight reactive one-to-one RPC protocol, which is cross-language as well as cross-platform. It can
7878
work on the same or different machines via the Internet.
7979

80-
These are some of Rd entities:
80+
These are some Rd entities:
8181
- `Protocol` encapsulates the logic of all Rd communications. All the entities should be bound to `Protocol` before
8282
being used. `Protocol` contains `IScheduler`, which executes a _runnable_ instance on a different thread.
8383
- `RdSignal` is an entity allowing one to **fire and forget**. You can add a callback for every received message
@@ -228,7 +228,7 @@ Sometimes the _Instrumented process_ may unexpectedly die due to concrete execut
228228
- **Important**: do not add [`Rdgen`](https://mvnrepository.com/artifact/com.jetbrains.rd/rd-gen) as
229229
an implementation dependency — it breaks some JAR files as it contains `kotlin-compiler-embeddable`.
230230
5. Logging & debugging:
231-
- [Interprocess logging](./InterProcessLogging.md)
231+
- [Interprocess logging](contributing/InterProcessLogging.md)
232232
- [Interprocess debugging](./contributing/InterProcessDebugging.md)
233233
6. Custom protocol marshaling types: do not spend time on it until `UtModels` get simpler, e.g. compatible with
234234
`kotlinx.serialization`.

docs/contributing/InterProcessDebugging.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -62,7 +62,7 @@ To debug the _Engine process_ and the _Instrumented process_, you need to enable
6262
"-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,quiet=y,address=12345"
6363
```
6464
See `org.utbot.intellij.plugin.process.EngineProcess.Companion.debugArgument` for switch implementation.
65-
4. For information about logs, refer to the [Interprocess logging](../InterProcessLogging.md) guide.
65+
4. For information about logs, refer to the [Interprocess logging](InterProcessLogging.md) guide.
6666

6767
### Run configurations for debugging the Engine process
6868
