Commit fc87c58

Added the Summarization module architecture (#1580)
* Added the Summarization module architecture
* Summarization doc reviewed

Co-authored-by: Olga Naumenko <64418523+olganaumenko@users.noreply.github.com>
1 parent 392ac4b commit fc87c58

2 files changed

+85
-16
lines changed

docs/Summarization module.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
# Summarization module
2+
3+
## Overview
4+
5+
UnitTestBot minimizes the number of tests so that they are necessary and sufficient, but sometimes there are still a lot of them. Tests may look very similar to each other, and it may be hard for a user to distinguish between them. To ease test case comprehension, UnitTestBot generates summaries, or human-readable test descriptions. Summaries also facilitate navigation: they structure the whole collection of generated tests by clustering them into groups.
6+
7+
The summarization module generates detailed meta-information:
8+
- test method names
9+
- testing framework annotations (including `@DisplayName`)
10+
- Javadoc comments for tests
11+
- cluster comments for groups of tests (_regions_)
12+
13+
Javadoc comments can be rendered in two styles: as plain text or in a special format enriched with the [custom Javadoc tags](https://github.com/UnitTestBot/UTBotJava/blob/main/docs/summaries/CustomJavadocTags.md).
14+
15+
If the summarization process fails due to an error or insufficient information, then the test method receives a unique name and no meta-information.
16+
17+
The whole summarization subsystem is located in the `utbot-summary` module.
18+
19+
## Implementation
20+
21+
At the last stage of the test generation process, the `UtMethodTestSet.summarize` method is called.
22+
As input, this method receives the set of `UtExecution` models with empty `testMethodName`, `displayName`, and `summary` fields. It fills these fields with the corresponding meta-information, groups the received `UtExecution` models into clusters and generates cluster names.
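
As a rough illustration of this contract (not the actual UnitTestBot code), the sketch below models an execution with three empty fields and fills them in place. When no description can be produced, it falls back to a unique name with no other meta-information, as the Overview describes. `Execution`, `summarize`, `to_identifier`, and the `describe` callback are all hypothetical names, and Python is used only for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Execution:
    """Hypothetical stand-in for a UtExecution model."""
    path: List[str]
    test_method_name: Optional[str] = None
    display_name: Optional[str] = None
    summary: Optional[str] = None


def to_identifier(text: str) -> str:
    """Turn free text into a CamelCase identifier fragment."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(word.capitalize() for word in words)


def summarize(method: str, executions: List[Execution],
              describe: Callable[[Execution], str]) -> None:
    """Fill the empty meta-information fields of each execution in place."""
    prefix = "test" + method[0].upper() + method[1:]
    for index, execution in enumerate(executions, start=1):
        try:
            description = describe(execution)
        except ValueError:
            # Fallback: a unique name and no other meta-information.
            execution.test_method_name = f"{prefix}{index}"
            continue
        execution.test_method_name = f"{prefix}_{to_identifier(description)}"
        execution.display_name = description
        execution.summary = f"Executes: {description}"
```

Here `describe` stands for whatever analysis produces a description; it is assumed to raise `ValueError` when the information is insufficient.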
23+
24+
Currently, there are three main `UtExecution` implementations:
25+
* `UtSymbolicExecution`,
26+
* `UtFailedExecution`,
27+
* `UtFuzzedExecution`.
28+
29+
To construct meta-information for the `UtFuzzedExecution` models, the summarization module uses method parameters with their values and types as well as the return value type. To generate summaries for each `UtSymbolicExecution`, it uses the symbolic code analysis results.
30+
31+
Let's describe this process in detail for `UtSymbolicExecution` and `UtFuzzedExecution`.
32+
33+
### Constructing meta-information for `UtSymbolicExecution`
34+
35+
1. **Producing _Jimple statements_.**
36+
For each method under test (MUT), the symbolic execution engine generates a `UtMethodTestSet` consisting of `UtExecution` models, i.e. a test suite consisting of unit tests. A unit test (or `UtExecution`) in this suite is a set of execution steps that traverses a particular path in the MUT. An execution `Step` contains a statement, the depth of the execution step, and an execution decision.
37+
* A statement (`stmt`) is a Jimple statement provided by the [Soot](https://github.com/soot-oss/soot) framework. Jimple is a simplified representation of a Java program based on three-address code. The symbolic engine accepts Java bytecode and transforms it into Jimple statements for the analytical traversal of execution paths.
38+
* The depth of the execution step (`depth`) is the depth of the statement in the call graph, where the MUT is the root.
39+
* An execution decision (`decision`) is a number indicating the execution direction inside the control flow graph. If there are two edges coming out of the executed statement in the control flow graph, the decision number shows which edge is chosen to be executed next.
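
To make the three fields concrete, here is a minimal sketch of a step and a path. It is an illustration only: statements are plain strings rather than Soot objects, and the conventions that the MUT has depth 0 and that `decision` is a zero-based edge index are assumptions, not the engine's actual encoding. The helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    stmt: str      # Jimple-like statement, simplified to a string here
    depth: int     # assumed: 0 = inside the MUT, 1 = inside a method it calls
    decision: int  # assumed: zero-based index of the CFG edge taken next


def statements_in_mut(path: List[Step]) -> List[str]:
    """Keep only the statements executed directly in the MUT."""
    return [step.stmt for step in path if step.depth == 0]


def render(path: List[Step]) -> str:
    """Print a path with one statement per line, indented by call depth."""
    return "\n".join("  " * step.depth + step.stmt for step in path)


# A hypothetical path through `abs(x)` that delegates to `negate(x)`.
path = [
    Step("if x >= 0 goto label1", depth=0, decision=0),
    Step("$r = negate(x)", depth=0, decision=0),
    Step("return -x", depth=1, decision=0),
    Step("return $r", depth=0, decision=0),
]
```

Calling `render(path)` shows the statement executed inside `negate` indented one level deeper than the statements of the MUT.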
40+
41+
2. **_Tagging_.**
42+
For each pair of `UtMethodTestSet` and its source code file, the summarization module identifies unique execution steps, recursions, iteration cycles, skipped iterations, etc. These code constructs are marked with tags or meta-tags, which represent the execution paths in a structural view. The summarization module uses these tags directly to create meta-information, or summaries.
43+
44+
Currently, the summarization module is able to assign the following tags:
45+
- Uniqueness of a statement:
46+
- _Unique_: no other execution path in the cluster contains this step, so only one execution triggers this statement in its cluster.
47+
- _Common_: all execution paths in the cluster contain this step.
48+
- _Partly Common_: only some executions in a cluster contain this step.
49+
- The decision in the CFG (branching): _Right_, _Left_, _Return_
50+
- The number of statement executions in a given test
51+
- Dealing with loops: _starting/ending an iteration_, _invoking the recursion_, etc.
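
The uniqueness tags and the per-test execution counts can be sketched as follows. This is a simplified illustration that treats each path as a list of statement strings; `tag_uniqueness` and `execution_counts` are hypothetical names, and how a single-path cluster is tagged is an assumption.

```python
from collections import Counter
from typing import Dict, List


def tag_uniqueness(cluster: List[List[str]]) -> Dict[str, str]:
    """Tag each statement by how many paths of the cluster contain it."""
    paths_containing: Counter = Counter()
    for path in cluster:
        # Count a statement once per path, however often it is executed.
        paths_containing.update(set(path))

    tags = {}
    for stmt, count in paths_containing.items():
        if count == len(cluster):
            # Assumption: in a single-path cluster everything is "Common".
            tags[stmt] = "Common"
        elif count == 1:
            tags[stmt] = "Unique"
        else:
            tags[stmt] = "Partly Common"
    return tags


def execution_counts(path: List[str]) -> Counter:
    """Number of times each statement is executed within one test."""
    return Counter(path)
```

For a cluster of three paths, a statement present in all three is tagged "Common", one present in two is "Partly Common", and one present in a single path is "Unique".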
52+
53+
We use our own implementation of the [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) clustering algorithm with a non-Euclidean distance measure based on the Minimum Edit Distance to identify _unique_, _common_ and _partly common_ execution steps. First, we manually divide execution paths into groups:
54+
- successfully executed paths (only this group is clustered into different regions with DBSCAN)
55+
- paths with expected exceptions
56+
- paths with unexpected exceptions
57+
- other groups with errors and exceptions based on the given `UtResult`
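
The following sketch shows how DBSCAN can be combined with an edit distance over statement sequences. It is a textbook version written for illustration, not the module's own implementation: the exact distance measure, the `eps` and `min_pts` values, and all function names are assumptions. In the scheme above, only the successfully executed paths would be passed to it.

```python
from typing import List, Optional, Sequence

NOISE = -1


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two statement sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]


def dbscan(paths: List[Sequence[str]], eps: int, min_pts: int) -> List[int]:
    """Return a cluster label per path; NOISE marks unclustered paths."""
    labels: List[Optional[int]] = [None] * len(paths)

    def neighbours(i: int) -> List[int]:
        # A path counts as its own neighbour, as in standard DBSCAN.
        return [j for j in range(len(paths))
                if edit_distance(paths[i], paths[j]) <= eps]

    cluster = -1
    for i in range(len(paths)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return [label if label is not None else NOISE for label in labels]
```

Edit distance suits this task because two paths that differ in one branch share most of their statements and therefore end up close to each other.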
58+
59+
3. **Building _sentences_.**
60+
_Sentences_ are the blocks for the resulting summaries.
61+
To build a _sentence_, the summarization module:
62+
- parses the source file (containing the MUT) using [JavaParser](https://javaparser.org/) to get AST representations;
63+
- maps the AST representations to Jimple statements (so each statement is mapped to an AST node);
64+
- builds the _sentence_ blocks (to provide custom Javadoc tags or plain-text mode);
65+
- builds the _final sentence_ (valid for plain-text mode only);
66+
- generates the `@DisplayName` annotation and test method names using the following rule: find the last _unique_ statement in each path (preferably a condition statement) that has been executed exactly once (whether satisfied or unsatisfied); the AST node of this statement is then used for naming the execution;
67+
- builds the cluster names based on the _common_ execution paths.
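
The naming rule from the list above can be sketched as a small selection function. This is an illustration under simplifying assumptions: statements are source-level strings, the uniqueness tags are given as a dictionary, and a condition is recognized by a hypothetical `looks_like_condition` check rather than by inspecting AST nodes.

```python
from collections import Counter
from typing import Dict, List, Optional


def looks_like_condition(stmt: str) -> bool:
    """Hypothetical check; a real one would inspect the AST node type."""
    return stmt.startswith(("if ", "while ", "for "))


def pick_naming_statement(path: List[str],
                          tags: Dict[str, str]) -> Optional[str]:
    """Pick the statement used to name the execution, if any."""
    counts = Counter(path)
    # Candidates: unique statements executed exactly once, in path order.
    candidates = [stmt for stmt in path
                  if tags.get(stmt) == "Unique" and counts[stmt] == 1]
    if not candidates:
        return None
    # Prefer condition statements; otherwise take any candidate.
    conditions = [stmt for stmt in candidates if looks_like_condition(stmt)]
    return (conditions or candidates)[-1]
```

Returning `None` corresponds to the case where no suitable statement exists and the name has to be produced some other way.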
68+
69+
### Constructing meta-information for `UtFuzzedExecution`
70+
71+
For `UtFuzzedExecution`, meta-information is also available as test method names, `@DisplayName` annotations, Javadoc comments, and cluster comments.
72+
73+
The difference is that clustering tests for `UtFuzzedExecution` is based on `UtResult`. No subgroups are generated for the successfully completed tests.
74+
75+
The algorithm for generating meta-information is described in the `ModelBasedNameSuggester` class, which is the registration point for the `SingleModelNameSuggester` interface. This interface is implemented in `PrimitiveModelNameSuggester` and `ArrayModelNameSuggester`.
76+
77+
Depending on the received `UtExecutionResult` type, `ModelBasedNameSuggester` produces the basic part of the method name or the `@DisplayName` annotation. `UtFuzzedExecution` provides `FuzzedMethodDescription` and `FuzzedValue`, which supplement the basic part of the test name with information about the types, names and values of the MUT parameters.
78+
79+
_Note:_ test method names and `@DisplayName` annotations are generated only if the MUT has no more than three parameters.
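
A minimal sketch of the parameter limit from the note above is shown below. The output format and the names `suggest_display_name` and `MAX_PARAMETERS` are hypothetical; the actual text produced by `ModelBasedNameSuggester` may look different.

```python
from typing import List, Optional, Tuple

MAX_PARAMETERS = 3  # limit stated in the note above


def suggest_display_name(method: str,
                         params: List[Tuple[str, object]],
                         outcome: str) -> Optional[str]:
    """Build a display name from parameter names and values.

    Returns None when the method has too many parameters, in which case
    no name is suggested.
    """
    if len(params) > MAX_PARAMETERS:
        return None
    args = ", ".join(f"{name} = {value!r}" for name, value in params)
    return f"{method}({args}) -> {outcome}"
```

For example, `suggest_display_name("max", [("a", 1), ("b", -3)], "return 1")` yields `max(a = 1, b = -3) -> return 1`, while a call with four parameters yields `None`.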
80+

docs/summaries/CustomJavadocTags.md

Lines changed: 5 additions & 16 deletions
@@ -54,17 +54,6 @@ Added a flag `USE_CUSTOM_TAGS` to settings.
5454
After plugin's removal, IDE doesn't recognize our custom tags. It doesn't lead to errors, but highlights tags with
5555
yellow color.
5656

57-
## Feedback
58-
59-
We held a Feature Demo meeting to gather some feedback (16.08.2022).
60-
61-
In general, teammates said that the idea and the way it looks like is good and suggested several things to do:
62-
63-
- Investigate if it's possible to add a link to menu settings or file with stacktrace in comments.
64-
- There is no need in classUnderTest tag because methodUnderTest contains this information.
65-
- Add a tag describing the test intention: check boundary cases, calls of some interesting methods.
66-
- Prepare survey and ask Artem Aliev's team to try the feature and share feedback.
67-
6857
## Test scenarios
6958

7059
Currently, the feature works only for Symbolic execution engine, so make sure the slider is on the Symbolic execution
@@ -86,11 +75,11 @@ side.
8675

8776
### Content
8877

89-
1. First, generate comment with one style, then generate with another one and compare its content. If it differs,
90-
please, provide code snippet and both generated comments. It could differ because currently the
91-
style with custom Javadoc tags is a bit simplified.
78+
First, generate a comment in one style, then in the other, and compare their content. If they differ,
79+
please provide the code snippet and both generated comments. They may differ because the
80+
style with custom Javadoc tags is currently somewhat simplified.
9281

9382
### View
9483

95-
1. Check that the comments are rendered well. To do it, click on the toggle near the comment (see post about Rendered
96-
view feature in IntelliJ IDEA).
84+
Check that the comments are rendered well. To do this, click the toggle near the comment (see the post about the Rendered
85+
view feature in IntelliJ IDEA).
