In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design.
CodeFuse-Query reimagines code analysis as a data computation task, supporting scans of over 10 billion lines of code daily and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types designed specifically for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, using a two-tiered schema, COREF, to convert source code into data facts. Through Gödel, a purpose-built query language, CodeFuse-Query lets users formulate complex tasks as logical expressions, harnessing Datalog's declarative power.
Overall, the CodeFuse-Query platform is divided into three main parts: code data model, code query DSL, and platform productization services.
### Code Data Model: COREF
We have defined COREF, a standardized code data model. Language-specific extractors convert all source code into this model.
Note: Since the computation difficulty of each type of information varies, not all languages' COREF information includes all the above. The basic information mainly consists of AST, ASG, Call Graph, Class Hierarchy, and Documentation, while other information (CFG and PDG) is still under construction and will be gradually supported.
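
To make the idea of converting source code into data facts more concrete, here is a minimal sketch that uses Python's standard `ast` module to flatten a snippet into relational tuples. The helper `extract_facts` and the two fact tables (`functions`, `calls`) are hypothetical names chosen for illustration; they are not the COREF schema, which is far richer.

```python
import ast


def extract_facts(source: str):
    """Flatten Python source into two illustrative fact tables.

    Returns (functions, calls):
      functions: set of (name, line) for every function definition
      calls:     set of (caller, callee) for direct calls by simple name
    """
    tree = ast.parse(source)
    functions, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.add((node.name, node.lineno))
            # Record calls that appear anywhere inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.add((node.name, inner.func.id))
    return functions, calls


# Hypothetical input program used only for this sketch.
SAMPLE = """
def helper():
    return 1

def main():
    return helper() + helper()
"""

functions, calls = extract_facts(SAMPLE)
```

Once code is in this tabular form, questions such as "which functions call `helper`?" become lookups and joins over the fact tables rather than tree traversals.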
### Code Query DSL
CodeFuse-Query queries the generated COREF code data with a custom DSL called **Gödel** to meet code analysis needs.
Gödel is a logical reasoning language based on Datalog, which derives new facts from existing "facts" and "rules". Gödel is also a declarative language: compared to imperative programming, it focuses on describing "what is needed" and leaves the implementation to the computation engine.
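
The "facts plus rules" model can be sketched in a few lines of Python. This is an illustrative naive fixpoint evaluation, not Gödel syntax and not the engine CodeFuse-Query uses; the function `transitive_calls` and the sample call facts are assumptions made up for the example.

```python
def transitive_calls(calls):
    """Derive reachability facts from direct-call facts.

    The two rules, in generic Datalog notation:
      reach(X, Y) :- call(X, Y).
      reach(X, Z) :- reach(X, Y), call(Y, Z).
    """
    reach = set(calls)  # first rule: every direct call is reachable
    while True:
        # Second rule: extend each known path by one more call edge.
        derived = {(x, z) for (x, y) in reach for (y2, z) in calls if y == y2}
        new = derived - reach
        if not new:
            return reach  # fixpoint: no rule produces a new fact
        reach |= new


# Hypothetical call facts.
calls = {("main", "parse"), ("parse", "lex"), ("lex", "read_char")}
reach = transitive_calls(calls)
```

The set of facts only ever grows and is bounded by the finite number of possible pairs, so the loop always stops, even when the call graph contains cycles.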
Since the code has been transformed into relational data (COREF data is stored as relational tables), one might wonder why users should learn a new DSL instead of using SQL directly or an SDK. The reason is that Datalog evaluation is monotonic and guaranteed to terminate: Datalog gives up some expressive power in exchange for these properties, and Gödel inherits this trade-off.
- Compared to SDKs, Gödel's main advantage is ease of learning and use; its declarative nature means users do not need to focus on intermediate computations but can describe their needs simply, like with SQL.
- Compared to SQL, Gödel's advantages are stronger descriptive ability and faster computation, for example when describing recursive algorithms and multi-table join queries, which are awkward to express in SQL.
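
For comparison, the sketch below shows roughly what a recursive query looks like in plain SQL, using Python's built-in `sqlite3` module. The `calls` table and its rows are hypothetical and do not reflect how COREF data is actually laid out; the point is only to show how much scaffolding a recursive common table expression needs for a simple "everything reachable from `main`" question.

```python
import sqlite3

# In-memory database holding hypothetical direct-call facts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("main", "parse"), ("parse", "lex"), ("lex", "read_char")],
)

# Recursive CTE: start from the direct callees of the given function,
# then repeatedly join back onto `calls` to follow further call edges.
# UNION (rather than UNION ALL) removes duplicates, so cycles terminate.
REACHABLE_FROM = """
WITH RECURSIVE reach(callee) AS (
    SELECT callee FROM calls WHERE caller = ?
    UNION
    SELECT calls.callee FROM calls JOIN reach ON calls.caller = reach.callee
)
SELECT callee FROM reach ORDER BY callee
"""

reachable = [row[0] for row in conn.execute(REACHABLE_FROM, ("main",))]
```

In generic Datalog notation the same query is two short rules, `reach(X, Y) :- call(X, Y).` and `reach(X, Z) :- reach(X, Y), call(Y, Z).`, which is the kind of brevity the comparison above refers to.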
### Platformization, Productization
CodeFuse-Query includes the **Sparrow CLI** and the online service **Query Center**. Sparrow CLI contains all components and dependencies, such as extractors, data model, compiler, etc., allowing users to generate code data and conduct queries locally (for Sparrow CLI usage, please see Section 3: Installation, Configuration, and Running). If users require online queries, they can experiment using the Query Center.
## Supported Programming Languages for Analysis
As of now, CodeFuse-Query supports data analysis for 11 programming languages. Support for 5 of them (Java, JavaScript, TypeScript, XML, Go) is mature, while the remaining 6 (Objective-C, C++, Python3, Swift, SQL, Properties) are in beta and still being improved. The specific support status is shown in the table below:
| Language | Status | COREF Model Node Count |
| --- | --- | --- |
| Java | Mature | 162 |
| XML | Mature | 12 |
| TS/JS | Mature | 392 |
| Go | Mature | 40 |
| OC/C++ | beta | 53/397 |
| Python3 | beta | 93 |
| Swift | beta | 248 |
| SQL | beta | 750 |
| Properties | beta | 9 |

Note: The maturity level of each language is judged by the types of information its COREF model contains and by how widely it is used in practice. Except for OC/C++, all languages support complete AST and Documentation information; COREF for Java, for example, also supports ASG, Call Graph, Class Hierarchy, and some CFG information.

## Use Cases

### Querying Code Features

A developer wants to know which String variables are used in Repo A, so they write a Gödel query and submit it to CodeFuse-Query, which returns the result.

## Quick Start
[Installation, Configuration, and Running](./doc/3_install_and_run.md)
## Documentation
- [Abstract](./doc/1_abstract.md)
- [Introduction](./doc/2_introduction.md)
- [User Case](./doc/user_case.en.md)
- [Installation, Configuration, and Running](./doc/3_install_and_run.md)
- `cli`: The entry point for the command-line tool, providing a unified command-line interface and calling other modules to complete specific functions
- `language`: Core data and data modeling (lib) for various languages. Regarding the degree of openness, please refer to the section "Some Notes on the Scope of Open Source"
- `doc`: Reference documents
- `examples`: Gödel query language examples
- `tutorial`: CodeFuse-Query Development Container Usage Tutorial
As of now, it is **not possible** to build an executable program from the source code, because not all modules have been open-sourced in this release; the missing modules will be released over the next year. Nevertheless, to ensure a complete experience, we have released **complete installation packages** for download; please see the Release page.
Regarding the openness of languages, you can refer to the table below:
|Language|Data Modeling Open Source | Data Core Open Source | Maturity|
[](https://star-history.com/#codefuse-ai/CodeFuse-Query&Date)