Skip to content

[doc] Add jekyll build and deployment CI #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Dec 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 72 additions & 0 deletions .github/workflows/pages.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Deploy Jekyll site to Pages

on:
push:
branches: ["main", "doc-preview"]
paths:
- "doc/**"

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
contents: read
pages: write
id-token: write

# Allow one concurrent deployment
concurrency:
group: "pages"
cancel-in-progress: true

jobs:
# Build job
build:
runs-on: ubuntu-latest
defaults:
run:
working-directory: doc
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.1' # Not needed with a .ruby-version file
bundler-cache: true # runs 'bundle install' and caches installed gems automatically
cache-version: 0 # Increment this number if you need to re-download cached gems
working-directory: '${{ github.workspace }}/doc'
- name: Generate COREF API Documents
run: python3 tools/build.py
- name: Setup Pages
id: pages
uses: actions/configure-pages@v3
- name: Build with Jekyll
# Outputs to the './_site' directory by default
run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
env:
JEKYLL_ENV: production
- name: Upload artifact
# Automatically uploads an artifact from the './_site' directory by default
uses: actions/upload-pages-artifact@v1
with:
path: "doc/_site/"

# Deployment job
deploy:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
needs: build
steps:
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v2
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ CodeFuse-Query为CodeFuse代码大模型提供了以下数据清洗能力:
- [安装、配置、运行](./doc/3_install_and_run.md)
- [Gödel查询语言介绍](./doc/4_godelscript_language.md)
- [VSCode开发插件](./doc/5_toolchain.md)
- [COREF API](https://codefuse-ai.github.io/CodeFuse-Query/godel-api/coref_library_reference.html)

## 教程 (tutorial)
- [在线教程](./tutorial/README.md)
Expand Down
2 changes: 2 additions & 0 deletions doc/1_abstract.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
随着大规模软件开发的普及,对可扩展且易于适应的静态代码分析技术的需求正在加大。传统的静态分析工具,如 Clang Static Analyzer (CSA) 或 PMD,在检查编程规则或样式问题方面已经展现出了良好的效果。然而,这些工具通常是为了满足特定的目标而设计的,往往无法满足现代软件开发环境中多变和多元化的需求。这些需求可以涉及服务质量 (QoS)、各种编程语言、不同的算法需求,以及各种性能需求。例如,安全团队可能需要复杂的算法,如上下文敏感的污点分析,来审查较小的代码库,而项目经理可能需要一种相对较轻的算法,例如计算圈复杂度的算法,以在较大的代码库上测量开发人员的生产力。

这些多元化的需求,加上大型组织中常见的计算资源限制,构成了一项重大的挑战。由于传统工具采用的是问题特定的计算方式,往往无法在这种环境中实现扩展。因此,我们推出了 CodeQuery,这是一个专为大规模静态分析设计的集中式数据平台。
在 CodeQuery 的实现中,我们把源代码和分析结果看作数据,把执行过程看作大数据处理,这与传统的以工具为中心的方法有着显著的不同。我们利用大型组织中的常见系统,如数据仓库、MaxCompute 和 Hive 等数据计算设施、OSS 对象存储和 Kubernetes 等灵活计算资源,让 CodeQuery 能够无缝地融入这些系统中。这种方法使 CodeQuery 高度可维护和可扩展,能够支持多元化的需求,并有效应对不断变化的需求。此外,CodeQuery 的开放架构鼓励各种内部系统之间的互操作性,实现了无缝的交互和数据交换。这种集成和交互能力不仅提高了组织内部的自动化程度,也提高了效率,降低了手动错误的可能性。通过打破信息孤岛,推动更互联、更自动化的环境,CodeQuery 显著提高了软件开发过程的整体生产力和效率。
此外,CodeQuery 的以数据为中心的方法在处理静态源代码分析的领域特定挑战时具有独特的优势。例如,源代码通常是一个高度结构化和互联的数据集,与其他代码和配置文件有强烈的信息和连接。将代码视为数据,CodeQuery 可以巧妙地处理这些问题,这使得它特别适合在大型组织中使用,其中代码库持续但逐步地进行演变,大部分代码在每天进行微小的改动同时保持稳定。 CodeQuery 还支持如基于代码数据的商业智能 (BI) 这类用例,能生成报告和仪表板,协助监控和决策过程。此外,CodeQuery 在分析大型语言模型 (LLM) 的训练数据方面发挥了重要作用,提供了增强这些模型整体效果的深入见解。

在当前的静态分析领域,CodeQuery 带来了一种新的范式。它不仅满足了大规模、复杂的代码库分析需求,还能适应不断变化和多元化的静态分析场景。CodeQuery 的以数据为中心的方法,使得其在处理大数据环境中的代码分析问题时具有独特优势。CodeQuery 的设计,旨在解决大规模软件开发环境中的静态分析问题。它能够将源代码和分析结果视作数据,使得其可以灵活地融入大型组织的各种系统中。这种方法不仅可以有效地处理大规模的代码库,还可以应对各种复杂的分析需求,从而使得静态分析工作变得更加高效和准确。

CodeQuery 的特点和优势可以概括为以下几点:
Expand Down
4 changes: 3 additions & 1 deletion doc/5_toolchain.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,5 +79,7 @@ code --install-extension [扩展vsix文件路径]
- `godelScript.libraryDirectoryPath`
- 用于指定 GödelScript 的库文件夹路径,默认为空。需要时请替换为 GödelScript 库文件夹绝对路径。
- 如果已经下载 Sparrow CLI ,则库文件夹路径为 `[sparrow cli root]/lib-1.0`。
# 智能助手

# 智能助手

待开放,尽情期待!
7 changes: 7 additions & 0 deletions doc/Gemfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
source 'https://rubygems.org'

gem "jekyll", "~> 4.3.2" # installed by `gem jekyll`
# gem "webrick" # required when using Ruby >= 3 and Jekyll <= 4.2.2

gem "just-the-docs", "0.7.0" # pinned to the current release
# gem "just-the-docs" # always download the latest release
86 changes: 86 additions & 0 deletions doc/Gemfile.lock
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
GEM
remote: https://rubygems.org/
specs:
addressable (2.8.5)
public_suffix (>= 2.0.2, < 6.0)
colorator (1.1.0)
concurrent-ruby (1.2.2)
em-websocket (0.5.3)
eventmachine (>= 0.12.9)
http_parser.rb (~> 0)
eventmachine (1.2.7)
ffi (1.15.5)
forwardable-extended (2.6.0)
google-protobuf (3.24.3-arm64-darwin)
google-protobuf (3.24.3-x86_64-linux)
http_parser.rb (0.8.0)
i18n (1.14.1)
concurrent-ruby (~> 1.0)
jekyll (4.3.2)
addressable (~> 2.4)
colorator (~> 1.0)
em-websocket (~> 0.5)
i18n (~> 1.0)
jekyll-sass-converter (>= 2.0, < 4.0)
jekyll-watch (~> 2.0)
kramdown (~> 2.3, >= 2.3.1)
kramdown-parser-gfm (~> 1.0)
liquid (~> 4.0)
mercenary (>= 0.3.6, < 0.5)
pathutil (~> 0.9)
rouge (>= 3.0, < 5.0)
safe_yaml (~> 1.0)
terminal-table (>= 1.8, < 4.0)
webrick (~> 1.7)
jekyll-include-cache (0.2.1)
jekyll (>= 3.7, < 5.0)
jekyll-sass-converter (3.0.0)
sass-embedded (~> 1.54)
jekyll-seo-tag (2.8.0)
jekyll (>= 3.8, < 5.0)
jekyll-watch (2.2.1)
listen (~> 3.0)
just-the-docs (0.7.0)
jekyll (>= 3.8.5)
jekyll-include-cache
jekyll-seo-tag (>= 2.0)
rake (>= 12.3.1)
kramdown (2.4.0)
rexml
kramdown-parser-gfm (1.1.0)
kramdown (~> 2.0)
liquid (4.0.4)
listen (3.8.0)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
mercenary (0.4.0)
pathutil (0.16.2)
forwardable-extended (~> 2.6)
public_suffix (5.0.3)
rake (13.0.6)
rb-fsevent (0.11.2)
rb-inotify (0.10.1)
ffi (~> 1.0)
rexml (3.2.6)
rouge (4.1.3)
safe_yaml (1.0.5)
sass-embedded (1.67.0-arm64-darwin)
google-protobuf (~> 3.23)
sass-embedded (1.67.0-x86_64-linux-gnu)
google-protobuf (~> 3.23)
terminal-table (3.0.2)
unicode-display_width (>= 1.1.1, < 3)
unicode-display_width (2.4.2)
webrick (1.8.1)

PLATFORMS
arm64-darwin-21
arm64-darwin-23
x86_64-linux

DEPENDENCIES
jekyll (~> 4.3.2)
just-the-docs (= 0.7.0)

BUNDLED WITH
2.3.26
5 changes: 5 additions & 0 deletions doc/_config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
title: CodeFuse-Query Documentation
description: A starter template for a Jeykll site using the Just the Docs theme!
theme: just-the-docs

url: https://codefuse-ai.github.io/CodeFuse-Query
8 changes: 8 additions & 0 deletions doc/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: Home
layout: default
nav_order: 1
---
## 文档 (Documentation)

请见[仓库首页](https://github.com/codefuse-ai/CodeFuse-Query)
21 changes: 21 additions & 0 deletions doc/tools/build.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
import subprocess

print("Download Sparrow CLI")
subprocess.run([
"curl",
"-L",
"https://github.com/codefuse-ai/CodeFuse-Query/releases/download/2.0.2/sparrow-cli-2.0.2.linux.tar.gz",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

留个TODO,这里可以改成latest的链接,后面可以自动根据最新发布更新文档

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

要不加个issue啥的?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

"-o",
"sparrow-cli.tar.gz"
])
subprocess.run([
"tar",
"-xvzf",
"sparrow-cli.tar.gz"
])
print("Copy ../assets into ./doc/assets")
subprocess.run(["cp", "-r", "../assets", "./"])
print("Concat coref library from ../language into ./.coref-api-build")
subprocess.run(["python3", "tools/generate_coref_library.py", "../language"])
print("Generate markdown documents into ./godel-api")
subprocess.run(["python3", "tools/generate_markdown.py", "./sparrow-cli/godel-script/usr/bin/godel"])
28 changes: 28 additions & 0 deletions doc/tools/generate_coref_library.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
import sys
import os

if len(sys.argv) != 2:
print("Usage: python this_file.py language_library_directory")
exit(-1)

input_language_dir = sys.argv[1]

print("Generate library from", input_language_dir)
if not os.path.exists("./.coref-api-build"):
os.mkdir("./.coref-api-build")

mapper = {
"coref.go.gdl": input_language_dir + "/go/lib",
"coref.java.gdl": input_language_dir + "/java/lib",
"coref.javascript.gdl": input_language_dir + "/javascript/lib",
"coref.python.gdl": input_language_dir + "/python/lib",
"coref.xml.gdl": input_language_dir + "/xml/lib",
}

for key in mapper.keys():
output_file = "./.coref-api-build/" + key
result = ""
for root, ignored, files in os.walk(mapper[key]):
for file in files:
result += open(root + "/" + file, "r").read() + "\n"
open(output_file, "w").write(result)
Loading