move design docs to a folder #821


Merged
4 changes: 2 additions & 2 deletions README.md
@@ -52,13 +52,13 @@ Done predicting. Predict table : iris.predict

- [Installation](doc/installation.md)
- [Running a Demo](doc/demo.md)
- [Extended SQL Syntax](doc/syntax.md)
- [User Guide](doc/user_guide.md)

## Contributions

- [Build from source](doc/build.md)
- [The walkthrough of the source code](doc/walkthrough.md)
- [The choice of parser generator](doc/sql_parser.md)
- [The choice of parser generator](doc/design/design_sql_parser.md)

## Roadmap

@@ -1,4 +1,4 @@
# Proof of Concept: ALPS Submitter
# _Design:_ ALPS Submitter

ALPS (Ant Learning and Prediction Suite) provides a common algorithm-driven framework at Ant Financial, focusing on giving users an efficient, easy-to-use machine learning programming framework together with machine learning algorithm solutions for financial scenarios.

2 changes: 1 addition & 1 deletion doc/analyzer_design.md → doc/design/design_analyzer.md
@@ -1,4 +1,4 @@
# Design: Analyze the Machine Learning Mode in SQLFlow
# _Design:_ Analyze the Machine Learning Mode in SQLFlow

## Concept

File renamed without changes.
4 changes: 2 additions & 2 deletions doc/auth_design.md → doc/design/design_auth.md
@@ -1,4 +1,4 @@
# Design: SQLFlow Authentication and Authorization
# _Design:_ SQLFlow Authentication and Authorization

## Concepts

@@ -72,7 +72,7 @@ that case, we store session data into a reliable storage service like
The figure below demonstrates the overall workflow for authorization and
authentication.

<img src="figures/sqlflow_auth.png">
<img src="../figures/sqlflow_auth.png">

Users can access the JupyterHub web page using their own username and password.
The user's identity will be verified by the [SSO](https://en.wikipedia.org/wiki/Single_sign-on)
4 changes: 2 additions & 2 deletions doc/cluster_design.md → doc/design/design_clustermodel.md
@@ -1,4 +1,4 @@
# Design: Clustering in SQLflow to analyze patterns in data
# _Design:_ Clustering in SQLflow to analyze patterns in data

## ClusterModel introduction

@@ -9,7 +9,7 @@ This design document introduces how to support the `Cluster Model` in SQLFlow.

The figure below demonstrates the overall workflow for cluster model training, which includes both the pre_train autoencoder model and the clustering model (reference: https://www.dlology.com/blog/how-to-do-unsupervised-clustering-with-keras/).

<div align=center> <img width="460" height="550" src="figures/cluster_model_train_overview.png"> </div>
<div align=center> <img width="460" height="550" src="../figures/cluster_model_train_overview.png"> </div>

1. The first part is used to load a pre_trained model. We use the output of the trained encoder layer as the input to the clustering model.
2. Then, the clustering model starts training with randomly initialized weights, and generates clusters after multiple iterations.
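
The two stages above can be sketched with plain numpy, assuming a stand-in encoder in place of the Keras autoencoder the design actually uses; every name below is illustrative, not SQLFlow's API:

```python
import numpy as np

# Minimal numpy sketch of the two-stage workflow: a stand-in "encoder"
# plays the role of the pre_trained autoencoder's encoder layer, and a
# k-means-style loop refines randomly initialized cluster centers.
rng = np.random.default_rng(0)

def encoder(x):
    # stand-in for the output of the trained encoder layer
    return x @ np.array([[1.0, 0.5], [0.5, 1.0]])

# two well-separated blobs of raw input rows
x = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
z = encoder(x)                  # step 1: encode inputs with the pre-trained model

k = 2
centers = z[rng.choice(len(z), k, replace=False)]  # step 2: random initialization
for _ in range(10):             # iterate assignment / update steps
    labels = np.argmin(((z[:, None, :] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([z[labels == i].mean(0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
```

After the loop, `labels` holds a cluster id per input row, mirroring how the clustering model generates clusters from encoder output.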
@@ -1,4 +1,4 @@
# Design Doc: Define Models for SQLFlow
# _Design:_ Define Models for SQLFlow

SQLFlow enables SQL programs to call deep learning models defined in Python. This document is about how to define models for SQLFlow.

@@ -1,4 +1,4 @@
# Compatibility with Various SQL Engines
# _Design:_ Compatibility with Various SQL Engines

SQLFlow interacts with SQL engines like MySQL and Hive. Since different SQL engines use variants of SQL syntax, it is important for SQLFlow to have an abstraction layer that hides such differences.

@@ -8,7 +8,7 @@ SQLFlow calls Go's [standard database API](https://golang.org/pkg/database/sql/)

### Data Retrieval

The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TRAIN or PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TRAIN or PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](/doc/design/design_syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TRAIN or PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TRAIN or PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.

- Hive supports `FULL OUTER JOIN` directly.
- MySQL doesn't have `FULL OUTER JOIN`. However, a user can emulate `FULL OUTER JOIN` using `LEFT JOIN`, `UNION` and `RIGHT JOIN`.
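
The emulation mentioned above can be sketched against SQLite, whose older versions also lack `FULL OUTER JOIN` and `RIGHT JOIN`; the second branch therefore swaps the operands of a `LEFT JOIN`, which is equivalent. Table and column names are illustrative, not from SQLFlow:

```python
import sqlite3

# Toy tables; names (a, b, k, va, vb) are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (k INTEGER, va TEXT);
CREATE TABLE b (k INTEGER, vb TEXT);
INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# FULL OUTER JOIN emulated as a LEFT JOIN plus, via UNION, the
# right-side rows that matched nothing on the left.
rows = conn.execute("""
    SELECT a.k, va, vb FROM a LEFT JOIN b ON a.k = b.k
    UNION
    SELECT b.k, va, vb FROM b LEFT JOIN a ON a.k = b.k
    WHERE a.k IS NULL
""").fetchall()

print(sorted(rows))  # [(1, 'a1', None), (2, 'a2', 'b2'), (3, None, 'b3')]
```

The unmatched rows on either side survive with `NULL` in the missing columns, which is exactly the behavior a native `FULL OUTER JOIN` would give.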
File renamed without changes.
@@ -1,4 +1,4 @@
# Design: Feature Derivation
# _Design:_ Feature Derivation

This file discusses the details and implementations of "Feature Derivation".
Please refer to [this](https://medium.com/@SQLFlow/feature-derivation-the-conversion-from-sql-data-to-tensors-833519db1467) blog to
@@ -6,10 +6,10 @@ As SQLFlow is supporting more and more machine learning toolkits, the correspond

The core `sql` package should include the following functionalities:
1. The entry point of running extended SQL statements.
1. The [parsing](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/sql_parser.md) of extended SQL statements.
1. The [parsing](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/design/design_sql_parser.md) of extended SQL statements.
1. The verification of extended SQL statements, including verifying the syntax, the existence of the selected fields.
1. The [feature derivation](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/feature_derivation.md), including name, type, shape, and preprocessing method of the select fields.
1. The [training data and validation data split](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/training_and_validation.md).
1. The [feature derivation](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/design/design_feature_derivation.md), including name, type, shape, and preprocessing method of the select fields.
1. The [training data and validation data split](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/design/design_training_and_validation.md).

With these functionalities, the `sql` package can translate user-typed extended SQL statements to an IR as an exposed Go struct. The codegen package takes the IR and returns a generated Python program for the `sql` package to execute.
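
The flow above can be sketched end to end; the real `sql` package is written in Go and its IR is a Go struct, so every function and field name below is an illustrative assumption, not SQLFlow's actual API:

```python
# Hypothetical sketch of the `sql` package pipeline described above.
def parse(stmt):
    # entry point + parsing: separate the standard SELECT part
    # from the extended TRAIN clause
    select, _, estimator = stmt.partition(" TRAIN ")
    return {"select": select.strip(), "estimator": estimator.strip()}

def verify(ir):
    # verification: syntax and selected-field existence checks go here
    assert ir["select"].upper().startswith("SELECT")

def derive_features(ir):
    # feature derivation: record name/type/shape of the selected fields
    cols = ir["select"].split(" FROM ")[0][len("SELECT"):]
    ir["features"] = [c.strip() for c in cols.split(",")]

def split(ir):
    # training/validation split on a random column
    return (ir["select"] + " WHERE sqlflow_random < 0.8",
            ir["select"] + " WHERE sqlflow_random >= 0.8")

def codegen(ir, train_sql, val_sql):
    # the codegen package turns the IR into a Python submitter program
    return f"train({ir['estimator']!r}, {ir['features']!r}, {train_sql!r}, {val_sql!r})"

ir = parse("SELECT sepal_len, class FROM iris TRAIN DNNClassifier")
verify(ir)
derive_features(ir)
program = codegen(ir, *split(ir))
```

Each step maps to one numbered functionality in the list, with `codegen` as the hand-off point between the two packages.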

4 changes: 2 additions & 2 deletions doc/pipe.md → doc/design/design_pipe.md
@@ -1,9 +1,9 @@
# Piping Responses
# _Design:_ Piping Responses


## Streaming Responses

As described in the [overall design](doc/syntax.md), a SQLFlow job could be a standard or an extended SQL statement, where an extended SQL statement will be translated into a Python program. Therefore, each job might generate up to the following data streams:
As described in the [overall design](design_syntax.md), a SQLFlow job could be a standard or an extended SQL statement, where an extended SQL statement will be translated into a Python program. Therefore, each job might generate up to the following data streams:

1. standard output, where each element is a line of text,
1. standard error, where each element is a line of text,
2 changes: 1 addition & 1 deletion doc/sql_parser.md → doc/design/design_sql_parser.md
@@ -1,4 +1,4 @@
# Extended SQL Parser Design
# _Design:_ Extended SQL Parser

This documentation explains the technical decision made in building a SQL
parser in Go. It is used to parse the extended SELECT syntax of SQL that
4 changes: 2 additions & 2 deletions doc/submitter.md → doc/design/design_submitter.md
@@ -1,12 +1,12 @@
# Submitter
# _Design:_ Submitter

A submitter is a pluggable module in SQLFlow that is used to submit an ML job to a third party computation service.

## Workflow

When a user types in an extended SQL statement, SQLFlow first parses and semantically verifies the statement. Then SQLFlow either runs the ML job locally or submits the ML job to a third party computation service.

![](figures/sqlflow-arch2.png)
![](../figures/sqlflow-arch2.png)

In the latter case, SQLFlow produces a job description (`TrainDescription` or `PredictDescription`) and hands it over to the submitter. For a training SQL, SQLFlow produces `TrainDescription`; for prediction SQL, SQLFlow produces `PredDescription`. The concrete definition of the description looks like the following

2 changes: 1 addition & 1 deletion doc/syntax.md → doc/design/design_syntax.md
@@ -1,4 +1,4 @@
# SQLFlow: Design Doc
# _Design:_ SQLFlow

## What is SQLFlow

@@ -1,11 +1,11 @@
# Design: Training and Validation
# _Design:_ Training and Validation

A common ML training job usually involves two kinds of data sets: training data and validation data. These two data sets will be generated automatically by SQLFlow through randomly splitting the select results.

## Overall
SQLFlow generates a temporary table from the user-specified query result, then trains and evaluates a model.

<img src="./figures/training_and_validation.png" width="60%">
<img src="../figures/training_and_validation.png" width="60%">

Note that this post covers the **train** process.
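
A minimal sketch of this splitting idea, using SQLite in place of a real engine; only the `sqlflow_random` column name comes from this design, while the table, data, and threshold are illustrative:

```python
import sqlite3

# Copy the SELECT result into a temporary table with an appended
# sqlflow_random column in [0, 1], then carve out the two subsets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE iris (sepal_len REAL, class TEXT);
INSERT INTO iris VALUES (5.1, 'setosa'), (4.9, 'setosa'),
                        (6.2, 'virginica'), (7.0, 'versicolor');
CREATE TEMPORARY TABLE iris_split AS
    SELECT *, ABS(RANDOM()) / 9223372036854775807.0 AS sqlflow_random
    FROM iris;
""")

train = conn.execute(
    "SELECT sepal_len, class FROM iris_split WHERE sqlflow_random < 0.8").fetchall()
validation = conn.execute(
    "SELECT sepal_len, class FROM iris_split WHERE sqlflow_random >= 0.8").fetchall()

# every source row lands in exactly one of the two subsets
assert len(train) + len(validation) == 4
```

Because the random value is materialized once in the temporary table, the two complementary `WHERE` clauses partition the rows deterministically.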

@@ -125,4 +125,4 @@ In the end, SQLFlow removes the temporary table to release resources.

- If the column sqlflow_random already exists, SQLFlow chooses to quit.
  Note that *a column name starting with an underscore is invalid in Hive*
- Any discussion to implement a better splitting is welcomed
- Any discussion to implement a better splitting is welcomed
@@ -1,4 +1,4 @@
# Design Doc: XGBoost on SQLFlow
# _Design:_ XGBoost on SQLFlow

## Introduction

2 changes: 1 addition & 1 deletion doc/text_classification_demo.md
@@ -5,7 +5,7 @@ Note that the steps in this tutorial may change during the development
of SQLFlow; we only provide a way that works for the current version.

To support custom models like CNN text classification, you may check out the
current [design](https://github.com/sql-machine-learning/models/blob/develop/doc/customized%2Bmodel.md)
current [design](https://github.com/sql-machine-learning/models/blob/develop/doc/design/design_customized_model.md)
for ongoing development.

In this tutorial, we use two datasets for English and Chinese text classification.