add user guide for ant-xgboost #772
Thanks for this PR. I think we need a clear goal for this document. It contains too much about Ant-XGBoost, which would be more appropriate in the Ant-XGBoost repo than here in the SQLFlow repo.
It is titled "user guide", but it doesn't cover building and setting up SQLFlow with the Ant-XGBoost codegen. It would also help to link to the Jupyter Notebook setup, so users could follow along and type the examples.
The document would be easier to understand if it explained novel concepts, like the XGBoost Estimator, before showing how to call them.
Also, please follow the Markdown syntax used on GitHub: https://guides.github.com/features/mastering-markdown/
doc/ant-xgboost_user_guide.md
Outdated
@@ -0,0 +1,265 @@
### _user guide:_ Ant-XGBoost on sqlflow
`###` => `#`
doc/ant-xgboost_user_guide.md
Outdated
@@ -0,0 +1,265 @@
### _user guide:_ Ant-XGBoost on sqlflow

#### Overview
`####` => `##`
doc/ant-xgboost_user_guide.md
Outdated
While the current `auto_train` method is a very simple approach, we are working on better strategies to further scale up hyperparameter tuning in XGBoost training.


### Helpful Backports of XGBoost master
Everything above is about Ant-XGBoost and should be part of the Ant-XGBoost documentation rather than SQLFlow's. I would recommend moving the above content to github.com/alipay/ant-xgboost and putting a link in this document that points to that repo.
doc/ant-xgboost_user_guide.md
Outdated
<br>

## Quick Start
This document is supposed to be about the design of `antxgboost_codegen.go`, but it contains no discussion of this code generator.
doc/ant-xgboost_user_guide.md
Outdated
<br>

## Overall SQL Syntax
I don't think this document is about the syntax extensions that SQLFlow adds to SQL. Do you want to explain the part of the SQLFlow syntax that the Ant-XGBoost codegen uses?
@wangkuiyi Yes, I want to inform users about the overall SQLFlow syntax related to Ant-XGBoost, so we renamed this section to `Overall SQL Syntax for AntXGBoost`.
@wangkuiyi Thanks for the comments! I have refined this guide and added a tutorial for AntXGBoost.
[Ant-XGBoost](https://github.com/alipay/ant-xgboost) is fork of [dmlc/xgboost](https://github.com/dmlc/xgboost), which is maintained by active contributors of dmlc/xgboost in Alipay Inc.

Ant-XGBoost extends `dmlc/xgboost` with the capability of running on Kubernetes and automatic hyper-parameter estimation.
In particular, Ant-XGBoost includes `auto_train` methods for automatic training and introduces an additional parameter `convergence_criteria` for generalized early stopping strategy.
Maybe we can add reference links for `auto_train` and `convergence_criteria`, so that users can understand these concepts clearly.
Done.
doc/ant-xgboost_user_guide.md
Outdated
## Tutorial
We provide an [interactive tutorial](../example/jupyter/tutorial_antxgb.ipynb) via jupyter notebook, which can be run out-of-the-box in [sqlflow playground](https://play.sqlflow.org).
If you want to run it locally, you need to install sqlflow first. You can learn how to install sqlflow at [here](../doc/installation.md).
sqlflow => SQLFlow
Done
doc/ant-xgboost_user_guide.md
Outdated
* xgboost.Regressor

Estimator for regression task, set `train.objective` to `reg:squarederror`(`reg:linear`).
Do users need to set `objective=train.objective` in the WITH clause or not? If not, what would the value of `objective` be?
Done
doc/ant-xgboost_user_guide.md
Outdated
### Columns

#### Feature Columns
For now, two feature column schemas are available.
`two feature column schemas` => `two kinds of feature columns`?
Done
doc/ant-xgboost_user_guide.md
Outdated
First one is `dense schema`, which concatenate numeric table columns transparently, such as `COLUMN f1, f2, f3, f4`.

Second one is `sparse key-value schema`, which received string sparse feature formatted like `$k1:$v1,$k2:$v2,...`.
What do `k` and `v` mean?
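For readers following this thread, here is a minimal sketch of how a string in the quoted `$k1:$v1,$k2:$v2,...` form could be parsed. It assumes that each `k` is an integer feature index and each `v` is a numeric feature value; the thread itself does not spell this out. The function name `parse_sparse_kv` is hypothetical and is not part of SQLFlow or Ant-XGBoost.

```python
from typing import Dict


def parse_sparse_kv(text: str) -> Dict[int, float]:
    """Parse 'k1:v1,k2:v2,...' into a {feature_index: value} mapping.

    Assumption (not confirmed by the thread): keys are integer feature
    indices and values are floating-point numbers.
    """
    features: Dict[int, float] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {pair!r}")
        features[int(key)] = float(value)
    return features


# Example: three non-zero features out of a larger, mostly empty vector.
print(parse_sparse_kv("0:1.5,3:2.0,7:0.25"))
```

Under the same assumption, a dense row such as `COLUMN f1, f2, f3, f4` would correspond to a fixed-length list of values, while the sparse form only lists the entries that are present.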
Done
LGTM!
fix #746