-
-
Notifications
You must be signed in to change notification settings - Fork 21
Allow different splitting rules #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1441aa5
to
ddc2967
Compare
Thanks for this PR! Not sure we have a contributing guide for this repo, or it may be outdated (sorry about that). But in order for a PR to be accepted, the code should be formatted with |
@juanitorduz as you have some experience with BART and categorical variables you may have comments about this. |
pymc_bart/split_rules.py
Outdated
class SubsetSplitRule: | ||
""" | ||
Choose a random subset of the categorical values and branch on if the value is within the chosen set. | ||
This is the approach taken in flexBART paper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a reference to the paper including name of the article, authors and DOI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's this paper: https://arxiv.org/abs/2211.04459
Do you have an example of citation formating somewhere so I can align the style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very interesting! Whenever I need to work with categorical variables, I actively omit the one-hot-encoding (reasons similar to the ones presented in the paper), and rather use variations of target encoding (see Category Encoders).
The idea behind flexBART
seems very interesting, especially the "pooling" between categories through the tree structures. I do not know the full details, and I think @aloctavodia is the expert to review the code in detail. I left some small comments regarding the code. I will be happy to help regarding type-hints and looking into tests :)
pymc_bart/bart.py
Outdated
@@ -109,6 +109,7 @@ def __new__( | |||
beta: float = 2.0, | |||
response: str = "constant", | |||
split_prior: Optional[List[float]] = None, | |||
split_rules: Optional[List] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we keep the exact expected types (maybe ContinuousSplitRule
from what I see below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I need a bit of help. If you look at split_rules.py you see there are multiple different split rule options.
They essentially form an abstract class, but as they have staticmethods (for performance reasons) I don't know if it is possible to abstract them into a SplitRule class type in any reasonable way that actually enforces the static methods existing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did my best here: created a class SplitRule with two static abstract methods and made other split rules inherit from it, then added SplitRule as hint here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok! It seems mypy
does not like abstract classes type=hints (python/mypy#4717). My suggestion is to focus on the core functionalities and test and come back to this minor detail once we are ready to merge :) (From my side, I will investigate how to do it)
@@ -0,0 +1,102 @@ | |||
# Copyright 2022 The PyMC Developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for this module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try. Don't have much experience with pytest but I can hopefully figure it out :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests in https://github.com/pymc-devs/pymc-bart/blob/main/tests/test_tree.py are probably the simplest ones in PyMC-BART, you can try following a similar pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added basic functionality tests to https://github.com/velochy/pymc-bart/blob/main/tests/test_split_rules.py
Co-authored-by: Osvaldo A Martin <aloctavodia@gmail.com>
I have now addressed all the comments to the best of my knowledge. |
You need to run |
Thanks @velochy! |
This PR is inspired by the wish to better handle categorical predictors, and basically implements the paper https://arxiv.org/abs/2211.04459
Not sure how the PR process works for your repo, but happy to comply with any and all standards to get the changes merged :)