-
Notifications
You must be signed in to change notification settings - Fork 17
Implement cluster model by subclassing of tensorflow.keras #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent PR.
sqlflow_models/cluster_keras.py
Outdated
metric['ari'] = np.round(ari(y, y_pred), 5) | ||
return q, metric | ||
|
||
def cluster_train_loop(self, x, y, batch_size=256, maxiter=8000, update_interval=150, tol=0.001): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the usage of y
? I thought the cluster model is trained on unlabeled data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y is used for evaluating the performance of cluster model when training with labeled data.
You are right. It is really necessary to remove this or setting it as None by default.
sqlflow_models/cluster_keras_main.py
Outdated
print(dec.display_model_info(verbose=2)) | ||
|
||
if hasattr(dec, 'cluster_train_loop'): | ||
dec.cluster_train_loop(x=x, y=y, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if it is possible to use the model.fit
API to training the model, and put the early stopping strategy as a callback function? So that the code generation template in SQLFlow would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am entangled with this.
In classical scenario, when subclassing the keras.Model
class, user only need to define the structure of model in __init__
and the forward pass in the call
method. All the other operations such as compile
and fit
are inherited from keras.Model
.
However in cluster model, the inherited fit
cannot fit with such training process.
Cluster model uses auxiliary target distribution p
as latent label for training and it keeps changing during training. The inherited fit
always requires static y
. Based on these I used train_on_batch
instead of fit
.
q, metric = self.evaluate(x, y)
p = self.target_distribution(q)
...
loss = self.train_on_batch(x=x[idx], y=p[idx])
@@ -0,0 +1,520 @@ | |||
/Users/didi/Develop/VirtualEnvs/virtualenv_3.6.5/bin/python3.6 /Users/didi/Develop/PycharmProjects/temp_requirements/src/order_by_project/sqlflow/clustering_model/version_03/dec_demo_03.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this log file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it.
sqlflow_models/cluster_keras.py
Outdated
pretrain_initializer='glorot_uniform', | ||
loss=None): | ||
""" | ||
Implement cluster model mostly based on DEC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a link to DEC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that the cluster model is DEC but they are really close.
I am entangled with whether this link will mislead users.
sqlflow_models/cluster_keras.py
Outdated
self.get_layer(name='clustering').set_weights([self.kmeans.cluster_centers_]) | ||
self.display_model_info() | ||
index, loss, p = 0, 0., None | ||
y_pred_last = self.kmeans.fit_predict(self.encoded_input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is redundancy, L175 can return the y_pred_last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the discussion with @wenjing , the output prediction result group by group_id, so maybe we need to implement a function sqlflow_predict()
function that outputs the result which group by group_id
.
sqlflow_models/cluster_keras_main.py
Outdated
|
||
|
||
@timelogger | ||
def train_evaluate(datasource='mnist'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove train_evaluate
function in the model definition file, and add a unit test in the tests
folder.
tests/test_deep_embedding_cluster.py
Outdated
self.label = [0 for _ in range(50)] + [1 for _ in range(50)] | ||
feature_columns = [tf.feature_column.numeric_column(key) for key in | ||
self.features] | ||
self.model = sqlflow_models.DNNClassifier(feature_columns=feature_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to test sqlflow_models.DeepEmbeddingCluster
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to test
sqlflow_models.DeepEmbeddingCluster
here.
Have added in new commit. Tks for remind.
…alling train_on_batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI failed logs:
import sqlflow_models
sqlflow_models/__init__.py:4: in <module>
from .deep_embedding_cluster import DeepEmbeddingClusterModel
sqlflow_models/deep_embedding_cluster.py:19: in <module>
from sklearn.cluster import KMeans
E ModuleNotFoundError: No module named 'sklearn'
You can fix it by adding the dependency in https://github.com/sql-machine-learning/models/blob/develop/setup.py#L24
self.y_pred_last = self.kmeans.fit_predict(self.encoded_input) | ||
print('{} Done init centroids by k-means.'.format(datetime.now())) | ||
|
||
def cluster_train_loop(self, x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make cluster_train_loop
to be more general:
def sqlflow_trai_loop(self, dataset, epochs, verbos):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make
cluster_train_loop
to be more general:def sqlflow_trai_loop(self, dataset, epochs, verbos): ...
Actually I have thought about this.
epochs
is always used to define how many times to train on the whole dataset. However in this case, the auxiliary target distribution p
needs to be updated after update_interval
iterations (update_interval
always smaller than the number of batches in one epoch
). Since the pre-train process can fetch parameter pretrain_epochs
from construct function, and pre_train
should be consider as sub-train-process, I did not add epochs
as parameter here.
verbose
can be add here for control to display more details. I can add this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I see that epochs
is not necessary for this cause, but we need to support the custom train loop function by editing template_tf.go
in SQLFlow, and I think epochs
is useful for the most of the scenes, so maybe we can to add epochs
argument with a default value.
…ion cluster_train_loop to sqlflow_train_loop
…rm of custom model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the excellent PR!
Has mostly implement cluster model by subclassing of tensorflow.keras.
Still work in progress.
TODO: