This is the official Python SDK for [*refinery*](https://github.com/code-kern-ai/refinery).

[Fetching lookup lists](#fetching-lookup-lists)

[Upload files](#upload-files)

[Adapters](#adapters)
- [Sklearn](#sklearn-adapter)
- [PyTorch](#pytorch-adapter)
- [HuggingFace](#hugging-face-adapter)
- [Rasa](#rasa-adapter)

[Callbacks](#callbacks)
- [Sklearn](#sklearn-callback)
- [PyTorch](#pytorch-callback)
- [HuggingFace](#hugging-face-callback)

[What's missing?](#whats-missing)

[Contributing](#contributing)

[License](#license)

[Contact](#contact)
Alternatively, you can `rsdk push <path-to-your-file>` via CLI, given that you have logged in.
### Adapters

#### Sklearn Adapter
You can use *refinery* to directly pull data into a format you can use for building [sklearn](https://github.com/scikit-learn/scikit-learn) models. This can look as follows:

```python
from refinery.adapter.sklearn import build_classification_dataset
from sklearn.tree import DecisionTreeClassifier

data = build_classification_dataset(client, "headline", "__clickbait", "distilbert-base-uncased")
```
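The example above stops short of actually fitting a model. A minimal continuation sketch — using synthetic embeddings as a stand-in, since the exact structure returned by `build_classification_dataset` is not shown here — could look like this:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the adapter's output: 20 "embedded" headlines
# with binary clickbait labels. In practice these would come from
# build_classification_dataset.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(20, 8))
labels = rng.integers(0, 2, size=20)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(inputs, labels)

# An unconstrained decision tree memorizes a small, duplicate-free
# training set, so the training accuracy here is 1.0.
train_accuracy = clf.score(inputs, labels)
```

From here, `clf.predict` (or `predict_proba`) can be applied to newly embedded records in the same way.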
#### Hugging Face Adapter
Transformers are great, but oftentimes you want to finetune them for your downstream task. With *refinery*, you can do so easily by letting the SDK build the dataset for you, which you can then use as a plug-and-play base for your training:

```python
# ... (dataset construction and Trainer setup elided in this diff)
trainer.train()
trainer.save_model("path/to/model")
```
By the way, we can highly recommend combining this with [Truss](https://github.com/basetenlabs/truss) for easy model serving!
#### Rasa Adapter
*refinery* is perfect for building chatbots with [Rasa](https://github.com/RasaHQ/rasa). We've built an adapter with which you can easily create the required Rasa training data directly from *refinery*.

To do so, do the following:
Please make sure to also create the further necessary files (`domain.yml`, `data/stories.yml` and `data/rules.yml`) if you want to train your Rasa chatbot. For further reference, see their [documentation](https://rasa.com/docs/rasa).
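The generated training data is Rasa NLU YAML. Purely for illustration — the intent name and example utterances below are hypothetical, not actual output of the adapter — such a file looks like:

```yaml
nlu:
- intent: clickbait          # hypothetical intent derived from your labels
  examples: |
    - You won't believe what happened next
    - Ten tricks doctors don't want you to know
```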
#### What's missing?
Let us know which open-source or closed-source NLP framework you are using and would like to see an adapter for in the SDK. To do so, simply create an issue in this repository with the tag "enhancement".
### Callbacks

If you want to feed your production model's predictions back into *refinery*, you can do so with any version greater than [1.2.1](https://github.com/code-kern-ai/refinery/releases/tag/v1.2.1).
To do so, we provide a generic interface as well as framework-specific classes.
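As a conceptual sketch only — the names below are hypothetical, not the SDK's actual callback API — the common idea behind all framework-specific callbacks is turning probabilistic predictions into label/confidence payloads that can be sent back to *refinery*:

```python
def to_payload(texts, probas, labels):
    """Pair each text with its most likely label and that label's confidence.

    texts:  the model inputs
    probas: per-class probabilities (e.g. from predict_proba), one row per text
    labels: the class names, in the same order as the probability columns
    """
    payloads = []
    for text, proba in zip(texts, probas):
        # index of the highest-probability class
        best = max(range(len(proba)), key=proba.__getitem__)
        payloads.append(
            {"text": text, "label": labels[best], "confidence": proba[best]}
        )
    return payloads
```

The framework-specific classes then differ mainly in how they obtain `probas` from the underlying model.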
#### Sklearn Callback
If you want to train a scikit-learn model and feed its outputs back into *refinery*, you can easily do so as follows:

```python
from sklearn.linear_model import LogisticRegression
from refinery.adapter.sklearn import build_classification_dataset

# we use LogisticRegression as an example, but you can use any model implementing predict_proba
clf = LogisticRegression()

data = build_classification_dataset(client, "headline", "__clickbait", "distilbert-base-uncased")
```
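The example is cut off before the model is fitted. A hedged sketch of the next step, with synthetic features standing in for the adapter's embedded data (the real inputs would come from `build_classification_dataset`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the embedded training data.
rng = np.random.default_rng(42)
inputs = rng.normal(size=(30, 8))
labels = rng.integers(0, 2, size=30)

clf = LogisticRegression()
clf.fit(inputs, labels)

# predict_proba yields one probability per class -- exactly what a
# confidence-aware callback needs to report back.
probas = clf.predict_proba(inputs[:2])
```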
If you want to have something added, feel free to open an [issue](https://github.com/code-kern-ai/refinery-python-sdk/issues).

## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
For reference, the docstring of `build_classification_dataset`:

Builds a classification dataset from a refinery client and a config string.

Args:
- `client (Client)`: Refinery client
- `sentence_input (str)`: Name of the column containing the sentence input.
- `classification_label (str)`: Name of the label; if this is a task on the full record, enter the string as `"__<label>"`. Else, input it as `"<attribute>__<label>"`.
- `config_string (Optional[str], optional)`: Config string for the TransformerSentenceEmbedder. Defaults to None; if None is provided, the text will not be embedded.
- `num_train (Optional[int], optional)`: Number of training examples to use. Defaults to None; if None is provided, all examples will be used.

Returns:
- `Tuple[DataLoader, DataLoader, preprocessing.LabelEncoder]`: Tuple of train and test dataloaders, and the label encoder.
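Per the docstring, the third return value is a fitted scikit-learn `LabelEncoder`. As a small self-contained illustration of how such an encoder maps between label names and integer ids (the label names here are hypothetical):

```python
from sklearn.preprocessing import LabelEncoder

# A LabelEncoder maps string labels to integer ids and back.
# Classes are sorted alphabetically, so "clickbait" -> 0, "not clickbait" -> 1.
le = LabelEncoder()
le.fit(["clickbait", "not clickbait"])  # hypothetical label names

encoded = le.transform(["not clickbait", "clickbait"])
decoded = le.inverse_transform(encoded)
```

The same `inverse_transform` call is what turns the integer predictions coming out of the dataloaders back into human-readable labels.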