|
71 | 71 | "It happens asynchronously, so it will return immediately (unless there's an unexpected error 😱).\n",
|
72 | 72 | "Of course, the training does not complete instantly, so you will have to wait for it to finish.\n",
|
73 | 73 | "\n",
|
74 |
| - "TODO: instructions for inspecting the log\n", |
| 74 | + "## Observing the training progress\n", |
| 75 | + "\n", |
| 76 | + "You can observe the training progress by watching the logs.\n", |
| 77 | + "This is done in the subsequent cell.\n", |
| 78 | + "Watching the logs does not stop automatically, so you will have to stop it manually.\n", |
| 79 | + "Once you see the message 'Training Done', you can interrupt the cell and continue.\n", |
| 80 | + "\n", |
| 81 | + "## Graph and training parameters\n", |
| 82 | + "\n", |
| 83 | + "\n", |
| 84 | + "\n", |
| 85 | + "\n", |
| 86 | + "| Parameter | Default | Type | Description |\n", |
| 87 | + "|--------------------|----------------|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", |
| 88 | + "| graph_name | - | str | The name of the graph to train on. |\n", |
| 89 | + "| model_name | - | str | The name of the model. Must be unique per database and username combination. Models cannot be cleaned up at this time. |\n", |
| 90 | + "| feature_properties | - | List[str] | The node properties to use as model features. |\n", |
| 91 | + "| target_property | - | str | The node property that contains the target class values. |\n", |
| 92 | + "| node_labels | None | List[str] | The node labels to use for training. By default, all labels are used. |\n", |
| 93 | + "| relationship_types | None | List[str] | The relationship types to use for training. By default, all types are used. |\n", |
| 94 | + "| target_node_label | None | str | Indicates the nodes used for training. Only nodes with this label need to have the `target_property` defined. Other nodes are used for context. By default, all nodes are considered. |\n", |
| 95 | + "| graph_sage_config | None | dict | Configuration for the GraphSAGE training. See below. |\n", |
| 96 | + "\n", |
75 | 97 | "\n",
|
76 | 98 | "## GraphSAGE parameters\n",
|
77 | 99 | "\n",
|
78 | 100 | "We have exposed several parameters of the PyG GraphSAGE model.\n",
|
79 | 101 | "\n",
|
80 |
| - "| Parameter | Default | Description |\n", |
81 |
| - "|-----------------|----------|-------------|\n", |
82 |
| - "| layer_config | {} | ??? |\n", |
83 |
| - "| num_neighbors | [25, 10] | ??? |\n", |
84 |
| - "| dropout | 0.5 | ??? |\n", |
85 |
| - "| hidden_channels | 256 | ??? |\n", |
86 |
| - "| learning_rate | 0.003 | ??? |\n", |
| 102 | + "| Parameter | Default | Description |\n", |
| 103 | + "|-----------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", |
| 104 | + "| layer_config | {} | Configuration of the GraphSAGE layers. It supports `aggr`, `normalize`, `root_weight`, `project`, `bias` from [this link](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.SAGEConv.html). Additionally, you can provide message passing configuration from [this link](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.MessagePassing.html#torch_geometric.nn.conv.MessagePassing). |\n", |
| 105 | + "| num_neighbors | [25, 10] | Sample sizes for each layer. The length of this list is the number of layers used. All numbers must be >0. |\n", |
| 106 | + "| dropout | 0.5 | Probability of dropping out neurons during training. Must be between 0 and 1. |\n", |
| 107 | + "| hidden_channels | 256 | The dimension of each hidden layer. Higher value means more expensive training, but higher level of representation. Must be >0. |\n", |
| 108 | + "| learning_rate | 0.003 | The learning rate. Must be >0. |\n", |
87 | 109 | "\n",
|
88 | 110 | "Feel free to experiment with any of these parameters, using whichever values seem useful.\n"
|
89 | 111 | ]
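A minimal sketch of how the two tables above combine, assuming the `gds` client object, graph name, and property names used elsewhere in this notebook (the model name `"myModel2"` and the specific option values are hypothetical, chosen only to illustrate the shape of `graph_sage_config`):

```python
# Hypothetical GraphSAGE configuration using the parameters from the table above.
graph_sage_config = {
    "layer_config": {"aggr": "mean", "normalize": True},  # SAGEConv options
    "num_neighbors": [25, 10],  # two layers: sample 25, then 10 neighbors
    "dropout": 0.3,             # must be between 0 and 1
    "hidden_channels": 128,     # must be > 0
    "learning_rate": 0.001,     # must be > 0
}

# The train call itself needs a live GDS session, so it is left commented here:
# job_id = gds.gnn.nodeClassification.train(
#     "cora", "myModel2", ["features"], "subject", ["CITES"],
#     target_node_label="Paper", node_labels=["Paper"],
#     graph_sage_config=graph_sage_config,
# )
```

Note that the length of `num_neighbors` determines the number of layers, so the two settings above produce a two-layer model.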
|
|
95 | 117 | "outputs": [],
|
96 | 118 | "source": [
|
97 | 119 | "# Let's train!\n",
|
98 |
| - "train_response = gds.gnn.nodeClassification.train(\n", |
| 120 | + "job_id = gds.gnn.nodeClassification.train(\n", |
99 | 121 | " \"cora\", \"myModel\", [\"features\"], \"subject\", [\"CITES\"], target_node_label=\"Paper\", node_labels=[\"Paper\"]\n",
|
100 | 122 | ")"
|
101 | 123 | ]
|
102 | 124 | },
|
| 125 | + { |
| 126 | + "cell_type": "code", |
| 127 | + "execution_count": null, |
| 128 | + "metadata": {}, |
| 129 | + "outputs": [], |
| 130 | + "source": [ |
| 131 | + "# And let's follow the progress by watching the logs\n", |
| 132 | + "gds.gnn.nodeClassification.watch_logs(job_id)" |
| 133 | + ] |
| 134 | + }, |
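Since `watch_logs` streams until you interrupt the kernel, a small hypothetical wrapper (not part of the client API) can make the manual stop explicit:

```python
# Sketch: wrap a log-watching call so a kernel interrupt stops it cleanly.
# `watch_fn` stands in for gds.gnn.nodeClassification.watch_logs.
def watch_until_interrupted(watch_fn, job_id):
    try:
        watch_fn(job_id)
        return True  # the watcher returned on its own
    except KeyboardInterrupt:
        print("Stopped watching logs.")
        return False

# Usage (requires a live session):
# watch_until_interrupted(gds.gnn.nodeClassification.watch_logs, job_id)
```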
103 | 135 | {
|
104 | 136 | "cell_type": "code",
|
105 | 137 | "execution_count": null,
|
|
134 | 166 | "In this case, we will use it to predict the subject of papers in the Cora dataset.\n",
|
135 | 167 | "\n",
|
136 | 168 | "Again, this call is asynchronous, so it will return immediately.\n",
|
137 |
| - "\n", |
138 |
| - "TODO: instructions for inspecting the log\n", |
| 169 | + "Observe the progress by watching the logs.\n", |
139 | 170 | "\n",
|
140 | 171 | "Once the prediction is completed, the predicted classes are added to GDS Graph Catalog (as per normal).\n",
|
141 | 172 | "We can retrieve the prediction result (the predictions themselves) by streaming from the graph.\n"
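A sketch of inspecting the streamed predictions, assuming the `cora` graph object retrieved later in the notebook; the streaming call needs a live session, so the illustration below builds a stand-in DataFrame with hypothetical values to show the shape of the result:

```python
import pandas as pd

# With a live session, something like this streams the predicted property:
# predictions = gds.graph.nodeProperties.stream(cora, ["myPredictions"])

# Stand-in for the streamed result (nodeId / property columns, values invented):
predictions = pd.DataFrame({"nodeId": [0, 1, 2], "myPredictions": [3, 0, 3]})

# Count how many papers were assigned each predicted class:
class_counts = predictions["myPredictions"].value_counts()
```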
|
|
148 | 179 | "outputs": [],
|
149 | 180 | "source": [
|
150 | 181 | "# Let's trigger prediction!\n",
|
151 |
| - "predict_result = gds.gnn.nodeClassification.predict(\"cora\", \"myModel\", \"myPredictions\")" |
| 182 | + "job_id = gds.gnn.nodeClassification.predict(\"cora\", \"myModel\", \"myPredictions\")" |
| 183 | + ] |
| 184 | + }, |
| 185 | + { |
| 186 | + "cell_type": "code", |
| 187 | + "execution_count": null, |
| 188 | + "metadata": {}, |
| 189 | + "outputs": [], |
| 190 | + "source": [ |
| 191 | + "# And let's follow progress by watching the logs\n", |
| 192 | + "gds.gnn.nodeClassification.watch_logs(job_id)" |
152 | 193 | ]
|
153 | 194 | },
|
154 | 195 | {
|
|
157 | 198 | "metadata": {},
|
158 | 199 | "outputs": [],
|
159 | 200 | "source": [
|
160 |
| - "# Let's get a graph object\n", |
| 201 | + "# Now that prediction is done, let's see the predictions\n", |
161 | 202 | "cora = gds.graph.get(\"cora\")"
|
162 | 203 | ]
|
163 | 204 | },
|
|
194 | 235 | "Thank you very much for participating in the testing.\n",
|
195 | 236 | "We hope you enjoyed it.\n",
|
196 | 237 | "If you've run the notebook for the first time, now's the time to experiment with changing the graph, training parameters, etc.\n",
|
| 238 | + "For example, you could try a heterogeneous graph problem, check whether performance improves with different parameters, or run training jobs in parallel on multiple databases.\n", |
197 | 239 | "If you're feeling like you're done, please reach back to the Google Document and fill in our feedback form.\n",
|
198 | 240 | "\n",
|
199 | 241 | "Thank you!"
|
|