Commit 6e57786

Updated to num_workers, num_gpus, and demo update
1 parent b06ea19 commit 6e57786

9 files changed: 1173 additions and 21 deletions
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8d4a42f6",
   "metadata": {},
   "source": [
    "In this first notebook, we will go through the basics of using the SDK to:\n",
    " - Spin up a Ray cluster with our desired resources\n",
    " - View the status and specs of our Ray cluster\n",
    " - Take down the Ray cluster when finished"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b55bc3ea-4ce3-49bf-bb1f-e209de8ca47a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import pieces from codeflare-sdk\n",
    "from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration\n",
    "from codeflare_sdk.cluster.auth import TokenAuthentication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "614daa0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create authentication object for oc user permissions\n",
    "auth = TokenAuthentication(\n",
    "    token=\"XXXXX\",\n",
    "    server=\"XXXXX\",\n",
    "    skip_tls=False\n",
    ")\n",
    "auth.login()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc27f84c",
   "metadata": {},
   "source": [
    "Here, we define our cluster by specifying the resources we require for our batch workload. Below, we create our cluster object (which generates a corresponding AppWrapper)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f4bc870-091f-4e11-9642-cba145710159",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create and configure our cluster object (and appwrapper)\n",
    "cluster = Cluster(ClusterConfiguration(\n",
    "    name='raytest',\n",
    "    namespace='default',\n",
    "    num_workers=2,\n",
    "    min_cpus=1,\n",
    "    max_cpus=1,\n",
    "    min_memory=4,\n",
    "    max_memory=4,\n",
    "    num_gpus=0,\n",
    "    instascale=False\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12eef53c",
   "metadata": {},
   "source": [
    "Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper YAML onto the MCAD queue and begin the process of obtaining our resource cluster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0884bbc-c224-4ca0-98a0-02dfa09c2200",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bring up the cluster\n",
    "cluster.up()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "657ebdfb",
   "metadata": {},
   "source": [
    "Now, we want to check on the status of our resource cluster and wait until it is ready for use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c1b4311-2e61-44c9-8225-87c2db11363d",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.status()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a99d5aff",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.wait_ready()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df71c1ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.status()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3a55fe4",
   "metadata": {},
   "source": [
    "Let's quickly verify that the specs of the cluster are as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fd45bc5-03c0-4ae5-9ec5-dd1c30f1a084",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.details()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5af8cd32",
   "metadata": {},
   "source": [
    "Finally, we bring our resource cluster down and release/terminate the associated resources, bringing everything back to the way it was before our cluster was brought up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f36db0f-31f6-4373-9503-dc3c1c4c3f57",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.down()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d41b90e",
   "metadata": {},
   "outputs": [],
   "source": [
    "auth.logout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
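Review note: the notebook above calls `cluster.down()` in its own final cell, so an exception or an interrupted session partway through could leave the Ray cluster running. One possible pattern, sketched below, wraps the `up()` / `wait_ready()` / `down()` calls in a context manager so teardown always runs. `managed_cluster`, `StubCluster`, and `run_demo` are hypothetical names introduced for illustration; the stub only records calls so the sketch runs without an OpenShift connection, and it is not part of codeflare-sdk.

```python
from contextlib import contextmanager


class StubCluster:
    """Hypothetical stand-in that records calls instead of contacting a real cluster."""

    def __init__(self):
        self.calls = []

    def up(self):
        self.calls.append("up")

    def wait_ready(self):
        self.calls.append("wait_ready")

    def down(self):
        self.calls.append("down")


@contextmanager
def managed_cluster(cluster):
    """Bring the cluster up, yield it, and always bring it down afterwards."""
    cluster.up()
    try:
        cluster.wait_ready()
        yield cluster
    finally:
        # Runs on normal exit and on exceptions, so resources are released either way.
        cluster.down()


def run_demo(fail=False):
    """Run a (possibly failing) workload inside the managed block; return recorded calls."""
    stub = StubCluster()
    try:
        with managed_cluster(stub):
            if fail:
                raise RuntimeError("simulated workload failure")
    except RuntimeError:
        pass
    return stub.calls
```

In a real notebook, the object passed to `managed_cluster` would presumably be the `cluster` built from `ClusterConfiguration`, with the workload placed inside the `with` block.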
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9865ee8c",
   "metadata": {},
   "source": [
    "In this second notebook, we will go over the basics of using InstaScale to scale up and down resources that are not currently available on your OpenShift cluster (in cloud environments)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b55bc3ea-4ce3-49bf-bb1f-e209de8ca47a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import pieces from codeflare-sdk\n",
    "from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration\n",
    "from codeflare_sdk.cluster.auth import TokenAuthentication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "614daa0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create authentication object for oc user permissions\n",
    "auth = TokenAuthentication(\n",
    "    token=\"XXXXX\",\n",
    "    server=\"XXXXX\",\n",
    "    skip_tls=False\n",
    ")\n",
    "auth.login()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc27f84c",
   "metadata": {},
   "source": [
    "This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale up guaranteed resources based on our request (these will also automatically scale down when we are finished working):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f4bc870-091f-4e11-9642-cba145710159",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create and configure our cluster object (and appwrapper)\n",
    "cluster = Cluster(ClusterConfiguration(\n",
    "    name='instascaletest',\n",
    "    namespace='default',\n",
    "    num_workers=2,\n",
    "    min_cpus=2,\n",
    "    max_cpus=2,\n",
    "    min_memory=8,\n",
    "    max_memory=8,\n",
    "    num_gpus=1,\n",
    "    instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
    "    machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12eef53c",
   "metadata": {},
   "source": [
    "Same as last time, we will bring the cluster up, wait for it to be ready, and confirm that the specs are as requested:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0884bbc-c224-4ca0-98a0-02dfa09c2200",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Bring up the cluster\n",
    "cluster.up()\n",
    "cluster.wait_ready()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6abfe904",
   "metadata": {},
   "source": [
    "While the resources are being scaled, we can also go into the console and take a look at the InstaScale logs, as well as the new machines/nodes spinning up.\n",
    "\n",
    "Once the cluster is ready, we can confirm the specs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fd45bc5-03c0-4ae5-9ec5-dd1c30f1a084",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.details()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5af8cd32",
   "metadata": {},
   "source": [
    "Finally, we bring our resource cluster down and release/terminate the associated resources, bringing everything back to the way it was before our cluster was brought up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f36db0f-31f6-4373-9503-dc3c1c4c3f57",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster.down()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c883caea",
   "metadata": {},
   "source": [
    "Once again, we can look at the machines/nodes and see that everything has been successfully scaled down!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d41b90e",
   "metadata": {},
   "outputs": [],
   "source": [
    "auth.logout()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
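Review note: since InstaScale scales the OpenShift cluster to satisfy the request, it may help to see what the configuration above adds up to across workers. The sketch below multiplies the per-worker limits by `num_workers`. `WorkerRequest` and `total_worker_resources` are hypothetical helpers, not part of codeflare-sdk, and the sketch assumes that `max_memory` is in gigabytes and that `max_cpus`, `max_memory`, and `num_gpus` apply per worker; the head node is not counted.

```python
from dataclasses import dataclass


@dataclass
class WorkerRequest:
    """Hypothetical mirror of the worker-related fields passed to ClusterConfiguration."""

    num_workers: int
    max_cpus: int
    max_memory: int  # assumed to be gigabytes per worker
    num_gpus: int  # assumed to be GPUs per worker


def total_worker_resources(req):
    """Multiply per-worker limits by the worker count (head node not included)."""
    return {
        "cpus": req.num_workers * req.max_cpus,
        "memory_gb": req.num_workers * req.max_memory,
        "gpus": req.num_workers * req.num_gpus,
    }


# Values copied from the InstaScale notebook's ClusterConfiguration call.
request = WorkerRequest(num_workers=2, max_cpus=2, max_memory=8, num_gpus=1)
totals = total_worker_resources(request)
print(totals)
```

Under those assumptions, the two workers together ask for 4 CPUs, 16 GB of memory, and 2 GPUs, which is the capacity the requested worker machine type would need to cover.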
