diff --git a/menu/navigation.json b/menu/navigation.json
index 6b0905c5db..0b084d5902 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -900,6 +900,10 @@
         "label": "Monitor a deployment",
         "slug": "monitor-deployment"
       },
+      {
+        "label": "Configure autoscaling",
+        "slug": "configure-autoscaling"
+      },
       {
         "label": "Manage allowed IP addresses",
         "slug": "manage-allowed-ips"
diff --git a/pages/managed-inference/how-to/configure-autoscaling.mdx b/pages/managed-inference/how-to/configure-autoscaling.mdx
new file mode 100644
index 0000000000..7d3ccbe14d
--- /dev/null
+++ b/pages/managed-inference/how-to/configure-autoscaling.mdx
@@ -0,0 +1,36 @@
+---
+meta:
+  title: How to scale Managed Inference deployments
+  description: This page explains how to scale Managed Inference deployments in size
+content:
+  h1: How to scale Managed Inference deployments
+  paragraph: This page explains how to scale Managed Inference deployments in size
+tags: managed-inference ai-data ip-address
+dates:
+  validation: 2025-06-03
+  posted: 2025-06-03
+categories:
+  - ai-data
+---
+
+You can scale your Managed Inference deployment up or down to match the incoming load on your deployment.
+
+
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- A [Managed Inference deployment](/managed-inference/quickstart/)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+
+## How to scale a Managed Inference deployment in size
+
+1. Click **Managed Inference** in the **AI** section of the [Scaleway console](https://console.scaleway.com) side menu. A list of your deployments displays.
+2. Click a deployment name or > **More info** to access the deployment dashboard.
+3. Click the **Settings** tab and navigate to the **Scaling** section.
+4. Click **Update node count** and adjust the number of nodes in your deployment.
+
+    High availability is only guaranteed with two or more nodes.
+
+5. Click **Update node count** to confirm the new number of nodes for your deployment.
+
+    Your deployment will be unavailable for 15 to 30 minutes while the node update is in progress.
+
\ No newline at end of file
diff --git a/pages/managed-inference/how-to/create-deployment.mdx b/pages/managed-inference/how-to/create-deployment.mdx
index ad5ed35260..a979180630 100644
--- a/pages/managed-inference/how-to/create-deployment.mdx
+++ b/pages/managed-inference/how-to/create-deployment.mdx
@@ -28,12 +28,16 @@
     - Choose the geographical **region** for the deployment.
     - Specify the GPU Instance type to be used with your deployment.
-4. Enter a **name** for the deployment, and optional tags.
-5. Configure the **network connectivity** settings for the deployment:
+4. Choose the number of nodes for your deployment.
+
+    High availability is only guaranteed with two or more nodes.
+
+5. Enter a **name** for the deployment, and optional tags.
+6. Configure the **network connectivity** settings for the deployment:
     - Attach to a **Private Network** for secure communication and restricted availability. Choose an existing Private Network from the drop-down list, or create a new one.
     - Set up **Public connectivity** to access resources via the public internet. Authentication by API key is enabled by default.
     - Enabling both private and public connectivity will result in two distinct endpoints (public and private) for your deployment.
    - Deployments must have at least one endpoint, either public or private.
-6. Click **Deploy model** to launch the deployment process. Once the model is ready, it will be listed among your deployments.
\ No newline at end of file
+7. Click **Deploy model** to launch the deployment process. Once the model is ready, it will be listed among your deployments.
\ No newline at end of file