Closed
Should we allow configurable head group resources? That would help development on a laptop with 8 CPUs and 16 GB of RAM. The head group resources are currently set in codeflare-sdk/src/codeflare_sdk/templates/base-template.yaml (lines 145 to 152 in 52b94c4).
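The proposal could look roughly like this: the head container's requests and limits become parameters instead of fixed template values. This is a minimal sketch under stated assumptions. `HeadResources` and `head_container_resources` are hypothetical names, not part of codeflare-sdk. The defaults (2 CPUs, 8 GB) are inferred from the scheduler log below (3 CPUs and 9 GB requested in total, minus one worker of 1 CPU and 1 GB), not read from base-template.yaml. The output uses the Kubernetes container `resources` shape (`requests`/`limits` with `cpu` and `memory`).

```python
from dataclasses import dataclass


@dataclass
class HeadResources:
    """Hypothetical knob for the Ray head group's CPU and memory."""

    # Defaults inferred from the scheduler log (3 CPU / 9 GB total minus one
    # 1 CPU / 1 GB worker); an assumption, not read from base-template.yaml.
    cpu: int = 2
    memory_gb: int = 8


def head_container_resources(head: HeadResources) -> dict:
    """Return a Kubernetes-style container `resources` block for the head pod."""
    quantity = {"cpu": head.cpu, "memory": f"{head.memory_gb}G"}
    # Requests and limits are kept equal in this sketch (a design assumption).
    return {"requests": dict(quantity), "limits": dict(quantity)}


# A laptop-sized head group: 1 CPU and 2 GB instead of the inferred 2 CPU / 8 GB.
print(head_container_resources(HeadResources(cpu=1, memory_gb=2)))
```

Keeping requests equal to limits is only a choice made for this sketch; a real option might expose them separately.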
The node's resource allocation with only the codeflare-stack deployed (without any ODH component) was:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests      Limits
  --------  --------      ------
  cpu       2445m (31%)   1100m (14%)
  memory    8442Mi (55%)  768Mi (5%)
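Creating a cluster on OpenShift Local on an 8-CPU x 16 GB workstation fails with insufficient resources:

from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration
cluster = Cluster(ClusterConfiguration(namespace="default", name="torch", min_worker=1, max_worker=1, min_cpus=1, max_cpus=1, min_memory=1, max_memory=1, gpu=0, instascale=False))

The MCAD controller log shows the AppWrapper requesting 3 CPUs (3000m) and 9 GB of memory, while only 5365m CPU and about 7.1 GB of memory are idle, so the AppWrapper is not dispatched: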
I0630 22:41:52.423038 1 queuejob_controller_ex.go:1009] [getAggAvaiResPri] cpu 5365.00, memory 7098806272.00, GPU 0 available resources to schedule
I0630 22:41:52.423066 1 queuejob_controller_ex.go:1260] [ScheduleNext] XQJ torch with resources cpu 3000.00, memory 9000000000.00, GPU 0 to be scheduled on aggregated idle resources cpu 5365.00, memory 7098806272.00, GPU 0
I0630 22:41:52.423204 1 queuejob_controller_ex.go:1336] [ScheduleNext] HOL Blocking by torch for 163.595µs activeQ=false Unsched=true &qj=0xc0007df900 Version=61385 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:9 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-06-30 22:40:12.010146 +0000 UTC ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-06-30 22:40:12.010149 +0000 UTC LastTransitionMicroTime:2023-06-30 22:40:12.01015 +0000 UTC Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-06-30 22:40:12.010603 +0000 UTC LastTransitionMicroTime:2023-06-30 22:40:12.010605 +0000 UTC Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-06-30 22:40:12.082065 +0000 UTC LastTransitionMicroTime:2023-06-30 22:40:12.082067 +0000 UTC Reason:FrontOfQueue. Message:} {Type:Backoff Status:True LastUpdateMicroTime:2023-06-30 22:40:32.322476 +0000 UTC LastTransitionMicroTime:2023-06-30 22:40:32.322478 +0000 UTC Reason:AppWrapperNotRunnable. Message:Insufficient resources to dispatch AppWrapper.}] PendingPodConditions:[]}
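The `[ScheduleNext]` line reports both the AppWrapper's total demand and the aggregated idle capacity, so the failure can be checked by hand. The sketch below copies those numbers and compares them per resource. It is a simplified all-or-nothing comparison, not MCAD's actual dispatch code.

```python
# Figures copied from the [ScheduleNext] log line above:
# CPU in millicores, memory in bytes.
requested = {"cpu": 3000.0, "memory": 9000000000.0, "gpu": 0.0}
available = {"cpu": 5365.0, "memory": 7098806272.0, "gpu": 0.0}


def shortfalls(requested: dict, available: dict) -> dict:
    """Return each resource whose request exceeds the idle amount, with the deficit."""
    return {
        name: amount - available.get(name, 0.0)
        for name, amount in requested.items()
        if amount > available.get(name, 0.0)
    }


deficit = shortfalls(requested, available)
print(deficit)  # {'memory': 1901193728.0}
```

Only memory falls short, by about 1.9 GB; the 3000m CPU request fits within the 5365m idle. With the same idle capacity, a head group using roughly 2 GB less memory would let this one-worker cluster fit.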
My OpenShift Local configuration:
crc config view
- consent-telemetry : yes
- cpus : 8
- disk-size : 80
- memory : 16000
- network-mode : user
- pull-secret-file : /home/tedchang/secret.json