Description
- PCUI has been amazing. Thank you. My bug:
- I am running a Snakemake pipeline with ~2000 tasks to complete, and I allow up to 200 jobs in the queue at a time (also limited by my max spot quota).
- When I go to the jobs list in PCUI, it stalls a bit, error messages begin to appear, and eventually behind them the list of jobs appears.
Steps to reproduce the issue
- Launch a lot of jobs, open the jobs tab in PCUI.
Expected behaviour
Job status errors
- To see the jobs list as it appears with fewer jobs in queue.
Actual behaviour
- Open Jobs status
- Spins a moment
- Every 10s or so, an error appears in a red bar. There are a variety:
Error: Expecting property name enclosed in double quotes: line 1176 column 5 (char 23980)
Error: Expecting value: line 1177 column 1 (char 23980)
Error: Unterminated string starting at: line 1176 column 5 (char 23973)
Error: Expecting property name enclosed in double quotes: line 1176 column 4 (char 23980)

- The jobs list does eventually begin to appear.
- Once the jobs list has appeared, it seems to load without error messages for a while.
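One possible reading of the errors above (an assumption on my part, not confirmed against the PCUI source): all of the reported offsets sit just under 24,000 characters, which is what you would see if the JSON job listing were cut off at a fixed size before being parsed. The sketch below uses a made-up payload (the field names `job_id`, `name`, `job_state` and the helper `parse_error_at` are hypothetical) and shows that cutting pretty-printed JSON at slightly different points produces this same family of messages from Python's `json` module.

```python
import json

# Hypothetical stand-in for a large job listing; the field names are invented.
jobs = {
    "jobs": [
        {"job_id": i, "name": f"task_{i}", "job_state": "PENDING"}
        for i in range(2000)
    ]
}
full = json.dumps(jobs, indent=2)


def parse_error_at(text, limit):
    """Parse text cut to `limit` characters; return the decoder's message, or None."""
    try:
        json.loads(text[:limit])
    except json.JSONDecodeError as exc:
        return exc.msg
    return None


# Try 100 neighbouring cut points just below 24,000 characters.
messages = {parse_error_at(full, limit) for limit in range(23900, 24000)}

for msg in sorted(m for m in messages if m):
    print(msg)
```

If truncation is what is happening, the exact message would depend on where the cut lands, which would match the variety seen in the red bar.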
Job ID Link Error
- This bug happens with every job, independent of the errors caused by the large number of jobs reported above. When clicking on a Job status ID from the ID column, I get the following error for every ID:
Error: not enough values to unpack (expected 2, got 1)
- New, and only occurring when the large-number-of-jobs behavior is seen, I also get this error:
Error: An error occurred while trying to complete your request. Please try again later. If the problem persists, please contact support for further assistance.
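For reference, the "not enough values to unpack (expected 2, got 1)" text is the standard Python message when a two-name assignment receives a one-element sequence. A common way to hit it is splitting `Key=Value` tokens on `=` when one token has no `=`. Whether PCUI does this is an assumption; the sample line and both helper functions below are hypothetical and only illustrate the failure mode and one tolerant alternative.

```python
# Hypothetical job-detail line; the trailing "arg1" token has no "=".
sample = "JobId=42 JobName=snakejob UserId=ubuntu(1000) Command=/fsx/run.sh arg1"


def parse_naive(line):
    """Split every whitespace-separated token on '=' and expect exactly two parts."""
    fields = {}
    for token in line.split():
        key, value = token.split("=")  # raises ValueError when there is no '='
        fields[key] = value
    return fields


def parse_tolerant(line):
    """Same idea, but skip tokens that do not contain '='."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # token without '=': ignore instead of failing
        fields[key] = value
    return fields


try:
    parse_naive(sample)
    naive_error = None
except ValueError as exc:
    naive_error = str(exc)

print(naive_error)
print(parse_tolerant(sample))
```

Using `partition` also keeps values that themselves contain `=` intact, since it only splits on the first occurrence.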

Required info
In order to help us determine the root cause of the issue, please provide the following information:
- Region PCUI : us-west-2
- AZ of cluster: us-west-2d
- version PCUI: public.ecr.aws/pcm/parallelcluster-ui:2024.10.0 (is this it?)
- version pcluster: 3.11.1
Additional info
The following information is not required but helpful:
- I connect to PCUI from a Mac via Chrome
If having problems with cluster creation or update
My cluster yaml:
---
Region: us-west-2
Image:
  Os: ubuntu2204
HeadNode:
  InstanceType: r7i.2xlarge
  Networking:
    ElasticIp: true
    SubnetId: subnet-pub
  DisableSimultaneousMultithreading: false
  Ssh:
    KeyName: KEY # must be ed25519 for ubuntu
    AllowedIps: "0.0.0.0/0" # SET THIS TO YOUR DESIRED FILTER
  Dcv:
    Enabled: false
  LocalStorage:
    RootVolume:
      Size: 775
      VolumeType: gp3
      DeleteOnTermination: true
    EphemeralVolume:
      MountDir: /head_root
  CustomActions:
    OnNodeConfigured:
      Script:
        s3://BUCKET/cluster_boot_config/post_install_ubuntu_combined.sh # head and each compute can have different scripts if desired
      Args:
        - us-west-2
        - BUCKET
        - na
        - na
  Iam:
    S3Access:
      - BucketName: BUCKET
        EnableWriteAccess: false
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::acct:policy/pclusterTagsAndBudget
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    EnableMemoryBasedScheduling: false
    ScaledownIdletime: 5
    Dns:
      DisableManagedDns: false
    QueueUpdateStrategy: DRAIN
  SlurmQueues:
    - Name: i8
      CapacityType: SPOT
      AllocationStrategy: lowest-price
      ComputeResources:
        - Name: r7gb64
          Instances:
            - InstanceType: r7i.2xlarge
          MinCount: 0
          MaxCount: 22
          SpotPrice: 1.2488 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
        - Name: r6gb64
          Instances:
            - InstanceType: r6i.2xlarge
          MinCount: 0
          MaxCount: 22
          SpotPrice: 1.2462 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
      Networking:
        SubnetIds:
          - subnet-012424a948f57e9ee
      CustomActions:
        OnNodeConfigured:
          Script:
            s3://BUCKET2/cluster_boot_config/post_install_ubuntu_combined.sh
          Args:
            - us-west-2
            - BUCKET
            - na
            - na
      Iam:
        S3Access:
          - BucketName: BUCKET
            EnableWriteAccess: false
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::acct:policy/pclusterTagsAndBudget
          - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    - Name: i128
      CapacityType: SPOT
      AllocationStrategy: lowest-price
      ComputeResources:
        - Name: c6gb256
          Instances:
            - InstanceType: c6i.metal
            - InstanceType: c6i.32xlarge
          MinCount: 0
          MaxCount: 22
          SpotPrice: 1.8034 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
        - Name: m6gb512
          Instances:
            - InstanceType: m6i.32xlarge
            - InstanceType: m6i.metal
          MinCount: 0
          MaxCount: 22
          SpotPrice: 2.1581 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
        - Name: r6gb1024r6
          Instances:
            - InstanceType: r6i.metal
            - InstanceType: r6i.32xlarge
          MinCount: 0
          MaxCount: 22
          SpotPrice: 2.0494 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
      Networking:
        SubnetIds:
          - subnet-012424a948f57e9ee
      CustomActions:
        OnNodeConfigured:
          Script:
            s3://BUCKET2/cluster_boot_config/post_install_ubuntu_combined.sh
          Args:
            - us-west-2
            - BUCKET
            - na
            - na
      Iam:
        S3Access:
          - BucketName: BUCKET
            EnableWriteAccess: false
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::acct:policy/pclusterTagsAndBudget
          - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    - Name: i192
      CapacityType: SPOT
      AllocationStrategy: lowest-price
      ComputeResources:
        - Name: c7gb384
          Instances:
            - InstanceType: c7i.48xlarge
            - InstanceType: c7i.metal-48xl
          MinCount: 0
          MaxCount: 22
          SpotPrice: 2.4093 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
        - Name: m7gb768
          Instances:
            - InstanceType: m7i.metal-48xl
            - InstanceType: m7i.48xlarge
          MinCount: 0
          MaxCount: 22
          SpotPrice: 2.5016 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
        - Name: r7gb1536
          Instances:
            - InstanceType: r7i.48xlarge
            - InstanceType: r7i.metal-48xl
          MinCount: 0
          MaxCount: 22
          SpotPrice: 2.209 # Calculated using (median spot price)+1.01.
          Networking:
            PlacementGroup:
              Enabled: false
          Efa:
            Enabled: false
      Networking:
        SubnetIds:
          - subnet-012424a948f57e9ee
      CustomActions:
        OnNodeConfigured:
          Script:
            s3://BUCKET/cluster_boot_config/post_install_ubuntu_combined.sh
          Args:
            - us-west-2
            - BUCKET
            - na
            - na
      Iam:
        S3Access:
          - BucketName: BUCKET
            EnableWriteAccess: false
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::acct:policy/pclusterTagsAndBudget
          - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
Monitoring:
  DetailedMonitoring: false
  Logs:
    CloudWatch:
      Enabled: true
      RetentionInDays: 3 # must be 0,1,3,5,7,14,30,60,90...
SharedStorage: # This is the local FS which is expensive but fast, could be swapped for EFS, etc.
  - MountDir: /fsx # The cost of this will be roughly $22.93 per day, so it should not be kept hot unless in active use.
    Name: fsx-daylily-07123j # WARNING, EDIT NAME WILL DEL EXISTING DATA
    StorageType: FsxLustre
    FsxLustreSettings:
      ImportPath: s3://BUCKET/data/
      StorageCapacity: 4800
      DeploymentType: SCRATCH_2
      AutoImportPolicy: NEW_CHANGED_DELETED
      DeletionPolicy: Retain # Set to true to keep the FSX after the cluster is deleted
Tags: # TAGs necessary for per-user/project/job cost tracking
  - Key: aws-parallelcluster-username
    Value: daylily
  - Key: aws-parallelcluster-jobid
    Value: NA
  - Key: aws-parallelcluster-project
    Value: da-us-west-2d-daylily-07123j
  - Key: aws-parallelcluster-clustername
    Value: daylily-07123j
  - Key: aws-parallelcluster-enforce-budget
    Value: enforce
DevSettings:
  Timeouts:
    HeadNodeBootstrapTimeout: 3600
    ComputeNodeBootstrapTimeout: 3600
...
If having problems with custom image creation
n/a