Default resources java options #1775
Conversation
… pods have good default cpu/memory resources
I will need to re-review the FAQ after all the literary and technical edits are completed. Please let me know when it's done.
@@ -0,0 +1,84 @@
---
title: "Considerations for Pod Resource (Memory and CPU) Requests and Limits"
Considerations for Pod Resource (Memory and CPU) Requests and Limits -> Considerations for Pod resource (memory and CPU) requests and limits (we use sentence capitalization instead of title capitalization; appears more user friendly)
draft: true
weight: 40
---
The operator creates a pod for each running WebLogic Server instance and each pod will have a container. It.s important that containers have enough resources in order for applications to run efficiently and expeditiously.
pod -> Pod (globally, if you are referring to a Kubernetes resource)
It.s -> It's (typo)
If a pod is scheduled on a node with limited resources, it.s possible for the node to run out of memory or CPU resources, and for applications to stop working properly or have degraded performance. It.s also possible for a rouge application to use all available memory and/or CPU, which makes other containers running on the same system unresponsive. The same problem can happen if an application has memory leak or bad configuration.
node -> Node (globally, if you are referring to a Kubernetes resource)
It.s -> It's (typo, globally)
rouge -> rogue
has memory leak -> has a memory leak
A pod.s resource requests and limit parameters can be used to solve these problems. Setting resource limits prevents an application from using more than it.s share of resource. Thus, limiting resources improves reliability and stability of applications. It also allows users to plan for the hardware capacity. Additionally, pod.s priority and the Quality of Service (QoS) that pod receives is affected by whether resource requests and limits are specified or not.
pod.s -> pod's (typo, globally; your apostrophes are periods)
of resource -> of resources
It also allows users to plan -> Also, it lets you plan
pod.s priority and the Quality of Service (QoS) that pod receives -> the pod's priority and Quality of Service (QoS) that the pod receives
are specified or not. -> are specified.
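
For readers following this thread, a minimal, illustrative sketch of how such requests and limits are expressed on a pod's container (the pod name, image, and values here are hypothetical, not taken from this PR):

```
apiVersion: v1
kind: Pod
metadata:
  name: wls-example                  # hypothetical pod name
spec:
  containers:
  - name: weblogic-server
    image: example/weblogic:latest   # illustrative image reference
    resources:
      requests:                      # minimum reserved for the container
        memory: "768Mi"
        cpu: "500m"                  # 500 millicores = half a CPU core
      limits:                        # hard cap the container may not exceed
        memory: "1Gi"
        cpu: "1"
```

Exceeding a memory limit gets the container OOM killed, while exceeding a CPU limit only causes throttling, which matches the compressible-resource discussion later in this review.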
## Pod Quality Of Service (QoS) and Prioritization
Prioritization -> prioritization
## Beware of setting resource limits too high
It.s important to keep in mind that if you set a value of CPU core count that.s larger than core count of the biggest node, then the pod will never be scheduled. Let.s say you have a pod that needs 4 cores but you have a kubernetes cluster that.s comprised of 2 core VMs. In this case, your pod will never be scheduled. WebLogic applications are normally designed to take advantage of multiple cores and should be given CPU requests as such. CPUs are considered as a compressible resource. If your apps are hitting CPU limits, kubernetes will start to throttle your container. This means your CPU will be artificially restricted, giving your app potentially worse performance. However it won.t be terminated or evicted.
Just like CPU, if you put a memory request that.s larger than amount of memory on your nodes, the pod will never be scheduled.
than amount -> than the amount
## CPU Affinity and lock contention in k8s
Affinity -> affinity
k8s -> Kubernetes (globally)
We observed much higher lock contention in k8s env when running some workloads in kubernetes as compared to traditional env. The lock contention seem to be caused by the lack of CPU cache affinity and/or scheduling latency when the workload moves to different CPU cores.
env -> environment (globally)
seem -> seems
In traditional (non-k8s) environment, often tests are run with CPU affinity achieved by binding WLS java process to particular CPU core(s) (using taskset command). This results in reduced lock contention and better performance.
environment -> environments
java -> Java (always capitalized)
using taskset -> using the taskset
In k8s environment. when CPU manager policy is configured to be "static" and QOS is "Guaranteed" for WLS pods, we see reduced lock contention and better performance. The default CPU manager policy is "none" (default). Please refer to controlling CPU management policies for more details.
when CPU manager policy -> when the CPU manager policy
QOS -> QoS
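
Since this thread discusses the "static" CPU manager policy, a hedged sketch of the relevant kubelet settings, assuming the `KubeletConfiguration` file format (the static policy also requires a non-zero CPU reservation; see the Kubernetes CPU management policies documentation for the full setup):

```
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static   # the default policy is "none"
systemReserved:
  cpu: "500m"              # static policy needs some CPU reserved for the system
```

With the static policy, containers in `Guaranteed` QoS pods that request integer CPU counts are granted exclusive cores, which is what reduces the cache-affinity and scheduling effects described above.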
An operator creates a pod for each WebLogic Server instance and each pod will have a container. It's important that the container has enough resources in order for WebLogic to run efficiently.

If a pod is scheduled on a node with limited resources, then it's possible for node to run out of memory or CPU resources, and for the pod's applications to stop working properly or have degraded performance. It's also possible for a rogue application to use all of a node's available memory and/or CPU, which makes other containers running on the same node unresponsive. The same problem can happen if an application has a memory leak.
The phrasing, "It's also possible for a rogue application..." sounds scarier than I think we need. Customers understand that application and server usage of resources need to be controlled and limited. We might borrow phrases from the current WebLogic sizing guide.
We have rephrased the introduction section and removed the phrasing about "rogue application using up all available resources".
For most use cases, Oracle recommends configuring WebLogic pods with memory and CPU requests and limits, and furthermore setting requests equal to their respective limits in order to ensure a `guaranteed` QoS.
{{% /notice %}}

In newer version of Kubernetes, it is possible to fine tune scheduling and eviction policies using [Pod Priority Preemption](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/). TBD Ryan - is it possible to change the priority class of a pod?
Yes, we have serverPod.priorityClassName.
Thanks. We have included wording about priority class tuning and two PriorityClasses shipped with Kubernetes.
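
To illustrate the replies above, a sketch of a custom `PriorityClass` and how it might be referenced through `serverPod.priorityClassName` (the class name and value are hypothetical; Kubernetes itself ships the two built-in classes `system-node-critical` and `system-cluster-critical`):

```
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: wls-high-priority            # hypothetical class name
value: 100000                        # larger value = higher scheduling priority
globalDefault: false
description: "Higher scheduling priority for WebLogic Server pods."
---
# Referenced from the Domain resource (fragment, field per the reply above):
# spec:
#   serverPod:
#     priorityClassName: wls-high-priority
```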
```
value: "-XX:MinRAMPercentage=25.0 -XX:MaxRAMPercentage=50.0 -Djava.security.egd=file:/dev/./urandom"
```

Additionally there's also a node-manager process that's running in the same container as the WebLogic Server which has its own heap and off-heap requirements. Its heap is tuned by using `-Xms` and `-Xmx` in the `NODEMGR_MEM_ARGS` environment variable. Oracle recommends setting the node manager heap memory to fixed sizes, instead of percentages, where [the default tuning]({{< relref "/userguide/managing-domains/domain-resource#jvm-memory-and-java-option-environment-variables" >}}) is usually sufficient.
Will readers understand "off-heap" requirements? I've heard this described as native memory.
Changed "off-heap" to "native memory".
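
As a concrete illustration of the paragraph under review, a fragment showing how the node manager heap might be pinned to fixed sizes through `NODEMGR_MEM_ARGS` in the Domain resource (the sizes are illustrative, not a recommendation):

```
spec:
  serverPod:
    env:
    - name: NODEMGR_MEM_ARGS
      # fixed -Xms/-Xmx heap bounds instead of percentages (values illustrative)
      value: "-Xms64m -Xmx100m -Djava.security.egd=file:/dev/./urandom"
```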
{{% notice warning %}}
If you set `USER_MEM_ARGS` or `NODEMGR_MEM_ARGS` in your domain resource, then it is usually recommended to include `-Djava.security.egd=file:/dev/./urandom` in order to speedup boot times on systems with low entropy. This setting is included in the respective defaults for these two environment variables.
Why is it only "usually" recommended?
This was changed to match with other parts of Operator documentation.
- A WebLogic Server self-tuning work-manager may incorrectly optimize the number of threads it allocates for the default thread pool.

It's also important to keep in mind that if you set a value of CPU core count that's larger than core count of your biggest node, then the pod will never be scheduled. Let's say you have a pod that needs 4 cores but you have a kubernetes cluster that's comprised of 2 core VMs. In this case, your pod will never be scheduled.
It would be more helpful to show an example of the error when a Pod is never scheduled (it will be stuck in "Pending").
Added an example showing pod in Pending state.
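
For reference, the symptom looks roughly like the following (the namespace and pod name are hypothetical): the pod stays in `Pending`, and `kubectl describe` shows a `FailedScheduling` event naming the insufficient resource:

```
$ kubectl get pods -n sample-domain1-ns
NAME                             READY   STATUS    RESTARTS   AGE
sample-domain1-managed-server1   0/1     Pending   0          3m

$ kubectl describe pod sample-domain1-managed-server1 -n sample-domain1-ns
...
Events:
  Type     Reason            Message
  Warning  FailedScheduling  0/2 nodes are available: 2 Insufficient cpu.
```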
### Measuring JVM heap, pod CPU, and pod memory

TBD Discuss/link to Grafana/Prometheus.
This is a good idea... You can also link to OKE monitoring.
We added links to "Monitoring a SOA domain" which provides steps for setting up Prometheus and Grafana in order to monitor pod memory and CPU resources and setting up WebLogic monitoring exporter for JVM heap monitoring.
Also, added links to "Tools for Monitoring Resources" in the Kubernetes documentation. We didn't find an option to monitor pod level resources in OKE monitoring. There's a section about monitoring the OKE cluster and a section about node-level monitoring in the OCI IaaS documentation. Please let me know if we need to link a particular section in the OKE documentation. Thanks.
# data storage directories are determined from the WebLogic domain home configuration.
dataHome: "%DATA_HOME%"

Remove extra space
Removed extra space in commit 686e19e. Thanks
---
title: "Pod Memory and CPU Resources"
date: 2020-06-30T08:55:00-05:00
draft: false
Placeholder request for @rosemarymarano to please suggest a description and weight that fit in with the updates you did for the other FAQs.
Yes. I'll provide that input on my final review of the FAQ.
### Introduction

An operator creates a pod for each WebLogic Server instance and each pod will have a container. You can tune pod container memory and/or CPU usage by configuring Kubernetes resource requests and limits, and you can tune a WebLogic JVM heap usage using the `USER_MEM_ARGS` environment variable in your domain resource. A resource request sets the minimum amount of a resource that a container requires. A resource limit is the maximum amount of resource a container is given and prevents a container from using more than its share of a resource. Additionally, resource requests and limits determine a pod's Quality of Service.
"An operator" -> "The operator"
I think I would say it like this, "The operator creates a container in its own Pod for each WebLogic Server instance."
"pod container memory" -> "container memory"
"and/or" -> "and"
"maximum amount of resource" -> "maximum amount of a resource"
I don't think "Quality of Service" should be capitalized here.
Also, we've been matching the style in the Kubernetes doc (on their site) and capitalized the names of all resources when specifically referenced as a resource, so all "pod" -> "Pod" and "domain resource" -> "Domain resource".
I have made changes based on above review comments. Thanks.
A pod's Quality of Service (QoS) is based on whether it's configured with resource requests and limits:

- **Best Effort QoS** (lowest priority): If you don't configure requests and limits for a pod, then the pod is given a `best-effort` QoS. In cases where a node runs out of non-shareable resources, a Kubernetes `kubelet` default out-of-resource eviction policy evicts running pods with the `best-effort` QoS first.
"a Kubernetes kubelet
default out-of-resource" -> "the default"
Changed the wording to "the default".
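
To make the QoS classes concrete, a minimal sketch with hypothetical names and values: omitting `resources` entirely yields `BestEffort`, setting requests lower than limits yields `Burstable`, and setting requests equal to limits for every container yields `Guaranteed`:

```
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo               # hypothetical pod name
spec:
  containers:
  - name: server
    image: example/app:1.0     # illustrative image reference
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
      limits:                  # requests == limits for every resource,
        memory: "1Gi"          # so Kubernetes assigns the Guaranteed class
        cpu: "1"
```

The assigned class can be confirmed with `kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'`.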
### Java heap size and memory resource considerations

{{% notice note %}}
For most use cases, Oracle recommends configuring Java heap sizes for WebLogic pods instead of relying on defaults.
Does a use case exist where we wouldn't recommend this? If not, you can remove "For most use cases,"
Removed "For most use cases,"
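
A hedged sketch of setting an explicit heap through `USER_MEM_ARGS` in the Domain resource, per the note above (sizes are illustrative; the heap plus native memory must still fit within the container's memory limit):

```
spec:
  serverPod:
    env:
    - name: USER_MEM_ARGS
      # explicit heap bounds instead of JVM defaults (values illustrative)
      value: "-Xms512m -Xmx1024m -Djava.security.egd=file:/dev/./urandom"
```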
### CPU resource considerations

It's important to set both a CPU request and a limit for WebLogic Server pods. This ensures that all WebLogic server pods have enough CPU resources, and, as discussed earlier, if the request and limit are set to the same value, then they get a `guaranteed` QoS. A `guaranteed` QoS ensures the pods are handled with a higher priority during scheduling and so are the least likely to be evicted.
Aren't they guaranteed to not be evicted?
It's possible for a Guaranteed Pod to be evicted in some cases.
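
Tying this thread together, a sketch of equal CPU and memory requests and limits on the Domain resource's `serverPod` (the `apiVersion`, name, and values are illustrative; use the version that matches your operator release):

```
apiVersion: weblogic.oracle/v8   # illustrative; depends on operator release
kind: Domain
metadata:
  name: sample-domain1           # hypothetical domain name
spec:
  serverPod:
    resources:
      requests:
        cpu: "1"                 # request == limit for both resources,
        memory: "1280Mi"         # which yields the guaranteed QoS class
      limits:
        cpu: "1"
        memory: "1280Mi"
```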
Approve subject to Rosemary's final pass.
Similarly, pre-approving anticipating Rosemary's edits.
Changes for OWLS-80384 - Verify that operator deployment and WebLogic pods have good default cpu/memory resources.
Changed WLS samples to use the defaults below -
Changed JRF samples to use the defaults below -
Created first draft of FAQ document for resource requests/limits and Java options for heap size considerations.