Nginx Ingress Pods Consuming Too Much Memory #8166

Closed
@ParagPatil96

Description

NGINX Ingress controller version

NGINX Ingress controller
Release: v1.1.1
Build: a17181e
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9


Kubernetes version

version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.14-gke.1900", GitCommit:"abc4e63ae76afef74b341d2dba1892471781604f", GitTreeState:"clean", BuildDate:"2021-09-07T09:21:04Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCP

  • OS: Container-Optimized OS from Google

  • Kernel: 5.4.129+

  • How was the ingress-nginx-controller installed:

    • We have used Helm to install the NGINX Ingress Controller; the following are the values we provided (a sketch of the corresponding Helm command appears after this Environment section):
    ingress-nginx:
      controller:
        priorityClassName: "high-priority"
        replicaCount: 3
        image:
          registry: gcr.io
          image: fivetran-webhooks/nginx-ingress-controller
          tag: "v1.1.1"
          digest: ""
          pullPolicy: Always
        extraArgs:
          shutdown-grace-period: 60
        labels:
          app.kubernetes.io/part-of: wps
          wps-cloud-provider: gcp
          wps-location: <us/eu/uk>
        podLabels:
          app.kubernetes.io/part-of: wps
          wps-cloud-provider: gcp
          wps-location: <us/eu/uk>
        podAnnotations:
          fivetran.com/scrape-prometheus: "true"
          prometheus.io/port: "10254"
          cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
          fivetran.com/fivetran-app: "true"
        minReadySeconds: 30
        updateStrategy:
          rollingUpdate:
            maxSurge: 1
            maxUnavailable: 0
          type: RollingUpdate
        resources:
          requests:
            cpu: 2
            memory: 2Gi
        autoscaling:
          enabled: true
          minReplicas: 3
          maxReplicas: 9
          targetCPUUtilizationPercentage: 75
          targetMemoryUtilizationPercentage: 75
        service:
          enabled: true
          loadBalancerIP: null
        admissionWebhooks:
          enabled: false
        config:
          # logging config
          log-format-upstream: '{"logtype":"request_entry","status": $status, "request_id": "$req_id", "host": "$host", "request_proto": "$server_protocol", "path": "$uri", "request_query": "$args", "request_length": $request_length,  "request_time": $request_time, "method": "$request_method", "time_local": "$time_local", "remote_addr": "$remote_addr", "remote_user": "$remote_user", "http_referer": "$http_referer", "http_user_agent": "$http_user_agent", "body_bytes_sent": "$body_bytes_sent", "bytes_sent": "$bytes_sent", "upstream_addr": "$upstream_addr", "upstream_response_length": "$upstream_response_length", "upstream_response_time": "$upstream_response_time", "upstream_status": "$upstream_status"}'
          log-format-escape-json: 'true'

          # request contents config
          proxy-body-size: 9m
          client-body-buffer-size: 9m

          # request timeout config
          client-body-timeout: '300'
          client-header-timeout: '300'
          proxy-read-timeout: '300'
          proxy-send-timeout: '300'

          # upstream pod retry
          proxy-next-upstream: 'error timeout http_500 http_502 http_503 http_504 non_idempotent'
          proxy-next-upstream-timeout: '60'
          proxy-next-upstream-tries: '0'
          ssl-redirect: "false"

          # https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#load-balance
          load-balance: ewma
          worker-processes: '3' #As we are using 3 cpu for ingress controller pods

          # Recovery Server
          location-snippet: |
            proxy_intercept_errors on;
            error_page 500 501 502 503 = @fivetran_recovery;
          server-snippet: |
            location @fivetran_recovery {
              proxy_pass http://{Recovery Collector's ClusterIP service's IP address};
            }
    
  • Current State of the controller:

    • kubectl describe ingressclasses
    Name:         nginx
    Labels:       app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=wps-us
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=wps
                  app.kubernetes.io/version=1.1.1
                  helm.sh/chart=ingress-nginx-4.0.15
                  wps-cloud-provider=gcp
                  wps-location=us
    Annotations:  meta.helm.sh/release-name: wps-us
                  meta.helm.sh/release-namespace: ingress-nginx
    Controller:   k8s.io/ingress-nginx
    Events:       <none>
    
  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> describe ing <ingressname>
      Name:             collector-ingress
      Labels:           app.kubernetes.io/managed-by=Helm
      Namespace:        collector
      Address:          xxxxx
      Default backend:  default-http-backend:80 (10.18.48.196:8080)
      TLS:
        webhooks-ssl-certificate terminates 
        events-ssl-certificate terminates 
      Rules:
        Host                   Path  Backends
        ----                   ----  --------
        webhooks.fivetran.com  
                               /webhooks/|/internal/|/snowplow/|/health$|/api_docs|/swagger*   collector-service:80 (10.18.48.39:8001,10.18.52.54:8001,10.18.54.56:8001)
        events.fivetran.com    
                               /*   collector-service:80 (10.18.48.39:8001,10.18.52.54:8001,10.18.54.56:8001)
      Annotations:             kubernetes.io/ingress.class: nginx
                               meta.helm.sh/release-name: wps-us
                               meta.helm.sh/release-namespace: ingress-nginx
                               nginx.ingress.kubernetes.io/use-regex: true
      Events:                  <none>
      
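For context, the Helm values above (under "How was the ingress-nginx-controller installed") are nested under an ingress-nginx key, i.e. applied through a wrapper chart. Below is a minimal sketch of how they might be applied with Helm: the chart path and values file name are assumptions, while the release name and namespace are taken from the meta.helm.sh annotations shown above.

    # Sketch only: the chart path (./wps-ingress) and values file (values-us.yaml) are hypothetical.
    # The release name (wps-us) and namespace (ingress-nginx) match the annotations above.
    helm upgrade --install wps-us ./wps-ingress \
      --namespace ingress-nginx \
      --values values-us.yaml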

What happened:
Our ingress controller is serving ~1500 RPS, but over time the controller's memory usage keeps increasing and never goes back down; once it crosses the node limit (~15 GB), the pods get evicted.

What you expected to happen:
We expect memory usage to stabilize at some point.

profiling heap export:
high_mem.txt
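
For reference, a heap profile like high_mem.txt can be captured along these lines, assuming the controller's Go profiler is enabled (the --profiling flag defaults to true) and listens on the pod-local profiler port (10245 in recent ingress-nginx releases); the pod name is a placeholder:

    # Forward the pod-local profiler port (assumed to be 127.0.0.1:10245 inside the pod).
    kubectl -n ingress-nginx port-forward pod/<ingress-nginx-controller-pod> 10245:10245
    # In another terminal, fetch the heap profile and print the top allocations.
    go tool pprof -top http://127.0.0.1:10245/debug/pprof/heap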

For now, we are manually restarting the worker processes in order to release the memory.
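
One way such a restart might be done (a sketch only, assuming the nginx binary inside the controller container accepts the standard signals): signal the NGINX master process to reload, which spawns fresh workers and gracefully shuts down the old ones, releasing their memory. The pod name is a placeholder:

    # Sketch: a reload replaces the worker processes without restarting the pod.
    kubectl -n ingress-nginx exec <ingress-nginx-controller-pod> -- nginx -s reload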
