ci-kubernetes-e2e-gci-gce-scalability-watch-list-off has unexpectedly high LIST latency #2287

Open

Description

@mborsz

I was reviewing kube-apiserver performance in the watchlist-off tests (https://k8s-testgrid.appspot.com/sig-scalability-experiments#watchlist-off) and found that LIST latency there is around 40s:

I0609 08:33:55.589834      11 trace.go:236] Trace[2051566814]: "SerializeObject" audit-id:8a686097-a6c2-4869-9ff1-27f5b7a9dce5,method:GET,url:/api/v1/namespaces/watch-list-1/secrets,protocol:HTTP/2.0,mediaType:application/vnd.kubernetes.protobuf,encoder:{"encodeGV":"v1","encoder":"protobuf","name":"versioning"} (09-Jun-2023 08:33:11.330) (total time: 44259ms):
Trace[2051566814]: ---"Write call succeeded" writer:*gzip.Writer,size:304090734,firstWrite:true 43754ms (08:33:55.589)
Trace[2051566814]: [44.259564981s] [44.259564981s] END
I0609 08:33:55.589912      11 trace.go:236] Trace[1886779945]: "List" accept:application/vnd.kubernetes.protobuf,application/json,audit-id:8a686097-a6c2-4869-9ff1-27f5b7a9dce5,client:35.226.210.156,protocol:HTTP/2.0,resource:secrets,scope:namespace,url:/api/v1/namespaces/watch-list-1/secrets,user-agent:watch-list/v0.0.0 (linux/amd64) kubernetes/$Format,verb:LIST (09-Jun-2023 08:33:11.328) (total time: 44260ms):
Trace[1886779945]: ---"Writing http response done" count:400 44259ms (08:33:55.589)
Trace[1886779945]: [44.260928281s] [44.260928281s] END
I0609 08:33:55.590150      11 httplog.go:132] "HTTP" verb="LIST" URI="/api/v1/namespaces/watch-list-1/secrets?limit=500&resourceVersion=0" latency="44.263133267s" userAgent="watch-list/v0.0.0 (linux/amd64) kubernetes/$Format" audit-ID="8a686097-a6c2-4869-9ff1-27f5b7a9dce5" srcIP="35.226.210.156:45202" apf_pl="workload-low" apf_fs="service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="44.262028085s" resp=200
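
For completeness, the same request can be reproduced outside the test framework with a plain client-go LIST against that namespace. The sketch below is only illustrative (the kubeconfig path is a placeholder); the namespace, limit and resourceVersion are copied from the httplog line above:

// Minimal sketch, not from the test itself: issues the same LIST call via
// client-go so the end-to-end latency can be measured in isolation.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Assumes a kubeconfig pointing at the test cluster (path is a placeholder).
    config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
    if err != nil {
        log.Fatal(err)
    }
    // Ask for protobuf, like the test client in the trace above does.
    config.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"

    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }

    start := time.Now()
    secrets, err := client.CoreV1().Secrets("watch-list-1").List(context.Background(), metav1.ListOptions{
        Limit:           500,
        ResourceVersion: "0", // same "serve from watch cache" semantics as the logged request
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("listed %d secrets in %v\n", len(secrets.Items), time.Since(start))
}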

This is quite high for ~290 MiB of data. From experience we usually observe 20-30 MiB/s of write throughput due to compression, which should translate to roughly 10-15s of latency.
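
As a rough sanity check of that estimate, using the 304090734-byte response size from the trace and the 20-30 MiB/s figure above:

// Back-of-the-envelope check, not part of the original measurement: expected
// write time for the logged response size at the throughput we normally observe.
package main

import "fmt"

func main() {
    const respBytes = 304090734.0 // "Write call succeeded" size from the trace
    const mib = 1024.0 * 1024.0
    for _, mibPerSec := range []float64{20, 30} {
        fmt.Printf("at %2.0f MiB/s: %.1fs\n", mibPerSec, respBytes/(mibPerSec*mib))
    }
    // Prints ~14.5s and ~9.7s, i.e. the ~10-15s expectation, vs. the observed ~44s.
}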

I suspect we are short on CPU or network egress, either on the master side or on the node side.

I'm not sure how important this is, but I'm afraid it may affect some of the benchmarks we are running.

/assign @p0lyn0mial

/cc @serathius

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
