Skip to content

Investigate throughput result when targeting a single powershell worker #127

Closed
@daxian-dbw

Description

@daxian-dbw

The throughput tests targeting a single powershell worker doesn't show cheering results:

  • When targeting a PowerShell worker deployed in a dedicated Azure sandbox, we see around 30 RPS regardless the size of PowerShellManager pool.
    The average response time shown in WCAT is between 3300ms to 3500ms.
  • When targeting a PowerShell worker running locally on my dev machine, we see around 46 RPS with pool size being 5 (better than size being 1).
    The average response time shown in LOCUS keeps increasing to over 2000 ms when the velocity of the in-coming request reaches about 70/s.

Based on the data collected from both environments, the average time spent in PS worker for each invocation request is minimal -- less than 100ms for the Azure sandbox setup, and less than 50ms for the local setup.

I'm suspecting the overhead is in the Host. I have discussed with @pragnagopa and we believe it would be the best for the Host to be able to measure the average time for an invocation request to be processed by a worker (time from sending the invocation request to the worker to receiving the invocation response from the worker).


Local setup data and analysis:

#123 (comment)


Azure sandbox data and analysis:

This throughput test was done targeting PS worker in a App Service Plan sandbox:

The PS worker used in this setup includes some instrumentation code to collect data about the time spent in PS worker, including:

  • total number of request received from the gPRC channel [Request]
  • total number of invocation request processed (a task is created for the invocation request, but not necessarily scheduled) [InvRequest]
  • total number of invocation request finished processing [InvDone]
  • total number of invocation response written out to host [Response]
  • average time for retrieving an invocation request [AveGetReqTime] = _sumReadRequestTime/InvRequest
  • average time for processing an invocation [AveInvTime] = _sumInvTime/InvDone
  • average time for fetching a PSManager instance from the pool [AveFetchFromPoolTime] = _sumFetchFromPoolTime/InvRequest
  • average time for writing a invocation response to host [AveResponseTime] = _sumResponseTime/Response

I used WCAT with 100 users and 3 min duration. As you can see in the log below, the average response time is around 3500 ms

temp

However, the data collection from the PS worker indicates there isn’t much time spent in the worker. The data collected are listed blow (also attached as in pool-1-100user-3min.txt).
As you can see, the average time for all operations in PS worker is decreasing gradually. The average invocation time and average fetch-from-pool time are mostly constant (23ms and 1.5ms).

Take the highlighted fraction as an example, PS worker processes 500 invocation requests during that time, and the average time for processing each invocation request during that time is 87.998ms.
$totalTimeSpentAfter4602 = 34.7941304347826 * 4600 + 22.6766688410524 * 4599 + 1.68565217391304 * 4600 + 36.4443236189648 * 4598 ## (result: 439668ms)
$totalTimeSpentAfter5102 = 34.9847058823529 * 5100 + 22.8405569719553 * 5099 + 1.57372549019608 * 5100 + 36.6294625343272 * 5098 ## (result: 489649ms)
$aveTimeFor500Inv = ($totalTimeSpentAfter5102 - $totalTimeSpentAfter4602) / 500 ## (result: 99.962ms)

So it looks to me most of the time was spent in the host, which matches the observation I got when running throughput tests locally on my dev machine.
This is really puzzling to me, and any help to understand it would be really appreciated.

Request,InvRequest,InvDone,Response,AveGetReqTime,AveInvTime,AveFetchFromPoolTime,AveResponseTime
102,100,99,69,27.12,42.9090909090909,18.19,39.8115942028985
202,200,199,192,38.27,32.7336683417085,9.695,40.7760416666667
302,300,299,296,34.9566666666667,29.61872909699,6.95,36.5506756756757
402,400,399,397,33.945,27.4260651629073,5.2125,35.0856423173804
502,500,499,495,31.884,27.503006012024,6.652,35.4888888888889
602,600,599,597,32.095,26.7345575959933,5.59,35.4505862646566
702,700,699,698,32.1071428571429,25.931330472103,4.84428571428571,35.0716332378223
802,800,799,798,42.19625,25.3128911138924,4.23875,44.859649122807
902,900,899,898,41.9455555555556,24.9866518353726,3.76777777777778,44.3507795100223
1002,1000,999,998,40.866,24.6136136136136,3.396,43.060120240481
1102,1100,1099,1098,39.8018181818182,24.3721565059145,3.11545454545455,41.8624772313297
1202,1200,1199,1198,39.9525,24.0892410341952,2.85583333333333,41.8622704507513
1302,1300,1299,1298,39.1776923076923,23.8306389530408,2.64923076923077,40.9722650231125
1402,1400,1399,1398,38.6421428571429,23.6182987848463,2.46357142857143,40.3354792560801
1502,1500,1499,1498,38.1406666666667,23.5350233488993,2.29933333333333,39.714953271028
1602,1600,1599,1598,37.72125,23.4521575984991,2.155625,39.2340425531915
1702,1700,1699,1698,37.3723529411765,23.3778693349029,2.03470588235294,38.8186101295642
1802,1800,1799,1798,36.9511111111111,23.2051139521957,1.92166666666667,38.3286985539488
1902,1900,1899,1897,36.6357894736842,23.1337546076883,1.82526315789474,37.9615181866104
2002,2000,1999,1998,36.3505,23.0525262631316,1.734,37.6151151151151
2102,2100,2099,2093,36.1504761904762,23.2853739876131,1.8647619047619,37.6684185379838
2202,2200,2199,2198,36.2,23.2201000454752,1.80818181818182,37.5973612374886
2302,2300,2299,2298,36.0017391304348,23.1239669421488,1.7295652173913,37.3424717145344
2402,2400,2399,2385,35.96375,23.0070862859525,1.68541666666667,37.5014675052411
2502,2500,2499,2497,35.3568,23.046418567427,2.1656,37.2378854625551
2602,2600,2599,2598,35.0684615384615,23.042323970758,2.17538461538462,36.9949961508853
2702,2700,2699,2698,34.9351851851852,23.0070396443127,2.11111111111111,36.8150481838399
2802,2800,2799,2798,34.8078571428571,22.9521257591997,2.03571428571429,36.6361686919228
2902,2900,2899,2898,34.6675862068966,22.8868575370818,1.96965517241379,36.4482401656315
3002,3000,2999,2998,34.6786666666667,22.8589529843281,1.90566666666667,36.4149432955304
3102,3100,3099,3096,34.4174193548387,22.8076798967409,1.89387096774194,36.187984496124
3202,3200,3199,3198,34.7071875,22.7596123788684,1.8346875,36.3977485928705
3302,3300,3299,3298,34.5327272727273,22.6968778417702,1.78272727272727,36.1770770163736
3402,3400,3399,3390,34.4988235294118,22.7069726390115,1.74705882352941,36.1693215339233
3502,3500,3499,3498,34.4234285714286,22.6684767076308,1.75057142857143,36.0711835334477
3602,3600,3599,3598,34.2975,22.6368435676577,1.70194444444444,35.9043913285158
3702,3700,3699,3698,34.1891891891892,22.5844822925115,1.66405405405405,35.7644672796106
3802,3800,3799,3795,33.8494736842105,22.6459594630166,1.92868421052632,35.7191040843215
3902,3900,3899,3898,34.5941025641026,22.6358040523211,1.89717948717949,36.4256028732683
4002,4000,3999,3998,34.5555,22.6006501625406,1.851,36.3554277138569
4102,4100,4099,4097,34.4546341463415,22.6216150280556,1.88341463414634,36.3099829143276
4202,4200,4199,4198,34.49,22.5999047392236,1.83857142857143,36.3120533587423
4302,4300,4299,4298,34.5125581395349,22.5887415678065,1.79697674418605,36.3008375988832
4402,4400,4398,4397,34.5706818181818,22.6339245111414,1.75613636363636,36.2972481237207
4502,4500,4499,4498,34.5691111111111,22.6463658590798,1.71711111111111,36.2865718096932
4602,4600,4599,4598,34.7941304347826,22.6766688410524,1.68565217391304,36.4443236189648
4702,4700,4699,4698,35.1448936170213,22.7961268354969,1.65106382978723,36.8288633461047
4802,4800,4799,4796,35.1420833333333,22.8524692644301,1.654375,36.8302752293578
4902,4900,4899,4898,35.15,22.8591549295775,1.62877551020408,36.8380971825235
5002,5000,4999,4998,35.1224,22.8499699939988,1.5962,36.7851140456183
5102,5100,5099,5098,34.9847058823529,22.8405569719553,1.57372549019608,36.6294625343272

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions