
Avoid invoking run.ps1 and port helper module to C# to improve throughput performance #214


Merged
9 commits merged into Azure:dev on May 22, 2019

Conversation

daxian-dbw
Contributor

@daxian-dbw daxian-dbw commented May 8, 2019

Summary

Fix #127
Note that change no. 1 below is a big design change, and I would like your help thinking through all the scenarios that might not work with it.
For example, @TylerLeonhardt raised a valid concern about whether this will continue to work with local debugging. I tested it, and it still works.

Major changes

  1. Deploy the Azure Functions scripts (e.g. run.ps1) to the Runspace as actual PowerShell functions, to avoid hitting the file system on every invocation via AddCommand(<path-to-ps1-file>).Invoke().

    • In the sandbox environment, the FunctionApp files live on the user's storage account, so file system access is expensive. Invoking an in-memory PS function is 8x faster than invoking a .ps1 file in that environment.
    • Note that the PS functions we deploy are constant functions -- they cannot be modified or removed. So we don't need to worry about a function being changed on the fly.
    • #Requires only works when invoking the script file directly, so we skip this deployment when #Requires is used in the function script and keep invoking the .ps1 file directly in that case.
    • After turning a .ps1 file into a PS function, script variables like $script:xxx used in the script become global variables, as there is no script scope anymore. So we have to clean up global variables after every invocation.
    • Previously, we called ResetRunspaceState() to clean up the variable table. Unfortunately, it does more than that -- it also resets the Debugger, creates a new EventManager, and resets the current path. Resetting the Debugger breaks our local debugging experience: breakpoints set while debugging the first invocation cannot be hit in subsequent invocations. All we need is to clean up the global variables newly added by an invocation, so I replaced ResetRunspaceState() with simple code that does just that.
  2. Rewrite the helper module in C#, for two reasons: (1) improve performance; (2) we will need C# cmdlets for the durable functions implementation anyway. This change has the following impact:

    • The helper module cannot be loaded by a version of pwsh lower than the PowerShell NuGet packages we reference in the worker, because of the specific Newtonsoft.Json dependency version. I talked to @joeyaiello about this, and we believe it's OK, as we don't think it's useful to load this helper module outside of the worker in the first place. This change won't give us the auto-completion UX for the helper commands in VS Code, but that's a separate issue.
    • We cannot use Pester to test the helper module, so I moved all Pester tests to xUnit.
    • We cannot do the markdown help doc check anymore (the helper module cannot be loaded unless we download the matching version of pwsh).
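As a rough sketch of the deployment idea in change no. 1 (illustrative only -- the worker does this in C# through the PowerShell SDK, and the function name `Run` here is hypothetical):

```powershell
# Parse the script once; the resulting AST keeps the file's extent, so error
# positions and $PSScriptRoot/$PSCommandPath still refer to run.ps1.
$scriptPath = Join-Path $PWD 'run.ps1'
$ast = [System.Management.Automation.Language.Parser]::ParseFile(
    $scriptPath, [ref]$null, [ref]$null)

# Deploy the script body as a *constant* function: it cannot be modified or
# removed afterwards, so it cannot be changed on the fly.
Set-Item -Path Function:\Run -Value $ast.GetScriptBlock() -Options Constant

# Subsequent invocations call the in-memory function -- no file system access:
#   Run -req $req -TriggerMetadata $metadata

# Cleanup idea: snapshot global variable names up front, then remove any
# variables an invocation added to the global scope.
$baseline = (Get-Variable -Scope Global).Name
# ... invoke the function ...
Get-Variable -Scope Global |
    Where-Object { $baseline -notcontains $_.Name } |
    Remove-Variable -Scope Global -Force -ErrorAction SilentlyContinue
```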

Next steps

  • Collect feedback and concerns for scenarios and do verification
    • Local debugging
      • DONE. Local debugging works as expected, and the breakpoint issue highlighted above is also resolved.
    • Special variables like $PSScriptRoot and $PSCommandPath
      • DONE. They work as expected. $PSScriptRoot points to the script file's parent directory; $PSCommandPath points to the script file itself.
      • xUnit test added for this scenario.
    • Error should continue to report the correct script stack trace and line/column offset. (Thanks @JustinGrote!)
      • DONE. The error record works as expected. Script stack trace and line/column offset are the same as running the .ps1 directly.
    • More ... pending on feedback
  • Collect the throughput data with the final changes

Throughput measurements with the final code change

I took 20 measurements on a dedicated VM (1 CPU core, 1.7 GB memory).

Testing environment

I used a fresh sandbox process for every measurement with the following steps:

  1. Run net stop dwassvc to stop the service
  2. Run net start dwassvc to start the service
  3. Wait for CPU to come down
  4. Run irm <http-trigger-uri> 3-4 times to get past the cold start.
  5. Start the throughput test using WCAT Runner with this setting: 100 users, 1-minute duration
    • WCAT first spends 30 seconds warming up the service, then collects the average Responses Per Second (RPS) value every 10 seconds for 1 minute

Testing FunctionApp

An HTTP function that returns an OK response for every invocation.

```powershell
param($req, $TriggerMetadata)
Push-OutputBinding -Name res -Value ([HttpResponseContext]@{
  StatusCode = 200
  Body = "Hello"
})
```

Measurement data

Overall statistics for the 20 measurements (the last column from the detailed data set table below)

```powershell
PS> 74, 70, 89, 69, 85, 83, 76, 68, 68, 70, 73, 80, 74, 69, 85, 73, 84, 68, 76, 73 | Measure-Object -AllStats

Count             : 20
Average           : 75.35
Sum               : 1507
Maximum           : 89
Minimum           : 68
StandardDeviation : 6.68285315683275
```

According to Dormann (2013), CV values below 0.05 (5%) indicate very high precision of the data, while values above 0.2 (20%) indicate low precision (a rule of thumb).
The CV value of my measurement data set is 0.0887 (8.87%), so the data are relatively precise by that rule of thumb.

| Count | Sum | Maximum | Minimum | Mean | Std. Deviation | Coefficient of variation (CV) |
|-------|-----|---------|---------|------|----------------|-------------------------------|
| 20 | 1507 | 89 | 68 | 75.35 | 6.683 | 0.0887 (Std. Dev. / Mean) |
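The CV is just the standard deviation divided by the mean; it can be reproduced from the same data set:

```powershell
$stats = 74, 70, 89, 69, 85, 83, 76, 68, 68, 70, 73, 80, 74, 69, 85, 73, 84, 68, 76, 73 |
    Measure-Object -AllStats
# Coefficient of variation = Std. Dev. / Mean
[math]::Round($stats.StandardDeviation / $stats.Average, 4)   # 0.0887
```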

Detailed measurement data set

WCAT collects the average Response Per Second (RPS) value every 10 seconds for 1 minute

| Id | 5 sec | 15 sec | 25 sec | 35 sec | 45 sec | 55 sec | 60 sec |
|----|-------|--------|--------|--------|--------|--------|--------|
| 1 | 77 | 77 | 81 | 82 | 78 | 74 | 74 |
| 2 | 73 | 73 | 73 | 72 | 72 | 70 | 70 |
| 3 | 45 | 87 | 88 | 88 | 87 | 88 | 89 |
| 4 | 60 | 64 | 69 | 70 | 69 | -- | 69 |
| 5 | 69 | 79 | 83 | 84 | 85 | 86 | 85 |
| 6 | 72 | 84 | 87 | 87 | 86 | 85 | 83 |
| 7 | 79 | 75 | 81 | 82 | 79 | 76 | 76 |
| 8 | 67 | 66 | 67 | 65 | 69 | 68 | 68 |
| 9 | 68 | 58 | 62 | 67 | 69 | 68 | 68 |
| 10 | 68 | 75 | 73 | 72 | 71 | 70 | 70 |
| 11 | 56 | 73 | 71 | 71 | 72 | 73 | 73 |
| 12 | 78 | 84 | 85 | 85 | 81 | 80 | 80 |
| 13 | 73 | 77 | 81 | 81 | 76 | 74 | 74 |
| 14 | 54 | 66 | 69 | 68 | 69 | 68 | 69 |
| 15 | 77 | 84 | 85 | 87 | 87 | 87 | 85 |
| 16 | 72 | 74 | 73 | 72 | 72 | 72 | 73 |
| 17 | 75 | 72 | 78 | 82 | 84 | -- | 84 |
| 18 | 59 | 61 | 66 | 66 | 69 | 68 | 68 |
| 19 | 84 | 73 | 75 | 78 | 77 | 76 | 76 |
| 20 | 54 | 71 | 71 | 73 | 73 | 72 | 73 |

Measurement data WITHOUT changes in this PR

Before this PR, the average RPS was 22.3. This PR achieves an over 3x improvement (75.35 / 22.3 ≈ 3.37).

| Time | Test | Runtime | RPS | Total requests | Average response time |
|------|------|---------|-----|----------------|-----------------------|
| 2019/05/13 14:34:26 | PS Ping | 2.0.12455.0 | 20 (Threshold:20) | 2468 | 4941.86 |
| 2019/05/12 14:35:25 | PS Ping | 2.0.12455.0 | 21 (Threshold:20) | 2543 | 4792.75 |
| 2019/05/11 14:33:38 | PS Ping | 2.0.12455.0 | 23 (Threshold:20) | 2878 | 4219.34 |
| 2019/05/10 14:33:26 | PS Ping | 2.0.12453.0 | 24 (Threshold:20) | 2889 | 4201.02 |
| 2019/05/09 14:34:31 | PS Ping | 2.0.12451.0 | 21 (Threshold:20) | 2530 | 4797.76 |
| 2019/05/08 14:33:00 | PS Ping | 2.0.12451.0 | 22 (Threshold:20) | 2756 | 4405.71 |
| 2019/05/07 14:36:42 | PS Ping | 2.0.12451.0 | 23 (Threshold:20) | 2819 | 4305.09 |
| 2019/05/06 14:33:37 | PS Ping | 2.0.12451.0 | 24 (Threshold:20) | 2947 | 4119.65 |
| 2019/05/05 14:34:22 | PS Ping | 2.0.12451.0 | 22 (Threshold:20) | 2681 | 4529.3 |
| 2019/05/04 14:34:27 | PS Ping | 2.0.12451.0 | 23 (Threshold:20) | 2865 | 4238.09 |
| 2019/05/03 14:34:24 | PS Ping | 2.0.12449.0 | 22 (Threshold:20) | 2758 | 4399.92 |
| 2019/05/02 14:33:10 | PS Ping | 2.0.12447.0 | 23 (Threshold:20) | 2841 | 4273.55 |
| 2019/05/01 18:14:40 | PS Ping | 2.0.12447.0 | 22 (Threshold:20) | 2675 | 4541.99 |

Future work on throughput

@pragnagopa has been working on analyzing how to boost workers' throughput in the gRPC layer, basically around how workers should handle incoming/outgoing gRPC messages more efficiently.

This PR focused on fixing the bottleneck in the PowerShell-specific logic; future work should shift focus to the gRPC message processing in the PowerShell worker. Opened #224 to track that future work.

Also, all the measurements and analysis I did for this PR used the default PSWorkerInProcConcurrencyUpperBound value, which is 1, meaning invocations are not processed concurrently. This default makes sense for the consumption plan, given the limited processing power of the VMs used there. However, the premium plan VMs have much better processing power, so throughput should be analyzed again in that environment with concurrency support enabled.

daxian-dbw added 5 commits May 6, 2019 20:46
…re#194)

Fix no runspace available

Fix regression: create the first PowerShellManager in FunctionLoad, but delay the init

Update comments

Integrate all changes together

Clear the outputbinding hashtable for the runspace after get the results

Update tests and build scripts

Create constant functions so it's more secure

Alter the to-be-deployed PS function name to avoid conflicts with built-in functions

Use PowerShell API instead of 'InvokeScript' as the former is way faster

Bring back old code to clean global variables
@JustinGrote

JustinGrote commented May 8, 2019 via email

@TylerLeonhardt
Member

If we grab the latest version of PowerShell during CI, could we keep the PlatyPS check?

@daxian-dbw
Contributor Author

> How would this affect exceptions that log to application insights? Seems like it would be harder to stack trace if the function is in-memory.

@JustinGrote, the script block used to create the function comes from the AST parsed from the script file (e.g. run.ps1), so the extent used for error reporting is the same as when you invoke the file directly. I don't think this will affect the existing error reporting. That said, we don't handle error reporting well enough today (the script stack trace is not presented to the user); we have a separate issue tracking that.
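A minimal way to see the extent preservation in a plain pwsh session (hypothetical sketch; the function name `MyFunc` is made up):

```powershell
# Assume run.ps1 contains a line that throws.
$path = Join-Path $PWD 'run.ps1'
$ast = [System.Management.Automation.Language.Parser]::ParseFile(
    $path, [ref]$null, [ref]$null)
Set-Item -Path Function:\MyFunc -Value $ast.GetScriptBlock()

try { MyFunc } catch {
    # Because the script block keeps the file's extent, the position info
    # should still reference run.ps1, same as invoking the file directly.
    $_.InvocationInfo.PositionMessage
    $_.ScriptStackTrace
}
```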

Member

@pragnagopa pragnagopa left a comment

Great to see PR for this issue 👍

```diff
@@ -40,9 +40,25 @@ internal AzFunctionInfo GetFunctionInfo(string functionId)
 /// This method runs once per 'FunctionLoadRequest' during the cold start of the worker.
 /// It will always run synchronously because we process 'FunctionLoadRequest' synchronously.
 /// </summary>
-internal void LoadFunction(FunctionLoadRequest request)
+internal static void LoadFunction(FunctionLoadRequest request)
 {
```
Member

Do not use static. In the future, the functions host will move to a model where the language worker process does not restart on file changes; FunctionLoader needs to handle the lifetime of loaded functions -- unloading when the host sends a FileChangeEventRequest.

Contributor Author

That will be a non-trivial refactoring. Can you please open an issue describing what the worker is supposed to do when receiving a FileChangeEventRequest? For example, if there are ongoing invocations, should the worker stop them or wait for them to finish before acting on the FileChangeEventRequest? We need more information about this scenario to decide the appropriate refactoring work.

Contributor Author

If the requirement is simply to reset the worker, then we already have FunctionLoader.ClearLoadedFunctions to clean up the loaded functions.

@daxian-dbw
Contributor Author

> If we grab the latest version of PowerShell during CI, could we keep the PlatyPS check?

@TylerLeonhardt Yes, we can do that. There is a gotcha though -- Update-MarkdownHelp doesn't recognize the [SupportsWildcards] attribute for C# cmdlets; see my commit 4d44920.
So we will have to fix that in PlatyPS before enabling the check in CI.

@TylerLeonhardt
Member

Interesting! Is there an issue on PlatyPS about this? If not, can you create a small one?

@daxian-dbw
Contributor Author

@TylerLeonhardt It seems there is one: PowerShell/platyPS#202
But I wonder why it works for scripts.

Member

@TylerLeonhardt TylerLeonhardt left a comment

LGTM, just the one comment on tests.

@daxian-dbw
Contributor Author

I have updated the PR description with the detailed measurement data collected with the current changes in this PR.
/cc @pragnagopa @TylerLeonhardt @Francisco-Gamino @SatishRanjan

Contributor

@Francisco-Gamino Francisco-Gamino left a comment

Thank you for making this code change. Overall looks great. A couple of minor comments.

@Francisco-Gamino
Contributor

LGTM!

```diff
@@ -38,7 +37,6 @@ internal RequestProcessor(MessagingStream msgStream)
 {
```
Contributor

@SatishRanjan SatishRanjan May 15, 2019

In ProcessRequestLoop we still await each RPC message's completion, and the host doesn't send a new message to the worker until it receives the response. So, for example, if a function1 invocation request arrives while a dependency download is in progress, the user sees a message on the Azure portal saying the dependency download is in progress; but if another invocation request (function2) arrives, the host won't send it to the PowerShell language worker, and that user won't see the managed dependency download message on the portal. Is it possible for the PowerShell worker to receive and process invocation requests concurrently?

```csharp
internal async Task ProcessRequestLoop()
{
    StreamingMessage request, response;
    while (await _msgStream.MoveNext())
```

Contributor Author

@daxian-dbw daxian-dbw May 15, 2019

Yes, with the current implementation, the second invocation request will be blocked until the dependency download finishes, even if the user sets PSWorkerInProcConcurrencyUpperBound greater than 1.
It could be addressed by not blocking the main thread, but instead creating a task to call ProcessInvocationRequest for each InvocationRequest message. This was actually one of my early designs back when I worked on in-proc concurrency, but that design resulted in heavy lock contention, so I didn't choose it in the end. See the Design Change section in the PR description of #123, quoted as follows:

> The original design was to not block the main thread -- it reads a request message, starts a task for processing and forgets about it, then reads the next message.
> With this design, every incoming request is handled immediately -- a task is created for it. So a large number of tasks are created rapidly, and thus a large number of thread-pool threads are assigned to run those tasks. All those threads compete for an idle PowerShellManager from the pool, which results in lock contention in the pool. The average time for fetching an idle PowerShellManager instance degrades dramatically when the velocity of incoming requests reaches about 70/s -- it reached 994 ms in my measurements and became a huge bottleneck.

I guess we could revisit this design and consider Rx.NET, but the competition for acquiring a PowerShellManager instance from the pool would remain a source of lock contention.

Or, thinking outside the box, what if we change the design of dependency management? Separate it from the worker, make it a service or something that can be reused by all languages (e.g. the retry logic shouldn't be implemented once per worker, right?), and make the portal more aware of the dependency management component. That would be the more long-term solution to all the issues with the dependency management feature.

Contributor Author

This is definitely worth more discussion, but I think it's out of scope for this PR.
@SatishRanjan, can you please create a more general issue for the challenges of the current dependency management design? @TylerLeonhardt and I have talked about this a couple of times, and we both think separating the dependency management logic from the worker is the long-term solution, and we should start moving toward that.

Contributor

@daxian-dbw - Currently, because of the await on _msgStream.MoveNext() for the next message from the host, the host does not even send the invocation request to the PowerShell worker runtime. I was thinking: if we remove the wait on _msgStream.MoveNext() and process invocation requests concurrently, would the second invocation request then get the "dependency download in progress" message as well?

Contributor Author

@SatishRanjan I don't think we should discuss the design/fix for another issue within this PR review. As I mentioned above, I suggest you open a new issue, and the discussion can happen there.

Contributor

@daxian-dbw => I was expecting that adding the capability to process multiple invocation requests concurrently was in scope for this throughput improvement change. But yes, if we want to take it separately, that should be fine as well.

Contributor Author

@daxian-dbw daxian-dbw May 16, 2019

@SatishRanjan I think you misunderstood the issue you brought up here.

  • First, the capability to process multiple invocations concurrently was added to the PowerShell worker quite a while ago (in January); see Add in-process concurrency support to the PS worker #123. Processing multiple invocations concurrently works fine after the dependency download finishes. The concurrency level is controlled by the environment variable PSWorkerInProcConcurrencyUpperBound, which the user can set in the application settings.

  • Second, what you brought up here is a very specific scenario -- multiple invocations triggered while the dependency download is in progress. By definition, no invocation can actually be served while the download is still in progress, so what you really intend to address here is making all invocations receive the "downloading in progress" logging message.

  • Third, and most important, throughput should always be measured when the application is running in steady state, which means the cold start, where the download happens, is not in consideration when talking about throughput. Please take a look at the Testing environment section in my PR description, where I made it very clear that each measurement is done after warming up the host/worker (WCAT does another 30 seconds of warm-up).

All the measurements and analysis I did for this PR used the default PSWorkerInProcConcurrencyUpperBound value, which is 1, meaning invocations are not processed concurrently. This default makes sense for the consumption plan, given the limited processing power of the VMs used there. However, the premium plan VMs have much better processing power, so throughput should be analyzed again in that environment with concurrency support enabled. This is tracked in #224.

@pragnagopa
Member

fyi @alrod - Once these changes are checked in, our perf test should reflect the improvements as well.

@SatishRanjan
Contributor

I tested the managed dependency download, and this feature works as expected with this change.

```csharp
/// <summary>
/// Create the PowerShell function that is equivalent to the 'scriptFile' when possible.
/// </summary>
private void DeployAzFunctionToRunspace()
```
Contributor

The content of the $MyInvocation variable is quite different before and after this change. Even though $PSScriptRoot and $PSCommandPath are identical, $MyInvocation.MyCommand.Path used to contain the path to the script, and now this field is absent. There are many other changes as well -- I can provide more details if you need them.

This is a potentially breaking change for existing customers. If you decide to proceed with this optimization by default anyway, consider documenting this and/or providing an opt-out mechanism.

Contributor Author

Right, documentation needs to be updated. Issue #225 was opened to track the documentation updates.

The change in $MyInvocation.MyCommand is by design: it was pointing to an ExternalScriptInfo object previously because we were calling run.ps1 directly, and now it points to a FunctionInfo object, which doesn't have the Path property.

It's a breaking change that should definitely be called out in the documentation. But I don't think we need to provide an opt-out mechanism, partially because we are still in public preview, and partially because it's easy for the user to adjust.
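For reference, a sketch of the difference (the in-memory function name is hypothetical, and user scripts can stay portable with the automatic variables):

```powershell
# Invoked as a file:
#   $MyInvocation.MyCommand is an ExternalScriptInfo,
#   and $MyInvocation.MyCommand.Path is the path to run.ps1.
# Deployed as an in-memory function (e.g. 'Run'):
#   $MyInvocation.MyCommand is a FunctionInfo, which has no Path property.

# A portable way for user scripts to locate themselves in both modes:
$scriptDir  = $PSScriptRoot    # parent directory of run.ps1
$scriptPath = $PSCommandPath   # full path of run.ps1
```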

@daxian-dbw
Contributor Author

@SatishRanjan @AnatoliB I have addressed your comments. Please let me know if you have additional concerns. Otherwise, I will merge this PR. Thanks!

@AnatoliB
Contributor

@daxian-dbw I'm ok.

@daxian-dbw daxian-dbw merged commit 46bacf7 into Azure:dev May 22, 2019
@daxian-dbw daxian-dbw deleted the thoughtput-final branch May 22, 2019 01:44
@daxian-dbw
Contributor Author

Thanks everyone for your review!

Successfully merging this pull request may close these issues.

Investigate throughput result when targeting a single powershell worker