Avoid invoking run.ps1 and port helper module to C# to improve throughput performance #214
Conversation
Commits:
- Refactor and clean up based on the dependency management changes (#194)
- Fix no runspace available
- Fix regression: create the first PowerShellManager in FunctionLoad, but delay the init
- Update comments
- Integrate all changes together
- Clear the output binding hashtable for the runspace after getting the results
- Update tests and build scripts
- Create constant functions so it's more secure
- Alter the to-be-deployed PS function name to avoid conflicts with built-in functions
- Use the PowerShell API instead of 'InvokeScript' as the former is way faster
- Bring back old code to clean global variables
How would this affect exceptions that log to Application Insights? Seems
like it would be harder to get a stack trace if the function is in-memory.
…On Wed, May 8, 2019, 1:54 PM Dongbo Wang ***@***.***> wrote:
Summary
This PR is almost ready. I still need to collect throughput metrics with
the final change, but I think it's time to submit the PR for feedback.
Note that change #1 below is a big design change, and I would like
your help thinking through all scenarios that might not work with it.
For example, @TylerLeonhardt <https://github.com/TylerLeonhardt> raised a
valid concern about whether this will continue to work with local debugging. I
tested it, and it still works.
Major changes
1.
Deploy the Azure Function scripts (e.g. run.ps1) to the Runspace as actual
PowerShell functions, to avoid hitting the file system on every
invocation via AddCommand(<path-to-ps1-file>).Invoke().
- In the sandbox environment, the FunctionApp files are on the user's
storage account, and thus file system access is expensive. Invoking an
in-memory PS function is 8x faster than invoking a .ps1 file in
that environment.
- #Requires only works when invoking the script file directly, so
we skip this deployment when #Requires is used in the function
script and keep invoking the .ps1 file directly in that case.
- After turning a .ps1 file into a PS function, script-scoped variables like
$script:xxx used in the script become global variables, as
there is no script scope anymore. So we have to clean up global variables
after every invocation.
- Currently, we call ResetRunspaceState() to clean up the variable
table. Unfortunately, it does more than that -- it also resets the
Debugger, creates a new EventManager, and resets the CurrentPath. Resetting the
Debugger breaks our local debugging experience --
breakpoints set while debugging the first invocation are not hit in
subsequent invocations. All we need is to clean up the global variables
newly added by an invocation, so I replaced ResetRunspaceState()
with simpler code that provides the same cleanup.
2.
Rewrite the helper module in C#, for two reasons: (1) to improve
performance; (2) we will need C# cmdlets for the durable
functions implementation anyway. This change has the following impact:
- The helper module cannot be loaded by a version of pwsh
lower than the version of the PowerShell NuGet packages we reference
in the worker, because of the pinned Newtonsoft.Json dependency. I
talked to @joeyaiello <https://github.com/joeyaiello> about this,
and we believe it's OK, as we don't think it's useful to load
this helper module outside of the worker in the first place. This change
won't give us the auto-completion UX for the helper commands in
VSCode, but that's a separate issue.
- We cannot use Pester to test the helper module, so I moved all
Pester tests to xUnit.
- We cannot run the markdown help doc check anymore (the helper
module cannot be loaded without downloading the right version of pwsh).
Next steps
- Collect feedback and concerns for scenarios that I didn't think of,
and do verification
- Local debugging
- More ... pending on feedback
- Collect the throughput data with the final changes
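The gist of change #1 — compile the function script once at load time and invoke the cached definition afterwards, falling back to direct file invocation when `#Requires` is present — can be sketched with a Python stand-in. The class and names below are hypothetical illustrations, not the worker's actual C# code:

```python
from pathlib import Path

class LoadedFunction:
    """Sketch: cache a function script in memory at load time (hypothetical)."""

    def __init__(self, script_path: str):
        source = Path(script_path).read_text()
        # Stand-in for the '#Requires' check: such scripts must keep being
        # invoked from the file directly, so they are not cached.
        self.deploy_as_function = "#Requires" not in source
        self._code = compile(source, script_path, "exec") if self.deploy_as_function else None
        self._path = script_path

    def invoke(self):
        if self.deploy_as_function:
            # Cheap path: no file system access per invocation.
            exec(self._code, {})
        else:
            # Fallback: re-read and run the script file every time.
            exec(Path(self._path).read_text(), {})
```

In the sandbox, the per-invocation file read is the expensive part, which is why caching the parsed definition pays off.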
Commit Summary
- Refactor and clean up based on the dependency management changes (#194)
- Minor fixes
- Update global variable cleanup
- Add some fixes
- minor perf improvement
File Changes
- M build.ps1
- M docs/cmdlets/Get-OutputBinding.md
- M docs/cmdlets/Push-OutputBinding.md
- M docs/cmdlets/Trace-PipelineObject.md
- M src/DependencyManagement/DependencyManager.cs
- M src/FunctionInfo.cs
- M src/FunctionLoader.cs
- M src/Microsoft.Azure.Functions.PowerShellWorker.csproj
- M src/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psd1
- D src/Modules/Microsoft.Azure.Functions.PowerShellWorker/Microsoft.Azure.Functions.PowerShellWorker.psm1
- D src/Modules/Microsoft.Azure.Functions.PowerShellWorker/PowerShellWorker.Resource.psd1
- M src/PowerShell/PowerShellExtensions.cs
- M src/PowerShell/PowerShellManager.cs
- A src/Public/Commands/GetOutputBindingCommand.cs
- A src/Public/Commands/PushOutputBindingCommand.cs
- A src/Public/Commands/TracePipelineObjectCommand.cs
- M src/Public/FunctionMetadata.cs
- M src/RequestProcessor.cs
- M src/Utility/Utils.cs
- M src/resources/PowerShellWorkerStrings.resx
- M test/Unit/Function/FunctionLoaderTests.cs
- A test/Unit/Function/TestScripts/FuncWithRequires.ps1
- A test/Unit/Modules/HelperModuleTests.cs
- D test/Unit/Modules/Microsoft.Azure.Functions.PowerShellWorker.Tests.ps1
- M test/Unit/PowerShell/PowerShellManagerTests.cs
- M test/Unit/PowerShell/TestScripts/testBasicFunction.ps1
- A test/Unit/PowerShell/TestScripts/testBasicFunctionWithRequires.ps1
- M test/Unit/PowerShell/TestScripts/testBasicFunctionWithTriggerMetadata.ps1
- M test/Unit/PowerShell/TestScripts/testFunctionCleanup.ps1
- M test/Unit/PowerShell/TestScripts/testFunctionWithEntryPoint.psm1
- M tools/helper.psm1
If we grab the latest version of PowerShell during CI, could we keep the PlatyPS check?
@JustinGrote, the script block used to create the function comes from the AST parsed from the script file.
Great to see PR for this issue 👍
@@ -40,9 +40,25 @@ internal AzFunctionInfo GetFunctionInfo(string functionId)
     /// <summary>
     /// This method runs once per 'FunctionLoadRequest' during the cold start of the worker.
     /// It will always run synchronously because we process 'FunctionLoadRequest' synchronously.
     /// </summary>
-    internal void LoadFunction(FunctionLoadRequest request)
+    internal static void LoadFunction(FunctionLoadRequest request)
     {
Do not use static. In the future, the Functions host will move to a model where the language worker process does not restart on file changes; FunctionLoader will need to manage the lifetime of loaded functions and unload them when the host sends a FileChangeEventRequest.
That will be a non-trivial refactoring. Can you please open an issue describing what the worker is supposed to do when receiving a FileChangeEventRequest? For example, if there are on-going invocations, should the worker stop them or wait for them to finish before acting on the FileChangeEventRequest? We need more information about this scenario to decide on the appropriate refactoring work.
If the requirement is simply to reset the worker, then we already have FunctionLoader.ClearLoadedFunctions to clean up the loaded functions.
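The lifecycle idea under discussion can be sketched roughly as follows (a Python stand-in with hypothetical names, not the worker's actual C# FunctionLoader):

```python
class FunctionLoader:
    """Minimal sketch of a per-worker registry of loaded functions."""

    def __init__(self):
        self._functions = {}  # functionId -> function metadata

    def load_function(self, function_id: str, metadata: dict):
        # Runs once per FunctionLoadRequest.
        self._functions[function_id] = metadata

    def get_function_info(self, function_id: str) -> dict:
        return self._functions[function_id]

    def clear_loaded_functions(self):
        # Could be called when the host sends a FileChangeEventRequest,
        # so stale definitions are dropped and reloaded on demand.
        self._functions.clear()
```

The open design question above is what `clear_loaded_functions` should do about invocations already in flight: cancel them, or wait for them to drain first.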
@TylerLeonhardt Yes, we can do that. There is a gotcha though -- …
Interesting! Is there an issue on PlatyPS about this? If not, can you create a small one?
@TylerLeonhardt It seems there is one: PowerShell/platyPS#202
LGTM just the one comment on tests
I have updated the PR description with the detailed measurement data collected with the current changes in this PR.
Thank you for making this code change. Overall looks great. A couple of minor comments.
LGTM!
@@ -38,7 +37,6 @@ internal RequestProcessor(MessagingStream msgStream)
     {
In ProcessRequestLoop we still await each rpc message to completion, and the host doesn't send a new message to the worker until it receives the response. So, for example, if a function1 invocation request arrives while the dependency download is in progress, the user sees a message on the Azure portal saying the dependency download is in progress; but if another function2 invocation request is received, the host won't send that second invocation request to the PowerShell language worker, and the user won't see the managed dependency download message on the Azure portal. Is it possible for the PowerShell worker to receive and process invocation requests concurrently?
internal async Task ProcessRequestLoop()
{
    StreamingMessage request, response;
    while (await _msgStream.MoveNext())
Yes, with the current implementation, the second invocation request will be blocked until the dependency download finishes, even if the user sets PSWorkerInProcConcurrencyUpperBound greater than 1.
It could be addressed by not blocking the main thread, and instead creating a task to call ProcessInvocationRequest for each InvocationRequest message. This was actually one of my early designs back when I worked on the in-proc concurrency, but that design resulted in heavy lock contention and I didn't choose it in the end. See the Design Change section in the PR description of #123. Quoted as follows:
The original design was to not block the main thread -- it reads a request message, starts a task to process it and forgets about it, then reads the next message.
With this design, every incoming request is handled immediately -- a task is created for it. So a large number of tasks are created rapidly, and thus a large number of thread-pool threads are assigned to run those tasks. All those threads compete for an idle PowerShellManager from the pool, which results in lock contention in the pool. The average time to fetch an idle PowerShellManager instance degrades dramatically once the rate of incoming requests reaches about 70/s -- it reaches 994 ms in my measurements, and becomes a huge bottleneck.
I guess we can revisit this design and consider Rx.NET, but the competition to acquire a PowerShellManager instance from the pool will remain a source of lock contention.
Or, thinking outside the box, what if we change the design of dependency management? Separate it from the worker, make it a service of some kind that can be reused by all languages (e.g. the retry logic shouldn't be implemented once per worker, right?), and make the portal more aware of the dependency management component. That would be the longer-term solution to all the issues with the dependency management feature.
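The trade-off being discussed — dispatching a task per incoming request versus having all those tasks compete for a bounded pool — can be sketched with a Python asyncio stand-in (not the worker's actual C# code; the semaphore plays the role of the PowerShellManager pool):

```python
import asyncio

async def handle_invocation(request_id: int, pool: asyncio.Semaphore) -> str:
    # At most 'upper_bound' invocations (PSWorkerInProcConcurrencyUpperBound)
    # run at the same time; the rest wait here, which is where the pool
    # contention described above shows up under a high request rate.
    async with pool:
        await asyncio.sleep(0.01)  # stand-in for running the function
        return f"response-{request_id}"

async def process_request_loop(request_ids, upper_bound: int):
    pool = asyncio.Semaphore(upper_bound)
    # "Fire and forget" dispatch: every incoming request immediately
    # becomes a task, and all tasks then compete for the bounded pool.
    tasks = [asyncio.create_task(handle_invocation(r, pool)) for r in request_ids]
    return await asyncio.gather(*tasks)

responses = asyncio.run(process_request_loop(range(5), upper_bound=1))
print(responses)
```

With `upper_bound=1`, every request is accepted immediately but the invocations still execute one at a time, which mirrors the blocking behavior discussed above.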
This is definitely something worth more discussion, but I think it's out of scope for this PR.
@SatishRanjan, can you please create a more general issue for the challenges of the current dependency management design? @TylerLeonhardt and I have talked about this a couple of times, and we both think separating the dependency management logic from the worker would be the long-term solution, and we should start moving toward that.
@daxian-dbw - currently, since we await _msgStream.MoveNext() to get the next message from the host, the host doesn't even send the invocation request to the PowerShell worker runtime. I was thinking we could remove the wait on _msgStream.MoveNext() and process invocation requests concurrently; if we did it that way, would the second invocation request get the dependency-download-in-progress message as well?
@SatishRanjan I don't think we should discuss the design/fix for another issue within this PR review. As I mentioned above, I suggest you open a new issue so the discussion can happen there.
@daxian-dbw => I was expecting that adding the capability to process multiple invocation requests concurrently was in the scope of this throughput improvement change. But yes, if we want to take it up separately, that should be fine as well.
@SatishRanjan I think you misunderstood the issue you brought up here.
1. First, the capability to process multiple invocations concurrently has been in the PowerShell worker for quite a while (since January); see "Add in-process concurrency support to the PS worker" #123. Processing multiple invocations concurrently works fine after the dependency download finishes. The concurrency level is controlled by the env variable PSWorkerInProcConcurrencyUpperBound, which the user can set in the application settings.
2. Second, what you brought up here is a very specific scenario -- multiple invocations triggered while the dependency download is in progress. By definition, no invocation can actually be served while the download is still in progress, so what you really intend to address here is making all invocations receive the "downloading in progress" logging message.
3. Third, and most important, throughput should always be measured when the application is running in steady state, which means the cold start, where the download happens, is not in consideration when talking about throughput. Please take a look at the Testing environment section in my PR description, where I made it very clear that each measurement is done after warming up the host/worker (WCAT does another 30 seconds of warm-up).
All the measurements and analysis I did for this PR used the default PSWorkerInProcConcurrencyUpperBound value, which is 1, meaning invocations are not processed concurrently. This default makes sense for the consumption plan, given the limited processing power of the consumption-plan VM. However, the premium-plan VM has much better processing power, so throughput should be analyzed again in that environment with concurrency support enabled. This is tracked in #224.
fyi @alrod - Once these changes are checked in, our perf tests should reflect the improvements as well.
I tested the managed dependency download, and this feature works as expected with this change.
/// <summary>
/// Create the PowerShell function that is equivalent to the 'scriptFile' when possible.
/// </summary>
private void DeployAzFunctionToRunspace()
The content of the $MyInvocation variable seems to be quite different before and after this change. Even though $PSScriptRoot and $PSCommandPath are identical, $MyInvocation.MyCommand.Path used to contain the path to the script, and now this field is absent. There are many other changes as well - I can provide more details if you need them.
This is a potentially breaking change for existing customers. If you decide to proceed with this optimization by default anyway, consider documenting this and/or providing an opt-out mechanism.
Right, the documentation needs to be updated. Issue #225 was opened to track the documentation updates.
The change in $MyInvocation.MyCommand is by design: it was pointing to an ExternalScriptInfo object previously because we were calling run.ps1 directly, and now it points to a FunctionInfo object, which doesn't have the Path property.
It's a breaking change that should definitely be called out in the documentation. But I don't think we need to provide an opt-out mechanism, partially because we are still in public preview, and partially because it's easy for the user to adapt to this change.
@SatishRanjan @AnatoliB I have addressed your comments. Please let me know if you have additional concerns. Otherwise, I will merge this PR. Thanks!
@daxian-dbw I'm ok.
Thanks everyone for your review!
Summary
Fix #127
Note that change #1 below is a big design change, and I would like your help thinking through all scenarios that might not work with it.
For example, @TylerLeonhardt raised a valid concern about whether this will continue to work with local debugging. I tested it, and it still works.
Major changes
1. Deploy the Azure Function scripts (e.g. run.ps1) to the Runspace as actual PowerShell functions, to avoid hitting the file system on every invocation via AddCommand(<path-to-ps1-file>).Invoke().
   - In the sandbox environment, the FunctionApp files are on the user's storage account, and thus file system access is expensive. Invoking an in-memory PS function is 8x faster than invoking a .ps1 file in that environment.
   - #Requires only works when invoking the script file directly, so we skip this deployment when #Requires is used in the function script and keep invoking the .ps1 file directly in that case.
   - After turning a .ps1 file into a PS function, script-scoped variables like $script:xxx used in the script become global variables, as there is no script scope anymore. So we have to clean up global variables after every invocation.
   - Currently, we call ResetRunspaceState() to clean up the variable table. Unfortunately, it does more than that -- it also resets the Debugger, creates a new EventManager, and resets the CurrentPath. Resetting the Debugger breaks our local debugging experience -- breakpoints set while debugging the first invocation are not hit in subsequent invocations. All we need is to clean up the global variables newly added by an invocation, so I replaced ResetRunspaceState() with simpler code that provides the same cleanup.
2. Rewrite the helper module in C#, for two reasons: (1) to improve performance; (2) we will need C# cmdlets for the durable functions implementation anyway. This change has the following impact:
   - The helper module cannot be loaded by a version of pwsh lower than the version of the PowerShell NuGet packages we reference in the worker, because of the pinned Newtonsoft.Json dependency. I talked to @joeyaiello about this, and we believe it's OK, as we don't think it's useful to load this helper module outside of the worker in the first place. This change won't give us the auto-completion UX for the helper commands in VSCode, but that's a separate issue.
   - We cannot use Pester to test the helper module, so I moved all Pester tests to xUnit.
   - We cannot run the markdown help doc check anymore (the helper module cannot be loaded without downloading the right version of pwsh).
Next steps
$PSScriptRoot and $PSCommandPath
$PSScriptRoot points to the script file's parent directory. $PSCommandPath points to the script file itself. They behave the same as when invoking the .ps1 directly.
directly.Throughput measurements with the final code change
I measured 20 times with a dedicated VM (1 CPU core, 1.7 GB memory).
Testing environment
I used a fresh sandbox process for every measurement, with the following steps:
- net stop dwassvc to stop the service
- net start dwassvc to start the service
- irm <http-trigger-uri> 3-4 times to get past the cold start
- WCAT Runner, with this setting: 100 Users, 1 minute Duration
- WCAT first uses 30 seconds to warm up the service, then collects the average Response Per Second (RPS) value every 10 seconds for 1 minute
Testing FunctionApp
An HTTP function that returns an OK response for every invocation.
Measurement data
Overall statistics for the 20 measurements (the last column from the detailed data set table below)
According to Dormann (2013), CV values below 0.05 (5%) indicate very high precision of the data, and values above 0.2 (20%) indicate low precision (a rule of thumb).
The CV value of my measurement data set is 0.0887 (8.87%), so the data are relatively precise by this rule of thumb.
Detailed measurement data set
WCAT collects the average Response Per Second (RPS) value every 10 seconds for 1 minute.
Measurement data WITHOUT changes in this PR
Before this PR, the average RPS was 22.3. This PR achieves over a 3x improvement (75.35 / 22.3 = 3.37).
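As a quick numerical sanity check on the statistics above, the coefficient of variation and the speedup are straightforward to compute. The RPS samples below are hypothetical placeholders for illustration only; the real 20 measurements live in the PR's detailed data table:

```python
import statistics

def coefficient_of_variation(samples):
    # CV = sample standard deviation / mean. Below 0.05 indicates very
    # high precision, above 0.2 low precision (the rule of thumb above).
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical RPS samples, not the PR's actual measurements.
samples = [75.0, 70.2, 81.3, 74.9, 78.1]
cv = coefficient_of_variation(samples)
print(f"CV = {cv:.4f}")

# Speedup reported in this PR: new average RPS / old average RPS.
speedup = 75.35 / 22.3
print(f"speedup = {speedup:.2f}x")
```

The same CV formula applied to the PR's 20 real measurements is what yields the reported 0.0887.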
Future work on throughput
@pragnagopa has been working on analyzing how to boost workers' throughput in the gRPC layer, basically around how workers should handle incoming/outgoing gRPC messages more efficiently.
This PR focused on fixing the bottlenecks in the PowerShell-specific logic; future work should shift focus to gRPC message processing in the PowerShell worker. Opened #224 to track that work.
Also, all the measurements and analysis I did for this PR used the default PSWorkerInProcConcurrencyUpperBound value, which is 1, meaning invocations are not processed concurrently. This default makes sense for the consumption plan, given the limited processing power of the consumption-plan VM. However, the premium-plan VM has much better processing power, so throughput should be analyzed again in that environment with the concurrency support enabled.