Description
Environment details:
- nginx/1.13.7
- libmodsecurity/3.0.0
- ModSecurity-nginx/1.0.0
- OWASP CRS v3.0.2 (the entire ruleset is included)
- simple app for receiving POST requests (https://github.com/defanator/modsecurity-performance/tree/master/states/files/upload-app)
Payload: a ~15 KB JSON response from http://demo.nginx.com/status (a publicly available demo server).
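For reference, the payload file used in the request below can be produced with a plain curl call:
# curl -s http://demo.nginx.com/status -o status.json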
Request:
# curl -vi -H "Content-Type: application/json" --data-binary @status.json http://localhost/modsec-full/upload/
Results: the nginx worker consumes 100% CPU, and the request above takes a very long time to process (I was unable to get any response within ~5 minutes, then cancelled curl).
I gathered a profile from the running worker with perf record; it showed that the worker process was spinning in a loop between ngx_http_modsec_pcre_malloc() and ngx_pfree() (called from ngx_http_modsec_pcre_free()):
Samples: 5K of event 'cycles', Event count (approx.): 126680130211
Children Self Command Shared Object Symbol
+ 81.47% 81.24% nginx nginx [.] ngx_pfree
+ 63.25% 0.02% nginx ngx_http_modsecurity_module.so [.] ngx_http_modsec_pcre_malloc
+ 63.23% 0.00% nginx [unknown] [.] 0x480d74ff85480020
+ 2.61% 0.00% nginx [unknown] [.] 0000000000000000
+ 2.51% 2.47% nginx libpcre.so.3.13.3 [.] compile_regex
+ 2.21% 2.21% nginx libpcre.so.3.13.3 [.] emit_x86_instruction
1.29% 1.29% nginx libpcre.so.3.13.3 [.] sljit_generate_code
1.23% 1.23% nginx libc-2.24.so [.] __memmove_sse2_unaligned_erms
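For context, ngx_pfree() can only release "large" pool allocations, and it does so by walking the pool's linked list of large blocks. A simplified sketch of its logic (essentially what nginx's src/core/ngx_palloc.c does):

ngx_int_t
ngx_pfree(ngx_pool_t *pool, void *p)
{
    ngx_pool_large_t  *l;

    /* linear scan over every large allocation ever made from this pool */
    for (l = pool->large; l; l = l->next) {
        if (p == l->alloc) {
            ngx_free(l->alloc);
            l->alloc = NULL;    /* the list node itself stays in the list */
            return NGX_OK;
        }
    }

    return NGX_DECLINED;
}

With thousands of PCRE malloc/free pairs per request, each free rescans an ever-growing list, so the total cost grows quadratically - which matches ngx_pfree dominating the profile above.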
Corresponding source code entry:
https://github.com/SpiderLabs/ModSecurity-nginx/blob/master/src/ngx_http_modsecurity_pre_access.c#L198-L200
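The workaround at those lines swaps PCRE's global allocator hooks (pcre_malloc/pcre_free) so that libmodsecurity's PCRE calls are served from the request pool. A simplified sketch of the mechanism, with names taken from the profile above (the actual connector code differs in details):

static ngx_pool_t *modsec_pcre_pool;  /* pool used while the hooks are active */

static void *
ngx_http_modsec_pcre_malloc(size_t size)
{
    /* every PCRE allocation lands in the request pool */
    return ngx_palloc(modsec_pcre_pool, size);
}

static void
ngx_http_modsec_pcre_free(void *p)
{
    /* each free triggers the linear scan in ngx_pfree() shown above */
    ngx_pfree(modsec_pcre_pool, p);
}

ngx_pool_t *
ngx_http_modsecurity_pcre_malloc_init(ngx_pool_t *pool)
{
    modsec_pcre_pool = pool;
    pcre_malloc = ngx_http_modsec_pcre_malloc;
    pcre_free = ngx_http_modsec_pcre_free;
    return NULL;  /* simplified; the real code saves and restores the old state */
}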
I disabled the PCRE malloc workaround in this particular place with the following patch:
diff --git a/src/ngx_http_modsecurity_pre_access.c b/src/ngx_http_modsecurity_pre_access.c
index 70f9feb..48c424e 100644
--- a/src/ngx_http_modsecurity_pre_access.c
+++ b/src/ngx_http_modsecurity_pre_access.c
@@ -45,7 +45,7 @@ ngx_http_modsecurity_pre_access_handler(ngx_http_request_t *r)
#if 1
ngx_http_modsecurity_ctx_t *ctx = NULL;
ngx_http_modsecurity_conf_t *cf;
- ngx_pool_t *old_pool;
+ //ngx_pool_t *old_pool;
dd("catching a new _preaccess_ phase handler");
@@ -195,9 +195,9 @@ ngx_http_modsecurity_pre_access_handler(ngx_http_request_t *r)
/* XXX: once more -- is body can be modified ? content-length need to be adjusted ? */
- old_pool = ngx_http_modsecurity_pcre_malloc_init(r->pool);
+// old_pool = ngx_http_modsecurity_pcre_malloc_init(r->pool);
msc_process_request_body(ctx->modsec_transaction);
- ngx_http_modsecurity_pcre_malloc_done(old_pool);
+// ngx_http_modsecurity_pcre_malloc_done(old_pool);
ret = ngx_http_modsecurity_process_intervention(ctx->modsec_transaction, r);
if (ret > 0) {
This improved the overall picture a lot:
# time -p curl -vi -H "Content-Type: application/json" --data-binary @status.json http://localhost/modsec-full/upload/
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> POST /modsec-full/upload/ HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 14774
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: nginx/1.13.7
Server: nginx/1.13.7
< Date: Mon, 25 Dec 2017 13:35:15 GMT
Date: Mon, 25 Dec 2017 13:35:15 GMT
< Content-Type: text/plain
Content-Type: text/plain
< Content-Length: 39
Content-Length: 39
< Connection: keep-alive
Connection: keep-alive
<
14774:0a75695f2134f7126825099de1943ab9
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
real 3.83
user 0.00
sys 0.00
The perf profile now shows the real reason for the slowness:
Samples: 3K of event 'cycles', Event count (approx.): 68497505579
Children Self Command Shared Object Symbol
+ 44.04% 43.88% nginx libpcre.so.3.13.3 [.] compile_regex
+ 8.40% 0.00% nginx [unknown] [.] 0000000000000000
+ 6.86% 6.82% nginx libc-2.24.so [.] __memmove_sse2_unaligned_erms
+ 4.84% 4.58% nginx libc-2.24.so [.] _int_malloc
+ 4.52% 4.52% nginx libpcre.so.3.13.3 [.] check_escape
+ 3.97% 3.94% nginx libpcre.so.3.13.3 [.] add_to_class
I also prepared a couple of flamegraphs for both scenarios:
http://pp.nginx.com/defan/preaccess-palloc.svg
http://pp.nginx.com/defan/preaccess-malloc.svg
I'm not sure this should be addressed in the connector, though. My understanding is that pcre_compile() / pcre_study() could be called just once for every regex in the CRS (at configuration load time), leaving only pcre_exec() for request time - whereas currently all of them are called at request time for every rule/comparison; see the sketch below.
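To illustrate the compile-once pattern with the classic PCRE API (this is just the general idea, not libmodsecurity's actual code; rule_regex_init()/rule_regex_match() are hypothetical helpers):

#include <pcre.h>

static pcre       *re;        /* compiled once, at configuration load time */
static pcre_extra *re_extra;  /* study/JIT data, also computed once */

static int
rule_regex_init(const char *pattern)
{
    const char *err;
    int         erroffset;

    re = pcre_compile(pattern, 0, &err, &erroffset, NULL);
    if (re == NULL) {
        return -1;
    }

    /* optional: enables the studied/JIT fast path at match time */
    re_extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &err);
    return 0;
}

static int
rule_regex_match(const char *subject, int len)
{
    int ovector[30];

    /* the only PCRE call that should happen per request */
    return pcre_exec(re, re_extra, subject, len, 0, 0, ovector, 30);
}

Compiled once, a rule's regex can be matched against every request without ever touching the compile_regex path that dominates the second profile.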
@zimmerle @victorhora, I'd appreciate your feedback here - TIA!