From 4e98c0c1d36104ed426d3b198a176e1a5df814fa Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Tue, 8 Aug 2023 16:17:22 +0200
Subject: BUG/MINOR: h1: do not accept '#' as part of the URI component

Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:

    use_backend static if { path_end .png .jpg .gif .css .js }

Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".

The problem is that while the RFC says that this '#' must never be
emitted, as often it doesn't suggest how servers should handle it. A
diminishing number of servers still do accept it and trim it silently,
while others are rejecting it, as indicated in the conversation below
with other implementers:

   https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html

Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.

Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".

We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.

In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it also conditioned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs.
Here this information is available through h1m->err_pos that's set to -2
when the option is here (so we don't need to change the API to expose
the proxy).

Example with a trivial GET through netcat:

  [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
    backend (#-1), server (#-1), event #0, src 127.0.0.1:50812
    buffer starts at 0 (including 0 out), 16361 free,
    len 23, wraps at 16336, error at position 7
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET /aa#bb HTTP/1.0\r\n
    00021  \r\n

This should be progressively backported to all stable versions along with
the following patch:

    REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests

Similar fixes for h2 and h3 will come in followup patches.

Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.

(cherry picked from commit 2eab6d354322932cfec2ed54de261e4347eca9a6)
Signed-off-by: Amaury Denoyelle
(cherry picked from commit 9bf75c8e22a8f2537f27c557854a8803087046d0)
Signed-off-by: Amaury Denoyelle
(cherry picked from commit 9facd01c9ac85fe9bcb331594b80fa08e7406552)
Signed-off-by: Amaury Denoyelle
(cherry picked from commit 832b672eee54866c7a42a1d46078cc9ae0d544d9)
Signed-off-by: Willy Tarreau
(cherry picked from commit e5a741f94977840c58775b38f8ed830207f7e4d0)
Signed-off-by: Willy Tarreau
(cherry picked from commit 178cea76b1c9d9413afa6961b6a4576fcb5b26fa)
[wt: applied the same to http_parse_reqline() in http_msg.c]
Signed-off-by: Willy Tarreau
(cherry picked from commit 4ad6fd9eeb3078685fffdc58f1c6d4eb97e05d98)
[wt: dropped the HTX part, adapted the legacy one in http_msg.c]
Signed-off-by: Willy Tarreau
---
 src/h1.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/h1.c b/src/h1.c
index d3a20c2ed..57be42f31 100644
--- a/src/h1.c
+++ b/src/h1.c
@@ -341,11 +341,11 @@ const char *http_parse_reqline(struct http_msg *msg,
     defined(__ARM_ARCH_7A__)
 		/* speedup: skip bytes not between 0x21 and 0x7e inclusive */
 		while (ptr <= end - sizeof(int)) {
-			int x = *(int *)ptr - 0x21212121;
+			int x = *(int *)ptr - 0x24242424;
 			if (x & 0x80808080)
 				break;
 
-			x -= 0x5e5e5e5e;
+			x -= 0x5b5b5b5b;
 			if (!(x & 0x80808080))
 				break;
 
@@ -357,8 +357,15 @@ const char *http_parse_reqline(struct http_msg *msg,
 			goto http_msg_ood;
 		}
 	http_msg_rquri2:
-		if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
+		if (likely((unsigned char)(*ptr - 33) <= 93)) { /* 33 to 126 included */
+			if (*ptr == '#') {
+				if (msg->err_pos < -1) /* PR_O2_REQBUG_OK not set */
+					goto invalid_char;
+				if (msg->err_pos == -1) /* PR_O2_REQBUG_OK set: just log */
+					msg->err_pos = ptr - msg_start;
+			}
 			EAT_AND_JUMP_OR_RETURN(ptr, end, http_msg_rquri2, http_msg_ood, state, HTTP_MSG_RQURI);
+		}
 
 		if (likely(HTTP_IS_SPHT(*ptr))) {
 			msg->sl.rq.u_l = ptr - msg_start - msg->sl.rq.u;
-- 
2.35.3
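
Reviewer note: the per-byte '#' handling added in the second hunk can be sketched in isolation as below. This is only an illustrative stand-alone model, not HAProxy code: scan_uri(), ERR_BLOCK and ERR_PASS are hypothetical names, and the meaning given to the err_pos values is an assumption taken from the comments in the hunk (below -1: reject, exactly -1: tolerate and record the offset). Unlike the real parser, the sketch rejects every byte outside 33..126 instead of handling spaces and line ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two negative err_pos states (assumption,
 * inferred from the comments in the hunk above). */
enum { ERR_BLOCK = -2, ERR_PASS = -1 };

/* Scan a URI and return the offset of the first rejected byte, or -1 when
 * the whole URI is accepted. In lenient mode (*err_pos == ERR_PASS) the
 * offset of the first '#' is stored in *err_pos and scanning continues.
 */
static int scan_uri(const char *uri, size_t len, int *err_pos)
{
	assert(uri != NULL && err_pos != NULL);

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)uri[i];

		/* simplified: anything outside 33..126 is rejected here */
		if ((unsigned char)(c - 33) > 93)
			return (int)i;

		if (c == '#') {
			if (*err_pos < -1)       /* strict mode: reject */
				return (int)i;
			if (*err_pos == -1)      /* lenient mode: record first offender */
				*err_pos = (int)i;
		}
	}
	return -1;
}
```

With err_pos initialised to ERR_BLOCK, scanning "/aa#bb" stops at offset 3; with ERR_PASS the URI is accepted and err_pos becomes 3. Once err_pos holds a recorded offset, later '#' bytes leave it unchanged, which matches the "first error only" behaviour of the two tests in the hunk.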