125 lines
5.2 KiB
Diff
125 lines
5.2 KiB
Diff
|
From 4e98c0c1d36104ed426d3b198a176e1a5df814fa Mon Sep 17 00:00:00 2001
|
||
|
From: Willy Tarreau <w@1wt.eu>
|
||
|
Date: Tue, 8 Aug 2023 16:17:22 +0200
|
||
|
Subject: BUG/MINOR: h1: do not accept '#' as part of the URI component
|
||
|
|
||
|
Seth Manesse and Paul Plasil reported that the "path" sample fetch
|
||
|
function incorrectly accepts '#' as part of the path component. This
|
||
|
can in some cases lead to misrouted requests for rules that would apply
|
||
|
on the suffix:
|
||
|
|
||
|
use_backend static if { path_end .png .jpg .gif .css .js }
|
||
|
|
||
|
Note that this behavior can be selectively configured using
|
||
|
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".
|
||
|
|
||
|
The problem is that while the RFC says that this '#' must never be
|
||
|
emitted, as often it doesn't suggest how servers should handle it. A
|
||
|
diminishing number of servers still do accept it and trim it silently,
|
||
|
while others are rejecting it, as indicated in the conversation below
|
||
|
with other implementers:
|
||
|
|
||
|
https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html
|
||
|
|
||
|
Looking at logs from publicly exposed servers, such requests appear at
|
||
|
a rate of roughly 1 per million and only come from attacks or poorly
|
||
|
written web crawlers incorrectly following links found on various pages.
|
||
|
|
||
|
Thus it looks like the best solution to this problem is to simply reject
|
||
|
such ambiguous requests by default, and include this in the list of
|
||
|
controls that can be disabled using "option accept-invalid-http-request".
|
||
|
|
||
|
We're already rejecting URIs containing any control char anyway, so we
|
||
|
should also reject '#'.
|
||
|
|
||
|
In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
|
||
|
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
|
||
|
should not impact perf since 0x21..0x23 are not supposed to appear in
|
||
|
a URI anyway). This way '#' falls through the fine-grained filter and
|
||
|
we can add the special case for it also conditionned by a check on the
|
||
|
proxy's option "accept-invalid-http-request", with no overhead for the
|
||
|
vast majority of valid URIs. Here this information is available through
|
||
|
h1m->err_pos that's set to -2 when the option is here (so we don't need
|
||
|
to change the API to expose the proxy). Example with a trivial GET
|
||
|
through netcat:
|
||
|
|
||
|
[08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
|
||
|
backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
|
||
|
buffer starts at 0 (including 0 out), 16361 free,
|
||
|
len 23, wraps at 16336, error at position 7
|
||
|
H1 connection flags 0x00000000, H1 stream flags 0x00000810
|
||
|
H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
|
||
|
H1 chunk len 0 bytes, H1 body len 0 bytes :
|
||
|
|
||
|
00000 GET /aa#bb HTTP/1.0\r\n
|
||
|
00021 \r\n
|
||
|
|
||
|
This should be progressively backported to all stable versions along with
|
||
|
the following patch:
|
||
|
|
||
|
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests
|
||
|
|
||
|
Similar fixes for h2 and h3 will come in followup patches.
|
||
|
|
||
|
Thanks to Seth Manesse and Paul Plasil for reporting this problem with
|
||
|
detailed explanations.
|
||
|
|
||
|
(cherry picked from commit 2eab6d354322932cfec2ed54de261e4347eca9a6)
|
||
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
||
|
(cherry picked from commit 9bf75c8e22a8f2537f27c557854a8803087046d0)
|
||
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
||
|
(cherry picked from commit 9facd01c9ac85fe9bcb331594b80fa08e7406552)
|
||
|
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
|
||
|
(cherry picked from commit 832b672eee54866c7a42a1d46078cc9ae0d544d9)
|
||
|
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
||
|
(cherry picked from commit e5a741f94977840c58775b38f8ed830207f7e4d0)
|
||
|
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
||
|
(cherry picked from commit 178cea76b1c9d9413afa6961b6a4576fcb5b26fa)
|
||
|
[wt: applied the same to http_parse_reqline() in http_msg.c]
|
||
|
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
||
|
(cherry picked from commit 4ad6fd9eeb3078685fffdc58f1c6d4eb97e05d98)
|
||
|
[wt: dropped the HTX part, adapted the legacy one in http_msg.c]
|
||
|
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
||
|
---
|
||
|
src/h1.c | 13 ++++++++++---
|
||
|
1 file changed, 10 insertions(+), 3 deletions(-)
|
||
|
|
||
|
diff --git a/src/h1.c b/src/h1.c
|
||
|
index d3a20c2ed..57be42f31 100644
|
||
|
--- a/src/h1.c
|
||
|
+++ b/src/h1.c
|
||
|
@@ -341,11 +341,11 @@ const char *http_parse_reqline(struct http_msg *msg,
|
||
|
defined(__ARM_ARCH_7A__)
|
||
|
/* speedup: skip bytes not between 0x21 and 0x7e inclusive */
|
||
|
while (ptr <= end - sizeof(int)) {
|
||
|
- int x = *(int *)ptr - 0x21212121;
|
||
|
+ int x = *(int *)ptr - 0x24242424;
|
||
|
if (x & 0x80808080)
|
||
|
break;
|
||
|
|
||
|
- x -= 0x5e5e5e5e;
|
||
|
+ x -= 0x5b5b5b5b;
|
||
|
if (!(x & 0x80808080))
|
||
|
break;
|
||
|
|
||
|
@@ -357,8 +357,15 @@ const char *http_parse_reqline(struct http_msg *msg,
|
||
|
goto http_msg_ood;
|
||
|
}
|
||
|
http_msg_rquri2:
|
||
|
- if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
|
||
|
+ if (likely((unsigned char)(*ptr - 33) <= 93)) { /* 33 to 126 included */
|
||
|
+ if (*ptr == '#') {
|
||
|
+ if (msg->err_pos < -1) /* PR_O2_REQBUG_OK not set */
|
||
|
+ goto invalid_char;
|
||
|
+ if (msg->err_pos == -1) /* PR_O2_REQBUG_OK set: just log */
|
||
|
+ msg->err_pos = ptr - msg_start;
|
||
|
+ }
|
||
|
EAT_AND_JUMP_OR_RETURN(ptr, end, http_msg_rquri2, http_msg_ood, state, HTTP_MSG_RQURI);
|
||
|
+ }
|
||
|
|
||
|
if (likely(HTTP_IS_SPHT(*ptr))) {
|
||
|
msg->sl.rq.u_l = ptr - msg_start - msg->sl.rq.u;
|
||
|
--
|
||
|
2.35.3
|
||
|
|