Fix a buffer overrun in deprecated utf8_to_uvchr

Petr Písař 2018-09-05 12:18:51 +02:00
parent a2d9fa158f
commit e7f6de4785
4 changed files with 135 additions and 0 deletions

@@ -0,0 +1,29 @@
From 80ebe57f7bd7f07d3ad1ff9604b2580b98579582 Mon Sep 17 00:00:00 2001
From: Steve Hay <steve.m.hay@googlemail.com>
Date: Thu, 19 Jul 2018 13:49:00 +0100
Subject: [PATCH] Fix VC6 build following commit aa3c16bd70
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Signed-off-by: Petr Písař <ppisar@redhat.com>
---
utf8.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/utf8.c b/utf8.c
index 51039aed4f..57eac2d8f2 100644
--- a/utf8.c
+++ b/utf8.c
@@ -6363,7 +6363,7 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
}
return utf8_to_uvchr_buf(s,
- s + strnlen((char *) s, UTF8_MAXBYTES),
+ s + my_strnlen((char *) s, UTF8_MAXBYTES),
retlen);
}
--
2.14.4
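The VC6 fix replaces `strnlen()` with Perl's own `my_strnlen()`. Presumably this is because VC6's C runtime does not provide `strnlen()`, which is a POSIX function and not part of C89/C99. The following is a minimal sketch of such a fallback under the hypothetical name `portable_strnlen()`. It is not Perl's actual implementation. The property that matters is that the bound is checked before each byte is read, so at most `maxlen` bytes are examined even when the buffer has no NUL terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable stand-in for strnlen(), in the spirit of Perl's
 * my_strnlen(): count the bytes before the first NUL in 'str', but never
 * examine more than 'maxlen' bytes. */
static size_t portable_strnlen(const char *str, size_t maxlen)
{
    size_t len = 0;

    assert(str != NULL);
    /* Test the bound first, so str[maxlen] and beyond are never read */
    while (len < maxlen && str[len] != '\0') {
        len++;
    }
    return len;
}
```

For example, `portable_strnlen("abcdef", 4)` stops after four bytes, and a three-byte array with no NUL can safely be passed with `maxlen` equal to 3.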

@@ -0,0 +1,54 @@
From aa3c16bd709ef9b9c8c785af48f368e08f70c74b Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Tue, 17 Jul 2018 13:57:54 -0600
Subject: [PATCH] Make utf8_to_uvchr() safer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This function is deprecated because the API doesn't allow it to
determine the end of the input string, so it can read off the far end.
But I just realized that, since many strings are NUL-terminated, we
can forbid it from reading past the next NUL, and hence make it safe in
many cases.
Signed-off-by: Petr Písař <ppisar@redhat.com>
---
utf8.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/utf8.c b/utf8.c
index dec8aa1252..51039aed4f 100644
--- a/utf8.c
+++ b/utf8.c
@@ -6345,7 +6345,26 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
{
PERL_ARGS_ASSERT_UTF8_TO_UVCHR;
- return utf8_to_uvchr_buf(s, s + UTF8_MAXBYTES, retlen);
+ /* This function is unsafe if malformed UTF-8 input is given it, which is
+ * why the function is deprecated. If the first byte of the input
+ * indicates that there are more bytes remaining in the sequence that forms
+ * the character than there are in the input buffer, it can read past the
+ * end. But we can make it safe if the input string happens to be
+ * NUL-terminated, as many strings in Perl are, by refusing to read past a
+ * NUL. A NUL indicates the start of the next character anyway. If the
+ * input isn't NUL-terminated, the function remains unsafe, as it always
+ * has been.
+ *
+ * An initial NUL has to be handled separately, but all ASCIIs can be
+ * handled the same way, speeding up this common case */
+
+ if (UTF8_IS_INVARIANT(*s)) { /* Assumes 's' contains at least 1 byte */
+ return (UV) *s;
+ }
+
+ return utf8_to_uvchr_buf(s,
+ s + strnlen((char *) s, UTF8_MAXBYTES),
+ retlen);
}
/*
--
2.14.4
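The technique in this patch has two parts. First, a fast path handles invariant (ASCII) bytes. Second, the decoder's input is capped at the first NUL or at `UTF8_MAXBYTES`, whichever comes first. The following sketch illustrates that pattern in self-contained C. `decode_buf()`, `to_uvchr_nul_bounded()`, `SKETCH_MAXBYTES`, and `SKETCH_MALFORMED` are hypothetical names. The decoder handles only standard UTF-8 (at most 4 bytes, no overlong or surrogate checks), whereas Perl's extended UTF-8 permits longer sequences.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strnlen() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch assumption: standard UTF-8 only (at most 4 bytes per character);
 * Perl's extended UTF-8 permits longer sequences. */
#define SKETCH_MAXBYTES 4
#define SKETCH_MALFORMED ((unsigned long) -1)

/* Sequence length announced by a lead byte, or 0 if it cannot start one */
static size_t lead_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Decode one character from [s, end) without touching any byte at or past
 * 'end'. Caller guarantees s < end. Simplified: no overlong/surrogate checks. */
static unsigned long decode_buf(const unsigned char *s,
                                const unsigned char *end, size_t *retlen)
{
    size_t need = lead_len(s[0]);
    size_t i;
    unsigned long cp;

    if (need == 0 || (size_t) (end - s) < need) {
        *retlen = 1;                /* hypothetical policy: skip one byte */
        return SKETCH_MALFORMED;
    }
    cp = (need == 1) ? s[0] : (unsigned long) (s[0] & (0x7F >> need));
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {    /* expected a continuation byte */
            *retlen = i;
            return SKETCH_MALFORMED;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *retlen = need;
    return cp;
}

/* Pattern of the patch above: invariant fast path, then cap the readable
 * range at the first NUL or SKETCH_MAXBYTES, whichever comes first. */
static unsigned long to_uvchr_nul_bounded(const unsigned char *s, size_t *retlen)
{
    assert(s != NULL && retlen != NULL);
    if (s[0] < 0x80) {          /* ASCII, including an initial NUL */
        *retlen = 1;
        return s[0];
    }
    return decode_buf(s, s + strnlen((const char *) s, SKETCH_MAXBYTES), retlen);
}
```

Given the bytes `0xE2 0x00`, `strnlen()` stops at the NUL. The decoder therefore sees a one-byte range and reports a malformation instead of reading the two continuation bytes that the lead byte promises. Unlike the hunk above, the sketch also sets `*retlen` on the fast path.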

@@ -0,0 +1,39 @@
From 2951abb4de83bfd534d332144e6a0bb3e2aaecdc Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@cpan.org>
Date: Mon, 30 Jul 2018 21:41:44 -0600
Subject: [PATCH] Make utf8_to_uvchr() slightly safer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Recent commit aa3c16bd709ef9b9c8c785af48f368e08f70c74b made this
function safe if the input is a NUL-terminated string. But if not, it
can read past the end of the buffer. It used as a limit the maximum
length a UTF-8 code point can be. But most code points in real-world
use aren't nearly that long, and we know how long that can be by looking
at the first byte. Therefore, use the length determined by the first
byte as the limit instead of the maximum possible.
Signed-off-by: Petr Písař <ppisar@redhat.com>
---
utf8.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/utf8.c b/utf8.c
index ceb8ed82df..06b77689c0 100644
--- a/utf8.c
+++ b/utf8.c
@@ -5755,8 +5755,8 @@ Perl_utf8_to_uvchr(pTHX_ const U8 *s, STRLEN *retlen)
}
return utf8_to_uvchr_buf(s,
- s + my_strnlen((char *) s, UTF8_MAXBYTES),
- retlen);
+ s + my_strnlen((char *) s, UTF8SKIP(s)),
+ retlen);
}
/*
--
2.14.4
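This follow-up tightens the bound from `UTF8_MAXBYTES` to `UTF8SKIP(s)`, which is the length announced by the lead byte. The following sketch computes the resulting end pointer. `sketch_utf8skip()` and `sketch_read_limit()` are hypothetical helpers, and `sketch_utf8skip()` covers only standard UTF-8 lead bytes. Perl's real `UTF8SKIP()` is a table lookup that also covers the longer lead bytes of Perl's extended UTF-8.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strnlen() under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for UTF8SKIP(): the sequence length announced by a
 * standard UTF-8 lead byte. Bytes that cannot start a multi-byte sequence
 * count as 1, so only that single byte is examined. */
static size_t sketch_utf8skip(const unsigned char *s)
{
    unsigned char c = s[0];

    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;   /* ASCII, continuation byte, or invalid lead byte */
}

/* End pointer to hand to the decoder: stop at the first NUL, and never look
 * further than the lead byte says the character extends. */
static const unsigned char *sketch_read_limit(const unsigned char *s)
{
    assert(s != NULL);
    return s + strnlen((const char *) s, sketch_utf8skip(s));
}
```

For a two-byte character stored without a terminating NUL, the limit now stops after those two bytes. With `UTF8_MAXBYTES` as the bound, `strnlen()` could have scanned past the end of the buffer while looking for a NUL.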

@@ -181,6 +181,12 @@ Patch22: perl-5.29.1-perl-133314-always-close-the-directory-handle-on-cle
# in upstream after 5.29.1
Patch23: perl-5.29.1-utf8.c-Make-safer-a-deprecated-function.patch
# Fix a buffer overrun in deprecated utf8_to_uvchr(),
# in upstream after 5.29.0
Patch24: perl-5.29.0-Make-utf8_to_uvchr-safer.patch
Patch25: perl-5.29.0-Fix-VC6-build-following-commit-aa3c16bd70.patch
Patch26: perl-5.29.1-Make-utf8_to_uvchr-slightly-safer.patch
# Link XS modules to libperl.so with EU::CBuilder on Linux, bug #960048
Patch200: perl-5.16.3-Link-XS-modules-to-libperl.so-with-EU-CBuilder-on-Li.patch
@@ -2756,6 +2762,9 @@ Perl extension for Version Objects
%patch21 -p1
%patch22 -p1
%patch23 -p1
%patch24 -p1
%patch25 -p1
%patch26 -p1
%patch200 -p1
%patch201 -p1
@@ -2786,6 +2795,9 @@ perl -x patchlevel.h \
'Fedora Patch21: Fix a file descriptor leak in in-place edits (RT#133314)' \
'Fedora Patch22: Fix a file descriptor leak in in-place edits (RT#133314)' \
'Fedora Patch23: Fix a buffer overrun in deprecated S_is_utf8_common()' \
'Fedora Patch24: Fix a buffer overrun in deprecated utf8_to_uvchr()' \
'Fedora Patch25: Fix a buffer overrun in deprecated utf8_to_uvchr()' \
'Fedora Patch26: Fix a buffer overrun in deprecated utf8_to_uvchr()' \
'Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux' \
'Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux' \
%{nil}
@@ -5076,6 +5088,7 @@ popd
%changelog
* Wed Sep 05 2018 Petr Pisar <ppisar@redhat.com> - 4:5.28.0-421
- Fix a buffer overrun in deprecated S_is_utf8_common()
- Fix a buffer overrun in deprecated utf8_to_uvchr()
* Wed Aug 01 2018 Petr Pisar <ppisar@redhat.com> - 4:5.28.0-420
- Fix a file descriptor leak in in-place edits (RT#133314)