--- a/lib/Mail/SpamAssassin/Conf.pm 2019/08/01 12:28:38 1864149
+++ b/lib/Mail/SpamAssassin/Conf.pm 2019/08/08 08:11:36 1864686
@@ -3066,12 +3066,19 @@
 as per the header tests, C<#> must be escaped (C<\#>) or else it is considered
 the beginning of a comment.
 
-The 'body' in this case is the textual parts of the message body;
-any non-text MIME parts are stripped, and the message decoded from
-Quoted-Printable or Base-64-encoded format if necessary. The message
-Subject header is considered part of the body and becomes the first
-paragraph when running the rules. All HTML tags and line breaks will
-be removed before matching.
+The 'body' in this case is the textual parts of the message body; any
+non-text MIME parts are stripped, and the message decoded from
+Quoted-Printable or Base-64-encoded format if necessary. Parts declared as
+text/html will be rendered from HTML to text.
+
+Each body paragraph (a double-newline-separated block of text) is turned
+into a single line with line breaks removed and whitespace normalized. Any
+line longer than 2kB is split into shorter separate lines (at a boundary
+when possible); this may unexpectedly prevent a pattern from matching.
+Patterns are matched independently against each of these lines.
+
+Note that the message Subject header is considered part of the body and
+becomes the first line when running the rules.
 
 =item body SYMBOLIC_TEST_NAME eval:name_of_eval_method([args])
 
@@ -3152,6 +3159,10 @@
 tags and line breaks will still be present. Multiline expressions will
 need to be used to match strings that are broken by line breaks.
 
+Note that the text is split into 2-4kB chunks (at a word boundary when
+possible); this may unexpectedly prevent a pattern from matching. Patterns
+are matched independently against each of these chunks.
+
 =item rawbody SYMBOLIC_TEST_NAME eval:name_of_eval_method([args])
 
 Define a raw-body eval test. See above.
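The 'body' preprocessing described in the Conf.pm hunk above can be sketched roughly as follows. This is a Python illustration only, not SpamAssassin's Perl implementation: the function name `render_body_lines`, the use of a space as the split boundary, and counting characters rather than bytes are all assumptions made for the example.

```python
import re

MAX_LINE = 2048  # assumed limit, taken from the "2kB" in the text above


def render_body_lines(subject, body_text, max_line=MAX_LINE):
    """Hypothetical sketch: Subject first, then one normalized line per paragraph."""
    lines = [subject]
    # Paragraphs are blocks of text separated by blank lines.
    for para in re.split(r"\n\s*\n", body_text):
        # Remove line breaks and collapse whitespace runs to single spaces.
        line = " ".join(para.split())
        if not line:
            continue
        # Split over-long lines, preferring the last space within the limit.
        while len(line) > max_line:
            cut = line.rfind(" ", 0, max_line + 1)
            if cut <= 0:
                cut = max_line  # no usable boundary: hard split
            lines.append(line[:cut])
            line = line[cut:].lstrip()
        lines.append(line)
    return lines


if __name__ == "__main__":
    for entry in render_body_lines("Hello", "first\nparagraph  here\n\nsecond one"):
        print(repr(entry))
```

Under these assumptions, a pattern such as `a b` cannot match once the split has placed `a` at the end of one line and `b` at the start of the next, which is the pitfall the new documentation text warns about.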
--- a/lib/Mail/SpamAssassin/PerMsgStatus.pm 2019/08/03 13:55:00 1864336
+++ b/lib/Mail/SpamAssassin/PerMsgStatus.pm 2019/08/08 08:11:36 1864686
@@ -1769,8 +1769,10 @@
 Returns the message body, with B<base64> or B<quoted-printable> encodings
 decoded, and non-text parts or non-inline attachments stripped.
 
-It is returned as an array of strings, with each string representing
-one newline-separated line of the body.
+This is the same text that 'rawbody' rules are matched against.
+
+It is returned as an array of strings, with each string being a 2-4kB chunk
+of the body, split at boundaries if possible.
 
 =cut
 
@@ -1784,6 +1786,8 @@
 get_decoded_body_text_array()), with HTML rendered, and with whitespace
 normalized.
 
+This is the same text that 'body' rules are matched against.
+
 It will always render text/html, and will use a heuristic to determine if other
 text/* parts should be considered text/html.
 
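To illustrate why returning the body as an array of chunks can prevent a pattern from matching, here is a small Python sketch. It is not the PerMsgStatus.pm code: `split_chunks`, `any_chunk_matches`, and the exact cut rule (first whitespace between the minimum and maximum size, else a hard cut) are hypothetical stand-ins for the "2-4kB chunk, split at boundaries if possible" behaviour described above.

```python
import re


def split_chunks(text, min_size=2048, max_size=4096):
    """Hypothetical splitter: cut just after the first whitespace found between
    min_size and max_size, or hard-cut at max_size when there is none."""
    chunks = []
    while len(text) > max_size:
        m = re.search(r"\s", text[min_size:max_size])
        cut = min_size + m.end() if m else max_size
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks


def any_chunk_matches(pattern, chunks):
    """Apply the pattern to each chunk on its own, never across a boundary."""
    rx = re.compile(pattern)
    return any(rx.search(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Tiny sizes so the effect is visible without kilobytes of text.
    text = "x" * 10 + " " + "needle haystack"
    chunks = split_chunks(text, min_size=8, max_size=16)
    print(chunks)
    print(any_chunk_matches(r"needle haystack", chunks))  # inside one chunk
    print(any_chunk_matches(r"x needle", chunks))         # straddles the cut
```

In this sketch the second pattern matches the unsplit text but none of the individual chunks, so a rule relying on it would silently fail for a sufficiently long body.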