Opened 12 years ago
Last modified 12 years ago
#5086 new Patches
Fix for possible assertion failure in MSVC isctype.c
Reported by: | Owned by: | jsiek | |
---|---|---|---|
Milestone: | To Be Determined | Component: | tokenizer |
Version: | Boost Development Trunk | Severity: | Problem |
Keywords: | Cc: |
Description
Ticket #4791 about warning C6328 was closed as fixed. But in fact, the fix (r66855) only silenced the warning, not addressing the real problem indicated by the warning.
The problem is explained here. In short: (for character classification functions in <cctype>)
the valid range of values for its input argument is:
0 <= c <= 255, plus the special value EOF.
Otherwise, the behavior is undefined. Thus, passing an user provided value of char (may be signed) can cause UB. The existing comment just above struct traits_extension addresses the same issue.
Nothing was changed by the static_cast<int> introduced by the fix. It just expressed what the compiler implicitly does (integral promotion).
Here is the patch to fix the real problem, including a test about the problem.
Attachments (1)
Change History (8)
by , 12 years ago
Attachment: | boost_tokenizer_tolerate_negative_char.patch added |
---|
comment:1 by , 12 years ago
Version: | Boost 1.45.0 → Boost Development Trunk |
---|
follow-up: 3 comment:2 by , 12 years ago
Good explanation; I see the problem that needs to be fixed.
However, it seems to me that this patch will change the behavior of non-MS systems.
isspace ( -5 ) will become isspace ( 251 ) - is that correct?
comment:3 by , 12 years ago
isspace ( -5 ) will become isspace ( 251 ) - is that correct?
Yes. But the former was undefined behavior. I think change to any certain behavior from undefined behavior is enough convincing.
follow-up: 5 comment:4 by , 12 years ago
I don't see where that's undefined behavior (except on MS-systems)
Looking at the C specification (since the C++ one defers back to that), all I see is:
7.4.1.9 The isspace function
Synopsis:
#include <ctype.h> int isspace(int c);
Description: The isspace function tests for any character that is a standard white-space character or is one of a locale-specific set of characters for which isalnum is false. The standard white-space characters are the following: space (’ ’), form feed (’\f’), new-line (’\n’), carriage return (’\r’), horizontal tab (’\t’), and vertical tab (’\v’). In the "C" locale, isspace returns true only for the standard white-space characters.
Is there something that I'm missing?
comment:5 by , 12 years ago
Is there something that I'm missing?
Yes, here: 7.4 Character handling <ctype.h> p1
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
follow-up: 7 comment:6 by , 12 years ago
Thanks for the pointer.
But I still see a problem with the path, specifically, when handling EOF. Casting EOF into an unsigned char will make it "not EOF" any more.
comment:7 by , 12 years ago
But I still see a problem with the path, specifically, when handling EOF.
A value char(EOF)
, which become equal to EOF
after integer promotion
if EOF
is representable by char, in a string should never be treated
as end of file in the first place.
This patch will somewhat cure that case, too.
svn diff for trunk r68230