Opened 12 years ago

Last modified 12 years ago

#5086 new Patches

Fix for possible assertion failure in MSVC isctype.c

Reported by: Kazutoshi Satoda <k_satoda@…> Owned by: jsiek
Milestone: To Be Determined Component: tokenizer
Version: Boost Development Trunk Severity: Problem
Keywords: Cc:

Description

Ticket #4791 about warning C6328 was closed as fixed. But in fact, the fix (r66855) only silenced the warning, not addressing the real problem indicated by the warning.

The problem is explained here. In short: (for character classification functions in <cctype>)

the valid range of values for its input argument is:

0 <= c <= 255, plus the special value EOF.

Otherwise, the behavior is undefined. Thus, passing an user provided value of char (may be signed) can cause UB. The existing comment just above struct traits_extension addresses the same issue.

Nothing was changed by the static_cast<int> introduced by the fix. It just expressed what the compiler implicitly does (integral promotion).

Here is the patch to fix the real problem, including a test about the problem.

Attachments (1)

boost_tokenizer_tolerate_negative_char.patch (2.5 KB ) - added by Kazutoshi Satoda <k_satoda@…> 12 years ago.
svn diff for trunk r68230

Download all attachments as: .zip

Change History (8)

by Kazutoshi Satoda <k_satoda@…>, 12 years ago

svn diff for trunk r68230

comment:1 by Kazutoshi Satoda <k_satoda@…>, 12 years ago

Version: Boost 1.45.0Boost Development Trunk

comment:2 by Marshall Clow, 12 years ago

Good explanation; I see the problem that needs to be fixed.

However, it seems to me that this patch will change the behavior of non-MS systems.

isspace ( -5 ) will become isspace ( 251 ) - is that correct?

in reply to:  2 comment:3 by Kazutoshi Satoda <k_satoda@…>, 12 years ago

isspace ( -5 ) will become isspace ( 251 ) - is that correct?

Yes. But the former was undefined behavior. I think change to any certain behavior from undefined behavior is enough convincing.

comment:4 by Marshall Clow, 12 years ago

I don't see where that's undefined behavior (except on MS-systems)

Looking at the C specification (since the C++ one defers back to that), all I see is:

7.4.1.9 The isspace function

Synopsis:

#include <ctype.h> 
int isspace(int c);

Description: The isspace function tests for any character that is a standard white-space character or is one of a locale-specific set of characters for which isalnum is false. The standard white-space characters are the following: space (’ ’), form feed (’\f’), new-line (’\n’), carriage return (’\r’), horizontal tab (’\t’), and vertical tab (’\v’). In the "C" locale, isspace returns true only for the standard white-space characters.

Is there something that I'm missing?

in reply to:  4 comment:5 by Kazutoshi Satoda <k_satoda@…>, 12 years ago

Is there something that I'm missing?

Yes, here: 7.4 Character handling <ctype.h> p1

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

comment:6 by Marshall Clow, 12 years ago

Thanks for the pointer.

But I still see a problem with the path, specifically, when handling EOF. Casting EOF into an unsigned char will make it "not EOF" any more.

in reply to:  6 comment:7 by Kazutoshi Satoda <k_satoda@…>, 12 years ago

But I still see a problem with the path, specifically, when handling EOF.

A value char(EOF), which become equal to EOF after integer promotion if EOF is representable by char, in a string should never be treated as end of file in the first place.

This patch will somewhat cure that case, too.

Note: See TracTickets for help on using tickets.