Boost C++ Libraries: Ticket #3634: to_upper / to_lower incorrect for machines with signed chars https://svn.boost.org/trac10/ticket/3634 <p> I'm using an ISO8859-1 / ISO8859-15 (Latin-1 / Latin-9) character set for the source, and a string containing German umlauts is not converted correctly according to the locale configuration. The problem is reproducible on any machine where the default configuration of C/C++ uses signed chars, in our case Sun Solaris: </p> <p> 1. The conversion functions toupper and tolower of the standard C library expect an int parameter. </p> <p> 2. Solaris' characters are signed by default (and Sun explicitly advises against changing that; the man page of Sun Studio 12.1's CC says about the -xchar option: </p> <p> "It is strongly recommended that you never use -xchar to compile routines for any interface exported through a library. The Solaris ABI specifies type char as signed, and system libraries behave accordingly. The effect of making char unsigned has not been extensively tested with system libraries. Instead of using this option, modify your code so that it does not depend on whether type char is signed or unsigned. The sign of type char varies among compilers and operating systems.") </p> <p> 3. Characters with an unsigned value &gt;= 128 (e.g. an umlaut) are passed to toupper and tolower as negative values and thus are never converted for any locale. </p> <p> An explicit static cast to unsigned char in the calls to the corresponding standard C library functions should solve this problem. </p> <p> Note that this may be needed for other functions as well, e.g. the classification functions. I didn't check those so far. 
</p> Thomas Dorner <td-eclipse@…> Wed, 18 Nov 2009 07:41:41 GMT attachment set https://svn.boost.org/trac10/ticket/3634 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">testLocaleBoost.cpp</span> </li> </ul> <p> example demonstrating the problem </p> Ticket Pavol Droba Wed, 18 Nov 2009 19:15:59 GMT status changed https://svn.boost.org/trac10/ticket/3634#comment:1 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">assigned</span> </li> </ul> <p> Would you by any chance be able to submit a patch that has been tested on Solaris? I don't have access to such a machine, so I would be fixing blindly. </p> <p> I will gladly incorporate it into the library code. </p> Ticket Thomas Dorner <td-eclipse@…> Thu, 19 Nov 2009 06:35:00 GMT attachment set https://svn.boost.org/trac10/ticket/3634 <ul> <li><strong>attachment</strong> → <span class="trac-field-new">dirty_fix.patch</span> </li> </ul> <p> unified diff of a dirty quick-fix </p> Ticket Thomas Dorner <td-eclipse@…> Thu, 19 Nov 2009 06:35:57 GMT <link>https://svn.boost.org/trac10/ticket/3634#comment:2 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3634#comment:2</guid> <description> <p> I don't have anything you would want to use, as I only have a fix that would completely break the locale parameter to the function call. I've attached the patch anyway; maybe you could give me a hint about what to try instead. (I'm a bit lost between all those facet templates. 
;-) </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Pavol Droba</dc:creator> <pubDate>Fri, 20 Nov 2009 19:35:51 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/3634#comment:3 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3634#comment:3</guid> <description> <p> Looking at the problem again, I think that the problem actually lies in Solaris's C++ locales. There is nothing wrong with chars being signed. </p> <p> Anyway, I'll try to check it out and come up with a solution. </p> </description> <category>Ticket</category> </item> <item> <author>Thomas Dorner <td-eclipse@…></author> <pubDate>Mon, 23 Nov 2009 08:05:02 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/3634#comment:4 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3634#comment:4</guid> <description> <p> Thanks! If you have something ready, feel free to contact me to test it. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Marshall Clow</dc:creator> <pubDate>Wed, 28 Dec 2011 18:52:35 GMT</pubDate> <title>owner, status, component changed https://svn.boost.org/trac10/ticket/3634#comment:5 https://svn.boost.org/trac10/ticket/3634#comment:5 <ul> <li><strong>owner</strong> changed from <span class="trac-author">Pavol Droba</span> to <span class="trac-author">Marshall Clow</span> </li> <li><strong>status</strong> <span class="trac-field-old">assigned</span> → <span class="trac-field-new">new</span> </li> <li><strong>component</strong> <span class="trac-field-old">string_algo</span> → <span class="trac-field-new">algorithm</span> </li> </ul> Ticket Marshall Clow Tue, 10 Jan 2012 18:08:02 GMT <link>https://svn.boost.org/trac10/ticket/3634#comment:6 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3634#comment:6</guid> <description> <p> Replying to <a class="ticket" href="https://svn.boost.org/trac10/ticket/3634#comment:3" title="Comment 3">pavol_droba</a>: </p> 
<blockquote class="citation"> <p> Looking at the problem again, I think that the problem actually lies in Solaris's C++ locales. There is nothing wrong with chars being signed. </p> <p> Anyway, I'll try to check it out and come up with a solution. </p> </blockquote> <p> I'm coming to that conclusion, too. </p> <p> The C versions of <code>tolower</code> et al. take an <code>int</code> as a parameter. The C99 standard (section 7.4, paragraph 1) says that the input to <code>tolower</code> needs to be representable as unsigned char, or equal to EOF. To me that means "no negative numbers". Microsoft has a page that talks about this issue, too: <a class="ext-link" href="http://msdn.microsoft.com/en-us/library/ms245348.aspx"><span class="icon">​</span>http://msdn.microsoft.com/en-us/library/ms245348.aspx</a> </p> <p> However, the C++ version of <code>std::tolower</code> takes a <code>char</code> (templated) as a parameter, and I can't find any similar restriction in either the C++03 standard or the (draft) C++11 standard. To me, that means that all possible values of <code>char</code> (or of whatever type the function is templated upon) are allowable. </p> </description> <category>Ticket</category> </item> <item> <dc:creator>Marshall Clow</dc:creator> <pubDate>Thu, 12 Jan 2012 18:45:58 GMT</pubDate> <title/> <link>https://svn.boost.org/trac10/ticket/3634#comment:7 </link> <guid isPermaLink="false">https://svn.boost.org/trac10/ticket/3634#comment:7</guid> <description> <p> I just checked in a fix in <a class="changeset" href="https://svn.boost.org/trac10/changeset/76435" title="should correct #3634; will close when merged to release">[76435]</a>, will merge to release after the test cycle. 
</p> </description> <category>Ticket</category> </item> <item> <dc:creator>Marshall Clow</dc:creator> <pubDate>Sun, 15 Jan 2012 16:05:57 GMT</pubDate> <title>status changed; resolution set https://svn.boost.org/trac10/ticket/3634#comment:8 https://svn.boost.org/trac10/ticket/3634#comment:8 <ul> <li><strong>status</strong> <span class="trac-field-old">new</span> → <span class="trac-field-new">closed</span> </li> <li><strong>resolution</strong> → <span class="trac-field-new">fixed</span> </li> </ul> <p> (In <a class="changeset" href="https://svn.boost.org/trac10/changeset/76522" title="Merge changes to release; fixes #3634">[76522]</a>) Merge changes to release; fixes <a class="closed ticket" href="https://svn.boost.org/trac10/ticket/3634" title="#3634: Bugs: to_upper / to_lower incorrect for machines with signed chars (closed: fixed)">#3634</a> </p> Ticket
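
For readers following the thread, the two call styles discussed above can be sketched as follows. This is only an illustration of the cast proposed in the original report and of the templated `<locale>` overload mentioned in comment 6; it is not the code from changeset [76435], and the helper names `to_upper_c`, `to_upper_loc` and `to_upper_copy_c` are hypothetical.

```cpp
#include <cctype>
#include <locale>
#include <string>

// Hypothetical helper (not taken from Boost): route the char through
// unsigned char so that bytes >= 128 reach the C library as values in
// 0..255 instead of as negative ints, which <cctype> does not allow
// (except for EOF).
inline char to_upper_c(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Hypothetical helper using the templated <locale> overload, which takes
// the character type itself, so no cast is written at the call site.
inline char to_upper_loc(char c, const std::locale& loc = std::locale())
{
    return std::toupper(c, loc);
}

// Apply the C-library variant to every byte of a narrow string.
inline std::string to_upper_copy_c(std::string s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = to_upper_c(s[i]);
    return s;
}
```

Whether a byte such as 0xE4 (a-umlaut in Latin-1) is actually mapped to its upper-case form still depends on the locale in effect; in the default "C" locale both helpers leave it unchanged, and only the undefined behaviour of passing a negative value to the C function is avoided.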