Opened 16 years ago
Last modified 11 years ago
#698 closed Bugs
The case insensitive modifier doesn't work when followed by a character class — at Version 6
| Reported by: | nobody | Owned by: | John Maddock |
|---|---|---|---|
| Milestone: | | Component: | regex |
| Version: | Boost 1.45.0 | Severity: | Problem |
| Keywords: | | Cc: | |
Description
My name is Florin Trofin (ftrofin at _adobe_ dot com) and I work for Adobe Systems. We are using boost/regex 1.33.1 in one of our projects and have encountered the following bug.

The case-insensitive modifier is supposed to make matching case insensitive from the point where it first appears until the end of the expression. So if the text being searched is "ABC abc aCb" and the search expression is "(?i)[bc]", we expect b, B, c and C to be found, but only b and c are found. With "(?i)a[bc]" as the search expression, ab/ac/AB/Ab/AC/Ac are found as expected. The problem occurs only when a character class [] immediately follows the case-insensitive modifier "(?i)".

We also have issues with character equivalence on Mac. Japanese character equivalence in general is not working: if we have [[=x=]], where x is a Japanese hiragana or katakana character, the equivalence class does not match correctly. Please let me know if you want me to open a separate bug for this issue.

If you need more info please let me know. Thx!
Change History (7)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
| Status: | assigned → closed |
|---|
Logged In: YES user_id=1312539 This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).
comment:3 by , 11 years ago
| Severity: | → Problem |
|---|
This bug still exists as described in the original post. I've tested with Boost 1.45 but I believe it's still there in 1.47.
The regular expression "(?i)[dh]og" should match on both "HOG" and "dog" but in fact only matches on "dog". Please see the attached sample code boostbug698.cpp.
I've fixed the bug as follows: at boost/regex/v4/basic_regex_creator.hpp line 1216 change m_icase to l_icase.
    < if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, m_icase))
    > if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, l_icase))
by , 11 years ago
| Attachment: | boostbug698.cpp added |
|---|
Sample code demonstrating this bug (compiles with MSVC9)
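
The attachment itself is not reproduced on this page. As a rough reference for the behaviour the comment expects, here is a minimal sketch that uses the standard library's std::regex instead of Boost.Regex. This is a swapped-in technique, not the attached program: the default ECMAScript grammar of std::regex has no inline "(?i)" modifier, so the icase flag stands in for it. The helper name matches_ignoring_case is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Boost or of boostbug698.cpp): returns true
// when `text` is matched in full by `pattern` with case-insensitive matching.
// The std::regex::icase flag plays the role of the "(?i)" modifier here.
bool matches_ignoring_case(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}
```

With this helper, matches_ignoring_case("[dh]og", "dog") and matches_ignoring_case("[dh]og", "HOG") should both be true, which is the result the comment expects from "(?i)[dh]og" in Boost.Regex.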
comment:4 by , 11 years ago
| Resolution: | None |
|---|---|
| Status: | closed → reopened |
comment:5 by , 11 years ago
| Summary: | The case insensitive modifier doesn't work → The case insensitive modifier doesn't work when followed by a character class |
|---|---|
| Version: | None → Boost 1.45.0 |

Logged In: YES user_id=14804

I can't reproduce this; the test program I'm using is below. Can you check whether it reproduces the issue for you? BTW I'm testing with the latest Boost-cvs, but the only patches I'm aware of making are to non-greedy repeats, which shouldn't have any effect here.

Re equivalence classes: unfortunately there is no portable way to make this work. It requires that the regex engine be able to decode the collation string produced by the locale to extract the primary equivalence class. The "kind" of sort key used by the platform is determined in a fairly heuristic way in find_sort_syntax() in boost/regex/v4/primary_transform.hpp, and the actual sort key is produced in cpp_regex_traits::primary_transform(). You may, with a bit of debugging, be able to find out what's going wrong (I don't have access to a Mac BTW). The most important thing would be to find out what kind of sort keys are returned by std::collate<>::transform.

Test program follows, John Maddock.

    #include <boost/regex.hpp>
    #include <iostream>
    #include <string>

    int main(int, char**)
    {
        boost::regex e("(?i)[bc]");
        std::string s("ABC abc aCb");
        // Print every match of the case-insensitive character class in s.
        boost::sregex_iterator i(s.begin(), s.end(), e), j;
        while(i != j)
        {
            std::cout << *i << std::endl;
            ++i;
        }
    }
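
For the suggestion to inspect what std::collate<>::transform returns, a small sketch along the following lines may help. It assumes narrow (char) strings; a wide-character build would use std::collate<wchar_t> in the same way. The helper names sort_key and hex_dump are hypothetical and not part of Boost.Regex.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: returns the sort key that std::collate<char>::transform
// produces for `text` in locale `loc`.
std::string sort_key(const std::locale& loc, const std::string& text)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.transform(text.data(), text.data() + text.size());
}

// Hypothetical helper: renders each byte of `bytes` as two uppercase hex
// digits, so that unprintable sort-key bytes can be inspected.
std::string hex_dump(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Printing hex_dump(sort_key(std::locale(""), s)) on the affected Mac for a hiragana character and its katakana counterpart would show what kind of keys the platform produces. Note that std::locale("") can throw std::runtime_error if the environment's locale is not supported.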