Opened 10 years ago

Last modified 10 years ago

#7266 assigned Patches

Gettext path formats.

Reported by: 166291@… Owned by: Artyom Beilis
Milestone: Boost 1.54.0 Component: locale
Version: Boost Development Trunk Severity: Not Applicable
Keywords: Cc:

Description

I'm extremely happy with Boost.Locale, but I've found a few things
lacking. So at first I wrote some wrappers to work around a few flaws
I ran into while using Boost.Locale. One of these flaws is that it's
hardcoded to use the Gettext directory hierarchy.

I'd rather store my catalogs as 'lang/en_US.mo' than as
'lang/en_US/LC_MESSAGES/my_app.mo'.

The patch adds a 'path format' feature, which lets you choose the
directory structure used when searching for Gettext catalogs, to achieve
the effect above. All you really have to do is call:

   gen.add_path_format("{1}/{2}.mo"); // Use a smaller hierarchy.

to achieve the result that I prefer, or

   gen.add_path_format("{1}/{2}/{3}/{4}.mo"); // Use a Gettext hierarchy.

to get the layout that Boost.Locale uses right now. Ripped straight
from the Doxygen comments:

   {1} A path to search for catalogs in.
   {2} The locale's name.
   {3} The locale's category.
   {4} The Gettext domain.
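To make the placeholders concrete, here is a minimal standalone sketch of how a path format could be expanded. `expand_path_format` is a hypothetical helper written for this illustration, not code from the patch; it assumes each `{N}` is simply replaced by the N-th argument and leaves anything else (including out-of-range placeholders) untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost.Locale): replaces {1}..{N} in
// `format` with args[0]..args[N-1]. Anything that is not a valid,
// in-range placeholder is copied through unchanged.
std::string expand_path_format(std::string const& format,
                               std::vector<std::string> const& args)
{
    std::string out;
    std::size_t i = 0;
    while (i < format.size()) {
        if (format[i] == '{') {
            std::size_t close = format.find('}', i + 1);
            if (close != std::string::npos && close > i + 1) {
                std::string digits = format.substr(i + 1, close - i - 1);
                bool numeric = digits.size() <= 9 &&
                    digits.find_first_not_of("0123456789") == std::string::npos;
                if (numeric) {
                    std::size_t index = std::stoul(digits);
                    if (index >= 1 && index <= args.size()) {
                        out += args[index - 1];  // {1} maps to args[0]
                        i = close + 1;
                        continue;
                    }
                }
            }
        }
        out += format[i];  // ordinary character or unusable placeholder
        ++i;
    }
    return out;
}
```

With the arguments {"lang", "en_US", "LC_MESSAGES", "my_app"}, the short format "{1}/{2}.mo" yields lang/en_US.mo, because unused arguments are simply ignored.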

I apologize for the copy and paste, but I'm having trouble with Trac. The full thread with patches can be found at: http://lists.boost.org/Archives/boost/2012/08/195789.php

Attachments (2)

formats.patch (6.7 KB) - added by 166291@… 10 years ago.
Patch v2
gettext_paths.patch (21.0 KB) - added by Artyom Beilis 10 years ago.
Better patch for future updates


Change History (9)

by 166291@…, 10 years ago

Attachment: formats.patch added

Patch v2

comment:1 by 166291@…, 10 years ago

So I've created this patch (formats.patch). It uses a slightly different API from the first patch and tweaks create_messages_info to be future-proof, but besides that it's 100% compatible with existing applications (I think down to the ABI level), and it implements path formats.

I haven't added anything to the generator class, since in my experience I create the messages facet myself whenever I do anything with Gettext, but if it's a good idea I'd be happy to add it.

comment:2 by Artyom Beilis, 10 years ago

I'm sorry I couldn't work on this.

Unfortunately, the new patch still changes the ABI.

It inlines functions that were not inlined before.

I'll try to get to it this week.

If not, please ping me again :-)

comment:3 by 166291@…, 10 years ago

Ah, thanks for the response (which I completely missed). I'll try and fix it. How do you check these things? All I'm really working with is a text editor.

comment:4 by Artyom Beilis, 10 years ago

Milestone: To Be Determined → Boost 1.54.0
Status: new → assigned

I created an alternative patch based on yours. It makes some more radical changes that allow adding more details in the future without breaking the ABI or API.

I'm attaching the patch to the ticket so I can apply it later and not forget. I can't get it ready for 1.53, as that release is closed to all changes except bug fixes.

I'll apply it for 1.54.
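The comments above don't spell out how the alternative patch avoids future ABI breaks, and the sketch below is not taken from gettext_paths.patch. It only illustrates one widely used technique for the problem raised in comment:2: keep the public class's layout fixed by hiding its members behind an opaque pointer (the "pimpl" idiom), and define every member function out of line so no function bodies are inlined into applications. `catalog_settings` and its members are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical public class: its only data member is a pointer, so adding
// fields to `data` in a later release does not change its layout.
class catalog_settings {
public:
    catalog_settings();
    ~catalog_settings();
    catalog_settings(catalog_settings const& other);
    catalog_settings& operator=(catalog_settings const& other);

    // Declared here, defined out of line (in the library's .cpp file),
    // so their bodies are never compiled into user code.
    void add_path_format(std::string const& format);
    std::vector<std::string> const& path_formats() const;

private:
    struct data;               // complete type visible only to the library
    std::unique_ptr<data> d_;
};

// ---- what would live in the library's .cpp file ----
struct catalog_settings::data {
    std::vector<std::string> path_formats;
    // New members can be appended here without touching the header layout.
};

catalog_settings::catalog_settings() : d_(new data()) {}
catalog_settings::~catalog_settings() = default;
catalog_settings::catalog_settings(catalog_settings const& other)
    : d_(new data(*other.d_)) {}
catalog_settings& catalog_settings::operator=(catalog_settings const& other) {
    if (this != &other) *d_ = *other.d_;
    return *this;
}
void catalog_settings::add_path_format(std::string const& format) {
    d_->path_formats.push_back(format);
}
std::vector<std::string> const& catalog_settings::path_formats() const {
    return d_->path_formats;
}
```

The cost is one heap allocation per object and no inlining of the accessors, which is usually acceptable for configuration objects that are built once.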

by Artyom Beilis, 10 years ago

Attachment: gettext_paths.patch added

Better patch for future updates

comment:5 by 166291@…, 10 years ago

Wow! That's amazing. I'm in awe.

comment:6 by 166291@…, 10 years ago

Would it be possible to somehow expose this and do #7727 at the same time?

comment:7 by 166291@…, 10 years ago

Okay, I've thought about this a lot and concluded that path formats are the wrong way to go about this in the short term, and would only leave us with technical debt. It should be handled by a callback class that receives information about the wanted catalog (locale name, category, etc.) and returns the bytes of the file, as this is technically about custom file system support.

I haven't actually coded anything yet, but this is how I think it would work, and more importantly, why it would be better:

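// --- BOOST.LOCALE CODE (proposed)

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

//! Information about a single catalog we want to find.
struct catalog_info
{
  std::string language;
  std::string country;
  std::string variant;
  std::string encoding;
  std::string locale_category;
  std::string domain;
};

//! Hypothetical helper standing in for the long formatting line in the
//! existing code: expands {1}..{4} (search path, locale name, category, domain).
std::string format_path(std::string const& path_format,
                        std::string const& search_path,
                        std::string const& locale_name,
                        std::string const& category,
                        std::string const& domain);

//! Searches for, and loads a catalog.
class default_catalog_loader
{
  public:
    void add_search_path(std::string const& path) { search_paths_.push_back(path); }
    void add_path_format(std::string const& format) { path_formats_.push_back(format); }

    std::vector<char> callback(catalog_info const& info) const
    {
      // Most specific name first, since the first match wins
      // (otherwise "en" would shadow "en_US").
      std::vector<std::string> names;
      if(!info.country.empty() && !info.variant.empty())
        names.push_back(info.language + "_" + info.country + "@" + info.variant);
      if(!info.variant.empty())
        names.push_back(info.language + "@" + info.variant);
      if(!info.country.empty())
        names.push_back(info.language + "_" + info.country);
      names.push_back(info.language);

      for(std::string const& search_path : search_paths_)
      {
        for(std::string const& path_format : path_formats_)
        {
          for(std::string const& name : names)
          {
            std::string formatted = format_path(path_format, search_path, name,
                                                info.locale_category, info.domain);

            // Opening the file doubles as the existence check. Assumes the
            // platform accepts the (UTF-8) path as-is, which isn't true everywhere.
            std::ifstream file(formatted.c_str(), std::ios::binary);
            if(file)
            {
              return std::vector<char>(std::istreambuf_iterator<char>(file),
                                       std::istreambuf_iterator<char>());
            }
          }
        }
      }

      return std::vector<char>(); // Empty vector: nothing found.
    }

  private:
    std::vector<std::string> search_paths_;
    std::vector<std::string> path_formats_;
};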

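// --- USER CODE

#include <boost/bind.hpp>
#include <boost/locale.hpp>

namespace blg = boost::locale::gnu_gettext;

std::locale init_locale()
{
  default_catalog_loader loader;
  loader.add_search_path("/usr/share/locale/");
  loader.add_path_format("{1}/{2}/{3}/{4}.mo");

  boost::locale::generator gen;
  std::locale genLoc = gen("en_US.UTF-8");

  blg::messages_info info;
  info.language = "en";
  info.country  = "US";
  info.encoding = "UTF-8";
  info.variant  = "euro";  // no leading '@'; the loader adds it
  info.domains.push_back(blg::messages_info::domain("my_app"));
  // Proposed callback type taking a catalog_info. Bind a *copy* of the
  // loader: the facet outlives this function, so &loader would dangle.
  info.callback = boost::bind(&default_catalog_loader::callback, loader, _1);

  std::locale gettextLoc(genLoc, blg::create_messages_facet<char>(info));

  return gettextLoc;
}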

That would do what the existing code does (kind of, I'm sure I've forgotten something). Plus it means people could do things like this:

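// --- USER CODE

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Include generated headers using bin2hex, `xxd -i`, or something interesting.
// `xxd -i en_US.mo` defines unsigned char en_US_mo[] and unsigned int en_US_mo_len.
#include "en_US.mo.h"
#include "en@euro.mo.h"

std::vector<char> memory_callback(catalog_info const& info)
{
  struct embedded_file { unsigned char const* data; std::size_t size; };
  static std::map<std::string, embedded_file> const files = {
    { "en_US.mo",   { en_US_mo,   en_US_mo_len } },
    { "en@euro.mo", { en_euro_mo, en_euro_mo_len } },
  };

  // Same candidate list as the file loader, most specific first.
  std::vector<std::string> names;
  if(!info.country.empty() && !info.variant.empty())
    names.push_back(info.language + "_" + info.country + "@" + info.variant);
  if(!info.variant.empty())
    names.push_back(info.language + "@" + info.variant);
  if(!info.country.empty())
    names.push_back(info.language + "_" + info.country);
  names.push_back(info.language);

  for(std::string const& name : names)
  {
    auto it = files.find(name + ".mo");
    if(it != files.end())
    {
      char const* bytes = reinterpret_cast<char const*>(it->second.data);
      return std::vector<char>(bytes, bytes + it->second.size);
    }
  }

  return std::vector<char>(); // Empty vector.
}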

This implements a simple memory-based file loader. Personally I'd use this for my projects, as I already use a callback.

I have a few notes on this approach however:

The code that builds the list of candidate locale names is repeated in every callback. I don't know how to remove this without calling the callback once per candidate, which goes against this design (EXCEPT when we're loading catalogs for multiple domains).
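One possible way to cut that repetition without calling the callback several times is for the library to expose the candidate-name logic as a helper that every loader reuses. `candidate_locale_names` is hypothetical, and the most-specific-first order is an assumption borrowed from the usual Gettext fallback behaviour.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the proposed catalog_info (hypothetical).
struct catalog_info {
    std::string language, country, variant, encoding, locale_category, domain;
};

// Hypothetical library helper: the locale names a loader should try,
// most specific first, skipping forms whose parts are empty.
std::vector<std::string> candidate_locale_names(catalog_info const& info) {
    std::vector<std::string> names;
    if (!info.country.empty() && !info.variant.empty())
        names.push_back(info.language + "_" + info.country + "@" + info.variant);
    if (!info.variant.empty())
        names.push_back(info.language + "@" + info.variant);
    if (!info.country.empty())
        names.push_back(info.language + "_" + info.country);
    names.push_back(info.language);
    return names;
}
```

Both the file loader and the memory loader could then loop over `candidate_locale_names(info)` instead of rebuilding the list, while the library still calls each loader only once per catalog.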

It'd be very easy to add a get_loaded_catalogs method on the library side that implements #7727, but it would be best if each entry carried some 'user' data supplied by the callback: in the first case, the path format, search path and formatted path; in the second case, a pointer to the in-memory file.
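A rough sketch of the per-catalog record such a query could return. Everything here (`catalog_source`, `file_source`, `memory_source`, `loaded_catalog`, `catalog_registry`) is hypothetical; the only idea taken from the note is that the loader attaches its own data (the formatted path, or a pointer to the in-memory file) when it hands a catalog to the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: loader-specific data describing where a catalog came from.
struct catalog_source {
    virtual ~catalog_source() {}
    virtual std::string describe() const = 0;
};

struct file_source : catalog_source {
    std::string search_path, path_format, formatted_path;
    std::string describe() const override { return formatted_path; }
};

struct memory_source : catalog_source {
    unsigned char const* data = nullptr;
    std::size_t size = 0;
    std::string describe() const override {
        return "<memory, " + std::to_string(size) + " bytes>";
    }
};

// Hypothetical record returned by a get_loaded_catalogs() query (#7727).
struct loaded_catalog {
    std::string domain;
    std::string locale_name;
    std::shared_ptr<catalog_source const> source;  // filled in by the loader
};

// Hypothetical registry the library would update as catalogs are loaded.
class catalog_registry {
public:
    void record(loaded_catalog c) { loaded_.push_back(std::move(c)); }
    std::vector<loaded_catalog> const& get_loaded_catalogs() const { return loaded_; }
private:
    std::vector<loaded_catalog> loaded_;
};
```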

messages_info's callback would need to be changed, which breaks the ABI. However, adding a create_messages_facet overload that takes ONLY the callback would work in this case.

Another question is whether callbacks should be classes. While the second example is only one function long, making it a class would remove the horrible bind for member functions and would allow the loader to be an interface, which would be extensible if done right. I imagine the callback data for catalog_info could be an abstract class defined by the loader too.
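Roughly what a class-based loader could look like, compared with binding a member function. `catalog_loader` and `memory_catalog_loader` are hypothetical names; the assumption is that the library would hold a shared pointer to the interface, which also settles who owns the loader. Variant handling is left out to keep the sketch short.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct catalog_info {  // stand-in for the proposed struct (hypothetical)
    std::string language, country, variant, encoding, locale_category, domain;
};

// Hypothetical interface the library would call instead of a bound function.
class catalog_loader {
public:
    virtual ~catalog_loader() {}
    // Returns the raw catalog bytes, or an empty vector if nothing was found.
    virtual std::vector<char> load(catalog_info const& info) const = 0;
};

// Example implementation: catalogs kept in memory, keyed by "<name>.mo".
class memory_catalog_loader : public catalog_loader {
public:
    void add(std::string const& file_name, std::vector<char> bytes) {
        files_[file_name] = std::move(bytes);
    }
    std::vector<char> load(catalog_info const& info) const override {
        std::vector<std::string> names;  // variants omitted for brevity
        if (!info.country.empty())
            names.push_back(info.language + "_" + info.country);
        names.push_back(info.language);
        for (std::string const& name : names) {
            auto it = files_.find(name + ".mo");
            if (it != files_.end()) return it->second;
        }
        return std::vector<char>();
    }
private:
    std::map<std::string, std::vector<char>> files_;
};
```

A user would then pass something like a `std::shared_ptr<catalog_loader const>` to the facet factory, with no bind and no lifetime surprises.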

It would also allow custom file FORMATS, by returning something that mo_messages could use as a string table. For example, I could write a parser that loads uncompiled .po files, or somebody completely insane could use XML, which I imagine could help with transitions.
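To give an idea of the custom-format case, here is a deliberately minimal sketch that reads uncompiled .po text into a msgid to msgstr table, the kind of thing a loader could hand to the library instead of .mo bytes. `parse_po` is hypothetical and a toy: it handles `msgid`/`msgstr` pairs with `"..."` continuation lines and the `\n`, `\t`, `\"`, `\\` escapes, skips empty translations and plural entries, and ignores `msgctxt`, fuzzy flags and the header's charset.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Decodes the first quoted string on a line, e.g. "a\"b\n" (quotes included).
static std::string unquote(std::string const& s) {
    std::string out;
    std::size_t begin = s.find('"');
    std::size_t end = s.rfind('"');
    if (begin == std::string::npos || end <= begin) return out;
    for (std::size_t i = begin + 1; i < end; ++i) {
        if (s[i] == '\\' && i + 1 < end) {
            char c = s[++i];
            out += (c == 'n') ? '\n' : (c == 't') ? '\t' : c;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Hypothetical toy parser: msgid/msgstr pairs (with continuation lines) -> map.
std::map<std::string, std::string> parse_po(std::istream& in) {
    std::map<std::string, std::string> table;
    std::string id, str, line;
    std::string* current = nullptr;  // field that continuation lines extend
    bool have_pair = false;
    auto flush = [&] {
        if (have_pair && !id.empty() && !str.empty()) table[id] = str;
        id.clear(); str.clear(); have_pair = false; current = nullptr;
    };
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "msgid ") == 0) {
            flush();
            id = unquote(line.substr(6));
            current = &id;
        } else if (line.compare(0, 7, "msgstr ") == 0) {
            str = unquote(line.substr(7));
            current = &str;
            have_pair = true;
        } else if (!line.empty() && line[0] == '"' && current) {
            *current += unquote(line);
        } else {
            current = nullptr;  // comment, blank line, plural form, msgctxt...
        }
    }
    flush();
    return table;
}
```

The header entry (`msgid ""`) is skipped because its id is empty, so a real loader would still need to read the charset from it separately.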

What're your thoughts on this?
