Boost C++ Libraries: Ticket #11016: Boost file logging misbehaves when file system is full

version changed

michi.henning@… — Thu, 12 Feb 2015 02:00:49 GMT

version Boost 1.57.0 → Boost 1.55.0

Thu, 12 Feb 2015 09:13:03 GMT

Without having looked at the code, my guess is that the code creates a new log file, tries to write to it, gets ENOSPC, and doesn't unlink the log file at that point.

Andrey Semashev — Thu, 12 Feb 2015 10:58:21 GMT

What I think is happening is the file stream which is used to write the log file becomes !good() when there is no space to write a record. No exception is thrown but on the next record the library sees that the stream is not operational and rotates the file. This is where lots of empty files appear.

I agree that the current behavior is not correct, but it is not clear what it should be. I can throw exceptions when a write fails (although I won't be able to detect the cause of the failure), but as a recovery procedure I still have to rotate the file on the next record so that the library repairs itself when the environment becomes normal again. I can add a check and not actually perform rotation when the file is empty, although this makes the behavior less obvious.
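
To make the failure mode concrete, here is a minimal sketch of a stream that silently stops being good() once its target runs out of room. It uses only the standard library; full_disk_buf and write_record are hypothetical names made up for illustration and are not part of Boost.Log, and the fixed character capacity merely stands in for ENOSPC.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a file on a full disk: accepts `capacity`
// characters, then reports failure on every further write.
class full_disk_buf : public std::streambuf
{
public:
    explicit full_disk_buf(std::size_t capacity) : m_remaining(capacity) {}

protected:
    // Called for every character because no put area is installed.
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch); // nothing to write
        if (m_remaining == 0)
            return traits_type::eof();       // "no space left"
        --m_remaining;
        return ch;
    }

private:
    std::size_t m_remaining;
};

// Writes one record and reports whether the stream is still usable.
// No exception is thrown on failure because the stream's exception
// mask is left at its default.
inline bool write_record(std::ostream& strm, const std::string& text)
{
    strm << text << '\n';
    return strm.good();
}
```

With a capacity of 8 characters, a first 4-character record succeeds, a second 6-character record leaves the stream in a failed state without any exception, and every later write fails as well until the state is cleared.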

Thu, 12 Feb 2015 11:15:17 GMT

Eating up all available inodes (on top of the file system being full already) is not right. If any attempt to write a file returns ENOSPC, all write activity should be abandoned. In addition, I would unlink whichever file incurs that error. The file system is already full, and things are pretty much hopeless at that point anyway until someone comes along and makes more space available. So, losing whichever log file couldn't be written/appended to is not a big deal.

I don't think throwing an exception would help, in the sense that no code that uses boost::log expects to get an exception from one of the API calls. Throwing from a logging library just doesn't make sense. What am I supposed to do? Catch the exception and log an error about it having happened? ;-)

I suspect some sort of stateful error handling is necessary here. If any write returns ENOSPC, take note of that fact and remember it. Disable log rotation for a while. Occasionally, maybe every 10-60 seconds or so, check if there is space available again. If not, rinse and repeat. There is no point in continuously bashing away at a file system that is full already…

Another check might be to test whether a file is newly-created and empty. If so, the first write to the file should immediately unlink it again. That way, at least there won't be the endless attempts to create more and more files that can't be written to anyway.

Thu, 12 Feb 2015 11:18:41 GMT

If so, the first write to the file should immediately unlink it again.

My apologies, I meant to write "If so, the first write, if it fails, should immediately unlink it again."

The idea is that only files that exist already and have at least one byte of content are written to unconditionally. But, files that were just created and can't be written to should be unlinked if a write fails because the file system is full.

Andrey Semashev — Thu, 12 Feb 2015 12:43:22 GMT

Replying to Michi Henning <michi.henning@…>:

Eating up all available inodes (on top of the file system being full already) is not right.

Not arguing with that.

If any attempt to write a file returns ENOSPC, all write activity should be abandoned. In addition, I would unlink whichever file incurs that error.

Not if the file is not empty. You wouldn't like losing written logs, would you?

I don't think throwing an exception would help, in the sense that no code that uses boost::log expects to get an exception from one of the API calls. Throwing from a logging library just doesn't make sense. What am I supposed to do? Catch the exception and log an error about it having happened? ;-)

I could suggest multiple sensible reactions to such failure, like displaying a notification in GUI or cleaning up archived logs or terminating the app. The point is that the library has to indicate the problem, and it is the application's prerogative to decide how to react. You can suppress all exceptions from the library, if you like.

I suspect some sort of stateful error handling is necessary here. If any write returns ENOSPC, take note of that fact and remember it. Disable log rotation for a while. Occasionally, maybe every 10-60 seconds or so, check if there is space available again. If not, rinse and repeat. There is no point in continuously bashing away at a file system that is full already…

Another check might be to test whether a file is newly-created and empty. If so, the first write to the file should immediately unlink it again. That way, at least there won't be the endless attempts to create more and more files that can't be written to anyway.

Note that the library is not timer-driven but rather event-driven (where events are represented by log records as they are emitted by the app). For this reason time-based solutions make limited sense.

Thu, 12 Feb 2015 22:45:39 GMT

Replying to andysem:

If any attempt to write a file returns ENOSPC, all write activity should be abandoned. In addition, I would unlink whichever file incurs that error.

Not if the file is not empty. You wouldn't like losing written logs, would you?

Well… It's questionable whether the log file is valuable in this case. It's a bit like running out of memory: the system is in a state where correct functioning of some components is impossible. Seeing that this case will, in practice, be rare, unlinking the file would be quite acceptable, I believe. Running out of space really is an extreme case, and it justifies extreme recovery action, IMO.

I don't think throwing an exception would help, in the sense that no code that uses boost::log expects to get an exception from one of the API calls. Throwing from a logging library just doesn't make sense. What am I supposed to do? Catch the exception and log an error about it having happened? ;-)

I could suggest multiple sensible reactions to such failure, like displaying a notification in GUI or cleaning up archived logs or terminating the app. The point is that the library has to indicate the problem, and it is the application's prerogative to decide how to react. You can suppress all exceptions from the library, if you like.

I don't think a global suppression would help, at least not in all cases. I'm logging from a library. I don't know what the code that links with my library does. So, I've been very careful not to mess with the global boost log state and use separate loggers and sinks that don't interfere with the application, even if it happens to use boost log. I can't just change a process-wide setting from within a library. That's just as forbidden as changing umask or working directory would be.

In my opinion, throwing from log methods (at least the ones that create log messages) is a big no-no. If a log message can't be written, there is typically nothing that the code that does the logging can do. All it would achieve is that I would have to clutter the calling code with masses of catch handlers. Throwing from the log rotate call might be OK, if it's well documented that the method can throw. That's because any setup/initialization calls for logging are typically in a section of code that might have at least a chance of meaningfully reporting an error. But, seeing that the log rotation happens transparently while I'm logging, I don't think that would help much.

Throwing from the BOOST_LOG macros is definitely out, as far as I can see. All that would achieve is send most programs to terminate(), quick smart.

Note that the library is not timer-driven but rather event-driven (where events are represented by log records as they are emitted by the app). For this reason time-based solutions make limited sense.

Timers wouldn't be needed. Just remember the current time if a failure occurs. Then, when trying to write, if in the error state, check the current time and, if within the ban period, skip the write. Once a write works, re-enter the no-error state, to avoid the overhead of checking the current time.
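
The timer-free back-off described here could look roughly like the following sketch. write_gate and its member names are hypothetical, the ban period is a constructor argument rather than a fixed 10-60 seconds, and one overload takes the current time explicitly so the logic can be exercised without waiting.

```cpp
#include <chrono>

// Hypothetical helper: remembers when a write last failed and suppresses
// further attempts until the ban period has elapsed.
class write_gate
{
public:
    using clock = std::chrono::steady_clock;

    explicit write_gate(clock::duration ban) : m_ban(ban) {}

    // True if a write should be attempted at time `now`.
    bool should_attempt(clock::time_point now) const
    {
        return !m_failed || now - m_failed_at >= m_ban;
    }

    // Convenience overload: the clock is only read while in the error state,
    // because the right-hand side of || is skipped otherwise.
    bool should_attempt() const
    {
        return !m_failed || should_attempt(clock::now());
    }

    // Records the outcome of an attempted write.
    void on_result(bool ok, clock::time_point now)
    {
        m_failed = !ok;
        if (!ok)
            m_failed_at = now;
    }

private:
    clock::duration m_ban;
    bool m_failed = false;
    clock::time_point m_failed_at{};
};
```

A caller would ask should_attempt() before each write and report the outcome through on_result(); a successful write returns the gate to the no-error state.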

Andrey Semashev — Fri, 13 Feb 2015 07:48:48 GMT

Replying to Michi Henning <michi.henning@…>:

I don't think a global suppression would help, at least not in all cases. I'm logging from a library.

I'm not saying global suppression. You can suppress exceptions on different levels, including the logger you use in your library.

In my opinion, throwing from log methods (at least the ones that create log messages) is a big no-no. If a log message can't be written, there is typically nothing that the code that does the logging can do. All it would achieve is that I would have to clutter the calling code with masses of catch handlers.

Have a look at exception handlers, you don't have to write try/catch everywhere.

Timers wouldn't be needed. Just remember the current time if a failure occurs. Then, when trying to write, if in the error state, check the current time and, if within the ban period, skip the write. Once a write works, re-enter the no-error state, to avoid the overhead of checking the current time.

What does such record peeling achieve? Other than losing some records which could have been written when free space appears, it doesn't seem to have any significant effect.

Fri, 13 Feb 2015 09:33:42 GMT

Replying to andysem:

I'm not saying global suppression. You can suppress exceptions on different levels, including the logger you use in your library.

Ah, OK, I didn't know that, thanks! Where do I look for this? Any doc on that?

In my opinion, throwing from log methods (at least the ones that create log messages) is a big no-no. If a log message can't be written, there is typically nothing that the code that does the logging can do. All it would achieve is that I would have to clutter the calling code with masses of catch handlers.

Have a look at exception handlers, you don't have to write try/catch everywhere.

OK, again thanks, I didn't know that!

Timers wouldn't be needed. Just remember the current time if a failure occurs. Then, when trying to write, if in the error state, check the current time and, if within the ban period, skip the write. Once a write works, re-enter the no-error state, to avoid the overhead of checking the current time.

What does such record peeling achieve? Other than losing some records which could have been written when free space appears, it doesn't seem to have any significant effect.

The savings would be minor. Basically, you'd avoid trying to write for a while, dealing with the failed calls and potentially re-establishing the previously bad situation. But I agree, this is just an embellishment and won't solve the fundamental problem.

Andrey Semashev — Fri, 13 Feb 2015 10:28:18 GMT

Replying to Michi Henning <michi.henning@…>:

Replying to andysem:

I'm not saying global suppression. You can suppress exceptions on different levels, including the logger you use in your library.

Ah, OK, I didn't know that, thanks! Where do I look for this? Any doc on that?

See here: http://www.boost.org/doc/libs/release/libs/log/doc/html/log/detailed/sources.html#log.detailed.sources.exception_handling

Have a look at exception handlers, you don't have to write try/catch everywhere.

OK, again thanks, I didn't know that!

http://www.boost.org/doc/libs/release/libs/log/doc/html/log/detailed/utilities.html#log.detailed.utilities.exception_handlers
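
As a rough illustration of suppression below the global level, the fragment below installs a suppressing handler on a single sink frontend. It assumes the set_exception_handler member of sink frontends and make_exception_suppressor from boost/log/utility/exception_handler.hpp, as described in the linked documentation; init_library_logging and the file name are hypothetical. Suppressing at the logger level, as suggested above, additionally requires a logger type with the exception handler feature, which the first link covers.

```cpp
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/log/core.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/log/utility/exception_handler.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace keywords = boost::log::keywords;

// Hypothetical setup function for a library that owns its own sink.
void init_library_logging()
{
    typedef sinks::synchronous_sink<sinks::text_file_backend> sink_t;

    boost::shared_ptr<sinks::text_file_backend> backend =
        boost::make_shared<sinks::text_file_backend>(
            keywords::file_name = "mylib-%N.log"); // hypothetical name

    boost::shared_ptr<sink_t> sink = boost::make_shared<sink_t>(backend);

    // Exceptions raised while this sink processes a record are swallowed
    // here instead of propagating to the code that emitted the record.
    sink->set_exception_handler(logging::make_exception_suppressor());

    logging::core::get()->add_sink(sink);
}
```

Only this sink is affected; the process-wide core settings are left alone.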

Fri, 13 Feb 2015 10:48:00 GMT

Thank you! I missed this when I read the documentation.

status changed; resolution set

Andrey Semashev — Sat, 14 Feb 2015 16:22:45 GMT

status new → closed
resolution → fixed

Fixed in https://github.com/boostorg/log/commit/7ebfd3b6c4772cfa09c54366e96e4b5e8c079af6.

I didn't add exceptions yet but created a separate ticket #11026 for that. This will probably be a more complex task than I originally thought.

status changed; resolution deleted

Mon, 16 Feb 2015 23:24:39 GMT

status closed → reopened
resolution fixed → (deleted)

I just built 1.57 from source with your patch applied and re-ran my test. I'm still seeing lots of empty files being left behind in the log directories. There are not as many: it's about a dozen new empty files per run now (instead of hundreds). But the problem isn't completely fixed yet, it seems.

Andrey Semashev — Tue, 17 Feb 2015 03:47:28 GMT

I can't reproduce this, I'm only having one file per run, which is expected. I'm testing with the rotating_file example from the library, modified to work similarly to your setup. Can you try this example and Boost from git develop branch?

Tue, 17 Feb 2015 03:56:19 GMT

I'll try and put together a stand-alone test case. Could the few log files I still see being left empty be related to the fact that I have several processes running concurrently, each of which uses a different log directory?

Andrey Semashev — Tue, 17 Feb 2015 04:02:58 GMT

Yes, that's possible. Also, different processes must not use the same directories because the library does no interprocess synchronization. File names may clash and limits are not maintained in this case.

Tue, 17 Feb 2015 05:19:33 GMT

There is definitely no directory sharing among these processes. It's just that there are a bunch of them running, each logging into a separate log dir.

Andrey Semashev — Tue, 17 Feb 2015 06:40:36 GMT

Does each process have a separate directory set as the target directory for the file collector? If so, I don't see how multiple empty files can appear in each of these directories after a single run. That is unless you recreate the sink or perform file rotation manually.

Please, try to run the test with a single process to simplify things.

Andrey Semashev — Sun, 22 Feb 2015 19:22:03 GMT

Any luck reproducing the issue with the library example?

Sun, 22 Feb 2015 21:36:04 GMT

My apologies Andy, I haven't had time yet to get back to this. Will look at it today.

anonymous — Mon, 23 Feb 2015 04:44:10 GMT

Below is a stand-alone test case that shows the problem. This is with boost 1.57 with your patch applied.

I create a 5 MB ram disk:

sudo mount -o size=5M -t tmpfs none /home/michi/tmp/ramdisk

Note that the code below hard-wires this path. Please adjust as necessary.

When I run the code in a loop, I see the empty log files appearing once the file system fills up (one empty file per run).

while :; do ./a.out ; done

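#include <chrono>
#include <thread>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/log/core.hpp>
#include <boost/log/sinks/async_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/log/sources/record_ostream.hpp>
#include <boost/log/sources/severity_channel_logger.hpp>

int main()
{
    namespace keywords = boost::log::keywords;
    namespace logging = boost::log;
    namespace sinks = boost::log::sinks;
    logging::sources::severity_channel_logger_mt<> logger;
    typedef sinks::asynchronous_sink<sinks::text_file_backend> FileSinkT;
    typedef boost::shared_ptr<FileSinkT> FileSinkPtr;
    // Rotate to a new file once the current one reaches 512 KiB.
    FileSinkPtr file_sink = boost::make_shared<FileSinkT>(
        keywords::file_name = "/home/michi/tmp/ramdisk/log-%N.log",
        keywords::rotation_size = 1024 * 512);
    // Collect rotated files in the same directory, capped at 10 MiB in total
    // (more than the 5 MB ramdisk can hold).
    file_sink->locked_backend()->set_file_collector(sinks::file::make_collector(
        keywords::target = "/home/michi/tmp/ramdisk",
        keywords::max_size = 1024 * 1024 * 10));
    file_sink->locked_backend()->scan_for_files();
    file_sink->locked_backend()->auto_flush(true);
    logging::core::get()->add_sink(file_sink);
    for (int i = 0; i < 100000; ++i)
    {
        BOOST_LOG(logger) << "Hello";
    }
    // Give the asynchronous sink time to drain before exiting.
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}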

status changed; resolution set

Andrey Semashev — Mon, 23 Feb 2015 10:11:14 GMT

status reopened → closed
resolution → fixed

When I run the code in a loop, I see the empty log files appearing once the file system fills up (one empty file per run).

Yes, that is how it should work. The empty file appears because the file is rotated at application exit, and the new run starts a new file. You can add std::ios::app to the file open flags in the sink backend if you want to append to the previous file on each subsequent run.

Mon, 23 Feb 2015 10:16:42 GMT

Appending will be useful, thanks. I didn't know about this option. When I searched for appending to earlier log files recently, I found some threads in discussion forums that said that this is impossible. I take it that this is fairly recent?

As to the empty file, the problem is that there is a new empty file created on every run. So, while the file system is full, I get a new empty file for each and every run of my process. (Many of my processes are short-lived.) That doesn't seem right to me. Why leave an empty log file behind?

Andrey Semashev — Mon, 23 Feb 2015 10:25:21 GMT

Appending will be useful, thanks. I didn't know about this option. When I searched for appending to earlier log files recently, I found some threads in discussion forums that said that this is impossible. I take it that this is fairly recent?

No, it's been there for quite some time.

As to the empty file, the problem is that there is a new empty file created on every run. So, while the file system is full, I get a new empty file for each and every run of my process. (Many of my processes are short-lived.) That doesn't seem right to me. Why leave an empty log file behind?

Unless you enable appending, you get a new file regardless of the file emptiness or space exhaustion. This behavior seems logical to me. Empty files can also be an indication of a certain behavior of the app, so they are not completely worthless.

status changed; resolution deleted

Mon, 23 Feb 2015 21:26:54 GMT

status closed → reopened
resolution fixed → (deleted)

Replying to andysem:

Yes, that is how it should work. The empty file appears because the file is rotated at application exit, and the new run starts a new file. You can add std::ios::app to the file open flags in the sink backend if you want to append to the previous file on each subsequent run.

I just tried this, by adding keywords::open_mode = std::ios::app when I create the sink. But it doesn't appear to change anything. The last log file from the previous run isn't appended to; instead, a new log file is created regardless. Other people seem to have had the same experience:

http://stackoverflow.com/questions/8418917/boost-log-how-to-configure-a-text-sink-backend-to-append-to-rotated-files

Am I doing something wrong?

Unless you enable appending, you get a new file regardless of the file emptiness or space exhaustion. This behavior seems logical to me. Empty files can also be an indication of a certain behavior of the app, so they are not completely worthless.

I honestly can't think of a reason why an empty file would be useful.

I experimented with this some more. So, I run the test case until the file system is full, at which point it leaves an empty log file behind. Now I run the test three more times, so I end up with four empty log files (plus a whole bunch of non-empty ones). Then I delete some non-empty log files so there is plenty of room again, and run the test one more time. As expected, it now creates two new non-empty log files. But the file rotation code never removes the empty log files, even after there is space in the file system.

I'm sorry, but this is still not right. Once the file system fills up, boost log creates an empty log file on every file rotation. These empty files accumulate indefinitely and are never removed. That's a permanent resource leak.

The code should check on the first write after creating a log file whether the write succeeded. Otherwise, something is seriously wrong, and it should unlink the file it just created.

Mon, 23 Feb 2015 22:53:36 GMT

Just to be clear: the sole purpose of creating a file is to write to it. If the file can't be written to, there is no point in having the file. So, if a write fails for any reason AND the file is empty at that point, unlink the file. If the file has some contents at that point already, by all means, keep it. As far as I can see, that would get rid of the empty files.

status changed; resolution set

Andrey Semashev — Tue, 24 Feb 2015 07:52:44 GMT

status reopened → closed
resolution → fixed

I just tried this, by adding keywords::open_mode = std::ios::app when I create the sink. But it doesn't appear to change anything.

The newly generated file name must match the old file name, then it will append to the old file. This means that the file name must be sufficiently stable over time and not contain the counter. If you still can't get it to work, please, create a separate ticket.
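
A sketch of what such a setup could look like, assuming the file_name and open_mode keywords accepted by text_file_backend; the function name and the path are hypothetical, and rotation is deliberately left out so that the generated name stays the same on every run.

```cpp
#include <ios>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/log/core.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace keywords = boost::log::keywords;

// Hypothetical setup function: each run appends to the same file.
void init_appending_log()
{
    typedef sinks::synchronous_sink<sinks::text_file_backend> sink_t;

    boost::shared_ptr<sinks::text_file_backend> backend =
        boost::make_shared<sinks::text_file_backend>(
            // No %N counter, so the name is identical on every run.
            keywords::file_name = "/var/log/myapp/app.log", // hypothetical path
            keywords::open_mode = std::ios_base::out | std::ios_base::app);

    logging::core::get()->add_sink(boost::make_shared<sink_t>(backend));
}
```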

I'm sorry, but this is still not right. Once the file system fills up, boost log creates an empty log file on every file rotation. These empty files accumulate indefinitely and are never removed. That's a permanent resource leak.

This is not a leak. Empty files, as well as non-empty ones, are accounted for and deleted when the threshold is reached. See the docs http://www.boost.org/doc/libs/release/libs/log/doc/html/log/detailed/sink_backends.html#log.detailed.sink_backends.text_file.managing_rotated_files, and take particular note of the min_free_space parameter.

I don't see the reason to change the behavior wrt empty files. They are not special and will be processed just like any other log file.

Tue, 24 Feb 2015 08:32:47 GMT

Replying to andysem:

This is not a leak. Empty files, as well as non-empty ones, are accounted for and deleted when the threshold is reached. See the docs http://www.boost.org/doc/libs/release/libs/log/doc/html/log/detailed/sink_backends.html#log.detailed.sink_backends.text_file.managing_rotated_files, and take particular note of the min_free_space parameter.

I don't see the reason to change the behavior wrt empty files. They are not special and will be processed just like any other log file.

Andy, the point is that the threshold isn't reached while the file system is full. Therefore, while the "file system full" condition persists, a potentially unbounded number of inodes is used up, making a bad situation worse. If the file system free space falls below the threshold then, yes, eventually the empty inodes are reclaimed. But that doesn't happen until the threshold *is* reached enough times for all the logs to rotate through until we get to the empty ones. In effect, this means that the empty files can keep kicking around for potentially weeks or months.

I'll have a look at min_free_space, thanks for the tip! But, as far as I can see, it would have to be at least as large as max_size to have any effect?

Is it really that hard to check whether a write failed and, if so, stat the file and unlink it if empty? It seems like a simple fix, and it would get rid of the empty files. We'd have a more robust system that way.

Andrey Semashev — Tue, 24 Feb 2015 09:05:59 GMT

Replying to Michi Henning <michi.henning@…>:

Andy, the point is that the threshold isn't reached while the file system is full. Therefore, while the "file system full" condition persists, a potentially unbounded number of inodes is used up, making a bad situation worse.

If you only set the max_size limit then yes, the files will keep piling up. The proper fix for that is to set min_free_space - I assume you don't want the empty files to appear in the first place, do you?

I'll have a look at min_free_space, thanks for the tip! But, as far as I can see, it would have to be at least as large as max_size to have any effect?

No, these limits are not related.

Is it really that hard to check whether a write failed and, if so, stat the file and unlink it if empty? It seems like a simple fix, and it would get rid of the empty files. We'd have a more robust system that way.

The question is not whether it is hard. It's about consistency. A special behavior should be backed by a good reason.

Tue, 24 Feb 2015 10:12:59 GMT

Replying to andysem:

If you only set the max_size limit then yes, the files will keep piling up. The proper fix for that is to set min_free_space - I assume you don't want the empty files to appear in the first place, do you?

Exactly :-)

No, these limits are not related.

OK, I'll tinker with that tomorrow, thanks!

Is it really that hard to check whether a write failed and, if so, stat the file and unlink it if empty? It seems like a simple fix, and it would get rid of the empty files. We'd have a more robust system that way.

The question is not whether it is hard. It's about consistency. A special behavior should be backed by a good reason.

I strongly agree with that. I'm coming at this from the perspective of a first-time user of boost log. So there is this file rotation thing, I can specify file name patterns, limit individual log file sizes, directory size, and so on. All good. Works really well, no problem. Then I do an (admittedly extreme) test, trying it with a full file system.

Now that you've told me about min_free_space, I go "OK, so if I set that, it'll do the right thing". But that comes as a real surprise to me. It effectively means that, if the file system is full, and I haven't set min_free_space, I end up with lots of empty log files. In other words, how am I supposed to know that not setting min_free_space causes empty log files?

Prior to your patch, the code was leaking lots of inodes. Now it leaks many fewer. Thanks again for that fix! But, is there really a difference between leaking many inodes as opposed to one? I can't think of what the utility of an empty log file would be, seeing that it contains no information, and was created only in order to add information to the file in the first place. What's wrong with unlinking a file that cannot be written to if the file is empty? Alternatively, what's to be gained by leaving the file behind?

Wed, 25 Feb 2015 01:13:01 GMT

The min_free_space setting avoids the problem, thanks for that tip!

I still believe I shouldn't have to set this though. It's certainly non-obvious that leaving it unset can cause empty log files to accumulate, in particular since it is highly unlikely that people would ever notice during testing.
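
For readers hitting the same issue, a collector configured with both limits might look like the sketch below. It assumes the target, max_size and min_free_space keywords of sinks::file::make_collector from the documentation linked earlier in the thread; configure_collector, the directory and the byte values are hypothetical and need adjusting to the actual file system.

```cpp
#include <boost/shared_ptr.hpp>
#include <boost/log/sinks/text_file_backend.hpp>

namespace sinks = boost::log::sinks;
namespace keywords = boost::log::keywords;

// Hypothetical helper: attaches a collector that enforces both a total
// size limit and a minimum amount of free space on the target drive.
void configure_collector(const boost::shared_ptr<sinks::text_file_backend>& backend)
{
    backend->set_file_collector(sinks::file::make_collector(
        keywords::target = "/var/log/myapp",                // hypothetical directory
        keywords::max_size = 10 * 1024 * 1024,              // total size of collected files, in bytes
        keywords::min_free_space = 50 * 1024 * 1024));      // free space to keep, in bytes

    // Pick up files left behind by earlier runs so they count toward the limits.
    backend->scan_for_files();
}
```

As noted in the thread, the two limits are independent of each other, so min_free_space does not need to be derived from max_size.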