#11166 closed Bugs (fixed)
boost::filesystem::remove is racy
Reported by: | Owned by: | Beman Dawes | |
---|---|---|---|
Milestone: | To Be Determined | Component: | filesystem |
Version: | Boost 1.57.0 | Severity: | Problem |
Keywords: | Cc: |
Description
As the attached program demonstrates (typically in way under 1s here), if two threads or processes attempt to remove the same file, one of the removers can get an exception because 'no such file or directory', even though not existing at the time of the call is *not* an error condition for remove.
I have only verified this bug to exist in 1.49 and 1.53, but I glanced at github's boostorg/filesystem/blob/master/src/operations.cpp and it looks like the same problem is likely to exist--namely, that remove_file can fail because somebody else removed the file we were about to remove, after we verified that the file existed via query_file_type.
Attachments (2)
Change History (7)
by , 8 years ago
Attachment: | fsremoverace.cc added |
---|
comment:1 by , 8 years ago
It's also interesting to contemplate that the change might not be [directory/file/symlink] -> [nonexistent], but could also be [directory/file/symlink] -> [directory/file/symlink].
comment:2 by , 8 years ago
The proposed patch fixes the file -> nonexistent and directory -> nonexistent cases. The others are not as important for our use case, but for example (errno == EISDIR) would be an appropriate check after unlink() for the file -> directory case. Lightly tested via my test case on Linux and Windows.
comment:3 by , 8 years ago
This bug may be easier to hit than I first supposed. We have now received some information that this may be exacerbated by the "SMB2 Client Redirector cache" (Windows, Vista and later), which by default will cache positive existence of a file for 10 seconds, making a nice big window for trouble as long as two machines sharing the same Windows share are involved. A similar design choice on Unix with NFS (attribute caching) probably creates nice big windows there too.
comment:4 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
A change similar to the suggested patch has been applied. It covers the same set of error codes used elsewhere to identify the general "file not found" case.
Please note that this change mitigates but does not totally eliminate the chance of external file system races.
It does not protect against multi-threading data races. "Data races are bad", to quote Howard Hinnant and a bunch of other experts. For rationale, see https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
The test program provide has both a file system race and a data race, so it always fails for me (four core CPU, Windows 10, 64-bit system, VC++ 2015, Boost 1.59) because of the data race regardless of anything done to mitigate the file system race.
Thanks for the report, example program, and patch,
--Beman
comment:5 by , 7 years ago
I appreciate your comments about the safety of threaded code. I provided a threaded program for ease of reproduction, and may have not adequately observed threaded programming safeguards.
In our use case, the real how-to-reproduce involves multiple processes, often on different systems with a networked file system, concurrently creating and deleting files.
Since these activities are human driven, we can't enforce that two users don't both delete the same item from the filesystem at close enough to the same time that the cached existence in the SMB2 Client Redirector Cache doesn't trigger this issue.
demonstrate race in boost::filesystem::remove