Opened 6 years ago

#12456 new Bugs

mapped_file issues with huge file support

Reported by: Igor Minin <igorm6387@…> Owned by: Jonathan Turkanis
Milestone: To Be Determined Component: iostreams
Version: Boost 1.61.0 Severity: Optimization
Keywords: Cc:

Description

I work on an application that handles huge files (tens of GB). Memory-mapped files are a common way to deal with such amounts of data, and I would like to use boost::iostreams::mapped_file to read and write those files. Unfortunately, I ran into a number of issues that make this nearly impossible.

I describe all the problems in one ticket rather than several because the issues are tightly connected: fixing one of them requires fixing the others.

First, let us take a look at the mapped_file::open declaration:

    template<typename Path>
    void open( const Path& path,
               BOOST_IOS::openmode mode =
                   BOOST_IOS::in | BOOST_IOS::out,
               size_type length = max_length,
               stream_offset offset = 0 );

It has a length parameter that says how many bytes of the file we wish to map into memory. By default it tries to map the whole file, but in general this parameter should be much smaller than the file size. Consider working with a 100 GB file: mapping the whole file is too expensive and in many cases simply impossible (on a 32-bit x86 OS, for example).
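To illustrate what mapping only part of the file means in practice, here is a minimal sketch that splits a large file into consecutive (offset, length) pairs that could be passed to open one at a time. The names window and plan_windows are hypothetical and not part of Boost. The sketch assumes that offsets must be a multiple of the platform's mapping granularity, which I believe mapped_file::alignment() reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One mapping request: where the view starts and how many bytes it covers.
// Hypothetical helper type, not part of Boost.
struct window {
    std::uint64_t offset;  // byte offset into the file
    std::uint64_t length;  // number of bytes to map
};

// Split a file of file_size bytes into consecutive windows of at most
// window_size bytes. window_size must be a positive multiple of alignment,
// so every generated offset stays aligned.
std::vector<window> plan_windows(std::uint64_t file_size,
                                 std::uint64_t window_size,
                                 std::uint64_t alignment)
{
    assert(alignment > 0);
    assert(window_size > 0 && window_size % alignment == 0);

    std::vector<window> result;
    for (std::uint64_t offset = 0; offset < file_size; offset += window_size) {
        // The last window is shortened so it does not run past the end of the file.
        result.push_back({offset, std::min(window_size, file_size - offset)});
    }
    return result;
}
```

Each window would then be mapped, processed and released before moving on to the next, so the address space in use stays bounded by window_size regardless of the file size.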

This length parameter is stored in the size_ member.

    size_ =
        static_cast<std::size_t>(
            p.length != max_length ?
                std::min<boost::intmax_t>(p.length, size) :
                size
        );

    std::size_t size() const { return size_; }

That leads us to the following problem:

  1. mapped_file::size() does NOT return the file size. In the general case it returns the size of the memory view. If I need to know the file size, I must do additional queries outside of the mapped_file code. That is painful because I need to reimplement a lot of the mapped_file::open code. mapped_file::open already knows this size but doesn't expose it. I think mapped_file should have two methods, size() and file_size(), or view_size() and size(), to report the whole file size and the size of the mapped region separately.
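As an illustration of the kind of additional query meant here, the following is a minimal sketch that obtains the whole file size with the standard library only, independently of any mapping. The function name query_file_size is hypothetical. It opens the file a second time, which is exactly the duplicated work a file_size() accessor on mapped_file would avoid.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Returns the size of the file in bytes, regardless of how much of it
// is (or will be) mapped. Hypothetical helper, not part of Boost.
std::uint64_t query_file_size(const std::string& path)
{
    assert(!path.empty());

    // Open in binary mode and position the read pointer at the end of the file.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // The position at the end of the file equals the file size in bytes.
    const std::streamoff end = in.tellg();
    if (end < 0)
        throw std::runtime_error("cannot determine size of " + path);

    return static_cast<std::uint64_t>(end);
}
```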

  1. mapped_file::resize ignores the length parameter. Consider:

    void mapped_file_impl::resize(stream_offset new_size)
    {
    ...
        size_ = new_size;
        param_type p(params_);
        map_file(p);  // May modify p.hint
    ...
    }

Compare this to the code from open: there is no min(length, size); it just uses new_size as the view size. Again, for a huge file this is inappropriate and sometimes impossible.
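The rule that open applies, and that resize could arguably apply as well, can be sketched as a standalone function. This is only an illustration of the clamping logic quoted above. The name clamped_view_size is hypothetical, and max_length is passed in as a parameter standing in for the "map everything" default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of the view to map, given the length the caller asked for and the
// current (or new) size of the file. max_length is the sentinel meaning
// "no explicit length was requested". Hypothetical helper, not part of Boost.
std::uint64_t clamped_view_size(std::uint64_t requested_length,
                                std::uint64_t file_size,
                                std::uint64_t max_length)
{
    const std::uint64_t view =
        requested_length != max_length
            ? std::min(requested_length, file_size)  // honor the request, up to the file size
            : file_size;                             // no request: map the whole file

    assert(view <= file_size);  // a view never extends past the end of the file
    return view;
}
```

With this rule, resizing a file to 100 GB while the original request was a 64 MB view would keep the view at 64 MB instead of trying to map the whole new size.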

  1. mapped_file doesn't allow remapping a file without closing it and reopening it with a new offset and length. That approach kills performance in an application that needs to read a huge file intensively, piece by piece. I think mapped_file should have a remap method that accepts a new offset and does a job similar to resize, but without resizing the file, just remapping it.
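To make the remap idea more concrete, here is a rough sketch of the arithmetic such a method would need for an arbitrary byte offset. The names remap_plan and plan_remap are hypothetical. The sketch assumes the mapping offset has to be rounded down to a multiple of the alignment, and that the caller then skips forward inside the view to reach the byte it actually asked for.

```cpp
#include <cassert>
#include <cstdint>

// Result of planning a remap. Hypothetical helper type, not part of Boost.
struct remap_plan {
    std::uint64_t map_offset;  // aligned offset at which the view should start
    std::uint64_t map_length;  // bytes to map, clamped to the end of the file
    std::uint64_t skip;        // distance from the view start to the wanted byte
};

// Compute an aligned view that covers wanted_length bytes starting at
// wanted_offset, without running past the end of the file.
remap_plan plan_remap(std::uint64_t wanted_offset,
                      std::uint64_t wanted_length,
                      std::uint64_t file_size,
                      std::uint64_t alignment)
{
    assert(alignment > 0);
    assert(wanted_offset <= file_size);

    // Round the offset down to the nearest multiple of the alignment.
    const std::uint64_t map_offset = wanted_offset - wanted_offset % alignment;
    const std::uint64_t skip = wanted_offset - map_offset;

    // Bytes between the aligned offset and the end of the file.
    const std::uint64_t available = file_size - map_offset;

    // Compare against (available - skip) so skip + wanted_length cannot overflow.
    const std::uint64_t map_length =
        wanted_length > available - skip ? available : skip + wanted_length;

    return {map_offset, map_length, skip};
}
```

With the current interface the resulting map_offset and map_length would have to be passed to close() followed by open(); a remap method could take the same two values and keep the file handle open.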

Change History (0)
