wiki:BestPracticeHandbook

Version 39 (modified by Niall Douglas, 7 years ago)

--

Links To Examples Of Best Practice For C++ 11/14 Libraries

originally written by Niall Douglas May 2015

As part of preparing my C++ Now 2015 presentation "A review of C++ 11/14 only Boost libraries" I examined ten C++ 11/14 mandatory libraries heading towards Boost which in May 2015 were:

Name | Authors | Min C++ | Boost headers required | Entered Boost peer review queue | Description | Repository
Boost.Fiber | Oliver Kowalke | 14 | Context, Config | conditionally accepted | a framework for micro-/userland-threads (fibers) scheduled cooperatively | https://github.com/olk/boost-fiber
Boost.AFIO | Niall Douglas, Paul Kirth | 11 | none | 2013-10 | strongly ordered portable asynchronous filesystem and file i/o extending ASIO | https://github.com/BoostGSoC13/boost.afio
Boost.DI | Krzysztof Jusiak | 14 | none | 2015-01 | provides compile time, macro free constructor dependency injection | https://github.com/krzysztof-jusiak/di
Boost.Hana | Louis Dionne | 14 | none | 2015-04 | functional programming in C++ based around transformations of heterogeneous collections of types | https://github.com/ldionne/hana
Boost.Http | Vinícius dos Santos Oliveira | 11 | asio, filesystem, system, datetime, utility | 2015-04 | create applications that need to expose services through HTTP | https://github.com/BoostGSoC14/boost.http
Boost.APIBind | Niall Douglas | 11 | none | not submitted yet | toolkit for modularizing Boost libraries, managing versioned API and ABI dependency binds; makes it easy for Boost libraries to switch between use of the C++ 11 STL and the Boost STL | https://github.com/ned14/Boost.BindLib
Boost.Expected | Pierre Talbot, Vicente J. Botet Escriba | 11 | none | not submitted yet | category and functional programming in C++ based around the monadic expected<T, E> | https://github.com/ptal/expected
Boost.Tick | Paul Fultz II | 11 | none | not submitted yet | trait introspection and concept creator for C++ 11 | https://github.com/pfultz2/Tick
Boost.Fit | Paul Fultz II | 11 | none | not submitted yet | provides utilities for functions and function objects | https://github.com/pfultz2/Fit
Boost.Sqlpp11 | Roland Bock | 11 | none | not submitted yet | a type safe embedded domain specific language for SQL queries and results in C++ | https://github.com/rbock/sqlpp11

There was very significant divergence in best practices between these libraries at the time of examination, and moreover the divergences were uneven possibly reflecting an uneven understanding of what best practice might be in C++ 11/14. Thence from my examination I have prepared the list below of what I felt were best practices in C++ 11/14 mandatory libraries with hyperlinks to the relevant source code files in the relevant libraries above. As what a "best practice" is or is not may vary according to opinion, I have attempted to provide a rationale for each of the suggestions below. It is up to library authors to weigh each suggestion and to decide whether to apply it, or not, to their library.

One of the most strikingly consistent features of these new libraries is their lack of dependence on Boost, often even to the extent of avoiding the boost namespace entirely. The big advantage of this is modularity: almost all of these libraries can be used standalone, which is an oft requested feature by Boost end users. However this form of standalone modularity is more a case of ivory tower syndrome than good design, and is therefore likely not sustainable as more C++ 11/14 libraries become useful to other C++ 11/14 libraries and coupling therefore increases. I therefore dedicate significant effort below to how to most flexibly couple your library to other libraries so as to leave options open, the techniques for which have diversified very significantly in C++ 11.

One will note that the list below is much more C++ 11/14 focused than Boost focused. This is because it is derived from the first crop of C++ 11/14 mandatory Boost libraries. This is not a handbook for writing Boost libraries or even C++ 11/14 Boost libraries; if you want that, first start reading here (note that some of the guidelines on that page don't really apply to C++ 11/14 libraries) and then read here and here.

I have tried to keep these points generic to all C++ 11/14 libraries in the hope that they are useful far outside Boost. I have also ordered them with what I consider the most important ("strongly consider") first and not as important ("consider") later.

The final section is going to be the most controversial (probably, as usual, to the extent of receiving hate mail), which is why it gets a separate standalone section. It contains a set of essays on non-technical best practices; specifically, they are essays discussing best practices at a C++ organisational, cultural, business, operational and management level rather than simply addressing purely technical C++ software practice. In case you wonder what authority I might have to discuss non-technical best practice, be aware that I am equally trained (in terms of degrees) in Economics and Management as I am in Software, plus I am an affiliate researcher with the Waterloo Institute for Complexity and Innovation, so this non-technical analysis considers software, and the human systems surrounding it, as one and the same complex system. Nevertheless, once you start criticising non-technical best practice you will usually be instantly shouted down by any technical community as being off-topic, irrelevant, trolling, or some other ad hominem based dismissal of even considering the non-technical argument. Hence I have marked that final section with "SOAPBOX", and all opinions and claims therein are mine personally.

  1. Links To Examples Of Best Practice For C++ 11/14 Libraries
    1. CONVENIENCE: Strongly consider using git and [https://github.com/ …
    2. COUPLING: Strongly consider versioning your library's namespace …
    3. PORTABILITY: Strongly consider trying your library on Microsoft …
    4. QUALITY: Strongly consider using free CI per-commit testing, even …
    5. QUALITY: Strongly consider per-commit compiling your code with …
    6. QUALITY/SAFETY: Strongly consider running a per-commit pass of your …
    7. SAFETY: Strongly consider a nightly or weekly input fuzz automated …
    8. DESIGN: (Strongly) consider using constexpr semantic wrapper …
    9. MAINTENANCE: Consider making it possible to use an XML outputting …
    10. DESIGN/QUALITY: Consider breaking up your testing into per-commit …
    11. PORTABILITY: Consider not doing compiler feature detection yourself
    12. CONVENIENCE: Consider having Travis send your unit test code …
    13. CONVENIENCE: Consider creating a status dashboard for your library …
    14. DESIGN: Consider making (more) use of ADL C++ namespace composure …
    15. BUILD: Consider defaulting to header only, but actively manage …
    16. COUPLING: Consider allowing your library users to dependency …
    17. FUTURE PROOFING: Consider being C++ resumable function ready
    18. COUPLING/SOAPBOX: Essays on non-technical best practices within …
      1. Modular vs Monolithic
      2. C++ in the 21st century
      3. A brief and cynical history of Boost and its relationship to C++
      4. What does this history have to do with defaulting to standalone …
      5. COUPLING/SOAPBOX: Essay about wisdom of defaulting to standalone …
      6. COUPLING/SOAPBOX: Essay about wisdom of dependency package …

1. CONVENIENCE: Strongly consider using git and GitHub to host a copy of your library and its documentation

There are many source code control systems, with subversion and CVS being the two most recently popular of yesteryear. Probably the current most popular source code control system is Git, and despite its (Mingw i.e. non-native) port on Microsoft Windows being a bit flaky, it is very useful once mastered.

There is less widespread consensus about where to host your git repositories, with the most popular by far being github which is a proprietary service run by a profit making company. Nevertheless, one often hears strong arguments in favour of gitlab, bitbucket and many other alternatives.

All the Boost libraries are on github, as are all the libraries I reviewed. The huge advantage of github over all others is that the free tooling exampled below integrates easily with github. Choosing github therefore makes your life much easier. Note that as git is a distributed source code control system, you can keep a canonical master copy anywhere and write a script which autorefreshes the github copy, thus triggering any of the free tooling you have integrated there. In other words, don't necessarily place all your eggs in the github basket, and consider making github simply a medium for conveniently triggering the free tooling.

Github also provides free website hosting for HTML. Have a script automatically generate documentation and commit it to the gh-pages branch in your normal repo. This should present a copy of your HTML at http://username.github.io/repository.

This is the script which generates the documentation for proposed Boost.AFIO, and indeed you can see the exact output generated by this script at http://boostgsoc13.github.io/boost.afio/. You may find it useful.

cd boost-local
rm -rf libs/afio/doc/doxy/doxygen_output/html
mkdir -p libs/afio/doc/doxy/doxygen_output/html
cd doc
../b2 -a ../libs/afio/doc
cd ../..
if [ ! -e publish ]; then
git clone -b gh-pages git@github.com:BoostGSoC13/boost.afio.git publish
fi
cd publish
git reset --hard b1414e11be50ff81124e2e1583f1bbb734ad9ead
cd ..
rm -rf publish/*
mkdir -p publish/doc/html
cp -a boost-local/doc/html/afio* publish/doc/html/
cp -a doc/src publish/doc/
cp -a doc/src/images/boost.png publish/
cp -af boost-local/doc/src publish/doc/
mkdir -p publish/libs/afio/doc
cp -a doc/* publish/libs/afio/doc/
cd boost-local/libs/afio/doc/doxy
doxygen
cd ../../../../../publish
cp -a ../boost-local/libs/afio/doc/doxy/doxygen_output/html .
cp -a ../Readme.md .
cp -a ../Readme.md Readme.html
echo '<html><head><title>Boost.AFIO documentation</title><meta http-equiv="refresh" content="300"/><body>
<h1>Boost.AFIO documentation</h1>
<p><a href="doc/html/afio.html">BoostBook format documentation</a></p>
<p><a href="html/index.html">Doxygen format documentation</a></p>
<p><a href="afio-stable.tar.bz2">Ready to go stable AFIO distribution with all git submodules (from master branch)</a></p>
<p></p>' > index.html
cat Readme.md | tail -n +4 >> index.html
echo '
</body></html>' >> index.html
git add .
git commit -a -m 'SHA stamp by Jenkins as building correctly' || true
git push -f

Some may wonder what the hard git reset to a SHA is for. This prevents the gh-pages branch continuously growing in storage by breaking history for the branch, and therefore making git clone times grow excessively. As the branch is purely for HTML publishing, breaking history like this is safe.

An example of using Travis to build the documentation is at https://github.com/krzysztof-jusiak/di/blob/cpp14/.travis.yml, specifically the line:

  - if [ "${TRAVIS_BRANCH}" == "cpp14" ] && [ "${DOCUMENTATION}" != "" ]; then (travis_wait travis_retry wget --quiet http://sourceforge.net/projects/boost/files/boost/1.58.0/${BOOST}.tar.gz && tar zxf ${BOOST}.tar.gz && mkdir ${BOOST}/libs/di && cp -r example build doc ${BOOST}/libs/di && sudo apt-get install xsltproc && cd ${BOOST}/libs/di/doc && ../build/b2 -sBOOST_ROOT=../build && git clone https://github.com/krzysztof-jusiak/di.git && cd di && git checkout -b gh-pages --track origin/gh-pages && git reset --hard && rm -rf boost/libs/di/doc/html cpp14/boost/libs/di/doc/html && cp -r ../html boost/libs/di/doc && cp -r ../html cpp14/boost/libs/di/doc && git add -A . && git commit -am "doc regeneration" && git push --force --quiet "https://${GH_TOKEN}@github.com/krzysztof-jusiak/di"); fi

Other examples of libraries which use github for their documentation:

2. COUPLING: Strongly consider versioning your library's namespace using inline namespaces and requesting users to alias a versioned namespace instead of using it directly

C++ 11 adds a new feature called inline namespaces which are far more powerful than they first appear:

namespace boost { namespace afio { inline namespace v1 { /* stuff */ } } }
// Stuff is generated at the ABI link layer in boost::afio::v1
// But to the compiler everything boost::afio::v1::* appears identically in boost::afio::*
// INCLUDING for ADL and overload resolution
// In other words you can declare your code in boost::afio::v1, and use it as if declared in boost::afio.

// The other important C++ feature here is namespace aliasing, so
namespace local { namespace afio = boost::afio; /* use afio::* and it all works */ }

The reason this pattern is so useful is because it greatly eases the lives of your end users and you the library maintainer in years to come when you need to break API compatibility. Let's take a case example: imagine the situation typical in 03 C++ libraries where library Boost.Foo uses dependent library Boost.AFIO:

namespace boost { namespace afio {
  struct foo {
    static void api(int i, double f);
  };
} }
...
namespace boost { namespace foo {
  boost::afio::foo::api(1, 2);
} }

Imagine that you now release an API breaking refactor of Boost.AFIO, which would look like this:

namespace boost { namespace afio {
  struct foo {
    static void api(double f, int i);  // Oh dear, f and i have been swapped!
  };
} }
...
namespace boost { namespace foo {
  boost::afio::foo::api(1, 2);  // This is probably now a bug!
} }

The users of Boost.Foo, which uses boost::afio::foo::api(), now find that their library no longer passes its unit testing because foo::api() has changed from before. They will quite rightly throw up a fuss, and under Boost's present rules you will be asked to roll back your refactor until Boost.Foo has also been refactored to match it. This causes inconvenience for you the maintainer of Boost.AFIO and for the maintainer of Boost.Foo, and is a general pain for users. It also breaks modularity and increases coupling between libraries in a way which saddles you, the maintainer of Boost.AFIO, with the lack of maintenance or failure of timely maintenance of libraries dependent on AFIO. I cannot recommend strongly enough that you do not blindly copy the 03 idiom of suggesting that client code use your library directly via fully qualified namespacing.

The good news is we can make all this go away with inline namespaces and namespace aliasing, so consider this pattern instead:

namespace boost { namespace afio { inline namespace v1 {
  struct foo {
    static void api(int i, double f);
  };
} } }
...
namespace boost { namespace foo {
  // Probably somewhere in this library's config.hpp
  namespace afio = boost::afio;  // This is the key use change which needs to be strongly recommended to your library's users
  ...
  // In implementation code after config.hpp
  afio::foo::api(1, 2);  // Note the no longer fully qualified use of afio. The local namespace alias is used to "symlink" to "the latest" version of Boost.AFIO
} }

Now imagine your refactor occurs as before:

namespace boost { namespace afio {
  // Probably defined by boost/afio.hpp which in turn includes boost/afio_v2.hpp
  inline namespace v2 {
    struct foo {
      static void api(double f, int i);  // new implementation
    };
  }
  // Probably defined by boost/afio_v1.hpp
  namespace v1 {
    struct foo {
      static void api(int i, double f);  // old implementation
    };
  }
} }
...
namespace boost { namespace foo {
  // Probably somewhere in this library's config.hpp
  namespace afio = boost::afio::v1;  // By changing this one single line we "fix" the problem. Earlier we included <boost/afio_v1.hpp> instead of <boost/afio.hpp>.
  ...
  // In implementation code after config.hpp
  afio::foo::api(1, 2);  // And this finds the v1 AFIO implementation, not the v2 implementation
} }

What have we just achieved?

  1. Library Boost.Foo dependent on Boost.AFIO no longer requires lots of refactoring work if Boost.AFIO is refactored. Just two lines changed in its config.hpp, something easy for the release managers to do.
  2. Library Boost.AFIO can now be evolved far quicker than before, and simply keep shipping entire copies of legacy versions without problems with colliding namespaces. As end users get round to upgrading, legacy versions can be removed from the distro after a period of warning.

What are the problems with this technique?

  1. You now need to ship multiple copies of your library, maintain multiple copies of your library, and make sure simultaneous use of multiple library versions in the same executable doesn't conflict. I suspect this cost is worth it for the added flexibility to evolve breaking changes for most library maintainers. You probably want to employ a per-commit run of http://ispras.linuxbase.org/index.php/ABI_compliance_checker to make sure you don't accidentally break the API (or ABI where appropriate) of a specific API version of your library, so in your custom build run you might check out an original SHA for your library separate to your latest commit, build both and use the ABI compliance checker tool to determine if anything has broken. Similarly, the same toolset (ABIDump) could be used to detect where ABIs collide by having some shell script error out if any ABI overlaps between two libraries, perhaps using the diff tool.

Also don't forget that git lets you recursively submodule yourself but pinned to a different branch by adding the `submodule.name.branch` stanza to .gitmodules, so if you do ship multiple versions you can mount specific version tracking branches of yourself within yourself such that a recursive submodule update checks out all the versions of yourself into a given checkout.
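For instance, a .gitmodules entry pinning a self-submodule to a version-tracking branch might look like this (the paths, URL and branch name here are hypothetical):

```ini
[submodule "versions/v1"]
	path = versions/v1
	url = https://github.com/yourname/yourlib
	branch = v1-fixes
```

Note that a plain `git submodule update` still checks out the recorded SHA; the branch stanza takes effect when you run `git submodule update --remote`.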

  2. The above technique alone is insufficient for header only end users where multiple versions of your library must coexist within the same translation unit. This is because almost every library header will have include guards, and these will prevent end users including alternative versions of your library within the same translation unit, even though those versions do not conflict at a C++ level due to having different namespaces. There is an additional problem in that most library headers define macros which will collide when you include multiple versions, and which may have (breakingly) different values that induce misoperation. To fix this you can take a manual approach, and make sure that the header file for each version of your library has its own header guards and that all macros are undefined on exit from a header. Or you can make use of another recommendation below which uses C preprocessor metaprogramming to automate header guard management for you.
  3. Many end users are not used to locally aliasing a library namespace in order to use it, and may continue to qualify it directly using the 03 idiom. You may consider defaulting to not using an inline namespace for the version to make sure users don't end up hurting themselves this way, but that approach has both pros and cons.
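As a concrete sketch of that manual approach (all file, namespace and macro names here are hypothetical), each versioned header carries its own distinct include guard and undefines its internal macros before exit, so both versions can safely be included into one translation unit:

```cpp
// ---- boost/afio_v1.hpp (hypothetical sketch) ----
#ifndef BOOST_AFIO_V1_HPP
#define BOOST_AFIO_V1_HPP
#define BOOST_AFIO_V1_DETAIL 1          // library-internal macro
namespace boost { namespace afio { namespace v1 {
  inline int api_level() { return 1; }  // old implementation
} } }
#undef BOOST_AFIO_V1_DETAIL             // undefine all macros on exit
#endif

// ---- boost/afio_v2.hpp (hypothetical sketch) ----
#ifndef BOOST_AFIO_V2_HPP
#define BOOST_AFIO_V2_HPP
#define BOOST_AFIO_V2_DETAIL 2
namespace boost { namespace afio { inline namespace v2 {
  inline int api_level() { return 2; }  // new implementation, and the default
} } }
#undef BOOST_AFIO_V2_DETAIL
#endif
```

Because v2 is the inline namespace, unqualified boost::afio::api_level() resolves to the v2 implementation, while clients pinned to v1 still reach boost::afio::v1::api_level() in the same translation unit.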

Some fun extra things this technique enables:

  1. Something not so obvious above is that you can also stub out fake copies of dependencies where a dependency is missing in the current configuration. For example, imagine optional compression support where your config.hpp aliases boost::foo::compression to either a real compression library or an internal stub copy which actually does nothing. Your code is then written to assume a compression library aliased at boost::foo::compression and need not consider whether it's actually there or not. The advantages here for reducing coupling are very obvious. This is covered in a separate recommendation below.
  2. This technique is highly extensible to allow dependency injection of STL11 vs Boost on a per-feature basis e.g. your user wants Boost.Thread instead of STL11 thread but only for threading, so your library can be so modular as to allow both options to end users. This is covered in a separate recommendation below.
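A minimal sketch of the stub-dependency idea from point 1 above, assuming a hypothetical boost::foo library with optional compression support (none of these names come from a real library):

```cpp
#include <string>

namespace stub_compression {            // internal stub: does nothing, same signatures
  inline std::string compress(const std::string &in) { return in; }
}
namespace zlib_compression {            // stand-in for a real compression backend
  inline std::string compress(const std::string &in) { return "Z:" + in; }
}

namespace boost { namespace foo {
// Probably chosen in this library's config.hpp:
#ifdef BOOST_FOO_HAVE_ZLIB
  namespace compression = ::zlib_compression;
#else
  namespace compression = ::stub_compression;   // silently degrade to a no-op
#endif
  // Client-facing code is written as if compression were always present.
  inline std::string store(const std::string &payload) {
    return compression::compress(payload);
  }
} }
```

The rest of the library never tests whether compression is available; swapping the backend is a one-line change in config.hpp.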

Examples of libraries which use versioned namespaces and aliasing to bind a namespace locally:

3. PORTABILITY: Strongly consider trying your library on Microsoft Visual Studio 2015

More than half the libraries reviewed had no support for Microsoft Visual Studio, and only supported GCC and clang. When the authors were asked why, in many cases it was simply assumed that MSVC didn't implement much C++ 11/14 and the authors hadn't attempted MSVC support.

This is in fact untrue. Here is a complete list of C++ 11/14 features which VS2015 does NOT support (everything else you can assume it supports, including a full and complete C++ 14 STL):

  • Expression SFINAE (there are workarounds. Note the STL "turns on" magic Expression SFINAE support for those parts of the STL requiring it, so any Expression SFINAE done by the STL for you works as expected).
  • Any serious constexpr use (try "#define constexpr" i.e. disable it completely. Most of the time your code will compile and work. Consider using a BOOST_LIBNAME_CONSTEXPR macro thereafter). It is claimed by Microsoft that full C++ 11 constexpr conformance is present, but to be honest in my own code I don't find anything less than C++ 14 constexpr useful in practice.
  • No two phase lookup. Reordering decls to make the compiler look up in a way which doesn't produce non-conforming outcomes is a straightforward, if annoying, workaround.
  • MSVC's C99 support is still less than complete, but it's significantly more complete than before.
  • MSVC's preprocessor is still non-conforming, but it's less broken than it has ever been.
  • Variable templates.
  • Non-static data member initialisers for aggregates.
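To illustrate the BOOST_LIBNAME_CONSTEXPR suggestion above, here is a hedged sketch; the macro name, the version cutoff and the sample function are all illustrative assumptions, not from any real library:

```cpp
// Degrade constexpr to nothing on compilers where support is assumed lacking.
// The _MSC_VER < 1910 cutoff (pre-VS2017) is an illustrative assumption only.
#if defined(_MSC_VER) && _MSC_VER < 1910
# define BOOST_MYLIB_CONSTEXPR                 // constexpr disabled, code still compiles
#else
# define BOOST_MYLIB_CONSTEXPR constexpr
#endif

BOOST_MYLIB_CONSTEXPR int mylib_square(int x) { return x * x; }
```

On conforming compilers mylib_square remains usable in constant expressions; on the degraded path it is an ordinary function with identical runtime behaviour.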

VS2015 is a very highly conforming C++ 11/14 compiler. It meets or exceeds clang 3.3 on every C++ 11/14 feature, so if your library can be compiled by clang 3.3 then it is highly likely it should compile, without too much work, on VS2015 assuming the missing items above are not showstoppers. VS2015 even has some support for C++ 1z (C++ 17) matching about what clang 3.5 provides minus C++ 14 relaxed constexpr and C++ 14 variable templates. See http://blogs.msdn.com/b/vcblog/archive/2015/04/29/c-11-14-17-features-in-vs-2015-rc.aspx.

I am not claiming that you won't get surprises when you try getting MSVC to compile your code which you thought was standards compliant. MSVC is not an AST based compiler and uses heuristics to trigger partial local AST compilation, and therefore has a unique processing model which exposes your assumptions about what you think or thought is valid C++. This is exactly why it is worthwhile getting your C++ 11/14 library working on MSVC because you will get a better quality, more standards conforming C++ library out of it.

A good example of just how much C++ 14 support VS2015 provides is in Boost.DI. When I first contacted the author about the lack of VS2015 support, he proceeded to port his entirely C++ 14 codebase to VS2015 successfully, though he had to do a bit of refactoring to make the port work. Interestingly because he didn't push constexpr use past C++ 11 capacity in DI, VS2015's constexpr support was enough for DI to work as expected on VS2015 for the most part.

4. QUALITY: Strongly consider using free CI per-commit testing, even if you have a private CI

Despite the fact that Travis provides free-of-cost per-commit continuous integration testing for Linux and OS X, and that Appveyor provides the same for Microsoft Windows, there were still libraries among those reviewed which made use of neither, and which furthermore had no per-commit CI testing whatsoever.

I'll be blunt: not having per-commit CI testing is unacceptable in this modern day and age and is an excellent sign of a library author not committed to software quality. Especially when such CI services are free of cost, and it's purely your laziness that you haven't configured them yet.

So first things first: if your C++ 11/14 library is not using any form of per-commit testing yet, go add Travis and Appveyor support right now. Configuration is extremely easy if your project lives on Github: simply log in to both services using your Github account and enable your project. Next add a suitable .travis.yml and an .appveyor.yml file to the root of your project, and push the commit. Watch Travis and Appveyor build and run your CI-suitable unit test suite, and report back on Github (especially if it's a pull request) whether the build and tests passed or failed. From now on when someone issues a pull request fixing a bug, you'll instantly know if that pull request compiles and passes all unit tests on Linux, OS X and Windows; much more importantly, so will the pull request submitter, and they will usually go fix problems themselves, so you the maintainer need never find out that a pull request is defective on some build target.

Example travis.yml's to clone from:
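For orientation, a minimal .travis.yml for a CMake-based C++ 11/14 library might look like the following; every value here is an illustrative assumption rather than taken from any of the libraries reviewed:

```yaml
# Minimal hypothetical .travis.yml for a CMake-based C++ 11/14 library.
language: cpp
os:
  - linux
  - osx
compiler:
  - gcc
  - clang
script:
  - mkdir build && cd build
  - cmake ..
  - cmake --build .
  - ctest --output-on-failure
```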

Example appveyor.yml's to clone from:
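Similarly, a minimal appveyor.yml driving MSVC might look like this sketch (again, all values are illustrative assumptions):

```yaml
# Minimal hypothetical appveyor.yml for a CMake-based C++ 11/14 library.
version: '{build}'
image: Visual Studio 2015
build_script:
  - mkdir build
  - cd build
  - cmake ..
  - cmake --build . --config Release
test_script:
  - cd %APPVEYOR_BUILD_FOLDER%\build && ctest -C Release --output-on-failure
```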

Both Travis and Appveyor are excellent for getting an immediate 90% confidence signal that some commit did not break something. For a free service with little configuration effort that's fantastic. However, if you want a 99% confidence signal you will need to spend a few months of your life configuring your own Jenkins CI installation, probably best placed on its own dedicated server given the RAM you'll need (I suggest a cheap OVH dedicated server with at least 16GB of RAM for about €15/month or US$20/month). Most of that time will be spent learning how not to configure Jenkins, as Jenkins is a black, black art indeed - but again it is great for being free of cost given the return on investment. Once mastered, Jenkins can do almost anything from per-commit testing to soak testing to input fuzz testing to automating a long list of tasks for you (e.g. diffing and synchronising two forks of a repo by bisecting commit histories against unit testing), but it will take many dedicated months to acquire the skills to configure a maintainable and stable Jenkins install.

Should you add Travis and Appveyor CI support if you already are using your own private Jenkins CI?

I think the answer is unequivocally yes. The reasons are these:

  1. Having Travis + Appveyor badges (see https://raw.githubusercontent.com/krzysztof-jusiak/di/cpp14/README.md for example Markdown for badges) on your open source project is a universally recognised signal of attention to quality.
  2. Other free tooling such as Coveralls.io has built-in integration with Github and Travis. Hooking Jenkins into Coveralls isn't hard, but it "just works" with Travis, and that's a similar pattern with most free tooling which consumes CI results.
  3. Future tooling by Boost which dashboards Boost libraries and/or ranks libraries by a quality score will almost certainly automate on Travis and Appveyor being queryable by their RESTful APIs. In other words, placing your library in Github and adding Travis and Appveyor CI support has the highest chance of working immediately with any future Boost tooling with minimal additional effort by you.

5. QUALITY: Strongly consider per-commit compiling your code with static analysis tools

In Travis and Appveyor it is easy to configure a special build job which uses the clang static analyser and clang lint tools on Linux/OS X and the MSVC static analyser on Windows. These perform lengthy additional static AST analysis tests to detect when your code is doing something stupid and the use of these is an excellent sign that the developer cares about code quality. Static analysis is perfectly suited to be run by a CI as it takes extra time to compile your program, so a CI can trundle off and do the lengthy work itself while you get on with other work.

Enabling Microsoft's static analyser is easy, simply add /analyze to the compiler command line. Your compile will take ten times longer and new warnings will appear. Note though that the MSVC static analyser is quite prone to false positives like miscounting array entries consumed. You can suppress those using the standard #pragma warning(disable: XXX) system around the offending code.
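For example, a suppression around a region the analyser misjudges might look like this; warning number 6385 is just an illustrative choice of an MSVC analyser diagnostic, so substitute whatever number /analyze actually reports:

```cpp
// Suppress a hypothetical /analyze false positive around a specific region.
// On non-MSVC compilers the pragmas simply vanish.
#ifdef _MSC_VER
# pragma warning(push)
# pragma warning(disable: 6385)  // analyser miscounts array entries consumed here
#endif
inline int sum3(const int (&a)[3]) { return a[0] + a[1] + a[2]; }
#ifdef _MSC_VER
# pragma warning(pop)
#endif

// Small demonstration caller.
inline int sum3_demo() {
  const int v[3] = {1, 2, 3};
  return sum3(v);
}
```

The push/pop pair keeps the suppression scoped to the offending code rather than silencing the warning for the whole translation unit.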

Enabling clang's static analyser is slightly harder. You'll need to replace the normal call of the compiler with whatever tool is set into the CXX environment variable by the scan-build tool. See http://clang-analyzer.llvm.org/scan-build.html. For Boost projects, I found this script to work well:

MYPWD=`pwd`
REPORTS="$MYPWD/clangScanBuildReports" 
rm -rf "$REPORTS"
git submodule update --init --recursive
cd boost-local
/usr/share/clang/scan-build-3.4/scan-build --use-analyzer=/usr/bin/clang-3.4 -o "$REPORTS" ./b2 toolset=gcc-4.7 libs/afio/test -a --test=test_all.cpp --link-test

Note that my b2 has a $HOME/user-config.jam which resets the compiler used to the value of $CXX from the environment:

import os ;
using gcc : : [ os.environ CXX ] ;

scan-build will generate a HTML report of the issues found with a pretty graphical display of the logic followed by the analyser into the $REPORTS directory. Jenkins has a plugin which can publish this HTML report for you per build, for other CIs you'll need to copy the generated files onto a website somewhere e.g. committing them to your repo under gh-pages and pushing them back to github.

Finally, the clang lint tool (called clang-tidy) is not something I have currently enabled in my own code, but it looks very promising. The following checks are default enabled on clang tidy 3.6:

Enabled checks:
    clang-analyzer-core.CallAndMessage
    clang-analyzer-core.DivideZero
    clang-analyzer-core.DynamicTypePropagation
    clang-analyzer-core.NonNullParamChecker
    clang-analyzer-core.NullDereference
    clang-analyzer-core.StackAddressEscape
    clang-analyzer-core.UndefinedBinaryOperatorResult
    clang-analyzer-core.VLASize
    clang-analyzer-core.builtin.BuiltinFunctions
    clang-analyzer-core.builtin.NoReturnFunctions
    clang-analyzer-core.uninitialized.ArraySubscript
    clang-analyzer-core.uninitialized.Assign
    clang-analyzer-core.uninitialized.Branch
    clang-analyzer-core.uninitialized.CapturedBlockVariable
    clang-analyzer-core.uninitialized.UndefReturn
    clang-analyzer-cplusplus.NewDelete
    clang-analyzer-cplusplus.NewDeleteLeaks
    clang-analyzer-deadcode.DeadStores
    clang-analyzer-llvm.Conventions
    clang-analyzer-security.FloatLoopCounter
    clang-analyzer-security.insecureAPI.UncheckedReturn
    clang-analyzer-security.insecureAPI.getpw
    clang-analyzer-security.insecureAPI.gets
    clang-analyzer-security.insecureAPI.mkstemp
    clang-analyzer-security.insecureAPI.mktemp
    clang-analyzer-security.insecureAPI.rand
    clang-analyzer-security.insecureAPI.strcpy
    clang-analyzer-security.insecureAPI.vfork
    clang-analyzer-unix.API
    clang-analyzer-unix.Malloc
    clang-analyzer-unix.MallocSizeof
    clang-analyzer-unix.MismatchedDeallocator
    clang-analyzer-unix.cstring.BadSizeArg
    clang-analyzer-unix.cstring.NullArg

I also see some default disabled checks which might be interesting:

    clang-analyzer-alpha.core.BoolAssignment
    clang-analyzer-alpha.core.CallAndMessageUnInitRefArg
    clang-analyzer-alpha.core.CastSize
    clang-analyzer-alpha.core.CastToStruct
    clang-analyzer-alpha.core.FixedAddr
    clang-analyzer-alpha.core.IdenticalExpr
    clang-analyzer-alpha.core.PointerArithm
    clang-analyzer-alpha.core.PointerSub
    clang-analyzer-alpha.core.SizeofPtr
    clang-analyzer-alpha.core.TestAfterDivZero
    clang-analyzer-alpha.cplusplus.VirtualCall
    clang-analyzer-alpha.deadcode.UnreachableCode
    misc-argument-comment
    misc-bool-pointer-implicit-conversion
    misc-swapped-arguments
    misc-undelegated-constructor
    misc-uniqueptr-reset-release
    misc-unused-raii
    misc-use-override
    readability-braces-around-statements
    readability-function-size
    readability-redundant-smartptr-get

6. QUALITY/SAFETY: Strongly consider running a per-commit pass of your unit tests under both valgrind and the runtime sanitisers

In Travis it is well worth adding a special build job which runs your unit tests under:

valgrind memcheck (Linux only)
This detects illegal reads and writes, use of uninitialised values, use of unaddressable memory, illegal/double frees, and memory leaks. This tool is highly recommended; its only downsides are a severe performance penalty and the fact that it can't test Windows code. For the former, your tests can detect when they are running inside valgrind and treble their timeouts: look into the RUNNING_ON_VALGRIND macro in valgrind.h, which incidentally compiles just fine on MSVC too. You can also mark up your code with valgrind instrumentation (also MSVC compatible) and simply leave the instrumentation permanently in your binaries.

Some will argue that their library is a pure constexpr metaprogramming library which does no memory allocation, and that running valgrind therefore makes no sense for their library. Ah, but remember that valgrind isn't just testing your code: it is testing the code produced by the compiler. If you are doing cutting edge C++ 14 programming you may trigger code generation bugs in compilers past or future, or bugs in the STL caused by how your code uses it. A valgrind pass on your unit tests will catch bad code generation bugs, and may one day save you hours, maybe days, of frustrating debugging of weird segfaults!

Running your unit tests under valgrind is easy: simply prepend valgrind when calling your (preferably optimised, though with debug info) test executable. Special compilation options can greatly improve the usefulness of the error output; try -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-inline, though note that disabling inlining may hide your bug.

Undefined behaviour sanitiser (GCC and clang only)
Turned on using -fsanitize=undefined, this detects when your code performs undefined behaviour, and is sufficiently lightweight that you should consider shipping release binaries with it permanently turned on, along with stack smashing detection if using GCC 4.9 or later (-fstack-protector-strong). I personally have ubsan always on for all builds of any code of mine capable of accepting untrusted input. At the time of writing, turning on ubsan will catch: use of a misaligned pointer or reference, load of a bool which is neither 0 nor 1, out of bounds array indexing, bad casting, bad derived cast, bad cast of void* to type, bad or wrong vptr use, use of an impossible enum value, divide by zero, bad function pointer call, use of a null pointer, use of bytes not in the object, exiting a value returning function without a return value, returning null from a function not allowed to return null, illegal shifts, signed integer overflow, reaching unreachable code, and negative variable length array use.

As you can see, these tests make buffer overflow ROP chain exploits very hard, and therefore your code much, much harder to exploit from a security perspective. I think any library author whose library can accept untrusted input who doesn't always turn ubsan on is being irresponsible.

Thread sanitiser (GCC and clang only)
If your library is capable of threaded use, or your unit testing creates threads, you should definitely soak-execute your unit tests under the thread sanitiser (-fsanitize=thread) for a few hours per week. This provides a good quality check of correct use of the C11/C++11 atomic memory model, e.g. are all your atomic acquires matched with atomic releases in the right order? Did you read a memory location which was written concurrently, without an acquire-release serialisation lock? Sadly the tool doesn't understand memory fences, which substantially reduces your flexibility when writing with atomics, so do bear that in mind.

Some may note I didn't recommend the address sanitiser (GCC and clang only). This is because you need to recompile your STL and libc and all libraries with the address sanitiser to achieve perfect coverage, plus valgrind detects far more problems and valgrind detects bad code generated by the compiler and memory corruption by third party libraries. The only real additional feature of the address sanitiser over valgrind is that it can detect memory corruption within the stack, which valgrind never can. I personally have not found stack corruption much of a problem as programs inevitably crash when it happens. However if valgrind is just far too slow for your testing then employing the address sanitiser can be a useful substitute for valgrind for certain tests only. Note that the address sanitiser is perfect for untrusted input fuzz testing as it is much faster than valgrind, so I recommend the address sanitiser in the next section.

7. SAFETY: Strongly consider a nightly or weekly input fuzz automated test if your library is able to accept untrusted input

If your library can consume any form of serialisation or parameters supplied from a network, file or query - including any regular expressions or any type of string, even if you don't process it yourself and merely hand it off to another library - then you need to be doing input fuzz testing for a few hours weekly. Even with ubsan enabled in release builds (see previous section), which makes it harder to use untrusted input to subvert your security, an attacker can still exploit missing code path verification logic to make programs delete or replace user data, or write into secure data, without introducing any undefined behaviour at all.

The classic tool for fuzz testing data inputs is American Fuzzy Lop (afl). This is a mature, very well understood tool. You should use it in combination with the runtime sanitisers described above, so ideally with valgrind + ubsan, but if valgrind is too slow then with the address sanitiser + ubsan. You may also wish to consider additionally fuzz testing the parameters of every API in your library, see below for tooling to help with that.

One of the most promising new input fuzz testing tools going into the long term is LLVM's fuzz testing facilities which are summarised at http://llvm.org/docs/LibFuzzer.html as they make use of the clang sanitiser coverage recording facility to additionally find the code paths least covered, plus the tool is very fast compared to afl.

8. DESIGN: (Strongly) consider using constexpr semantic wrapper transport types to return states from functions

Thanks to constexpr and rvalue refs, C++ 11 codebases have much superior ways of returning states from functions. Let us imagine this C++ 11 function:

// handle_type is some class which takes ownership of a valid file descriptor, closing it on type destruction.

std::shared_ptr<handle_type> openfile(std::filesystem::path path)
{
  int fd;
  while(-1==(fd=::open(path.c_str(), O_RDWR|O_EXCL)) && EINTR==errno);
  if(-1==fd)
  {
    int code=errno;
    std::error_code ec(code, std::generic_category());
    std::string errstr(strerror(code));
    throw std::system_error(ec, std::move(errstr));
  }
  return std::make_shared<handle_type>(fd);
}

This is a highly simplified example, but an extremely common pattern in one form or another: when C++ code calls something not C++ and it returns an error, convert it into an exception and throw it. Else construct and return a RAII holding smart pointer to manage the resource just acquired.

The really nice thing about this highly simple design is that its API nicely matches its semantic meaning: if it succeeds you always get a shared_ptr. If it fails you always get an exception throw. Easy.

Unfortunately, throwing exceptions has unbounded time guarantees due to RTTI lookups, so for any code which worries about complexity guarantees the above is unacceptable: throwing exceptions should be exceptional, as the purists would put it. Traditionally, then, the C++ 03 pattern is to provide an additional overload capable of writing into an error_code, this being the pattern used by ASIO and most Boost libraries. If the error_code taking overload is chosen you get an error code instead of an exception, while code remains free to use the always throwing overload above:

std::shared_ptr<handle_type> openfile(std::filesystem::path path, std::error_code &ec)
{
  int fd;
  while(-1==(fd=::open(path.c_str(), O_RDWR|O_EXCL)) && EINTR==errno);
  if(-1==fd)
  {
    int code=errno;
    ec=std::error_code(code, std::generic_category());
    return std::shared_ptr<handle_type>();  // Return a null pointer on error
  }
  return std::make_shared<handle_type>(fd);  // This function can't be noexcept as it can throw bad_alloc
}

This pushes the problem of checking for error conditions and interpreting error codes onto the caller, which is workable, if potentially buggy when the caller doesn't handle all the outcomes. Note that code calling this function must still be exception safe in case bad_alloc is thrown. One thing which is lost, however, is the semantic meaning of the result: above we overload a null shared_ptr to indicate failure, which requires the caller to know that convention instead of being able to tell instantly from the API return type. Let's improve on that with a std::optional<T>:

namespace std { using namespace experimental; }  // for exposition only
std::optional<std::shared_ptr<handle_type>> openfile(std::filesystem::path path, std::error_code &ec)
{
  int fd;
  while(-1==(fd=::open(path.c_str(), O_RDWR|O_EXCL)) && EINTR==errno);
  if(-1==fd)
  {
    int code=errno;
    ec=std::error_code(code, generic_category());
    return std::nullopt;
  }
  return std::make_optional(std::make_shared<handle_type>(fd));
}

So far, so good. Note that we can still throw exceptions, and that all of the above worked just fine in C++ 03, as Boost provided an optional<T> implementation for 03. However the above is actually semantically suboptimal now we have C++ 11, because C++ 11 lets us encapsulate far more semantic meaning, cost free at runtime, using a monadic transport like Boost.Expected:

namespace std { using namespace experimental; }  // for exposition only
std::expected<std::shared_ptr<handle_type>, std::error_code> openfile(std::filesystem::path path)
{
  int fd;
  while(-1==(fd=::open(path.c_str(), O_RDWR|O_EXCL)) && EINTR==errno);
  if(-1==fd)
  {
    int code=errno;
    return std::make_unexpected(std::error_code(code, std::generic_category()));
  }
  return std::make_shared<handle_type>(fd);
}

The expected outcome is a shared_ptr to a handle_type, the unexpected outcome is a std::error_code, and the catastrophic outcome is the throwing of bad_alloc. Code using openfile() can either manually check the expected (its bool operator is true if the expected value is contained, false if the unexpected value) or simply unilaterally call expected<>.value() which will throw if the value is unexpected, thus converting the error_code into an exception. As you will immediately note, this eliminates the need for two openfile() overloads because the single monadic return based implementation can now perform both overloads with equal convenience to the programmer. On the basis of halving the number of APIs a library must export, use of expected is a huge win.

However I am still not happy with this semantic encapsulation because it is a poor fit to what opening files actually means. Experienced programmers will instantly spot the problem here: the open() call doesn't just return success vs failure, it actually has five outcome categories:

  1. Success, returning a valid fd.
  2. Temporary failure, please retry immediately: EINTR
  3. Temporary failure, please retry later: EBUSY, EISDIR, ELOOP, ENOENT, ENOTDIR, EPERM, EACCES (depending on changes on the filing system, these could disappear or appear at any time)
  4. Non-temporary failure due to bad or incorrect parameters: EINVAL, ENAMETOOLONG, EROFS
  5. Catastrophic failure, something is very wrong: EMFILE, ENFILE, ENOSPC, EOVERFLOW, ENOMEM, EFAULT

So you can see the problem now: what we really want is for category 3 errors to return only an error_code, whilst category 4 and 5 errors, plus bad_alloc, probably emerge as exception throws (these aren't actually the ideal outcomes, but we'll assume this mapping for the purposes of brevity here). That way the C++ semantics of the function would closely match the semantics of opening files. So let's try again:

namespace std { using namespace experimental; }  // for exposition only
std::expected<
  std::expected<
    std::shared_ptr<handle_type>,              // Expected outcome
    std::error_code>,                          // Expected unexpected outcome
  std::exception_ptr>                          // Unexpected outcome
openfile(std::filesystem::path path) noexcept  // Note the noexcept guarantee!
{
  int fd;
  while(-1==(fd=::open(path.c_str(), O_RDWR|O_EXCL)) && EINTR==errno);
  try
  {
    if(-1==fd)
    {
      int code=errno;
      // If a temporary failure, this is an expected unexpected outcome
      if(EBUSY==code || EISDIR==code || ELOOP==code || ENOENT==code || ENOTDIR==code || EPERM==code || EACCES==code)
        return std::make_unexpected(std::error_code(code, std::generic_category()));

      // If a non-temporary failure, this is an unexpected outcome
      std::string errstr(strerror(code));
      return std::make_unexpected(std::make_exception_ptr(std::system_error(std::error_code(code, std::generic_category()), std::move(errstr))));
    }
    return std::make_shared<handle_type>(fd);
  }
  catch(...)
  {
    // Any exception thrown is truly unexpected
    return std::make_unexpected(std::current_exception());
  }
}

There are some very major gains now in this design:

  1. Code calling openfile() no longer needs to worry about exception safety - all exceptional outcomes are always transported by the monadic expected transport. This lets the compiler optimise better, eases use of the function, and leads to fewer code paths to test, which means more reliable, better quality code.
  2. The semantic outcomes from this function in C++ have a close mapping to that of opening files. This means code you write more naturally flows and fits to what you are actually doing.
  3. Returning a monadic transport means you can now program monadically against the result e.g. value_or(), then() and so on. Monadic programming - if and only if there is no possibility of exception throws - is also a formal specification, so you could in some future world use a future clang AST tool to formally verify the mathematical correctness of some monadic logic if and only if all the monadic functions you call are noexcept. That's enormous for C++.

You may have noticed though the (Strongly) in the title of this section being in brackets, and if you guessed there are caveats in the above then you are right. The first big caveat is that the expected<T, E> implementation in Boost.Expected is very powerful and full featured, but unfortunately has a big negative effect on compile times, and that rather ruins it for the majority of people who only need about 10% of what it provides (and would rather like that to be quick to compile). The second caveat is that integration between Expected and Future-Promise especially with resumable functions in the mix is currently poorly defined, and using Expected now almost certainly introduces immediate technical debt into your code that you'll have to pay for later.

The third caveat is that I personally plan to write a much lighter weight monadic result transport which isn't as flexible as expected<T, E> (and probably hard coded to a T, error_code and exception_ptr outcomes) but would have negligible effects on compile times, and very deep integration with a non-allocating all-constexpr new lightweight future-promise implementation. Once implemented, my monadic transport may be disregarded by the community, evolved more towards expected<T, E>, or something else entirely may turn up.

In other words, I recommend you very strongly consider some mechanism for more closely and cleanly matching C++ semantics with what a function does now that C++ 11 makes it possible, but I unfortunately cannot categorically recommend one solution over another at the time of writing.

9. MAINTENANCE: Consider making it possible to use an XML outputting unit testing framework, even if not enabled by default

A very noticeable trend in the libraries reviewed above is that around half use good old C assert() and static_assert() instead of a unit testing framework.

There are many very good reasons not to use a unit testing framework by default, but there are few good reasons not to be able to use one at all. When your library cannot output XML indicating exactly which tests pass and which fail (including the static ones), all the Boost release managers see is a failure to compile or a failure to execute, which forces them to dig into compiler error diagnostics and unit test diagnostics respectively. It also makes a very minor, easily delegated problem look as serious as the most serious possible problem, because there is no way to quickly disambiguate without diving in, potentially with a debugger. All of these are good reasons to support some XML outputting unit testing framework which reports one XML entry per test for each test case in every test suite in your library.

Let me give you an example with Boost.AFIO, which executes about a hundred thousand tests across about 70 test platforms and configurations per commit. I once committed a change and noticed in the test matrix that only statically linked libraries were failing. The cause was immediately obvious to me: I had leaked ABI in a way that tripped the unit tests which deliberately build mismatching versions of AFIO to ensure namespace version changes don't conflict, and without even having to investigate the error itself I knew to revisit my commit for ABI leakage. For someone less familiar with the library, a quick look into the failing test would have revealed the name of the failing test case and made it instantly clear that this was an ABI leakage problem. This sort of extra information is a big win for anyone trying to get a release out the door.

There are big advantages for unit test stability analysis tooling as well. Jenkins CI can record the unit tests for thousands of builds, and if you have a test that regularly but rarely fails then Jenkins can flag such unstable tests. Atlassian tooling, free for open source, can display unit test aggregate statistics on a dashboard, and free web service tooling offering the kind of sophisticated statistical analysis you once had to pay for is becoming ever more common.

Finally, specifically for Boost libraries we have an automated regression testing system which works by various end users uploading XML results generated by Boost.Test to an FTP site where a cron script regularly runs to generate static HTML tables of passing and failing tests. Needless to say, if your library was as useful as possible to that system everybody wins, and your library is not as useful to that system if it uses assert() and even static_assert() because the XML uploaded is a compiler error console log or an assert failure diagnostic instead of a detailed list of which tests passed and which failed.

Hopefully by now I have persuaded you to use an XML outputting unit test framework. If you are writing a Boost library, the obvious choice is Boost.Test. Despite its many problems - it is slow to develop against, and its release branch lacks maintenance (note: Boost.Test v3 is now in testing, and should replace Boost.Test v2 soon) - Boost.Test is still a very competitive choice, and if you ignore the overly dense documentation and simply lift the pattern from this quick sample you'll be up and running very quickly:

#include "boost/test/unit_test.hpp"  // Note the lack of angle brackets

BOOST_AUTO_TEST_SUITE(all)  // Can actually be any name you like

BOOST_AUTO_TEST_CASE(works/spinlock, "Tests that the spinlock works as intended")  // Note the forward slashes in the test name
{
  boost::spinlock::spinlock<bool> lock;
  BOOST_REQUIRE(lock.try_lock());
  BOOST_REQUIRE(!lock.try_lock());
  lock.unlock();
  
  std::lock_guard<decltype(lock)> h(lock);
  BOOST_REQUIRE(!lock.try_lock());
}
// More BOOST_AUTO_TEST_CASE(), as many as is wanted

BOOST_AUTO_TEST_SUITE_END()

Already those familiar with Boost.Test will notice some unusual choices here, but I'll come back to why shortly. For reference, these are the common check macros:

BOOST_CHECK(expr)
Check if expr is true, continuing the test case anyway if false.
BOOST_CHECK_THROWS(expr)
Check if expr throws an exception, continuing the test case anyway if false.
BOOST_CHECK_THROW(expr, type)
Check if expr throws an exception of a specific type, continuing the test case anyway if false.
BOOST_CHECK_NO_THROW(expr)
Check if expr does not throw an exception, continuing the test case anyway if false.
BOOST_REQUIRE(expr)
Check if expr is true, immediately exiting the test case if false.
BOOST_REQUIRE_THROWS(expr)
Check if expr throws an exception, immediately exiting the test case if false.
BOOST_REQUIRE_THROW(expr, type)
Check if expr throws an exception of a specific type, immediately exiting the test case if false.
BOOST_REQUIRE_NO_THROW(expr)
Check if expr does not throw an exception, immediately exiting the test case if false.
BOOST_TEST_MESSAGE(msg)
Log a message with the XML output.
BOOST_CHECK_MESSAGE(pred, msg)
If pred is false, log a message with the XML output.
BOOST_WARN_MESSAGE(pred, msg)
If pred is false, log a warning message with the XML output.
BOOST_FAIL(msg)
Immediately exit this test case with a message.

Boost.Test provides an enormous amount of extra stuff (especially in its v3 branch) for all sorts of advanced testing scenarios, but for most software developed in a person's free time those advanced facilities don't provide enough benefit for their significant implementation cost. Hence, for personally developed open source, the primitive checks above, or combinations of them into more complex tests, are likely sufficient for 99% of C++ code. There is also a very specific reason I chose this exact subset of Boost.Test's functionality: Boost.APIBind's lightweight header only Boost.Test emulation defines just this subset, and usefully does so in a header inside APIBind called "boost/test/unit_test.hpp", identical to the Boost.Test header path, so if you include just that header you get compatibility with both APIBind and Boost.Test. In other words, by using the pattern just suggested you can:

  1. With a macro switch turn on full fat Boost.Test.
  2. For the default use Boost.APIBind's thin wrap of the CATCH header only unit testing library which I have forked with added thread safety support. CATCH is very convenient to develop against, provides pretty coloured console unit test output and useful diagnostics, and on request on the command line can also output JUnit format XML ready for consumption by almost every unit test XML consuming tool out there. Boost.Test theoretically can be used header only, but you'll find it's very hard on your compile times, whereas CATCH is always header only and has a minimal effect on compile time. CATCH also comes as a single kitchen sink header file, and APIBind includes a copy for you.
  3. For those so motivated that they really want assert() and nothing more, simply wrap the above macros with calls to assert(). Your single unit test code base can now target up to three separate ways of reporting unit test fails.

Note that if CATCH doesn't have enough features and Boost.Test is too flaky, another popular choice with tons of bells and whistles is Google Test. Like Boost.Test, its Windows support is sadly a bit flaky - in many ways, for advanced testing scenarios the Microsoft Visual Studio test tooling is hard to beat on Windows, and now that Visual Studio is being ported to all other platforms it may become the one to watch in the future - another good reason to get your C++ 11/14 codebase working perfectly on VS2015.

What are the problems with replacing asserts with a unit test framework?

  1. Asserts are fast and don't synchronise threads. Unit test frameworks almost always must grab a mutex for every single check, even if that check passes, which can profoundly damage the effectiveness of your testing. The obvious workaround is to prepend an if statement to every check, so if(!(expr)) BOOST_CHECK(expr);, but be aware that now only failures will be output into the XML, and many CI parsers consider zero XML test results in a test case to be a fatal error (workaround: always do a BOOST_CHECK(true) at the very end of the test).
  2. Getting static_asserts to decay cleanly into a BOOST_CHECK without #ifdef-ing is not always obvious. The obvious beginning is:

#ifndef AM_USING_BOOST_TEST_FOR_STATIC_ASSERTS
#define BOOST_CHECK_MESSAGE(pred, msg) static_assert(pred, msg)
#endif

... and now use BOOST_CHECK_MESSAGE instead of static_assert directly. If your static assert is inside library implementation code, consider a macro which the unit tests override when being built with a unit test framework, but which otherwise defaults to static_assert.

  3. Asserts have no effect when NDEBUG is defined. Your test code may assume this for optimised builds, and a simple regex find and replace may not be sufficient.

Libraries implementing XML outputting unit testing with the Boost.Test macro API:

10. DESIGN/QUALITY: Consider breaking up your testing into per-commit CI testing, 24 hour soak testing, and parameter fuzz testing

When a library is small, you can generally get away with running all tests per commit, and as that is easiest, that is usually what one does.

However as a library grows and matures, you should really start thinking about categorising your tests into quick ones suitable for per-commit testing, long ones suitable for 24 hour soak testing, and parameter fuzz testing whereby a fuzz tool executes your functions with input deliberately designed to exercise unusual code path combinations. The order of these categories generally reflects the maturity of a library: if a library's API is still undergoing heavy refactoring, the second and third categories aren't yet cost effective. I haven't mentioned the distinction between unit testing, functional testing and integration testing here, as I personally think that distinction not useful for libraries mostly developed in a person's free time. Due to lack of resources, and the fact that we all prefer to develop instead of test, one tends to fold unit, functional and integration testing into a single amorphous set of tests which don't strictly delineate as they really should; and instead of proper unit testing one tends to substitute automated parameter fuzz testing, which really isn't the same thing but ends up covering similar enough ground to make do.

There are two main techniques to categorising tests, and each has substantial pros and cons.

The first technique is to tag tests in your test suite with keyword tags, e.g. "ci-quick", "ci-slow", "soak-test" and so on; the unit test framework then lets you select at execution time which set of tags you want. This sounds great, but there are two big drawbacks. The first is that each test framework has its own way of doing tags, and these are invariably incompatible, so if you have a switchable Boost.Test/CATCH/Google Test generic test code setup you'll have a problem with the tagging. One nasty but portable workaround, which I use, is to include the tag in the test name and then select tests via a regex test selector string on the command line - this is why the test names exampled in the section above contain categorising slashes, so I can select tests by category via their name. The second drawback is that tests often end up internally calling some generic implementation with different parameters, so you have to spell out many sets of parameters in individual test cases when one's gut feeling is that those parameters really should be fuzz variables directly controlled by the test runner. Most test frameworks support passing variables into tests from the command line, but again this varies strongly across test frameworks in a way that is hard to write generic test code for, so you end up hard coding various sets of variables, one per test case.

The second technique is a hack, but a very effective one. One simply parameterises tests with environment variables, and then code calling the unit test program can configure special behaviour by setting environment variables before each test iteration. This technique is especially valuable for converting per-commit tests into soak tests because you simply configure an environment variable which means ITERATIONS to something much larger, and now the same per-commit tests are magically transformed into soak tests. Another major use case is to reduce iterations for when you are running under valgrind, or even just a very slow ARM dev board. The big drawback here is the self deception that just iterating per commit tests a lot more does not a proper soak test suite make, and one can fool oneself into believing your code is highly stable and reliable when it is really only highly stable and reliable at running per commit tests, which obviously it will always be because you run those exact same patterns per commit so those are always the use patterns which will behave the best. Boost.AFIO is 24 hour soak tested on its per-commit tests, and yet I have been more than once surprised at segfaults caused by someone simply doing operations in a different order than the tests did them :(

Regarding parameter fuzz testing, there are a number of tools available for C++, some better or more appropriate to your use case than others. The classic is of course http://ispras.linuxbase.org/index.php/API_Sanity_Autotest, though you'll need their ABI Compliance Checker working properly first which has become much easier for C++ 11 code since they recently added GCC 4.8 support (note that GCC 4.8 still has incomplete C++ 14 support). You should combine this with an executable built with, as a minimum, the address and undefined behaviour sanitisers. I haven't played with this tool yet with Boost.AFIO, though it is very high on my todo list as I have very little unit testing in AFIO (only functional and integration testing), and fuzz testing of my internal routines would be an excellent way of implementing comprehensive exception safety testing which I am also missing (and feel highly unmotivated to implement by hand).

11. PORTABILITY: Consider not doing compiler feature detection yourself

Something extremely noticeable about nearly all the reviewed C++ 11/14 libraries is that they manually do compiler feature detection in their config.hpp, usually via old fashioned compiler version checking. This tendency is not surprising: the number of potential C++ compilers your code needs to handle has essentially shrunk to three, unlike the dozen common compilers that implemented the 1998 C++ standard, and the chances are very high that three will remain the upper bound for the long term future. This makes compiler version checking a lot more tractable than, say, fifteen years ago.

However, C++ 1z is expected to provide a number of feature detection macros via the work of SG-10, and GCC and clang already partially support these, especially in very recent compiler releases. To fill in the gaps in older editions of GCC and clang, and indeed MSVC at all, you might consider making use of the header file at https://github.com/ned14/Boost.APIBind/blob/master/include/cpp_feature.h which provides the following SG-10 feature detection macros on all versions of GCC, clang and MSVC:

cpp_exceptions
Whether C++ exceptions are available
cpp_rtti
Whether C++ RTTI is available
cpp_alias_templates
cpp_alignas
cpp_decltype
cpp_default_function_template_args
cpp_defaulted_functions
cpp_delegated_constructors
cpp_deleted_functions
cpp_explicit_conversions
cpp_generalized_initializers
cpp_implicit_moves
cpp_inheriting_constructors
cpp_inline_namespaces
cpp_lambdas
cpp_local_type_template_args
cpp_noexcept
cpp_nonstatic_member_init
cpp_nullptr
cpp_override_control
cpp_reference_qualified_functions
cpp_range_for
cpp_raw_strings
cpp_rvalue_references
cpp_static_assert
cpp_thread_local
cpp_auto_type
cpp_strong_enums
cpp_trailing_return
cpp_unicode_literals
cpp_unrestricted_unions
cpp_user_defined_literals
cpp_variadic_templates
cpp_contextual_conversions
cpp_decltype_auto
cpp_aggregate_nsdmi
cpp_digit_separators
cpp_init_captures
cpp_generic_lambdas
cpp_relaxed_constexpr
cpp_return_type_deduction
cpp_runtime_arrays
cpp_variable_templates

The advantage of using these SG-10 macros in C++ 11/14 code is threefold:

  1. It should be future proof.
  2. It's a lot nicer than testing compiler versions.
  3. It scales better if a fourth C++ compiler suddenly turns up.

Why use the https://github.com/ned14/Boost.APIBind/blob/master/include/cpp_feature.h header file instead of doing it by hand?

  1. Complete support for all versions of GCC, clang and MSVC.
  2. Updates in compiler support will get reflected into cpp_feature.h for you.
  3. You benefit from any extra compilers added automatically.
  4. If you're using Boost.APIBind you automatically get cpp_feature.h included for you as soon as you include any APIBind header file.

Problems with cpp_feature.h:

  1. No support for detecting STL library feature availability. One can do this somewhat with GCC as it always pairs to a libstdc++ version, and of course one can do this for MSVC. However clang pairs to whatever is the latest STL on the system, plus GCC combined with libc++ is becoming increasingly common on Linux. In short you are on your own for STL library feature detection as I am unaware of any easy way to abstract this without the SG-10 library feature detection facilities built into the compiler.

Incidentally Boost.APIBind wraps these SG-10 feature macros into Boost.Config compatible macros in https://github.com/ned14/Boost.APIBind/blob/master/include/boost/config.hpp which would be included, as with Boost, using "boost/config.hpp". You can therefore, if you really want, use the Boost feature detection macros instead, even without Boost being present.

12. CONVENIENCE: Consider having Travis send your unit test code coverage results to Coveralls.io

There is quite a neat web service called coveralls.io, free for open source projects, which graphically displays unit test line coverage in a pretty colour coded source code browser UI. You also get a badge which shows what percentage of your code is covered. It might sound like a bit of a gimmick, but it's very handy for quickly visualising what you haven't covered when you thought you had. Also, if you hook coveralls into your github using travis, coveralls will comment on your pull requests and commits whether your test coverage has risen or fallen, and that can be more than useful when you send in a commit and an unexpected catastrophic fall in coverage occurs, as that probably means you just committed buggy code.

Anyway, firstly take a look at these libraries which use coveralls.io and decide if you like what you see:

Assuming you are now convinced, you first obviously need travis working. You can use coveralls without travis, but it's a one click enable with travis and github, so we'll assume you've done that. Your next problem will be getting travis to calculate line coverage for you, and to send the results to coveralls.

There are two approaches to this, and we'll start with the official one. Firstly you'll need a coveralls API key securely encoded into travis, see this page for how. Next have a look at https://github.com/krzysztof-jusiak/di/blob/cpp14/.travis.yml, with the key line being:

after_success:
  - if [ "${TRAVIS_BRANCH}" == "cpp14" ] && [ "${VARIANT}" == "coverage" ]; then (sudo pip install requests[security] cpp-coveralls && coveralls -r . -b test/ --gcov /usr/bin/${GCOV} --repo-token c3V44Hj0ZTKzz4kaa3gIlVjInFiyNRZ4f); fi

This makes use of the coveralls c++ tool at https://github.com/eddyxu/cpp-coveralls to do the analysis, and you'll also need to adjust your Jamfile as per https://github.com/krzysztof-jusiak/di/blob/cpp14/test/Jamfile.v2 with some variant addition like:

extend-feature variant : coverage ;
compose <variant>coverage :
    <cxxflags>"-fprofile-arcs -ftest-coverage" <linkflags>"-fprofile-arcs"
    <optimization>off
;

... which gets the gcov files to be output when the unit tests are executed.

That's the official way, and you should try that first. However, I personally couldn't get the above working, though admittedly when I implemented coveralls support a good two years ago I spent a large chunk of the time fighting the tooling, so I eventually gave up and wrote my own coveralls coverage calculator, partially borrowed from others. You can see mine at https://github.com/BoostGSoC13/boost.afio/blob/master/.travis.yml where you will note that I inject the -fprofile-arcs etc. arguments into b2 via its cxxflags from the outside. I then invoke a shell script at https://github.com/BoostGSoC13/boost.afio/blob/master/test/update_coveralls.sh:

#!/bin/bash
# Adapted from https://github.com/purpleKarrot/Karrot/blob/develop/test/coveralls.in
# which itself was adapted from https://github.com/berenm/cmake-extra/blob/master/coveralls-upload.in

if [ 0 -eq $(find -iname "*.gcda" | wc -l) ]
then
  exit 0
fi

gcov-4.8 --source-prefix $1 --preserve-paths --relative-only $(find -iname "*.gcda") 1>/dev/null || exit 0

cat >coverage.json <<EOF
{
  "service_name": "travis-ci",
  "service_job_id": "${TRAVIS_JOB_ID}",
  "run_at": "$(date --iso-8601=s)",
  "source_files": [
EOF

for file in $(find * -iname '*.gcov' -print | egrep '.*' | egrep -v 'valgrind|SpookyV2|bindlib|test')
do
  FILEPATH=$(echo ${file} | sed -re 's%#%\/%g; s%.gcov$%%')
  echo Reporting coverage for $FILEPATH ...
  cat >>coverage.json <<EOF
    {
      "name": "$FILEPATH",
      "source": $(cat $FILEPATH | python test/json_encode.py),
      "coverage": [$(tail -n +3 ${file} | cut -d ':' -f 1 | sed -re 's%^ +%%g; s%-%null%g; s%^[#=]+$%0%;' | tr $'\n' ',' | sed -re 's%,$%%')]
    },
EOF
done

#cat coverage.json
mv coverage.json coverage.json.tmp
cat >coverage.json <(head -n -1 coverage.json.tmp) <(echo -e "    }\n  ]\n}")
rm *.gcov coverage.json.tmp

#head coverage.json
#echo
curl -F json_file=@coverage.json https://coveralls.io/api/v1/jobs
#head coverage.json

This manually invokes gcov to convert the gcda files into a unified coverage dataset. I then use egrep to include everything and egrep -v to exclude anything matching the pattern, which is all the stuff not in the actual AFIO library. You'll note I build a JSON fragment as I go into the coverage.json temporary file, and the coverage is generated by chopping up the per line information into a very long string matching the coveralls JSON specification as per its API docs. Do note the separate bit of python called to convert the C++ source code into encoded JSON text (https://github.com/BoostGSoC13/boost.afio/blob/master/test/json_encode.py); I had some problems with UTF-8 in my source code, and forcing them through an ISO-8859 JSON string encode made coveralls happy. I then push the JSON to coveralls using curl. All in all a very blunt instrument, and essentially doing exactly the same thing as the official C++ coveralls tool now does, but you may find the manual method useful if the official tool proves too inflexible for your needs.

13. CONVENIENCE: Consider creating a status dashboard for your library with everything you need to know shown in one place

I like all-in-one-place software status dashboards where with a quick glance one can tell if there is a problem or not. I feel it makes it far more likely that I will spot a problem quickly if it is somewhere I regularly visit, and for that reason I like to mount my status dashboard at the front of my library docs and on my project's github Readme:

Implementing these is ridiculously easy: it's a table in standard HTML which github markdown conveniently will render as-is for me, and you can see its source markdown/HTML at https://raw.githubusercontent.com/BoostGSoC13/boost.afio/master/Readme.md. The structure is very simple: columns for OS, Compiler, STL, CPU, Build status and Test status, with three badges in each status row, one each for header only builds, static library builds, and shared DLL builds.

Keen eyes may note that the latter majority of that HTML looks automatically generated, and you would be right. The python script at https://github.com/BoostGSoC13/boost.afio/blob/master/scripts/GenJenkinsMatrixDashboard.py has a matrix of test targets configured on my Jenkins CI at https://ci.nedprod.com/ and it churns out HTML matching those. An alternative approach is https://github.com/BoostGSoC13/boost.afio/blob/master/scripts/JenkinsMatrixToDashboard.py which will parse a Jenkins CI test grid from a Matrix Build configuration into a collapsed space HTML table which fits nicely onto github. If you also want your HTML/markdown dashboard to appear in your BoostBook documentation, the script at https://github.com/BoostGSoC13/boost.afio/blob/master/scripts/readme_to_html.sh with the XSLT at https://github.com/BoostGSoC13/boost.afio/blob/master/scripts/xhtml_to_docbook.xsl should do a fine job.

All of the above dashboarding is fairly Jenkins centric, so what if you just have Travis + Appveyor? I think Boost.DI has it right by encoding a small but complete status dashboard into its BoostBook docs and github, so examine:

As a purely personal thing, I'd prefer the line of status badges before the table of contents, such that I am much more likely to see it when jumping in and to notice that something is red when it shouldn't be. But each library author will have their own preference.

Finally, I think that displaying status summaries via badges like this is another highly visible universal mark of software quality. It shows that the library author cares enough to publicly show the current state of their library. Future tooling by Boost which dashboards Boost libraries and/or ranks libraries by a quality score will almost certainly find the application specific ids for Travis, Appveyor, Coveralls etc by searching any Readme.md in the github repo for status badges, so by including status badges in your github Readme.md you can guarantee that such Boost library ranking scripts will work out of the box with no additional effort by you in the future.

14. DESIGN: Consider making (more) use of ADL C++ namespace composure as a design pattern

Most C++ programmers are aware of C++ template policy based design. This example is taken from https://en.wikipedia.org/wiki/Policy-based_design:

#include <iostream>
#include <string>
 
template <typename OutputPolicy, typename LanguagePolicy>
class HelloWorld : private OutputPolicy, private LanguagePolicy
{
    using OutputPolicy::print;
    using LanguagePolicy::message;
 
public:
    // Behaviour method
    void run() const
    {
        // Two policy methods
        print(message());
    }
};
 
class OutputPolicyWriteToCout
{
protected:
    template<typename MessageType>
    void print(MessageType const &message) const
    {
        std::cout << message << std::endl;
    }
};
 
class LanguagePolicyEnglish
{
protected:
    std::string message() const
    {
        return "Hello, World!";
    }
};
 
class LanguagePolicyGerman
{
protected:
    std::string message() const
    {
        return "Hallo Welt!";
    }
};
 
int main()
{
    /* Example 1 */
    typedef HelloWorld<OutputPolicyWriteToCout, LanguagePolicyEnglish> HelloWorldEnglish;
 
    HelloWorldEnglish hello_world;
    hello_world.run(); // prints "Hello, World!"
 
    /* Example 2 
     * Does the same, but uses another language policy */
    typedef HelloWorld<OutputPolicyWriteToCout, LanguagePolicyGerman> HelloWorldGerman;
 
    HelloWorldGerman hello_world2;
    hello_world2.run(); // prints "Hallo Welt!"
}

This works very well when (a) your policy implementations fit nicely into template types and (b) the number of policy taking template types is reasonably low (otherwise you'll be doing a lot of typing, as any change to the policy design requires modifying every single instantiation of the policy taking template types). Another problem with policy based design is that it generates a lot of template instantiations, which is bad because instantiation is slow (type instantiation is typically linear in the number of types already instantiated and quadratic in the number of partial specialisations affecting a type, though some compilers are quadratic for both).

Consider instead doing an ADL based namespace composure design pattern which is just a different way of doing policy based design. It can be highly effective in those niches where the traditional policy taking template approach falls down. Here is the same program above written using ADL namespace composure:

#include <iostream>
#include <string>

template<typename MessageType>
void print(MessageType const &message)
{
  std::cout << message << std::endl;
}
namespace HelloWorld
{
  template<class T> void run(T v)
  {
    print(message(v));  // Cannot instantiate message() nor print() until T is known
  }
}
namespace LanguagePolicyEnglish
{
  struct tag {};
  template<class T> std::string message(T)
  {
    return "Hello, World!";
  }
}
namespace LanguagePolicyGerman
{
  struct tag {};
  template<class T> std::string message(T)
  {
    return "Hallo Welt!";
  }
}
namespace LanguagePolicyDefault
{
  struct tag {};
  using LanguagePolicyGerman::message;
}
int main()
{
  /* Example 1 */
  {
    using namespace LanguagePolicyEnglish;
    using namespace HelloWorld;
    run(tag()); // prints "Hello, World!"
    // This works because HelloWorld::run()'s message() resolves inside these
    // braces to LanguagePolicyEnglish::message() to the same namespace as
    // struct tag thanks to argument dependent lookup
  }

  /* Example 2
  * Does the same, but uses another language policy */
  {
    using namespace LanguagePolicyGerman;
    using namespace HelloWorld;
    run(tag()); // prints "Hallo Welt!"
    // Whereas HelloWorld::run()'s message() now resolves inside these
    // braces to LanguagePolicyGerman::message()
  }

  /* Example 3 */
  {
    using namespace LanguagePolicyDefault;
    using namespace HelloWorld;
    run(tag()); // prints "Hallo Welt!"
    // Tries to find message() inside namespace LanguagePolicyDefault,
    // which finds message aliased to LanguagePolicyGerman::message()
  }
}

The first example instantiates five types, so let's say it adds a cost of five. The second example instantiates merely the three empty tag types in each namespace, and otherwise only free functions which don't participate in global overload resolution unless ADL brings them into consideration. This ought to add much less load to the compiler than the traditional design. This second example may also require less refactoring in the face of changes than the traditional form.

The above pattern is in fact entirely C++ 03 code and uses no C++ 11. However, template aliasing in C++ 11 makes the above pattern much more flexible. Have a look at https://github.com/ptal/expected/blob/master/include/boost/functional/monads/rebindable.hpp for examples of this ADL invoked namespace composure design pattern.

15. BUILD: Consider defaulting to header only, but actively manage facilities for reducing build times

Making your library header only is incredibly convenient for your users - they simply drop in a copy of your project and get to work, no build system worries. Hence most Boost libraries and many C++ libraries are header only capable, often header only default. A minority are even header only only.

One thing noticed in the library review is just how many of the new C++ 11/14 libraries are header only only, and whilst convenient, I think library authors can and moreover should do better. For some statistics to put this in perspective, proposed Boost.AFIO v1.3 provides a range of build configurations for its unit tests:

  1. Header only
  2. Precompiled header only (default)
  3. Precompiled not header only (library implementation put into a shared library)
  4. Precompiled header only with link time optimisation
Build flags | Microsoft Windows 8.1 x64 with Visual Studio 2013 | Ubuntu 14.04 LTS Linux x64 with GCC 4.9 and gold linker | Ubuntu 14.04 LTS Linux x64 with clang 3.4 and gold linker
Debug header only | 7m17s | 12m0s | 5m45s
Debug precompiled header only | 2m10s | 10m26s | 5m46s
Debug precompiled not header only | 0m55s | 3m53s | asio failure
Release precompiled header only | 2m58s | 9m57s | 8m10s
Release precompiled not header only | 1m10s | 3m22s | asio failure
Release precompiled header only link time optimisation | 7m30s | 13m0s | 8m11s

These are for a single core 3.9GHz i7-3770K computer. I think the results speak for themselves; note that AFIO is only 8k lines with not much metaprogramming.

The approaches for improving build times for your library users are generally as follows, and in order of effect:

1. Offer a non-header only build configuration

Non-header build configurations can offer build time improvements of x4 or more, so these are always the best bang for your buck. Many Boost libraries offer both header only and non-header only build configurations by using something like this in their config.hpp:

// If we are compiling not header only
#if (defined(BOOST_AFIO_DYN_LINK) || defined(BOOST_ALL_DYN_LINK)) && !defined(BOOST_AFIO_STATIC_LINK)

# if defined(BOOST_AFIO_SOURCE)                // If we are compiling the library binary
#  undef BOOST_AFIO_HEADERS_ONLY
#  define BOOST_AFIO_DECL BOOST_SYMBOL_EXPORT    // Mark public symbols as exported from the library binary
#  define BOOST_AFIO_BUILD_DLL                   // Tell code we are building a DLL or shared object
# else
#  define BOOST_AFIO_DECL BOOST_SYMBOL_IMPORT    // If not compiling the library binary, mark public symbols are imported from the library binary
# endif
#else                                          // If we are compiling header only
# define BOOST_AFIO_DECL                         // Do no markup of public symbols
#endif // building a shared library


// Configure Boost auto link to get the compiler to auto link your library binary
#if !defined(BOOST_AFIO_SOURCE) && !defined(BOOST_ALL_NO_LIB) && \
    !defined(BOOST_AFIO_NO_LIB) && !AFIO_STANDALONE && !BOOST_AFIO_HEADERS_ONLY

#define BOOST_LIB_NAME boost_afio

// tell the auto-link code to select a dll when required:
#if defined(BOOST_ALL_DYN_LINK) || defined(BOOST_AFIO_DYN_LINK)
#define BOOST_DYN_LINK
#endif

#include <boost/config/auto_link.hpp>

#endif  // auto-linking disabled


#if BOOST_AFIO_HEADERS_ONLY == 1                              // If AFIO is headers only
# define BOOST_AFIO_HEADERS_ONLY_FUNC_SPEC inline               // Mark all functions as inline
# define BOOST_AFIO_HEADERS_ONLY_MEMFUNC_SPEC inline            // Mark all member functions as inline
# define BOOST_AFIO_HEADERS_ONLY_VIRTUAL_SPEC inline virtual    // Mark all virtual member functions as inline virtual
// GCC gets upset if inline virtual functions aren't defined
# ifdef BOOST_GCC
#  define BOOST_AFIO_HEADERS_ONLY_VIRTUAL_UNDEFINED_SPEC { BOOST_AFIO_THROW_FATAL(std::runtime_error("Attempt to call pure virtual member function")); abort(); }
# else
#  define BOOST_AFIO_HEADERS_ONLY_VIRTUAL_UNDEFINED_SPEC =0;
# endif
#else                                                         // If AFIO is not headers only
# define BOOST_AFIO_HEADERS_ONLY_FUNC_SPEC extern BOOST_AFIO_DECL  // Mark all functions as extern dllimport/dllexport
# define BOOST_AFIO_HEADERS_ONLY_MEMFUNC_SPEC                      // Mark all member functions with nothing
# define BOOST_AFIO_HEADERS_ONLY_VIRTUAL_SPEC virtual              // Mark all virtual member functions as virtual (no inline)
# define BOOST_AFIO_HEADERS_ONLY_VIRTUAL_UNDEFINED_SPEC =0;        // Mark all pure virtual member functions with nothing special
#endif

This looks a bit complicated, but isn't really. Generally you will mark up those classes and structs you implement in a .ipp file (this being the file implementing the APIs declared in the header which is included by the header if building header only, else is included by a .cpp file if not building header only) with BOOST_AFIO_DECL, functions with BOOST_AFIO_HEADERS_ONLY_FUNC_SPEC, all out-of-class member functions (i.e. those not implemented inside the class or struct declaration) with BOOST_AFIO_HEADERS_ONLY_MEMFUNC_SPEC, all virtual member functions with BOOST_AFIO_HEADERS_ONLY_VIRTUAL_SPEC and append to all unimplemented virtual member functions BOOST_AFIO_HEADERS_ONLY_VIRTUAL_UNDEFINED_SPEC. This inserts the correct markup to generate both optimal header only and optimal non header only outcomes.

2. Precompiled headers

You probably noticed in the table above that precompiled headers gain nothing on clang, +13% on GCC and +70% on MSVC. Those percentages vary according to source code, but I have found them fairly similar across my own projects - on MSVC, precompiled headers are a must have, and that's on a compiler already much faster than any of the others.

Turning on precompiled headers in Boost.Build is easy:

cpp-pch afio_pch : afio_pch.hpp : <include>. ;

And now simply link your program to afio_pch to enable. If you're on cmake, you definitely should check out https://github.com/sakra/cotire.

3. extern template your templates with their most common template parameters in the headers, and force instantiate those same common instances into a separate static library

The following demonstrates the technique:

// Header.hpp
template<class T> struct Foo
{
  T v;
  inline Foo(T _v);
};
// Definition must be made outside struct Foo for extern template to have effect
template<class T> inline Foo<T>::Foo(T _v) : v(_v) { }

// Inhibit automatic instantiation of struct Foo for common types
extern template struct Foo<int>;
extern template struct Foo<double>;


// Source.cpp
#include "Header.hpp"
#include <stdio.h>

// Force instantiation of struct Foo with common types. Usually compiled into
// a separate static library as bundling it into the main shared library can
// introduce symbol visibility problems, so it's easier and safer to use a static
// library
template struct Foo<int>;
template struct Foo<double>;

int main(void)
{
  Foo<long> a(5);   // Works!
  Foo<int> b(5);    // Symbol not found if not force instantiated above
  Foo<double> c(5); // Symbol not found if not force instantiated above
  printf("a=%ld, b=%d, c=%lf\n", a.v, b.v, c.v);
  return 0;
}

The idea behind this is to tell the compiler to not instantiate common instantiations of your template types on demand for every single compiland as you will explicitly instantiate them exactly once elsewhere. This can give quite a bump to build times for template heavy libraries.

4. Use C++ Modules

If you are on a very recent clang, or on an MSVC made sometime after 2016, there will probably be some implementation of C++ Modules available. C++ Modules as presently proposed for C++ 1z is poorly named in that after multiple rounds of simplification it provides little modularity at all, and is now really the beginnings of an AST database assisted build system which can dramatically improve build times in a similar way to precompiled headers. I therefore really wish that "C++ Modules" were called "C++ Build Acceleration" instead, but unfortunately no one proposing this feature to WG21 seems to agree. Anyway, for clang follow the instructions at http://clang.llvm.org/docs/Modules.html whereby you write up a module map file which delineates your header files into AST database objects, and your compiler will precompile your headers into an AST database. Unfortunately it presently seems you must manually annotate what parts are exposed and what parts are hidden in the module map rather than being able to reuse any dllimport/dllexport infrastructure already present. Your code, when it includes a header, will now have the compiler automatically go fetch the precompiled version from the database and use all exported types.

Microsoft unfortunately does not presently intend to implement Modules in the same way as clang, and requires you to do a lot more work, including source code modification. Follow the instructions at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4465.pdf instead.

If you think of "C++ Modules" as really a somewhat more flexible and reusable implementation of existing precompiled headers which lets the compiler figure out what precompiled parts to use without you manually telling it as you must right now, you can estimate whether it is worth the effort to add support for Modules to your library as the speedup is likely similar to a precompiled header. The clang method is non-intrusive to your source code and doesn't get in the way of older compilers, but adds significant hassle in keeping the module map and the source code in sync. The Microsoft method seems to require a fair bit of #ifdef, but the module specification must be part of the source code which may aid keeping it in sync.

16. COUPLING: Consider allowing your library users to dependency inject your dependencies on other libraries

As mentioned earlier, the libraries reviewed overwhelmingly chose to use STL11 over any equivalent Boost libraries, so hardcoded std::thread instead of boost::thread, hardcoded std::shared_ptr over boost::shared_ptr and so on. This makes sense right now as STL11 and Boost are still fairly close in functionality, however in the medium term there will be significant divergence between Boost and the STL as Boost "gets ahead" of the STL in terms of features. Indeed, one may find oneself needing to "swap in" Boost to test one's code with some future STL pattern shortly to become standardised.

Let me put this another way: imagine a near future where Boost.Thread has been rewritten atop of the STL11 enhancing the STL11 threading facilities very substantially with lots of cool features which may not enter the standard until the 2020s. If your library is hardcoded to only use the STL, you may lose out on substantial performance or feature improvements. Your users may clamour to be able to use Boost.Thread with your library. You will then have to add an additional code path for Boost.Thread which replicates the STL11 threading path, probably selectable using a macro and the alternative code paths swapped out with #ifdef. But you still may not be done - what if Boost.Chrono also adds significant new features? Or Boost.Regex? Or any of the Boost libraries now standardised into the STL? Before you know it your config.hpp may look like the one from ASIO which has already gone all the way in letting users choose their particular ASIO configuration, and let me quote a mere small section of it to give an idea of what is involved:

...
// Standard library support for chrono. Some standard libraries (such as the
// libstdc++ shipped with gcc 4.6) provide monotonic_clock as per early C++0x
// drafts, rather than the eventually standardised name of steady_clock.
#if !defined(ASIO_HAS_STD_CHRONO)
# if !defined(ASIO_DISABLE_STD_CHRONO)
#  if defined(__clang__)
#   if defined(ASIO_HAS_CLANG_LIBCXX)
#    define ASIO_HAS_STD_CHRONO 1
#   elif (__cplusplus >= 201103)
#    if __has_include(<chrono>)
#     define ASIO_HAS_STD_CHRONO 1
#    endif // __has_include(<chrono>)
#   endif // (__cplusplus >= 201103)
#  endif // defined(__clang__)
#  if defined(__GNUC__)
#   if ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 6)) || (__GNUC__ > 4)
#    if defined(__GXX_EXPERIMENTAL_CXX0X__)
#     define ASIO_HAS_STD_CHRONO 1
#     if ((__GNUC__ == 4) && (__GNUC_MINOR__ == 6))
#      define ASIO_HAS_STD_CHRONO_MONOTONIC_CLOCK 1
#     endif // ((__GNUC__ == 4) && (__GNUC_MINOR__ == 6))
#    endif // defined(__GXX_EXPERIMENTAL_CXX0X__)
#   endif // ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 6)) || (__GNUC__ > 4)
#  endif // defined(__GNUC__)
#  if defined(ASIO_MSVC)
#   if (_MSC_VER >= 1700)
#    define ASIO_HAS_STD_CHRONO 1
#   endif // (_MSC_VER >= 1700)
#  endif // defined(ASIO_MSVC)
# endif // !defined(ASIO_DISABLE_STD_CHRONO)
#endif // !defined(ASIO_HAS_STD_CHRONO)

// Boost support for chrono.
#if !defined(ASIO_HAS_BOOST_CHRONO)
# if !defined(ASIO_DISABLE_BOOST_CHRONO)
#  if (BOOST_VERSION >= 104700)
#   define ASIO_HAS_BOOST_CHRONO 1
#  endif // (BOOST_VERSION >= 104700)
# endif // !defined(ASIO_DISABLE_BOOST_CHRONO)
#endif // !defined(ASIO_HAS_BOOST_CHRONO)
...

ASIO currently has over 1000 lines of macro logic in its config.hpp with at least twelve independent binary options, so that is 2^12 = 4096 different configurations of code paths (note some combinations may not be allowed in the source code, I didn't check). Are all of these tested equally? I actually don't know, but it seems a huge task requiring many days of testing if they are. However there is a far worse problem here: what happens if library A configures ASIO one way and library B configures ASIO a different way, and then a user combines both libraries A and B into the same process?

The answer is that such a combination violates ODR and is therefore undefined behaviour, i.e. it may well crash. This makes the ability to so finely configure ASIO much less useful than it could be.

Let me therefore propose something better: allow library users to dependency inject from the outside the configuration of whether to use a STL11 dependency or its Boost equivalent. If one makes sure to encapsulate the dependency injection into a unique inline namespace, that prevents violation of ODR and therefore collision of the incompatibly configured library dependencies. If the dependent library takes care to coexist with alternative configurations and versions of itself inside the same process, this:

  • Forces you to formalise your dependencies (this has a major beneficial effect on design, trust me that your code enormously improves when you are forced to think correctly about this).
  • Offers maximum convenience and utility to your library's users.
  • Lets you better test your code against multiple (future) STL implementations.
  • Looser coupling.
  • Much easier upgrades later on (i.e. less maintenance).

What it won't do:

  • Prevent API and version fragmentation.
  • Deal with balkanisation (i.e. two configurations of your library are islands, and cannot interoperate).

In short whether the pros outweigh the cons comes down to your library's use cases, you as a maintainer, and so on. Indeed you might make use of this technique internally for your own needs, but not expose the facility to choose to your library users.

So how does one implement STL dependency injection in C++ 11/14? One entirely valid approach is the ASIO one of a large config.hpp file full of macro logic which switches between Boost and the STL11 for the following header files which were added in C++ 11:

STL11 header | STL11 namespace | Boost header | Boost namespace
array | std | array.hpp | boost
atomic | std | atomic.hpp | boost, boost::atomics
chrono | std::chrono | chrono.hpp | boost::chrono
condition_variable | std | thread.hpp | boost
functional | std | bind.hpp | boost
future | std | thread.hpp | boost
mutex | std | thread.hpp | boost
random | std | random.hpp | boost::random
ratio | std | ratio.hpp | boost
regex | std | regex.hpp | boost
system_error | std | system/system_error.hpp | boost::system
thread | std | thread.hpp | boost
tuple | std | tuple/tuple.hpp | boost
type_traits | std | type_traits.hpp | boost
typeindex | std | no equivalent |

At the time of writing, a very large proportion of STL11 APIs are perfectly substitutable with Boost i.e. they have identical template arguments, parameters and type signatures, so all you need to do is to alias either namespace std or namespace boost::? into your own library namespace as follows:

// In config.hpp
namespace mylib
{
  inline namespace MACRO_UNIQUE_ABI_ID {
#ifdef MYLIB_USING_BOOST_RATIO  // The external library user sets this
    namespace ratio = ::boost;
#else
    namespace ratio = ::std;
#endif
  }
}

// To use inside namespace mylib::MACRO_UNIQUE_ABI_ID, do:
ratio::ratio<2, 1> ...

As much as the above looks straightforward, you will find it quickly multiplies into a lot of work just as with ASIO's config.hpp. You will also probably need to do a lot of code refactoring such that every use of ratio is prefixed with a ratio namespace alias, every use of regex is prefixed with a regex namespace alias and so on. So is there an easier way?

Luckily there is, and it is called APIBind. APIBind takes away a lot of the grunt work in the above, specifically:

  • APIBind provides bind files for the above C++ 11 header files which let you bind just the relevant part of namespace boost or namespace std into your namespace mylib. In other words, in your namespace mylib you simply go ahead and use ratio<N, D> with no namespace prefix because ratio<N, D> has been bound directly into your mylib namespace for you. APIBind's bind files essentially work as follows:
// In header <ratio> the API being bound
namespace std { template <intmax_t N, intmax_t D = 1> class ratio; }

// Ask APIBind to bind ratio into namespace mylib
#define BOOST_STL11_RATIO_MAP_NAMESPACE_BEGIN namespace mylib {
#define BOOST_STL11_RATIO_MAP_NAMESPACE_END }
#include BOOST_APIBIND_INCLUDE_STL11(bindlib, std, ratio)  // If you replace std with boost, you bind boost::ratio<N, D> instead.

// Effect on namespace mylib
namespace mylib
{
  template<intmax_t _0, intmax_t _1 = 1> using ratio = ::std::ratio<_0, _1>;
}

// You can now use mylib::ratio<N, D> without prefixing. This is usually a very easy find and replace in files operation.
  • APIBind provides generation of inline namespaces with an ABI and version specific mangling to ensure different dependency injection configurations do not collide:
// BOOST_AFIO_V1_STL11_IMPL, BOOST_AFIO_V1_FILESYSTEM_IMPL and BOOST_AFIO_V1_ASIO_IMPL all are set to either boost or std in your config.hpp

// Note the last bracketed item is marked inline. On compilers without inline namespace support this bracketed item is ignored.
#define BOOST_AFIO_V1 (boost), (afio), (BOOST_BINDLIB_NAMESPACE_VERSION(v1, BOOST_AFIO_V1_STL11_IMPL, BOOST_AFIO_V1_FILESYSTEM_IMPL, BOOST_AFIO_V1_ASIO_IMPL), inline)
#define BOOST_AFIO_V1_NAMESPACE       BOOST_BINDLIB_NAMESPACE      (BOOST_AFIO_V1)
#define BOOST_AFIO_V1_NAMESPACE_BEGIN BOOST_BINDLIB_NAMESPACE_BEGIN(BOOST_AFIO_V1)
#define BOOST_AFIO_V1_NAMESPACE_END   BOOST_BINDLIB_NAMESPACE_END  (BOOST_AFIO_V1)

// From now on, instead of manually writing namespace boost { namespace afio { and boost::afio, instead do:
BOOST_AFIO_V1_NAMESPACE_BEGIN
  struct foo;
BOOST_AFIO_V1_NAMESPACE_END

// Reference struct foo from the global namespace
BOOST_AFIO_V1_NAMESPACE::foo;

// Alias hard version dependency into mylib
namespace mylib
{
  namespace afio = BOOST_AFIO_V1_NAMESPACE;
}
  • APIBind also provides boilerplate for allowing inline reconfiguration of a library during the same translation unit such that the following "just works":
// test_all_multiabi.cpp in the AFIO unit tests

// A copy of AFIO + unit tests completely standalone apart from Boost.Filesystem
#define BOOST_AFIO_USE_BOOST_THREAD 0
#define BOOST_AFIO_USE_BOOST_FILESYSTEM 1
#define ASIO_STANDALONE 1
#include "test_all.cpp"
#undef BOOST_AFIO_USE_BOOST_THREAD
#undef BOOST_AFIO_USE_BOOST_FILESYSTEM
#undef ASIO_STANDALONE

// A copy of AFIO + unit tests using Boost.Thread, Boost.Filesystem and Boost.ASIO
#define BOOST_AFIO_USE_BOOST_THREAD 1
#define BOOST_AFIO_USE_BOOST_FILESYSTEM 1
// ASIO_STANDALONE undefined
#include "test_all.cpp"
#undef BOOST_AFIO_USE_BOOST_THREAD
#undef BOOST_AFIO_USE_BOOST_FILESYSTEM

In other words, you can reset the configuration macros and reinclude afio.hpp to generate a new configuration of AFIO as many times as you like within the same translation unit. This allows header only library A to require a different configuration of AFIO than header only library B, and it all "just works". As APIBind is currently lacking documentation, I'd suggest you review the C++ Now 2015 slides on the topic until proper documentation turns up. The procedure is not hard, and you can examine https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/config.hpp for a working example of it in action. Do watch out for the comments marking the stanzas which are automatically generated by scripting tools in APIBind; writing those by hand would be tedious.

17. FUTURE PROOFING: Consider being C++ resumable function ready

This is going to be one of the hardest topics to write about given the highly uncertain present plans for resumable function/coroutine support in C++ 1z. I base most of the following section on N4402 https://isocpp.org/files/papers/N4402.pdf and conversations with Gor Nishanov at C++ Now 2015, plus on conversations with Oliver Kowalke regarding proposed Boost.Fiber. Gor also kindly supplied me with a pre-draft N4499.

Firstly, should you care about being C++ resumable function ready? If your library ever:

  • Uses threads.
  • Does i/o.
  • Uses callbacks, including std::function.

Then the answer is yes! The current C++ 1z resumable function proposal provides two verbs to work with resumable functions:

  1. await: potentially suspend the execution of the current function and return a future result now, resuming this function's execution when the callable passed to await finishes execution somewhere else or at some other time. Note that the name of this keyword may change in the future. Use of the await keyword always causes the invoking function to return some synchronisation object (often a future<T>) with a known coroutine_traits specialisation, where the default catch all specialisation for any R(T...) callable is std::future<R>.

This is probably a bit head-wrecking, so let's look at some code:

// This is some function which schedules some async operation whose result is returned via the future<int>
extern std::future<int> async_call();

// This is a function which calls async_call ten times, suspending itself while async_call runs.
std::future<int> accumulate()
{
  int total=0;
  for(size_t n=0; n<10; n++)
    total+=await async_call();  // Note the await keyword, this is where the function might suspend
  return total;
}

int main(void)
{
  std::cout << "Total is " << accumulate().get() << std::endl;
  return 0;
}

The await keyword is rather like C++ 11 range for loops in that it expands into well specified boilerplate. Let's look at the above code with some of that boilerplate inserted, remembering that C++ 1z futures now have continuations support via the .then(callable) member function, and bearing in mind that for brevity this is a simplified, not actual, expansion of the boilerplate (you can find the actual boilerplate expansion in the N-papers submitted to WG21 if you really want):

// Thanks to Gor Nishanov for checking this code for me!

// This is some function which schedules some async operation whose result is returned via the future<int>
extern std::future<int> async_call();

// This is a function which calls async_call ten times, suspending itself while async_call runs.
std::future<int> accumulate()
{
  // coroutine_traits<R, Ts...> is specialised with the call signature of this function i.e. std::future<int>()
  // promise_type defaults to a wrap of type R::promise_type, so in this case a wrap of std::promise<int>

  // Always dynamically allocate a context on entry to a resumable function (the frame allocator defaults to std::allocator<char>)
  auto &__frame_allocator=std::coroutine_traits<std::future<int>>::get_allocator();
  struct __context_type
  {
    std::coroutine_traits<std::future<int>>::promise_type __resume_promise;
    ... <args, stack, state> ...
  } *__context=__frame_allocator.allocate(sizeof(__context_type));

  // Construct the context frame
  new (__context) __context_type(<args, stack, state>);

  // Generate a future for the completion of this function
  auto __return_object(__context->__resume_promise.get_return_object());
  
  if(__context->__resume_promise.initial_suspend())
    return suspend_now();                          // Not expanded for brevity

  try
  {
    int total=0;
    for(size_t n=0; n<10; n++)
    {
      std::future<int> &&__temp=async_call();        // Bind to rvalue ref, no moving
      if(!__temp.await_ready())                      // defaults to __temp.is_ready()
      {
        // Internally generated by the compiler to resume me at the resume_me_detached goto label
        // after storing my state into __context
        unspecified_callable &&__resume_me(__frame_allocator, __resume_promise, resume_me_detached);

        // Have __temp, when signalled, resume my execution. defaults to __temp.then(__resume_me).
        // __resume_me will be executed immediately after __temp.set_value() by the thread calling set_value().
        __temp.await_suspend(__resume_me);

        return __return_object;                      // exits function with future, will resume and signal later
      }
never_resumed:                                       // This path only taken if function never resumed
      total+=__temp.await_resume();                  // defaults to __temp.get(), this is always ready and does not block
    }
    __context->__resume_promise.set_value(total);
  }
  catch(...)
  {
    __context->__resume_promise.set_exception(std::current_exception());
  }
  if(__context->__resume_promise.final_suspend())
    return suspend_now();                            // Not expanded for brevity, jumps to resume_me_addr path
  __context->~__context_type();
  __frame_allocator.deallocate(__context, sizeof(__context_type));
  return __return_object;
  


    // *** ALTERNATIVE DETACHED CODE PATH FOR WHEN THE FUNCTION IS RESUMED ***
    for(size_t n=0; n<10; n++)
    {
      std::future<int> &&__temp=async_call();        // Bind to rvalue ref, no moving
      if(!__temp.await_ready())                      // defaults to __temp.is_ready()
      {
        // Have __temp, when signalled, resume my execution. defaults to __temp.then(__resume_me).
        // __resume_me will be executed immediately after __temp.set_value() by the thread calling set_value().
        __temp.await_suspend(__resume_me);

        // Suspend myself using previously allocated __resume_me
        __resume_me.suspend();
      }
resume_me_detached:                                  // This path taken if function ever resumed
      total+=__temp.await_resume();                  // defaults to __temp.get(), this is always ready and does not block
    }
    __context->__resume_promise.set_value(total);
  }
  catch(...)
  {
    __context->__resume_promise.set_exception(std::current_exception());
  }
  if(__context->__resume_promise.final_suspend())
    return suspend_now();                            // Not expanded for brevity, jumps to resume_me_addr path
  __context->~__context_type();
  __frame_allocator.deallocate(__context, sizeof(__context_type));
  // No meaningful return from this function can occur
}

int main(void)
{
  std::cout << "Total is " << accumulate().get() << std::endl;
  return 0;
}

This boilerplate expanded version may still hurt the head, so here is essentially what happens:

  1. We dynamically allocate a stack frame on entry, and that dynamic memory allocation sticks around until the resumable function completes.
  2. We always construct a promise-future pair for the result (or whatever synchronisation object coroutine_traits says) on entry to the function.
  3. If async_call() always returns a signalled future, we add the result to total, signal the future and return the future.
  4. If async_call() ever returns an unsignaled future, we ask the unsignaled future to resume our detached path when it is signalled. We suspend ourselves, and return our future immediately. At some point our detached path will signal the future.

The await_ready(), await_suspend() and await_resume() functions are firstly looked up as member functions of the synchronisation type returned by the traits specialisation. If not present, they are then looked up as free functions within the same namespace as the synchronisation type with usual overload resolution.

  2. yield: repeatedly suspends the execution of the current function with the value specified to yield, resuming higher up the call tree with that value output. Note that the name of this keyword may change in the future. yield is implemented as yet more boilerplate sugar: a promise-future pair is repeatedly constructed and signalled with some output value until the generating function (i.e. the function calling yield) returns, and the sequence of those repeated constructions and signals is wrapped into an iterable such that the following just works:
generator<int> fib(int n)
{
  int a = 0;
  int b = 1;
  while (n-- > 0)
  {
    yield a;
    auto next = a + b;
    a = b;
    b = next;
  }
}

int main()
{
  for (auto v : fib(35))
    std::cout << v << std::endl;
}

This just works because generator<int> provides an iterator at generator<int>::iterator and a promise type at generator<int>::promise_type. The iterator, when dereferenced, causes a single iteration of fib() between the yield statements and the value yielded is output by the iterator. Note that for each iteration, a new promise and future pair is created, then destroyed and created, and so on until the generator returns.

All this is great if you are on Microsoft's compiler, which has an experimental implementation of these proposed resumable functions, but what about the rest of us before C++ 1z? Luckily Boost has a conditionally accepted library called Boost.Fiber, one of the C++ 11/14 libraries reviewed above, which with a few caveats provides good feature parity with the proposed C++ 1z coroutines at the cost of having to type out the boilerplate by hand. Boost.Fiber provides a mirror image of the STL threading library, so:

std::thread => fibers::fiber
std::this_thread => fibers::this_fiber
std::mutex => fibers::mutex
std::condition_variable => fibers::condition_variable
std::future<T> => fibers::future<T>

Rewriting the above example to use Boost.Fiber and Boost.Thread instead, remembering that Boost.Thread's futures already provide C++ 1z continuations via .then(callable):

// Thanks to Oliver Kowalke for checking this code for me!

// This is some function which schedules some async operation whose result is returned via the future<int>
extern boost::future<int> async_call();

// This is a function which calls async_call ten times, suspending itself while async_call runs.
boost::fibers::future<int> accumulate()
{
  boost::fibers::packaged_task<int()> t([]{
    int total=0;
    for(size_t n=0; n<10; n++)
    {
      boost::fibers::promise<int> p;
      boost::fibers::future<int> f(p.get_future());
      async_call().then([p=std::move(p)](boost::future<int> rf) mutable {
        if(rf.has_error())
          p.set_exception(rf.get_exception_ptr());
        else
          p.set_value(rf.get());
      });
      total+=f.get();
    }
    return total;
  });
  boost::fibers::future<int> f(t.get_future());
  boost::fibers::fiber(std::move(t)).detach();
  return f;
}

int main(void)
{
  std::cout << "Total is " << accumulate().get() << std::endl;
  return 0;
}

As you can see, there is an unfortunate amount of extra boilerplate to convert between Boost.Thread futures and Boost.Fiber futures, plus more boilerplate to make accumulate() into a resumable function -- essentially one must boilerplate out accumulate() as if it were a kernel thread, complete with nested lambdas. Still, the above is feature equivalent to C++ 1z coroutines, but you have it now rather than years from now. (For reference, if you want to write a generator which yields values to fibers in Boost.Fiber, simply write to some shared variable and notify a boost::fibers::condition_variable, followed by a boost::this_fiber::yield(); writing to a shared variable without locking is safe because fibers are scheduled cooperatively on the thread on which they were created.)

So after all that, you might be wondering what any of this has to do with:

  • Threads.
  • i/o.
  • Callbacks, including std::function.

We'll take the last first. Callbacks, in the form of an externally supplied C function pointer, are as old as the hills, and in C++ are best represented as a std::function object. As a rule, code which accepts externally supplied callbacks often has to impose various conditions on what the callback may do. A classic condition is that reentrancy is not permitted (i.e. you may not use the object from which you are being called back), because maybe a mutex is being held, and reentering the same object would therefore deadlock. As with signal handlers on POSIX, if you wish to actually do anything useful you are then forced to use the callback to schedule the real callback work to occur elsewhere. The classic pattern is scheduling the real work to a thread pool, such that by the time the thread pool executes the real work implementation, the routine calling the callbacks will have completed and released any locks.

However, scheduling work to thread pools is expensive. Firstly, there is a thread synchronisation which induces CPU cache coherency overheads, and secondly, if the thread pool gets there too soon it will block on the mutex, thus inducing a kernel sleep which costs several thousand CPU cycles. A much more efficient alternative, therefore, is to schedule the work to occur on the same thread after the thing doing the callback has exited and released any locks.

This more efficient alternative is illustrated by this code taken from proposed Boost.AFIO:

struct immediate_async_ops
{
  typedef std::shared_ptr<async_io_handle> rettype;
  typedef rettype retfuncttype();
  size_t reservation;
  std::vector<enqueued_task<retfuncttype>> toexecute;

  immediate_async_ops(size_t reserve) : reservation(reserve) { }
  // Enqueue a task to be executed when this is destructed
  void enqueue(enqueued_task<retfuncttype> task)
  {
    if(toexecute.empty())
      toexecute.reserve(reservation);
    toexecute.push_back(task);
  }
  ~immediate_async_ops()
  {
    for(auto &i: toexecute)
    {
      i();
    }
  }
private:
  immediate_async_ops(const immediate_async_ops &);
  immediate_async_ops &operator=(const immediate_async_ops &);
  immediate_async_ops(immediate_async_ops &&);
  immediate_async_ops &operator=(immediate_async_ops &&);
};

What this does is let you enqueue packaged tasks (here called enqueued tasks) into an immediate_async_ops accumulator. On destruction, it executes those stored tasks, setting their futures to the results of those tasks. What on earth might the use case be for this? AFIO needs to chain operations onto other operations: if an operation is still pending, one appends the continuation there, but if an operation has completed, the continuation needs to be executed immediately. Unfortunately, executing continuations immediately inside the core dispatch loop creates race conditions, so what AFIO does is create an immediate_async_ops instance at the very beginning of the call tree for any operation. Deep inside the engine, inside any mutexes or locks, it sends continuations which must be executed immediately to the immediate_async_ops instance. Once the operation is finished and the stack is unwinding, just before the operation API returns to user mode code, it destructs the immediate_async_ops instance and therefore dispatches any continuations scheduled there without any locks or mutexes in the way.

This is quite neat and very efficient, but it is also intrusive, and it requires all your internal APIs to pass around an immediate_async_ops & parameter, which is ugly. This callback-on-the-same-thread pattern is of course exactly what coroutines/fibers give us -- a way of scheduling code to run at some point not now but soon, on the same thread, at the next 'resumption point'. As with AFIO's immediate_async_ops, such a pattern can dramatically simplify a code engine implementation, and if you find your code expending much effort on error handling in a locking threaded environment, where the complexity of handling all the outcome paths is exploding, you should very strongly consider refactoring to a deferred continuation based design instead.

Finally, what does making your code resumable ready have to do with i/o or threads? If you are not familiar with WinRT, Microsoft's latest programming platform: under WinRT nothing can block, i.e. no synchronous APIs are available whatsoever. That of course renders most existing code bases impossible to port to WinRT, at least initially, but one interesting way to work around "nothing can block" is to write emulations of synchronous functions which dispatch into a coroutine scheduler instead. Your legacy code base is now 100% async, yet is written as if it had never heard of async in its life. In other words, you write code which uses synchronous blocking functions without thinking or worrying about async, but the runtime executes it as the right ordering of asynchronous operations automagically.

What does WinRT have to do with C++? Well, C++ 1z should gain both coroutines and, perhaps not long thereafter, the Networking TS, which is really ASIO, and ASIO already supports coroutines via Boost.Coroutine and Boost.Fiber. So if you are doing socket i/o you can already do "nothing can block" in C++ 1z. I'm hoping that AFIO will contribute asynchronous filesystem and file i/o, and it is expected in the medium term that Boost.Thread will become resumable function friendly (Microsoft have a patch for Boost.Thread ready to go), so one could expect in the not too distant future that if you write exclusively using Boost facilities then your synchronously written C++ program could actually be entirely asynchronous in execution, just as on WinRT. That could potentially be huge, as C++ suddenly becomes capable of Erlang-style tasklet behaviour and design patterns, which is very exciting.

Obviously everything I have just said should be taken with a pinch of salt as it all depends on WG21 decisions not yet made and a lot of Boost code not yet written. But I think this 100% asynchronous vision of the future is worth considering as you write your C++ 11/14 code today.

18. COUPLING/SOAPBOX: Essays on non-technical best practices within the C++ ecosystem

If the last technical section on C++ coroutines was somewhat speculative due to the present uncertainties about future developments in C++, this entire section of essays consists almost entirely of discussion pieces, as the questions they raise have no known good answers. I will write as if more certain than I actually am for the purposes of clarity and brevity. Moreover, two of the later essays argue the opposite of one another, each on the basis of plenty of sound reasons, so the intent is really to make you think and reflect about "the big picture" before you design your (C++) library, whether or not you agree with anything I write below.

Consider them therefore more as food for thought than as recommendations; they are obviously my (Niall Douglas) personal perspective and interpretation on things. I'll make any discussion of the problem of how to deal with large legacy C++ code bases Boost-centric, as contractual NDAs prevent me discussing problems in commercial code bases I have worked upon, but lots of corporations out there (Google is the most famous example) have truly enormous monolithic C++ codebases where this discussion applies equally if not far more so, so please mentally find and replace all mentions of Boost with <insert your large C++ code base here>.

And finally, this section is going to look quite eccentric to most C++ engineers, as I will use arguments unfamiliar to most in software. Part of this is because I am equally trained (in terms of degrees) in Economics and Management as I am in Software, plus I am an affiliate researcher with the Waterloo Institute for Complexity and Innovation, so this analysis considers software, and the human systems surrounding it, as one and the same complex system.

Modular vs Monolithic

As we all know after a few years of experience, there is a constant tension in large scale software development between MODULARITY (tightly specified at points of coupling between structures) and MONOLITHICITY (loose or unspecified at points of coupling between structures) where structures include the human organisations which do the programming. This tradeoff is fundamental to the universe and appears throughout mathematics and physics because of how rigidity works: the choice between unpredictability and rigidity is always a zero sum tradeoff, so you get to choose which parts of a system you would like to make flexible only by making other parts of the same system inflexible.

Modular software has very strongly defined interface contracts between modules. This allows the teams working within a module to change it, with any change ripples hopefully never propagating outside the module boundary, so that what one team does should not accidentally affect the work of another team if a module boundary lies between them. As the effort required to modify software approximates the cube of its size, for very large organisations this is an enormous productivity gain, because it enables your development to occur as if you were working on a much smaller codebase. The most popular and successful modularity framework for C++ by far is Microsoft COM (the latest iteration of which is called the Windows Runtime), which consists of a well specified event handling loop and framework plus a well specified method of inspecting and calling the functions provided by a COM component. Unfortunately, very strictly defining the coupling between modules imposes an enormous cost on what is allowed to traverse between modules, due to the need to prevent ripple traversal: even to this day, nothing may pass through Microsoft COM/WinRT which cannot be perfectly represented in C, so no C++ feature not perfectly representable in C can traverse a COM boundary.

Monolithic software has loose or unspecified (anything goes) interface contracts between parts. This lets programmers do what they want when they want, and that usually means a much more enjoyable experience for the programmer because of the feeling of power and control they get over the code (even if experience and self discipline mean they rarely exercise it). This approach encourages experimentation, fun, prototyping and more often than not is how volunteer-led open source software ends up being organised at a code level, partially because as a hobby you don't want to bother with all the boring compliance and worrying about other teams stuff you have to do at work. However just because an open source codebase may be monolithic internally doesn't mean that open source software is monolithic between codebases, if anything the extreme reusable modularisation of cloud and web services into individual single purpose solutions supplied at a network socket which the internet public mash together has been the truly defining innovation of the past twenty years of software.

C++ in the 21st century

As most younger C++ programmers, or indeed non-C++ programmers, will ruefully note, C++ as a language and an ecosystem has almost entirely ignored the trends of the past twenty years towards extreme reusable modularity. The last big innovation in C++ modularity was Microsoft COM back in 1993, and it formalised the C++ then available via Cfront, because that was the most standardised implementation available in 1989 when COM was originally being proposed. To use Microsoft COM is therefore to restrict oneself to C++ as it was in approximately 1989. Despite such an enormous limitation, Microsoft COM is enormously popular, and those same limitations mean you can wrap any language capable of speaking C inside a COM object, whereupon it can be used by any other COM object without regard to how it is internally implemented.

I don't know this for sure, but I suspect much of the recent investment by the tech majors in new systems programming languages (Swift by Apple, Go by Google, and probably the two biggest upcoming threats to C++, Rust by Mozilla and the conversion of .NET into a portable systems programming platform by Microsoft) stems from the continuing abject failure of C++ to finish supplanting C as the best general purpose systems language available (note that none of these work easily with C++, .NET is probably the easiest and Swift the next easiest via an Objective C++ shim, after that you're stuck with SWIG bindings).

A lot of that continuing abject failure to finish supplanting C and become the best general purpose systems language available stems, in my opinion, from the following causes:

  1. A deliberate refusal by C++ to finish becoming a better systems glue language

Investing in the development of a real modular component framework substantially improving on Microsoft COM (C++ 1z Modules is actually a build performance feature and has little to nothing to do with actual modularisation -- this occurred due to repeated watering down of the Modules proposal), standardising the C++ ABI, or even providing a C reflection library allowing C code to inspect some unknown C++ binary in a useful way, would go a long way to helping persuade language runtimes such as Python, and the other new languages, to use the C++ ABI instead of the C ABI. Indeed, such is the continuing failure of the C++ community here that LLVM is increasingly becoming the next generation "better than C with just enough C++ support" ABI: the new systems languages such as Rust target LLVM, and the recent conversion of .NET into a portable systems programming platform also targeted LLVM.

The thing which annoys me and many others is that a much enhanced replacement for Microsoft COM is not particularly technically challenging (note that paper is by myself), it just requires someone to supply the sustained funding for the necessary two or three years to make it happen. This is exactly what the Standard C++ Foundation is supposedly there to fund, but I have observed almost zero interest from the C++ community to invest in becoming a better neighbour and especially glue to other programming languages. I suspect, sadly, C++ will need to be mortally threatened by something like Rust to get its act together and get its head out of its ivory tower.

Even that big picture stuff aside, there is a disdain within the C++ community for considering ease of use by other languages when designing standard C++ library facilities. It's hardly the only example, but one of my personal bugbears is the current design of std::future<T>, which is an arse to use from C code because you can't compose waits on a future with anything else you'd wait upon in C, or any other language for that matter. If std::future<T> had the optional ability to be waited upon along with other things inside a select()/epoll()/kqueue() multiplexed wait call, that would be enormously useful to anyone needing to work with C++ futures from outside C++. Hell, it would even be super useful within C++ right now, at least until the Concurrency TS when_any()/when_all() support comes through.

  2. A tiredness and fear of disruptive change in the C++ thought leadership

Internally, C++ has undergone the wrenching change of getting the enormously delayed and overdue C++ 11 standard out of the door, and to achieve that the C++ leadership essentially had to eat its own young in a process of attrition, to such an extent that quite a number of the leading lights who were historically so influential in the leadership have since left C++ altogether, whether due to exhaustion, bitterness, or their wives suggesting that divorce would be imminent unless they stopped putting C++ before their families. What I think has happened is that this has left the C++ leadership quite tired of disruptive change, and enormously defensive of any suggestion that they are not mostly correct on any given technical choice, as I have often observed personally just on the Boost Developers mailing list, let alone at C++ conferences. People I know who attend recent WG21 meetings find them overwhelmingly negative and exhausting, and more than one has observed that the only kind of person who now thrives there is at best of a sociopathic disposition (and don't get me wrong, the right kind of sociopath or even psychopath can be crucial to the success of any technically challenging and complex project, but for it to work you generally want at most one sociopath in the room at a time; collecting many of them into the same room never produces a positive outcome welcomed by all). Setting aside the consequences of the brain drain of good engineers forced out by the C++ process and culture, I think this was and is a great shame given what it could be instead, and I'll elaborate a bit more below on what it ought to be instead.

  1. A rejection of the business side of the C++ open source ecosystem

Most of the C++ leadership are -- how shall I put it -- of a certain age and job security not shared by younger C++ programmers or programmers in the newer languages. Back when they were younger programmers, open source software was something you did as a hobby, a charity and as a noble and especially non-commercial pursuit outside of normal work hours as a means of achieving an excellence in your open source software that your employer wouldn't permit for business reasons. Employment was very secure, benefits and pensions were high, and you had a strong chance of lifelong employment at the same employer if you so wished. Thanks to you working pretty much in the same field and technologies continuously, both within open source and within your employment your authority as an engineer had a good correlation with your years of experience, so pay rises and seniority were automatic. You knew your position in the hierarchy of things, and you had worked and sacrificed to reach that position over a long career. Unsurprisingly, this generation of software engineer likes permanence, dislikes radical change, and really hates throwing away known-good code even when it is no longer fit for purpose due to bitrot and/or lack of maintenance. They also dislike the idea that one should make money or a living from open source, and especially that one ought to leverage the dependency of firms on your free of cost open source software to extract rents (they call it blackmailing).

The younger programmer tends to see a career in the tech industry for what it is: a lousy industry where employees are disposable resources to be mined for all their value before disposal, pensions are probably worthless assuming you'll ever retire at a reasonable age, real estate pricing is far beyond your means, and you're still saddled with enormous debts from gaining all those degrees. The only advantage of being in tech is that most other industries available to your age cohort are even worse -- assuming you can get a job at all despite holding multiple Masters degrees and being vastly more qualified than the people interviewing you. For many a younger programmer, contributing to open source means something far different from what it means to the older generation: an opportunity to escape the meaningless existence of writing pointless code for other people who only care about you insofar as to make you drink their Koolaid and place their company before everything else in your life, including your family. And by escape, I don't mean merely psychologically: most have a vision that one day their open source efforts could become a sufficient means of living away from those who exploit them, whether self employed via a startup or remote expert consulting at some crazy hourly rate -- at which point, oddly enough, those same earlier employers tend to suddenly value your time and inconvenience.

This is obviously a very different experience and understanding of open source software. Not coincidentally, most of the big open source software projects invest heavily in the business side of acquiring and disbursing funding for vital work on the software. This trend of seeing the open source organisation as primarily one of actualising its business side, instead of being just some central source code repo and web presence, is most prevalent in the newest open source projects, simply because they were founded more recently and therefore adopted what was felt to be the state of the art at the time.

In case you are one of those engineers of a certain age and don't know what I'm talking about here, let me compare two open source projects: the Boost C++ Libraries and the Django web framework. Boost was originally conceived as a proving ground for new C++ standard library ideas making better use of the 1998 C++ ISO standard. Due to its roots as a playpen and proving ground for standard C++ libraries, Boost is unusually similar to the C++ standard library which is both good and bad: good in terms of the quality of design, algorithms, implementation and testing -- bad in terms of monolithicity, poor coupling management, and over-reliance on a single person in charge of each library (great for the library if that maintainer is active, not so great across libraries when maintainers must work together, terrible if a maintainer vanishes or departs). In particular, like many other open source projects founded in the 1990s, Boost does not believe in there being a business side to itself apart from its annual conferences which began in 2006, and its steering committee website has this excellent paragraph which was written after I caused a fuss about how little steering the "steering committee" does:

"In the Boost community decisions have always been made by consensus and individual members have shown leadership by stepping forward to do what they felt needed to be done. Boost has not suffered from a lack of leadership or volunteer participation. It is not the role of the Steering Committee to inhibit this kind of spontaneous leadership and action, which has served Boost and the wider C++ community so well. On the contrary, it is the role of the Steering Committee to facilitate community-based leadership and decision making. The role of the Committee is to be able to commit the organization to specific action either where funds are required or where consensus cannot be reached, but a decision must be made."

Firstly, I do appreciate Jon for even getting this statement put together at all -- the steering committee had managed to let four years pass without actually stating what it felt its purpose was, which was not what the slides at BoostCon 2011 said it was going to be. As much as this is helpful, do you notice that the "mission statement" clearly disavows taking any leadership role whatsoever unless unavoidable? This effectively means that the Boost steering committee is really a board of trustees who have very little strategic interaction with the thing they approve funding for, and that is pretty much what happens in practice: individual committee members may help you out privately on something, but publicly they have no position on anything unless someone petitions them to make a decision. Note that they make no decisions at all until someone formally asks for one, just as with a board of trustees.

So what is the problem, one may now ask? Well, you've got to understand what it is like trying to achieve anything in Boost, and indeed those BoostCon 2011 slides summed up the problem nicely. Firstly you must achieve consensus in the community, which involves persuading a majority on the Boost Developers list -- or rather, persuading a majority of those who respond on boost-dev that what you proposed is not a terrible idea, often by investing dozens of hours of your free unpaid time into some working prototype you can show people. This part also usually involves fending off trolls, those with chips on their shoulders, those with vested interests, those threatened by any form of change, and those who simply don't like you and are trying to take you down a peg -- which, most unfortunately, can include members of the steering committee itself. To achieve any significant change usually involves years of campaigning, because getting a dispersed heterogeneous community with few common interests to reach consensus on some breaking change, even with a proven working prototype, takes a minimum of a year and usually many years, and again you must have a skin thick enough to repel all those trolls and naysayers I mentioned. If this begins to sound as dispiriting and as exhausting as attending WG21 meetings -- except this effort is entirely unpaid -- you would be right. Anyway, if after years of unpaid effort and toil you finally reach consensus, then you can apply to the steering committee for funding to implement and deliver your change, by which stage -- to be blunt -- the funding is but a tiny fraction of the time, blood and sweat you have personally already invested.

Note that if you have a job with a high status role and excellent job security and believe open source to be a noble non-commercial hobby, then taking years to change a community consensus is far less important to you than if you have no job security, your job is menial and you just want -- and I hate to be so blunt about this -- the open source project leadership to proactively help you rather than aloofly stand apart until you've invested years of free unpaid effort for potentially no real gain. Those unpaid hours of effort could go on actually writing new code, in an environment more welcoming, perhaps even in a community you yourself create. Most with this opinion do not of course voice it, and simply silently leave the community for somewhere more contemporary and less hostile to evolution.

Let's look at what a modern open source project does differently: Django was first released in 2005, and it took just three years to establish the Django Software Foundation which operates the business side of Django the open source software project, specifically as a charity which promotes, supports and advances the Django web framework by (from its mission statement):

  • Support development of Django by sponsoring sprints, meetups, gatherings and community events.
  • Promote the use of Django amongst the world-wide Web development community.
  • Protect the intellectual property and the framework's long-term viability (i.e. invest in blue sky new development).
  • Advance the state of the art in Web development.

In practice that means a constant cycle of acquiring and maintaining regular donations from the firms and users who use your free of cost open source software, whether from fee paying training courses and conferences, or simply through a donate button on the website -- but mostly through investing effort to build networks and relationships with your biggest commercial users and leveraging those networks and relationships into a regular donations stream. Concomitant with that leveraging is a reverse leveraging by those sponsors on the future strategic direction of the open source project, so when this funding process is working well the open source org gets funds to disburse on a future vision of the project previously agreed (or at least discussed) with its major sponsors. This is why leveraging your open source project for funding is not blackmail but rather funded coevolution, despite what many in the C++ leadership might think.

In Django, you pitch your idea for some change, whether radical or not, to a small, contained, authoritative leadership where the authority to decide and fund any initiative is clearly demarcated. More often than not they act really as a filter, appropriately repackaging and presenting ideas to the sponsors for funding, though they usually have some slush money around for the really radical experimental stuff (if it is cheap). The process for enacting change is therefore extremely well specified, and moreover, because a central authority will approve or disapprove an idea quickly, you don't waste time pushing ideas that have no chance. If they do approve an idea, you get actual true and genuine real support from the leadership instead of being cast alone into the wilderness to argue with a mailing list to build "consensus" for some idea.

Compare that to Boost. And Django is hardly the only business orientated open source project. Plone has a well established funding pipeline for regular sprints. Drupal is particularly aggressive, and indeed effectively runs a Kickstarter once per release to fund the sprint needed to make the release happen. Some might argue that these are all products rather than an umbrella of heterogeneous libraries and are therefore fundamentally different, if so consider the funding pipeline the Apache Software Foundation runs.

To sum up this point: back in the early 2000s, at the beginning of Boost, the improvements needed to the standard C++ libraries were obvious, and therefore consensus was easy to obtain. This made the management processes developed back then tenable. Also, the many vested interests in inhibiting change or challenge had not yet become established, so Boost worked well and delivered an enormous contribution to C++ TR1 (ten libraries) and C++ 11 (another ten libraries).

But open source, and the world, has moved on, and the current gold standard of open source practice is to run your open source project as a business. C++ has recently set up a Standard C++ Foundation which could prove less weak willed and ineffective than Boost's current leadership, but to date the only big risk that I know they've taken was to fund Eric Niebler's work on Ranges for C++, so I think the jury is still out on whether they will rise to the standards set by the Python Foundation, the Plone Foundation, the Django Foundation or even the Linux Foundation. Which brings me to the final major cause of why I think C++ is failing to become the best general purpose systems language available ...

  1. A pathological overemphasis on the primacy of ISO WG21 as the right place to lead the evolution of C++

I don't know for sure, but I suspect that many of the problems I just outlined in the C++ ecosystem relative to its competitors and peers are recognised by some in the C++ leadership. Certainly, what's gone wrong with the Boost libraries and the refusal to lead library development as they once did has definitely come up in private conversations on multiple occasions. Perhaps as a consequence there has been a shift of moving new standard C++ development to occur under the umbrella of ISO WG21 which is the working group responsible for C++, and a concomitant reduction in those same engineers contributing their libraries to Boost for pre-standardisation testing as was historically done.

I hate to be blunt, but as a former ISO SC22 mirror convenor for the Republic of Ireland I will categorically state this: the International Organization for Standardization is designed to standardise existing practice, not to design new standard practice. It has the wrong schedule, processes and organisation to develop new standard practice, however much money and resources you throw at it in funding special study groups, individuals to write feature prototypes, and people to act as champions of some feature at every meeting.

ISO, if anything, is really where you send your company's representatives to stop business damaging things being standardised (by your competitors). It has always been this when you reduce the purpose of the organisation to its fundamentals, and its configuration of one vote per country strongly favours multinational corporations who can field employees in many countries, and therefore gain power to influence standardisation decisions at a global level. And just to repeat myself, by "power to influence" I really mean "power to prevent bad ideas from being standardised" where bad ideas mean anything which could have a hard permanent effect on your profit line. This makes ISO a conservative enforcing body, and for the record that's a great thing and it's why ISO works well for the purpose of standardisation.

It also makes it a lousy place to invent new stuff. I mentioned earlier that attending WG21 has of late become particularly negative and exhausting, and a lot of that is because too many contentious decisions are being squeezed into a place ill suited to take contentious decisions. ISO meetings are supposed to be about:

  • Reviewing existing practice in an engineering field.
  • Debating if some existing practice is ready for standardisation.
  • If so, what minor trivia need fixing before standardisation.
  • If not, end of discussion and next item.

In other words, if a decision is "hard" at ISO, that means whatever it is isn't ready for standardisation. Debate -- at ISO level at least -- is concluded.

The problem of ill fit for anything innovative at ISO is exactly why ISO WG15 (POSIX) decided to relocate its decision making to the Austin Working Group. And even then the AWG almost exclusively reviews existing practice in POSIX implementations, debates if it's ready for standardisation, and if so formalises a regular report to send to ISO WG15 for country-level debate and standardisation. If you want a new item into POSIX the process is straightforward and uncontroversial:

  • Implement it in a POSIX implementation of your choice (usually Linux, sometimes FreeBSD).
  • Get people using it over a period of years.
  • Implement it in many POSIX implementations, or get someone else to do it for you because it was so popular in Linux.
  • Get more people using it in a cross-platform way for a period of years.
  • Ask the Austin Working Group to consider standardising it.
  • Spend another few years fine tuning the standard text and getting it signed off by the major POSIX implementations (this part is easy if they already have an implementation).
  • Profit!

Total time is typically five to ten years, but then adjusting POSIX is supposed to be extremely conservative and hard and rightly so.

It probably should be just as conservative and hard to change C++ language features, but I don't think it should be so for the C++ standard libraries. If, instead of inefficiently throwing money at trying to make WG21 dance contrary to its nature, the Standard C++ Foundation funded the strategic directed evolution of quality C++ libraries -- just as the Python Foundation does for Python or the Plone Foundation does for Plone -- they would:

  • Replace Boost's leadership with something far more proactive, modern and capable of giving a good fight to the upstart systems programming languages.
  • Establish and maintain best practice for C++ library development with a funded model which can then be easily standardised by WG21 after a library becomes the established standard practice.
  • Ideally, the Standard C++ Foundation funded implementations would be donated as-is to each of the three major standard C++ library implementations (Dinkumware, libc++, libstdc++) to save them reinventing the wheel.
  • Unblock WG21 so they can actually get on with standardisation instead of ceaselessly and inefficiently and negatively arguing about standardisation.
  • Show the way forwards for the design of new language features instead of trying to design top down and ending up with a cancer like the original C++ concepts. I suppose I had better explain what I mean by a cancer in this context, so here goes. During the 2007-2008 push by WG21 to reach C++ 0x, items began to be cut from the initial draft of what would become C++ 11 for purely political reasons. The big problem was that if you proposed some library X for standardisation, someone on ISO would say "library X would look completely different once expected new language feature Y is in the language, therefore library X shouldn't enter the standard right now". The most damaging expected new language feature Y was undoubtedly original C++ concepts, as that killed off entire tracts of exciting C++ library standardisation, often much to the bitterness of those who had invested months to years of their spare time developing those libraries. After all, most new language features are developed by highly paid employees as their day jobs, whereas the twenty or so Boost libraries now in C++ 11 were generally developed in the family time of enthusiasts who earned a fraction of the pay of those people on WG21 killing off their work, and to keep a chipper attitude whilst those who have not sacrificed shoot down often years of your work is not easy. No wonder we have seen a slow exodus of Boost old timers since the 2011 C++ standard: some leaving C++ completely for employment in large corporations which frown on employees having any interests outside the corporation, some disavowing Boost and refusing to have anything further to do with it, and some no longer trying to get their libraries into Boost (i.e. past any form of review process), preferring to house their C++ libraries elsewhere and specifically away from potential standardisation.

To conclude this rather long section, I believe that if C++ is to remain relevant and fresh in the 21st century, this is needed:

  • C++ needs to return back to basics, and finish becoming a complete superset for C for almost all users of C.
  • C++ needs to become the perfect neighbour for other programming languages, and their first choice as a systems programming glue.
  • Once that is achieved, a proper replacement for Microsoft COM is needed (hopefully reusing the ASIO/Networking TS event loop).
  • After that, my personal preference would be for an as-if everything inlined build system that eliminates the need for all other C++ build systems. I'll speak more on that idea below.
  • Boost's leadership, or a replacement for Boost, should take on the role of proactively deciding which new C++ libraries are needed to meet the strategic goals set, and of finding contractors to design, peer review and implement such solutions.
  • This work would be funded by a business orientated Boost or Boost-like open source organisation in collaboration with the Standard C++ Foundation.

TODO AFTER THIS

A brief and cynical history of Boost and its relationship to C++

Boost has been experiencing in recent years a number of problems surrounding its scalability as it grows and especially as it matures (specifically, maintenance and a cultural fear of any substantial change). Historically speaking, in 1999 Boost was first a proving ground for new C++ standard library ideas making better use of the 1998 C++ ISO standard, but from August 2000 onwards it also began to turn into a broken compiler workarounds layer which, whilst initially great for those with broken compilers, incurred an enormous technical debt in the codebase which still weighs heavily upon anyone trying to change anything substantial affecting more than one library -- including cultural beliefs in the importance of supporting broken toolsets. Due to its roots as a proving ground for standard C++ libraries, Boost is unusually similar to the C++ standard library, which is both good and bad: good in terms of the quality of design, implementation and testing -- bad in terms of monolithicity, poor coupling management, and over-reliance on a single person in charge of each library (great for the library if that maintainer is active, not so great across libraries when maintainers must work together, terrible if a maintainer vanishes or departs).

Boost's "high point" was probably the period between 2001, when the modern peer review process was formalised (it has remained remarkably, perhaps unfortunately, unchanged since) and Boost had thirty-three libraries, and 2005, when the Boost Software Licence was completed, most of the famous Boost libraries had reached their final designs, ten Boost libraries were accepted into the C++ TR1 standard, and Boost had sixty-nine libraries.

Some may be surprised at the choice of 2005 given that BoostCon began in 2006, but if you examine the Wayback Machine for boost.org from 2005 onwards you will see that a plateau was reached, and I'm going to claim that this is because Boost had achieved its original mission for the first time and had therefore taken its first step towards obsoleting itself. The push was now on for the C++ 0x ISO standard, and I think the period 2007-2008 was crucial, because in my opinion it was during this period that many of the Boost old timers began to become bitter with the C++ library process as their items began to be cut from the initial draft of what would become C++ 11 for purely political reasons. The big problem was that if you proposed some library X for standardisation, someone on ISO would say "library X would look completely different once expected new language feature Y is in the language, therefore library X shouldn't enter the standard right now". The most damaging expected new language feature Y was undoubtedly original C++ concepts, as that killed off entire tracts of exciting C++ library standardisation, often much to the bitterness of those who had invested months to years of their spare time developing those libraries. After all, most new language features are developed by highly paid employees as their day jobs, whereas the twenty or so Boost libraries in C++ 11 were generally developed in the family time of enthusiasts who earned a fraction of the pay of those killing off their work, and to keep a chipper attitude whilst those who have not sacrificed shoot down often years of your work is not easy.

2008 was the last time the Boost web site was in any way improved, and even before that, stale content was no longer being removed or repaired as it once used to be, thus making the Boost web presence increasingly less authoritative and less of a "one stop shop" for answers. We then began to see a slow exodus of Boost old timers: some leaving C++ completely for employment in large corporations which frown on employees having any interests outside the corporation, some disavowing Boost and refusing to have anything further to do with it, and some no longer trying to get their libraries into Boost (i.e. past any form of review process), preferring to house their C++ libraries elsewhere and specifically away from the Boost community. A too frequent common factor this author has noticed is the bitterness and anger regarding Boost and its community among some of the big names in Boost who were the minds behind some of its most successful libraries. The peer review process ground to a halt from the end of 2012 onwards, and by early 2014 the rate of decline of Boost had noticeably accelerated as the 2011 ISO C++ standard began to be implemented in available toolchains.

Since 2014 measures have been taken to reinvigorate Boost by a number of actors including myself, but those measures are not germane to this essay. What I will say is that by Summer 2015, by the strictest measure of these things, perhaps as many as sixty of the one hundred and thirty or so Boost libraries could be considered undermaintained or not maintained. That suggests that Boost, under its current set of procedures, culture, management and infrastructure, is struggling to scale past about seventy libraries -- or put another way, the malaise and staleness within Boost started around the same time as the number of libraries in Boost passed about seventy in 2005. The scaling limit being around seventy is of particular significance because that is exactly around where the human cognitive limit for physically separated group sizes lands, and indeed military tactical units have been organised around groups of sixty to eighty soldiers since the Romans simply because trial and error found that number worked best. I am therefore going to assert, rather than claim, that under a volunteer based system where one individual is the maintainer of each library that Boost cannot healthily exceed about seventy libraries (i.e. seventy maintainers) as a single collection, and you will find all my efforts to reinvigorate Boost - including this Handbook - revolve around that assertion of there being a hard scaling limit to each collection.

What does this history have to do with defaulting to standalone capable modular C++ libraries?

The obvious answer is that if a library is capable of coexisting inside multiple collections of libraries, including the collection of just itself alone, that enormously increases the scope and opportunity for working around the size limit of seventy. You could, as I have often proposed, have a v1.x collection of legacy C++ 03 Boost libraries and a v2.x collection of C++ 14 Boost libraries, where there is some overlap between the collections in that some C++ 03 Boost libraries are available in both. One thereby raises the scaling limit from about seventy to potentially one hundred and twenty or so across two collections, without suffering from undermaintenance, malaise or staleness as an organisation. If your library defaults to being standalone capable and modular, that enormously improves its ability to coexist in multiple collections of libraries. You are therefore writing ''social code'', not ''a-social code''.

However, let's assume you don't buy the hard scaling limit to a collection size and instead have some more practical use scalability problems such as:

  • I would like Boost (as a whole) to have fewer system requirements (minimum compiler versions, minimum OS support etc). This is usually really an argument in favour of better support for either legacy compilers OR embedded or games systems, but do note the substantial distinction for later.
  • I would like my favourite Boost libraries to have fewer requirements on their dependencies including other Boost libraries, especially exact version requirements (i.e. I want my favourite Boost library to detect and work with multiple versions of Boost). This is usually an argument from those who experience problems with things mildly breaking in different places in each Boost release, and they end up having to mash up their own Boost distro made up of newer and older individual Boost libraries to get the stability they need.
  • I would like to download my favourite Boost libraries and only their strict dependencies within Boost without having to download or even consider during build or configuration any unnecessary other Boost libraries (i.e. the package manager argument).
  • I would like to use my favourite Boost library using the Standard C++ Library facilities that come with C++ 11 instead of being forced into using Boost near equivalents (i.e. I get annoyed dealing with mixes of std::future and boost::future).
  • I would like to drop my favourite Boost library/libraries into my project as a single giant include file with no need to worry about Boost.Build or any build system or even dealing with a Boost source control system.

As much as these often cited user problems might seem unrelated, they are in fact all due to the lack of:

  1. Modularity

Your users can download your library as a self contained distribution, and get immediately to work. The ideal of this form is a single standalone include file which contains everything the library user needs.

  1. Encapsulation

Thinking about and specifying your dependencies properly instead of just firing in some reference to boost::lib::foo and dragging in a whole library for a single routine. C++ has also, unfortunately, spectacularly failed to improve on Microsoft COM as the best available technology for fully encapsulating C++ libraries, and as much as such work is "unsexy", the fact that a proper C++ component and modularisation system doesn't have a study group at ISO WG21 is an appalling tragedy given the enormous and very well understood productivity gains such a technology brings.

  1. Low cost trial

From a library user perspective, the biggest overwhelming incentive is usually to adopt "Not Invented Here" as that best ensures continued employment at least in the medium term. So as a library author, if you want anyone to use your library you need to eliminate as many reasons for a library user to find an excuse not to use your library as possible. One of the biggest excuses users will have is simply that of the cost of trying out your library for fit to solving a problem: is installation ready for development a single action? Does including this library quadruple my build times? Does code using this library ever crash, including the compiler? Do I have to click the mouse more than three times to find something in the documentation? If the answer is yes to any of those questions, your library does not have a low cost trial curve, and could even be a-social coding.

  1. Forcing choices onto the library user unnecessarily

When your library insists on only ever using boost::future and being incapable of using std::future, you force your library users to write boilerplate to convert between types of future because another library dependency insists on only ever using std::future. There is no good reason whatsoever why your library forces that choice on its users, except your laziness in not upgrading your library to make better use of the C++ 11 standard library.

Most of the earlier problems go away if you address these four issues, but you'll note that dependency management is essentially being statically done by hand by the library maintainer due to that lack of a modern equivalent to Microsoft COM I mentioned. The wisdom of dependency package managers in C++ 11/14 is exactly what I'll write about next.

18. COUPLING/SOAPBOX: Essay about wisdom of defaulting to standalone capable modular (Boost) C++ 11/14 libraries with no external dependencies

19. COUPLING/SOAPBOX: Essay about wisdom of dependency package managers in C++ 11/14

TODO

git submodules biicode etc
