 1. We dynamically allocate a stack frame on entry, and that dynamic memory allocation sticks around until the resumable function completes.
 2. We always construct a promise-future pair for the result (or whatever synchronisation object `coroutine_traits` says) on entry to the function.
 3. If `async_call()` always returns a signalled future, we add the result to `total`, signal the future and return the future.
 4. If `async_call()` ever returns an unsignalled future, we ask the unsignalled future to resume our detached path when it is signalled. We suspend ourselves and return our future immediately. At some point our detached path will signal the future (the whole sequence is hand-desugared in the sketch below).
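
For illustration, here is a hand-rolled approximation of those four steps using Boost.Thread's `future::then()` continuations. This is an illustrative desugaring rather than what a compiler would actually emit, and `async_call()` is assumed to be the asynchronous operation from the original example:

{{{
// A hand-rolled approximation of the four steps above. Illustrative only:
// async_call() is an assumed asynchronous operation, and a real compiler
// transformation would look quite different.
#define BOOST_THREAD_PROVIDES_FUTURE
#define BOOST_THREAD_PROVIDES_FUTURE_CONTINUATION
#include <boost/thread/future.hpp>
#include <memory>

boost::future<int> async_call(int i);  // may return an already signalled future

boost::future<int> accumulate(int n)
{
  // Steps 1 and 2: a dynamically allocated frame which lives until the
  // resumable function completes, holding the promise for the result.
  struct frame
  {
    boost::promise<int> result;
    int total = 0, i = 0, n;
    explicit frame(int n_) : n(n_) {}
    void step(std::shared_ptr<frame> self)
    {
      if (i == n)
      {
        result.set_value(total);  // step 3: signal the future and finish
        return;
      }
      // Step 4: ask the future to resume our detached path when signalled.
      // Note that where the continuation runs depends on Boost's launch policy.
      async_call(i++).then([self](boost::future<int> f) {
        self->total += f.get();
        self->step(self);
      });
    }
  };
  auto self = std::make_shared<frame>(n);
  auto result = self->result.get_future();
  self->step(self);
  return result;  // returned immediately, quite possibly still unsignalled
}
}}}
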
As you can see, there is an unfortunate amount of extra boilerplate to convert between Boost.Thread futures and Boost.Fiber futures, plus more boilerplate to make `accumulate()` into a resumable function -- essentially one must write out `accumulate()` as if it were a kernel thread, complete with nested lambdas. Still, though, the above is feature equivalent to C++ 1z coroutines, but you have it now, not years from now. For reference, if you want to write a generator which yields values to fibers in Boost.Fiber, simply write to some shared variable, notify a `boost::fibers::condition_variable` and follow with a `boost::fibers::this_thread::yield()`; writing to a shared variable without locking is safe because fibers are scheduled cooperatively on the thread on which they were created.
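
As a minimal sketch of that generator pattern (names are mine and illustrative; I write the yield as `boost::this_fiber::yield()`, and with only two fibers plain yielding suffices, so the condition variable is omitted):

{{{
#include <boost/fiber/all.hpp>
#include <iostream>

int main()
{
  int value = 0;      // shared variables: no lock needed, fibers on this
  bool done = false;  // thread are cooperatively scheduled
  boost::fibers::fiber producer([&] {
    for (int i = 1; i <= 5; ++i)
    {
      value = i;                   // publish the next value
      boost::this_fiber::yield();  // hand control back to the consumer
    }
    done = true;
  });
  while (!done)
  {
    boost::this_fiber::yield();    // let the producer run
    if (!done)
      std::cout << value << std::endl;
  }
  producer.join();
  return 0;
}
}}}
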
We'll take the last first. Callbacks, in the form of an externally supplied C function pointer, are as old as the hills, and in C++ are best represented as a `std::function` object. As a rule, code which accepts externally supplied callbacks has to impose various ''conditions'' on what the callback may do. A classic condition is that reentrancy is not permitted (i.e. you may not use the object from which you are being called back) because a mutex may be held, and reentering the same object would therefore deadlock. As with signal handlers on POSIX, if you wish to actually do anything useful you are then forced to use the callback to ''schedule'' the real callback work to occur elsewhere, where the classic pattern is to schedule the real work onto a thread pool such that, by the time the thread pool executes the real work implementation, the routine calling the callbacks will have completed and released any locks.

However, scheduling work to thread pools is expensive. Firstly, there is a thread synchronisation which induces CPU cache coherency overheads, and secondly, if the thread pool gets there too soon it will block on the mutex, inducing a kernel sleep which costs several thousand CPU cycles. A much more efficient alternative is to schedule the work to occur on the same thread after the thing doing the callback has exited and released any locks.

This more efficient alternative is the pattern used by proposed Boost.AFIO, and the following sketch illustrates the idea (it is illustrative rather than AFIO's actual source; names and signatures are mine):
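
{{{
// A minimal sketch of the immediate_async_ops idea: tasks enqueued here are
// executed on this thread when the accumulator is destructed, i.e. after
// the engine has released its locks. Illustrative, not AFIO's actual source.
#include <functional>
#include <future>
#include <memory>
#include <utility>
#include <vector>

class immediate_async_ops
{
  std::vector<std::function<void()>> tasks;
public:
  immediate_async_ops() = default;
  immediate_async_ops(const immediate_async_ops &) = delete;
  immediate_async_ops &operator=(const immediate_async_ops &) = delete;
  // Enqueue work to be executed later on this thread, returning its future now
  template<class F> auto enqueue(F f) -> std::future<decltype(f())>
  {
    auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
    auto fut = task->get_future();
    tasks.push_back([task] { (*task)(); });
    return fut;
  }
  // On destruction -- i.e. just before the operation API returns to user mode
  // code, with all locks released -- dispatch everything that was enqueued
  ~immediate_async_ops()
  {
    for (auto &t : tasks)
      t();
  }
};
}}}
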
What this does is let you enqueue packaged tasks (AFIO calls them enqueued tasks) into an `immediate_async_ops` accumulator. On destruction, it executes those stored tasks, setting their futures to the results of those tasks. What on earth might the use case be for this? AFIO needs to chain operations onto other operations: if an operation is still pending, one appends the continuation there, but if an operation has completed, the continuation needs to be executed immediately. Unfortunately, executing continuations immediately inside the core dispatch loop creates race conditions, so what AFIO does is create an `immediate_async_ops` instance at the very beginning of the call tree for any operation. Deep inside the engine, inside any mutexes or locks, it sends continuations which must be executed immediately to the `immediate_async_ops` instance. Once the operation is finished and the stack is unwinding, just before the operation API returns to user mode code, it destructs the `immediate_async_ops` instance and therefore dispatches any continuations scheduled there without any locks or mutexes in the way.

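How it might be used in a call tree (hypothetical: `core_dispatch()` and `engine_mutex` are made up for illustration):

{{{
#include <mutex>

std::mutex engine_mutex;                       // hypothetical engine lock
void core_dispatch(immediate_async_ops &iao);  // hypothetical engine internals

void api_operation()
{
  immediate_async_ops iao;          // created at the very top of the call tree
  {
    std::lock_guard<std::mutex> g(engine_mutex);
    core_dispatch(iao);             // deep inside, may call iao.enqueue(...)
  }                                 // engine_mutex is released here
}  // ~immediate_async_ops() now dispatches the continuations, lock free
}}}
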
This is quite neat and very efficient, but it is also intrusive: as above, it requires all your internal APIs to pass around an `immediate_async_ops &` parameter, which is ugly. This callback-on-the-same-thread pattern is of course exactly what coroutines/fibers give us -- ''a way of scheduling code to run at some point not now but soon, on the same thread, at the next '''resumption point'''''. As with AFIO's `immediate_async_ops`, such a pattern can dramatically simplify an engine implementation, and if you find your code expending much effort on error handling in a locking threaded environment where the complexity of handling all the outcome paths is exploding, you should very strongly consider a deferred continuation based refactored design instead.

Finally, what does making your code resumable ready have to do with i/o or threads? If you are not familiar with [http://en.wikipedia.org/wiki/Windows_Runtime WinRT], which is Microsoft's latest programming platform, well, under WinRT ''nothing can block'', i.e. no synchronous APIs are available whatsoever. That of course renders most existing code bases impossible to port to WinRT, at least initially, but one interesting way to work around "nothing can block" is to write emulations of synchronous functions which dispatch into a coroutine scheduler instead. Your legacy code base is now 100% async, yet is written as if it had never heard of async in its life. In other words, you write code which uses synchronous blocking functions without thinking or worrying about async, but the runtime automagically executes it as the right ordering of asynchronous operations.

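To make that concrete, here is a minimal sketch of such a synchronous emulation using Boost.Fiber futures. `async_read_impl()` is a made up asynchronous primitive, stubbed here with a background thread; the point is that waiting on a fiber future suspends only the calling fiber, so every other fiber on the thread keeps running:

{{{
#include <boost/fiber/all.hpp>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical async primitive, stubbed here with a background thread
void async_read_impl(std::function<void(std::size_t)> on_done)
{
  std::thread([on_done] { on_done(42); }).detach();
}

// A blocking-looking read() which never blocks the thread: it suspends
// only the calling fiber until the async operation signals the promise
std::size_t read_emulated()
{
  auto p = std::make_shared<boost::fibers::promise<std::size_t>>();
  boost::fibers::future<std::size_t> f = p->get_future();
  async_read_impl([p](std::size_t n) { p->set_value(n); });
  return f.get();  // other fibers on this thread keep running meanwhile
}
}}}
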
What does WinRT have to do with C++? Well, C++ 1z should gain both coroutines and, perhaps not long thereafter, the Networking TS which is really ASIO -- and ASIO already supports coroutines via Boost.Coroutine and Boost.Fiber. So if you are doing socket i/o you can already do "nothing can block" in C++ 1z. I'm hoping that AFIO will contribute asynchronous Filesystem and file i/o, and it is expected in the medium term that Boost.Thread will become resumable function friendly (Microsoft have a patch for Boost.Thread ready to go), so one could expect in the not too distant future that if you write exclusively using Boost facilities then your synchronously written C++ program could actually be entirely asynchronous in execution, just as on WinRT. That could potentially be ''huge'' as C++ suddenly becomes capable of Erlang type tasklet behaviour and design patterns. Very exciting.

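For example, ASIO's existing stackful coroutine support already lets you write straight-line socket code today. A sketch, with error handling elided and an obviously illustrative endpoint:

{{{
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <iostream>

int main()
{
  boost::asio::io_service service;
  boost::asio::spawn(service, [&](boost::asio::yield_context yield) {
    boost::asio::ip::tcp::resolver resolver(service);
    boost::asio::ip::tcp::socket socket(service);
    // Each async_* call suspends this coroutine instead of blocking the thread
    auto it = resolver.async_resolve(
      boost::asio::ip::tcp::resolver::query("example.com", "80"), yield);
    boost::asio::async_connect(socket, it, yield);
    std::cout << "connected" << std::endl;
  });
  service.run();  // drives every coroutine on this one thread
  return 0;
}
}}}
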
Obviously everything I have just said should be taken with a pinch of salt, as it all depends on WG21 decisions not yet made and a lot of Boost code not yet written. But I think this 100% asynchronous vision of the future is worth considering as you write your C++ 11/14 code today.

If the last section was somewhat speculative due to present uncertainties about future developments in C++, these remaining two sections are almost entirely discussion pieces, as they have no known good answers. Consider them therefore more as food for thought than as recommendations.

I'm going to argue in favour of defaulting your C++ 11/14 libraries to use no external dependencies -- that is, nothing using C++ headers which are neither in your git repository nor in the STL, and that includes Boost. External dependencies do '''not''' include any git submodules you may have in your git repository, so if your user types:

{{{
git clone http://yourlib
cd yourlib
git submodule update --init --recursive
cmake .
}}}

... then they have a complete, ready-to-build source code tree with no additional configuration required.

The advantages of defaulting to no external dependencies are obvious:

 1. Modularity: your users can download your library as a self contained distribution and get immediately to work. [https://github.com/boostcon/cppnow_presentations_2015/raw/master/files/Large-Projects-and-CMake-and-git-oh-my.pdf As David Sankel would put it, this is being not anti-social in your coding].
 2. Encapsulation: it forces you to think about your dependencies properly, instead of just firing in some `boost::lib::foo` and dragging in a whole library just for a single routine. We did far too much of that historically in the Boost libraries, indeed even putting everything into the `boost` namespace rather than into per-library sub-namespaces.
 3. Build times: it lets you assemble all the header files in your library into a giant single include file which can then be distributed as a self-standing, single-file drop-in for your users, and which can be precompiled into a precompiled header such that your users experience greatly reduced build times.
 4. Defaulting to STL primitives such as `std::future` and `std::function` instead of alternatives greatly eases the lives of your users when they are trying to combine your library with another library in their application. One of the more tedious things in life is constantly having to invoke `std::async` to get `boost::future` and `std::future` to work together :( (see the sketch below).
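
For reference, the tedious interop dance in point 4 looks something like this (a sketch; C++ 14 init-capture lets you move the `boost::future` into the lambda):

{{{
// Wrap a boost::future into a std::future by burning a std::async thread
// on waiting for it. Tedious, and exactly the sort of friction that
// defaulting to STL primitives avoids.
#define BOOST_THREAD_PROVIDES_FUTURE
#include <boost/thread/future.hpp>
#include <future>
#include <utility>

template<class T> std::future<T> to_std_future(boost::future<T> f)
{
  return std::async(std::launch::async,
                    [f = std::move(f)]() mutable { return f.get(); });
}
}}}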

TODO