If you want to run some work asynchronously, you have two basic choices, thread and async:
auto doWork = [] { /* ... */ }; // some async task
std::thread t { doWork };
auto fut = std::async (doWork);
The async approach is often better, for a few reasons:
- async returns a future, which lets us get the result computed by the task (if we care about that); thread offers no direct way to retrieve the task's return value.
- Exceptions thrown by the task can be retrieved via get, whereas exceptions thrown when using thread will terminate the program.

The OS only has a limited number of threads it can provide. If more threads are requested than the system can provide, a system_error exception will be thrown, even if the task itself is noexcept.
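Returning to the exception point above, here's a sketch of how an exception thrown inside a task resurfaces when get is called (the task name here is illustrative):

auto mayThrow = [] { throw std::runtime_error { "oops" }; };
auto fut = std::async (mayThrow);
try {
    fut.get(); // the task's exception is rethrown here
} catch (const std::exception& e) {
    // handle it: the program is not terminated
}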
If you create more software threads than there are available hardware threads, the OS will be forced to time-slice the threads, introducing potentially costly context switches. Avoiding this is difficult because different systems will have different hardware and workloads.
Using std::async allows the system to schedule work using a system-wide thread pool, improving
load balancing across hardware threads (although this is not required). Using thread directly will
require you to deal with thread exhaustion and load balancing yourself, and your solution may not
interact well with similar systems in other software.
There are a few situations in which using thread directly makes sense, e.g. when you need access to the API of the underlying platform-specific threading implementation via native_handle.

In summary:
- async allows accessing the result of an asynchronous task, and won't terminate your program if an exception is thrown, so it should generally be preferred over thread.
- Using thread directly requires consideration of thread exhaustion, oversubscription, load balancing, and portability, so make sure you've considered all these aspects (and alternative solutions) before using it.

Specify std::launch::async if asynchronicity is essential

There are two launch policies:
- async means the task will run on a different thread.
- deferred means the task will run synchronously when get is called on the future.

If no launch policy is provided, the implementation is free to choose whether to run the task async or deferred. This has some interesting implications: we don't know when and where the task will run, or even if it will run at all (if get isn't called on some paths).
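For instance (doWork as above):

auto fut1 = std::async (std::launch::async, doWork);    // runs asynchronously on another thread
auto fut2 = std::async (std::launch::deferred, doWork); // runs when get/wait is called on fut2
auto fut3 = std::async (doWork);                        // implementation chooses async or deferred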
Using the default launch policy is only acceptable under the following circumstances:
- The task doesn't need to run concurrently with the thread calling get/wait.
- It doesn't matter which thread's thread_local storage is accessed.
- Either get/wait are guaranteed to be called on the future, or it's acceptable for the task never to run.
- Code using wait_for/wait_until takes deferred futures into account:
Using future.wait_for(foo) == std::future_status::ready as a loop predicate might never
terminate if the task is deferred, because wait_for will always return
future_status::deferred.
To check whether a future has a deferred task, you can check the result of wait_for(0s), and
then take different actions depending on whether the task is running asynchronously or not.
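A sketch of that check (fut from a default-policy std::async call; assumes using namespace std::chrono_literals):

if (fut.wait_for(0s) == std::future_status::deferred) {
    // the task is deferred: calling get/wait runs it synchronously
    fut.get();
} else {
    // the task really is running concurrently, so a timeout-based loop is safe
    while (fut.wait_for(100ms) != std::future_status::ready) {
        // ... do other work ...
    }
    fut.get();
}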
If any of those conditions cannot be met, you should explicitly request the async launch policy.
You can fairly trivially write a reallyAsync wrapper for std::async:
template <typename... Ts>
auto reallyAsync (Ts&&... ts) {
return std::async (std::launch::async, std::forward<Ts> (ts)...);
}
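Usage is then identical to std::async, minus the policy argument:

auto fut = reallyAsync (doWork); // runs asynchronously, or throws std::system_error if a thread can't be started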
In summary:
- Understand the default launch policy of std::async and the potential failure cases (thread_local storage access, timeout-based looping, failure to run the task altogether).
- Specify the async launch policy when necessary (or use a wrapper function).

Make std::threads unjoinable on all paths

thread objects can be in either ‘joinable’ or ‘unjoinable’ state.
A ‘joinable’ thread is a thread that is (or is waiting to be) running.
An ‘unjoinable’ thread might be:
- a default-constructed std::thread (these don't represent a thread of execution)
- a moved-from std::thread (ownership of the thread of execution is transferred away)
- a std::thread that has been joined (the thread of execution has finished running)
- a std::thread that has been detached (the thread of execution is no longer owned by any thread object)

If the destructor of a thread in a joinable state is invoked, the program will be terminated. This sounds bad, but the alternatives are worse:
- An implicit join would block until the underlying thread of execution completes, which can cause surprising, hard-to-debug performance problems.
- An implicit detach would leave the underlying thread running, possibly using references to local variables that have since been destroyed (undefined behaviour).

A sudden program termination should be easier to debug than either of the other two cases.
To guarantee that a thread is made unjoinable on every path, you can write a straightforward RAII
wrapper for thread that joins/detaches in its destructor. Be careful using this though, for the
reasons mentioned earlier.
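A minimal sketch of such a wrapper (ThreadGuard and DtorAction are illustrative names):

class ThreadGuard {
public:
    enum class DtorAction { join, detach };

    ThreadGuard (std::thread&& thr, DtorAction a)
        : action { a }, t { std::move (thr) } {}

    ~ThreadGuard() {
        if (t.joinable()) {
            if (action == DtorAction::join)
                t.join();
            else
                t.detach();
        }
    }

    ThreadGuard (ThreadGuard&&) = default;
    ThreadGuard& operator= (ThreadGuard&&) = default;

    std::thread& get() { return t; }

private:
    DtorAction action;
    std::thread t; // declared last, so it's initialised after action
};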
There’s no direct support in the C++ stdlib for ‘interruptible threads’, but they can be implemented by hand using the provided primitives.
In summary:
- Make std::thread objects unjoinable on all paths, ideally using RAII, but be aware that join-on-destruction can block and cause performance anomalies, while detach-on-destruction risks undefined behaviour.

Be aware of varying thread handle destructor behaviour

The result of a std::async call cannot be stored in an object associated with the callee, as the
callee might finish before get is called on the future. In this scenario, objects local to the
callee would be destroyed before they could be read.
Similarly, the result cannot be stored directly in the associated future, because this object might
be moved or converted to a shared_future, so there wouldn’t be a constant target address to
which to write the result.
Instead, there’s typically a shared object stored on the heap known as the ‘Shared State’ which
is referenced by both the promise that sends the result and the future which receives it.
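A sketch of the arrangement:

std::promise<int> p;
auto fut = p.get_future(); // p and fut now refer to the same heap-allocated shared state

std::thread t { [&p] { p.set_value (42); } }; // the result is written into the shared state
auto result = fut.get(); // ... and read back out of it
t.join();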
The behaviour of a future's destructor is determined by this shared state:
- The destructor of the last future referring to a shared state for a non-deferred task launched via std::async blocks until the task completes. This is akin to joining the associated thread.
- In all other cases (including futures not created via std::async) the destructor simply releases its reference; this is akin to detaching the associated thread. For deferred tasks whose last future is destroyed, the task will never run.

That is, futures for non-deferred tasks launched by std::async are a special case with different behaviour. This is important if creating futures using other mechanisms, like promise or packaged_task.
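For example, a future obtained from a packaged_task doesn't block on destruction:

std::packaged_task<int()> pt { [] { return 42; } };
auto fut = pt.get_future(); // fut refers to pt's shared state
std::thread t { std::move (pt) };
// ...
// fut's destructor won't block here: the shared state didn't come from
// std::async, so destroying fut is akin to detaching. The join/detach
// decision belongs to t, as discussed in the previous section.
t.join();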
In summary:
- If you're storing std::future objects, try to ensure that they are created in a consistent way, so that their destructors don't block and cause unexpected performance issues.
- The destructor of the last future for a non-deferred task launched via std::async blocks until the task completes.

Consider void futures for one-shot event communication

There are a few ways to signal events between threads. One option might be to use a
std::condition_variable:
// shared state
std::condition_variable cv;
std::mutex m;
// event source
cv.notify_one();
// event receiver
{
std::unique_lock lk { m }; // unique_lock supports CTAD
cv.wait (lk);
}
This approach has some issues:
- If the source notifies the condvar before the receiver has called cv.wait, the notification is lost, and the receiving task could hang indefinitely.
- The wait doesn't account for spurious wakeups: the receiver may wake even though no notification was sent.

Another option is to use a boolean flag, set it from the signalling thread, and poll it from the waiting thread, but the polling is very wasteful.
The two approaches can be combined (i.e. set a bool flag and then notify the condvar, check the bool flag after waking) but that’s a lot of boilerplatey code that’s awkward to reuse cleanly.
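A sketch of the combined approach:

// shared state
std::condition_variable cv;
std::mutex m;
bool flag = false; // guarded by m

// event source
{
    std::lock_guard lk { m };
    flag = true;
}
cv.notify_one();

// event receiver
{
    std::unique_lock lk { m };
    cv.wait (lk, [&] { return flag; }); // the predicate handles both lost notifications and spurious wakeups
}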
A much cleaner approach is to use a std::promise<void>:
// shared state
std::promise<void> p;
// event source
p.set_value();
// event receiver
p.get_future().wait();
This approach is more concise, will work even if the set_value is called before wait, and is
immune to spurious wakeups. However, it does require a heap allocation for the shared state, and the
system can only be used to signal a single event.
When combined with shared futures, this pattern can be used to send notifications to multiple threads at once:
std::promise<void> p;
auto sf = p.get_future().share();
std::vector<std::thread> vt;
for (auto i = 0; i != 2; ++i)
vt.emplace_back ([sf] { sf.wait(); });
p.set_value(); // all threads waiting on the shared future will be notified
for (auto& t : vt)
t.join();
Use std::atomic for concurrency, volatile for special memory

atomic and volatile have different meanings. Operations on atomic<T> objects are guaranteed to be seen as atomic from other threads. volatile does not give this guarantee: multiple simultaneous readers/writers of volatile storage still constitute a data race.
Using atomics (with default settings) also ensures sequential consistency, whereas volatile
does not, so the optimiser might do surprising things with code that just uses volatile.
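For example, if two threads each increment a shared counter once:

std::atomic<int> ac { 0 };
volatile int vc = 0;

// executed by each of two threads:
++ac; // an atomic read-modify-write: ac is guaranteed to end up as 2
++vc; // a separate read, increment, and write: a data race, so vc
      // might end up as 1 (and the behaviour is undefined)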
So what is volatile actually good for? In normal code, the optimiser is allowed to remove
redundant loads and dead stores (series of non-interleaved reads or writes to an address) because
such code wouldn’t have any useful side-effects. However, for some types of special memory (e.g.
memory-mapped IO), addresses might be read/written by peripherals, so these chains of reads/writes
would no longer be redundant: reading the same address twice in a row might yield different
results, and setting the same address twice in a row might adjust settings on some peripheral.
volatile just tells the compiler not to remove ‘redundant’ loads and stores.
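A sketch (the register address is purely illustrative):

volatile std::uint32_t* statusReg = reinterpret_cast<volatile std::uint32_t*> (0x40000000);

auto a = *statusReg; // both reads are kept: the peripheral may have
auto b = *statusReg; // changed the value in between

*statusReg = 1; // both writes are kept: each write may have
*statusReg = 0; // a distinct effect on the peripheral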
In summary:
- std::atomic is for data accessed concurrently by multiple threads.
- volatile is for special memory, where reads/writes must not be optimised away.