Notes on C++ atomic operations

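Atomics do not make mutexes unnecessary

A mutex works at a coarser granularity than an atomic, but it is still a valid means of mutual exclusion. I think that is the right way to see it.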

Ordered from finest to coarsest granularity:

atomic_thread_fence, atomic_flag
atomic
mutex, condition_variable
lock_guard, unique_lock, once_flag, call_once
future, promise
packaged_task
async

At least, that is my rough ordering.

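Memory locations

Without a suitable memory model, a data race can occur even when different processors update different variables, if those variables share the bit width that the processor loads and stores in one operation (one word). The store to one variable may be carried out as a read-modify-write of the whole word, which can overwrite a neighbouring variable.

For example, on a processor with 4-byte words, define char arr[4]; and let two processors write to arr[0] and arr[3] at the same time. Without a suitable memory model, that is a data race.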

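#include <iostream>
#include <thread>
int main() {
    static char arr[4] = {0};
    std::thread t1{[&]{ arr[0] = 1; }};  // writes only arr[0]
    std::thread t2{[&]{ arr[3] = 1; }};  // writes only arr[3]
    t1.join(); t2.join();
    // Cast to int: streaming a char with value 1 would print a control character.
    std::cout << "arr = {" << int(arr[0]) << "," << int(arr[3]) << "}\n";
}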

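Without a suitable memory model the output could be arr = {0,1}, arr = {1,0}, or arr = {1,1}.

(I tried it and only ever got {1,1}. That is what the C++11 memory model guarantees: arr[0] and arr[3] are separate memory locations, so the two writes cannot interfere.)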

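What is a memory location:

The storage occupied by a single object of scalar type (an arithmetic type, a pointer, and so on), or by a maximal run of adjacent bit-fields that all have non-zero width. I had guessed that such a run must be narrower than one word, but the definition does not mention the word size. A zero-width bit-field can be used to separate memory locations.

Different threads can read and write different memory locations without interfering with each other.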

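Data race

A data race occurs when multiple threads access the same memory location concurrently, at least one of the accesses is a write, at least one is not atomic, and nothing orders one access before the other. In C++ a data race is undefined behaviour. The classic case is an unsynchronized read-modify-write, such as incrementing a shared counter.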

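Memory order

By default, operations in the atomic operations library are sequentially consistent, as far as I understand: neither the compiler nor the processor may reorder them in a way that another thread could observe, so they execute (or appear to execute) in the order written in the code.

This is because the default memory order is std::memory_order_seq_cst (sequentially consistent).

Other memory_order values can be specified for speed. I believe that is the right understanding.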

Below are excerpts from http://en.cppreference.com/w/cpp/atomic/memory_order, each followed by my loose paraphrase.

std::memory_order specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation. Absent any constraints on a multi-core system, when multiple threads simultaneously read and write to several variables, one thread can observe the values change in an order different from the order another thread wrote them. Indeed, the apparent order of changes can even differ among multiple reader threads. Some similar effects can occur even on uniprocessor systems due to compiler transformations allowed by the memory model.

The default behavior of all atomic operations in the library provides for sequentially consistent ordering (see discussion below). That default can hurt performance, but the library's atomic operations can be given an additional std::memory_order argument to specify the exact constraints, beyond atomicity, that the compiler and processor must enforce for that operation.

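std::memory_order specifies how ordinary, non-atomic memory accesses may (or may not) be reordered around an atomic operation.

On a multi-core system with no ordering constraints, when several threads read and write several variables at once, the order in which one thread sees the values change does not necessarily match the order in which another thread wrote them.

In fact, the apparent order of changes can even differ from one reader thread to another.

Similar effects can occur even on a uniprocessor system, because the memory model allows the compiler to transform the code.

By default, every atomic operation in the library is sequentially consistent.

That default can cost performance. However, the library's atomic operations accept an additional std::memory_order argument that states exactly which constraints, beyond atomicity, apply to the operation.

The compiler and the processor are responsible for enforcing those constraints, in addition to making the operation atomic.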

memory_order_relaxed:

Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation.

In short: a relaxed operation synchronizes with nothing and orders nothing. It is only guaranteed to be atomic.

memory_order_consume:

A load operation with this memory order performs a consume operation on the affected memory location: no reads in the current thread dependent on the value currently loaded can be reordered before this load. This ensures that writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only.

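A load with this memory order performs a consume operation on the affected memory location:

No read in the current thread that depends on the loaded value may be reordered before this load. (This is a restriction on reordering. It is not, as I first read it, an instruction to pull the latest values from RAM or cache into the thread.)

This guarantees that writes to data-dependent variables, made by other threads that released the same atomic variable, are visible in the current thread.

On most platforms this affects compiler optimizations only.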

memory_order_acquire:

A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load. This ensures that all writes in other threads that release the same atomic variable are visible in the current thread.

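A load with this memory order performs an acquire operation on the affected memory location:

No memory access in the current thread may be reordered before this load. (Writes too? It says "no memory accesses", so yes.)
This guarantees that all writes made by other threads that released the same atomic variable are visible in the current thread.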

memory_order_release:

A store operation with this memory order performs the release operation: no memory accesses in the current thread can be reordered after this store. This ensures that all writes in the current thread are visible in other threads that acquire the same atomic variable and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic.

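A store with this memory order performs a release operation:

No memory access in the current thread may be reordered after this store.

This guarantees that all writes made by the current thread are visible in other threads that acquire the same atomic variable. It also guarantees that writes carrying a dependency into the atomic variable are visible in other threads that consume the same atomic variable.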

memory_order_acq_rel:

A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory accesses in the current thread can be reordered before this load, and no memory accesses in the current thread can be reordered after this store. It is ensured that all writes in another threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.

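A read-modify-write operation with this memory order is both an acquire operation and a release operation.

No memory access in the current thread may be reordered before the load part, and none may be reordered after the store part.

This guarantees that all writes made by other threads that released the same atomic variable are visible before the modification, and that the modification is visible in other threads that acquire the same atomic variable.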

memory_order_seq_cst:

Same as memory_order_acq_rel, plus a single total order exists in which all threads observe all modifications (see below) in the same order.

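Same as memory_order_acq_rel, plus a single total order exists: every thread sees all modifications made with this memory order in the same order.
(For a plain load this means acquire, and for a plain store it means release, since acquire and release together only apply to read-modify-write operations.)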