Hi
In the following code, running the task5 routine from two threads simultaneously on a multiprocessor machine takes approximately twice as long as running it from a single thread. The atomic loads/stores seem to influence each other (at least on x86/64), even though there is no explicit synchronization.
    #include <atomic>
    #include <cstddef>
    #include <iostream>
    #include <thread>

    std::atomic<size_t> sdata;

    #pragma optimize( "", off )
    void task5()
    {
        size_t acc = 0; // initialized; reading an uninitialized value would be undefined behavior
        for (int i = 0; i < 1000000000; ++i)
        {
            sdata.store(i, std::memory_order_relaxed);
            acc += sdata.load(std::memory_order_relaxed);
        }
        std::cout << acc << std::endl;
    }
    #pragma optimize( "", on )

    int main()
    {
        StopWatch sw;
        sw.Start();
        std::thread trd1 = std::thread(task5);
        std::thread trd2 = std::thread(task5); // comment out to test with a single thread...
        trd1.join();
        trd2.join(); // ... this one as well
        sw.Stop();
        std::cout << "DONE " << sw.Get_ElapsedMilliseconds() << std::endl;
        return 0;
    }
Why are these threads blocking each other?