Hi
In the following code, running the task5 routine from two threads simultaneously on a multiprocessor machine takes approximately twice as long as running it from a single thread. The atomic loads and stores seem to interfere with each other (at least on x86-64), even though there is no explicit synchronization.
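#include <atomic>
#include <cstddef>
#include <iostream>
#include <thread>

// Single atomic shared by every thread that runs task5.
std::atomic<size_t> sdata;

// MSVC-specific: disable optimizations so the loop is not optimized away.
#pragma optimize( "", off )
void task5()
{
    size_t acc = 0; // initialized; accumulating into an uninitialized value is undefined behavior
    for (int i = 0; i < 1000000000; ++i)
    {
        sdata.store(i, std::memory_order_relaxed);
        acc += sdata.load(std::memory_order_relaxed);
    }
    std::cout << acc << std::endl;
}
#pragma optimize( "", on )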
int main()
{
    StopWatch sw;
    sw.Start();
    std::thread trd1 = std::thread(task5);
    std::thread trd2 = std::thread(task5); // comment out to test with a single thread...
    trd1.join();
    trd2.join(); // ... this one as well
    sw.Stop();
    std::cout << "DONE " << sw.Get_ElapsedMilliseconds() << std::endl;
    return 0;
}
Why are these threads blocking each other?
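For anyone who wants to reproduce this: StopWatch is a small timing helper that is not shown above. The following is only a minimal stand-in, assuming it does nothing more than wrap std::chrono::steady_clock. The member names match the calls in main, but the implementation itself is a hypothetical sketch, not the original class.

```cpp
#include <chrono>

// Hypothetical stand-in for the StopWatch helper used in main().
// Assumption: it only records two time points and reports their difference.
class StopWatch
{
private:
    using Clock = std::chrono::steady_clock; // monotonic, suited to measuring intervals
    Clock::time_point start_{};
    Clock::time_point stop_{};

public:
    // Record the starting time point.
    void Start() { start_ = Clock::now(); }

    // Record the ending time point.
    void Stop() { stop_ = Clock::now(); }

    // Whole milliseconds elapsed between Start() and Stop().
    long long Get_ElapsedMilliseconds() const
    {
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop_ - start_).count();
    }
};
```

Placing this definition above main should be enough to make the example compile. steady_clock is used rather than system_clock so that the measurement is not affected by wall-clock adjustments.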