| Memory barrier note |
This message was discovered on microsoft.public.dotnet.languages.csharp.
| William Stacey [MVP] (VIP) |
Here is an interesting write-up on memory barriers. Not sure if this helps my understanding or raises more questions, but interesting...
http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
Hmm... I've now read it through, and while it mostly confirms what I've understood before, I'm still not convinced I can see a problem in the first singleton implementation, because the variable is volatile.
A volatile write (like a lock release) ensures that no memory write can move after it. Thus the write to "val" shouldn't be able to move to after the write to "singleton".
I've already bothered Vance on more than one occasion, so I'm reluctant to do so again - anyone else have any insight into this?
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| Jon Skeet [C# MVP] (VIP) |
Jon Skeet [C# MVP] wrote: [Original message clipped]
Aha - I've now read the rest of the thread, and indeed I was right (as were others reading the article). Vance clears it up in another post:
<quote> Arrgg!
The original posting has a most unfortunate typo in it. The first example should have NO volatile variables in it. (This is what I get for cutting and pasting too much). The example with the volatile 'singleton' variable was ANOTHER way of fixing the memory issue. I toyed with explaining this fix, but decided against it because it is not as good as the MemoryBarrier() fix. This is because it forces the JIT to do a read memory barrier in many places where it is not necessary (like the critical path that does not take locks).
Thus while this solves the problem, I do not recommend it as a solution. </quote>
Hooray - I can sleep easy once more :)
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| William Stacey [MVP] (VIP) |
line 1) System.Threading.Thread.WriteMemoryBarrier();
line 2) singleton = newObj;
So the above ensures 110%, without question, that the "if (singleton == null)" test will never find singleton to be a partially completed assignment, and ensures some kind of read barrier so that I read the "singleton" ref correctly?
Put another way, is it possible to have a thread switch before line 2 has fully assigned newObj to singleton, which may cause thread1 to see singleton is not null, but ref is not right either, so error?
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
Yes - but so would just
singleton = new Singleton();
when singleton is declared to be volatile.
[Original message clipped]
No. Reference assignments are always atomic, I believe. If they weren't that would be a serious security problem, IMO.
(I've only seen it specified when the memory is properly aligned, but as I say, anything else would be a huge security problem.)
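To make the volatile variant concrete, here is a minimal sketch of double-checked locking that relies only on a volatile field, as Jon describes. The class name VolatileSingleton and the Val field are hypothetical, chosen to echo the "val" and "singleton" names discussed above; this is an illustration, not the code from Vance's post.

```csharp
using System;

// Hypothetical illustration: double-checked locking with a volatile field.
public sealed class VolatileSingleton
{
    // volatile: the write below has release semantics and the reads have
    // acquire semantics, so the write to Val cannot appear to move after
    // the write that publishes the reference.
    private static volatile VolatileSingleton instance;
    private static readonly object sync = new object();

    public readonly int Val;

    private VolatileSingleton()
    {
        Val = 42; // state that must be visible before the reference is
    }

    public static VolatileSingleton Instance
    {
        get
        {
            if (instance == null)            // first check, no lock taken
            {
                lock (sync)
                {
                    if (instance == null)    // second check, under the lock
                    {
                        instance = new VolatileSingleton();
                    }
                }
            }
            return instance;
        }
    }
}
```

As Vance notes in the quote above, the cost of this approach is that every read of the field is a volatile read, including on the fast path that never takes the lock.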
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| William Stacey [MVP] (VIP) |
If I read the following link (IBM) right, it says Vance's code is a classic example of a "fix" that is broken. He needs a read barrier to do it that way, or just use the darn lock and do it right for all cases on all machines (it is really not that expensive) :) Cheers. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
BTW. I would guess you have these, but here are some other links on this topic for others.
Brad Abrams blog http://blogs.msdn.com/brada/archive/2004/05/12/130935.aspx
Vance Morrison (CLR team) http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375
Exploring the Singleton Design Pattern http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp
Scott Allen's blog http://odetocode.com/Blogs/scott/archive/2004/05/13/242.aspx
Chris Brumme http://weblogs.asp.net/cbrumme/archive/2003/05/17/51445.aspx http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/480d3a6d-1aa8-4694-96db-c69f01d7ff2b
The "Double-Checked Locking is Broken" Declaration http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
Various http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3
http://www.javaworld.com/javaworld/jw-02-2001/jw-0209-toolbox.html
==================================
Alexei Zakharov writes:
# re: volatile and MemoryBarrier()...
I think the implementation without using volatile is missing one memory barrier. According to http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3 memory barriers are required for both read and write code paths. The read path extracted from the code is:
    if ( Singleton.value == null ) // false
    {
        // not executed
    }
    return Singleton.value;
There is no memory barrier on this path. In the CLR memory model as described in Chris Brumme's blog (http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx), only volatile loads are considered "acquire", but normal loads can be reordered.
The correct implementation will be:
public sealed class Singleton
{
    private Singleton() {}
    private static Singleton value;
    private static object sync = new object();

    public static Singleton Value
    {
        get
        {
            Singleton temp = Singleton.value;
            System.Threading.Thread.MemoryBarrier(); // this is important

            if ( temp == null )
            {
                lock ( sync )
                {
                    if ( Singleton.value == null )
                    {
                        temp = new Singleton();
                        System.Threading.Thread.MemoryBarrier();
                        Singleton.value = temp;
                    }
                }
            }

            return Singleton.value;
        }
    }
}
Let me expand on the performance of the two implementations of the double checked locking pattern. Obviously we want to make the read path faster and don't care about the write path because the write path is taken only once. The read path extracted from the code is:
// using volatile (Singleton.value is volatile)
get
{
    if ( Singleton.value == null )
    {
        // ... not taken
    }
    return Singleton.value;
}

// using memory barriers
get
{
    Singleton temp = Singleton.value;
    System.Threading.Thread.MemoryBarrier();
    if ( temp == null )
    {
        // ... not taken
    }
    return Singleton.value;
}

The volatile load in the first code has acquire semantics and is equivalent to the non-volatile load plus the memory barrier in the second code. There are two volatile loads in the first code and only one memory barrier in the second. So I expect the code with memory barriers to perform faster than the code that uses volatiles. But, as with any performance speculation, it has to be taken with a grain of salt. I haven't done any measurements here.
==================================
-- William Stacey, MVP
| Scott Allen |
One more link to add to the list:
Raymond Chen: High-performance multithreading is very hard http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx
-- Scott
On Thu, 3 Jun 2004 18:34:01 -0400, "William Stacey [MVP]" wrote:
[Original message clipped]
-- Scott http://www.OdeToCode.com
| William Stacey [MVP] (VIP) |
Thanks. And another that was interesting. He also concludes that taking the lock before the test is the way to go. http://www.nwcpp.org/Downloads/2004/DCLP_notes.pdf
"Back Where We Started"...
public class Keyboard
{
    // syncRoot was not declared in the original snippet; added so the code compiles.
    private static readonly object syncRoot = new object();
    private static Keyboard pInstance = null;

    internal Keyboard() { }

    public static Keyboard GetInstance()
    {
        lock (syncRoot) // read and write barrier
        {
            if (pInstance == null)
                pInstance = new Keyboard();
        }
        return pInstance;
    }
}
Note: Access to shared data is now inside a critical section. Conclusion: There is no portable way to implement DCLP in C++. This may apply to C#, not sure.
My Comments: On my slow box, 10 million locks and releases take 375ms. So is *not* locking really worth the fuss, or the danger of not doing something right using some other fancy method? I can't see it.
This raises a question. What about collections and syncing on an object (like syncRoot)? You protect concurrent access to the internal array (for example, or queue), but you don't have all the shared data inside a critical section that I can see.
1) lock(syncRoot)
2) array.Add(new object())
3) unlock
4) lock(syncRoot)
5) return array[0];
6) unlock
We don't lock the array, and we don't lock element 0 at all. Element 0 has not been protected by any barrier that I can see, so is this subject to issues also?? Cheers!
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
<snip>
Interesting. I'll have to read through it carefully to work out whether or not to include it in my list - it's not .NET-specific, so could be misleading - but may well have good general points to make.
[Original message clipped]
I'm not at all surprised that there's no portable way to do it in unmanaged C++ - basically you've got a potentially different memory model on every platform.
[Original message clipped]
How many processors does your box have though? Memory barriers and locks are *vastly* more expensive on multi-processor machines, I believe. I still advocate simplicity unless you're absolutely *sure* about the performance cost, but I'm just pointing out that there may well be more of a performance cost than the figures you quote imply.
[Original message clipped]
Element 0 has been protected by exactly the same memory barrier as everything else. There aren't different kinds of memory barrier - it's not like a memory barrier only applies to the object you lock on. It applies to *everything*.
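A small sketch of the collection case William asks about may help here. It assumes a hypothetical LockVisibilityDemo class with a list guarded by syncRoot; the point it tries to show is that the contents of the element, written before the lock was taken, are visible to any thread that later acquires the same lock, even though the element itself was never locked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration: a lock orders all prior writes, not just writes
// to the object being locked on.
public static class LockVisibilityDemo
{
    private static readonly object syncRoot = new object();
    private static readonly List<int[]> items = new List<int[]>();

    public static void Add(int[] item)
    {
        lock (syncRoot) { items.Add(item); }
    }

    public static int[] First()
    {
        lock (syncRoot) { return items.Count > 0 ? items[0] : null; }
    }

    public static int Run()
    {
        // The array elements are written before Add takes the lock.
        var writer = new Thread(() => Add(new[] { 1, 2, 3 }));
        writer.Start();

        int[] first;
        while ((first = First()) == null)
        {
            Thread.Yield(); // wait until the writer has published element 0
        }
        writer.Join();

        // The reader acquired the same lock after the writer released it,
        // so it sees the fully written array.
        return first[0] + first[1] + first[2];
    }
}
```

This only holds when both sides follow the protocol and take the same lock; a reader that skipped the lock would get no such guarantee.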
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| William Stacey [MVP] (VIP) |
[Original message clipped]
You 100% on that? I thought I read that it had to do with individual memory locations, not sure. I assume you have a paper on this question, and would really appreciate the link. Cheers!
-- William Stacey, MVP
| Scott Allen |
I wouldn't say that the memory barrier protects anything, per se. Its purpose is to place limitations on memory re-ordering.
Tell me what you think of this example:
class A
{
    int x = 1;
    int y = 2;
    int z = 3;

    public void Foo()
    {
        x = 0;
        y = z;
    }
}
The sequence for Foo could be:
    Write 0 to x
    Read z (3)
    Assign 3 to y
But the compiler or processor may decide it is better to do:
    Read z (3)
    Write 0 to x
    Write 3 to y
Now if we wrote Foo as:
public void Foo()
{
    x = 0;
    System.Threading.Thread.MemoryBarrier();
    y = z;
}
Now we are guaranteeing the write to x occurs (and is visible) before reading z.
Of course in this example nothing exciting happens if the re-ordering occurs, but I could imagine another contrived example where we add another method to class A which writes to z, then start two threads to work on the same instance. In this case the memory barrier could change the outcome of the value stored to y.
-- Scott
On Fri, 4 Jun 2004 19:11:22 -0400, "William Stacey [MVP]" wrote:
[Original message clipped]
-- Scott http://www.OdeToCode.com
| William Stacey [MVP] (VIP) |
Not sure all this ordering stuff is the key (however is also important.) The memory barrier (AFAICT) flushes the cache on all processors. This makes sure that when you set a var on one processor, another processor will read that same memory location and not a cached version of that location that could be pointing to something else or null. The lock ensures the ordering and sync; the barrier ensures the cache does not become an issue. Please correct me if I'm in error. Cheers!
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote:
> Not sure all this ordering stuff is the key (however is also important.)
Ordering is *precisely* the point of memory barriers.
[Original message clipped]
Although that may be the physical effect, the effect in terms of the memory model is specified by ordering. It's not just how the compiler might reorder things - it's the order in which the operations *appear* to have taken place due to caching.
[Original message clipped]
Well, unfortunately it's not very clearly defined. The CLR has a very definite idea of two different types of barrier - a volatile read which has acquire semantics, and a volatile write which has release semantics. Each only affects things in one direction, however. The memory barrier you've described is a sort of bidirectional barrier, so that memory accesses can't move either side of it. It would help if the docs for MemoryBarrier defined it in CLR terms...
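The two one-directional barriers Jon describes can be sketched as a simple publish/consume pair. This uses System.Threading.Volatile.Read and Volatile.Write, which are assumed to be available (they were added in later .NET versions than the one this thread discusses); the class and field names are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of release (volatile write) and acquire
// (volatile read) semantics.
public static class AcquireReleaseDemo
{
    private static int payload; // ordinary, non-volatile data
    private static int ready;   // flag used to publish the payload

    public static int Run()
    {
        payload = 0;
        ready = 0;
        int observed = -1;

        var reader = new Thread(() =>
        {
            // Acquire: reads after this point cannot move before it.
            while (Volatile.Read(ref ready) == 0)
            {
                Thread.Yield();
            }
            observed = payload;
        });
        reader.Start();

        payload = 42;
        // Release: writes before this point cannot move after it.
        Volatile.Write(ref ready, 1);

        reader.Join();
        return observed;
    }
}
```

Each half constrains movement in one direction only, which is why the pair is needed: the release on the writer side and the acquire on the reader side together ensure the reader sees the payload once it sees the flag. Thread.MemoryBarrier(), by contrast, is the bidirectional form discussed above.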
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| William Stacey [MVP] (VIP) |
I think the same CPU instruction is issued to flush the cache (not one for reads and one for writes). If there are two, please advise or provide a link for detail. Cheers.
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
It may well be a single CPU instruction for x86, which has a fairly strong memory model. That's no guarantee about what will happen elsewhere though.
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| William Stacey [MVP] (VIP) |
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to get more eyes.
Does this spin version work? Why or why not? Cheers!
using System.Threading;

public sealed class Singleton
{
    private static int spinLock = 0; // lock not owned.
    private static Singleton value = null;
    private Singleton() {}

    public static Singleton Value()
    {
        // Get spin lock.
        while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
            Thread.Sleep(0);

        // Do we have any mbarrier issues?
        if ( value == null )
            value = new Singleton();

        // Release spin lock.
        Interlocked.Exchange(ref spinLock, 0);
        return value;
    }
}
This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc. TIA -- William
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
I *suspect* it will work if Interlocked.Exchange performs a full bidirectional memory barrier (which it sounds like it does).
I suspect it performs no better than using a lock every time, but I guess that wasn't what you were interested in :)
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too
| Scott Allen |
On IA-32 architectures I'm pretty sure this would be using cmpxchg8b with a lock prefix for MP machines. This instruction provides an atomic read/compare/store operation and acts as a full memory barrier. A lock(syncRoot) would boil down to the same instruction.
-- Scott
On Thu, 10 Jun 2004 17:19:26 +0100, Jon Skeet [C# MVP] wrote:
[Original message clipped]
-- Scott http://www.OdeToCode.com
| William Stacey [MVP] (VIP) |
Thanks Scott. Glad I posted this. You have any paper you write on this?
-- William Stacey, MVP
| Scott Allen |
No, I'm afraid not, but I'm sure you can find some if you dig around. There has to be someone left still slinging code in assembly - I gave it up about 7 years ago :)
One reason I remember the cmpxchg8b instruction so well is because it was the instruction involved in the dreaded Pentium F00F bug - you could lock up the CPU from user mode code:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&q=cmpxchg8b+bug
-- Scott
On Thu, 10 Jun 2004 16:41:17 -0400, "William Stacey [MVP]" wrote:
>Thanks Scott. Glad I posted this. You have any paper you write on this?
-- Scott http://www.OdeToCode.com
| William Stacey [MVP] (VIP) |
Also, I take it (assuming my singleton example) that I would also not have any issue with instance vars inside the singleton that were created during construction? Say, a ref var that was another object. This interlocked "fence" should protect everything between the fence start and fence end (assuming no other lazy init is going on inside the first class)?
-- William Stacey, MVP
| William Stacey [MVP] (VIP) |
[Original message clipped]
Thanks Jon. That is what I hoped was going on. Otherwise I would be more confused.
[Original message clipped]
Other than the fact that this is non-blocking after the first creation of the singleton, and the compare-exchange is faster than taking out a lock before every test. Not that I would normally do this, but it helps in understanding some different threading problems. Cheers!
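For comparison, here is a sketch of a variant that really is lock-free on the read path, using Interlocked.CompareExchange to publish the instance. This is not the spin-lock code posted above; the class name CasSingleton is hypothetical, and the sketch assumes it is acceptable for the constructor to run more than once under contention, with the extra instances discarded.

```csharp
using System;
using System.Threading;

// Hypothetical illustration: publish a lazily created instance with a single
// compare-and-swap instead of a lock or a spin lock.
public sealed class CasSingleton
{
    private static CasSingleton instance;

    private CasSingleton() {}

    public static CasSingleton Instance
    {
        get
        {
            // Acquire read: if the reference is visible, so is the object's state.
            CasSingleton current = Volatile.Read(ref instance);
            if (current != null)
            {
                return current;
            }

            // Several threads may each build a candidate; only one wins.
            CasSingleton candidate = new CasSingleton();
            CasSingleton prior = Interlocked.CompareExchange(ref instance, candidate, null);

            // prior is null when this thread's candidate was installed.
            return prior ?? candidate;
        }
    }
}
```

The trade-off is that construction is not guaranteed to happen exactly once, so this only suits objects that are cheap and side-effect free to create.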
-- William Stacey, MVP
| Scott Allen |
Hi William:
As Jon points out, it really is all about ordering. The lock can only ensure a consistent view of the memory if everyone follows the protocol: acquire lock, work with shared memory, release lock.
The problem with double-checked locking is that only one thread will ever follow the protocol; everyone else cheats and tries to look at shared memory without acquiring the same lock. Because of this we have to strictly control the ordering of the memory operations inside of the lock. Other threads will be peeking at our work while we still have a work in progress.
We can use a memory barrier to force a strong order - all memory writes will be seen by an external observer to happen in the same order as we programmed them. That's really what it's all about.
-- Scott
On Wed, 9 Jun 2004 20:18:27 -0400, "William Stacey [MVP]" wrote:
[Original message clipped]
| William Stacey [MVP] (VIP) |
Thanks Scott. That helps.
-- William Stacey, MVP
| Jon Skeet [C# MVP] (VIP) |
William Stacey [MVP] wrote: [Original message clipped]
Do you mean the code with the explicit write barrier but no read barrier? I see what you're saying, but I'm not sure either way. The thing is, the reading thread can't have read (and therefore cached) any information about the object before it gets the reference - so I *think* it's okay so long as all the *writes* are performed in a way that ensures that all the information is available as soon as the reference itself becomes available.
I certainly favour the "use a lock" approach when using static initialisers doesn't quite have the desired semantics (for whatever reason).
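For reference, the static-initialiser approach Jon mentions can be sketched as follows. The class name is hypothetical; the sketch relies on the CLR running a type's initialiser at most once, in a thread-safe way, before the type's static members are first used.

```csharp
using System;

// Hypothetical illustration: singleton via a static initialiser, with no
// explicit locks, volatile fields, or memory barriers in user code.
public sealed class StaticInitSingleton
{
    private static readonly StaticInitSingleton instance = new StaticInitSingleton();

    // An explicit static constructor stops the type being marked
    // beforefieldinit, so initialisation is not run earlier than first access.
    static StaticInitSingleton() {}

    private StaticInitSingleton() {}

    public static StaticInitSingleton Instance
    {
        get { return instance; }
    }
}
```

The laziness here is per type rather than per member: touching any static member of the class triggers initialisation, which is the "doesn't quite have the desired semantics" case where a lock is the simpler fallback.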
[Original message clipped]
I certainly don't have *all* of them - I'll include various ones in the article.
<snip>
Cheers!
-- Jon Skeet - http://www.pobox.com/~skeet If replying to the group, please do not mail me too