Multimobile Development: Building Applications for any Smartphone
Memory barrier note
This message was discovered on microsoft.public.dotnet.languages.csharp.
Responses highlighted in red are from people who are likely to be able to contribute good, authoritative information to this discussion. They include Microsoft employees, MVPs and others who IMHO contribute well to these kinds of discussions.

William Stacey [MVP] (VIP)
Here is an interesting write-up on memory barriers. Not sure if this helps
my understanding or raises more questions, but interesting...

http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375

--
William Stacey, MVP

 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

Thanks for the link - one to add to the (currently empty) list of
resources at the end of my own threading article. It's *nearly*
finished now (ish!)

http://www.pobox.com/~skeet/csharp/multithreading.html

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

Hmm... I've now read it through, and while it mostly confirms what I've
understood before, I'm still not convinced I can see a problem in the
first singleton implementation, because the variable is volatile.

A volatile write (like a lock release) ensures that no memory write can
move after it. Thus the write to "val" shouldn't be able to move to
after the write to "singleton".

I've already bothered Vance on more than one occasion, so I'm reluctant
to do so again - anyone else have any insight into this?

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
Jon Skeet [C# MVP] (VIP)
Jon Skeet [C# MVP] wrote:
[Original message clipped]

Aha - I've now read the rest of the thread, and indeed I was right (as
were others reading the article). Vance clears it up in another post:

<quote>
Arrgg!

The original posting has a most unfortunate typo in it. The first
example should have NO volatile variables in it. (This is what I get
for cutting and pasting too much). The example with the volatile
'singleton' variable was ANOTHER way of fixing the memory issue. I
toyed with explaining this fix, but decided against it because it is
not as good as the MemoryBarrier() fix. This is because it forces
the JIT to do a read memory barrier many places it is not necessary
(like the critical path that does not take locks).

Thus while this solves the problem, I do not recommend it as a
solution.
</quote>

Hooray - I can sleep easy once more :)

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
William Stacey [MVP] (VIP)
line 1) System.Threading.Thread.WriteMemoryBarrier();
line 2) singleton = newObj;

So the above ensures 110%, without question, that the "if (singleton ==
null)" test will never find singleton to be a partially completed assignment,
and ensures some kind of read barrier so that I read the "singleton" ref
correctly?

Put another way, is it possible to have a thread switch before line 2 has
fully assigned newObj to singleton, which may cause thread1 to see that singleton
is not null but the ref is not right either, causing an error?

--
William Stacey, MVP

"Jon Skeet [C# MVP]" wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

Yes - but so would just

singleton = new Singleton();

when singleton is declared to be volatile.

[Original message clipped]

No. Reference assignments are always atomic, I believe. If they weren't
that would be a serious security problem, IMO.

(I've only seen it specified when the memory is properly aligned, but
as I say, anything else would be a huge security problem.)

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
William Stacey [MVP] (VIP)
If I read the following link (IBM) right, it says Vance's code is a classic example
of a "fix" that is broken. He needs a read barrier to do it that way, or just
use the darn lock and do it right for all cases on all machines (it is
really not that expensive) :) Cheers.
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

BTW. I would guess you have these, but here are some other links on this
topic for others.

Brad Abrams blog
http://blogs.msdn.com/brada/archive/2004/05/12/130935.aspx

Vance Morrison (CLR team)
http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375

Exploring the Singleton Design Pattern
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp

Scott Allen's blog
http://odetocode.com/Blogs/scott/archive/2004/05/13/242.aspx

Chris Brumme
http://weblogs.asp.net/cbrumme/archive/2003/05/17/51445.aspx
http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/480d3a6d-1aa8-4694-96db-c69f01d7ff2b

The "Double-Checked Locking is Broken" Declaration
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

Various
http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3

http://www.javaworld.com/javaworld/jw-02-2001/jw-0209-toolbox.html

==================================
Alexei Zakharov Writes:
# re: volatile and MemoryBarrier()...
I think the implementation without using volatile is missing one memory
barrier. According to
http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3
memory barriers are required for both read and write code paths. The read
path extracted from the code is:

if ( Singleton.value == null ) // false
{
    // not executed
}
return Singleton.value;

There is no memory barrier on this path. In the CLR memory model as
described in Chris Brumme's blog
(http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx), only volatile
loads are considered "acquire", but normal loads can be reordered.

The correct implementation will be:

public sealed class Singleton {
    private Singleton() {}
    private static Singleton value;
    private static object sync = new object();

    public static Singleton Value {
        get {
            Singleton temp = Singleton.value;
            System.Threading.Thread.MemoryBarrier(); // this is important

            if ( temp == null ) {
                lock ( sync ) {
                    if ( Singleton.value == null ) {
                        temp = new Singleton();
                        System.Threading.Thread.MemoryBarrier();
                        Singleton.value = temp;
                    }
                }
            }

            return Singleton.value;
        }
    }
}

Let me expand on the performance of the two implementations of the double
checked locking pattern. Obviously we want to make the read path faster and
don't care about the write path because the write path is taken only once.
The read path extracted from the code is:

// using volatile (Singleton.value is volatile)
get {
    if ( Singleton.value == null ) {
        // ... not taken
    }
    return Singleton.value;
}

// using memory barriers
get {
    Singleton temp = Singleton.value;
    System.Threading.Thread.MemoryBarrier();
    if ( temp == null ) {
        // ... not taken
    }
    return Singleton.value;
}

The volatile load in the first code has the acquire semantics and is
equivalent to the non-volatile load plus the memory barrier in the second
code. There are two volatile loads in the first code and only one memory
barrier in the second. So I expect the code with memory barriers to perform
faster than the code that uses volatiles. But as with any performance
speculation, it has to be taken with a grain of salt. I haven't done any
measurements here.
==================================

--
William Stacey, MVP

"Jon Skeet [C# MVP]" wrote in message:
[Original message clipped]
 
 
    
William Stacey [MVP] (VIP)
oops. Forgot two of the best ones....
Andrew Birrell (http://birrell.org/andrew/papers/)
"An Introduction to Programming with C# Threads"
http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf (At
Microsoft)
http://birrell.org/andrew/papers/035-Threads.pdf (First version at Compaq)

Andrew seems to suggest the only way to do this right is using a lock around
all the tests and assignments (a correct memory barrier implementation
would do the same for you, I guess, but you need to think really
hard about it each time and with each new twist).

--
William Stacey, MVP

 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

I don't think he's right about that though. The .NET memory model
provides a memory barrier on a volatile read/write, whereas the Java
model doesn't. That's why the "various fixes" tried with Java don't
work - but I believe making the variable volatile in .NET *does*.
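
As a sketch of the volatile approach discussed here (not code from the thread; the class name VolatileSingleton and the Instance property are illustrative), double-checked locking with a volatile field might look like this in C#. It assumes the .NET behaviour described in this thread, where a volatile write has release semantics and a volatile read has acquire semantics:

```csharp
public sealed class VolatileSingleton
{
    // 'volatile' gives reads of this field acquire semantics
    // and writes to it release semantics.
    private static volatile VolatileSingleton instance;
    private static readonly object sync = new object();

    private VolatileSingleton() { }

    public static VolatileSingleton Instance
    {
        get
        {
            if (instance == null)           // first check, no lock taken
            {
                lock (sync)
                {
                    if (instance == null)   // second check, under the lock
                    {
                        instance = new VolatileSingleton();
                    }
                }
            }
            return instance;
        }
    }
}
```

Once the instance exists, every call to VolatileSingleton.Instance skips the lock, at the cost of volatile reads on that path.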

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
Scott Allen
One more link to add to the list:

Raymond Chen: High-performance multithreading is very hard
http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx

--
Scott

On Thu, 3 Jun 2004 18:34:01 -0400, "William Stacey [MVP]"
wrote:

[Original message clipped]

--
Scott
http://www.OdeToCode.com
 
    
William Stacey [MVP] (VIP)
Thanks. And another that was interesting. He also concludes using the lock
before the test is the way to go.
http://www.nwcpp.org/Downloads/2004/DCLP_notes.pdf

"Back Where We Started"...

public class Keyboard
{
    private static readonly object syncRoot = new object();
    private static Keyboard pInstance = null;

    internal Keyboard()
    {
    }

    public static Keyboard GetInstance()
    {
        lock (syncRoot) // read and write barrier
        {
            if (pInstance == null)
                pInstance = new Keyboard();
        }
        return pInstance;
    }
}

Note: Access to shared data is now inside a critical section.
Conclusion: There is no portable way to implement DCLP in C++. This may
apply to C#, not sure.

My Comments:
On my slow box, 10 million locks and releases take 375 ms. So is *not*
locking really worth the fuss, or the danger of not doing something right
by using some other fancy method? I can't see it.
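
A rough sketch of how a figure like this could be measured (an illustration, not the code used for the numbers quoted here; LockTiming and Run are made-up names). It times an uncontended lock on a single thread, so it says nothing about contended or multi-processor cost:

```csharp
using System;
using System.Diagnostics;

public static class LockTiming
{
    private static readonly object syncRoot = new object();

    // Acquires and releases an uncontended lock 'iterations' times.
    // Returns how many times the protected body ran and reports the
    // elapsed wall-clock time through 'elapsed'.
    public static int Run(int iterations, out TimeSpan elapsed)
    {
        int count = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lock (syncRoot)
            {
                count++;
            }
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return count;
    }
}
```

Calling LockTiming.Run(10000000, out elapsed) and printing elapsed.TotalMilliseconds gives a number to compare against; results will vary a lot between machines and runtimes.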

This raises a question. What about collections and syncing on an object (like
syncRoot)? You protect concurrent access to the internal array (or queue,
for example), but you don't have all the shared data inside a critical section,
as far as I can see.
1) lock(syncRoot)
2) array.Add(new object())
3) unlock
4) lock(syncRoot)
5) return array[0];
6) unlock

We don't lock the array, and we don't lock element 0 at all. element 0 has
not been protected by any barrier that I can see, so is this subject to
issues also?? Cheers!

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

<snip>

Interesting. I'll have to read through it carefully to work out whether
or not to include it in my list - it's not .NET-specific, so could be
misleading - but may well have good general points to make.

[Original message clipped]

I'm not at all surprised that there's no portable way to do it in
unmanaged C++ - basically you've got a potentially different memory
model on every platform.

[Original message clipped]

How many processors does your box have though? Memory barriers and
locks are *vastly* more expensive on multi-processor machines, I
believe. I still advocate simplicity unless you're absolutely *sure*
about the performance cost, but I'm just pointing out that there may
well be more of a performance cost than the figures you quote imply.

[Original message clipped]

Element 0 has been protected by exactly the same memory barrier as
everything else. There aren't different kinds of memory barrier - it's
not like a memory barrier only applies to the object you lock on. It
applies to *everything*.
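
A minimal sketch of the pattern in question, with hypothetical names (GuardedList, syncRoot). Every access to the list goes through the same lock, so the ordering guarantees that come with acquiring and releasing it cover the list's internal storage and its elements too, provided every thread follows the same protocol:

```csharp
using System.Collections.Generic;

public sealed class GuardedList
{
    private readonly object syncRoot = new object();
    private readonly List<object> items = new List<object>();

    // All writers take the lock before touching the list.
    public void Add(object item)
    {
        lock (syncRoot)
        {
            items.Add(item);
        }
    }

    // Readers take the same lock, so they see everything that was
    // written before the previous release of that lock.
    public object First()
    {
        lock (syncRoot)
        {
            return items[0];
        }
    }

    public int Count
    {
        get
        {
            lock (syncRoot)
            {
                return items.Count;
            }
        }
    }
}
```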

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
William Stacey [MVP] (VIP)
[Original message clipped]

You 100% on that? I thought I read that it had to do with individual memory
locations, not sure. I assume you have a paper on this question, and would
really appreciate the link. Cheers!

--
William Stacey, MVP

 
 
    
Scott Allen
I wouldn't say that the memory barrier protects anything, per se. Its
purpose is to place limitations on memory re-ordering.

Tell me what you think of this example:

class A
{
    int x = 1;
    int y = 2;
    int z = 3;

    public void Foo()
    {
        x = 0;
        y = z;
    }
}

The sequence for Foo could be:

Write 0 to x
Read z (3)
Assign 3 to y

But the compiler or processor may decide it is better to do:

Read z (3)
Write 0 to x
Write 3 to y

Now if we wrote Foo as:

public void Foo()
{
    x = 0;
    System.Threading.Thread.MemoryBarrier();
    y = z;
}

Now we are guaranteeing the write to x occurs (and is visible) before
reading z.

Of course in this example nothing exciting happens if the re-ordering
occurs, but I could imagine another contrived example where we add
another method to class A which writes to z, then start two threads to
work on the same instance. In this case the memory barrier could
change the outcome of the value stored to y.

--
Scott

On Fri, 4 Jun 2004 19:11:22 -0400, "William Stacey [MVP]"
wrote:

[Original message clipped]

--
Scott
http://www.OdeToCode.com
 
    
William Stacey [MVP] (VIP)
Not sure all this ordering stuff is the key (however is also important.)
The memory barrier (AFAICT) flushes the cache on all processors. This
makes sure that when you set a var on one processor, another processor will
read that same memory location and not a cached version of that location
that could be pointing to something else or null. The lock ensures the
ordering and sync; the barrier ensures the cache does not become an issue.
Please correct me if I'm in error. Cheers!

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
> Not sure all this ordering stuff is the key (however is also important.)

Ordering is *precisely* the point of memory barriers.

[Original message clipped]

Although that may be the physical effect, the effect in terms of the
memory model is specified by ordering. It's not just how the compiler
might reorder things - it's the order in which the operations *appear*
to have taken place due to caching.

[Original message clipped]

Well, unfortunately it's not very clearly defined. The CLR has a very
definite idea of two different types of barrier - a volatile read which
has acquire semantics, and a volatile write which has release
semantics. Each only affects things in one direction, however. The
memory barrier you've described is a sort of bidirectional barrier, so
that memory accesses can't move either side of it. It would help if the
docs for MemoryBarrier defined it in CLR terms...
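
To illustrate the one-directional acquire/release pairing, here is a small sketch using the Volatile class, which was added to System.Threading in later framework versions (.NET 4.5 onwards) and is not something used in this thread. Mailbox, Publish and TryRead are hypothetical names:

```csharp
using System.Threading;

public sealed class Mailbox
{
    private int payload;   // ordinary, non-volatile field
    private int ready;     // 0 = nothing published yet, 1 = published

    public void Publish(int value)
    {
        payload = value;
        // Release: earlier writes (including 'payload') cannot move
        // after this write.
        Volatile.Write(ref ready, 1);
    }

    public bool TryRead(out int value)
    {
        // Acquire: later reads (including 'payload') cannot move
        // before this read.
        if (Volatile.Read(ref ready) == 1)
        {
            value = payload;
            return true;
        }
        value = 0;
        return false;
    }
}
```

A reader that sees ready == 1 is therefore also meant to see the payload written before it; neither call on its own acts as a full two-way barrier.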

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
William Stacey [MVP] (VIP)
I think the same CPU instruction is sent to flush the cache (not one for
read and one for write). If there are both, please advise or provide a link for detail.
Cheers.

--
William Stacey, MVP

"Jon Skeet [C# MVP]" wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

It may well be a single CPU instruction for x86, which has a fairly
strong memory model. That's no guarantee about what will happen
elsewhere though.

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
William Stacey [MVP] (VIP)
I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to get more
eyes.

Does this spin version work? Why or why not? Cheers!

using System.Threading;

public sealed class Singleton
{
    private static int spinLock = 0; // lock not owned.
    private static Singleton value = null;
    private Singleton() {}

    public static Singleton Value()
    {
        // Get spin lock.
        while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
            Thread.Sleep(0);

        // Do we have any mbarrier issues?
        if ( value == null )
            value = new Singleton();

        Interlocked.Exchange(ref spinLock, 0);
        return value;
    }
}

This would help answer a few related questions for me on how Interlocked
works with mem barriers and cache, etc. TIA -- William

--
William Stacey, MVP

"Jon Skeet [C# MVP]" wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in :)

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
 
 
    
Scott Allen
On IA-32 architectures I'm pretty sure this would be using cmpxchg8b
with a lock prefix for MP machines. This instruction provides an
atomic read/compare/store operation and acts as a full memory
barrier. A lock(syncRoot) would boil down to the same instruction.

--
Scott

On Thu, 10 Jun 2004 17:19:26 +0100, Jon Skeet [C# MVP]
wrote:

[Original message clipped]

--
Scott
http://www.OdeToCode.com
 
 
    
William Stacey [MVP] (VIP)
Thanks Scott. Glad I posted this. You have any paper you write on this?

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message:
[Original message clipped]
 
 
    
Scott Allen
No, I'm afraid not, but I'm sure you can find some if you dig around.
There has to be someone left still slinging code in assembly - I gave
it up about 7 years ago :)

One reason I remember the cmpxchg8b instruction so well is because it
was the instruction involved in the dreaded Pentium F00F bug - you
could lock up the CPU from user mode code:

http://www.google.com/search?hl=en&lr=&ie=UTF-8&q=cmpxchg8b+bug

--
Scott

On Thu, 10 Jun 2004 16:41:17 -0400, "William Stacey [MVP]"
wrote:

>Thanks Scott. Glad I posted this. You have any paper you write on this?

--
Scott
http://www.OdeToCode.com
 
    
William Stacey [MVP] (VIP)
Also, I take it (assuming my singleton example) that I would also not
have any issue with instance vars inside the singleton that were created
during construction? Say, a ref var that was another object. This
interlocked "fence" should protect everything between the fence start and
fence end (assuming no other lazy init is going on inside the first class)?

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message:
[Original message clipped]
 
 
    
William Stacey [MVP] (VIP)
[Original message clipped]

Thanks Jon. That is what I hoped was going on. Otherwise I would be more
confused.

[Original message clipped]

Other than the fact that this is non-blocking after the first creation of
the singleton, and compare/exchange is faster than taking out a lock before
every test. Not that I would normally do this, but it helps in understanding
some different threading problems. Cheers!

--
William Stacey, MVP

 
 
    
Scott Allen
On P4s and above there is:

    MFENCE (full memory barrier)
    LFENCE (read (load) barrier)
    SFENCE (write (store) barrier)

Note that none of these instructions "flush the cache" of the
processor they execute on or any other processor. They strongly order
instructions. It's up to the cache coherency protocols in Intel
systems to ensure consistency; barriers and locks don't mean "cache
flush".

See:
http://developer.intel.com/design/pentium4/manuals/253666.htm
and
http://developer.intel.com/design/pentium4/manuals/253667.htm

--
Scott

On Thu, 10 Jun 2004 15:40:15 +0100, Jon Skeet [C# MVP]
wrote:

[Original message clipped]

 
    
Scott Allen
Hi William:

As Jon points out, it really is all about ordering. The lock can only
ensure a consistent view of the memory if everyone follows the
protocol: acquire lock, work with shared memory, release lock.

The problem with double-checked locking is that only one thread will
ever follow the protocol; everyone else cheats and tries to look at
shared memory without acquiring the same lock. Because of this we have
to strictly control the ordering of the memory operations inside of
the lock. Other threads will be peeking at our work while we still
have a work in progress.

We can use a memory barrier to force a strong order - all memory
writes will be seen by an external observer to happen in the same
order as we programmed them. That's really what it's all about.

--
Scott

On Wed, 9 Jun 2004 20:18:27 -0400, "William Stacey [MVP]"
wrote:

[Original message clipped]

 
 
    
William Stacey [MVP] (VIP)
Thanks Scott. That helps.

--
William Stacey, MVP

"Scott Allen" <bitmask@[nospam].fred.net> wrote in message:
[Original message clipped]
 
 
    
Jon Skeet [C# MVP] (VIP)
William Stacey [MVP] wrote:
[Original message clipped]

Do you mean the code with the explicit write barrier but no read
barrier? I see what you're saying, but I'm not sure either way. The
thing is, the reading thread can't have read (and therefore cached) any
information about the object before it gets the reference - so I
*think* it's okay so long as all the *writes* are performed in a way
that ensures that all the information is available as soon as the
reference itself becomes available.

I certainly favour the "use a lock" approach when using static
initialisers doesn't quite have the desired semantics (for whatever
reason).
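
For comparison, a sketch of the static initialiser approach mentioned here (the class name StaticInitSingleton is illustrative). It relies on the CLR guarantee that a type initialiser runs at most once; the empty static constructor is there so that the C# compiler does not mark the type beforefieldinit, which means initialisation is triggered on first access to a member of the type rather than at some earlier point:

```csharp
public sealed class StaticInitSingleton
{
    // Assigned by the type initialiser, which the runtime runs at most
    // once, so no explicit lock or memory barrier appears in this code.
    private static readonly StaticInitSingleton instance = new StaticInitSingleton();

    // Explicit static constructor: prevents the beforefieldinit flag.
    static StaticInitSingleton() { }

    private StaticInitSingleton() { }

    public static StaticInitSingleton Instance
    {
        get { return instance; }
    }
}
```

The trade-off is less control over exactly when construction happens, which is the "desired semantics" caveat raised in this post.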

[Original message clipped]

I certainly don't have *all* of them - I'll include various ones in the
article.

<snip>

Cheers!

--
Jon Skeet
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Copyright © AMX Software Ltd 2008-2010. Portions copyright © Matthew Baxter-Reynolds 2001-2010. All rights reserved.