In this series, I'm going to be elucidating some common errors C# programmers make when working with multithreaded systems. I'll be exploring issues from a lower-level point of view, not dabbling too much in higher-level abstractions and libraries (such as the TPL and similar); hopefully the points raised will be pertinent to anyone hoping to write concurrent or parallelized applications.
Each post in the series so far has focused on a particular type of mistake and explored it in depth. In contrast, for this final post I'm going to cover a broader mix of multithreading-related topics: basically, stuff that wouldn't fit in the first three posts. You should therefore expect a hodgepodge of smaller summaries of various topics (i.e. the 'everything else') rather than one idea explored in depth. So, without further delay, let's go!
One mistake I see fairly often on IRC, Discord, and other places is the use of a busy-wait in lieu of proper threading synchronization. Example:
private static string userInput;

private static void Example() {
    userInput = null;
    GetUserInput(); // tells the UI to request user input on the UI thread
    while (userInput == null) ; // wait for user input
    ProcessUserInput(userInput); // now use that input
}
View Code Fullscreen • "Busy Wait Synchronization"
In addition to being built on some unsafe assumptions, the code above keeps the thread in Example() spinning (around the while loop) until the UI thread sets the user input value. The problem with this is that it occupies 100% of a CPU core whenever the OS has the thread scheduled. Furthermore, if the thread is a ThreadPool thread, the thread pool will likely (but not always) create another thread at some point to offset the long-running function. This in turn leads to thread oversubscription, which ultimately means reduced throughput. Finally, on a single-core machine the example above is even less useful, as the UI thread literally cannot be truly running in parallel.
Some people are vaguely aware that 'busy-waiting'/'spinning' a thread is bad, and attempt to fix the problem with something like this:
private static string userInput;

private static void Example() {
    userInput = null;
    GetUserInput(); // tells the UI to request user input on the UI thread
    while (userInput == null) { // wait for user input
        Thread.Sleep(100);
    }
    ProcessUserInput(userInput); // now use that input
}
View Code Fullscreen • "Busy Wait Fix Attempt with Thread.Sleep"
It's not always this obvious (in fact most examples I've seen have been quite a bit more convoluted), but it almost always boils down to something like the above code in essence. The problem is that although we're no longer wasting as much CPU time, we're still writing code that sloppily attempts to brute-force or 'poll' for a second thread's progress 10 times a second, rather than using the more explicit and less 'hacky' inbuilt synchronization constructs.
Thread.Sleep() is almost always the wrong choice for whatever it is you're writing. In fact, the only times I ever use Thread.Sleep() are:
To simulate 'work' (e.g. for a test or demonstration, but never for actual production code)
To create some sort of timer functionality or a thread that executes a function on an interval (even then you should prefer the inbuilt .NET timers unless you have specialized needs; see the sketch below)
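For instance, here's a minimal hedged sketch of interval work using the inbuilt System.Threading.Timer instead of a sleep loop (DoIntervalWork is a hypothetical stand-in):
private static Timer intervalTimer; // keep a reference so the timer isn't garbage collected

private static void StartIntervalWork() {
    // Invokes DoIntervalWork roughly every 100ms on a ThreadPool thread,
    // with no thread blocked in Thread.Sleep between invocations.
    intervalTimer = new Timer(_ => DoIntervalWork(), null, TimeSpan.Zero, TimeSpan.FromMilliseconds(100));
}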
And really, those two cases are it. I've also seen people use Thread.Sleep() in a poor attempt to jury-rig their way out of a race condition, for example:
private static SomeDataType someData;

private static void Example() {
    someData = null;
    GetDataOnNetworkThread(); // tell the network thread to write to someData
    Thread.Sleep(100); // make sure we don't beat the network thread
    ProcessData(someData); // use the data that the network thread wrote
}
View Code Fullscreen • "Using Thread.Sleep to Badly Fix a Race Condition"
If you write a Thread.Sleep() in this manner, as a way of 'waiting' for another thread to finish its operation, you're really just sitting on a ticking time bomb. Not only is it potentially wasteful and very unscalable, there's absolutely no guarantee that it'll work. Suppose the 'network' thread in the example above doesn't get scheduled for 100ms (unlikely, but not impossible), or is stuck waiting for an I/O call to complete. Or perhaps some time-consuming event occurs, such as the GC pausing the application for >= 100ms, or simply the user's PC/server freezing up momentarily. This could have catastrophic results: once 100ms have passed, the 'network' thread is no longer guaranteed to have set 'someData' before the reader thread resumes.
In short, writing a Thread.Sleep() anywhere is often a code smell, especially if it's being used as some sort of multithreading synchronization mechanism. Instead, it's better to stick to the inbuilt sync primitives and higher-level types. See the great Threading in C# e-book by Joseph Albahari for further reading: Basic Synchronization.
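As an illustration, here's the earlier race-condition example reworked around a ManualResetEventSlim. This is a hedged sketch, assuming the same hypothetical SomeDataType and helper methods:
private static SomeDataType someData;
private static readonly ManualResetEventSlim dataReady = new ManualResetEventSlim(false);

private static void Example() {
    someData = null;
    dataReady.Reset();
    GetDataOnNetworkThread(); // tell the network thread to write to someData
    dataReady.Wait(); // blocks efficiently until signalled; no spinning or polling
    ProcessData(someData);
}

// The network thread calls this once it has finished writing someData:
private static void OnNetworkDataWritten() {
    dataReady.Set();
}
In modern async code, a TaskCompletionSource<T> achieves the same handoff without blocking a thread at all.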
I've also seen the following mistake a few times:
private static void Example() {
    var dataArray = new SomeDataType[400];
    FillRawData(dataArray);
    Parallel.For(0, 4, i => {
        ProcessDataChunk(dataArray, 100 * i, 100 * (i + 1));
    });
}

private static void ProcessDataChunk(SomeDataType[] dataArray, int startIndex, int endIndex) {
    for (int i = startIndex; i < endIndex; ++i) {
        dataArray[i].Process();
    }
}
View Code Fullscreen • "Incorrect usage of Parallel.For"
Typically it's actually more experienced programmers who make this mistake. They know they're programming for a quad-core machine and attempt to separate the data-processing workload into a maximum of four threads (in order to avoid over/undersubscription). However, the error is threefold:
Firstly, Parallel.For already offers overloads that allow this to be controlled explicitly: MaxDegreeOfParallelism can be set via a ParallelOptions parameter (see the sketch after the corrected example below).
Secondly, setting this explicitly in such a manner means you're writing code that will only work at its best on a specific architecture. If the code is run on an 8-core machine, or even a 128-core server, you'll still be limited to only 4 threads.
Thirdly, and perhaps most importantly, writing code in this way defeats the work-stealing nature of the thread pool. Unless the data being processed is highly uniform, it's unlikely that all four threads will complete at exactly the same time. For a large amount of data with the design demonstrated above, you could have one or more threads 'waiting' for the final thread to finish its particularly difficult workload. Instead, when the code is written as below, Parallel.For's threads will keep 'stealing' work from the main queue as soon as they finish their current 'block'.
private static void Example() {
    var dataArray = new SomeDataType[400];
    FillRawData(dataArray);
    Parallel.For(0, 400, i => {
        dataArray[i].Process();
    });
}
View Code Fullscreen • "Correct usage of Parallel.For"
This leads to a more fluid (and ultimately higher-throughput) execution. In advanced cases, the chunking size can be specified more explicitly with a Parallel.ForEach call and a custom Partitioner.
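To illustrate both points, here's a hedged sketch (reusing the hypothetical dataArray, and requiring System.Collections.Concurrent for Partitioner) that caps the degree of parallelism via ParallelOptions and controls the chunk size via a range partitioner:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

// Partitioner.Create(0, 400, 50) hands out index ranges of ~50 elements each;
// threads grab a new range as soon as they finish their current one.
Parallel.ForEach(Partitioner.Create(0, 400, 50), options, range => {
    for (int i = range.Item1; i < range.Item2; ++i) {
        dataArray[i].Process();
    }
});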
Another mistake I've seen is accidental contention, or accidental synchronization. Sometimes it's obvious (e.g. using a lock statement inside a Parallel.For loop body), but sometimes it's much less so. Here's an example of something not so obvious:
private static void Example() {
    var dataArray = new SomeDataType[400];
    FillRawData(dataArray);
    Parallel.For(0, 400, i => {
        dataArray[i].Process();
        Console.WriteLine(i + " => " + dataArray[i]);
    });
}
View Code Fullscreen • "Accidental Introduction of Contention in Parallel.For"
Although writing to the console obviously adds its own overhead, note that WriteLine is a static method call. Why is this important? Well, in .NET almost all public static members are guaranteed to be threadsafe, and thread safety usually implies some sort of synchronization. The Console class is no exception. If, for example, your static call internally enters a lock, you'll end up forcing all the threads in the parallel loop to keep fighting each other for control of that lock. Even if the synchronization mechanism for the given static type is less rudimentary, you'll still potentially be creating unnecessary contention!
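One hedged way to sidestep this particular contention (a sketch reusing the hypothetical dataArray, and requiring System.Text for StringBuilder) is Parallel.For's thread-local overload: each thread buffers its own output, and the console is only touched once per thread at the end:
Parallel.For(0, 400,
    () => new StringBuilder(), // one private buffer per thread
    (i, loopState, sb) => {
        dataArray[i].Process();
        sb.AppendLine(i + " => " + dataArray[i]);
        return sb;
    },
    sb => Console.Write(sb.ToString()) // a single synchronized call per thread
);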
Finally, a note on where to parallelize. In the following example, the programmer has made the mistake of parallelizing the inner loop rather than the outer one. Though it can sometimes be beneficial to parallelize an inner loop, in this case it only serves to make the parallelization of the entire method very 'stop-start'.
private static void Example() {
    var dataArray = new SomeDataType[400000];
    FillRawData(dataArray);
    var processorAlgorithms = GetProcessorAlgorithms();
    for (int i = 0; i < 400000; ++i) {
        Parallel.For(0, processorAlgorithms.Count, a => {
            dataArray[i].Process(processorAlgorithms[a]);
        });
    }
}
View Code Fullscreen • "Incorrect Approach to Parallelizing Nested Loops"
In the code above, Parallel.For is called 400,000 times rather than just once. Not only does that create a lot of unnecessary overhead and garbage, it once again hampers the ability of the built-in ThreadPool to orchestrate itself properly (e.g. dynamically adjusting the number of threads to suit the system's needs).
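A corrected sketch simply swaps which loop is parallel, so that Parallel.For is invoked exactly once and the inner loop runs serially within each parallel iteration:
private static void Example() {
    var dataArray = new SomeDataType[400000];
    FillRawData(dataArray);
    var processorAlgorithms = GetProcessorAlgorithms();
    Parallel.For(0, 400000, i => {
        for (int a = 0; a < processorAlgorithms.Count; ++a) {
            dataArray[i].Process(processorAlgorithms[a]);
        }
    });
}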
Ideally most programmers should be using higher-level synchronization constructs anyway, but for those of us who need to use the Monitor class's Wait and Pulse methods, I have a small piece of advice. Take the following example:
private static void ConsumerWaitForProducer(object writerLock) {
    lock (writerLock) {
        Monitor.Wait(writerLock);
    }
}
View Code Fullscreen • "Incorrect Monitor.Wait Usage"
This is potentially buggy! Yes, it's possible to tell even from this simple code. You see, Wait/Pulse are implemented using wait/ready queues. When you Pulse a lock object, it's possible that the woken thread (the one that was Waiting but has now been dequeued) is not the same thread that actually goes on to acquire the lock next. This means that by the time the dequeued thread actually acquires the lock, the state it was relying on may have been changed by a sneaky third thread. This is known as a spurious wakeup.
Therefore, the canonical pattern to protect against this involves using a guard variable and a while loop, as follows:
private static bool dataReadyForProcessing; // guard variable, set by the producer under the same lock

private static void ConsumerWaitForProducer(object writerLock) {
    lock (writerLock) {
        while (!dataReadyForProcessing) {
            Monitor.Wait(writerLock);
        }
    }
}
View Code Fullscreen • "Corrected Monitor.Wait Usage"
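For completeness, here's a hedged sketch of what the matching producer side might look like, setting the guard variable and pulsing under the same lock:
private static void ProducerSignalConsumer(object writerLock) {
    lock (writerLock) {
        dataReadyForProcessing = true;
        Monitor.Pulse(writerLock); // wake one waiting consumer; it will re-check the guard before proceeding
    }
}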
All that being said, I think I've read somewhere that .NET handles its own wait/ready queues and that this issue therefore may not exist. However, I can't find that resource again and frankly I see no reason to 'risk it', so I will continue to treat non-while-loop-protected Wait calls as buggy. Besides, in most real-world cases I find that I want some sort of guard condition or state check anyway, so it simply becomes a case of changing an if statement to a while.
Something else I've seen occasionally is data being accidentally 'leaked' out of threadsafe protection. Here's an example:
class ExampleClass {
    List<SomeDataType> someData = new List<SomeDataType>();
    readonly object dataLock = new object();

    public List<SomeDataType> SomeData {
        get {
            lock (dataLock) {
                return someData;
            }
        }
    }

    public void ProcessData() {
        lock (dataLock) {
            foreach (var data in someData) data.Process();
        }
    }

    public void AddData(List<SomeDataType> newData) {
        lock (dataLock) {
            someData.AddRange(newData);
        }
    }

    public void SetData(List<SomeDataType> newData) {
        lock (dataLock) {
            someData = newData;
        }
    }
}
View Code Fullscreen • "Reference Leak Example"
Do you see the potential issue? The public SomeData property is accidentally leaking the protected data! Although the programmer has tried to be careful and locked around the return someData statement, the someData field itself has been leaked by returning it through the getter. Someone could easily get hold of the list (via myExampleClass.SomeData) and start manipulating it without being protected by dataLock. In fact, they couldn't protect it even if they wanted to, as dataLock is a private member (which it should be)!
So what's the answer? The easiest fix is to make a copy of the list:
class ExampleClass {
    public List<SomeDataType> SomeData {
        get {
            lock (dataLock) {
                return new List<SomeDataType>(someData);
            }
        }
    }
}
View Code Fullscreen • "Reference Leak Fix for SomeData Property"
However, we still have an issue in our original code. References like this can also be leaked before they're set. For example, imagine someone uses our class like this:
public static void Example() {
    var exampleClass = GetExampleClass();
    List<SomeDataType> newData = new List<SomeDataType>();
    exampleClass.SetData(newData);
    DoLotsOfProcessing(newData);
}
View Code Fullscreen • "Insidious Leaked Reference Usage Example"
The call to DoLotsOfProcessing may very well access and manipulate the List that we just passed to our ExampleClass, but once again, it won't be protected by dataLock. This mistake causes exactly the same issue as the leaky property. One way to fix it is, once again, to copy the data:
class ExampleClass {
    public void SetData(List<SomeDataType> newData) {
        lock (dataLock) {
            someData = new List<SomeDataType>(newData);
        }
    }
}
View Code Fullscreen • "Reference Leak Fix for SetData Method"
However, immutability is our friend when dealing with multithreaded code, and by making someData readonly we would actually have protected ourselves against this mistake in the first place. We could, and perhaps should, have written SetData like so:
class ExampleClass {
    readonly List<SomeDataType> someData = new List<SomeDataType>();

    public void SetData(List<SomeDataType> newData) {
        lock (dataLock) {
            someData.Clear();
            AddData(newData); // safe to call under dataLock: Monitor locks are reentrant on the same thread
        }
    }
}
View Code Fullscreen • "Reference Leak Alternative Fix for SetData Method"
Finally, throughout all of this there's a deeper issue to be aware of that isn't so easily fixed. Although we've now effectively protected our someData list, we can't do anything about protecting against concurrent access to the SomeDataType objects it contains! For example, someone could still quite easily call ExampleClass.SomeData to get a copy of the list, and then go ahead and begin modifying the contained elements (assuming they're reference types). This is a much harder problem to solve, but some good rules of thumb are:
Strongly prefer making the relevant types immutable, so that they can't be accidentally modified concurrently (see the sketch below this list)
Isolate the multithreaded parts of your codebase and design the interfaces to/from them very carefully
Try to separate concurrently-processed data and hide it away in private implementations. Return meaningful results that are second-hand calculations based on the concurrently-accessed data, rather than exposing the data itself
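As a sketch of the first rule, here's what an immutable variant of a simple data type might look like (the Value property is purely hypothetical):
class ImmutableSomeDataType {
    public int Value { get; } // get-only: can only be assigned in the constructor

    public ImmutableSomeDataType(int value) {
        Value = value;
    }

    // 'Modification' produces a new instance; existing instances can be shared
    // freely between threads because they can never change.
    public ImmutableSomeDataType WithValue(int newValue) {
        return new ImmutableSomeDataType(newValue);
    }
}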
I briefly touched on this in the previous post but didn't give much of an explanation because it's a very advanced subject and most programmers will never need to worry about it. Nonetheless, I said I'd go into detail in a later post, so here we go.
I think most C# programmers "know" that word-sized-or-smaller variables are always atomic (e.g. int is always atomic, and long is atomic on a 64-bit system). But actually, this is technically incorrect. The precise definition is that word-sized-or-smaller variables are always atomic if properly aligned.
Alignment here refers to making sure that variables line up on certain boundaries in memory. On x86 and derivatives this is mostly a performance optimisation, but it can also be a requirement for certain operations and on other CPU architectures. For managed code (i.e. plain C#) this is usually all taken care of for us by the CLR, so we can usually shorten the definition of atomic variables by removing the alignment stipulation.
Of course, there are always exceptions. In the previous post I gave an example where a 64-bit long would exhibit nonatomic behaviour even on a 64-bit system:
private static long* misalignedLongPtr = GetMisalignedHeapLong();

private static long* GetMisalignedHeapLong() {
    const int MISALIGNMENT = -3;
    const int X64_CACHE_LINE_SIZE = 64;
    const int WIN_ALLOC_ALIGN = 8;
    IntPtr reservedMemory = Marshal.AllocHGlobal(new IntPtr(sizeof(long) + X64_CACHE_LINE_SIZE + (sizeof(long) - 1) + WIN_ALLOC_ALIGN));
    IntPtr cacheLineStart = (IntPtr) ((((long) reservedMemory + (X64_CACHE_LINE_SIZE - 1)) / X64_CACHE_LINE_SIZE) * X64_CACHE_LINE_SIZE);
    return (long*) (cacheLineStart + MISALIGNMENT);
}

private static void WriterThreadEntry() {
    while (true) {
        *misalignedLongPtr = -1L;
        *misalignedLongPtr = 1L;
    }
}

private static void ReaderThreadEntry() {
    while (true) {
        var value = *misalignedLongPtr;
        if (value != 1L && value != -1L) {
            Console.WriteLine("Torn read! " + value);
        }
    }
}
View Code Fullscreen • "Evidence of Non-Atomicity Using Word-Sized Values"
By using pointers and off-managed-heap memory we're subverting the CLR's alignment mechanisms and therefore accessing a misaligned long (for the curious, I've added a commented version of GetMisalignedHeapLong explaining how it works further below). However, we don't actually have to write any unsafe code at all to produce this behaviour; we can simply ask the CLR not to bother aligning our types:
private static AlignedData[] alignedData = new AlignedData[10];
private static MisalignedData[] misalignedData = new MisalignedData[10];
// DATA_INDEX is a constant chosen (with the arrays pinned) so that
// misalignedData[DATA_INDEX].L straddles a cache-line boundary; see below.

struct AlignedData {
    public byte B;
    public long L;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct MisalignedData {
    public byte B;
    public long L;
}

private static void WriterThreadEntry() {
    while (true) {
        alignedData[DATA_INDEX].L = -1L;
        misalignedData[DATA_INDEX].L = -1L;
        alignedData[DATA_INDEX].L = 1L;
        misalignedData[DATA_INDEX].L = 1L;
    }
}

private static void ReaderThreadEntry() {
    while (true) {
        var alignedValue = alignedData[DATA_INDEX].L;
        var misalignedValue = misalignedData[DATA_INDEX].L;
        if (alignedValue != 1L && alignedValue != -1L) {
            Console.WriteLine("Aligned value was torn! " + alignedValue);
        }
        if (misalignedValue != 1L && misalignedValue != -1L) {
            Console.WriteLine("Misaligned value was torn! " + misalignedValue);
        }
    }
}
View Code Fullscreen • "Evidence of Non-Atomicity Using Explicitly Misaligned Structs"
Running the code above, we see line after line of Misaligned value was torn! 16777215 or similar, but not a single line indicating that the aligned value was torn. To provoke this behaviour I pinned the two arrays on the managed heap and selected a 'dangerous' value for DATA_INDEX, but in real-world code there's nothing to stop the stars aligning and this effect being replicated for real.
The 'trick' that makes accessing misalignedData[DATA_INDEX].L nonatomic is in specifying a Pack value of 1 for MisalignedData. This tells the compiler/CLR/JIT not to add padding (unused bytes) between the fields, which is what usually keeps each field aligned on a byte-boundary of its own size (e.g. a 4-byte boundary for int, a 2-byte boundary for short, etc.). It also instructs the compiler/CLR/JIT not to add padding at the end of the struct, which usually ensures that an array of that struct will have every element correctly aligned.
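A quick, hedged way to see the layout difference for yourself (assuming the two structs above and a using for System.Runtime.InteropServices):
// With the default layout, the byte field is followed by 7 bytes of padding so that
// the long lands on an 8-byte boundary; with Pack = 1 there is no padding at all.
Console.WriteLine(Marshal.SizeOf<AlignedData>());    // typically prints 16
Console.WriteLine(Marshal.SizeOf<MisalignedData>()); // prints 9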
In both this example and the earlier one (with GetMisalignedHeapLong), the ultimate issue is that our data is straddling a CPU cache line. The CPU core moves memory in and out of the cache in 'lines' in order to manipulate the data within. So, if we accidentally 'span' our data across a cache-line boundary, the CPU is forced to perform the access in at least 2 separate steps. This breaks atomicity. Below I have copied GetMisalignedHeapLong with comments explaining how it creates a long that exhibits this behaviour:
// Returns a pointer to a long that spans two cache lines (i.e. it flows over the end of one cache line on to the next)
private static long* GetMisalignedHeapLong() {
    const int MISALIGNMENT = -3; // We eventually want to move the long 3 bytes backwards past the start of the target cache line
    const int X64_CACHE_LINE_SIZE = 64; // The size of a cache line on my target architecture
    const int WIN_ALLOC_ALIGN = 8; // The default alignment that the Windows allocator provides for all memory allocations

    // Reserve enough memory so that we can place the beginning of the long where we need to.
    // We need:
    // * sizeof(long) bytes to fit the long itself,
    // * Potentially the size of a cache line in the worst case, in order to be able to move our long pointer back to the previous one,
    // * Enough room to move the long back a further (sizeof(long) - 1) bytes, depending on our MISALIGNMENT value,
    // * An additional 8 bytes to account for the Windows allocator's built-in alignment
    IntPtr reservedMemory = Marshal.AllocHGlobal(new IntPtr(sizeof(long) + X64_CACHE_LINE_SIZE + (sizeof(long) - 1) + WIN_ALLOC_ALIGN));

    // Find the start of a cache line: the first address at or after reservedMemory for which (address % 64) == 0.
    // We're using the fact that integer division discards the remainder to round up to this boundary
    IntPtr cacheLineStart = (IntPtr) ((((long) reservedMemory + (X64_CACHE_LINE_SIZE - 1)) / X64_CACHE_LINE_SIZE) * X64_CACHE_LINE_SIZE);

    // Return a pointer that points to where the cache line starts minus 3 bytes (according to MISALIGNMENT)
    return (long*) (cacheLineStart + MISALIGNMENT);
}
View Code Fullscreen • "GetMisalignedHeapLong Explained"
If you don't fully understand everything I've said in this section, don't worry; here's all you need to know:
As long as you're in managed land only and not using any structs with a custom StructLayout, you don't need to worry about this.
If you are working with native code or data that comes from an external source (i.e. not the managed heap), it pays to be extra careful about assuming atomicity, but...
It's very rare that you should be relying on atomic access with no other threading synchronization anyway (as discussed in the previous post), so it's very unlikely you'll ever come across this issue.
Finally, misaligned data hurts performance, so it's a good idea to align it regardless of multithreading concerns. Most compilers, allocators, and runtimes know this and do their best to make aligned values the default.
So that's it; this post concludes my "Common Multithreading Mistakes in C#" series. It was probably the least cohesive post in the series, but that's the nature of 'everything else' posts. :) Nonetheless, stay tuned, as I'll likely be posting more about multithreading in occasional posts, as well as discussing more advanced C# concepts and performance-related topics in the future. And as always, have a great day!