Ben Bowen's Blog [Xenoprimate]

Common Multithreading Mistakes in C# - I: Incorrect Granularity 

In this series, I'm going to be elucidating some common errors C# programmers make when working with multithreaded systems. I'll be exploring issues from a lower-level point-of-view, and not dabbling too much in higher-level abstractions and libraries (such as the TPL and similar); hopefully the points raised will be pertinent to anyone hoping to write concurrent or parallelized applications.

What Is Granularity? 

In case you're a non-native English speaker or you just don't know the word, granularity is derived from granular, and it refers to the "fineness" with which something is broken up. In programming parlance, granularity often refers to how finely segregated different portions of your codebase or data storage layout are. For example, if you split your code into lots of small parts, your codebase is very granular. In the context of this blog post, however, we will be talking about granularity in units of multithreaded work, and specifically the granularity with which we synchronize (lock over) shared memory.

When programmers begin learning about multithreading, they often learn about lock (or synchronized in Java) before all other forms of threading synchronization. Programmers who are less experienced with multithreading frequently find themselves unsure of where to synchronize and end up either overcompensating, resulting in insufficient locking granularity, or not synchronizing enough, resulting in erroneous locking granularity.

Threading Synchronization 

Most (but not all) programmers are vaguely aware that when running multiple threads concurrently, it is often required that the programmer take steps to ensure that the concurrent operations (e.g. concurrent threads of execution) interact with each other in the way the programmer intended. For example, take the following code. It spawns four threads to read words from a pre-filled list and then populate a dictionary with the word count for each word. It also checks every word to see if it's in a list of curse words, and if so, it will print a warning.

private static List<string> wordList;
private static List<string> curseWords;
private static int currentWordIndex;
private static Dictionary<string, int> wordCountDict;

public static void CalculateWordCounts() {
	wordList = GetWordList();
	curseWords = GetCurseWords();
	currentWordIndex = 0;
	wordCountDict = new Dictionary<string, int>();

	Thread threadA = new Thread(ThreadDoWork);
	Thread threadB = new Thread(ThreadDoWork);
	Thread threadC = new Thread(ThreadDoWork);
	Thread threadD = new Thread(ThreadDoWork);
	threadA.Start();
	threadB.Start();
	threadC.Start();
	threadD.Start();
	threadA.Join();
	threadB.Join();
	threadC.Join();
	threadD.Join();
}

private static void ThreadDoWork() {
	bool atLeastOneWordRemaining;
	int thisWordIndex = currentWordIndex;
	currentWordIndex = currentWordIndex + 1;
	if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	else atLeastOneWordRemaining = true;

	while (atLeastOneWordRemaining) {
		string thisWord = wordList[thisWordIndex];
		bool firstOccurrenceOfWord = !wordCountDict.ContainsKey(thisWord);

		if (curseWords.Contains(thisWord)) Console.WriteLine("Curse word detected!");
		if (firstOccurrenceOfWord) wordCountDict.Add(thisWord, 1);
		else wordCountDict[thisWord] = wordCountDict[thisWord] + 1;

		thisWordIndex = currentWordIndex;
		currentWordIndex = currentWordIndex + 1;
		if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	}
}
Listing: "Multithreaded Word Counter"
In the code above, the word-counting process is started when CalculateWordCounts is called. CalculateWordCounts spawns four new threads and starts them; each then begins processing in ThreadDoWork.

In ThreadDoWork, each thread takes one word at a time from the global list of words in wordList. Each time a word is 'selected', currentWordIndex is incremented by 1, and then the thread in question checks to see if the word has already been added into wordCountDict. If it has, it increments that word's count by one; otherwise it adds a new key/value pair with the word as the key and a value of 1 (as well as checking to see if it's a curse word). Once the selected index reaches the length of wordList, the while-loop is exited and the method returns (and the thread exits).

The code above, however, has a problem: It is not synchronized (and therefore not thread-safe). For example, imagine what might happen if two threads simultaneously want to update the count for a previously unencountered word. It's entirely possible that both threads will perform the if check for a pre-existing key at the same time (which will indicate no existing key for the given word), and then both try to add a new key/value pair for the given word, which will result in an exception.

In fact, that's just one of many possible ways the code above can go wrong, not to mention that the standard collections in .NET are not designed for free access from multiple threads in such a manner. Other things that could go wrong include: the same word in wordList being read multiple times, words being skipped entirely, incorrect word counts in the final dictionary, and theoretically even one or more threads being stuck in an infinite loop.
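In fact, the unlucky double-Add interleaving described above can be replayed deterministically on a single thread. The sketch below (not from the original word counter; the word "reactor" is an arbitrary stand-in) performs both ContainsKey checks before either Add, exactly as two unsynchronized threads might, and shows that the second Add throws:

```csharp
using System;
using System.Collections.Generic;

public static class RaceSimulation {
	// Simulates the interleaving where thread A and thread B both run
	// their ContainsKey check before either of them runs Add.
	public static bool SecondAddThrows() {
		var wordCountDict = new Dictionary<string, int>();

		// Both "threads" check first, and both see no existing key:
		bool aSeesFirstOccurrence = !wordCountDict.ContainsKey("reactor");
		bool bSeesFirstOccurrence = !wordCountDict.ContainsKey("reactor");

		// "Thread A" adds successfully...
		if (aSeesFirstOccurrence) wordCountDict.Add("reactor", 1);

		// ...then "thread B"'s Add throws, because the key now exists.
		try {
			if (bSeesFirstOccurrence) wordCountDict.Add("reactor", 1);
			return false;
		}
		catch (ArgumentException) {
			return true;
		}
	}
}
```

On real threads the same sequence of operations is merely possible rather than guaranteed, which is precisely what makes such bugs intermittent and hard to reproduce.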

Insufficient Locking Granularity 

The lock keyword in C# provides a mutual-exclusion semantic (via a monitor lock) over an area of code. In simpler terms, this means that a piece of code can be guaranteed to only ever be run by one thread at a time. To do this, we must create a shared object to be locked over.
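As a quick standalone illustration (separate from the word counter; the names here are invented for the example): incrementing a shared counter from several threads is only reliable when the read-modify-write is wrapped in a lock on a shared object.

```csharp
using System;
using System.Threading;

public static class LockDemo {
	private static readonly object counterSyncObj = new object();

	// Increments a shared counter from 'threadCount' threads,
	// 'incrementsPerThread' times each, taking the lock for every increment.
	public static int CountWithLock(int threadCount, int incrementsPerThread) {
		int counter = 0;
		Thread[] threads = new Thread[threadCount];
		for (int i = 0; i < threadCount; ++i) {
			threads[i] = new Thread(() => {
				for (int n = 0; n < incrementsPerThread; ++n) {
					lock (counterSyncObj) {
						// The read, the add, and the write now happen as one
						// indivisible unit with respect to other lockers.
						counter = counter + 1;
					}
				}
			});
		}
		foreach (Thread t in threads) t.Start();
		foreach (Thread t in threads) t.Join();
		return counter; // always threadCount * incrementsPerThread
	}
}
```

Remove the lock statement and the final count will usually come up short, because concurrent read-modify-write cycles overwrite each other's results.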

Using this construct, we can make sections of code 'critical'. Programmers who are unsure or overly cautious may mark too much of the parallelized work as critical, and thereby reduce the throughput of the program to something worse than single-threaded. This is insufficient granularity in locking, and the most obvious example would be something like this:

private static readonly object wordCountCalculatorSyncObj = new object();

private static void ThreadDoWork() {
	lock (wordCountCalculatorSyncObj) {
		bool atLeastOneWordRemaining;
		int thisWordIndex = currentWordIndex;
		currentWordIndex = currentWordIndex + 1;
		if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
		else atLeastOneWordRemaining = true;

		while (atLeastOneWordRemaining) {
			string thisWord = wordList[thisWordIndex];
			bool firstOccurrenceOfWord = !wordCountDict.ContainsKey(thisWord);

			if (curseWords.Contains(thisWord)) Console.WriteLine("Curse word detected!");
			if (firstOccurrenceOfWord) wordCountDict.Add(thisWord, 1);
			else wordCountDict[thisWord] = wordCountDict[thisWord] + 1;

			thisWordIndex = currentWordIndex;
			currentWordIndex = currentWordIndex + 1;
			if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
		}
	}
}
Listing: "Insufficient Granularity with Locks"
Notice that the entire implementation of ThreadDoWork is now wrapped in a lock (wordCountCalculatorSyncObj) statement. This essentially means that only one thread can be executing the code inside ThreadDoWork at any one time. When another thread reaches the lock statement, it will have to wait until the first thread has exited that same lock statement on its execution stack.

What this means is that although the code is now thread-safe (and will provide the correct answer), the word-count-calculator is now essentially single-threaded again, but with the additional overhead of creating and joining threads. It would have been better to simply write this code single-threaded.

This is a surprisingly common error. Frequently on the C# IRC channels I see newcomers to multithreading make this mistake, and in fact a recent thread on the C# subreddit followed suit. (I should note that my intention is not to shame or admonish the programmers who make these mistakes, simply to point out that it's a common one.)

Usually, insufficient locking granularity does not cause incorrect behaviour; it only reduces the throughput of the program. Nonetheless, take the following singleton event manager for a nuclear reactor, which is meant to accept Actions to be dispatched on certain events, and to invoke those Actions when the respective events are raised:

public static class ReactorEventDispatcher {
	private static readonly object eventDispatcherSyncObj = new object();

	private static event Action startingUp;
	private static event Action heatCritical;
	private static event Action safetyShutdown;

	public static event Action StartingUp {
		add {
			lock (eventDispatcherSyncObj) {
				startingUp += value;
			}
		}
		remove {
			lock (eventDispatcherSyncObj) {
				startingUp -= value;
			}
		}
	}

	public static event Action HeatCritical {
		add {
			lock (eventDispatcherSyncObj) {
				heatCritical += value;
			}
		}
		remove {
			lock (eventDispatcherSyncObj) {
				heatCritical -= value;
			}
		}
	}

	public static event Action SafetyShutdown {
		add {
			lock (eventDispatcherSyncObj) {
				safetyShutdown += value;
			}
		}
		remove {
			lock (eventDispatcherSyncObj) {
				safetyShutdown -= value;
			}
		}
	}

	internal static void TriggerStartUp() {
		lock (eventDispatcherSyncObj) {
			if (startingUp != null) startingUp();
		}
	}

	internal static void TriggerHeatCritical() {
		lock (eventDispatcherSyncObj) {
			if (heatCritical != null) heatCritical();
		}
	}

	internal static void TriggerSafetyShutdown() {
		lock (eventDispatcherSyncObj) {
			if (safetyShutdown != null) safetyShutdown();
		}
	}
}
Listing: "Nuclear Reactor Insufficient Granularity"
The programmer who wrote this code has had thread-safety in mind and locked around all non-private methods of the class. However, synchronizing around unknown extraneous code (e.g. the Actions supplied to our various events) is always a disaster waiting to happen in a free-threaded environment. Imagine if someone, a year later, writes a line of code using this event manager like so:

ReactorEventDispatcher.HeatCritical += () => {
	var triggerShutdownTask = Task.Run((Action) ReactorEventDispatcher.TriggerSafetyShutdown);

	OtherLongRunningSafetyCode();

	triggerShutdownTask.Wait();
};
Listing: "Nuclear Reactor Deadlock"
Do you spot the issue? When someone calls TriggerHeatCritical on the ReactorEventDispatcher, the eventDispatcherSyncObj will be locked over. Then, the lambda we wrote above will be executed, which attempts to start a new task to call TriggerSafetyShutdown. Assuming that new task runs on a second thread, we may have accidentally created a deadlock: TriggerSafetyShutdown can never be entered because the lock is still held by the first thread in TriggerHeatCritical, and it will never be relinquished because the lambda we submitted will forever be awaiting the completion of the task that is running TriggerSafetyShutdown. Oops. Boom! :P
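The essence of the deadlock is that a monitor is held by a particular thread: a second thread can never acquire a monitor while the first still holds it. We can observe this directly, without actually hanging the process, by using Monitor.TryEnter with a timeout instead of lock. This is a standalone sketch with invented names, not code from the reactor example:

```csharp
using System;
using System.Threading;

public static class HeldLockDemo {
	private static readonly object syncObj = new object();

	// Returns whether a second thread managed to take the lock
	// while the first thread was still holding it.
	public static bool SecondThreadCouldEnter() {
		bool secondThreadEntered = true;
		lock (syncObj) {
			// While we hold syncObj, spin up another thread that tries to take it.
			Thread other = new Thread(() => {
				// TryEnter with a timeout fails gracefully rather than deadlocking.
				secondThreadEntered = Monitor.TryEnter(syncObj, TimeSpan.FromMilliseconds(200));
				if (secondThreadEntered) Monitor.Exit(syncObj);
			});
			other.Start();
			other.Join(); // we wait here while STILL holding the lock, so the other thread must time out
		}
		return secondThreadEntered;
	}
}
```

In the reactor example the waiting thread uses Task.Wait rather than Thread.Join and the lock attempt has no timeout, so instead of a clean failure both threads wait forever.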

In this particular example, we could solve this issue by taking a local copy of the event delegate in question before triggering it, thereby increasing the granularity of our locking:

internal static void TriggerSafetyShutdown() {
	Action safetyShutdownLocal;
	lock (eventDispatcherSyncObj) {
		safetyShutdownLocal = safetyShutdown;
	}
	if (safetyShutdownLocal != null) safetyShutdownLocal();
}
Listing: "Deadlock Solution"
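This snapshot pattern works for any delegate-backed event: because delegates are immutable, invoking the local copy outside the lock is safe even if subscribers are added or removed concurrently (the copy simply represents the subscriber list as it was at the moment of the snapshot). A minimal self-contained sketch, with names invented for illustration:

```csharp
using System;

public static class SnapshotDispatchDemo {
	private static readonly object syncObj = new object();
	private static Action alarmRaised;

	public static void Subscribe(Action handler) {
		lock (syncObj) { alarmRaised += handler; }
	}

	public static void RaiseAlarm() {
		Action alarmRaisedLocal;
		lock (syncObj) {
			alarmRaisedLocal = alarmRaised; // snapshot the delegate under the lock...
		}
		if (alarmRaisedLocal != null) alarmRaisedLocal(); // ...invoke outside it
	}
}
```

A handler can now re-enter the dispatcher, or block on another thread that does, without deadlocking, because no lock is held during invocation.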
However, there's no magic solution when writing locks and thread-safe code: the only way to write secure multithreaded code is to really sit down and think about where exactly you need locks or other thread-safety constructs (more on those later). When writing locks, it is important to ruminate on exactly what could go wrong: think about what might happen if any line is executed on one thread at the same time as any other.

Erroneous Locking Granularity 

An equally prevalent mistake is locking that is too granular, rather than locking over too much. This is usually done in pursuit of reducing contention (and thereby increasing throughput), but it accidentally exposes holes in the thread-safety of the code. Although the code may complete faster, it is liable to give the wrong result; hence the term erroneous granularity in locking.

Let's take our word counter's ThreadDoWork method as an example, and look at one way we could try to improve our locking strategy:

private static readonly object wordCountCalculatorSyncObj = new object();

private static void ThreadDoWork() {
	bool atLeastOneWordRemaining;
	int thisWordIndex = currentWordIndex;
	lock (wordCountCalculatorSyncObj) {
		currentWordIndex = currentWordIndex + 1;
	}
	if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	else atLeastOneWordRemaining = true;

	while (atLeastOneWordRemaining) {
		string thisWord = wordList[thisWordIndex];
		bool firstOccurrenceOfWord;
		lock (wordCountCalculatorSyncObj) {
			firstOccurrenceOfWord = !wordCountDict.ContainsKey(thisWord);
		}

		if (curseWords.Contains(thisWord)) Console.WriteLine("Curse word detected!");

		if (firstOccurrenceOfWord) wordCountDict.Add(thisWord, 1);
		else {
			lock (wordCountCalculatorSyncObj) {
				wordCountDict[thisWord] = wordCountDict[thisWord] + 1;
			}
		}

		thisWordIndex = currentWordIndex;
		lock (wordCountCalculatorSyncObj) {
			currentWordIndex = currentWordIndex + 1;
		}
		if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	}
}
Listing: "Erroneous Locking Granularity"
In this iteration of ThreadDoWork, the code is locked around three key areas. The first (and last) lock in the code above attempts to ensure that only one thread at a time can update currentWordIndex. The second lock tries to make sure that only one thread can see the same word as the first occurrence of that word by locking over the call to wordCountDict.ContainsKey. And the third lock tries to ensure that the update of the word count for the given word, when it isn't the first occurrence of that word, is done one thread at a time.

Unfortunately, every locking attempt implemented in the code above is faulty. We often see errors like this when a programmer is trying to reduce contention (e.g. increase throughput/performance). In trying to reduce the amount of code that can only be run by a single thread at a time, it is possible to accidentally throw away thread safety. Here is where the examples of erroneous granularity in the code above can be found:

In the first and last lock, the programmer has attempted to ensure that only one thread at a time can update currentWordIndex. This is a good idea; however, simply locking over the update of some shared state is not sufficient. For example, it's still perfectly possible for all four threads to reach the line thisWordIndex = currentWordIndex at the same time and count the same word multiple times. There's also nothing stopping one thread from reading a value from currentWordIndex at the same time as another is updating it, because the lock is only over the update of the variable.

The next lock attempts to ensure that only one thread can see a given word as the first occurrence of that word, by locking over the check of whether wordCountDict already contains said word. However, this lock is essentially useless: all it does is force the threads to perform the ContainsKey check one-by-one, and it does nothing to prevent every thread from seeing the dictionary as not containing the key in question.

Finally, the last lock is the right idea, in that it forces updates to an existing word in the dictionary to be done by one thread at a time. (If you're not sure why this is necessary, it's because the line wordCountDict[thisWord] = wordCountDict[thisWord] + 1 actually performs two actions: it first reads the value in wordCountDict[thisWord], then attempts to store that value plus one. Without the lock, it's possible for two threads to read the same value at the same time, say 3, and then both try to store 3+1 simultaneously, instead of 3+1 and then 4+1.) However, it needs to be combined with the prior lock to ensure that adding a new word doesn't result in an exception when we add a key that already exists (because it was added by another thread moments ago).
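As an aside, when the shared state is a single integer (like currentWordIndex), the read-increment-write cycle can be made atomic without any lock at all via Interlocked.Increment. This is a standalone sketch, not a refactor of the word counter, and the helper name is invented:

```csharp
using System;
using System.Threading;

public static class InterlockedDemo {
	// Hands out strictly increasing, never-duplicated indices to any number
	// of threads without taking a lock.
	public static int TakeNextIndex(ref int sharedIndex) {
		// Interlocked.Increment atomically adds one and returns the NEW value;
		// subtracting one recovers the pre-increment value, making this
		// behave like 'sharedIndex++' performed atomically.
		return Interlocked.Increment(ref sharedIndex) - 1;
	}
}
```

This would fix the index-handout races in the example, but the dictionary updates would still need a lock; atomic primitives only cover operations on a single word of memory.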

A corrected version of the code above would look like so:

private static readonly object wordCountCalculatorSyncObj = new object();

private static void ThreadDoWork() {
	bool atLeastOneWordRemaining;
	int thisWordIndex;
	lock (wordCountCalculatorSyncObj) {
		thisWordIndex = currentWordIndex;
		currentWordIndex = currentWordIndex + 1;
	}
	if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	else atLeastOneWordRemaining = true;

	while (atLeastOneWordRemaining) {
		string thisWord = wordList[thisWordIndex];

		if (curseWords.Contains(thisWord)) Console.WriteLine("Curse word detected!");

		lock (wordCountCalculatorSyncObj) {
			bool firstOccurrenceOfWord = !wordCountDict.ContainsKey(thisWord);
		
			if (firstOccurrenceOfWord) wordCountDict.Add(thisWord, 1);
			else wordCountDict[thisWord] = wordCountDict[thisWord] + 1;
			thisWordIndex = currentWordIndex;
			currentWordIndex = currentWordIndex + 1;
		}
		if (thisWordIndex >= wordList.Count) atLeastOneWordRemaining = false;
	}
}
Listing: "Corrected Locking"
In this version, we've made sure that only one thread can read and update currentWordIndex in one complete (atomic) pass. We've also locked over the entire checking-and-updating routine for the word count dictionary.

Although the code above is now thread-safe (and will provide the correct answer), it may actually run slower than the example given for insufficient granularity, where the entire body of ThreadDoWork was locked over. This is because despite increasing our lock granularity, we now have four threads continually stopping, starting, and waiting for each other, while the majority of the code is still forced to run one thread at a time. Compare that with the previous iteration, which just allowed a single thread to 'chew through' the entire word list.

In multithreaded application development parlance, we can say that our final refactor above has introduced too much contention between the threads. With the way we've written our code at the moment, there's no way to reduce that contention and therefore increase the performance of the code. This is a symptom of writing code in a single-threaded style and then trying to add threading afterwards. In the next post, we'll explore different ways to lay out the word-count-calculator that allow us to see some real performance gains.

As always, any comments, clarifications, or criticisms are welcome. Have a great day. :)


Edit 10th Aug '16: Fixed the incorrect usage of a non-existent Length property on List to Count and fixed one other minor compilation error.
Edit 14th Oct '16: Fixed a minor compilation error (usage of var on an uninitialized variable). Thanks to @deccer on the C# Discord for spotting it.