Three Garbage Examples

C# (like any language that runs on the CLR) is garbage-collected, meaning that objects with no remaining references will have their memory reclaimed at some point in the future. Creating too much garbage (by creating too many ephemeral objects or over-using the new keyword) can trigger the garbage collector too frequently, slowing down the entire application.

Often the source of the garbage is obvious, and culpable code can be refactored in such a way that memory is reused or not so trivially thrown away. However, I've compiled three more esoteric and perhaps unexpected examples of code that generates a lot of garbage below. Enjoy!

The Nullable Paradox

Code

Consider the following code:

private static void DoTestA() {
    int? maybeInt = null;
    for (int i = 0; i < 100000; ++i) {
        DoSomething(maybeInt);
    }
}
 
private static void DoTestB() {
    int? maybeInt = 3;
    for (int i = 0; i < 100000; ++i) {
        DoSomething(maybeInt);
    }
}
 
private static void DoSomething(object o) {
    // Do something...
}

Does either test method produce garbage? If so, which one do you think produces more? Or do they both create the same amount?

Answer

DoTestA creates 0 bytes of garbage.
DoTestB creates 2.344 MiB of garbage.

Explanation

Because objects are usually reference types and placed on the managed heap, the code that deals with our object "o" in DoSomething will be compiled to treat it as such.

However, maybeInt is a Nullable<int> (int? is syntax sugar for Nullable<int>), which is a value type and is passed on the stack. So, for the compiled code to work correctly, the value must be boxed: copied into a temporary object on the heap. Once DoSomething returns, the temporary object is no longer referenced anywhere and becomes garbage. Because we're doing this 100,000 times (in a loop), this quickly adds up, and we end up with over 2MiB of garbage.

What's somewhat paradoxical about this is that DoTestA doesn't suffer from this problem, despite also appearing to pass a Nullable<int> to DoSomething.

This is because what's actually happening when you pass a nullable value as an object is that the CLR will either pass null or a boxed value, depending on whether your nullable has a value set. Because null is simply a null reference, no boxing is required in the first case.
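
The difference is easy to observe directly. The following is a minimal sketch rather than the code used for the measurements above; the class name and the Describe helper are made up for illustration. It reports what actually arrives on the other side of an object parameter:

```csharp
using System;

public static class NullableBoxingSketch
{
    // Hypothetical helper: describes what was received through an object parameter.
    public static string Describe(object o)
    {
        return o == null ? "null reference" : "boxed " + o.GetType().Name;
    }

    public static void Main()
    {
        int? empty = null;
        int? three = 3;

        // No value set: the conversion to object yields a plain null reference.
        Console.WriteLine(Describe(empty));  // prints "null reference"

        // Value set: the underlying int (not the Nullable<int> wrapper) is copied into a heap box.
        Console.WriteLine(Describe(three));  // prints "boxed Int32"
    }
}
```

Note that the box contains an Int32, not a Nullable<int>; the nullable wrapper never makes it to the heap.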

Workaround

The best way to solve this problem is to generify the DoSomething method:

private static void DoSomething<T>(T o) {
    // Do something...
}

Now maybeInt will be passed to DoSomething as a Nullable<int>, and due to generic reification, DoSomething will be compiled specially for Nullable<int> to take advantage of its stack-based nature.

If you can't change the target method you can also pre-box your value type outside of the loop:

private static void DoTestB() {
    int? maybeInt = 3;
    object maybeIntAsObj = maybeInt;
    for (int i = 0; i < 100000; ++i) {
        DoSomething(maybeIntAsObj);
    }
}

This way we've boxed the value once, before the loop begins, and can pass that same 'box' to DoSomething for each iteration.

Interestingly, I'm not sure why the compiler can't make this optimisation itself. It may just be that this isn't considered a particularly worthwhile thing to implement.

Purchasing Power

Code

Once again, consider the following code:

static readonly IDictionary<User, IEnumerable<Purchase>> userPurchases;
 
private static void DoTest() {
    foreach (var kvp in userPurchases) {
        PrintUserDetails(kvp.Key);
        foreach (var purchase in kvp.Value) {
            PrintPurchaseDetails(purchase);
        }
    }
}

Assume that userPurchases has been instantiated and filled with 100,000 users' worth of data, each with 200 purchases.

PrintUserDetails and PrintPurchaseDetails simply write to a log file and are not the source of any garbage.

How much garbage would you expect to see generated from this method?

Answer

DoTest creates 840 KiB of garbage.

Explanation

The reason for this is that the collection (userPurchases) is a dictionary whose values are an IEnumerable<T> rather than simply a concrete/actual collection type. As you probably know, IEnumerable<T> is an interface.

When you write a foreach loop, the compiler essentially performs an act of compile-time duck typing, and looks for a method called GetEnumerator on the collection type. It then uses the enumerator returned by that method to enumerate through all the values in the collection.

When we dive into the .NET source for List<T>, for example, we can see that GetEnumerator returns an "Enumerator", which is a struct declared inside the List<T> class itself.

public Enumerator GetEnumerator() {
    return new Enumerator(this);
}
 
public struct Enumerator : IEnumerator<T>, System.Collections.IEnumerator {
    // ...
}

The reason this is a struct is so that you can go around creating enumerators all over the code (via GetEnumerator) without creating a load of short-lived objects.

However, somewhat ironically, we accidentally defeat that optimisation when we use any collection type through an interface. GetEnumerator on IEnumerable<T> looks like this:

IEnumerator<T> GetEnumerator();

This means that the struct Enumerator given to us by the underlying List<T> implementation (or whichever collection type you're using) will now be boxed so as to be usable through a reference to an IEnumerator<T>.

When you're iterating through a single collection this doesn't add up to much, but if you have a collection of collections you'll create garbage for each inner collection; and that's how DoTest() ends up creating almost a meg of garbage doing nothing more than iterating!
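
To see the boxing in isolation, here is a small sketch (the class and method names are my own, and plain ints stand in for Purchase objects). It compares calling GetEnumerator on the concrete List<int> with calling it through IEnumerable<int>:

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorBoxingSketch
{
    // The concrete call returns the struct List<int>.Enumerator by value.
    public static bool ConcreteEnumeratorIsStruct(List<int> list)
    {
        List<int>.Enumerator direct = list.GetEnumerator();
        return direct.GetType().IsValueType;
    }

    // Through the interface the struct has to be boxed, so each call on a
    // non-empty list hands back a separate heap object.
    public static bool InterfaceCallsShareAnObject(IEnumerable<int> sequence)
    {
        IEnumerator<int> first = sequence.GetEnumerator();
        IEnumerator<int> second = sequence.GetEnumerator();
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        var purchases = new List<int> { 1, 2, 3 };
        Console.WriteLine(ConcreteEnumeratorIsStruct(purchases));   // prints "True"
        Console.WriteLine(InterfaceCallsShareAnObject(purchases));  // prints "False"
    }
}
```

A foreach over an IEnumerable<T> goes down the second path once per loop, which is where the per-inner-collection garbage comes from.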

Workaround

...There isn't one that I can think of, really.

If possible, you can cast back to the concrete type in the foreach construct:

private static void DoTest() {
    foreach (var kvp in userPurchases) {
        PrintUserDetails(kvp.Key);
        // Only valid if every value really is a List<Purchase>
        foreach (var purchase in (List<Purchase>) kvp.Value) {
            PrintPurchaseDetails(purchase);
        }
    }
}

Of course, the fact that you're using an interface in the first place likely implies that you don't know the concrete type, and therefore this isn't an option.

Unfortunately, I've tried 100 different approaches to solving this and I came up with nothing. I think if you wanted to go wild you could build an assembly in memory/generate a method with the IL generator that knew the concrete types and did the entire iteration without going through an interface, but that would turn into another blog post of its own (one I might write in the future though!).

The best advice I can give is to perhaps compromise a little and change your IEnumerable<T>s into IList<T>s. Although this is technically bad practice, it allows you to at least replace the inner foreach loop with a garbage-free for loop:

// Assumes userPurchasesB is declared as IDictionary<User, IList<Purchase>>
private static void DoTestB() {
    foreach (var kvp in userPurchasesB) {
        PrintUserDetails(kvp.Key);
        for (int i = 0; i < kvp.Value.Count; ++i) PrintPurchaseDetails(kvp.Value[i]);
    }
}


Unnecessary Delegation

Code

Consider:

private static event Action SomeEvent;
 
private static void DoTest() {
    for (int i = 0; i < 10000; ++i) {
        SomeEvent += DoSomething;
        SomeEvent -= DoSomething;
    }
}
 
private static void DoSomething() { }

In this instance we are subscribing to the same event ten thousand times with the same handler (DoSomething()), and then removing it again. Although obviously you wouldn't normally write code like this, it is quite common to have code that adds a handler to an event and removes it again repeatedly over time as other prerequisites change. And although it may not be 10,000 times in one loop, it's not uncommon to have multiple objects all doing the same thing frequently.

So, how much garbage do you think is being generated?

Answer

DoTest creates 1.256 MiB of garbage.

Explanation

When we use the += operator to chain delegate calls to existing events or delegates, the compiler actually rewrites the code to look like this:

private static void DoTest() {
    for (int i = 0; i < 10000; ++i) {
        SomeEvent += new Action(DoSomething);
        SomeEvent -= new Action(DoSomething);
    }
}

In fact, in the compiled IL we can see two lines like so:

newobj instance void [mscorlib]System.Action::.ctor(object, native int)

Essentially, every time we add or remove a delegate to/from an event we create a new Action (or similar) object! This isn't specific to events, either. The following code also generates over half a meg of garbage:

private static void DoTest() {
    for (int i = 0; i < 10000; ++i) {
        if (PerformIntegerOperation(i, Double) < 0) Console.WriteLine("xyz");
    }
}
 
private static int PerformIntegerOperation(int input, Func<int, int> operation) { return operation(input); }
 
private static int Double(int x) { return x * 2; }

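
As a rough illustration of what those hidden delegate objects look like at runtime, here is a sketch of my own (the Subscriber class is hypothetical, and it uses an instance method rather than the static one from the example). It shows that each method-group conversion yields a separate object, yet the two compare equal, which is why -= with a freshly created delegate still removes the handler:

```csharp
using System;

public class Subscriber
{
    public static event Action SomeEvent;

    public void DoSomething() { }

    public static bool EventIsEmpty()
    {
        return SomeEvent == null;
    }

    public static void Main()
    {
        var subscriber = new Subscriber();

        // Each conversion of the method group builds a new delegate object...
        Action first = subscriber.DoSomething;
        Action second = subscriber.DoSomething;
        Console.WriteLine(object.ReferenceEquals(first, second));  // prints "False"

        // ...but delegates are equal when their target and method match.
        Console.WriteLine(first.Equals(second));                   // prints "True"

        // So removing with a different (but equal) delegate works, at the cost of garbage.
        SomeEvent += first;
        SomeEvent -= second;
        Console.WriteLine(EventIsEmpty());                         // prints "True"
    }
}
```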

Workaround

Where possible, 'cache' the delegate:

private static void DoTest() {
    Action doSomethingWrapper = new Action(DoSomething);
    for (int i = 0; i < 10000; ++i) {
        SomeEvent += doSomethingWrapper;
        SomeEvent -= doSomethingWrapper;
    }
}

If this isn't possible... Well, there's not much to be done, actually. If this is killing your app, the best idea might be to deliberately hold on to the Action (or whatever) objects by storing them in a collection somewhere, then clear the collection and invoke GC.Collect() when appropriate. At least that way, the garbage (and subsequent collection) is controlled.

Final Remarks

It's worth remembering that these are patterns to be aware of, but don't take these cautionary examples too far. A lot of the code above is also highly idiomatic and will make absolutely no difference to the performance of your application when used here and there. One thing worth noting is that I had to put a lot of the suspect code in a loop to really demonstrate any meaningful amount of garbage.

And on that point, it's also very important to stress that the numbers above are only really replicable if you also use:

The version of .NET I was using (4.6)
The version of C# I was using (5)
The build configuration I was building (optimised / Release)
The target platform I was targeting (x64)
The GC + CLR configuration that I was using

...And other factors.
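
If you want to check the figures on your own setup, one possible approach is sketched below. It is not how the numbers in this post were produced. It assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread(), which newer runtimes do but the .NET 4.6 setup listed above may not; the AllocationProbe class and Measure helper are names I made up for the sketch.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class AllocationProbe
{
    // Hypothetical helper: bytes allocated on the current thread while 'action' runs.
    // Assumes GC.GetAllocatedBytesForCurrentThread() is available on your runtime.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // NoInlining keeps the JIT from optimising the call (and possibly the box) away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void DoSomething(object o) { }

    public static void Main()
    {
        int? maybeInt = 3;

        // The lambda and its closure are created before Measure starts counting.
        long bytes = Measure(() =>
        {
            for (int i = 0; i < 100000; ++i) DoSomething(maybeInt);
        });

        Console.WriteLine(bytes + " bytes allocated");
    }
}
```

Expect the reading to vary with the runtime, JIT and build configuration, for exactly the reasons listed above.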

All that being said, it's still good to be aware of these fun 'gotchas', especially if you're writing high-performance or latency-sensitive code.

Do you have any examples of code that generates more garbage than would first be expected? Or perhaps you know of a better workaround for some of the examples I've given above? If so, please do leave a comment below. :)
