In my game engine, the following code was called once every frame for every material in the game. We had an average of about 500 materials in most scenes; so on a high-end machine this one function was being called about
100,000 times a second. C# can definitely handle it (ask any high-scale web developer), but as you can imagine the method had to be fast (especially as we wanted to support average-to-low-end desktops):
(Don't spend too long looking at it- it's just here for demonstration :))
private void RenderCache_IterateMaterial(int materialIndex) {
// Set up context variables
KeyValuePair<Material, ModelInstanceManager.MIDArray> currentKVP = currentInstanceData[materialIndex];
Material currentMaterial = currentKVP.Key;
ModelInstanceManager.MIDArray currentMID = currentKVP.Value;
// Skip this material if it or its shader are disposed
if (currentMaterial.IsDisposed || currentMaterial.Shader.IsDisposed) return;
// Skip this material if we're not using it
bool inUse = false;
for (int i = 0; i < currentMID.Length; ++i) {
if (currentMID.Data[i].InUse) {
inUse = true;
break;
}
}
if (!inUse) return;
// Prepare shader according to material params, and switch to it or update it
if (lastSetFragmentShader != currentMaterial.Shader || lastFrameNum != frameNum) {
lastSetFragmentShader = currentMaterial.Shader;
lastFrameNum = frameNum;
QueueShaderSwitch(lastSetFragmentShader);
}
var queuedSRP = currentMaterial.FragmentShaderResourcePackage;
if (lastSetFragmentShader == geomFSWithShadowSupport) {
if (modifiedSRP == null) modifiedSRP = new ShaderResourcePackage();
modifiedSRP.CopyFrom(queuedSRP);
modifiedSRP.SetValue((ResourceViewBinding) lastSetFragmentShader.GetBindingByIdentifier("ShadowMap"), previousShadowBufferSRV);
queuedSRP = modifiedSRP;
}
QueueShaderResourceUpdate(lastSetFragmentShader, queuedSRP);
// Filter & sort
if (materialFilteringWorkspace == null || materialFilteringWorkspace.Length < currentCache.NumModels) {
materialFilteringWorkspace = new List<Transform>[currentCache.NumModels];
for (int i = 0; i < materialFilteringWorkspace.Length; ++i) materialFilteringWorkspace[i] = new List<Transform>();
}
for (int i = 0; i < materialFilteringWorkspace.Length; ++i) materialFilteringWorkspace[i].Clear();
SortByProximityToCamera(currentMID);
uint numInstances = 0U;
for (uint i = 0U; i < currentMID.Length; ++i) {
ModelInstanceData curMID = sortedModelData[i];
if (!curMID.InUse) continue;
SceneLayer layer = currentSceneLayers[curMID.SceneLayerIndex];
if (layer == null || !layer.GetRenderingEnabled() || !addedSceneLayers.Contains(layer)) continue;
if (curMID.ModelIndex == __VEGG_MH.ModelIndex && currentCache.ID == __VEGG_MH.GeoCacheID) {
int instanceIndex = 0;
for (int j = 0; j < currentMID.Length; ++j) {
if (currentMID.Data[j].Transform == curMID.Transform) {
instanceIndex = j;
break;
}
}
Quaternion rot = Quaternion.IDENTITY;
foreach (var kvp in __VEGG_MIH_ARR) {
if (kvp.Key.InstanceIndex == instanceIndex) {
rot = kvp.Value;
break;
}
}
materialFilteringWorkspace[curMID.ModelIndex].Add(curMID.Transform.RotateBy(rot));
}
else materialFilteringWorkspace[curMID.ModelIndex].Add(curMID.Transform);
++numInstances;
}
// Concatenate & queue render commands
if (instanceConcatWorkspace == null || instanceConcatWorkspace.Length < numInstances) {
instanceConcatWorkspace = new Matrix[numInstances << 1]; // x2 so we don't create loads of garbage if the count keeps increasing by 1
}
uint instanceStartOffset = RenderCache_IterateMaterial_ConcatReserve(numInstances);
uint nextWorkspaceIndex = 0;
uint outVBStartIndex, outIBStartIndex, outVBCount, outIBCount;
for (uint mI = 0U; mI < materialFilteringWorkspace.Length; ++mI) {
List<Transform> filteredTransformList = materialFilteringWorkspace[mI];
int numFilteredTransforms = filteredTransformList.Count;
if (numFilteredTransforms == 0) continue;
currentCache.GetModelBufferValues(mI, out outVBStartIndex, out outIBStartIndex, out outVBCount, out outIBCount);
QueueRenderCommand(RenderCommand.DrawIndexedInstanced(
(int) outVBStartIndex,
outIBStartIndex,
outIBCount,
nextWorkspaceIndex + instanceStartOffset,
(uint) numFilteredTransforms
));
for (int iI = 0; iI < numFilteredTransforms; ++iI) {
if (mI == __EGGHACK_MH.ModelIndex && currentCache.ID == __EGGHACK_MH.GeoCacheID) {
instanceConcatWorkspace[nextWorkspaceIndex++] = filteredTransformList[iI].RotateBy(__EGGHACK_ROT).AsMatrixTransposed;
}
else instanceConcatWorkspace[nextWorkspaceIndex++] = filteredTransformList[iI].AsMatrixTransposed;
}
}
RenderCache_IterateMaterial_Concat(instanceConcatWorkspace, instanceStartOffset, numInstances);
}
View Code Fullscreen • "Material Iteration"
Unfortunately, at one point, profiling indicated that almost half of our CPU budget was being spent in this one method. Although I expected the method to certainly show up on the radar, I really needed to reduce that down to at most ~10%. In fact, one simple change got me almost all the way there! Notice the following line:
for (int i = 0; i < materialFilteringWorkspace.Length; ++i) materialFilteringWorkspace[i].Clear();
View Code Fullscreen • "Clearing Material Filtering Workspaces"
It might not look like much, but this was the cause of most of the slowdown!
materialFilteringWorkspace is an array of
List<T>s. To find out why this was giving us so much grief, let's look at the actual implementation of
List<T>.Clear():
// Clears the contents of List.
public void Clear() {
if (_size > 0)
{
Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references.
_size = 0;
}
_version++;
}
View Code Fullscreen • ".NET List<T> Clear"
(c) Microsoft .NET; https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,ca7bce81a50b0aeb
The critical element here is that the inbuilt
List<T> must clear the array that it wraps when we call
.Clear(). As the in-code comment alludes to, the reason for this is to make sure that references to items we've placed in the list's internal backing array (usually via
Add()) are no longer considered reachable to the garbage collector.
Unfortunately, an array zero/clear is not a zero-cost operation. For most everyday usages it is absolutely neglible, but our
List<T> array '
materialFilteringWorkspace' was quite long usually (somewhere around 600 usually, sometimes over 1000)
and although each list would usually only have a length of around <= 10 items; some would be >= 100.
This means that 100,000 times a second, we were zeroing the memory for at least 600 lists. It's at this point that
List<T>.Clear() becomes a bit too slow. :)
One real solution to this is to re-architect the way the code works. Obviously, clearing a list 60M times a second (and therefore zeroing array memory 60M times a second) isn't really a situation you want to be in too often anyway. However, sometimes part of being a solo engine/game programmer means knowing how to cut corners.
Luckily for me, the actual types that I'm storing in these lists are
structs, and there's no danger of lingering references clogging up my memory. This meant there was a
fast fix: I could just write my own
List<T> stub replacement to drop-in:
public sealed class FastClearList<T> where T : struct {
private T[] storage;
private int counter;
public int Count {
get {
return counter;
}
}
public FastClearList() {
storage = new T[4];
counter = 0;
}
public void Add(T item) {
if (counter >= storage.Length) {
var newStorage = new T[storage.Length << 1];
Array.Copy(storage, 0, newStorage, 0, storage.Length);
storage = newStorage;
}
storage[counter++] = item;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void Clear() {
counter = 0;
}
public T this[int index] {
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get {
return storage[index];
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
set {
storage[index] = value;
}
}
}
View Code Fullscreen • "Fast Clear List"
This is a totally minimal
List<T> replacement (doesn't even have a
Remove()!) with one key difference:
Clear() doesn't clear the backing array. By replacing usages of
List<T>s with
FastClearList<T>s in the code above, I managed to reduce
RenderCache_IterateMaterial's overhead almost entirely.
Finally, a point to note: Although I have the generic constraint
where T : struct for my
FastClearList<T>, struct types in C# can still themselves contain fields of managed types. So if you're wondering why
FastClearList<T> isn't used more widely- that may be why (also because not many people will come across zeroing memory as a performance issue, I imagine!).