Best Unity Performance tips for C# from a Cloud Developer

If you’re looking to improve the performance of your C# Unity game, there are many resources out there that talk about performance in Unity. These usually focus on things like better usage of the Unity APIs, better managing your objects, and reducing the size of your resources. Some articles also discuss better ways to write your scripts – recommendations such as caching your components are quite common.

In this article, I’d like to focus on something else – pure C# performance tips, optimization tips, and improvements to C# LINQ performance issues.

Working in Azure, with code that runs on millions of cores, allows me to see the impact of even tiny changes on the bottom line. A single piece of code can be responsible for a huge bottleneck, hogging the CPU, or enslaving the Garbage Collector.

I’ll organize these tips into sections: methodology, string creation, dictionary keys, string method overloads, enums, LINQ, and regexes.

Methodology

For the purpose of this article, I’ll be using a small script as the basis of my performance tests. The script will perform all tests one after the other, and I’ll use Unity’s Profiler, with the Deep Profile option enabled, to analyze the results:

using UnityEngine;

public class PerfTest : MonoBehaviour
{
    void Update()
    {
        // Each test method is invoked here, one after the other:
        // RunTest1();
        // RunTest2();
    }
}

Depending on the scenario, I might be looking at the “clock time” of each test scenario, or at the amount of memory allocated (and therefore probably also at the number or duration of GC runs).

While the exact numbers you see if you reproduce these tests will probably be different, the absolute numbers are not critical – what we care about is how much we can reduce them.

C# Strings creation

Working with strings is quite simple in C#. However, generating new strings dynamically, especially long strings, is quite expensive. Let’s take a look at a naive implementation – given a list of numbers, create a comma-separated string:

private string StringConcat()
{
    string result = "";
    for (int i = 0; i < count; i++)
    {
        result += "Test " + i + ",";
    }
    return result;
}

This implementation creates a substring “Test X,” for every number. It also creates every temporarily accumulated string (“Test 0,”, then “Test 0,Test 1,”, then “Test 0,Test 1,Test 2,”, and so on) until we reach the final result. This is extremely inefficient – it allocates a ton of temporary strings which the GC needs to collect.

A much better implementation is to use StringBuilder, which is designed especially for these cases:

private string StringBuilder()
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++)
    {
        sb.Append("Test " + i + ",");
    }
    return sb.ToString();
}

The keen reader would notice that we’re still creating the temporary strings here. StringBuilder has an AppendFormat method, which can help with that:

private string StringBuilderAppendFormat()
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++)
    {
        sb.AppendFormat("Test {0},", i);
    }
    return sb.ToString();
}

At this point, we can make one additional improvement, but we’ll have to choose – if we care more about memory, we can avoid creating a new StringBuilder on every invocation. This will require clearing the StringBuilder before/after its use, which means that we’ll “waste” some extra CPU:

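// Created once and reused across invocations.
private readonly StringBuilder reusableStringBuilder = new StringBuilder();

private string StringBuilderReuse()
{
    reusableStringBuilder.Clear();
    for (int i = 0; i < count; i++)
    {
        reusableStringBuilder.AppendFormat("Test {0},", i);
    }
    return reusableStringBuilder.ToString();
}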

Here are my results from running all 4 tests in Unity Profiler:

| Test Scenario | GC Alloc | GC Alloc Ratio | Time (ms) | Time Ratio |
|---|---|---|---|---|
| StringConcat | 8.4 MB | 1.000 | 5.15 | 1.00 |
| StringBuilder | 115.8 KB | 0.013 | 3.68 | 0.71 |
| StringBuilderAppendFormat | 69.4 KB | 0.008 | 2.40 | 0.46 |
| StringBuilderReuse | 36.9 KB | 0.004 | 2.19 | 0.41 |

As you can see, even the naive usage of StringBuilder gives a huge reduction in both memory allocations and CPU.

Use a struct instead of creating strings for Dictionary keys

A common way of using Dictionary<TKey, TValue> is to use strings as the key, and construct those strings whenever you access the dictionary. For example:

public string GetKey(GameObject obj)
{
    // GetType() is defined on every object; Unity exposes the object's name as "name".
    return obj.GetType().Name + ":" + obj.name;
}

var objectToTypeAndName = new Dictionary<string, GameObject>();
objectToTypeAndName[GetKey(obj)] = obj;

As we just saw, creating lots of strings is very inefficient. However, here we have an additional issue – comparing strings is relatively CPU intensive. That means that all the dictionary accesses are not very efficient.

A much more efficient approach (both memory-wise and CPU-wise) is to create a Key struct, which implements the IEquatable<T> interface:

public struct Key : IEquatable<Key>
{
    public string Type;
    public string Name;

    public Key(string type, string name)
    {
        this.Type = type;
        this.Name = name;
    }

    public override bool Equals(object obj)
    {
        if (obj is Key)
        {
            return this.Equals((Key)obj);
        }
        return false;
    }

    public bool Equals(Key other)
    {
        return this.Type.Equals(other.Type) && this.Name.Equals(other.Name);
    }

    public override int GetHashCode()
    {
        var h1 = this.Type.GetHashCode();
        var h2 = this.Name.GetHashCode();
        return (((h1 << 5) + h1) ^ h2);
    }
}

Obviously, you can create a generic version to support things other than strings, as well as implementations that support more than 2 “parts” for the key.
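To make the idea concrete, here is a minimal sketch of how such a key might replace the concatenated string when accessing a dictionary. The KeyedRegistry class and its Add/TryGet methods are hypothetical names used only for illustration, and the Key shown here is a compact variant of the struct above that I made null-safe and ordinal, which is an assumption on my part rather than a requirement.

```csharp
using System;
using System.Collections.Generic;

// Compact, null-safe variant of the Key struct above (illustrative assumption).
public readonly struct Key : IEquatable<Key>
{
    public readonly string Type;
    public readonly string Name;

    public Key(string type, string name)
    {
        Type = type;
        Name = name;
    }

    // Ordinal comparison of both parts; the static string.Equals tolerates nulls.
    public bool Equals(Key other) =>
        string.Equals(Type, other.Type, StringComparison.Ordinal) &&
        string.Equals(Name, other.Name, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is Key other && Equals(other);

    public override int GetHashCode()
    {
        int h1 = Type != null ? Type.GetHashCode() : 0;
        int h2 = Name != null ? Name.GetHashCode() : 0;
        return ((h1 << 5) + h1) ^ h2;
    }
}

// Hypothetical wrapper: no "type:name" string is built on any access.
public sealed class KeyedRegistry<TValue>
{
    private readonly Dictionary<Key, TValue> items = new Dictionary<Key, TValue>();

    public void Add(string type, string name, TValue value)
    {
        items[new Key(type, name)] = value;
    }

    public bool TryGet(string type, string name, out TValue value)
    {
        return items.TryGetValue(new Key(type, name), out value);
    }
}
```

Because Key is a struct that implements IEquatable<Key>, the dictionary can compare keys without boxing them, and creating a key allocates nothing on the heap.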

Use efficient string method overloads

String methods have multiple overloads, some of them more efficient than others. The difference between those overloads might have a big impact.

String.Equals() and String.Compare()

Whenever you’re calling String.Compare() without a StringComparison argument, StringComparison.CurrentCulture is used. The same is true for other culture-sensitive methods, such as StartsWith(), EndsWith(), and IndexOf() with a string argument. (String.Equals() is the exception: without a StringComparison argument it already performs an ordinal, case-sensitive comparison.)

Not only might this result in unexpected behavior (e.g., different cultures have different casing and sorting rules), this comparison type is also not very efficient, as it needs to consult the culture information while comparing.

Whenever possible, pass the StringComparison.Ordinal (or StringComparison.OrdinalIgnoreCase) argument, as that will result in much better performance.

This is especially true when you use a constant value for one of the string arguments. For example, when checking if a URI’s scheme equals HTTPS, instead of doing:

string.Compare("https", uri.Scheme, true) == 0

use:

string.Equals("https", uri.Scheme, StringComparison.OrdinalIgnoreCase)

The first goes through the current culture’s comparison logic, while the second does a fast ordinal comparison (for those interested, the ordinal comparison checks 40 bytes at a time, comparing 4 bytes at a time using int comparisons, and can use SIMD when available).
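As a small illustration of passing the comparison type explicitly, the sketch below wraps two such checks in helper methods. The SchemeChecks class and its method names are hypothetical; only the string and Uri calls are standard library APIs.

```csharp
using System;

public static class SchemeChecks
{
    // Compares the parsed scheme using an explicit ordinal, case-insensitive comparison.
    public static bool IsHttps(Uri uri)
    {
        return string.Equals("https", uri.Scheme, StringComparison.OrdinalIgnoreCase);
    }

    // StartsWith(string) is culture-sensitive by default, so the comparison type is passed explicitly.
    public static bool StartsWithHttps(string url)
    {
        return url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
    }
}
```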

Using String.Split()

string.Split() is implemented so that it allocates an array of ints for the indexes of the separators, and an array of strings for the substrings. This means that every invocation of string.Split() allocates memory, which puts pressure on the GC.

When you need a single substring from the original string, a much better approach is to find the separator’s index, and the following separator’s index, and then invoke string.Substring() to retrieve the part that you need. When you know the expected number of results (e.g., splitting a string in two parts), it’s much better to find the separator’s index, and take the 2 substrings.

If you have a dedicated pattern, such as a relative path, don’t just split it by / and take the parts – you’re allocating a lot of extra substrings which you don’t need. Create your own dedicated helper method, one that matches your requirements.
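The two approaches described above could be sketched roughly as follows. StringParts, TrySplitInTwo and GetPart are hypothetical helper names; the only library calls used are string.IndexOf and string.Substring.

```csharp
using System;

public static class StringParts
{
    // Splits the input at the first separator without calling Split().
    // Returns false (and null outputs) when the separator is not present.
    public static bool TrySplitInTwo(string input, char separator, out string left, out string right)
    {
        int index = input.IndexOf(separator);
        if (index < 0)
        {
            left = null;
            right = null;
            return false;
        }

        left = input.Substring(0, index);
        right = input.Substring(index + 1);
        return true;
    }

    // Returns the zero-based part between separators, or null if the input has too few parts.
    // Only the requested part is allocated; the others are skipped by index.
    public static string GetPart(string input, char separator, int partIndex)
    {
        int start = 0;
        for (int i = 0; i < partIndex; i++)
        {
            int next = input.IndexOf(separator, start);
            if (next < 0)
            {
                return null;
            }
            start = next + 1;
        }

        int end = input.IndexOf(separator, start);
        return end < 0 ? input.Substring(start) : input.Substring(start, end - start);
    }
}
```

Each helper allocates only the strings it returns, with no index array and no array of unwanted substrings.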

Lastly, if you decide that you indeed need the “Split” method, notice that the method expects an array of separators. First, always prefer the char[] variant. Second, avoid allocating a new array for the separators on every invocation. Even when you call text.Split('X'), you’re actually allocating a char[] behind the scenes (since the method accepts a params char[], the compiler compiles your code into text.Split(new char[] { 'X' })). Finally, if you know how many substrings you expect back, provide that information to the Split method (in the “count” argument) – it will allow the implementation to allocate arrays of the correct size, thus avoiding under/over allocation.
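A minimal sketch of these recommendations, assuming a path-like input: the separator array lives in a static field, and the expected number of parts is passed as the count. PathSplitter and SplitHead are hypothetical names.

```csharp
using System;

public static class PathSplitter
{
    // Allocated once and reused, instead of a new char[] on every call.
    private static readonly char[] Separators = { '/' };

    // Splits into at most two parts: the first segment and the remainder.
    public static string[] SplitHead(string path)
    {
        return path.Split(Separators, 2);
    }
}
```

With this helper, "a/b/c" yields the two elements "a" and "b/c", since the count argument stops the splitting after the first separator.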

Using String.Replace()

Whenever you’re calling string.Replace(), many allocations happen behind the scenes, including an integer array (similar to string.Split()) and the new final string. Of course, this is in addition to the copying of the characters from the source string to the returned value.

This makes the following pattern, of calling string.Replace() multiple times trying to “normalize” a string, very inefficient:

input = input.Replace("_", "-");
input = input.Replace("@", "-");
input = input.Replace("/", "-");

The above results in many allocations, including all the temporary strings which are allocated and then thrown away. A slightly more efficient version of the above would be to use the “char” overload of string.Replace():

input = input.Replace('_', '-');
input = input.Replace('@', '-');
input = input.Replace('/', '-');

While this version is more efficient per invocation, it still results in the intermediate strings. A much better approach would be to write your own helper method, which will create a new StringBuilder, scan the original string one character at a time, and append either the character or its replacement to the StringBuilder.
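Such a helper might look like the following sketch, which covers the three replacements from the example above in a single pass. StringNormalizer and Normalize are hypothetical names.

```csharp
using System.Text;

public static class StringNormalizer
{
    // Replaces '_', '@' and '/' with '-' in one pass, producing a single new string.
    public static string Normalize(string input)
    {
        // Each character maps to exactly one character, so the final length is known up front.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '_':
                case '@':
                case '/':
                    sb.Append('-');
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Compared to the chained Replace() calls, this allocates one StringBuilder and one result string, regardless of how many characters are being normalized.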

Working with Enums

Enums are an easy way to make your code more readable, and provide some type-safety. However, Enums’ methods are based on reflection code, which is well known to be inefficient:

  • Whenever you call ToString() on an Enum value, a new string is created with the expected value. As we’ve seen above, strings have their own costs.
  • Calls to ToString(), GetNames(), IsDefined() and other methods are based on reflection. What’s worse, those values are not cached, which means you’re paying for every invocation.
  • Parsing Enum values relies on string.Split() which is inefficient in itself.

So, what can you do?

There are multiple solutions available. First, you can avoid some of the above issues by caching the method results – for example, getting the list of Enum names only once, and keeping it in a static field.
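The caching idea can be sketched as follows. The GameState enum is assumed to have the four members used elsewhere in this article, and GameStateNames with its IsKnownName method is a hypothetical helper.

```csharp
using System;

// Assumed definition, for illustration only.
public enum GameState { Running, Paused, GameOver, MainMenu }

public static class GameStateNames
{
    // Enum.GetNames() runs once, when this class is first used; later reads hit the cached array.
    public static readonly string[] All = Enum.GetNames(typeof(GameState));

    // Checks a name against the cached array instead of calling Enum.IsDefined() every time.
    public static bool IsKnownName(string name)
    {
        for (int i = 0; i < All.Length; i++)
        {
            if (string.Equals(All[i], name, StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }
}
```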

A more complete solution is to use helper classes which expose extension methods for each Enum. The extension methods can implement a more efficient version of the method you need. For example, we can create a FastToString() method, which returns const strings:

public static class GameStateExtensions
{
    public static string FastToString(this GameState gameState)
    {
        switch (gameState)
        {
            case GameState.Running: return "Running";
            case GameState.Paused: return "Paused";
            case GameState.GameOver: return "GameOver";
            case GameState.MainMenu: return "MainMenu";
            default: return "[Unknown]";
        }
    }
}

| Test Scenario | GC Alloc | GC Alloc Ratio | Time (ms) | Time Ratio |
|---|---|---|---|---|
| EnumToString | 39.1 KB | 1.00 | 8.16 | 1.000 |
| EnumFastToString | 0 KB | 0.00 | 0.11 | 0.013 |

As you can see, the differences in performance are huge and well worth it. If you’re working with a lot of Enums, I’d recommend generating the extension classes using some tool, such as a T4 (.tt) template, or Roslyn Source Generators.

LINQ is not quick

Using LINQ allows developers to write code very easily, and achieve more in a short period. This is achieved by adding an abstraction layer, and relying on the IEnumerable<T> interface. However, this abstraction comes at a great performance price. LINQ tends to allocate a lot of memory and is not very efficient CPU-wise. In addition, the added abstraction layer makes it easy to overlook which data structures and algorithms are actually being used.

In the following sections we’ll see some of the pitfalls of LINQ, and how to avoid them.

Lazy evaluation sometimes means multiple evaluations

One of the advantages of LINQ is its lazy-evaluation nature – you evaluate your collections only when you use them. On the flip side, it also means that you pay for the evaluation every time you iterate over your collection.

Let’s take a look at an example.

We’ll create an IEnumerable<int> with the numbers from 0 to 999, and check the maximum number 10 times. We’ll do the same for a list. Notice we’ll still be using the LINQ Max() method, which is probably not optimized, but is good enough for now.

private static void RangeIEnumerable()
{
    var enumerable = Enumerable.Range(0, 1000);
    for (int i = 0; i < 10; i++)
    {
        // The range is generated again on every call to Max().
        long count = enumerable.Max();
    }
}

private static void RangeList()
{
    var list = Enumerable.Range(0, 1000).ToList();
    for (int i = 0; i < 10; i++)
    {
        var count = list.Max();
    }
}

While the RangeList method will allocate more memory (the actual list), the RangeIEnumerable method will have to generate the enumerable collection 10 times, which is very inefficient. Here are the results:

| Test Scenario | GC Alloc | GC Alloc Ratio | Time (ms) | Time Ratio |
|---|---|---|---|---|
| RangeEnumerable | 436 B | 1.00 | 905.86 | 1.000 |
| RangeList | 4.4 KB | 10.09 | 7.61 | 0.008 |

As you can see, the difference is staggering.
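To see the repeated evaluation directly, rather than through timings, the sketch below counts how often a projection runs when the same query is consumed several times. LazyEvaluationDemo and CountProjectionCalls are hypothetical names, and the Select projection is added purely so there is something to count.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyEvaluationDemo
{
    // Returns how many times the projection ran when the query is consumed 'passes' times.
    public static int CountProjectionCalls(bool materialize, int passes)
    {
        int calls = 0;
        IEnumerable<int> query = Enumerable.Range(0, 1000).Select(x =>
        {
            calls++;
            return x * 2;
        });

        // ToList() runs the query once and stores the results;
        // without it, every pass below re-runs the whole query.
        IEnumerable<int> source = materialize ? query.ToList() : query;

        for (int i = 0; i < passes; i++)
        {
            source.Max();
        }
        return calls;
    }
}
```

With 10 passes, the lazy version runs the projection 10,000 times, while the materialized version runs it 1,000 times, once per element.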

Turning O(N) to O(N²)

Another impact of using LINQ is that the method implementations are not optimized for the data structures. What this means is that we risk taking an O(N) algorithm and performing it in O(N²).

Let’s take a look. Let’s assume we have a collection of important vectors. At some point, we get a list of user-provided vectors, and we want to filter just the important ones. We might write our code like this (obviously, I’m generating dummy data here, and running the same code 10 times per frame, to make the numbers easier to measure):

HashSet<Vector3> importantVectorsHashSet = Enumerable.Range(0, 100).Select(x => x * Vector3.one).ToHashSet();

private static void ContainsIEnumerable(IEnumerable<Vector3> userProvidedVectors, IEnumerable<Vector3> importantVectors)
{
    for (int i = 0; i < 10; i++)
    {
        var important = userProvidedVectors.Where(v => importantVectors.Contains(v)).ToList();
    }
}

ContainsIEnumerable(userProvided, importantVectorsHashSet);

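The parameter is typed as IEnumerable<Vector3>, so the call to importantVectors.Contains(v) resolves to LINQ’s generic Contains extension method, not to HashSet’s own method. For a general enumerable this is an O(N) invocation – it iterates over all the values of importantVectors, “ignoring” the fact that the underlying object may provide an efficient Contains method. (Some LINQ implementations check whether the source implements ICollection<T> and delegate to its Contains; you then still pay for the type check and the interface call, and the shortcut is lost as soon as the source is a lazy query.) This is performed for every item in userProvidedVectors, which can turn the whole method into O(N²).

The simple change of passing importantVectors as a HashSet<Vector3> can provide a big performance boost: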

private static void ContainsHashSet(IEnumerable<Vector3> userProvidedVectors, HashSet<Vector3> importantVectors)
{
    for (int i = 0; i < 10; i++)
    {
        var important = userProvidedVectors.Where(v => importantVectors.Contains(v)).ToList();
    }
}

Now, the call to importantVectors.Contains(v) is running in O(1), bringing back the algorithm to the expected O(N).

| Test Scenario | GC Alloc | GC Alloc Ratio | Time (ms) | Time Ratio |
|---|---|---|---|---|
| ContainsIEnumerable | 17.5 KB | 1.00 | 2.75 | 1.00 |
| ContainsHashSet | 17.5 KB | 1.00 | 1.52 | 0.55 |
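When you cannot change a method’s signature to take a HashSet, one option is to build the set once inside the method. This is a sketch under that assumption, not the code measured above; VectorFilter and FilterImportant are hypothetical names, and the method is generic so that it runs outside Unity as well.

```csharp
using System.Collections.Generic;

public static class VectorFilter
{
    // Returns the candidates that also appear in 'important'.
    public static List<T> FilterImportant<T>(IEnumerable<T> candidates, IEnumerable<T> important)
    {
        // Reuse the set if the caller already has one; otherwise build it once, in O(M).
        HashSet<T> set = important as HashSet<T> ?? new HashSet<T>(important);

        var result = new List<T>();
        foreach (T candidate in candidates)
        {
            // HashSet<T>.Contains is O(1) on average, so the loop stays O(N).
            if (set.Contains(candidate))
            {
                result.Add(candidate);
            }
        }
        return result;
    }
}
```

The trade-off is one extra set allocation when the caller passes something other than a HashSet, in exchange for predictable lookup cost no matter what kind of enumerable comes in.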

Smart usage of Regexes

It’s well known that Regexes are a performance bottleneck – they allocate a lot, and are slow to execute. Jamie Zawinski’s quote is frequently used when talking about Regex: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” The idea behind it is not to completely avoid Regexes, but to not overuse them.

Don’t use Regex for simple scenarios

Sometimes, you’ll see code that just wants to perform some basic validation on its input. For example, ensure that all characters in a string are digits. How would you implement that using Regex?

string pattern = @"^[\d]+$";
var matches = Regex.Matches(input, pattern);

This is simple, elegant, and probably a terrible way to implement this. A much more efficient solution would be to implement your own helper method that checks the characters in the input one by one, and checks if they’re all digits. Using a regular expression to perform such a simple validation is overkill. Even much more complex rules can be implemented using char’s IsDigit/IsLetterOrDigit and other similar methods.
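A helper along those lines could be as small as the sketch below. InputValidation and IsAllDigits are hypothetical names; like the \d pattern, char.IsDigit accepts any Unicode decimal digit, not only 0 to 9.

```csharp
public static class InputValidation
{
    // Returns true when the input is non-empty and every character is a decimal digit.
    public static bool IsAllDigits(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        for (int i = 0; i < input.Length; i++)
        {
            if (!char.IsDigit(input[i]))
            {
                return false;
            }
        }
        return true;
    }
}
```

The loop allocates nothing and stops at the first character that is not a digit.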

Avoid creating many Regex instances

The Regex class has an internal cache mechanism that avoids repeated object initialization and regular expression compilation. However, the cache is used only when calling one of Regex’s static methods (like the static call shown above). This means that developers should either use those static methods, or create and reuse a single instance of the Regex object for each well-known regular expression.
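The second option, one reusable instance per well-known expression, might look like this sketch. VersionParser, its pattern, and TryParse are hypothetical and only serve to show where the instance lives.

```csharp
using System.Text.RegularExpressions;

public static class VersionParser
{
    // Constructed once per process and reused by every call below.
    private static readonly Regex VersionPattern = new Regex(@"^(\d+)\.(\d+)$");

    // Parses strings such as "1.2" into major and minor parts.
    public static bool TryParse(string input, out int major, out int minor)
    {
        major = 0;
        minor = 0;

        Match match = VersionPattern.Match(input);
        if (!match.Success)
        {
            return false;
        }

        return int.TryParse(match.Groups[1].Value, out major)
            && int.TryParse(match.Groups[2].Value, out minor);
    }
}
```

Keeping the instance in a static readonly field means the pattern is parsed once, no matter how many times TryParse is called.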