C# Collections Performance


This article is a general guide for C# collections performance optimization. C# collections are essential for storing and manipulating large amounts of data efficiently. However, not all collections are created equal when it comes to performance. In this article, we explore the performance characteristics of some of the most commonly used C# collections, including their time complexity for different operations, the reasons behind their performance, their enumeration costs, and how they are sorted. We also provide tips and techniques for improving the performance of collections in C#.

General Tips for Improving C# Collections Performance

Here are some tips to improve performance when working with collections in C#:

Use the appropriate collection type for the specific task at hand.
Minimize the number of operations that require resizing the collection.
Avoid unnecessary copying of elements between collections.
Use capacity estimates to initialize collections to the appropriate size.
Consider using structs instead of classes for small element types: a collection of structs stores the values inline in its backing array, which avoids a separate heap allocation per element.
Understand the reasons behind the performance characteristics, enumeration costs, and sorting methods of each collection to choose the right collection for the job and ensure optimal performance.
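Prefer foreach loops when you don’t need the index: they are more readable and, for arrays and List<T>, usually perform about as well as for loops. Iterating through an interface such as IEnumerable<T> can add overhead, so measure before optimizing hot paths.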
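
As an illustration of the resizing and capacity tips above, here is a minimal sketch that counts how often a List<T> replaces its backing array while it is being filled. The class and method names (CapacityDemo, CountResizes) are made up for this example, and the exact growth pattern is an implementation detail of the runtime.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Counts how many times the backing array is replaced while adding
    // `count` items. An `initialCapacity` of 0 means "use the default".
    public static int CountResizes(int count, int initialCapacity)
    {
        var list = initialCapacity > 0 ? new List<int>(initialCapacity) : new List<int>();
        int resizes = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity changed, so a new array was allocated and the items copied.
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    public static void Main()
    {
        Console.WriteLine($"Default capacity: {CountResizes(100_000, 0)} resizes");
        Console.WriteLine($"Pre-sized:        {CountResizes(100_000, 100_000)} resizes");
    }
}
```

When the final size is known (or can be estimated), passing it to the constructor lets the list allocate its storage once.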

Using IEnumerable

One way to improve iteration performance is to use yield return to generate the items of a sequence lazily. This can be more efficient than building the entire sequence upfront because only the items the caller actually consumes are produced. A method that uses yield return must have an iterator return type such as IEnumerable<T> or IEnumerator<T>, and its result is typically consumed with a foreach loop rather than a for loop. Doing so can result in cleaner and more efficient code.

Here’s an example of using yield return with IEnumerable<T> to lazily generate items in the enumeration:

public static IEnumerable<int> GenerateNumbers(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i;
    }
}

In this example, the GenerateNumbers method returns an IEnumerable<int> that generates the numbers from 0 to count - 1. The yield return statement is used to return each number in turn, allowing the caller to consume them one at a time.

When this method is called, the numbers are generated lazily, meaning that they are only created as they are needed. This can be more efficient than generating the entire sequence upfront, especially if count is very large, and the caller only needs to consume a small portion of the sequence.

Here’s an example of how this method could be consumed using a foreach loop:

foreach (int number in GenerateNumbers(10))
{
    Console.WriteLine(number);
}

In this example, the foreach loop is used to iterate over the numbers generated by the GenerateNumbers method. Each number is printed to the console as it is generated.

Using yield return with IEnumerable<T> can be a powerful technique for generating sequences of items lazily and efficiently. It allows you to generate only the items that are needed, which can be particularly useful for large or expensive sequences.
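
To make the laziness visible, the following sketch adds a counter to the iterator and consumes only the first few items with LINQ's Take. The Produced counter and the TakeFirst helper are assumptions added for illustration; they are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator has produced (illustration only).
    public static int Produced;

    public static IEnumerable<int> GenerateNumbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Asks for `total` numbers but consumes only the first `wanted`.
    public static List<int> TakeFirst(int total, int wanted)
    {
        Produced = 0;
        return GenerateNumbers(total).Take(wanted).ToList();
    }

    public static void Main()
    {
        List<int> firstThree = TakeFirst(1_000_000, 3);
        Console.WriteLine(string.Join(", ", firstThree)); // 0, 1, 2
        Console.WriteLine($"Items produced: {Produced}"); // a handful, not 1,000,000
    }
}
```

Because the iterator body only runs when the consumer asks for the next item, stopping early means the remaining items are never generated.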

Choosing the Right Data Structure

Choosing the appropriate collection data structure for the specific task can improve performance in several ways. For example, arrays can be a good choice for fixed-size collections that need to be accessed frequently by index, while lists can be a good choice for sequences that grow or shrink, mostly at the end. LinkedList<T> is a good choice for rapid insertion or deletion in the middle of the list when you already hold a reference to a neighboring node. Dictionary<TKey, TValue> and SortedList<TKey, TValue> are better for values that need to be looked up frequently by a unique key; SortedList<TKey, TValue> additionally keeps its keys in sorted order. HashSet<T> is a good choice for collections of unique elements that need to be added or searched frequently. Understanding the reasons behind the performance characteristics, enumeration costs, and sorting methods of each collection can help you choose the right collection for the job and ensure optimal performance. Let’s see the specifications of some collections and their performance impact.

Arrays

Arrays are the simplest form of collections in C#. They are fixed in size and can only store elements of the same type. Accessing elements by index is very fast, with a time complexity of O(1), because elements are stored contiguously in memory. However, an array cannot grow or shrink in place: adding or removing an element means allocating a new block of memory and copying the elements into it (this is what Array.Resize does), with a time complexity of O(n). Enumerating over an array takes O(n) time and is fast in practice because the elements are stored contiguously in memory.

Arrays are often used when we know the size of the collection in advance and need to access elements frequently by their index. Since the elements are stored in contiguous memory locations, they can be accessed in constant time. However, inserting or deleting an element in the middle requires shifting all subsequent elements (or copying them into a new array), which can be time-consuming.
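
A small sketch of both points, constant-time index access and the copy hidden behind a resize. The ArrayDemo class and its sample values are hypothetical.

```csharp
using System;

public static class ArrayDemo
{
    // Returns true when Array.Resize handed back a different array object.
    public static bool ResizeAllocatesNewArray()
    {
        int[] numbers = { 10, 20, 30 };
        int[] original = numbers;

        // Array.Resize allocates a new array of the requested length,
        // copies the existing elements into it, and reassigns the variable.
        Array.Resize(ref numbers, 4);
        numbers[3] = 40;

        // The old array still exists, unchanged, with its original length.
        return !ReferenceEquals(original, numbers) && original.Length == 3;
    }

    public static void Main()
    {
        int[] numbers = { 10, 20, 30 };
        Console.WriteLine(numbers[1]);                // 20 (constant-time index access)
        Console.WriteLine(ResizeAllocatesNewArray()); // True
    }
}
```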

Lists

Lists, such as List<T> and the older non-generic ArrayList, are dynamic arrays that grow as needed. Adding an element to the end of a list is fast, with an amortized time complexity of O(1): the underlying array is replaced by a larger one only when it is full, and that occasional resize costs O(n). Inserting or removing an element anywhere else is O(n) because the elements after it must be shifted. Accessing elements by index is fast, with a time complexity of O(1), because the elements are stored contiguously in memory. However, searching for an element in a list can be slow, especially if the list is very large, because the list needs to be traversed sequentially, with a time complexity of O(n). Enumerating over a list takes O(n) time and is fast in practice because the elements are stored contiguously in memory.

Lists are often used when we need to add or remove elements frequently and we don’t know the size of the collection in advance. Since the underlying array is resized automatically, we don’t need to manage the collection size ourselves. However, searching for an element in a list can be slow, especially for large lists, because we need to traverse the list sequentially.
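
The following sketch shows the common List<T> operations side by side, with their costs noted in comments. The ListDemo class and the sample names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ListDemo
{
    public static List<string> Build()
    {
        var names = new List<string> { "Ada", "Grace" };
        names.Add("Linus");       // append at the end: amortized O(1)
        names.Insert(0, "Alan");  // insert at the front: shifts every element, O(n)
        names.Remove("Grace");    // linear search, then shifts the tail, O(n)
        return names;
    }

    public static void Main()
    {
        List<string> names = Build();
        Console.WriteLine(string.Join(", ", names)); // Alan, Ada, Linus
        Console.WriteLine(names[1]);                 // Ada (index access is O(1))
        Console.WriteLine(names.IndexOf("Linus"));   // 2 (sequential search, O(n))
    }
}
```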

LinkedList<T>

The LinkedList<T> collection is a doubly linked list. Adding and removing elements is fast, even in the middle of the list, with a time complexity of O(1), provided you already hold a reference to the node at that position, because only the previous and next nodes need to be updated. Finding that node in the first place, or accessing an element by position, is slow because the list needs to be traversed sequentially, with a time complexity of O(n). The reason for this is that each node only contains a reference to the next and previous nodes, not to the nodes in between. Enumerating over a linked list takes O(n) time because each node contains a reference to the next node, although it is typically slower per element than enumerating an array or list because the nodes are scattered across the heap.

LinkedList<T> is often used when we need to add or remove elements frequently, especially in the middle of the collection. Since only the previous and next nodes need to be updated, adding or removing elements next to a known node is fast. However, reaching a position in the list can be slow, as we need to traverse the list sequentially.
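
A minimal sketch of the difference between finding a node and inserting next to it. The LinkedListDemo class and its values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListDemo
{
    public static LinkedList<int> Build()
    {
        var list = new LinkedList<int>();
        list.AddLast(1);
        list.AddLast(2);
        list.AddLast(4);

        // Find walks the list from the head: O(n).
        LinkedListNode<int> two = list.Find(2);

        // With the node in hand, inserting next to it only relinks neighbors: O(1).
        list.AddAfter(two, 3);

        // Removing at either end is O(1) as well.
        list.RemoveFirst();
        return list;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Build())); // 2, 3, 4
    }
}
```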

SortedList<TKey, TValue>

The SortedList<TKey, TValue> collection stores key-value pairs in arrays that are kept sorted by key. Retrieving a value by key is fast, with a time complexity of O(log n), because the collection uses a binary search algorithm to find the key. Adding or removing an element also uses binary search to find the correct position, but the elements after that position must then be shifted, so these operations are O(n) in general; adding a key that sorts after all existing keys is O(log n) unless the arrays have to be resized. SortedList<TKey, TValue> is sorted based on the keys provided to the collection, using the default comparer or an IComparer<TKey> passed to the constructor. Enumerating over a sorted list takes O(n) time, returns the elements in key order, and is fast in practice because the elements are stored contiguously in memory.

SortedList<TKey, TValue> is often used when we need to store key-value pairs in sorted order and access them frequently by their key, especially when the data is loaded once and then mostly read. Since the collection uses a binary search algorithm to locate keys, retrieving elements is fast. However, frequent insertions and removals at arbitrary positions can be slow for large collections because the arrays must be shifted. Note that SortedList<TKey, TValue> is backed by sorted arrays, not a binary tree; SortedDictionary<TKey, TValue> is the tree-based alternative, with O(log n) insertion and removal.
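
As a sketch of the behavior described above, the example below adds keys out of order and then reads them back by key, by index, and by enumeration. The SortedListDemo class and the sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListDemo
{
    public static SortedList<string, int> Build()
    {
        var stock = new SortedList<string, int>();
        stock.Add("pear", 7);   // each Add places the key at its sorted position
        stock.Add("apple", 3);  // sorts before "pear", so existing entries shift
        stock.Add("mango", 5);
        return stock;
    }

    public static void Main()
    {
        SortedList<string, int> stock = Build();

        // Enumeration follows key order, not insertion order.
        foreach (KeyValuePair<string, int> item in stock)
        {
            Console.WriteLine($"{item.Key}: {item.Value}");
        }

        Console.WriteLine(stock["mango"]);           // 5 (binary search on the keys)
        Console.WriteLine(stock.IndexOfKey("pear")); // 2
        Console.WriteLine(stock.Keys[0]);            // apple (keys are indexable)
    }
}
```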

Dictionary<TKey, TValue>

The Dictionary<TKey, TValue> collection is a hash table that stores key-value pairs. Adding and retrieving elements is fast, as long as the keys’ hash codes are well distributed, with an average time complexity of O(1); an Add that forces the table to grow costs O(n). The reason for this is that the collection uses a hash function to compute the element’s index in the underlying array. Enumerating over a dictionary takes O(n) time, but the order in which the elements are returned is unspecified and should not be relied on.

Dictionary<TKey, TValue> is often used when we need to store key-value pairs, and we need to access them frequently by their key. Since the collection uses a hash function to compute the element’s index, accessing and retrieving elements is fast. However, a dictionary does not keep its elements in any defined order, so a sorted collection is a better fit when ordered iteration is required.
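
Here is a minimal sketch of typical dictionary usage, including a lookup that does not throw for missing keys. The DictionaryDemo class, the AgeOrMinusOne helper, and the sample data are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    public static Dictionary<string, int> Build()
    {
        // Passing an expected size up front avoids regrowing the table as entries are added.
        var ages = new Dictionary<string, int>(16);
        ages.Add("Ada", 36);
        ages.Add("Alan", 41);
        ages["Ada"] = 37;   // the indexer overwrites the value for an existing key
        return ages;
    }

    // Returns -1 when the key is missing instead of throwing KeyNotFoundException.
    public static int AgeOrMinusOne(Dictionary<string, int> ages, string name)
    {
        return ages.TryGetValue(name, out int age) ? age : -1;
    }

    public static void Main()
    {
        Dictionary<string, int> ages = Build();
        Console.WriteLine(AgeOrMinusOne(ages, "Ada"));   // 37
        Console.WriteLine(AgeOrMinusOne(ages, "Grace")); // -1
        Console.WriteLine(ages.ContainsKey("Alan"));     // True
    }
}
```

TryGetValue performs a single hash lookup, whereas calling ContainsKey followed by the indexer performs two.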

HashSet<T>

The HashSet<T> collection is a set that stores unique elements. Adding elements and checking whether an element is present are fast, as long as the elements’ hash codes are well distributed, with an average time complexity of O(1). The reason for this is the same as for Dictionary<TKey, TValue>. Enumerating over a hash set takes O(n) time, but the order in which the elements are returned is unspecified.

HashSet<T> is often used when we need to store a collection of unique elements and test membership frequently. Since the collection uses a hash function to compute the index of the element, adding and searching for elements is fast. However, a hash set has no defined order and no index access, so it is a poor fit when element order matters.
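
The sketch below uses a HashSet<T> to count distinct words and to test membership. The HashSetDemo class and the choice of a case-insensitive comparer are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    public static int CountDistinct(IEnumerable<string> words)
    {
        // The comparer decides what counts as "the same" element.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            seen.Add(word);   // returns false and leaves the set unchanged for duplicates
        }
        return seen.Count;
    }

    public static void Main()
    {
        string[] words = { "list", "List", "set", "LIST" };
        Console.WriteLine(CountDistinct(words)); // 2

        var primes = new HashSet<int> { 2, 3, 5, 7 };
        Console.WriteLine(primes.Contains(5));   // True
        Console.WriteLine(primes.Add(7));        // False (already present)
    }
}
```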

Non-Generic C# Collections Performance

Non-generic collections, such as ArrayList and Hashtable, can be less efficient than generic collections because they store every element as object: value types are boxed when they are added, and each read requires a type cast, which adds overhead and can fail at runtime. When using non-generic collections, it’s essential to ensure that the elements added to the collection are of the correct type. In contrast, generic collections, such as List<T> and Dictionary<TKey, TValue>, are strongly typed and don’t require boxing or type-casting, making them more efficient and less error-prone. It’s generally recommended to use generic collections whenever possible for better type safety and performance.
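
To illustrate the difference, here is a small sketch that sums the same kind of data from an ArrayList and from a List<int>. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericVsNonGenericDemo
{
    // Sums an ArrayList, assuming every element is an int.
    public static int SumArrayList(ArrayList items)
    {
        int sum = 0;
        foreach (object item in items)
        {
            sum += (int)item;   // unboxing cast; throws InvalidCastException for non-ints
        }
        return sum;
    }

    public static int SumList(List<int> items)
    {
        int sum = 0;
        foreach (int item in items)   // no boxing, no cast
        {
            sum += item;
        }
        return sum;
    }

    public static void Main()
    {
        var untyped = new ArrayList { 1, 2, "three" };   // compiles: any object is accepted
        try
        {
            Console.WriteLine(SumArrayList(untyped));
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("ArrayList contained a non-int element");
        }

        var typed = new List<int> { 1, 2, 3 };           // typed.Add("three") would not compile
        Console.WriteLine(SumList(typed));               // 6
    }
}
```

With the generic list, the mistake is caught by the compiler instead of surfacing as an exception at runtime.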

Complexity Analysis

Finally, I’ll also add a more complete list of C# collections and the time complexity of their essential operations to help you make better decisions when choosing collection types.

Array

Access by index: O(1)
Add (via Array.Resize, which copies into a new array): O(n)

List

Add: O(1) amortized, O(n) worst-case
Remove: O(1) best-case (removing the last element by index), O(n) worst-case (counting the search and the shifting of the remaining elements)
Access by index: O(1)
Insert: O(n) worst-case
IndexOf: O(n) worst-case

ImmutableList

Add: O(log n)
Remove: O(log n)
Access by index: O(log n)
Enumerator: O(n)
Insert: O(log n)

Dictionary

Add: O(1) amortized, O(n) worst-case
Lookup: O(1) average

ImmutableDictionary

Add: O(log n)
Lookup: O(log n)

LinkedList

Add: O(1)
Remove: O(1)
Insertion: O(1) given a node reference, O(n) if the position has to be found first

Queue

Enqueue: O(1) amortized, O(n) worst-case
Enumeration: O(n)

ImmutableQueue

Enqueue: O(1)

Stack

Push: O(1) amortized, O(n) worst-case
Enumeration: O(n)

ImmutableStack

Push: O(1)
Enumeration: O(n)

ObservableCollection

Add: O(1) amortized, O(n) worst-case (when the underlying list has to grow)
Remove: O(1) best-case, O(n) worst-case (search, then shifting)
Access by index: O(1)

SortedDictionary

Add: O(log n)

ImmutableSortedDictionary

Add: O(log n)

HashSet

Add: O(1) amortized, O(n) worst-case

ImmutableHashSet

Add: O(log n)

SortedSet

Add: O(log n)

ImmutableSortedSet

Add: O(log n)
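
These figures describe how cost grows with size, not how fast an operation is on your data, so it is worth measuring. The following is a rough sketch, not a rigorous benchmark, that times membership checks against a List<int> and a HashSet<int> with Stopwatch. The LookupTimingDemo class, the CountHits helper, and the data sizes are assumptions; for serious measurements a dedicated tool such as BenchmarkDotNet is a better choice.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LookupTimingDemo
{
    // Counts how many of the needles are contained in the haystack.
    public static int CountHits(ICollection<int> haystack, int[] needles)
    {
        int hits = 0;
        foreach (int needle in needles)
        {
            if (haystack.Contains(needle)) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(0, 20_000).ToArray();
        int[] needles = Enumerable.Range(10_000, 20_000).ToArray(); // half present, half absent

        var list = new List<int>(data);   // Contains is a linear scan: O(n)
        var set = new HashSet<int>(data); // Contains is a hash lookup: O(1) on average

        Stopwatch sw = Stopwatch.StartNew();
        int listHits = CountHits(list, needles);
        sw.Stop();
        Console.WriteLine($"List<int>:    {listHits} hits in {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        int setHits = CountHits(set, needles);
        sw.Stop();
        Console.WriteLine($"HashSet<int>: {setHits} hits in {sw.ElapsedMilliseconds} ms");
    }
}
```

Both runs find the same number of hits; only the elapsed time differs, and the gap widens as the collections grow.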

Conclusion

The performance of C# collections depends on the specific use case. Arrays are good choices for fixed-size collections that need to be accessed frequently by index. List<T> is a good choice for sequences that grow or shrink, mostly at the end, and should be preferred over the non-generic ArrayList. LinkedList<T> is a good choice for rapid insertion or deletion in the middle of the list when you already hold a reference to a neighboring node. Dictionary<TKey, TValue> and SortedList<TKey, TValue> are better for values that need to be looked up frequently by a unique key, with SortedList<TKey, TValue> also keeping the keys in sorted order. HashSet<T> is a good choice for collections of unique elements that need to be added or searched frequently. Understanding the reasons behind the performance characteristics, enumeration costs, and sorting methods of each collection can help you choose the right collection for the job and ensure optimal performance.
