Strings are a fundamental component of software development, and most programming languages have the concept of a string. In .NET, strings behave differently from the other primitive types. Let us look at what makes strings unique and why they are implemented differently.
Many modern programming languages, C# included, categorize data types as either value types or reference types. A variable of a value type directly contains its data. A variable of a reference type, however, stores a reference to the data. As a rule of thumb, local variables of value types live on the stack, whereas instances of reference types live on the heap.
Strings are commonly mistaken for value types, but they are not: the string is a reference type. Alongside object, it is one of the few built-in types in .NET that is not a value type.
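A quick way to check this claim is to ask the runtime. The sketch below uses `Type.IsValueType`; the variable names are made up for illustration.

```csharp
using System;

// Type.IsValueType reports whether a type is a value type.
bool intIsValue = typeof(int).IsValueType;
bool longIsValue = typeof(long).IsValueType;
bool boolIsValue = typeof(bool).IsValueType;
bool stringIsValue = typeof(string).IsValueType;

Console.WriteLine($"int: {intIsValue}");       // int: True
Console.WriteLine($"long: {longIsValue}");     // long: True
Console.WriteLine($"bool: {boolIsValue}");     // bool: True
Console.WriteLine($"string: {stringIsValue}"); // string: False
```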
But why? What is unique about strings that requires them to be stored differently? Performance has a lot to do with it. When value types are passed around, the data is copied. Below are a couple of examples of when this occurs.
- Passing a variable into a function or constructor
- Assigning a value to another variable or property
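Both cases can be seen in a small sketch. It assumes a tuple as a stand-in value type and an int array as a stand-in reference type; `MoveValue` and `ZeroLast` are hypothetical helpers written only for this example.

```csharp
using System;

// A tuple such as (int X, int Y) is a value type: assignment copies the data.
var point = (X: 1, Y: 2);
var pointCopy = point;   // copies both fields
pointCopy.X = 10;        // changes only the copy

// Passing a value type to a method also copies it.
void MoveValue((int X, int Y) p) { p.X = 99; } // mutates the method's own copy
MoveValue(point);

// An array is a reference type: only the reference is copied.
int[] numbers = { 1, 2, 3 };
int[] alias = numbers;   // both variables refer to the same array
alias[0] = 42;

void ZeroLast(int[] values) { values[values.Length - 1] = 0; } // mutates the shared array
ZeroLast(numbers);

Console.WriteLine(point.X);                   // 1
Console.WriteLine(pointCopy.X);               // 10
Console.WriteLine(string.Join(",", numbers)); // 42,2,0
```

The tuple survives both operations untouched because each one worked on a copy, while the array shows every change because only its reference was duplicated.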
The other primitive types, such as int, bool, or long, have a fixed size and are relatively compact. Strings, on the other hand, vary in the amount of data they hold and can be very large. Copying large amounts of data around in an application hurts performance. A reference, by contrast, is just a 32- or 64-bit memory address and is therefore much cheaper to copy.
For example, an int requires 4 bytes, whereas the string “Please excuse my dear aunt Sally” requires roughly 80 bytes (the exact figure depends on the runtime and platform). This string is relatively small, and since the maximum length is far beyond what most applications ever need, strings can get huge very quickly.
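The arithmetic behind this comparison can be sketched as follows. The snippet only measures the character data (two bytes per UTF-16 code unit) and the size of a reference; the per-object overhead that makes up the rest of a string's footprint varies by runtime and is not measured here.

```csharp
using System;

string phrase = "Please excuse my dear aunt Sally";

int intBytes = sizeof(int);                   // 4
int charCount = phrase.Length;                // 32 characters
int charDataBytes = charCount * sizeof(char); // 64 bytes of character data
int referenceBytes = IntPtr.Size;             // 4 in a 32-bit process, 8 in a 64-bit one

Console.WriteLine($"int: {intBytes} bytes");
Console.WriteLine($"character data: {charDataBytes} bytes for {charCount} characters");
Console.WriteLine($"reference: {referenceBytes} bytes");
```

Whatever the length of the string, copying the reference costs the same 4 or 8 bytes.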
Strings are also immutable. A string is stored in a contiguous block of memory on the heap, so allowing it to change in place would be a memory management nightmare. If a string were expanded, the data in the surrounding locations would need to be relocated, which, as you can imagine, would be extremely inefficient. If a string shrank instead, the heap would become fragmented.
To address this problem, strings are designed to be immutable in .NET. Any time a string is altered, a new instance gets created. This behavior sets strings apart from most other reference types in .NET, which can usually be modified in place.
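One visible consequence is that string methods never modify the string they are called on; they return a new one. A minimal sketch, with variable names chosen for illustration:

```csharp
using System;

string original = "hello";
string upper = original.ToUpperInvariant();   // returns a new string
string replaced = original.Replace('h', 'j'); // also returns a new string

Console.WriteLine(original); // hello
Console.WriteLine(upper);    // HELLO
Console.WriteLine(replaced); // jello
```

Forgetting to use the return value is a common mistake: calling `original.Replace('h', 'j')` on its own line has no lasting effect.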
We can see this in action by running the following code snippet in Visual Studio.
    var singular = "Robert";   // 0x0000020252992DE8
    var plural = singular;     // 0x0000020252992DE8
    plural = plural + "s";     // 0x0000020252997FB0
Visual Studio allows us to view the memory address that a variable references, which I have placed in the comment next to each line. As you can see, the two variables (singular, plural) initially reference the same memory address. This is the expected behavior for reference types. In the last line, we see that merely concatenating an “s” to the end of our variable changes the referenced address. This example highlights the immutability of strings.
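If you would rather not inspect addresses in a debugger, the same observation can be made in code. This sketch assumes `object.ReferenceEquals` as a substitute for comparing addresses by eye; the two bool variables are added for illustration.

```csharp
using System;

var singular = "Robert";
var plural = singular;

// Both variables refer to the same string object.
bool sameBefore = object.ReferenceEquals(singular, plural);

// Concatenation allocates a new string and points plural at it.
plural = plural + "s";
bool sameAfter = object.ReferenceEquals(singular, plural);

Console.WriteLine(sameBefore); // True
Console.WriteLine(sameAfter);  // False
Console.WriteLine(singular);   // Robert
Console.WriteLine(plural);     // Roberts
```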
As discussed, any time a string is altered in .NET, a new instance is created. When changes are frequent, the discarded intermediate strings add up to a lot of wasted memory. Depending on the string’s size and the number of modifications, these allocations can directly impact performance. To avoid the unnecessary overhead, we can use the StringBuilder class in these situations.
The StringBuilder class allows the “building” of a string in a mutable buffer, without allocating a new string for every change. Once the content has been prepared, calling ToString() on the StringBuilder produces the final string. Microsoft’s documentation for the StringBuilder class includes an example.
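As a rough illustration, here is a minimal sketch that builds a comma-separated list in a loop. The loop bounds and separator are arbitrary choices for the example.

```csharp
using System;
using System.Text;

var builder = new StringBuilder();

for (int i = 1; i <= 5; i++)
{
    // Add a separator before every item except the first.
    if (builder.Length > 0)
    {
        builder.Append(", ");
    }
    builder.Append(i);
}

// A single string is produced at the end.
string result = builder.ToString();
Console.WriteLine(result); // 1, 2, 3, 4, 5
```

Doing the same with `+=` on a string would create a new string on every iteration. For a handful of concatenations the difference is negligible, so StringBuilder pays off mainly in loops and other repeated modifications.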
The string is a unique type in .NET and is frequently mistaken for a value type. Keep the following characteristics in mind, and remember to use the StringBuilder class when frequent concatenations are made!
- Strings are reference types
- Strings are immutable
- The default value of strings is null
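The last point is easy to verify. A short sketch, with illustrative variable names, comparing the default of string with the default of int:

```csharp
using System;

var defaultText = default(string); // null, like any other reference type
var defaultNumber = default(int);  // 0, value types default to zeroed data

// Elements of a newly allocated string array also start out as null.
var names = new string[2];

bool textIsNull = defaultText is null;
bool elementIsNull = names[0] is null;
bool nullOrEmpty = string.IsNullOrEmpty(defaultText);

Console.WriteLine(textIsNull);    // True
Console.WriteLine(elementIsNull); // True
Console.WriteLine(nullOrEmpty);   // True
Console.WriteLine(defaultNumber); // 0
```

Since null is not the same as an empty string, `string.IsNullOrEmpty` is a convenient guard when either may show up.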