Strings are a fundamental component in software development, and most programming languages have the concept of a string. In .NET, however, strings behave differently from the other built-in types such as int and bool. Let’s take a look at what makes them unique and why they are implemented the way they are.
In .NET, every data type is categorized as either a value type or a reference type. A variable of a value type directly contains its data, whereas a variable of a reference type stores a reference to an object that lives elsewhere. Local variables of value types are typically stored on the stack (a value type that is a field of a class lives on the heap along with that object), while reference-type objects are allocated on the managed heap.
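The difference shows up on assignment. Below is a minimal C# sketch using int as the value type and an int[] array as the reference type (arrays are reference types in .NET); the variable names are purely illustrative:

```csharp
using System;

// Value type: assignment copies the data itself.
int original = 1;
int copy = original;
copy = 2;                                                   // changes only the copy
Console.WriteLine($"original = {original}, copy = {copy}"); // original = 1, copy = 2

// Reference type: assignment copies only the reference.
int[] first = { 1 };
int[] second = first;   // both variables now refer to the same array object
second[0] = 2;          // modifies the shared array
Console.WriteLine($"first[0] = {first[0]}, second[0] = {second[0]}"); // first[0] = 2, second[0] = 2
```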
Strings are commonly mistaken for a value type. However, unlike int, bool, and the other built-in numeric types, the string is a reference type. Along with object, it is one of only a few built-in types in C# that is not a value type.
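One quick way to confirm this is to ask the runtime through the Type.IsValueType property. A small sketch:

```csharp
using System;

// Type.IsValueType reports whether a type is a value type.
Console.WriteLine(typeof(int).IsValueType);    // True
Console.WriteLine(typeof(bool).IsValueType);   // True
Console.WriteLine(typeof(string).IsValueType); // False: string is a reference type
Console.WriteLine(typeof(string).IsClass);     // True
```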
But why? What is special about strings that requires them to be stored differently?
When a value type is passed around, its data is copied. Below are a couple of examples of when this occurs:
- Passing a variable into a function or constructor
- Assigning a value to another variable or property
The other built-in types, such as int, bool, or long, have a fixed size and are relatively compact. Strings, on the other hand, vary in how much memory they consume and can be very large. Copying large blocks of data around an application hurts performance. A reference, by contrast, is just a 32- or 64-bit memory address (depending on the process), so it is cheap to copy.
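For example, an int requires 4 bytes, whereas the string “Please excuse my dear aunt Sally” needs 64 bytes just for its 32 UTF-16 characters, and about 80 bytes in total once the string object’s header and length field are added (the exact figure depends on the runtime and platform). This is a relatively small string, and since strings have no fixed length, they can get huuuuuge very quickly.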
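The arithmetic above can be sketched in C#. This measures only the character data (sizeof(char) is 2 bytes, since .NET strings are UTF-16); the object header and length field are not included, so the real footprint is larger. The second half illustrates the earlier point that copying a string variable copies just a reference. The variable names are illustrative:

```csharp
using System;

const string phrase = "Please excuse my dear aunt Sally";

int intSize = sizeof(int);                       // 4 bytes
int charDataSize = phrase.Length * sizeof(char); // 32 UTF-16 chars × 2 bytes = 64 bytes
Console.WriteLine($"int: {intSize} bytes, string characters: {charDataSize} bytes");

// Copying a string variable copies only the reference, however long the string is.
string large = new string('x', 1_000_000);      // about 2 MB of character data
string alias = large;
Console.WriteLine(object.ReferenceEquals(large, alias)); // True: no characters were copied
```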
Strings are also immutable. String data is stored in a contiguous block of memory on the heap, and allowing strings to change size in place would be a memory management nightmare. If a string were expanded, the data in the surrounding memory would need to be relocated, which would be extremely inefficient. If a string shrank instead, the freed space would leave gaps that fragment the heap.
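To address this problem, strings are designed to be immutable in .NET: once a string is created, its contents never change. Any operation that appears to alter a string, such as concatenation, actually creates a new instance. This sets strings apart from most other reference types in .NET, whose objects can typically be modified in place.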
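String methods such as ToUpperInvariant and Replace follow the same rule: they leave the original untouched and return a new string. A minimal sketch with illustrative values:

```csharp
using System;

string name = "robert";

// Neither call modifies `name`; each returns a brand-new string.
string upper = name.ToUpperInvariant();
string replaced = name.Replace('r', 'R');

Console.WriteLine(name);     // robert (unchanged)
Console.WriteLine(upper);    // ROBERT
Console.WriteLine(replaced); // RobeRt
```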
We can see this in action by running the following code snippet in Visual Studio.
```csharp
// Addresses as shown in the Visual Studio debugger (they differ on every run)
var singular = "Robert";  // 0x0000020252992DE8
var plural = singular;    // 0x0000020252992DE8
plural = plural + "s";    // 0x0000020252997FB0
```
The Visual Studio debugger lets us view the memory address that each variable references, and I have placed these addresses in comments next to each line. Initially, both variables (singular and plural) reference the exact same memory address, which is the expected behavior for reference types. In the last line, merely concatenating an “s” to the end of plural changes the address it references: a new string was created, while singular still points to the original “Robert”. This highlights the immutability of the string type.
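The same observation can be checked in code, without the debugger, using object.ReferenceEquals, which returns true only when both references point to the same object. A minimal runnable sketch of the example:

```csharp
using System;

var singular = "Robert";
var plural = singular;

// Before the change, both variables reference the same string object.
bool sharedBefore = object.ReferenceEquals(singular, plural);
Console.WriteLine(sharedBefore); // True

plural = plural + "s"; // concatenation creates a new string object

// plural now refers to a new object; the original "Robert" is untouched.
Console.WriteLine(object.ReferenceEquals(singular, plural)); // False
Console.WriteLine(singular); // Robert
Console.WriteLine(plural);   // Roberts
```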
Because strings are immutable, every change creates a new instance. When a string is modified frequently, for example by appending to it inside a loop, each change allocates a new string, copies the existing characters into it, and leaves the previous version behind as garbage. Depending on the size of the string and the number of modifications, this can have a noticeable impact on performance. To avoid this unnecessary overhead, use the StringBuilder class in these situations.
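To make the waste concrete, here is a small sketch that appends five digits with += and counts how many times the variable ends up pointing to a different string object. The loop and the counter are illustrative only, not a benchmark:

```csharp
using System;

string result = "";
int newInstances = 0;

for (int i = 0; i < 5; i++)
{
    string previous = result;
    result += i; // produces a different string object holding the old characters plus the new digit
    if (!object.ReferenceEquals(previous, result))
    {
        newInstances++; // a different object on every iteration
    }
}

Console.WriteLine(result);       // 01234
Console.WriteLine(newInstances); // 5
```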
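The StringBuilder class allows the manipulation, or “building”, of a string without creating a new string instance for every change: it keeps an internal character buffer that grows as needed and is modified in place. Once the modifications are complete, calling ToString() produces the final string. Microsoft’s documentation for the StringBuilder class includes further examples.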
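A minimal sketch of the same kind of loop rewritten with StringBuilder (from the System.Text namespace). Append adds to the builder’s internal buffer, and ToString is called once at the end to produce the final string:

```csharp
using System;
using System.Text;

var builder = new StringBuilder();

for (int i = 0; i < 5; i++)
{
    builder.Append(i); // appends to the builder's internal, growable buffer
}

string result = builder.ToString(); // the final string is created once, here
Console.WriteLine(result);          // 01234
```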
The string is a unique type in .NET and is frequently mistaken for a value type. Keep the following characteristics in mind, and remember to use the StringBuilder class when making frequent modifications!
- Strings are reference types
- Strings are immutable
- The default value of strings is null
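A quick check of the last point using default value expressions; note that the default for string is null, not the empty string:

```csharp
using System;

// default(T) yields the default value of type T.
Console.WriteLine(default(int));            // 0
Console.WriteLine(default(bool));           // False
Console.WriteLine(default(string) == null); // True: strings default to null, not ""
```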