This is based on an email I sent to my .NET team at work.
Let’s take a deep dive into one of the most beloved portions of the .NET base class library: the venerable StringBuilder.
Timur Guev has a neat article looking into the internals of System.Text.StringBuilder.
When .NET was first released, the implementation was mostly the same as List<T>: whenever it ran out of space, it would double its capacity, copy the existing text, and continue working in the new, bigger buffer.
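To make the old behavior concrete, here is a minimal sketch of a doubling buffer. The DoublingBuilder class and its starting capacity of 16 are made up for illustration; this is not the framework's actual source.

```csharp
using System;

// Hypothetical illustration of the "double and copy" strategy.
// Not the real pre-.NET 4 StringBuilder code.
public sealed class DoublingBuilder
{
    private char[] _buffer = new char[16]; // assumed starting capacity
    private int _length;

    public int Capacity => _buffer.Length;

    public DoublingBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value)) return this;

        int required = _length + value.Length;
        if (required > _buffer.Length)
        {
            // Out of space: double until the new text fits.
            int newCapacity = _buffer.Length;
            while (newCapacity < required) newCapacity *= 2;

            // Array.Resize allocates a new array and copies the old contents,
            // which is the cost the later design avoids.
            Array.Resize(ref _buffer, newCapacity);
        }

        value.CopyTo(0, _buffer, _length, value.Length);
        _length = required;
        return this;
    }

    public override string ToString() => new string(_buffer, 0, _length);
}
```

Every time the buffer fills up, all of the text appended so far gets copied again, and the old array becomes garbage.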
In .NET 4, they changed the internals to instead be a linked list of buffers. This is sometimes loosely called a rope, although a classic rope is a tree of string fragments rather than a flat list. As you Append() text to the builder, eventually its current buffer runs out of space, so a new one is allocated and a pointer is kept in the internal bookkeeping. In this algorithm, there is no need to copy strings around during the Append operation. However, when the time comes to create a real string object, StringBuilder has to allocate a big enough block of memory, then walk the linked list, copying each buffer into the new string.
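The linked-list idea can be sketched roughly as follows. ChunkedBuilder, Chunk, and the chunk size of 16 are hypothetical names and values chosen for this example; the real StringBuilder differs in its details.

```csharp
using System;

// Hypothetical illustration of a linked list of buffers.
public sealed class ChunkedBuilder
{
    private sealed class Chunk
    {
        public readonly char[] Chars;
        public readonly Chunk Previous; // link to the chunk filled before this one
        public readonly int Offset;     // number of chars held by all earlier chunks
        public int Length;

        public Chunk(int capacity, Chunk previous, int offset)
        {
            Chars = new char[capacity];
            Previous = previous;
            Offset = offset;
        }
    }

    private Chunk _tail = new Chunk(16, null, 0);

    public int Length => _tail.Offset + _tail.Length;

    public ChunkedBuilder Append(string value)
    {
        if (string.IsNullOrEmpty(value)) return this;

        int copied = 0;
        while (copied < value.Length)
        {
            if (_tail.Length == _tail.Chars.Length)
            {
                // Current chunk is full: link a new one. Existing text stays put.
                _tail = new Chunk(Math.Max(16, value.Length - copied), _tail, Length);
            }

            int count = Math.Min(value.Length - copied, _tail.Chars.Length - _tail.Length);
            value.CopyTo(copied, _tail.Chars, _tail.Length, count);
            _tail.Length += count;
            copied += count;
        }
        return this;
    }

    public override string ToString()
    {
        // One block big enough for everything, then one copy per chunk.
        // This sketch goes through a char[] for simplicity, which costs an
        // extra copy when the final string is created.
        var result = new char[Length];
        for (Chunk c = _tail; c != null; c = c.Previous)
        {
            Array.Copy(c.Chars, 0, result, c.Offset, c.Length);
        }
        return new string(result);
    }
}
```

Append only ever writes the new characters; the walk over the chunks is deferred until ToString().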
They did this because the most common use case for StringBuilder is some kind of tight loop that calls Append() a bunch of times before finally grabbing the resulting string with ToString(). The new algorithm is much better suited to this scenario. There is a lot less copying of characters, and no reallocating of ever-larger arrays as the text grows. This reduces CPU time and garbage collection pressure.
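For reference, the pattern being optimized looks something like this. The CsvExample class and JoinNumbers method are invented for the example; the StringBuilder calls are the standard ones.

```csharp
using System;
using System.Text;

public static class CsvExample
{
    // Builds "0,1,2,..." with many Append() calls and a single ToString().
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString();
    }
}
```

Calling CsvExample.JoinNumbers(5) returns "0,1,2,3,4".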
However, other methods of StringBuilder suffer: Remove() and similar editing methods incur extra bookkeeping and copying compared to the previous implementation, which just kept all the data in one array. Also, the final ToString() call is slower, because it has to allocate a new string for the result and copy the data into it. Prior to .NET 4, the StringBuilder could just return a pointer to its internal buffer.
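If you want to see the difference between append-heavy and edit-heavy code on your own machine, a rough harness like the one below is a starting point. EditTiming and its methods are hypothetical names, and Stopwatch timings like this are only indicative, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class EditTiming
{
    // The common case: append at the end, then build the string once.
    public static string AppendOnly(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append('x');
        return sb.ToString();
    }

    // An edit-heavy case: every call inserts at the front of the text.
    public static string InsertAtFront(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Insert(0, 'x');
        return sb.ToString();
    }

    // Wall-clock time for one run; repeat and average for anything serious.
    public static TimeSpan Time(Func<int, string> action, int n)
    {
        var watch = Stopwatch.StartNew();
        action(n);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, Console.WriteLine(EditTiming.Time(EditTiming.InsertAtFront, 100000)) prints the elapsed time for the insert-heavy loop; swap in EditTiming.AppendOnly to compare.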
This is a really good example of encapsulation. The external interface of StringBuilder did not change at all in .NET 4, though its internals were completely reworked to target a different performance profile.
It’s also a good example of the tradeoffs involved in performance work. The framework designers decided it was better overall to improve the most common use case, even if some other scenarios would suffer.
Check out the link for a deeper look and some performance timings the author did to demonstrate the tradeoffs.