Friday Links 0.0.22 - StringBuilder Updates

This is based on an email I send to my .NET team at work.

Happy Friday,

Let’s take a deep dive into one of the most beloved portions of the .NET base class library: the venerable StringBuilder.

StringBuilder: the Past and the Future

Timur Guev has a neat article looking into the internals of System.Text.StringBuilder.

When .NET was first released, the implementation worked much like List<T>: whenever it ran out of space, it allocated a buffer twice as large, copied the existing text into it, and carried on with the new, bigger buffer.
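Here is a minimal Python sketch of that grow-by-doubling idea. It is written for illustration and is not the .NET source. The class name DoublingBuilder, the starting capacity, and the chars_copied_on_growth counter are all made up for this example. The counter just tracks how much existing text gets copied each time the buffer grows.

```python
class DoublingBuilder:
    """Hypothetical pre-.NET 4 style builder: one contiguous buffer that doubles when full."""

    def __init__(self, capacity: int = 16) -> None:
        self._buffer = [""] * max(1, capacity)  # fixed-size slots, standing in for a char[]
        self._length = 0
        self.chars_copied_on_growth = 0  # existing characters moved into bigger buffers

    @property
    def capacity(self) -> int:
        return len(self._buffer)

    def append(self, text: str) -> "DoublingBuilder":
        needed = self._length + len(text)
        if needed > self.capacity:
            new_capacity = self.capacity
            while new_capacity < needed:
                new_capacity *= 2
            bigger = [""] * new_capacity
            # Copy the existing text over; the old buffer becomes garbage.
            bigger[: self._length] = self._buffer[: self._length]
            self.chars_copied_on_growth += self._length
            self._buffer = bigger
        # Write the new text into the free slots at the end.
        self._buffer[self._length : needed] = list(text)
        self._length = needed
        return self

    def to_string(self) -> str:
        return "".join(self._buffer[: self._length])
```

Appending "ab" ten times to a builder that starts with 4 slots grows it to 32 slots and copies 28 characters along the way (4 + 8 + 16).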

In .NET 4, they changed the internals to a linked list of buffers (chunks). This is sometimes loosely called a rope, although a classic rope is a tree of string fragments rather than a simple list. As you Append() text to the builder, its current chunk eventually runs out of space, so a new one is allocated and linked into the internal bookkeeping. With this design, existing text never has to be copied during an Append(). However, when it is time to create a real string object, StringBuilder has to allocate a block of memory big enough for the whole result, then walk the linked list, copying each chunk into the output.
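A rough Python sketch of the chunked approach follows. It is not the real .NET code: the names Chunk and ChunkedBuilder are hypothetical, every chunk has the same fixed size for simplicity, and each chunk is assumed to link back to the one before it and to record a running offset, so to_string knows where each chunk's text belongs in the output.

```python
from typing import Optional


class Chunk:
    """One fixed-size buffer plus a link to the chunk that came before it."""

    def __init__(self, capacity: int, previous: Optional["Chunk"], offset: int) -> None:
        self.chars = [""] * capacity
        self.length = 0           # characters used in this chunk
        self.previous = previous  # older chunk, or None for the first one
        self.offset = offset      # total characters stored in all earlier chunks


class ChunkedBuilder:
    """Hypothetical linked list of chunks; the newest chunk is the head."""

    def __init__(self, chunk_size: int = 16) -> None:
        self._chunk_size = max(1, chunk_size)
        self._head = Chunk(self._chunk_size, None, 0)

    def __len__(self) -> int:
        return self._head.offset + self._head.length

    def append(self, text: str) -> "ChunkedBuilder":
        for ch in text:
            if self._head.length == len(self._head.chars):
                # Head is full: link a fresh chunk. Existing text is never copied.
                self._head = Chunk(self._chunk_size, self._head, len(self))
            self._head.chars[self._head.length] = ch
            self._head.length += 1
        return self

    def chunk_count(self) -> int:
        count, chunk = 0, self._head
        while chunk is not None:
            count += 1
            chunk = chunk.previous
        return count

    def to_string(self) -> str:
        # Allocate the full-size output once, then walk the list copying each chunk into place.
        out = [""] * len(self)
        chunk = self._head
        while chunk is not None:
            out[chunk.offset : chunk.offset + chunk.length] = chunk.chars[: chunk.length]
            chunk = chunk.previous
        return "".join(out)
```

Note that append only ever writes into the head chunk; the one full copy of the text happens in to_string.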

They did this because the most common use case for StringBuilder is some kind of tight loop that calls Append() many times before finally grabbing the resulting string with ToString(). The new algorithm is much better suited to this scenario: far fewer characters get copied, and no intermediate arrays are thrown away every time the builder grows. This reduces CPU time and garbage collection pressure.
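To put some rough numbers on the copying difference, this Python sketch counts characters copied for 1,000 appends of a 10-character piece under each design. The function names and the 16-character starting capacity are assumptions for illustration. It also assumes, following the description of the old design, that its ToString() could hand back the buffer without another copy, while the chunked design copies every character once at the end.

```python
def doubling_copies(appends: int, piece_len: int, initial_capacity: int = 16) -> tuple[int, int]:
    """(characters copied, arrays discarded) for a doubling buffer.

    Only growth copies are counted; ToString is assumed to be copy-free here.
    """
    capacity, length, copied, discarded = initial_capacity, 0, 0, 0
    for _ in range(appends):
        needed = length + piece_len
        if needed > capacity:
            while capacity < needed:
                capacity *= 2
            copied += length  # existing text moves into the bigger array
            discarded += 1    # the old array is left for the garbage collector
        length = needed
    return copied, discarded


def chunked_copies(appends: int, piece_len: int) -> tuple[int, int]:
    """(characters copied, arrays discarded) for a linked list of chunks.

    Appends never move existing text; ToString copies every character once.
    """
    return appends * piece_len, 0


if __name__ == "__main__":
    # 1,000 appends of a 10-character piece = 10,000 characters in total.
    print("doubling:", doubling_copies(1000, 10))  # (16320, 10)
    print("chunked: ", chunked_copies(1000, 10))   # (10000, 0)
```

The doubling buffer copies 16,320 characters and discards 10 arrays on the way to 10,000 characters of text; the chunked version copies each character exactly once, in the final ToString().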

However, other StringBuilder methods suffer: Insert(), Remove(), etc. first have to find the chunk that holds a given position, which adds bookkeeping and copying compared with the previous implementation that kept all of the text in a single array. The final ToString() call is also slower, because it has to allocate a new string for the result and copy every chunk into it. Prior to .NET 4, StringBuilder could just return a reference to its internal buffer.
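Much of the extra Insert() bookkeeping comes from locating the right chunk before any text can move. Here is a hypothetical Python helper, insert_into_chunks, that searches a plain list of chunks from newest to oldest for the insertion point. The newest-first search assumes chunks are linked backward from the end of the text, and letting the target chunk grow past its nominal size is a simplification a real implementation would not make.

```python
from typing import List


def insert_into_chunks(chunks: List[List[str]], index: int, text: str) -> int:
    """Insert text at a logical index; chunks are stored oldest first.

    Returns how many chunks were visited to find the insertion point.
    """
    # Total length, computed here for brevity (a real builder would track it).
    end = sum(len(chunk) for chunk in chunks)
    visited = 0
    for chunk in reversed(chunks):  # search starts at the newest chunk
        visited += 1
        start = end - len(chunk)
        if start <= index <= end:
            offset = index - start
            chunk[offset:offset] = list(text)  # shift the rest of this chunk right
            return visited
        end = start
    raise IndexError("index out of range")
```

For example, with chunks "abcd", "efgh", and "ij", inserting "XY" at index 1 visits all 3 chunks, while inserting at the very end of the text touches only the newest one.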

This is a really good example of encapsulation. The external interface of StringBuilder did not change at all in .NET 4, though its internals were completely reworked to target a different performance profile.

It’s also a good example of the tradeoffs involved in performance work. The framework designers decided it was better overall to improve the most common use case, even if some other scenarios would suffer.

Check out the link for a deeper look and some performance timings the author ran to demonstrate the tradeoffs.