How memory fragmentation affects the performance of C# StringBuilder

Time:2021-1-24

Internally, StringBuilder is a linked list of char-array segments (chunks). Frequently modifying a StringBuilder in the middle therefore splits what was originally contiguous memory into many segments, which hurts read / traversal performance.

The performance gap between contiguous and fragmented memory can be large: in the test below, traversal became up to about 1,600 times slower.

Background

Most people use StringBuilder for ordinary tasks such as concatenating HTML / JSON templates or assembling dynamic SQL. In some special scenarios, however, such as writing a language service for a programming language or a rich text editor, StringBuilder also has its place as an editable text buffer, because it can be modified in place through its Insert and Remove methods.

Test method

Talk is cheap, show me the code:


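// LINQPad script (C# Program mode). Dump() is LINQPad's output extension method;
// outside LINQPad, print the results with Console.WriteLine and add using directives
// for System, System.Diagnostics, System.Linq and System.Text.
int docLength = 10000;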
void Main()
{
  (from power in Enumerable.Range (1, 16)
  let mutations = (int) Math.Pow (2, power)
  select new
  {
    mutations,
    PerformanceRatio = Math.Round (GetPerformanceRatio (docLength, mutations), 1)
  }).Dump();
}

float GetPerformanceRatio (int docLength, int mutations)
{
  var sb = new StringBuilder ("".PadRight (docLength));
  var before = GetPerformance (sb);
  FragmentStringBuilder (sb, mutations);
  var after = GetPerformance (sb);
  return (float) after.Ticks / before.Ticks;
}

void FragmentStringBuilder (StringBuilder sb, int mutations)
{
  var r = new Random(42);
  for (int i = 0; i < mutations; i++)
  {
    sb.Insert (r.Next (sb.Length), 'x');
    sb.Remove (r.Next (sb.Length), 1);
  }
}

TimeSpan GetPerformance (StringBuilder sb)
{
  var sw = Stopwatch.StartNew();
  long tot = 0;
  for (int i = 0; i < sb.Length; i++)
  {
    char c = sb[i];
    tot += (int) c;
  }
  sw.Stop();
  return sw.Elapsed;
}

For this code, please note the following:

  • "".PadRight (n) directly creates a string of n spaces; new string (' ', n) is equivalent;
  • new Random (42) uses a fixed seed, so the mutation positions are exactly the same on every run, which keeps the comparison controlled;
  • I apply 2^1 to 2^16 mutations (each one an Insert followed by a Remove) to the string and compare the traversal time before and after;
  • I use the indexer sb [i] to access the positions in the StringBuilder one by one to highlight the memory discontinuity, since each indexed access first has to locate the segment that holds that position.

Running results

mutations PerformanceRatio
2 1
4 1
8 1
16 1
32 1
64 1.1
128 1.2
256 1.8
512 5.2
1024 19.9
2048 81.3
4096 274.5
8192 745.8
16384 1578.8
32768 1630.4
65536 930.8

As the table shows, a large number of modifications in the middle of a StringBuilder makes traversal performance drop sharply. With 32768 modifications, traversal is 1630.4 times slower than on the unmodified StringBuilder!
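To see the fragmentation directly rather than only through timings, the segments can be counted. The following is a minimal sketch, assuming .NET Core 3.0 or later (where StringBuilder.GetChunks() is available) and C# 9 top-level statements; CountChunks and Fragment are illustrative helper names that do not appear in the test code above.

```csharp
using System;
using System.Text;

// Counts the internal segments of a StringBuilder via GetChunks() (.NET Core 3.0+).
static int CountChunks(StringBuilder sb)
{
    int count = 0;
    foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
        count++;
    return count;
}

// Same mutation pattern as FragmentStringBuilder in the test code.
static void Fragment(StringBuilder sb, int mutations)
{
    var r = new Random(42);
    for (int i = 0; i < mutations; i++)
    {
        sb.Insert(r.Next(sb.Length), 'x');
        sb.Remove(r.Next(sb.Length), 1);
    }
}

var doc = new StringBuilder(new string(' ', 10000));
int before = CountChunks(doc);
Fragment(doc, 1024);
int after = CountChunks(doc);
Console.WriteLine($"chunks before: {before}, after: {after}");
```

The exact counts depend on the runtime version, so none are quoted here; the point is only that the count grows as mutations accumulate.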

Solutions

If you must use StringBuilder, consider rebuilding it after a certain number of modifications, so that its contents are contiguous again when they are accessed:

void FragmentStringBuilder (StringBuilder sb, int mutations)
{
  var r = new Random(42);
  for (int i = 0; i < mutations; i++)
  {
    sb.Insert (r.Next (sb.Length), 'x');
    sb.Remove (r.Next (sb.Length), 1);
    
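    // Key point: every defragmentCount mutations, copy the text out and
    // append it back so that it is stored contiguously again.
    const int defragmentCount = 250;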
    if (i % defragmentCount == defragmentCount - 1)
    {
      string buf = sb.ToString();
      sb.Clear();
      sb.Append(buf);
    }
  }
}

As shown above, after every 250 modifications the contents are copied out with ToString (), the StringBuilder is cleared, and the text is appended back, which rebuilds the StringBuilder from a single contiguous string. The results are as follows:

mutations PerformanceRatio
2 1.2
4 0.7
8 1
16 1
32 1
64 1.1
128 1.2
256 1
512 1
1024 1
2048 1
4096 1.1
8192 1.5
16384 1.3
32768 1
65536 1

As the results show, this resolves the access-performance problem caused by memory discontinuity in almost all cases. At the same time, 250 appears to be a reasonable interval for balancing insertion performance against lookup / traversal performance.
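The text above speaks of creating a new StringBuilder, while the code reuses the same instance. A literal variant could look like the sketch below. Rebuild is a hypothetical helper name, and the loop count of 1000 is chosen only for illustration; C# 9 top-level statements are assumed.

```csharp
using System;
using System.Text;

// Hypothetical helper: copies the contents into a brand-new StringBuilder.
static StringBuilder Rebuild(StringBuilder fragmented)
{
    // ToString() copies all segments into one string; the new builder
    // is then initialized from that single contiguous string.
    return new StringBuilder(fragmented.ToString());
}

var doc = new StringBuilder(new string(' ', 10000));
var r = new Random(42);
const int defragmentCount = 250;

for (int i = 0; i < 1000; i++)
{
    doc.Insert(r.Next(doc.Length), 'x');
    doc.Remove(r.Next(doc.Length), 1);

    // Replace the builder every defragmentCount mutations.
    if (i % defragmentCount == defragmentCount - 1)
        doc = Rebuild(doc);
}

Console.WriteLine(doc.Length); // 10000: each iteration inserts one char and removes one
```

One design consequence of this variant: any other code holding a reference to the old instance has to be given the new one, which the Clear / Append version avoids.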

Reflection and summary

As we all know, because string is immutable, concatenating a large number of strings wastes a lot of memory. But using StringBuilder well also requires understanding its internal structure.

StringBuilder uses a chained structure for good reason. For insertion performance, a chained structure works best; for lookup performance, it is a clear disadvantage. With a non-chained design (a single contiguous buffer), an insertion in the middle may exceed the available capacity, so the memory has to be reallocated and copied. That effectively reduces StringBuilder to a string and loses its advantage for frequent insertion.
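The lookup cost discussed here applies to access by index. When the text only needs to be read from start to end, walking the segments in order avoids the per-index lookup. This is not part of the original test; it is a minimal sketch assuming .NET Core 3.0 or later (StringBuilder.GetChunks()) and C# 9 top-level statements, with SumByChunks and SumByIndexer as illustrative names.

```csharp
using System;
using System.Text;

// Sums all characters by walking the segments in order (.NET Core 3.0+).
static long SumByChunks(StringBuilder sb)
{
    long total = 0;
    foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
    {
        ReadOnlySpan<char> span = chunk.Span; // contiguous memory of one segment
        for (int i = 0; i < span.Length; i++)
            total += span[i];
    }
    return total;
}

// Same access pattern as GetPerformance in the test: one indexer call per character.
static long SumByIndexer(StringBuilder sb)
{
    long total = 0;
    for (int i = 0; i < sb.Length; i++)
        total += sb[i];
    return total;
}

var doc = new StringBuilder(new string(' ', 10000));
var r = new Random(42);
for (int i = 0; i < 1024; i++)
{
    doc.Insert(r.Next(doc.Length), 'x');
    doc.Remove(r.Next(doc.Length), 1);
}

// Both traversals visit the same characters, so the sums must match.
Console.WriteLine(SumByChunks(doc) == SumByIndexer(doc)); // True
```

This only helps sequential reads; random access by position still has to find the right segment first.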

This article covers a rather special case. In practice, apart from language services and editors, few scenarios require frequent insertion and modification in the middle of a text. If you want to keep things simple, StringBuilder is an acceptable solution under the right conditions. A more suitable solution is, of course, a specialized data structure: the piece table. In the VS Code editor, Microsoft uses this data structure to keep editing of large files fast, and has achieved very good results. For details, see the article "Text Buffer Reimplementation".
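To make the idea of a piece table concrete, here is a deliberately small sketch. It is not VS Code's implementation; the PieceTable class and its members are names chosen for illustration, only Insert is implemented (Remove is omitted for brevity), and C# 9 or later is assumed. The original text and an append-only buffer of added text are never modified; an edit only changes a list of pieces that point into those two buffers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var table = new PieceTable("Hello world");
table.Insert(5, ",");
table.Insert(12, "!");
Console.WriteLine(table); // Hello, world!

// Minimal piece table sketch (illustrative only).
sealed class PieceTable
{
    private readonly string _original;                            // never modified
    private readonly StringBuilder _added = new StringBuilder();  // append-only
    private readonly List<(bool FromAdded, int Start, int Length)> _pieces = new();

    public int Length { get; private set; }

    public PieceTable(string original)
    {
        _original = original;
        Length = original.Length;
        if (original.Length > 0)
            _pieces.Add((false, 0, original.Length));
    }

    public void Insert(int index, string text)
    {
        if (index < 0 || index > Length) throw new ArgumentOutOfRangeException(nameof(index));
        if (text.Length == 0) return;

        // The new text is appended to the add buffer; a piece records where it lives.
        var newPiece = (FromAdded: true, Start: _added.Length, Length: text.Length);
        _added.Append(text);
        Length += text.Length;

        int offset = 0; // document position at which the current piece starts
        for (int i = 0; i < _pieces.Count; i++)
        {
            var p = _pieces[i];
            if (index <= offset + p.Length)
            {
                int left = index - offset; // chars of p that stay before the insertion
                _pieces.RemoveAt(i);
                if (left > 0) _pieces.Insert(i++, (p.FromAdded, p.Start, left));
                _pieces.Insert(i++, newPiece);
                if (left < p.Length) _pieces.Insert(i, (p.FromAdded, p.Start + left, p.Length - left));
                return;
            }
            offset += p.Length;
        }
        _pieces.Add(newPiece); // only reached when the table was empty
    }

    public override string ToString()
    {
        var sb = new StringBuilder(Length);
        foreach (var p in _pieces)
        {
            if (p.FromAdded) sb.Append(_added.ToString(p.Start, p.Length));
            else sb.Append(_original, p.Start, p.Length);
        }
        return sb.ToString();
    }
}
```

An insertion here never moves existing characters; it splits at most one piece into two. This sketch still scans the piece list linearly, so a production implementation would need a faster way to find the piece for a given position.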
