前陣子怒讀一波演算法入門書,而排序是每本演算法書永不缺席的章節:Bubble Sort、Selection Sort、Insertion Sort、Heap Sort、Merge Sort、Quick Sort... 多不勝數(維基百科整理的更多)。看完這一堆排序演算法不免好奇:那 .NET 的 Array.Sort() 方法是用哪一種呢?未查先猜,一定不是 Bubble 或 Selection 這類效率較差的 O(n2),感覺 Quick Sort - O(n log n) 的可能性最高。

原以為要花一番功夫找答案,意外發現本草綱目就有記載:

This method uses the introspective sort (introsort) algorithm as follows:

  • If the partition size is less than or equal to 16 elements, it uses an insertion sort algorithm.
  • If the number of partitions exceeds 2 * LogN, where N is the range of the input array, it uses a Heapsort algorithm.
  • Otherwise, it uses a Quicksort algorithm.

This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.
This method is an O(n log n) operation, where n is length.

.NET Framework 4 and earlier versions used only the Quicksort algorithm.

.NET 4 (含)之前是用 Quick Sort,.NET 4.5+、.NET Core 乃至於 .NET 6+ 則是用 Introspective Sort(Introsort)。

Introsort(内省排序) 是一種混合型排序演算法,視情況採用 Quicksort、Insertion Sort 或 Heapsort,以求高效率處理一般情境,在最壞情境也能保持 O(n log n)。.NET 實作規則是:當 Partition Size 小於或等於 16 個元素時,採用 Insertion Sort,若 Partition 數量大於 2 * Log N (N 為陣列長度) 時改用 Heapsort,其他的狀況則使用 Quicksort。

.NET 身為開源專案,一大好處是允許我們在原始碼裡挖呀挖呀挖,觀摩世界頂尖的程式寫法。我在 mscorlib Array.cs (.NET Core+ 為 Array.cs大同小異) 找到 Array.Sort() 的實作邏輯,簡化並加上註解後如下:

// 排序,由 left 開始,長度為 length
internal void Sort(int left, int length)
{
#if FEATURE_CORECLR
    // .NET Core 用 IntrospectiveSort
    IntrospectiveSort(left, length);
#else
    // 4.5+ 用 IntrospectiveSort,否則用 DepthLimitedQuickSort
    if (BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)
        IntrospectiveSort(left, length);
    else
        DepthLimitedQuickSort(left, length + left - 1, IntrospectiveSortUtilities.QuickSortDepthThreshold);
#endif
}

// Introsort 排序
private void IntrospectiveSort(int left, int length)
{
    if (length < 2) return; // 長度為 1 時,不用排序
    // 深度限制 2 log n,呼叫 Introsoft 實作函式
    IntroSort(left, length + left - 1, 2 * IntrospectiveSortUtilities.FloorLog2(keys.Length));
}

// Introsort 實作函式
private void IntroSort(int lo, int hi, int depthLimit)
{
    while (hi > lo)
    {
        // 計算分割大小 (比較範圍的元素個數)
        int partitionSize = hi - lo + 1;
        // IntrospectiveSortUtilities.IntrosortSizeThreshold = 16
        if (partitionSize <= IntrospectiveSortUtilities.IntrosortSizeThreshold)
        {
            if (partitionSize == 1) return; // 元素個數為 1 時,不用排序
            if (partitionSize == 2) // 元素個數為 2 時,直接比對二者由小到大排序
            {
                SwapIfGreaterWithItems(lo, hi);
                return;
            }
            if (partitionSize == 3) // 元素個數為 3 時,直接比對三者由小到大排序
            {
                SwapIfGreaterWithItems(lo, hi-1);
                SwapIfGreaterWithItems(lo, hi);
                SwapIfGreaterWithItems(hi-1, hi);
                return;
            }
            // 元素個數大於 3,使用InsertionSort
            InsertionSort(lo, hi);
            return;
        }
        // 元素個數大於 16,且深度限制為 0,使用 HeapSort  
        if (depthLimit == 0)
        {
            Heapsort(lo, hi);
            return;
        }
        // 深度限制減 1
        depthLimit--;
        // 取 pivot 值並分割,遞迴呼叫 IntroSort
        int p = PickPivotAndPartition(lo, hi);
        // 對右半部遞迴呼叫 IntroSort
        IntroSort(p + 1, hi, depthLimit);
        // 處理左半部
        hi = p - 1;
    }
}
// Quicksort Partition 演算法,分割出小於 pivot 及大於 pivot 的兩區塊
private int PickPivotAndPartition(int lo, int hi)
{
    // lo, hi 取中間數 mid
    int mid = lo + (hi - lo) / 2;
    // 第 lo, mid, hi 個物件三者由小到大排序
    SwapIfGreaterWithItems(lo, mid);
    SwapIfGreaterWithItems(lo, hi);
    SwapIfGreaterWithItems(mid, hi);
    // 第 mid 個為 pivot
    Object pivot = keys[mid];
    Swap(mid, hi - 1); // 把 pivot 放到最右邊
    // 由 lo 處理到 hi -1 個
    // 雙循環法(two-pointer approach)
    int left = lo, right = hi - 1;  

    while (left < right)
    {
        // left 開始向右找第一個大於等於 pivot 的物件
        while (comparer.Compare(keys[++left], pivot) < 0) ;
        // right 開始向左找第一個大於等於 pivot 的物件
        while (comparer.Compare(pivot, keys[--right]) < 0) ;
        // 若 left >= right,左邊的都比 pivot 小,右邊的都比 pivot 大,跳出迴圈
        if(left >= right)
            break;
        // 否則交換 left, right 物件
        Swap(left, right);
    }

    // 將 pivot 放到中間(left 與 right 相遇處))
    Swap(left, (hi - 1));
    return left;
}

關鍵邏輯位於 IntroSort() 函式。當元素個數小於等於 IntrosortSizeThreshold = 16 時用 InsertionSort 排序;depthLimit 初值為 2 * IntrospectiveSortUtilities.FloorLog2(keys.Length),每遞迴一次減 1,預設呼叫 PickPivotAndPartition() 取 pivot 並分區成左右兩區(這就是快速排序演算法),但若 depthLimit 已遞增到 0 (換言之,超過 2 * log n 層)便改用 Heapsort,這部分完全吻合文件的說明。而我從中學到一個小技巧:IntroSort() 中偵測 partitionSize 小於等於 16 時應改用 InsertionSort,但函式中針對兩個及三個元素,寫死用一次及三次 SwapIfGreaterWithItems 比大小搞定排序,有點像初學者面對輸入 a, b, c 排序練習題列舉各種 if 組合的硬幹解法,用了如下的兩兩相比寫死的邏輯,理由應是在簡單情境寫死比較置換比 InsertionSort 更有效率:

SwapIfGreaterWithItems(lo, hi-1);
SwapIfGreaterWithItems(lo, hi);
SwapIfGreaterWithItems(hi-1, hi);

依據官方文件,IntroSort 的 Big O 為 O(n log n),不如就寫段程式實證順便練功。我用 BenchmarkDotnet 跑測試,這回多學到一些技巧:包含用兩個 Params 組合出 0, 50, 100, 150, 200, ... 900, 950 等 20 種陣列長度變化,每次測試跑 2000 次 Array.Sort();而準備 int[] 陣列的部分,我學會用 GlobalSetup,將前置作業排除在測試範圍之外。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

namespace dotnet_sort_big_o
{
    [SimpleJob(warmupCount: 1, iterationCount: 20)]
    public class TestRunner 
    {
        [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
        public int LenD1 { get; set; }
        [Params(0, 5)]
        public int LenD2 { get; set; }
        public int Len => (LenD1 * 10 + LenD2) * 10;

        public List<int[]> Arrays = new List<int[]>();

        [GlobalSetup]
        public void Setup() 
        {
            for (int i = 0; i < 2000; i++)
            {
                var rnd = new Random(42);
                var array = Enumerable.Range(0, Len).Select(i => rnd.Next()).ToArray();
                Arrays.Add(array);
            }            
        }
        [Benchmark]
        public void Test() 
        {          
            foreach (var array in Arrays)
            {
                Array.Sort(array);
            }
        }
    }
}

除了測 50、100 到 950,我另外用 [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)] 跟 [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)] 加測了陣列長度 0 ~ 99 的數字。得到結果如下:

陣列長度 0 ~ 99

| Method | LenD1 | LenD2 |       Mean |     Error |     StdDev |
|------- |------ |------ |-----------:|----------:|-----------:|
|   Test |     0 |     0 |   5.710 μs | 0.0277 μs |  0.0319 μs |
|   Test |     0 |     1 |   5.639 μs | 0.0217 μs |  0.0232 μs |
|   Test |     0 |     2 |  18.898 μs | 0.0463 μs |  0.0495 μs |
|   Test |     0 |     3 |  20.918 μs | 0.1085 μs |  0.1249 μs |
|   Test |     0 |     4 |  28.577 μs | 0.0607 μs |  0.0699 μs |
|   Test |     0 |     5 |  30.959 μs | 0.0853 μs |  0.0982 μs |
|   Test |     0 |     6 |  32.239 μs | 0.0772 μs |  0.0889 μs |
|   Test |     0 |     7 |  33.922 μs | 0.1303 μs |  0.1394 μs |
|   Test |     0 |     8 |  35.290 μs | 0.1047 μs |  0.1206 μs |
|   Test |     0 |     9 |  37.107 μs | 0.0809 μs |  0.0831 μs |
|   Test |     1 |     0 |  38.854 μs | 0.0933 μs |  0.1074 μs |
|   Test |     1 |     1 |  40.312 μs | 0.1251 μs |  0.1441 μs |
|   Test |     1 |     2 |  42.148 μs | 0.1277 μs |  0.1471 μs |
|   Test |     1 |     3 |  44.713 μs | 0.0750 μs |  0.0803 μs |
|   Test |     1 |     4 |  46.130 μs | 0.2715 μs |  0.3018 μs |
|   Test |     1 |     5 |  47.742 μs | 0.0728 μs |  0.0779 μs |
|   Test |     1 |     6 |  50.353 μs | 0.0936 μs |  0.1041 μs |
|   Test |     1 |     7 |  85.679 μs | 0.2042 μs |  0.2352 μs |
|   Test |     1 |     8 |  86.939 μs | 0.2040 μs |  0.2349 μs |
|   Test |     1 |     9 |  90.382 μs | 0.4898 μs |  0.5640 μs |
|   Test |     2 |     0 |  93.215 μs | 0.7015 μs |  0.8078 μs |
|   Test |     2 |     1 |  97.718 μs | 0.4894 μs |  0.5636 μs |
|   Test |     2 |     2 |  99.189 μs | 0.2937 μs |  0.3382 μs |
|   Test |     2 |     3 | 100.638 μs | 0.3644 μs |  0.4051 μs |
|   Test |     2 |     4 | 103.858 μs | 0.3496 μs |  0.4026 μs |
|   Test |     2 |     5 | 106.663 μs | 0.1750 μs |  0.2015 μs |
|   Test |     2 |     6 | 108.944 μs | 0.4984 μs |  0.5740 μs |
|   Test |     2 |     7 | 110.750 μs | 0.4879 μs |  0.5619 μs |
|   Test |     2 |     8 | 115.393 μs | 0.2587 μs |  0.2875 μs |
|   Test |     2 |     9 | 118.191 μs | 0.4107 μs |  0.4729 μs |
|   Test |     3 |     0 | 117.828 μs | 0.2634 μs |  0.2927 μs |
|   Test |     3 |     1 | 124.647 μs | 1.1958 μs |  1.3291 μs |
|   Test |     3 |     2 | 127.935 μs | 0.5391 μs |  0.5768 μs |
|   Test |     3 |     3 | 132.580 μs | 1.3971 μs |  1.6089 μs |
|   Test |     3 |     4 | 158.547 μs | 0.5920 μs |  0.6335 μs |
|   Test |     3 |     5 | 190.755 μs | 0.5570 μs |  0.6415 μs |
|   Test |     3 |     6 | 194.004 μs | 0.7728 μs |  0.8900 μs |
|   Test |     3 |     7 | 196.353 μs | 0.2862 μs |  0.3296 μs |
|   Test |     3 |     8 | 200.442 μs | 1.0078 μs |  1.1606 μs |
|   Test |     3 |     9 | 202.665 μs | 0.3765 μs |  0.4185 μs |
|   Test |     4 |     0 | 208.344 μs | 0.8387 μs |  0.9658 μs |
|   Test |     4 |     1 | 210.350 μs | 1.1501 μs |  1.3244 μs |
|   Test |     4 |     2 | 217.761 μs | 1.5537 μs |  1.7892 μs |
|   Test |     4 |     3 | 221.748 μs | 0.9031 μs |  1.0400 μs |
|   Test |     4 |     4 | 225.691 μs | 2.0334 μs |  2.1757 μs |
|   Test |     4 |     5 | 224.136 μs | 1.0479 μs |  1.2067 μs |
|   Test |     4 |     6 | 227.448 μs | 0.8845 μs |  0.9831 μs |
|   Test |     4 |     7 | 229.919 μs | 0.4677 μs |  0.5386 μs |
|   Test |     4 |     8 | 234.653 μs | 1.2766 μs |  1.4701 μs |
|   Test |     4 |     9 | 235.705 μs | 0.4590 μs |  0.5101 μs |
|   Test |     5 |     0 | 243.805 μs | 2.2094 μs |  2.5444 μs |
|   Test |     5 |     1 | 248.619 μs | 2.7381 μs |  2.8118 μs |
|   Test |     5 |     2 | 256.643 μs | 1.7531 μs |  1.7217 μs |
|   Test |     5 |     3 | 255.253 μs | 5.0313 μs |  5.7940 μs |
|   Test |     5 |     4 | 251.219 μs | 1.2352 μs |  1.4225 μs |
|   Test |     5 |     5 | 254.731 μs | 0.9442 μs |  1.0873 μs |
|   Test |     5 |     6 | 259.010 μs | 0.5908 μs |  0.6567 μs |
|   Test |     5 |     7 | 261.960 μs | 0.5854 μs |  0.6264 μs |
|   Test |     5 |     8 | 264.810 μs | 0.9071 μs |  1.0446 μs |
|   Test |     5 |     9 | 264.848 μs | 1.0344 μs |  1.1497 μs |
|   Test |     6 |     0 | 270.207 μs | 1.8606 μs |  2.1427 μs |
|   Test |     6 |     1 | 274.148 μs | 0.5923 μs |  0.6083 μs |
|   Test |     6 |     2 | 299.419 μs | 1.3444 μs |  1.4943 μs |
|   Test |     6 |     3 | 281.999 μs | 1.0593 μs |  1.1774 μs |
|   Test |     6 |     4 | 283.342 μs | 0.6659 μs |  0.6838 μs |
|   Test |     6 |     5 | 285.894 μs | 0.8600 μs |  0.8831 μs |
|   Test |     6 |     6 | 314.262 μs | 3.4290 μs |  3.8114 μs |
|   Test |     6 |     7 | 292.985 μs | 1.1344 μs |  1.3063 μs |
|   Test |     6 |     8 | 327.162 μs | 2.2583 μs |  2.4164 μs |
|   Test |     6 |     9 | 365.468 μs | 2.0848 μs |  2.4008 μs |
|   Test |     7 |     0 | 415.833 μs | 1.9440 μs |  2.2388 μs |
|   Test |     7 |     1 | 426.045 μs | 1.7358 μs |  1.7825 μs |
|   Test |     7 |     2 | 437.984 μs | 5.1031 μs |  5.8768 μs |
|   Test |     7 |     3 | 442.754 μs | 4.4155 μs |  4.9079 μs |
|   Test |     7 |     4 | 463.368 μs | 1.8091 μs |  2.0108 μs |
|   Test |     7 |     5 | 450.403 μs | 7.6363 μs |  8.7940 μs |
|   Test |     7 |     6 | 450.331 μs | 1.7208 μs |  1.9127 μs |
|   Test |     7 |     7 | 461.146 μs | 6.0132 μs |  6.9248 μs |
|   Test |     7 |     8 | 482.994 μs | 2.9871 μs |  3.3201 μs |
|   Test |     7 |     9 | 473.706 μs | 5.5566 μs |  6.1762 μs |
|   Test |     8 |     0 | 472.404 μs | 3.2371 μs |  3.7278 μs |
|   Test |     8 |     1 | 478.718 μs | 4.4685 μs |  4.7812 μs |
|   Test |     8 |     2 | 498.847 μs | 4.3461 μs |  5.0050 μs |
|   Test |     8 |     3 | 479.974 μs | 3.8859 μs |  4.4750 μs |
|   Test |     8 |     4 | 492.948 μs | 6.9081 μs |  7.9554 μs |
|   Test |     8 |     5 | 496.778 μs | 6.7481 μs |  7.7711 μs |
|   Test |     8 |     6 | 526.092 μs | 6.0473 μs |  6.4706 μs |
|   Test |     8 |     7 | 505.734 μs | 3.4814 μs |  3.5752 μs |
|   Test |     8 |     8 | 507.748 μs | 5.8875 μs |  6.7800 μs |
|   Test |     8 |     9 | 508.006 μs | 7.1174 μs |  8.1964 μs |
|   Test |     9 |     0 | 528.209 μs | 7.0112 μs |  8.0741 μs |
|   Test |     9 |     1 | 507.113 μs | 4.7240 μs |  5.4401 μs |
|   Test |     9 |     2 | 507.939 μs | 1.4786 μs |  1.7028 μs |
|   Test |     9 |     3 | 512.430 μs | 1.2267 μs |  1.2598 μs |
|   Test |     9 |     4 | 549.581 μs | 7.7179 μs |  8.8880 μs |
|   Test |     9 |     5 | 524.297 μs | 3.0229 μs |  3.4812 μs |
|   Test |     9 |     6 | 525.200 μs | 3.4227 μs |  3.6623 μs |
|   Test |     9 |     7 | 531.837 μs | 7.3588 μs |  8.4744 μs |
|   Test |     9 |     8 | 560.267 μs | 6.2295 μs |  7.1739 μs |
|   Test |     9 |     9 | 558.204 μs | 8.9318 μs | 10.2858 μs |

陣列長度 0, 50, 100, ... 到 950

| Method | LenD1 | LenD2 |         Mean |       Error |      StdDev |       Median |
|------- |------ |------ |-------------:|------------:|------------:|-------------:|
|   Test |     0 |     0 |     5.641 μs |   0.0080 μs |   0.0233 μs |     5.642 μs |
|   Test |     0 |     5 |   247.922 μs |   1.1649 μs |   3.4346 μs |   247.744 μs |
|   Test |     1 |     0 |   551.422 μs |   2.0030 μs |   5.7469 μs |   550.354 μs |
|   Test |     1 |     5 | 1,087.550 μs |   2.8918 μs |   8.3436 μs | 1,086.365 μs |
|   Test |     2 |     0 | 1,401.926 μs |  12.6731 μs |  37.1681 μs | 1,395.687 μs |
|   Test |     2 |     5 | 1,737.395 μs |  19.1032 μs |  56.3262 μs | 1,715.036 μs |
|   Test |     3 |     0 | 2,562.890 μs |  15.5050 μs |  45.2287 μs | 2,560.392 μs |
|   Test |     3 |     5 | 3,016.327 μs |  14.8594 μs |  43.8132 μs | 3,004.896 μs |
|   Test |     4 |     0 | 3,297.364 μs |  43.5566 μs | 128.4277 μs | 3,328.002 μs |
|   Test |     4 |     5 | 3,689.898 μs |  53.7011 μs | 158.3390 μs | 3,673.356 μs |
|   Test |     5 |     0 | 4,251.811 μs |  76.0871 μs | 224.3444 μs | 4,206.466 μs |
|   Test |     5 |     5 | 4,790.137 μs |  81.7200 μs | 240.9532 μs | 4,689.864 μs |
|   Test |     6 |     0 | 6,383.151 μs | 117.7184 μs | 347.0954 μs | 6,307.797 μs |
|   Test |     6 |     5 | 6,804.387 μs | 143.6865 μs | 423.6629 μs | 6,705.737 μs |
|   Test |     7 |     0 | 6,753.684 μs | 145.7984 μs | 425.3008 μs | 6,543.097 μs |
|   Test |     7 |     5 | 7,571.630 μs | 131.1472 μs | 386.6903 μs | 7,540.884 μs |
|   Test |     8 |     0 | 7,975.766 μs | 177.8472 μs | 524.3863 μs | 7,991.149 μs |
|   Test |     8 |     5 | 8,227.062 μs | 134.0590 μs | 395.2760 μs | 8,290.509 μs |
|   Test |     9 |     0 | 9,092.593 μs | 153.8763 μs | 453.7077 μs | 9,094.892 μs |
|   Test |     9 |     5 | 8,485.788 μs |  52.1685 μs | 153.0012 μs | 8,495.694 μs |

接著我想將結果轉成圖表。如果是以前,我會毫不猶豫寫幾行 C# 或 PowerShell 搞定,但在 2023 年練習用 AI 解決才是王道。試了 ChatGPT 跟 Bing Chat,調了幾版 Prompt,終於達成任務,花費時間比自己寫程式解析還久,但這不算選錯方法,而是我的吟唱技巧待加強:
(筆記:我的最終版 Prompt 為「請由以下文字取出 LenD1,LenD2, Mean,計算轉為CSV,只需要兩欄,第一欄位為 LenD1*100+LenD2*10,第二欄位取 Mean 不要 μs,資料間以","分隔」)

將資料貼入 Excel,順便練習繪成 X/Y 線圖,證明 Array.Sort() 執行時間大致吻合 N * Log(N, 2) 曲線!

心滿意足結束這回合綜合格鬥技演練。

Study of which sorting algorithm .NET Array.Sort() uses and benchmark test to observe its Big-O.


Comments

# by 佛弟子文獻學者孫守真任真甫

菩薩慈悲:末學也常是Bing大菩薩與chatGPT大菩薩混合著用,有時也請教YouChat大菩薩。感恩感恩 讚歎讚歎 南無阿彌陀佛 https://you.com/search?q=who+are+you&tbm=youchat&cfr=chat 現在則又有了 Bard大菩薩 (但目前只能用美語對話) https://bard.google.com/ 阿彌陀佛

# by Huang

吟唱技巧待加強....XD...和AI說話像在唸魔法咒語

Post a comment