【茶包射手筆記】Dictionary.Add()效能問題調查
2 |
依據MSDN文件:
If Count is less than the capacity, this method approaches an O(1) operation. If the capacity must be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
只要Dictionary資料筆數沒超過Capacity(Capacity可在建構時指定),資料新增屬O(1)操作,每次呼叫的執行速度不因Dictonary資料量多寡改變,理論上為定值。
網友Cb Atm提供了一則打破O(1)規則的奇妙案例。經我簡化改寫如下:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace DictKey
{
class Program2
{
const int CAPACITY = 20000;
public struct Index
{
public string StkNo;
public byte TypeCode;
public int Idx;
}
static Random rnd = new Random(32767);
static List<Index> stocks =
Enumerable.Range(0, CAPACITY).Select(
o => new Index()
{
Idx = o,
StkNo = rnd.Next(1000, 1011).ToString(),
TypeCode = (byte)rnd.Next(0, 10)
}).ToList();
static void Test1()
{
var dictStruct = new Dictionary<Index, int>(CAPACITY);
Action<int> test = (count) =>
{
Stopwatch sw = new Stopwatch();
dictStruct.Clear();
sw.Start();
for (var i = 0; i < count; i++)
{
dictStruct.Add(stocks[i], i);
}
sw.Stop();
Console.WriteLine("{1:n0} items: {0:n0}ms",
sw.ElapsedMilliseconds, dictStruct.Count);
};
test(10000);
test(CAPACITY);
}
static void Main(string[] args)
{
Test1();
Console.Read();
}
}
}
程式宣告一個Index Struct作為Dictionary的TKey型別,測試時使用亂數產生兩萬筆資料(但StkNo屬性值限定介於1000-1010,TypeCode限定0-9,Idx欄位取流水號具唯一性),分別測試Dictionary<Index, int>.Add()一萬筆及兩萬筆,依O(1)理論,後者的時間應為前者兩倍,事實不然:
10,000 items: 1,993ms
20,000 items: 7,869ms
塞入兩萬筆耗時為塞入一萬筆的四倍左右,為什麼?
做了一個小改寫,移除StkNo亂數取值的範圍限制,改成StkNo = rnd.Next().ToString()(實際範圍放大成int.MinValue到int.MaxValue),得到完全不同結果!
10,000 items: 6ms
20,000 items: 5ms
除了執行速度加快數百倍,加入一萬筆與加入兩萬筆的時間不分上下。(表示Add()所耗時間已非瓶頸,差異微小到可被忽略)
再試改寫成StkNo = rnd.Next(1000,1100).ToString(),讓StkNo介於1000到1099,結果則為:
10,000 items: 247ms
20,000 items: 877ms
速度比最初版本快8-9倍,塞入兩萬筆的時間為塞入一萬筆的3.5倍。
過去對雜湊表原理了解有限,歸納可知StkNo多樣性影響效能甚鉅,但不知其所以然。爬文查到Struct作為TKey時,鍵值比對時與GetHashCode()有關,但再多理論都比不上Source Code權威及清楚明瞭。使用JustDecompile揭開Dictionary<TKey, TValue> .Add()神祕面紗:
private void Insert(TKey key, TValue value, bool add)
{
int num;
if (key == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}
if (this.buckets == null)
{
this.Initialize(0);
}
int hashCode = this.comparer.GetHashCode(key) & 2147483647;
int length = hashCode % (int)this.buckets.Length;
int num1 = 0;
for (int i = this.buckets[length]; i >= 0; i = this.entries[i].next)
{
if (this.entries[i].hashCode == hashCode &&
this.comparer.Equals(this.entries[i].key, key))
{
if (add)
{
ThrowHelper.ThrowArgumentException(
ExceptionResource.Argument_AddingDuplicate);
}
this.entries[i].@value = value;
Dictionary<TKey, TValue> dictionary = this;
dictionary.version = dictionary.version + 1;
return;
}
num1++;
}
依其程式邏輯,Insert時(Add背後會呼叫Insert(key, value, true))先用GetHashCode()取得key參數的hashCode值,接著依Bucket分群規則(取hashCode除以Bucket個數的餘數,餘數相同者放在同一Bucket)比對同一Bucket的現存Key值,若hashCode相同且新舊Key值.Equals()也相等,依add參數決定要拋出鍵值重複錯誤還是取代現值。延伸閱讀
由此可知:
- hashCode可能重複,而多個hashCode可能同屬一個Bucket
- 每次比對時先由hashCode找出其所屬Bucket,一一比對該Bucket下是否已存在相同鍵值
- 比對鍵值是否相同的邏輯為 this.entries[i].hashCode == hashCode && this.comparer.Equals(this.entries[i].key, key),當hashCode相同才會進行第二階段Equals()比對。
故可推論,當hashCode重複的機率愈高,鍵值會集中在少數Bucket中,額外執行Equals()的次數愈高,勢必拖累效能。
所以新增一萬筆會耗時近2秒是因為StkNo變化太少導致Hash Code大量重複嗎?用以下段程式驗證:
static void Test2()
{
var hashCodeStats = new Dictionary<int, int>();
for (var i = 0; i < 20000; i++)
{
var hashCode = stocks[i].GetHashCode();
if (hashCodeStats.ContainsKey(hashCode))
hashCodeStats[hashCode]++;
else
hashCodeStats.Add(hashCode, 1);
}
Console.WriteLine("共有{0}種HashCode", hashCodeStats.Count);
}
測試結果:
StkNo = rnd.Next(1000, 1011).ToString() 兩萬筆資料只有11種HashCode
StkNo = rnd.Next(1000, 1100).ToString() 兩萬筆資料只有100種HashCode
StkNo = rnd.Next().ToString() 兩萬筆資料有兩萬種HashCode
雖然兩萬筆資料有唯一的Idx屬性,還配上不同TypeCode,但GetHashCode()似乎只跟StkNo有關,使用以下程式進一步證實這點:
static void Test3()
{
var idx1 = new Index()
{
StkNo = "AAAA",
Idx = 1,
TypeCode = 2
};
var idx2 = new Index()
{
StkNo = "AAAA",
Idx = 100,
TypeCode = 200
};
var idx3 = new Index()
{
StkNo = "AAAB",
Idx = 1,
TypeCode = 2
};
Console.WriteLine("{0} vs {1} vs {2}",
idx1.GetHashCode(),
idx2.GetHashCode(),
idx3.GetHashCode());
}
結果為80968112 vs 80968112 vs -651627219,StkNo相同GetHashCode()就相同。
由以上分析,想改善新増效能有兩個解決方向:1) 改用唯一值Idx當鍵值 2) 解決Index.GetHashCode()大量重複的狀況
第一種做法可將Dictionary<Index, int>改為Dictionary<int, int>:
static void Test1A()
{
var dictStruct = new Dictionary<int, int>(CAPACITY);
Action<int> test = (count) =>
{
Stopwatch sw = new Stopwatch();
dictStruct.Clear();
sw.Start();
for (var i = 0; i < count; i++)
{
dictStruct.Add(stocks[i].Idx, i);
}
sw.Stop();
Console.WriteLine("{0:n0} items: {1:n0}ms {2:n0}ticks",
dictStruct.Count,
sw.ElapsedMilliseconds,
sw.ElapsedTicks);
};
for (var i = 0; i < 5; i++)
{
test(10000);
test(CAPACITY);
}
}
修改後,一萬筆及兩萬筆的執行時間都變成0ms,速度快到測不出來,需改比ElapsedTicks。程式跑五次,排除第一次暖機期,兩萬筆的執行時間約為一萬筆的兩倍。
10,000 items: 1ms 3,471ticks
20,000 items: 0ms 2,248ticks
10,000 items: 0ms 545ticks
20,000 items: 0ms 1,127ticks
10,000 items: 0ms 561ticks
20,000 items: 0ms 1,140ticks
10,000 items: 0ms 540ticks
20,000 items: 0ms 1,082ticks
10,000 items: 0ms 573ticks
20,000 items: 0ms 1,113ticks
但修改成Dictionary<int, int>有後遺症,取得鍵值數字後需另外查表才態拿到Index型別,程式需配合改寫。還有第二種做法,我們override Index的GetHashCode()傳回更具唯一性的雜湊值(流水號整數Idx是很好的選擇),如下:
public struct Index
{
public string StkNo;
public byte TypeCode;
public int Idx;
public override int GetHashCode()
{
return Idx;
}
}
static void Test1()
{
Action<int> test = (count) =>
{
var dictStruct = new Dictionary<Index, int>(CAPACITY);
Stopwatch sw = new Stopwatch();
dictStruct.Clear();
sw.Start();
for (var i = 0; i < count; i++)
{
dictStruct.Add(stocks[i], i);
}
sw.Stop();
Console.WriteLine("{0:n0} items: {1:n0}ms {2:n0}ticks",
dictStruct.Count,
sw.ElapsedMilliseconds,
sw.ElapsedTicks);
};
for (var i = 0; i < 5; i++)
{
test(10000);
test(CAPACITY);
}
}
測試結果如下,自訂GetHashCode()後,ticks數為Dictionary<int, int>的兩倍,但執行時間已加速到小於1ms,兩萬筆耗時約為一萬筆的2.5倍:
10,000 items: 3ms 11,730ticks
20,000 items: 0ms 2,292ticks
10,000 items: 0ms 1,471ticks
20,000 items: 0ms 2,035ticks
10,000 items: 0ms 1,089ticks
20,000 items: 0ms 2,749ticks
10,000 items: 0ms 1,064ticks
20,000 items: 0ms 2,247ticks
10,000 items: 0ms 1,078ticks
20,000 items: 0ms 2,677ticks
【結論】使用Struct作為Dictionary TKey,若GetHashCode()值重複性過高,將嚴重影響Add()效能,應考慮自訂GetHashCode()增加差異性加以改善,或改用string或int等單純型別以達最佳效能。
Comments
# by 丫土伯
感謝分享!!
# by hkk
感謝分享~~~