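A custom object type in C# is a reference type, so == and Equals() return true only when both variables point to the same object (if the difference between reference types and value types feels fuzzy, there is a little quiz you can torture yourself with to clear it up). This is often not what we expect. Take a stock ticker object as an example: suppose a Ticker object splits a stock code into Symbol (e.g. 2330) and Market (e.g. TW), and also exposes FullSymbol, which returns 2330.TW: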

public class Ticker
{
    public string Symbol { get; set; }
    public string Market { get; set; }
 
    public Ticker(string symbol, string market)
    {
        Symbol = symbol;
        Market = market;
    }
 
    public Ticker(string fullsymbol)
    {
        var p = fullsymbol.Split('.');
        if (p.Length != 2) throw new ArgumentException();
        Symbol = p[0];
        Market = p[1];
    }
 
    public string FullSymbol
    {
        get
        {
            return Symbol + "." + Market;
        }
    }
}

In the test program, t1 and t2 both hold 2330.TW, while t3 points to t1. We compare them with Equals() and ==:

    static void Main(string[] args)
    {
        var t1 = new Ticker("2330", "TW");
        var t2 = new Ticker("2330.TW");
        var t3 = t1;
 
        Console.WriteLine("Equals Test: {0}", t1.Equals(t2));
        Console.WriteLine("== Test: {0}", t1 == t2);
        Console.WriteLine("== Test(Same Object): {0}", t1 == t3);
        Console.Read();
    }

As a result, t1.Equals(t2) and t1 == t2 both return false; only t1 == t3 returns true:

Equals Test: False
== Test: False
== Test(Same Object): True

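Following the MSDN guidance, we can override Equals() and overload the == and != operators to define our own comparison rule for Ticker, treating two instances as equal when both Symbol and Market match: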

public class Ticker
{
    public string Symbol { get; set; }
    public string Market { get; set; }
 
    public Ticker(string symbol, string market)
    {
        Symbol = symbol;
        Market = market;
    }
 
    public Ticker(string fullsymbol)
    {
        var p = fullsymbol.Split('.');
        if (p.Length != 2) throw new ArgumentException();
        Symbol = p[0];
        Market = p[1];
    }
 
    public string FullSymbol
    {
        get
        {
            return Symbol + "." + Market;
        }
    }
 
    //REF: https://msdn.microsoft.com/en-us/library/ms173147(v=vs.90).aspx
    public override bool Equals(System.Object obj)
    {
        // If parameter is null return false.
        if (obj == null) return false;
        // If parameter cannot be cast to Ticker return false.
        Ticker p = obj as Ticker;
        if ((System.Object)p == null) return false;
        // Return true if the fields match:
        return FullSymbol == p.FullSymbol;
    }
 
    public bool Equals(Ticker p)
    {
        // If parameter is null return false:
        if ((object)p == null) return false;
 
 
        // Return true if the fields match:
        return FullSymbol == p.FullSymbol;
    }
 
    public override int GetHashCode()
    {
        return FullSymbol.GetHashCode();
    }
 
    public static bool operator ==(Ticker a, Ticker b)
    {
        // If both are null, or both are same instance, return true.
        if (System.Object.ReferenceEquals(a, b)) return true;
        // If one is null, but not both, return false.
        if (((object)a == null) || ((object)b == null)) return false;
        // Return true if the fields match:
        return a.FullSymbol == b.FullSymbol;
    }
 
    public static bool operator !=(Ticker a, Ticker b)
    {
        return !(a == b);
    }
}

Testing again, the results of Equals() and == now depend on whether Symbol and Market match, which is what we want.

    static void Main(string[] args)
    {
        var t1 = new Ticker("2330", "TW");
        var t2 = new Ticker("2330.TW");
        var t3 = new Ticker("1234", "TW");
 
        Console.WriteLine("Equals Test: {0}", t1.Equals(t2));
        Console.WriteLine("== Test: {0}", t1 == t2);
        Console.WriteLine("!Equals Test: {0}", !t1.Equals(t3));
        Console.WriteLine("!= Test: {0}", t1 != t3);
        Console.Read();
    }

Test results:

Equals Test: True
== Test: True
!Equals Test: True
!= Test: True

All done? Not so fast! The example above hides a bug.

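A colleague forwarded a ReSharper warning: Non-readonly fields referenced in GetHashCode(). The inputs to GetHashCode must be guaranteed not to change, and using readonly fields is the most direct and effective way to ensure that. Only then did I notice that in the MSDN TwoDPoint example, x and y are readonly, meaning they can only be assigned during construction and cannot be changed afterwards. My original version used FullSymbol.GetHashCode(), so once Symbol or Market changes, GetHashCode() returns a different result.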

Eric Lippert has an article on what you need to know about GetHashCode; the relevant points are summarized below:

Rule: equal items have equal hash codes

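If two objects are equal, their hash codes must be equal; conversely, if two objects have different hash codes, Equals() must return false for them.
Logically, however, the converse does not hold: two objects with the same hash code are not necessarily equal. (A hash code has only about 4 billion possible values, so different objects can end up with the same hash code.)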

Guideline: the integer returned by GetHashCode should never change

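Ideally, GetHashCode is computed from fields that never change, so its value stays the same for the whole lifetime of the object. That is only the ideal, though. The actual rule is: at the very least, while some other data structure (note: for example Dictionary<TKey, TValue> or Hashtable) depends on the object's hash code, the value returned by GetHashCode() must not change.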

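Imagine an object sitting in a hash-based data structure while its GetHashCode() result changes: Contains() lookups obviously break. The object goes into slot #5 according to its hash code; the object is then modified and its hash code becomes 47; Contains() now looks in slot #47 and finds nothing.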
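The scenario above can be reproduced with a small sketch. MutableTicker and MutableKeyDemo are hypothetical names made up for this illustration (they are not part of the original Ticker class); the sketch assumes a key whose hash code is computed from settable properties, just like the buggy version earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable key, for illustration only.
public class MutableTicker
{
    public string Symbol { get; set; }
    public string Market { get; set; }

    public MutableTicker(string symbol, string market)
    {
        Symbol = symbol;
        Market = market;
    }

    public override bool Equals(object obj)
    {
        var other = obj as MutableTicker;
        return other != null && Symbol == other.Symbol && Market == other.Market;
    }

    // Hash code derived from settable properties: this is the problem.
    public override int GetHashCode()
    {
        return (Symbol + "." + Market).GetHashCode();
    }
}

public static class MutableKeyDemo
{
    // Adds a key to a HashSet, mutates the key, then looks it up again.
    public static void Run(out bool foundBefore, out bool foundAfter, out int count)
    {
        var t = new MutableTicker("2330", "TW");
        var set = new HashSet<MutableTicker>();
        set.Add(t);

        foundBefore = set.Contains(t); // looked up with the hash code it was stored under

        t.Symbol = "1234";             // the hash code of t changes here

        foundAfter = set.Contains(t);  // looked up with the new hash code
        count = set.Count;             // the object is still physically in the set
    }
}
```

Calling MutableKeyDemo.Run should report foundBefore as true and count as 1, while foundAfter is false unless the old and new strings happen to share a hash code: the very same instance is still stored in the set, yet the set can no longer find it.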

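On top of that, many LINQ operations also rely on GetHashCode(). Let it change at will and the resulting ghost bugs will have you going in circles until you consider a career change.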
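To make the LINQ dependency concrete, here is a minimal sketch using Distinct() and GroupBy(), both of which bucket items by hash code before calling Equals(). ImmutableTicker and LinqHashDemo are hypothetical names used only for this illustration; the class is a cut-down, immutable stand-in for Ticker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cut-down ticker with value equality, for illustration only.
public sealed class ImmutableTicker
{
    private readonly string symbol;
    private readonly string market;

    public ImmutableTicker(string symbol, string market)
    {
        this.symbol = symbol;
        this.market = market;
    }

    public override bool Equals(object obj)
    {
        var other = obj as ImmutableTicker;
        return other != null && symbol == other.symbol && market == other.market;
    }

    public override int GetHashCode()
    {
        return symbol.GetHashCode() ^ market.GetHashCode();
    }
}

public static class LinqHashDemo
{
    private static ImmutableTicker[] Sample()
    {
        return new[]
        {
            new ImmutableTicker("2330", "TW"),
            new ImmutableTicker("2330", "TW"), // equal to the first, different instance
            new ImmutableTicker("1234", "TW")
        };
    }

    // Distinct() uses GetHashCode() and Equals() to collapse equal items.
    public static int CountDistinct()
    {
        return Sample().Distinct().Count();
    }

    // GroupBy() uses the same pair to decide which group a key belongs to.
    public static int CountGroups()
    {
        return Sample().GroupBy(t => t).Count();
    }
}
```

Both methods return 2 here because the two 2330.TW instances hash and compare as equal. If GetHashCode() could change while such a query was holding the items, they could be sorted into the wrong groups.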

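Time to mend my ways and rewrite the code: make the Symbol and Market properties read-only, declare readonly fields symbol and market that are assigned in the constructors, and have GetHashCode read from those two readonly fields. Only then is there no risk of Symbol/Market being modified later and changing the GetHashCode() result: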

public class Ticker
{
    readonly string symbol;
    readonly string market;
 
    public string Symbol { get { return symbol; } }
    public string Market { get { return market; } }
 
    public Ticker(string symbol, string market)
    {
        this.symbol = symbol;
        this.market = market;
    }
 
    public Ticker(string fullsymbol)
    {
        var p = fullsymbol.Split('.');
        if (p.Length != 2) throw new ArgumentException();
        this.symbol = p[0];
        this.market = p[1];
    }
 
    //...rest omitted...
 
    public override int GetHashCode()
    {
        return symbol.GetHashCode() ^ market.GetHashCode();
    }
 
}

When you write your own GetHashCode(), please keep this principle in mind.
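One side note on combining hash codes with ^ as in the final GetHashCode(): XOR is commutative, so swapping the two values gives the same hash code, and two identical values always give 0. That is still correct (equal hash codes do not have to mean equal objects), but it can produce more collisions than necessary. The sketch below compares it with an order-sensitive combination; HashCombineDemo, XorHash and OrderedHash are hypothetical names, and the constants 17 and 31 are just a common convention, not something from the original article.

```csharp
using System;

public static class HashCombineDemo
{
    // XOR combination, the same formula as the final GetHashCode() above.
    // Throws NullReferenceException if either argument is null.
    public static int XorHash(string symbol, string market)
    {
        return symbol.GetHashCode() ^ market.GetHashCode();
    }

    // Order-sensitive alternative: multiply-and-add, treating null as 0.
    public static int OrderedHash(string symbol, string market)
    {
        unchecked // let the arithmetic overflow wrap around silently
        {
            int hash = 17;
            hash = hash * 31 + (symbol == null ? 0 : symbol.GetHashCode());
            hash = hash * 31 + (market == null ? 0 : market.GetHashCode());
            return hash;
        }
    }
}
```

XorHash("2330", "TW") and XorHash("TW", "2330") are always equal, and XorHash("TW", "TW") is always 0, whereas OrderedHash usually tells swapped arguments apart. Either way the inputs stay readonly, which is the point of this article.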


Comments

# by 路人甲

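Thanks for sharing, 黑大. However, if the object needs to be serialized/deserialized, it must provide a parameterless constructor, and in that case the object's properties can no longer be given values (in this example, symbol and market would always be null).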

# by Jeffrey

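to 路人甲: agreed. If the serialization/deserialization mechanism in use requires a default (parameterless) constructor, this approach will not work. In that situation I would consider switching to a more flexible serialization library, for example Json.NET: http://blog.darkthread.net/post-2016-08-10-json-net-constructor-issue.aspx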
