Quick Grouping and Classification with LINQ GroupBy
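Sharing a small LINQ trick I picked up recently. Sometimes we need to arrange data objects into groups to make later lookups and processing easier, for example: grouping scattered sales records by customer so that all records for the same customer end up in one List<T>.
My habitual approach to this problem was to define a Dictionary<string, List<T>> first, walk the source data with foreach, and extract the key from each item (for example, the customer ID). I would check whether the key already existed in the Dictionary and, if not, add an entry with an empty List<T> so that the key was guaranteed to have its own List<T>, then put the item into that List<T>. When the loop finishes, you have the data sorted into one List<T> per key, ready for further processing.
I had been using the foreach + Dictionary approach for years before it suddenly hit me a few days ago: isn't this just GROUP BY in SQL? And since LINQ has ToDictionary, GroupBy(o => o.客戶編號).ToDictionary(o => o.Key, o => o.ToList()), where 客戶編號 is the customer ID property, gets it done in a single line! Silly me.
Here's a topical code sample!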
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace LinqTip
{
    class Program
    {
        public enum Teams
        {
            Valor, Mystic, Instinct, Dark
        }

        public class Trainer
        {
            public Teams Team;
            public string Name;
            public Trainer(Teams team, string name)
            {
                Team = team; Name = name;
            }
        }

        static void Main(string[] args)
        {
            // Source data
            List<Trainer> trainers = new List<Trainer>()
            {
                new Trainer(Teams.Valor, "Candela"),
                new Trainer(Teams.Valor, "Bob"),
                new Trainer(Teams.Mystic, "Blanche"),
                new Trainer(Teams.Valor, "Alice"),
                new Trainer(Teams.Instinct, "Spark"),
                new Trainer(Teams.Mystic, "Tom"),
                new Trainer(Teams.Dark, "Jeffrey")
            };

            // Goal: group by Team, collecting trainers on the same team into a List<Trainer>,
            // and end up with a Dictionary<Teams, List<Trainer>>

            // Old approach: loop over the data and check each key by hand
            var res1 = new Dictionary<Teams, List<Trainer>>();
            foreach (var t in trainers)
            {
                if (!res1.ContainsKey(t.Team))
                    res1.Add(t.Team, new List<Trainer>());
                res1[t.Team].Add(t);
            }

            // New approach: LINQ GroupBy
            var res2 =
                trainers.GroupBy(o => o.Team)
                    .ToDictionary(o => o.Key, o => o.ToList());
        }
    }
}
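And that's it, another trick learned~
That said, GroupBy().ToDictionary() is suited to classifying data you already have. If new items will keep arriving afterward, you can still fall back on the foreach + Dictionary<string, List<T>> approach.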
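For the incremental case, one possible shape is a small helper that appends each newly arrived item to its group and creates the list the first time a key shows up. This is only a sketch: GroupStore and AddToGroup are hypothetical names introduced for illustration, and it uses TryGetValue instead of the ContainsKey check shown earlier so that each insert does a single lookup when the key already exists.

```csharp
using System;
using System.Collections.Generic;

public static class GroupStore
{
    // Hypothetical helper: appends item to the list stored under key,
    // creating that list the first time the key is seen.
    public static void AddToGroup<TKey, TItem>(
        Dictionary<TKey, List<TItem>> groups, TKey key, TItem item)
    {
        List<TItem> list;
        if (!groups.TryGetValue(key, out list))
        {
            list = new List<TItem>();
            groups.Add(key, list);
        }
        list.Add(item);
    }

    public static void Main()
    {
        var groups = new Dictionary<string, List<string>>();

        // Items can keep arriving one at a time, long after the first pass.
        AddToGroup(groups, "Valor", "Candela");
        AddToGroup(groups, "Mystic", "Blanche");
        AddToGroup(groups, "Valor", "Bob");

        foreach (var pair in groups)
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
    }
}
```

A dictionary built once with GroupBy().ToDictionary() has the same shape, so the same helper could be used to keep feeding it afterward.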
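[Update 2016-08-24] Thanks to Phoenix for pointing out that LINQ has an even more concise way: ToLookup(o => o.Team, o => o). The result type is ILookup, a collection of values grouped by key. The biggest difference from Dictionary is that ILookup is read-only: once it is built, its contents cannot be changed.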
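To make the ILookup behavior concrete, here is a minimal sketch that reuses a trimmed-down version of the trainer data. LookupDemo and BuildLookup are hypothetical names added for illustration. Besides being read-only, a lookup also differs from a Dictionary in that indexing a key with no entries returns an empty sequence rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    public enum Teams { Valor, Mystic, Instinct, Dark }

    public class Trainer
    {
        public Teams Team;
        public string Name;
        public Trainer(Teams team, string name) { Team = team; Name = name; }
    }

    // Hypothetical helper: groups trainers by team into a read-only ILookup.
    public static ILookup<Teams, Trainer> BuildLookup(IEnumerable<Trainer> trainers)
    {
        return trainers.ToLookup(o => o.Team, o => o);
    }

    public static void Main()
    {
        var trainers = new List<Trainer>
        {
            new Trainer(Teams.Valor, "Candela"),
            new Trainer(Teams.Mystic, "Blanche"),
            new Trainer(Teams.Valor, "Bob")
        };
        var lookup = BuildLookup(trainers);

        // Indexing by key yields that team's trainers in source order.
        Console.WriteLine(string.Join(", ", lookup[Teams.Valor].Select(t => t.Name)));
        // prints: Candela, Bob

        // A key with no entries yields an empty sequence, not an exception.
        Console.WriteLine(lookup[Teams.Dark].Count());
        // prints: 0

        // ILookup exposes no Add or Remove; to change the grouping, rebuild it.
    }
}
```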
Comments
# by Phoenix
Another way to write it: var res3 = trainers.ToLookup(o => o.Team, o => o);
# by 小安
Is OrderBy().ToDictionary() in the second-to-last line a typo? Was it meant to be GroupBy().ToDictionary()?
# by Jeffrey
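to Phoenix, learned something! Thanks for the addition, I've added it to the post. to 小安, yes, I got it wrong again (probably not fully awake this morning, Orz). Thanks for the correction.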
# by Holey
Is the start of the third paragraph a typo? (foreah -> foreach)
# by Jeffrey
to Holey, yep, thanks for the correction.
# by Jack
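If you want custom grouping, you can use something like string[] StringSet={ set of strings…… }; var query1=StringSet.GroupBy(s=>s,new StringComparer()); But what if I don't want to use a lambda expression and want a query expression instead? var query2=from S in StringSet group S by new StringComparer(); The result of this looks strange. How should I change it?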
# by Jeffrey
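to Jack, the more common use of GroupBy is where each record has several fields and you group the records by one of them. I don't quite understand the scenario of running GroupBy on a plain string array. Could you give a more concrete example?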
# by Jack
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Test_String_Group_
{
    class StringComparer:IEqualityComparer<string>
    {
        public bool Equals(string x,string y)
        {
            return GetHashCode(x) == GetHashCode(y);
        }
        public int GetHashCode(string String)
        {
            return String.Length;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string[] S = { "A12", "A123", "B12", "C12", "B123", "A1234", "B1234", "C123", "C1234" };
            var query = from s in S group s by new StringComparer();
            /*var query = S.GroupBy(s => s, new StringComparer());*/
            foreach(var group in query)
            {
                Console.WriteLine(group.Key + " : ");
                foreach(var item in group)
                {
                    Console.WriteLine(item);
                }
            }
            Console.Read();
        }
    }
}
The lambda (method syntax) version gives the correct result, but the query-expression version gives the wrong one.
# by Jeffrey
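to Jack, as far as I know, GroupBy() accepts a custom IEqualityComparer, whereas what follows group by should be the key value to compare on, not an object carrying the comparison logic. That is why the result differs from what you expected. If it has to be written in SQL-like syntax, the closest solution I can think of is var query = from s in S group s by s.Length.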
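A minimal sketch of that suggestion, using the sample strings from the question. GroupByLengthDemo and GroupByLength are names made up for this illustration. Because the comparer in the question treats strings of equal length as equal, using the length itself as the key should produce the same partition, though the group key becomes an int rather than a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByLengthDemo
{
    // Hypothetical helper: groups strings by their length using query syntax.
    public static List<IGrouping<int, string>> GroupByLength(IEnumerable<string> source)
    {
        var query = from s in source
                    group s by s.Length;
        return query.ToList();
    }

    public static void Main()
    {
        string[] S = { "A12", "A123", "B12", "C12", "B123", "A1234", "B1234", "C123", "C1234" };

        foreach (var g in GroupByLength(S))
        {
            Console.WriteLine(g.Key + " : " + string.Join(", ", g));
        }
        // Prints:
        // 3 : A12, B12, C12
        // 4 : A123, B123, C123
        // 5 : A1234, B1234, C1234
    }
}
```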
# by Jack
If I must use IEqualityComparer for custom grouping and also use SQL-like syntax, how should it be written?
# by Jeffrey
to Jack, my personal view is that there is no solution. Even if there is one, its complexity and cost would probably be off-putting.
# by 喵大王
Excuse me, I have a question. My LINQPad code is below. Why can't the final select g be changed to g.ToDictionary(k => k.Key, v => v.ToList())?
void Main()
{
    AuctionItemSuscriber.ItemPrice ip1 = new AuctionItemSuscriber.ItemPrice() { ItemName = "Book1", Price = 1170 };
    AuctionItemSuscriber.ItemPrice ip2 = new AuctionItemSuscriber.ItemPrice() { ItemName = "Book2", Price = 3960 };
    AuctionItemSuscriber.ItemPrice ip3 = new AuctionItemSuscriber.ItemPrice() { ItemName = "Phone3", Price = 5000 };
    AuctionItemSuscriber.ItemPrice ip4 = new AuctionItemSuscriber.ItemPrice() { ItemName = "Book2", Price = 3333 };
    AuctionItemSuscriber user1 = new AuctionItemSuscriber() { User = "user1", SuscribeItem = { ip1, ip2, ip3 } };
    AuctionItemSuscriber user2 = new AuctionItemSuscriber() { User = "user2", SuscribeItem = { ip4 } };
    List<AuctionItemSuscriber> allUsers = new List<AuctionItemSuscriber>();
    allUsers.Add (user1);
    allUsers.Add (user2);
    var selectQ = from u in allUsers
                  from i in u.SuscribeItem
                  group new { u.User, i.ItemName, i.Price } by i.ItemName into g
                  select g;
    // Why can't "select g" here be changed to g.ToDictionary (q => q.Key, q=> q.ToList())?
    // As a result, the conversion has to be done separately in the loop below.
    foreach (var s in selectQ.ToDictionary (q => q.Key, q=> q.ToList()))
    {
        Console.WriteLine (s);
    }
}

public class AuctionItemSuscriber
{
    public class ItemPrice
    {
        public string ItemName;
        public int Price;
        public override string ToString()
        {
            return $"{ItemName} ${Price.ToString ("N0")}";
        }
    }
    public string User;
    public List<ItemPrice> SuscribeItem = new List<ItemPrice> (256);
}
# by Jeffrey
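to 喵大王, ToDictionary() is an extension method on IEnumerable<T>. Using Console.WriteLine(selectQ.First().GetType()); I checked the type of g, which is System.Linq.Lookup`2+Grouping[System.String,<>f__AnonymousType1`3[System.String,System.String,System.Int32]], an IGrouping<TKey, TElement> (see: https://docs.microsoft.com/zh-tw/dotnet/api/system.linq.igrouping-2?view=netframework-4.8 ). In other words, g is a single group rather than a sequence of groups, and its elements have no Key property, so ToDictionary(q => q.Key, q => q.ToList()) does not apply to it.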
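A minimal sketch of where the conversion can go instead, under some simplifying assumptions: the flat Bid class stands in for the anonymous type in the question, and GroupToDictionaryDemo and ByItem are hypothetical names. The point it tries to show is that ToDictionary with a Key selector is called on the query result, which is the sequence of groups, rather than on an individual g inside the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupToDictionaryDemo
{
    // Hypothetical flat record standing in for the anonymous type in the question.
    public class Bid
    {
        public string User;
        public string ItemName;
        public int Price;
    }

    // Hypothetical helper: groups bids by item name, then converts the
    // sequence of groups (not a single group) into a dictionary.
    public static Dictionary<string, List<Bid>> ByItem(IEnumerable<Bid> bids)
    {
        var groups = from b in bids
                     group b by b.ItemName into g
                     select g;

        // Each element here is an IGrouping, which does have a Key.
        return groups.ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var bids = new List<Bid>
        {
            new Bid { User = "user1", ItemName = "Book1", Price = 1170 },
            new Bid { User = "user1", ItemName = "Book2", Price = 3960 },
            new Bid { User = "user1", ItemName = "Phone3", Price = 5000 },
            new Bid { User = "user2", ItemName = "Book2", Price = 3333 }
        };

        var byItem = ByItem(bids);
        Console.WriteLine(byItem["Book2"].Count);
        // prints: 2
    }
}
```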
# by 喵大王
I see. Thank you, Jeffrey, for the generous reply. Much appreciated.