欄位合併效能比較:CTE vs FOR XML
7 |
前一篇文章介紹用CTE實現SQL一對多關聯欄位合併的方法,找資料期間曾發現另一個替代做法,利用SQL Server的FOR XML PATH('')技巧,可將多筆資料轉成單一字串(參考),看來也相當簡潔。但當下覺得子查詢加FOR XML轉換的做法有效能疑慮,理應不如CTE(事實不然!),加上搞CTE比較有挑戰性,故選了CTE研究嚐鮮。
網友ChoeChin留言提到FOR XML寫法,讓我重新思考:如果要大量重度應用,效能議題不可忽視,CTE效能是否真的比FOR XML方法好?有待實驗證實。經一番測試後,結論是我錯估了CTE與FOR XML兩種做法的效能表現。
原本的測試樣本太小,很難看出效能差異。我改用以下指令生出10萬筆糸統,並將AppSystem的Nam欄位與AppSupport的AppName, Engineer欄位設成Primary Key確保查詢效能。
DECLARE @I INT, @NO VARCHAR(6);
SET @I = 0;
TRUNCATE TABLE AppSystem;
TRUNCATE TABLE AppSupport;
WHILE @I < 25000
BEGIN
SET @NO = CONVERT(VARCHAR(6), @I);
INSERT INTO AppSystem VALUES ('HR-' + @NO, 'Mars-' + @NO);
INSERT INTO AppSystem VALUES ('ERP-' + @NO, 'Jupiter-' + @NO);
INSERT INTO AppSystem VALUES ('POS-' + @NO, 'Venus-' + @NO);
INSERT INTO AppSystem VALUES ('MAIL-' + @NO, 'Apolo-' + @NO);
INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Jeffrey-' + @NO);
INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Jeffrey-' + @NO);
INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Jeffrey-' + @NO);
INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Darkthread-' + @NO);
INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Darkthread-' + @NO);
INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Alice-' + @NO);
INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Bob-' + @NO);
INSERT INTO AppSupport VALUES ('MAIL-' + @NO, 'Jeffrey-' + @NO);
SET @I = @I + 1;
END
我設計了四種查詢方式:
- CTE寫法的最終改良版,採用路人乙建議的ROW_NUMBER()法挑出最終合併結果
- 我最早的CTE寫法,用JOIN找出最終合併結果
- 標準FOR XML PATH做法
- 將重複出現的FOR XML PATH子查詢轉成CTE,看效能是否因此提升
;WITH SupportCTE (RowNum, Name, CodeName, Engineer,Engineers) AS
(
SELECT 1, A.Name, A.CodeName, MIN(B.Engineer),
CAST(MIN(B.Engineer) AS VARCHAR(MAX)) AS Engineers
FROM AppSystem A JOIN AppSupport B ON A.Name = B.AppName
GROUP BY A.Name, A.CodeName
UNION ALL
SELECT A.RowNum + 1, A.Name, A.CodeName, B.Engineer,
CAST(A.Engineers + ', ' + B.Engineer AS VARCHAR(MAX)) As Supports
FROM SupportCTE A JOIN AppSupport B
ON A.Name = B.AppName AND B.Engineer > A.Engineer
)
--Trail 1
SELECT Name,CodeName,Engineers
FROM (
SELECT *,ROW_NUMBER() OVER (PARTITION BY Name ORDER BY RowNum Desc) AS Pos
FROM SupportCTE
) T
WHERE Pos = 1
ORDER BY Name
--Trial 2
--SELECT A.Name, A.CodeName, A.Engineers FROM SupportCTE A
--JOIN (SELECT Name, Max(RowNum) AS MaxRowNum FROM SupportCTE GROUP BY Name) B
--ON A.Name = B.Name AND A.RowNum = B.MaxRowNum
--ORDER BY A.Name
GO
--Trail 3
SELECT DISTINCT d.Name ,d.CodeName,
(
SELECT c.Engineer +','FROM
(
SELECT a.Name, a.CodeName, b.Engineer
FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) c
WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM
(
SELECT a.Name, a.CodeName, b.Engineer
FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) d
ORDER BY Name
GO
--Trail 4
;WITH SupportCTE (Name, CodeName, Engineer) AS
(
SELECT a.Name, a.CodeName, b.Engineer
FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
)
SELECT DISTINCT d.Name ,d.CodeName,
(
SELECT c.Engineer + ',' FROM SupportCTE c
WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM SupportCTE d
ORDER BY Name
執行結果如下。以Total execution time欄位做為比較標的,CTE+ROW_NUMBER()法 2.96秒、CTE+JOIN找結果法7.14秒,而FOR XML PATH法只花了0.59秒,FOR XML PATH+CTE法0.64秒(沒有變快)。
CTE慢了近5倍,顛覆我原本的想像…
由執行計劃推敲效能差異來源。FOR XML PATH法的執行計劃很單純,絕大部分的時間用在JOIN查詢。
CTE做法的執行計劃相對複雜許多,而高達84%消耗在Sort上!由Sort出現位置推測約略發生在用ROW_NUMBER()找最終合併結果時,猜想問題出在CTE結果暫存於記憶體,無法仰賴Index等機制加速,資料量一變大即成瓶頸。
若要以此推論CTE效能不好也有失公允,這次合併欄位的CTE動用了MIN、GROUP BY、ROW_NUMBER(),相對FOR XML PATH笨重複雜許多,而CTE的遞迴邏輯有其強大難以被取代的場合,隨便換個例子就不是FOR XML PATH可以搞定的。只能說,在這個案例用CTE加遞迴有殺雞用牛刀之嫌。如果只是在SQL要將一對多欄位合併成單一字串,FOR XML PATH方法有較好的效能表現。
最後補充一點,前述的FOR XML PATH寫法有個小瑕疵,Engineer欄位結尾會多一個逗號。
這個小問題在C#端可以用TrimEnd(',')輕鬆搞定,若要在T-SQL做掉稍微囉嗦一點,但也不難,例如以下範例。(感謝網友艾里克斯補充STUFF解法)
SELECT Name, CodeName,
CASE WHEN LEN(Engineer) > 1 THEN SUBSTRING(Engineer, 1, LEN(Engineer)-1) ELSE Engineer END AS Engineer
FROM (
SELECT DISTINCT d.Name ,d.CodeName,
STUFF(
(
SELECT ',' + c.Engineer FROM
(
SELECT a.Name, a.CodeName, b.Engineer
FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) c
WHERE c.Name = d.Name FOR XML PATH('')
), 1, 1, ''
) AS Engineer
FROM
(
SELECT a.Name, a.CodeName, b.Engineer
FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) d
) X
ORDER BY Name
Comments
# by steve
事實上前一篇文章出來後小弟馬上就參考著在Oracle上用同樣的方式解決了一個先前用start with connect by解決不了的case. 小弟的淺見是, start with connect by(oracle)以及for xml path(ms-sql) 適用於where條件能篩選出所有資料的狀況,篩選出來後再根據條件串接成需要的階層,這兩個指令很方便 但如果碰到沒有適當的where條件,篩選出來資料仍太多,要靠遞迴一層一層去串出真正需要的階層,就非得要用CTE了
# by Oaww
https://www.mssqltips.com/sqlservertip/2022/concat-aggregates-sql-server-clr-function/ 其實我覺得最方便是用CLR的function(死)
# by 艾里克斯
STUFF( SELECT ','+c.Engineer FROM ( SELECT a.Name, a.CodeName, b.Engineer FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName ) c WHERE c.Name = d.Name FOR XML PATH('') ),1,1,'') AS Engineer 就可以 TRIM 掉多餘的逗號囉 :)
# by Jeffrey
to Oaww,原本很愛SQLCLR,體驗過一回上線流程後就默默把它從選項畫掉了,呵。 to 艾里克斯,好簡潔,讚!已加入本文,謝謝回饋。
# by 艾里克斯
Hi, 黑大, STUFF 是用來抽換指定字元, 所以加做逗號相加時逗號要放在前面 ==> ','+c.Engineer, 後面設定的 1,1,'') 才能換掉第一個逗號 :)
# by Jeffrey
to 艾里克斯,啊,實測成功抄寫過去時漏改了,謝謝提醒。
# by Hardy
我的現況是FOR XML PATH跟CTE都很慢 現在還在思考要如何是好