前一篇文章介紹用CTE實現SQL一對多關聯欄位合併的方法,找資料期間曾發現另一個替代做法,利用SQL Server的FOR XML PATH('')技巧,可將多筆資料轉成單一字串(參考),看來也相當簡潔。但當下覺得子查詢加FOR XML轉換的做法有效能疑慮,理應不如CTE(事實不然!),加上搞CTE比較有挑戰性,故選了CTE研究嚐鮮。

網友ChoeChin留言提到FOR XML寫法,讓我重新思考:如果要大量重度應用,效能議題不可忽視,CTE效能是否真的比FOR XML方法好?有待實驗證實。經一番測試後,結論是我錯估了CTE與FOR XML兩種做法的效能表現。

原本的測試樣本太小,很難看出效能差異。我改用以下指令生出10萬筆糸統,並將AppSystem的Nam欄位與AppSupport的AppName, Engineer欄位設成Primary Key確保查詢效能。

DECLARE @I INT, @NO VARCHAR(6);
SET @I = 0;
TRUNCATE TABLE AppSystem; 
TRUNCATE TABLE AppSupport;
WHILE @I < 25000
BEGIN
    SET @NO = CONVERT(VARCHAR(6), @I);
    INSERT INTO AppSystem VALUES ('HR-' + @NO, 'Mars-' + @NO);
    INSERT INTO AppSystem VALUES ('ERP-' + @NO, 'Jupiter-' + @NO);
    INSERT INTO AppSystem VALUES ('POS-' + @NO, 'Venus-' + @NO);
    INSERT INTO AppSystem VALUES ('MAIL-' + @NO, 'Apolo-' + @NO);
    INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Jeffrey-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Darkthread-' + @NO);
    INSERT INTO AppSupport VALUES ('POS-' + @NO, 'Darkthread-' + @NO);
    INSERT INTO AppSupport VALUES ('HR-' + @NO, 'Alice-' + @NO);
    INSERT INTO AppSupport VALUES ('ERP-' + @NO, 'Bob-' + @NO);
    INSERT INTO AppSupport VALUES ('MAIL-' + @NO, 'Jeffrey-' + @NO);
    SET @I = @I + 1;
END

我設計了四種查詢方式:

  1. CTE寫法的最終改良版,採用路人乙建議的ROW_NUMBER()法挑出最終合併結果
  2. 我最早的CTE寫法,用JOIN找出最終合併結果
  3. 標準FOR XML PATH做法
  4. 將重複出現的FOR XML PATH子查詢轉成CTE,看效能是否因此提升
;WITH SupportCTE (RowNum, Name, CodeName, Engineer,Engineers) AS
(
    SELECT 1, A.Name, A.CodeName, MIN(B.Engineer), 
           CAST(MIN(B.Engineer) AS VARCHAR(MAX)) AS Engineers
    FROM AppSystem A JOIN AppSupport B ON A.Name = B.AppName
    GROUP BY A.Name, A.CodeName
 
    UNION ALL
 
    SELECT A.RowNum + 1, A.Name, A.CodeName, B.Engineer, 
           CAST(A.Engineers + ', ' + B.Engineer AS VARCHAR(MAX)) As Supports
    FROM SupportCTE A JOIN AppSupport B 
    ON A.Name = B.AppName AND B.Engineer > A.Engineer 
)
 
--Trail 1
SELECT Name,CodeName,Engineers
FROM (
  SELECT *,ROW_NUMBER() OVER (PARTITION BY Name ORDER BY RowNum Desc) AS Pos
  FROM SupportCTE
) T
WHERE Pos = 1 
ORDER BY Name
 
--Trial 2
--SELECT A.Name, A.CodeName, A.Engineers FROM SupportCTE A
--JOIN (SELECT Name, Max(RowNum) AS MaxRowNum FROM SupportCTE GROUP BY Name) B
--ON A.Name = B.Name AND A.RowNum = B.MaxRowNum
--ORDER BY A.Name
 
GO
 
--Trail 3
SELECT DISTINCT d.Name ,d.CodeName,
(
 SELECT c.Engineer +','FROM 
 (
    SELECT a.Name, a.CodeName, b.Engineer 
    FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
  ) c
  WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM 
(
  SELECT a.Name, a.CodeName, b.Engineer 
  FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
) d
ORDER BY Name
GO
 
--Trail 4
;WITH SupportCTE (Name, CodeName, Engineer) AS
(
  SELECT a.Name, a.CodeName, b.Engineer 
  FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
)
SELECT DISTINCT d.Name ,d.CodeName,
(
  SELECT c.Engineer + ',' FROM SupportCTE c
  WHERE c.Name = d.Name FOR XML PATH('')
) AS Engineer
FROM SupportCTE d
ORDER BY Name

執行結果如下。以Total execution time欄位做為比較標的,CTE+ROW_NUMBER()法 2.96秒、CTE+JOIN找結果法7.14秒,而FOR XML PATH法只花了0.59秒,FOR XML PATH+CTE法0.64秒(沒有變快)。

CTE慢了近5倍,顛覆我原本的想像…

由執行計劃推敲效能差異來源。FOR XML PATH法的執行計劃很單純,絕大部分的時間用在JOIN查詢。

CTE做法的執行計劃相對複雜許多,而高達84%消耗在Sort上!由Sort出現位置推測約略發生在用ROW_NUMBER()找最終合併結果時,猜想問題出在CTE結果暫存於記憶體,無法仰賴Index等機制加速,資料量一變大即成瓶頸。

若要以此推論CTE效能不好也有失公允,這次合併欄位的CTE動用了MIN、GROUP BY、ROW_NUMBER(),相對FOR XML PATH笨重複雜許多,而CTE的遞迴邏輯有其強大難以被取代的場合,隨便換個例子就不是FOR XML PATH可以搞定的。只能說,在這個案例用CTE加遞迴有殺雞用牛刀之嫌。如果只是在SQL要將一對多欄位合併成單一字串,FOR XML PATH方法有較好的效能表現。

最後補充一點,前述的FOR XML PATH寫法有個小瑕疵,Engineer欄位結尾會多一個逗號。

這個小問題在C#端可以用TrimEnd(',')輕鬆搞定,若要在T-SQL做掉稍微囉嗦一點,但也不難,例如以下範例。(感謝網友艾里克斯補充STUFF解法)

SELECT Name, CodeName, 
CASE WHEN LEN(Engineer) > 1 THEN SUBSTRING(Engineer, 1, LEN(Engineer)-1) ELSE Engineer END AS Engineer
FROM (
    SELECT DISTINCT d.Name ,d.CodeName,
    STUFF(
       (
         SELECT ',' + c.Engineer FROM 
         (
            SELECT a.Name, a.CodeName, b.Engineer 
            FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
          ) c
          WHERE c.Name = d.Name FOR XML PATH('')
        ), 1, 1, ''
    ) AS Engineer
    FROM 
    (
      SELECT a.Name, a.CodeName, b.Engineer 
      FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName
    ) d
) X
ORDER BY Name

Comments

# by steve

事實上前一篇文章出來後小弟馬上就參考著在Oracle上用同樣的方式解決了一個先前用start with connect by解決不了的case. 小弟的淺見是, start with connect by(oracle)以及for xml path(ms-sql) 適用於where條件能篩選出所有資料的狀況,篩選出來後再根據條件串接成需要的階層,這兩個指令很方便 但如果碰到沒有適當的where條件,篩選出來資料仍太多,要靠遞迴一層一層去串出真正需要的階層,就非得要用CTE了

# by Oaww

https://www.mssqltips.com/sqlservertip/2022/concat-aggregates-sql-server-clr-function/ 其實我覺得最方便是用CLR的function(死)

# by 艾里克斯

STUFF( SELECT ','+c.Engineer FROM ( SELECT a.Name, a.CodeName, b.Engineer FROM AppSystem a LEFT JOIN AppSupport b ON a.Name = b.AppName ) c WHERE c.Name = d.Name FOR XML PATH('') ),1,1,'') AS Engineer 就可以 TRIM 掉多餘的逗號囉 :)

# by Jeffrey

to Oaww,原本很愛SQLCLR,體驗過一回上線流程後就默默把它從選項畫掉了,呵。 to 艾里克斯,好簡潔,讚!已加入本文,謝謝回饋。

# by 艾里克斯

Hi, 黑大, STUFF 是用來抽換指定字元, 所以加做逗號相加時逗號要放在前面 ==> ','+c.Engineer, 後面設定的 1,1,'') 才能換掉第一個逗號 :)

# by Jeffrey

to 艾里克斯,啊,實測成功抄寫過去時漏改了,謝謝提醒。

# by Hardy

我的現況是FOR XML PATH跟CTE都很慢 現在還在思考要如何是好

Post a comment