CODE - Shortened GUID Strings

2013-01-29 10:44 PM

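A rather unusual requirement: a cross-platform integration passes records whose primary key is a GUID, but the other side's parameter field accepts at most 30 characters. Even in hexadecimal notation (e.g. 4854c292c333480890f916d1a062b8e3), a GUID string is 32 characters long, which exceeds the limit. One way out would be to devise a different non-repeating identifier scheme, but reaching GUID-level uniqueness would be costly. The other direction is to represent the GUID with a shorter string, which looked like the less laborious option.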
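To make the length problem concrete, here is a tiny sketch (the class name `GuidLengths` is hypothetical, for illustration only). It confirms that a GUID is 16 bytes, i.e. 128 bits, and that .NET's "N" format spends one hexadecimal character per 4 bits, giving 128 / 4 = 32 characters.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original program.
public static class GuidLengths
{
    // Number of raw bytes in a GUID (always 16, i.e. 128 bits).
    public static int ByteLength(Guid g) => g.ToByteArray().Length;

    // Length of the 32-digit hexadecimal "N" format,
    // e.g. 4854c292c333480890f916d1a062b8e3.
    public static int HexLength(Guid g) => g.ToString("N").Length;
}
```

For any GUID, `GuidLengths.HexLength` returns 32, two characters over the 30-character limit.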

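To get something shorter than hexadecimal, the simplest approach is to convert the GUID to a byte[] first and then Base64-encode it into a string. For example, 4854c292c333480890f916d1a062b8e3 becomes ksJUSDPDCEiQ+RbRoGK44w==, only 24 characters. The catch: the other system's database collation is case-insensitive, while Base64 is case-sensitive, so unique constraints and lookups could go wrong.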
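The case-sensitivity problem can be reproduced in a few lines. This sketch uses .NET's `Convert.ToBase64String` / `Convert.FromBase64String`; the class `Base64CaseDemo` and its `FlipFirstLetterCase` helper are hypothetical names. Flipping the case of a single letter yields a string that a case-insensitive comparison treats as equal, yet it decodes to a different GUID.

```csharp
using System;

// Hypothetical demo of why Base64 keys clash with a case-insensitive collation.
public static class Base64CaseDemo
{
    // Encodes a GUID's 16 bytes as Base64 (24 characters including "==" padding).
    public static string Encode(Guid g) => Convert.ToBase64String(g.ToByteArray());

    // Decodes a Base64 string back to a GUID.
    public static Guid Decode(string s) => new Guid(Convert.FromBase64String(s));

    // Flips the case of the first letter. In Base64, 'k' (36) and 'K' (10) are
    // different values, so the result decodes to different bytes.
    public static string FlipFirstLetterCase(string s)
    {
        char[] chars = s.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (char.IsLetter(chars[i]))
            {
                chars[i] = char.IsUpper(chars[i])
                    ? char.ToLowerInvariant(chars[i])
                    : char.ToUpperInvariant(chars[i]);
                break;
            }
        }
        return new string(chars);
    }
}
```

With the GUID from the article, `Encode` returns ksJUSDPDCEiQ+RbRoGK44w==; flipping the leading k gives KsJUSDPDCEiQ+RbRoGK44w==, which a case-insensitive collation would treat as the same key even though it decodes to a different GUID.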

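My first thought was to dust off the base-36 notation I invented back in the VB6 days (using the 36 alphanumerics A-Z and 0-9 to represent binary data), but on second thought, if there is a standard to follow, it's better not to reinvent the wheel. A quick search showed that Base32 encoding does exist, and it is even an RFC standard (RFC 4648)! Base32 uses only the 32 characters A-Z and 2-7. Because each character carries 5 bits instead of Base64's 6, the encoded result is about 20% longer than Base64, but the advantages are: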

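Works in case-insensitive environments
Its digits skip 0, 1 and 8, avoiding confusion with the letters O, I and B
Needs no UrlEncoding at all when used in URLs (as long as the "=" padding is omitted)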
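These properties can be checked mechanically against the RFC 4648 alphabet. Below is a minimal sketch (`Base32AlphabetCheck` is a hypothetical name), with the standard Base64 alphabet included for contrast. Note that the URL check covers the alphabet only: "=" is a reserved character in URLs, so the no-UrlEncode property holds for unpadded output.

```csharp
using System;
using System.Linq;

// Hypothetical helper that checks the listed properties for a given alphabet.
public static class Base32AlphabetCheck
{
    // RFC 4648 Base32 alphabet (A-Z, 2-7) and the standard Base64 alphabet.
    public const string Base32Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    public const string Base64Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // True if no two symbols become the same character after case folding,
    // so a case-insensitive comparison cannot merge two different encodings.
    public static bool SurvivesCaseFolding(string alphabet) =>
        alphabet.ToUpperInvariant().Distinct().Count() == alphabet.Length;

    // True if the digits 0, 1, 8 and 9 are absent (no 0/O, 1/I, 8/B mix-ups).
    public static bool AvoidsConfusableDigits(string alphabet) =>
        alphabet.IndexOfAny(new[] { '0', '1', '8', '9' }) < 0;

    // True if percent-encoding leaves every symbol unchanged.
    public static bool IsUrlSafe(string alphabet) =>
        Uri.EscapeDataString(alphabet) == alphabet;
}
```

The Base64 alphabet fails the case-folding check (its 64 symbols collapse to 38), which is exactly the collation problem described earlier.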

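Base32 is not as widely used as Base64, though, and the .NET base class library has no ready-made function for encoding or decoding it. Fortunately, since it is a public standard, sample code written by others is easy to find. I found an implementation on Stack Overflow, but testing showed that its output-length estimate was off, producing extra "=" padding characters. A small tweak fixes this: [I've left a comment suggesting it]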

The original is charCount = (int)Math.Ceiling(input.Length / 5d) * 8;

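It should be charCount = (int)Math.Ceiling(input.Length / 5d * 8);

The original rounds up to whole 8-character blocks, leaving room for "=" padding; the revised version allocates only as many 5-bit characters as the data needs (26 instead of 32 for a 16-byte GUID).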
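A quick sketch comparing the two formulas. `Base32CharCount` is a hypothetical helper, and `UnpaddedInt` is an integer-only variant added here for illustration, not part of the Stack Overflow code.

```csharp
using System;

// Hypothetical helper comparing the two charCount formulas.
public static class Base32CharCount
{
    // Original formula: rounds up to whole 8-character blocks, so the result
    // includes room for '=' padding (16 bytes -> 32 characters).
    public static int Padded(int byteCount) =>
        (int)Math.Ceiling(byteCount / 5d) * 8;

    // Revised formula: 8 bits per byte, 5 bits per character, rounded up,
    // with no padding (16 bytes = 128 bits -> 26 characters).
    public static int Unpadded(int byteCount) =>
        (int)Math.Ceiling(byteCount / 5d * 8);

    // Integer-only equivalent of Unpadded: ceil(8n / 5) without floating point.
    public static int UnpaddedInt(int byteCount) =>
        (byteCount * 8 + 4) / 5;
}
```

For a GUID, `Padded(16)` is 32 while `Unpadded(16)` is 26; the 6-character difference is the "=" padding. For a 6-byte input the revised formula gives Ceiling(9.6) = 10 characters.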

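Below is the test program. It creates 10,000 GUIDs, encodes and restores each one, verifies every restored value, and measures the elapsed time. (Both Base64 and Base32 formats are supported.)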

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        static void Main(string[] args)
        {
            // Build a pool of 10,000 random GUIDs as test data
            List<Guid> guidPool = new List<Guid>();
            for (int i = 0; i < 10000; i++)
                guidPool.Add(Guid.NewGuid());

            string str = null;
            Guid restored = Guid.Empty;
            Stopwatch sw = new Stopwatch();

            // Base64 version test: encode, restore and verify every GUID
            sw.Start();
            foreach (Guid uid in guidPool)
            {
                str = GetShortGuidString(uid);
                restored = ParseShortGuidString(str);
                if (!uid.Equals(restored))
                    throw new ApplicationException("Test Failed!");
            }
            sw.Stop();
            // Print the elapsed time plus the last GUID and its encoded form
            Console.WriteLine("Base64 Version in {0:N}ms,\n  {1:N} -> {2}",
                sw.ElapsedMilliseconds, restored, str);

            // Base32 version test
            sw.Restart();
            foreach (Guid uid in guidPool)
            {
                str = GetShortGuidString(uid, true);
                restored = ParseShortGuidString(str, true);
                if (!uid.Equals(restored))
                    throw new ApplicationException("Test Failed!");
            }
            sw.Stop();
            Console.WriteLine("Base32 Version in {0:N}ms,\n  {1:N} -> {2}",
                sw.ElapsedMilliseconds, restored, str);

            Console.Read();
        }

        // Encodes the GUID's 16 bytes as Base64 (default) or Base32.
        // Base32Encoding is the Stack Overflow class mentioned above (with the
        // charCount fix applied); it is not included in this listing.
        static string GetShortGuidString(Guid uid, bool useBase32 = false)
        {
            if (useBase32)
                return Base32Encoding.ToString(uid.ToByteArray());
            else
                return Convert.ToBase64String(uid.ToByteArray());
        }

        // Restores a GUID from a string produced by GetShortGuidString
        static Guid ParseShortGuidString(string s, bool useBase32 = false)
        {
            if (useBase32)
                return new Guid(Base32Encoding.ToBytes(s));
            else
                return new Guid(Convert.FromBase64String(s));
        }
    }
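The listing relies on the Stack Overflow `Base32Encoding` class, which is not reproduced here. For readers who want something runnable, below is a minimal stand-in sketch written against RFC 4648; it is not the Stack Overflow code. It keeps the same `ToString(byte[])` / `ToBytes(string)` shape so the test program compiles with it, emits no padding, and (as an assumption beyond the article) ignores trailing "=" and letter case on input.

```csharp
using System;
using System.Text;

// Minimal RFC 4648 Base32 sketch written for illustration; NOT the
// Stack Overflow implementation used in the test program.
public static class Base32Encoding
{
    // RFC 4648 alphabet: A-Z followed by the digits 2-7.
    private const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Encodes bytes without '=' padding: ceil(bits / 5) characters.
    public static string ToString(byte[] input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        var sb = new StringBuilder((input.Length * 8 + 4) / 5);
        int buffer = 0, bitsLeft = 0;
        foreach (byte b in input)
        {
            // Shift the next byte in; only the low 16 bits are ever needed.
            buffer = ((buffer << 8) | b) & 0xFFFF;
            bitsLeft += 8;
            // Emit one character for every complete 5-bit group.
            while (bitsLeft >= 5)
            {
                bitsLeft -= 5;
                sb.Append(Alphabet[(buffer >> bitsLeft) & 31]);
            }
        }
        // Flush leftover bits, left-aligned and zero-filled to 5 bits.
        if (bitsLeft > 0)
            sb.Append(Alphabet[(buffer << (5 - bitsLeft)) & 31]);
        return sb.ToString();
    }

    // Decodes a Base32 string; trailing '=' and lowercase letters are tolerated.
    public static byte[] ToBytes(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        string s = input.TrimEnd('=').ToUpperInvariant();
        // Whole bytes only; any final partial group holds zero fill bits.
        var output = new byte[s.Length * 5 / 8];
        int buffer = 0, bitsLeft = 0, index = 0;
        foreach (char c in s)
        {
            int value = Alphabet.IndexOf(c);
            if (value < 0)
                throw new FormatException("Invalid Base32 character: " + c);
            buffer = ((buffer << 5) | value) & 0xFFFF;
            bitsLeft += 5;
            // Emit a byte whenever 8 or more bits are buffered.
            if (bitsLeft >= 8)
            {
                bitsLeft -= 8;
                output[index++] = (byte)((buffer >> bitsLeft) & 0xFF);
            }
        }
        return output;
    }
}
```

With this sketch, `Base32Encoding.ToString` on the bytes of 4854c292c333480890f916d1a062b8e3 returns SLBFISBTYMEEREHZC3I2AYVY4M, the same string as in the test output.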

The test results:

Base64 Version in 13.00ms,
4854c292c333480890f916d1a062b8e3 -> ksJUSDPDCEiQ+RbRoGK44w==
Base32 Version in 38.00ms,
4854c292c333480890f916d1a062b8e3 -> SLBFISBTYMEEREHZC3I2AYVY4M

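My guess is that the Base32 code I found is less optimized than .NET's built-in Convert.ToBase64String, which is why Base32 ran about three times slower than Base64 (38 ms vs 13 ms). Still, 10,000 records took only about 0.04 seconds, so it is unlikely to become a bottleneck in the integration. The test confirms the approach is workable.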

Comments

# 2013-01-29 10:50 PM by Robin

Thanks for sharing~~

# 2013-03-03 10:19 PM by 豬公峰

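You could also consider alphanumerics: extend hexadecimal from F all the way to Z and you get base 36 (case-insensitive), whose encoded length is shorter than Base32. If case-sensitivity is acceptable, base 62 gives output slightly longer than Base64, with the advantage that it consists only of letters and digits.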
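The base-36 idea can be sketched with System.Numerics.BigInteger by treating the 16 GUID bytes as an unsigned 128-bit number. This is an illustrative sketch (`Base36Guid` is a hypothetical name), not code from the commenter. Because 36^24 < 2^128 < 36^25, it pads to a fixed 25 characters, one fewer than unpadded Base32's 26.

```csharp
using System;
using System.Numerics;
using System.Text;

// Illustrative base-36 encoding of a GUID (0-9 then A-Z, as the comment suggests).
// Base36Guid is a hypothetical name, not a library API.
public static class Base36Guid
{
    private const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // 36^24 < 2^128 < 36^25, so 25 digits always suffice.
    private const int Width = 25;
    private static readonly BigInteger Limit = BigInteger.One << 128;

    public static string Encode(Guid g)
    {
        // Copy the 16 bytes into a 17-byte array; the extra 0x00 high byte makes
        // BigInteger (little-endian, two's complement) read the value as unsigned.
        var unsigned = new byte[17];
        Array.Copy(g.ToByteArray(), unsigned, 16);
        var value = new BigInteger(unsigned);

        var sb = new StringBuilder(Width);
        while (value > 0)
        {
            sb.Insert(0, Digits[(int)(value % 36)]);
            value /= 36;
        }
        // Left-pad with zeros so every GUID encodes to exactly 25 characters.
        return sb.ToString().PadLeft(Width, '0');
    }

    public static Guid Decode(string s)
    {
        BigInteger value = BigInteger.Zero;
        foreach (char c in s.ToUpperInvariant())
        {
            int digit = Digits.IndexOf(c);
            if (digit < 0)
                throw new FormatException("Invalid base-36 character: " + c);
            value = value * 36 + digit;
        }
        if (value >= Limit)
            throw new OverflowException("Value does not fit in 128 bits.");

        // ToByteArray is little-endian and may be shorter than 16 bytes (or carry
        // an extra 0x00 sign byte); copy at most 16 bytes into a zero-filled array.
        byte[] raw = value.ToByteArray();
        var bytes = new byte[16];
        Array.Copy(raw, bytes, Math.Min(raw.Length, 16));
        return new Guid(bytes);
    }
}
```

Decoding is case-insensitive here, matching the case-insensitive collation constraint; the trade-off versus Base32 is BigInteger arithmetic instead of simple bit shifting.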

# 2016-09-19 02:13 AM by Aqua

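I think charCount = (int)Math.Ceiling(input.Length / 5d) * 8 is correct, since the output needs to be padded to a multiple of 8 characters. Otherwise, for a 6-byte input, for example, charCount = (int)Math.Ceiling(input.Length / 5d * 8) would output 9 characters, which can't be encoded.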

# 2016-09-19 05:25 PM by Jeffrey

to Aqua: as I understand it, whether or not "=" padding is appended does not affect decoding. Testing with the code below, (int)Math.Ceiling(input.Length / 5d * 8) correctly encodes and restores byte[] arrays of length 1 through 16:

    for (var i = 1; i < 17; i++)
    {
        byte[] raw = new byte[i];
        for (var j = 0; j < i; j++)
            raw[j] = (byte)j;
        var enc = Base32Encoding.ToString(raw);
        Console.WriteLine(enc);
        var res = Base32Encoding.ToBytes(enc);
        Console.WriteLine(BitConverter.ToString(res));
    }

# 2016-09-26 09:18 PM by Kevin

Compressing 32 characters down to 30 seems to increase the risk of collisions. Did you take that into account?

# 2016-09-26 11:42 PM by Jeffrey

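to Kevin: the scenario in the article converts a byte[16] into a shorter plain-text representation that is case-insensitive and needs no UrlEncode. As long as the encoding format is agreed with the receiving side, the value can be restored 100%, so there should be no collision problem. Could you describe a scenario in which a collision might occur?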