Coding4Fun - Connecting .NET to Whisper Microphone Speech Recognition, with Real-Time ChatGPT Translation
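Yesterday I mentioned wrapping Whisper Mic as a microservice that streams real-time speech recognition results over Server-Sent Events (SSE). SSE is a web standard that runs over plain HTTP, so in theory you can consume it from any programming language or platform without much effort.
A reader happened to ask about speech recognition plus real-time translation, which is about as classic a use case as it gets. So why wait: let's write a few lines of code and see whether it is feasible.
For the language I naturally picked C#, the one I know best. For calling the ChatGPT API, see my earlier article; the GptChatService class from that sample can be used as-is, with no changes.
To read a streaming response with HttpClient, pass HttpCompletionOption.ResponseHeadersRead to GetAsync() so that it returns as soon as the headers have been read instead of waiting for the whole response body. Then get a Stream object from response.Content.ReadAsStreamAsync(); from there, reading works just like reading from a FileStream or NetworkStream, nothing fancy.
So, to translate the recognition results in real time, run a while loop that calls StreamReader.ReadLineAsync() to receive the recognized text from Whisper Mic one sentence at a time, and call the ChatGPT API to translate each sentence into English. It's that simple. The sample program follows: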
using System.Text;
using Microsoft.Extensions.Configuration;

Console.OutputEncoding = Encoding.UTF8;
Console.InputEncoding = Encoding.UTF8;

// Load settings from appsettings.json; environment variables can override them
var config = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
    .AddEnvironmentVariables()
    .Build();
Func<string, string> readConfig = (key) => config[key] ?? throw new ArgumentNullException(key);
var chatSvc = new GptChatService(readConfig("OpenAiUrl"), readConfig("OpenAiKey"), readConfig("OpenAiDepName"));

using var httpClient = new HttpClient();
// ResponseHeadersRead: return once the headers arrive instead of buffering the whole body
using var response = await httpClient.GetAsync("http://localhost:5000/listen", HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
Console.WriteLine("Receiving speech recognition results for real-time translation...");
using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new System.IO.StreamReader(stream);
// System prompt (Chinese): "You are a real-time interpreter who translates the
// Chinese text returned by speech recognition into English"
var sysPrompt = "你是一個即時口語翻譯,負責將語音識別回傳的中文內容翻成英文";
while (true)
{
    var line = await reader.ReadLineAsync();
    // null means the server closed the stream; without this check the loop would spin forever
    if (line == null) break;
    if (line.StartsWith("data: "))
    {
        var text = line.Substring("data: ".Length);
        // TODO: accumulate a certain length before sending; more surrounding context
        // improves translation accuracy, at the cost of extra latency
        Console.ForegroundColor = ConsoleColor.Yellow;
        Console.WriteLine($"Source: {text}");
        var answer = await chatSvc.Complete(sysPrompt, text, true);
        Console.ForegroundColor = ConsoleColor.Cyan;
        Console.WriteLine($"Translation: {answer}");
        Console.ResetColor();
    }
}
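The loop above matches lines that start with "data: " exactly, which is enough for this service. For comparison, here is a minimal sketch of a slightly more general reader, assuming the endpoint follows the usual SSE framing: comment lines start with a colon, the space after "data:" is optional, and a blank line ends each event. SseParser and ReadEvents are illustrative names, not part of the original sample or of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: extracts the data payload of each SSE event from a text stream.
public static class SseParser
{
    public static IEnumerable<string> ReadEvents(TextReader reader)
    {
        var data = new StringBuilder();
        var hasData = false;
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            if (line.Length == 0)
            {
                // A blank line terminates the current event
                if (hasData) yield return data.ToString();
                data.Clear();
                hasData = false;
                continue;
            }
            if (line[0] == ':') continue; // comment line, often used as a keep-alive

            var colon = line.IndexOf(':');
            var field = colon < 0 ? line : line.Substring(0, colon);
            var value = colon < 0 ? string.Empty : line.Substring(colon + 1);
            // Only a single leading space after the colon is stripped
            if (value.Length > 0 && value[0] == ' ') value = value.Substring(1);

            if (field != "data") continue; // other fields (event, id, retry) are ignored here
            if (hasData) data.Append('\n'); // multiple data lines are joined with newlines
            data.Append(value);
            hasData = true;
        }
        // An event left unterminated at end of stream is discarded
    }
}
```

It could be fed the same StreamReader, for example foreach (var text in SseParser.ReadEvents(reader)), although that uses blocking reads; an async variant would call ReadLineAsync() instead.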
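In testing, everything worked as expected: the Chinese transcribed from speech was translated into English without trouble. The exceptions were technical terms such as .NET, which can only be recognized correctly with knowledge of the speaker's background or the full context; unsurprisingly, those were guessed wrong. The English translation itself was up to ChatGPT's usual standard, with little to fault.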
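The TODO in the sample suggests accumulating text before sending it for translation, trading latency for context. One possible shape for that is sketched below, assuming a simple character-count threshold. TextBatcher is a hypothetical helper, not part of the original sample, and the threshold value is arbitrary.

```csharp
using System;
using System.Text;

// Hypothetical helper: buffers recognized fragments and releases them
// as one chunk once a minimum length has been reached.
public sealed class TextBatcher
{
    private readonly int _minLength;
    private readonly StringBuilder _buffer = new StringBuilder();

    public TextBatcher(int minLength)
    {
        if (minLength <= 0) throw new ArgumentOutOfRangeException(nameof(minLength));
        _minLength = minLength;
    }

    // Appends a fragment. Returns true and hands back the accumulated text
    // once the buffer holds at least minLength characters.
    public bool TryAdd(string fragment, out string batch)
    {
        _buffer.Append(fragment);
        if (_buffer.Length < _minLength)
        {
            batch = string.Empty;
            return false;
        }
        batch = Flush();
        return true;
    }

    // Returns whatever is buffered (possibly empty) and resets the buffer.
    public string Flush()
    {
        var text = _buffer.ToString();
        _buffer.Clear();
        return text;
    }
}
```

Inside the loop, the call to chatSvc.Complete() would then run only when batcher.TryAdd(text, out var batch) returns true, with a final Flush() when the stream ends. Fragments are concatenated without a separator, which suits Chinese text; a space-delimited language would need one.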
That's a wrap.
This article demonstrates combining Whisper Mic for speech recognition and ChatGPT API for real-time translation into English using C#. The process involves streaming recognized speech via SSE and translating it with ChatGPT. The provided C# code illustrates how to handle streaming responses and call the translation API. The practical implementation confirms successful translation with minor errors in technical terms.
Comments
# by python路過吾好錯過
import json

import requests

# Read configuration file
with open('config.json', 'r') as config_file:
    config = json.load(config_file)


def read_config(key):
    value = config.get(key)
    if value is None:
        raise ValueError(f"Configuration key '{key}' not found")
    return value


# Initialize GptChatService parameters
open_ai_url = read_config("OpenAiUrl")
open_ai_key = read_config("OpenAiKey")
open_ai_dep_name = read_config("OpenAiDepName")


# Assuming GptChatService is a class, here we create a simple mock class
class GptChatService:
    def __init__(self, url, key, dep_name):
        self.url = url
        self.key = key
        self.dep_name = dep_name

    def complete(self, system_prompt, text, stream=False):
        # This is just a mock implementation; actual usage would require calling the specific API.
        # Kept synchronous: `await` is not allowed at module level outside an async function.
        return f"Translated: {text}"


chat_svc = GptChatService(open_ai_url, open_ai_key, open_ai_dep_name)

# Send GET request to the specified URL; stream=True defers downloading the body
response = requests.get("http://localhost:5000/listen", stream=True)
# Ensure HTTP response status code is successful
response.raise_for_status()
# Output prompt indicating that voice recognition results will be received for real-time translation
print("Starting to receive voice recognition results for real-time translation...")
# Set system prompt for translation context
sys_prompt = "You are a real-time oral translator responsible for translating the Chinese content returned by speech recognition into English"

# Loop over lines as they arrive, until the server closes the stream
for line in response.iter_lines():
    # Check if the line is a valid data line
    if line and line.startswith(b'data: '):
        # Extract valid data part
        text = line[len(b'data: '):].decode('utf-8')
        # Set console color to yellow to display original text
        print("\033[93m" + f"Original: {text}" + "\033[0m")
        # Call chat_svc to perform translation and get result
        answer = chat_svc.complete(sys_prompt, text, True)
        # Set console color to cyan to display translated text
        print("\033[96m" + f"Translation: {answer}" + "\033[0m")