Whisper Mic Microservice - Adding Microphone Speech-to-Text to Your Programs

2024-12-01 11:15 PM

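Thanks to deep learning, speech-to-text (STT) and text-to-speech (TTS) have advanced by leaps and bounds in recent years. Computers now cope with all sorts of accents, and synthesized speech is nearly indistinguishable from the real thing. Speech-to-text services are everywhere, and voice features are well on their way to becoming standard equipment, as if a program that can't understand human speech should be embarrassed to charge money. (Could you stop making things up?)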

So... what about programs we write ourselves? Can we add voice input too? Preferably without spending any money.

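I played with OpenAI Whisper a while back and was amazed by the results. Paired with a 3050-class GPU, the small model (224MB for the English-only version, 472MB for the multilingual one) recognizes Chinese with surprisingly good speed and accuracy. Whisper is open source under the MIT license, runs offline on a local machine, and has a low hardware bar (a mainstream RTX-3050 runs it smoothly, and the small model needs only about 2GB of VRAM). Calling it the current king of AI speech recognition models would not be an exaggeration.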

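Previously I had only used it to turn MP3 files into transcripts. This time I want input straight from the microphone. I found an open source project, Whisper Mic, that neatly wraps Whisper and feeds microphone audio to it for real-time transcription (demo video). With a giant's shoulders to stand on, failing to build this would be too embarrassing to face the folks back home.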

whisper_mic.exe is an executable, but the library can also be called from code, so integrating it into your own program is not hard:

from whisper_mic import WhisperMic

mic = WhisperMic()
result = mic.listen()
print(result)

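However, Whisper depends on PyTorch and lives in the Python world, while most of my programs are written in .NET C#, and I'm not about to rewrite them in Python just for a voice feature. My idea is simple: add a few lines to wrap it as a microservice that exposes a web endpoint and streams the recognized text using Server-Sent Events.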

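Whisper Mic has a .listen_continuously() method that, used with a for-in loop, keeps yielding recognition results. Drawing on my past experience with message broadcasting in ASP.NET, I chose to run that loop on a separate Thread and push the result strings into a Queue. The web method then uses a while loop to check the Queue for new items and writes each one to a streaming Response whose MIME type is text/event-stream.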
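The thread-plus-queue arrangement described above can be tried without a microphone. The following is only a minimal sketch: fake_recognizer is a hypothetical stand-in for the recognizer loop, and the EOS marker and function names are illustrative choices, not part of Whisper Mic.

```python
import queue
import threading

EOS = "[EOS]"  # hypothetical end-of-stream marker


def fake_recognizer():
    """Hypothetical stand-in for the recognizer loop; yields canned phrases."""
    yield from ["hello", None, "world"]


def producer(source, out_queue):
    """Runs on a background thread: push each recognized phrase onto the queue."""
    for text in source:
        if text is not None:
            out_queue.put(text)
    out_queue.put(EOS)  # tell the consumer that nothing more will arrive


def sse_events(in_queue):
    """Block on the queue and format each phrase as a Server-Sent Events frame."""
    while True:
        text = in_queue.get()
        if text == EOS:
            break
        yield f"data: {text}\n\n"


def run_demo():
    """Wire producer and consumer together and collect the emitted frames."""
    q = queue.Queue()
    worker = threading.Thread(target=producer, args=(fake_recognizer(), q))
    worker.start()
    frames = list(sse_events(q))
    worker.join()
    return frames
```

Calling run_demo() returns the frames a web response would stream. Each frame ends with a blank line, which is how an SSE client knows one event is complete.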

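Testing revealed a small annoyance: .listen_continuously() can't be interrupted at will with Ctrl-C. I don't know Python well enough to troubleshoot that for now, so I settled for a superficial workaround: a spoken "stop phrase", 「打完收工」 ("that's a wrap"). When the service hears the user calling it a day, it breaks out of the loop on its own.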
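An exact string comparison against the stop phrase is brittle if the recognized text arrives with surrounding whitespace or trailing punctuation. Here is a minimal sketch of a more forgiving check, assuming that kind of variation can occur; normalize and is_stop_phrase are hypothetical helper names, not part of Whisper Mic.

```python
import unicodedata

STOP_PHRASE = "打完收工"  # the spoken stop phrase used in this post


def normalize(text):
    """Lower-case the text and drop whitespace and punctuation characters."""
    kept = []
    for ch in text.lower():
        category = unicodedata.category(ch)
        # Unicode categories starting with "P" are punctuation, "Z" are separators.
        if category[0] in ("P", "Z") or ch.isspace():
            continue
        kept.append(ch)
    return "".join(kept)


def is_stop_phrase(text, phrase=STOP_PHRASE):
    """True when the recognized text is the stop phrase, ignoring punctuation."""
    return text is not None and normalize(text) == normalize(phrase)
```

The check still requires the whole utterance to be the stop phrase, so a sentence that merely contains it does not end the session.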

The simple homemade microservice PoC looks like this:

from whisper_mic import WhisperMic
from flask import Flask, Response, request
import queue
import threading

mic = WhisperMic(model='small', pause=0.5)
app = Flask(__name__)
text_queue = queue.Queue()

stop_flag = False

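def listen_continuously():
    # Without this declaration, the assignment below would create a local variable
    global stop_flag
    for text in mic.listen_continuously():
        if text is not None:
            # '打完收工' ("that's a wrap") is the spoken stop phrase
            if text.lower() == '打完收工':
                stop_flag = True
                print("Recognition stopped; press Ctrl-C to exit")
                text_queue.put('[EOS]')
                break
            text_queue.put(text)
            print(text)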

@app.route('/listen')
def stream():
    if stop_flag is True:
        return Response("service stopped", status=503)
    def generate():
        while stop_flag is False:
            text = text_queue.get()
            if text == '[EOS]':
                break
            yield f"data: {text}\n\n"
    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    listener_thread = threading.Thread(target=listen_continuously)
    listener_thread.start()
    app.run()

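Because Server-Sent Events is a standard that runs over plain HTTP, curl.exe http://localhost:5000/listen is enough to test it. If curl.exe can connect and read the stream, then in theory any programming language or platform should be able to consume it, so integration shouldn't pose much of an obstacle.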
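To illustrate how little a client needs, here is a minimal sketch of a consumer using only the Python standard library. It assumes the service is reachable at http://localhost:5000/listen as in the curl test; iter_sse_data and main are hypothetical names, and the parser handles only the data field and blank-line event boundaries rather than the full SSE format.

```python
import urllib.request


def iter_sse_data(lines):
    """Yield the payload of each SSE event from an iterable of raw byte lines."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # a single leading space after the colon is dropped
            buffer.append(value)
        elif line == "" and buffer:
            # A blank line closes the event; multiple data lines join with "\n".
            yield "\n".join(buffer)
            buffer = []


def main(url="http://localhost:5000/listen"):
    """Connect to the service (assumed URL) and print each recognized phrase."""
    with urllib.request.urlopen(url) as response:
        for text in iter_sse_data(response):
            print(text)
```

Calling main() while the service is running prints each phrase as it arrives. A C# client would follow the same steps: read the response line by line, collect the data lines, and treat a blank line as the end of an event.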

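Finally, let's see how my homemade Whisper real-time speech-to-text microservice performs. On an ordinary laptop (RTX-3050 with 4GB VRAM) it runs reasonably smoothly. It's not exactly effortless, but it's usable.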

Whisper Mic speech recognition test

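There were two small errors: CUDA was heard as 「枯打」 (a phonetic approximation), and while "Small Model" spoken in English was transcribed correctly, "Small 模型" (Small followed by the Chinese word for model) came out as 「Smo模型」. Getting these right would take very deep contextual understanding, so I don't think they count as real failures.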

[Supplement: Enabling CUDA]

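My approach was to run python -m venv .venv and .venv\scripts\activate.bat to switch into a virtual environment, install Whisper Mic with pip install whisper_mic as the official instructions describe, and then test it with whisper_mic --model small. Installed this way, however, CUDA is not enabled, and running on the CPU is roughly four to five times slower. After some digging I found that python -c "import torch; print(torch.cuda.is_available())" reports whether CUDA is enabled. If it prints False, reinstall the CUDA build with pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. Once torch.cuda.is_available() returns True, rerun Whisper Mic and it should feel noticeably faster; the dedicated GPU memory (VRAM) usage shown in Task Manager should rise as well. (For how to check, see my Ollama post.)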
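The same check can live in a small script that also copes with PyTorch being absent. This is a minimal sketch; pick_device is a hypothetical helper name, and the only PyTorch call it relies on is torch.cuda.is_available().

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Whisper would run on: {pick_device()}")
```

If this prints cpu on a machine with an NVIDIA card, the installed torch package is most likely the CPU-only build.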

Comments

# 2024-12-16 10:25 PM by python路過吾好錯過

from whisper_mic import WhisperMic
from flask import Flask, Response, request
import queue
import threading

mic = WhisperMic(model='small', pause=0.5)
app = Flask(__name__)
text_queue = queue.Queue()

stop_flag = False

def listen_continuously():
    global stop_flag  # Declare stop_flag as global to modify it inside the function
    for text in mic.listen_continuously():
        if text is not None:
            if text.lower() == '打完收工':
                stop_flag = True
                print("Recognition stopped; press Ctrl-C to exit")
                text_queue.put('[EOS]')
                break
            text_queue.put(text)
            print(text)

@app.route('/listen')
def stream():
    if stop_flag is True:
        return Response("service stopped", status=503)
    def generate():
        while stop_flag is False:
            text = text_queue.get()
            if text == '[EOS]':
                break
            yield f"data: {text}\n\n"
    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    listener_thread = threading.Thread(target=listen_continuously)
    listener_thread.start()
    app.run(debug=True, threaded=True)

# 2024-12-17 12:04 PM by Jeffrey

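To python路過吾好錯過: thanks for sharing. I'd suggest adding a few notes on what is distinctive about the code and what readers can learn from it. That would draw readers in, so they don't miss the Python elegance you want to share.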
