Add documentation and coding guidelines for VRCT backend

- Introduced a comprehensive coding rules document outlining naming conventions, module structure, import order, type annotations, error handling, and testing practices.
- Created a specification document detailing project goals, target users, and functional/non-functional requirements for the VRCT project.
- Added a design document describing the application's architecture, initialization policies, concurrency models, and error handling strategies.
- Included a detailed design document specifying major classes, functions, data structures, and exception handling.
- Removed outdated mypy configuration and several unused scripts related to documentation verification and cleanup.
- Deleted test files for OSC and overlay imports as part of the cleanup process.
2025-10-13 22:55:48 +09:00
parent d4f89a734d
commit fcb1295302
43 changed files with 5829 additions and 2956 deletions
--- a/src-python/docs/仕様書.md
+++ b/src-python/docs/仕様書.md
@@ -0,0 +1,58 @@
+# Specification
+
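+Overview
+- Project name: VRCT (VR Chat Translator)
+- Purpose: Backend logic that transcribes and translates microphone input and speaker output in real time, and sends the results to external consumers through a VR overlay or over OSC/WebSocket.
+- Language: Python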
+
+Target users
+- End users who want real-time translation and transcription in a VR environment
+- Application developers who integrate with the frontend (GUI) or a VR client (OSC)
+
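+Main features (functional requirements)
+1. Audio capture and transcription
+   - Captures audio from the microphone (outgoing) and the speaker (incoming) and converts it to text with local Whisper (faster-whisper) or an external service.
+   - Monitors audio energy (volume) and performs threshold-based detection.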
+
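+2. Translation
+   - Supports multiple backends: DeepL / DeepL API / various cloud translation services / local CTranslate2 models.
+   - Batch translation into multiple output languages, with translation-engine fallback (to CTranslate2, for example).
+   - Download and management of translation models.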
+
+3. Display and notification
+   - Image generation and updates for the OpenVR overlay (small/large).
+   - Sending messages to VR over OSC (typing, notifications, etc.).
+   - JSON broadcast to external clients through a WebSocket server.
+
+4. Input/output interface
+   - Receives line-based JSON commands on stdin (implemented by mainloop).
+   - Writes structured JSON responses to stdout (printResponse/printLog).
+
+5. Settings and persistence
+   - Uses a JSON-based settings file (`config.py` handles reads, writes, and debounced saves).
+
+6. Logging and monitoring
+   - Manages the process log (process.log) and the error log (error.log) with rotation.
+   - A watchdog mechanism performs periodic liveness checks and callbacks.
+
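Item 1 above mentions threshold-based detection on audio energy. A minimal sketch of that idea follows, assuming 16-bit signed mono PCM frames and RMS as the energy measure. The names `frame_energy` and `is_above_threshold` and the sample values are hypothetical and are not taken from the VRCT code.

```python
import math
from array import array


def frame_energy(pcm_bytes: bytes) -> float:
    """Return the RMS energy of a frame of 16-bit signed PCM samples (native byte order)."""
    samples = array("h")
    # Drop a trailing odd byte so the buffer holds a whole number of samples.
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_above_threshold(pcm_bytes: bytes, threshold: float) -> bool:
    """Threshold-based detection: True when the frame energy exceeds the threshold."""
    return frame_energy(pcm_bytes) > threshold


# Synthetic frames for illustration only (assumed values, not real recordings).
silence = array("h", [0] * 160).tobytes()
loud = array("h", [8000, -8000] * 80).tobytes()
```

A caller would typically feed successive frames from the capture device to `is_above_threshold` and only forward audio to transcription while it returns True.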
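+Non-functional requirements
+- Platform: Primarily Windows (audio handling uses WASAPI). Import safety on other platforms is taken into account.
+- Availability: Can be imported safely even where external dependencies (PyAudio, CUDA, ctranslate2, etc.) are missing, and runs with degraded functionality.
+- Performance: Uses the GPU when running local models to secure compute performance. Implements compute type selection logic.
+- Security: API keys for external services (DeepL, etc.) are handled through settings, and the code avoids holding them in plaintext (they are stored in the settings file).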
+
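The performance requirement refers to compute type selection logic without describing it. One plausible shape is sketched below, assuming a per-device preference order and a set of supported types supplied by the caller. The `PREFERRED` table and `choose_compute_type` are hypothetical; the actual VRCT ranking may differ.

```python
from typing import Iterable

# Hypothetical preference orders, most desirable first (an assumption for this sketch).
PREFERRED = {
    "cuda": ("float16", "int8_float16", "int8", "float32"),
    "cpu": ("int8", "float32"),
}


def choose_compute_type(device: str, supported: Iterable[str]) -> str:
    """Pick the first preferred compute type that the device reports as supported."""
    supported_set = set(supported)
    for candidate in PREFERRED.get(device, ("float32",)):
        if candidate in supported_set:
            return candidate
    # Nothing matched: fall back to float32 as a conservative default.
    return "float32"
```

With CTranslate2 installed, the `supported` argument could be obtained from `ctranslate2.get_supported_compute_types(device)`; the function itself has no dependency on that library, so it stays importable without it.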
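+Operational flow
+- Startup: Runs the mainloop, which accepts commands on stdin. Required initialization is deferred (lazy init).
+- Model weight download: CTranslate2/Whisper weights are downloaded under `weights/` and checked for integrity with a checksum or similar.
+- On failure: Exceptions are traced to error.log via utils.errorLogging(). Critical features have fallback implementations.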
+
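The weight download step mentions an integrity check by checksum but does not name the algorithm. The sketch below assumes SHA-256 and a known expected digest; `sha256_of` and `is_intact` are hypothetical helpers, not functions from the VRCT code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_sha256: str) -> bool:
    """True when the file exists and its SHA-256 digest matches the expected value."""
    return path.is_file() and sha256_of(path) == expected_sha256.lower()
```

A downloader could call `is_intact` after each download and re-fetch the file when it returns False.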
+Interface (excerpt)
+- stdin(JSON): {"endpoint": "/set/..." | "/get/..." | "/run/...", "data": <base64(JSON)|any>}
+- stdout(JSON): printResponse/printLog emit standardized responses (status, endpoint, result, etc.).
+
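The stdin/stdout contract above can be illustrated with a small parser. This is a sketch under assumptions: `decode_data` and `handle_line` are hypothetical names, the numeric status values are invented for illustration, and the real mainloop dispatches to handlers rather than echoing the payload.

```python
import base64
import json
from typing import Any


def decode_data(data: Any) -> Any:
    """Decode `data` when it is base64-encoded JSON; otherwise return it unchanged."""
    if not isinstance(data, str):
        return data
    try:
        return json.loads(base64.b64decode(data, validate=True).decode("utf-8"))
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON.
        return data


def handle_line(line: str) -> dict:
    """Parse one stdin line and build a response with status, endpoint, and result."""
    try:
        message = json.loads(line)
        endpoint = message["endpoint"]
    except (ValueError, KeyError, TypeError):
        return {"status": 400, "endpoint": None, "result": "invalid command"}
    if not isinstance(endpoint, str) or not endpoint.startswith(("/set/", "/get/", "/run/")):
        return {"status": 404, "endpoint": endpoint, "result": "unknown endpoint"}
    # A real handler would dispatch on the endpoint; this sketch echoes the payload.
    return {"status": 200, "endpoint": endpoint, "result": decode_data(message.get("data"))}
```

Note one ambiguity this sketch does not resolve: a plain string that happens to be valid base64-encoded JSON is decoded, since the wire format alone does not distinguish the two cases.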
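+Dependencies (including optional ones)
+- Required (as assumed at implementation time): requests, packaging, flashtext, pillow, pyaudiowpatch, speech_recognition
+- Recommended for local use: faster-whisper, ctranslate2, torch (when using the GPU)
+- Windows-specific (audio loopback): pycaw, comtypes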
+
+Note: As a safety measure in the implementation, optional imports are guarded with try/except, so a missing dependency does not crash at import time.
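The try/except guard for optional imports can take roughly the following form. This is a sketch, not the project's code: `optional_import` and `local_translation_available` are hypothetical helpers, and VRCT may instead write the try/except inline in each module.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(name)
    except Exception:
        # ImportError covers a missing package; broader failures (for example a
        # native library that fails to load) are treated the same way here.
        return None


# Module-level guard: importing this file succeeds whether or not ctranslate2 exists.
ctranslate2 = optional_import("ctranslate2")


def local_translation_available() -> bool:
    """Callers check this flag and degrade gracefully instead of crashing."""
    return ctranslate2 is not None
```

Code paths that need the optional dependency check the flag first and fall back to another backend, or report the feature as unavailable, when it is False.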