Telegram Offline Voice

v0.1.3

Generate Telegram voice messages locally, with automatic text cleaning, segmentation, and temporary-file management.

⭐ 2· 3.2k·15 current·15 all-time

by sanwe (@sanwecn)

MIT-0

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

Purpose & Capability

Name/description promise: local, offline TTS. Implementation: uses the Python package edge_tts (declared in the script header comments and imported in code). edge_tts typically requests audio from Microsoft TTS endpoints (i.e., network calls), so the claim of being fully offline and zero-token may be false or misleading. Other requirements (ffmpeg, Python, optional uv) match the stated purpose.

ℹ

Instruction Scope

SKILL.md and the script keep scope to converting provided text -> temporary mp3 -> final ogg and printing file paths. The script only reads the supplied --text and writes temp files under outdir (/tmp by default). No instructions to read unrelated files or environment variables. However, runtime will transitively call edge_tts which may make network requests to Microsoft endpoints — the SKILL.md asserts local-only processing but does not document or justify any network usage.

ℹ

Install Mechanism

No formal install spec (instruction-only), which is lower risk. SKILL.md recommends installing uv with curl | sh from https://astral.sh/ (third-party installer); piping remote scripts to sh is higher risk and should be treated cautiously. The script lists Python dependencies (edge-tts, aiofiles) in header comments but does not provide explicit pip install steps — user must install these. No arbitrary binary downloads or obscure URLs in included files beyond the uv installer suggestion.

✓

Credentials

The skill requests no environment variables or credentials. The script operates entirely on supplied text and local temp files; there are no demanded tokens, keys, or config path access. This is proportionate to its stated functionality — except for the unresolved offline claim noted above.

✓

Persistence & Privilege

The skill is user-invocable only (always: false) and does not request persistent agent-wide privileges or modify other skills. It does not persist credentials or change system-wide settings.

What to consider before installing

This skill is mostly coherent: it converts text to OGG via edge_tts and ffmpeg and manages temp files correctly. However, the author repeatedly claims "100% local / zero token" while importing and using edge_tts. That library typically obtains TTS audio from Microsoft services (network calls), so the skill may not be truly offline. Before installing or running:

1) Verify edge_tts behavior in your environment (check its docs or run it in a sandbox) to confirm whether audio is generated locally or fetched from the network.
2) If you require strict offline operation, consider replacing edge_tts with a known local TTS engine (Coqui/tts, local OpenTTS endpoint, or local Edge browser invocation) or explicitly audit network traffic.
3) Avoid running curl | sh blindly. The uv install script is optional; prefer installing known packages from distro repos or inspecting the installer first.
4) Run the script in an isolated environment first (container or VM), monitor outbound connections (tcpdump/strace), and review installed Python dependencies (pip show edge-tts) to ensure no unexpected exfiltration.

If you cannot confirm edge_tts is purely local, treat the offline claim as inaccurate and proceed with caution.
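As a rough complement to the tcpdump/strace advice above, a quick in-process check can reveal whether a piece of Python code tries to open outbound connections. The sketch below is an assumption-laden illustration, not part of the skill: `block_network` and `NetworkBlocked` are hypothetical names, and the guard only intercepts `socket.connect` inside the current Python process (it does not see subprocesses such as ffmpeg, and it does not replace packet-level monitoring).

```python
import socket
from contextlib import contextmanager


class NetworkBlocked(RuntimeError):
    """Raised when guarded code tries to open an outbound connection."""


@contextmanager
def block_network():
    """Temporarily make every socket.connect call in this process fail loudly."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Report the destination instead of connecting to it.
        raise NetworkBlocked(f"outbound connection attempted: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Restore normal behavior even if the guarded code raised.
        socket.socket.connect = original_connect
```

Wrapping the TTS call in `with block_network():` inside a sandbox would then surface a `NetworkBlocked` error naming the remote address if the library reaches out to the network, which is one way to test the offline claim before trusting it.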

Like a lobster shell, security has layers — review code before you run it.

latest: vk97akhnxk1z3h82wr96j1rebvh80pw61

License

MIT-0

Free to use, modify, and redistribute. No attribution required.

Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🎙️ Clawdis

OS: Linux

Bins: ffmpeg, uv

SKILL.md

telegram-offline-voice 🎙️

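Generate locally, package in one step: produces high-quality Chinese speech with Microsoft Edge-TTS, processed entirely offline.

💡 Why this upgrade?

Stock TTS solutions usually produce only MP3 attachments and cannot handle Markdown markup or very long text. This project wraps the pipeline so that "speech synthesis" becomes "voice interaction":

No more markup noise: automatically detects and strips Markdown symbols such as **, #, and [link], so the AI does not read this "code noise" aloud.
Natural conversation flow: very long text is no longer read in one go; it is split at full stops and exclamation marks into multiple voice bubbles, which sounds more like a real person sending voice messages.
Concurrency safety: when multiple agents or sub-agents call the skill in parallel, temporary files are isolated with UUIDs to prevent file read/write conflicts.
Zero token cost: audio is generated entirely locally with Edge-TTS, with no paid OpenAI/Azure tokens required.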
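
The sentence-based splitting described above could look roughly like the following. This is a minimal sketch, not the code in voice_gen.py: the function name `split_into_bubbles`, the `max_chars` limit, and the inclusion of question marks in the delimiter set are all assumptions made for illustration. It targets Chinese text, where sentences can be rejoined without a separating space.

```python
import re

# Split after sentence-ending punctuation (full-width and ASCII).
# The exact punctuation set is an assumption, not taken from the script.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_into_bubbles(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most roughly max_chars characters."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    bubbles: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new bubble when adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            bubbles.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        bubbles.append(current)
    return bubbles
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, which favors natural-sounding audio over a strict length cap.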

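✨ Features

🔒 Privacy: 100% local audio processing, with no additional cloud TTS provider involved.
💰 Zero token cost: uses the free Edge-TTS engine, saving expensive API quota.
🎯 One-step generation: the bundled voice_gen.py script handles the whole "text -> MP3 -> OGG" pipeline automatically.
🧹 Automatic cleaning: strips Markdown symbols and URLs so the speech sounds more natural.
✂️ Smart segmentation: very long text is split at punctuation into multiple voice bubbles.
🛡️ Safe concurrency: temporary files are named with UUIDs, so multiple agents can call the skill at the same time.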
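
To illustrate the UUID-named temporary files and the MP3-to-OGG step listed above, here is a hedged sketch. The helper names `make_job_paths` and `ffmpeg_to_ogg_command`, the `voice_` file prefix, and the Opus codec and 32k bitrate settings are assumptions for illustration; the actual script may name files and invoke ffmpeg differently.

```python
import uuid
from pathlib import Path


def make_job_paths(outdir: str = "/tmp") -> tuple[Path, Path]:
    """Return a unique (mp3, ogg) path pair so parallel jobs never share files."""
    job_id = uuid.uuid4().hex  # random per-call identifier
    base = Path(outdir)
    return base / f"voice_{job_id}.mp3", base / f"voice_{job_id}.ogg"


def ffmpeg_to_ogg_command(mp3_path: Path, ogg_path: Path) -> list[str]:
    """Build (but do not run) an ffmpeg command converting MP3 to OGG/Opus."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file without prompting
        "-i", str(mp3_path),  # input: intermediate MP3
        "-c:a", "libopus",    # assumed codec for Telegram voice messages
        "-b:a", "32k",        # assumed bitrate, adequate for speech
        str(ogg_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the conversion step easy to inspect before anything is run.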

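🛠️ Installing dependencies

# Requires a Python environment and FFmpeg
sudo apt update && sudo apt install ffmpeg python3-pip -y

# Installing uv is recommended for running the wrapper script quickly
curl -LsSf https://astral.sh/uv/install.sh | sh

🚀 Usage (recommended)

Call the wrapper script directly to generate, in one step, the path of a native Telegram voice bubble:

uv run {baseDir}/scripts/voice_gen.py --text "Your text to be read aloud"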

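⚙️ Technical details

Parameters

--text / -t: the text to synthesize (required).
--voice: voice selection; defaults to zh-CN-XiaoxiaoNeural (Xiaoxiao).
--rate: speech rate adjustment; defaults to +5%.
--outdir: directory for temporary files; defaults to /tmp.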
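
For reference, the parameters above map naturally onto an argparse definition. The sketch below mirrors the documented flags and defaults only; `build_parser` is a hypothetical name, and the real voice_gen.py may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface matching the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate a Telegram voice message from text."
    )
    parser.add_argument("--text", "-t", required=True,
                        help="text to synthesize (required)")
    parser.add_argument("--voice", default="zh-CN-XiaoxiaoNeural",
                        help="voice name")
    parser.add_argument("--rate", default="+5%",
                        help="speech rate adjustment as a signed percentage")
    parser.add_argument("--outdir", default="/tmp",
                        help="directory for temporary files")
    return parser
```

With a parser like this one, a negative rate has to be written as `--rate=-10%`: argparse would otherwise read a separate `-10%` token as an unknown option rather than as the value of `--rate`.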

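Automatic cleaning rules

The script automatically removes the following so that the speech flows smoothly:

Markdown symbols: **, *, _, `, #
Links: [text](url) and any link starting with http/https
Horizontal rules: ---, ***, and similar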
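
The cleaning rules above can be approximated with a few regular expressions. This is a sketch under stated assumptions rather than the script's actual implementation: `clean_for_speech` is a hypothetical name, and keeping the visible label of a `[text](url)` link while dropping the URL is a guess at the intended behavior.

```python
import re


def clean_for_speech(text: str) -> str:
    """Strip Markdown markup and URLs so a TTS engine reads only the words."""
    # [label](url) -> label (assumption: the label is kept, the URL dropped)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Bare http/https links
    text = re.sub(r"https?://\S+", "", text)
    # Horizontal rules such as --- or *** on a line of their own
    text = re.sub(r"^\s*([-*_])\1{2,}\s*$", "", text, flags=re.MULTILINE)
    # Remaining Markdown symbols: *, _, backtick, #
    text = re.sub(r"[*_`#]", "", text)
    # Collapse runs of spaces, then drop empty lines and edge whitespace
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Note that removing every underscore also alters identifiers such as snake_case names; a stricter version would only strip underscores used as emphasis markers.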

👨‍💻 About the author

Tuned and maintained by @sanwe. Follow me on Twitter for more advanced OpenClaw tips: https://x.com/sanwe

Files

5 total
