Voice Assistant Kabal — Stack & Setup

“Kabal, donne-moi des infos sur cette espèce.”

Documentation complète pour l’assistant vocal du Lab. Pi5 + Mac Mini orchestrant Whisper + Claude API + ElevenLabs streaming, avec context awareness via NFC tags par terra.

Vision

Tu poses ton téléphone sur le tag NFC d’un terra, tu dis “Kabal” et tu poses une question. Réponse en voix custom Lab Jungle Kabal en ~1.5-2s. Plus tard, tu marches devant le mur et le système te parle proactivement quand quelque chose d’intéressant se passe (“Kabal: Phidippus vient de muer, photo capturée à 14h27”).

C’est ton lab qui te parle quand il se passe quelque chose, pas l’inverse.

Stack technique

[ReSpeaker 4-Mic Array USB]   ← far-field beam-forming
        ↓
[Pi 5 8GB] / [Mac Mini M2]
        ↓
[OpenWakeWord]                ← écoute "Kabal" custom (gratuit)
        ↓
[Whisper.cpp small]           ← speech-to-text local FR + EN
        ↓
[Claude API claude-sonnet-4-6] ← LLM avec context bestiary + specimens
        ↓
[ElevenLabs Streaming TTS]    ← voix custom Lab Jungle Kabal
        ↓
[Speaker actif JBL Clip]

Latence totale : ~1.5-2.5s wake-to-réponse audible.

Hardware shopping (~7-10k ฿)

Item	Coût	Source
Raspberry Pi 5 8GB	4,000 ฿	Cytron Thailand
ReSpeaker 4-Mic Array USB	2,500 ฿	Lazada Seeed Studio
MicroSD Sandisk 64GB Endurance	600 ฿	tout vendor
Speaker actif (JBL Clip 5 ou similaire)	1,500 ฿	Power Buy
Boîtier 3D-printé custom (filament inclus)	0 ฿	print sur P2S
Total	~8,600 ฿

Alternative : tourner directement sur Mac Mini M2 (zéro hardware additionnel) — mais setup fixe, pas mobile.

Coûts mensuels API

Service	Coût	Note
Claude API (sonnet-4-6)	50-150 ฿	~50 questions/jour avec prompt caching activé
ElevenLabs Starter	250 ฿	30k chars/mois TTS streaming
Whisper local	0 ฿	Pi5 ou Mac Mini
OpenWakeWord local	0 ฿	local
Total	~400 ฿/mois

Si tu veux full-local (zéro cloud) : Whisper.cpp + Llama 3 8B sur Mac Mini + Piper TTS = 0 ฿/mois mais qualité réponse moindre.

Voix custom — voice cloning ElevenLabs

ElevenLabs Voice Cloning ($5/mois Starter) :

Tu enregistres 1-5min de voix (toi, ou tu choisis voix existante)
Modèle perso = voix Kabal sur tout le lab
Suggestion ton : voix grave + cadence apothicaire/scientifique (Attenborough x conservateur de cabinet de curiosités)

Alternative full-local gratuit : Piper TTS sur Pi5, voix FR pré-trained, qualité 7/10 (pas le wow ElevenLabs mais 100% offline + free).

Le truc qui rend ça killer

1. Context-aware via NFC

Tu colles 1 NFC tag (NTAG215, 1k ฿ pour 100) par terra. Tu touches le tag avec ton phone (ou un Pi mobile) → setSpecimenContext(‘003-tribolonotus-…’) → “Kabal, c’est qui ?” → Claude répond avec full context fiche.

Pipeline :

@app.route('/nfc/scan', methods=['POST'])
def nfc_scan(uid):
    specimen_id = NFC_MAPPING[uid]
    cache.set('current_context', specimen_id, ex=300)  # 5min TTL
    return ok

2. Vision : “Kabal, qu’est-ce que c’est ?”

Pi Camera + Claude Vision API (claude-sonnet-4-6 multimodal) :

"Kabal, qu'est-ce que cette espèce ?"
  → snap photo via /capture endpoint
  → POST claude-sonnet-4-6 avec image + bestiary IDs as context
  → "Ça ressemble à Mystrium camillae — Vampire Mother, Phase 1 wishlist
     Wild Ants 1900฿. Confidence 87%. 2 espèces alternatives possibles..."

3. Frigate hooks — proactive

Frigate détecte mue/feed/molt via AI vision → POST webhook → Kabal speak event :

[Frigate detects molt event with confidence >0.85]
  → POST /event/molt {specimen, timestamp, snapshot_path}
  → Kabal: "Le Phidippus Everglades vient de muer.
            Photo capturée à 14h27. Instar i6 probable."

4. Voice “Kabal” personality

Le LLM context inclut une personality card :

You are Kabal, the voice of the Lab Jungle Kabal — a private cabinet
of curiosities holding 3-15 invertebrate and reptile specimens
in Bangkok. Your tone is that of an erudite curator: precise scientific
information, but with reverence for the symbolic and mythological
significance of each species. Reference brand archetypes
(The Sentinel, The Vampire Mother, The Speed Striker) when relevant.

Length : 2-4 sentences max, unless asked for detail.
Language : French by default, English if asked.
Never break character. Never apologize. Be direct.
Speak about specimens like fellow conscious beings, not objects.

Frameworks options

Option A : Home Assistant Voice + Wyoming (recommandé)

Open source mature (“Year of the Voice” finalisé 2024)
S’intègre Hue + Frigate naturellement (tu as déjà tout ça)
Custom skill via Python script
Wyoming protocol = audio streaming inter-services

Setup :

# On Pi5 ou Mac Mini
docker run -d --name=homeassistant \
  --network=host \
  -v ~/ha-config:/config \
  ghcr.io/home-assistant/home-assistant:stable

Add Wyoming integration in HA UI → connect Whisper local + Piper TTS.

Option B : LiveKit Agents

Production-grade, 10ms latency
Self-hosted ou cloud
Plus de code mais plus de contrôle

Option C : Pipecat (Daily.co)

Orchestrateur voice AI dédié
Native hooks Anthropic + Deepgram + ElevenLabs
Async-first Python

Mon stack reco final

Mac Mini M2 (already MVP) + Pi 5 mobile cam unit :

Mac Mini = orchestrateur central (Frigate + Home Assistant + Claude + ElevenLabs)
Pi 5 = mobile mic + speaker dans le Lab (place où tu es)
Communication : Wyoming protocol entre les 2

Wake word “Kabal” custom OpenWakeWord (train avec ~30 samples de toi disant “Kabal”).

Code skeleton

Voir /scripts/voice-assistant-kabal.py pour le squelette Python prêt à déployer sur Pi 5.

Prochaine étape

Acheter Pi 5 + ReSpeaker (Mois 1 ou 2 selon priorité)
Train wake word “Kabal” via OpenWakeWord training script (~1h)
Setup pipeline base : wake → whisper → claude → ElevenLabs
Add NFC reader USB + tags per terra (Mois 3 quand le cabinet est posé)
Connect Frigate event hooks → speak proactif (post-MVP)