GM-Genie uses a combination of server-side and client-side processing to create an immersive audio experience for text-based games. Key components include:
- A custom model serving API that handles concurrent requests from multiple clients.
- Real-time speech-to-text through the Gemini Live API, with continuous capture and no client-side noise gate.
- Dynamic sound effects, fetched in real time from the Freesound API based on game context and cached locally for reuse.
- An audio pipeline that captures raw PCM data at 16 kHz and batches it before sending it to the server.
- A scene detector on the server that triggers events like sound changes or text updates based on transcript analysis.
- A dynamic story arc system that evolves through phases, generating encounter seeds tailored to the current phase of the larger narrative.
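The capture-and-batch step in the audio pipeline can be illustrated with a minimal sketch. Only the 16 kHz sample rate comes from the description above. The 16-bit mono sample format, the 100 ms batch duration, and the `PcmBatcher` name are assumptions made for illustration, not details of GM-Genie itself.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000  # stated in the component list
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
BATCH_MS = 100           # assumption: the batch duration is not specified


class PcmBatcher:
    """Accumulates raw PCM bytes and releases fixed-size batches (hypothetical helper)."""

    def __init__(self, batch_ms: int = BATCH_MS) -> None:
        # Bytes per batch = samples per second * bytes per sample * seconds per batch.
        self.batch_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * batch_ms // 1000
        self._buffer = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a captured chunk and return every batch that is now complete."""
        self._buffer.extend(chunk)
        batches: List[bytes] = []
        while len(self._buffer) >= self.batch_bytes:
            batches.append(bytes(self._buffer[: self.batch_bytes]))
            del self._buffer[: self.batch_bytes]
        return batches

    def flush(self) -> bytes:
        """Return whatever is left (possibly empty) when capture stops."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

Each batch returned by `push` would then be handed to whatever transport carries audio to the server. Fixed-size batches trade a small amount of latency for fewer, larger messages.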
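The scene detector and the local sound cache might fit together roughly as follows. This is a sketch under stated assumptions: the description says only that scenes are detected through transcript analysis, so the keyword table, the `detect_scene`, `SoundCache`, and `SceneTracker` names, and the `"<scene> ambience"` query format are all hypothetical. The fetch function is injected rather than written against the real Freesound API, so that no API call is invented here.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical keyword table; the real detection method is not described.
SCENE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "tavern": ("tavern", "inn", "ale"),
    "forest": ("forest", "woods", "trees"),
    "combat": ("attack", "sword", "initiative"),
}


def detect_scene(transcript: str) -> Optional[str]:
    """Return the scene with the most keyword hits, or None if nothing matches."""
    words = re.findall(r"[a-z']+", transcript.lower())
    best, best_hits = None, 0
    for scene, keywords in SCENE_KEYWORDS.items():
        hits = sum(words.count(k) for k in keywords)
        if hits > best_hits:
            best, best_hits = scene, hits
    return best


class SoundCache:
    """Keeps fetched sounds in memory so repeated queries skip the network."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # injected stand-in for a sound-effect API client
        self._store: Dict[str, bytes] = {}
        self.fetch_count = 0

    def get(self, query: str) -> bytes:
        if query not in self._store:
            self._store[query] = self._fetch(query)
            self.fetch_count += 1
        return self._store[query]


class SceneTracker:
    """Emits a sound only when the detected scene actually changes."""

    def __init__(self, cache: SoundCache) -> None:
        self.cache = cache
        self.current: Optional[str] = None

    def on_transcript(self, text: str) -> Optional[bytes]:
        scene = detect_scene(text)
        if scene is None or scene == self.current:
            return None  # no match or same scene: no event
        self.current = scene
        return self.cache.get(f"{scene} ambience")
```

Returning to a scene that was visited earlier is served from the cache, so the injected fetch function runs once per distinct query.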
Read the full article at DEV Community





