AI & Machine Learning

How I Built GM-Genie: A Cinematic AI Game Master with Gemini Live API

Ali Nemati · 6 hours ago · 47 sec read

GM-Genie uses a combination of server-side and client-side processing to create an immersive audio experience for text-based games. Key components include:

A custom model serving API that handles concurrent requests from multiple clients.
Real-time speech-to-text via the Gemini Live API, with continuous audio capture and no noise gate on the client side.
Dynamic sound effects, fetched in real time from the Freesound API based on game context and cached locally for reuse.
An audio pipeline that captures raw PCM audio at 16 kHz and batches it before sending it to the server.
A scene detector on the server that triggers events like sound changes or text updates based on transcript analysis.
A dynamic story arc system that evolves through phases, generating encounter seeds tailored to the current phase of the larger narrative.
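
The summary does not say how the model-serving API copes with many clients at once. One common pattern, shown here as a hedged sketch rather than the author's implementation, is an asyncio server that accepts every request concurrently but caps how many model calls are in flight with a semaphore. The class name BoundedModelServer, the injected model_call coroutine, and the max_concurrent limit are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


class BoundedModelServer:
    """Hypothetical sketch: serve many clients, cap concurrent model calls."""

    def __init__(self, model_call: Callable[[str], Awaitable[str]],
                 max_concurrent: int = 4) -> None:
        # model_call stands in for whatever backend actually runs the model.
        self._model_call = model_call
        # Create the server inside a running event loop (see usage note).
        self._slots = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def handle(self, prompt: str) -> str:
        # Requests beyond max_concurrent wait here for a free slot.
        async with self._slots:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await self._model_call(prompt)
            finally:
                self.in_flight -= 1
```

Construct the server inside the coroutine passed to asyncio.run so the semaphore belongs to the running loop; clients then call handle() concurrently, for example via asyncio.gather.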
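
Fetching sound effects from Freesound and caching them locally amounts to a read-through cache. The sketch below assumes a disk cache keyed by a hash of the normalized search query. The Freesound request itself is left as an injected, hypothetical fetch callable, because the summary does not show how GM-Genie queries the API. SoundEffectCache and the ".audio" file suffix are illustrative names only.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

# Hypothetical fetcher: search query -> raw audio bytes, or None if no match.
# In GM-Genie this would wrap a Freesound API request (not shown in the source).
Fetcher = Callable[[str], Optional[bytes]]


class SoundEffectCache:
    """Read-through disk cache for sound effects, keyed by search query."""

    def __init__(self, cache_dir: Path, fetch: Fetcher) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch

    def _path_for(self, query: str) -> Path:
        # Normalize so "Thunder " and "thunder" share one cache entry.
        key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.audio"

    def get(self, query: str) -> Optional[bytes]:
        path = self._path_for(query)
        if path.exists():
            return path.read_bytes()      # cache hit: no network call
        data = self.fetch(query)          # cache miss: fetch remotely
        if data is not None:
            path.write_bytes(data)        # store for later reuse
        return data
```

Injecting the fetcher keeps the caching logic testable offline and lets the real network call, rate limiting, and API key handling live elsewhere.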
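
Batching 16 kHz PCM before upload reduces the number of network messages. Assuming 16-bit mono samples, 100 ms of audio is 16,000 × 0.1 = 1,600 samples, or 3,200 bytes. The sketch below accumulates arbitrarily sized capture chunks into batches of that size. The 16-bit sample width and the 100 ms batch length are assumptions; the summary only states the 16 kHz rate.

```python
from typing import Iterable, Iterator

SAMPLE_RATE_HZ = 16_000  # capture rate stated in the summary
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed mono PCM
BATCH_MS = 100           # hypothetical batch duration


def batch_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     batch_ms: int = BATCH_MS) -> int:
    """Number of PCM bytes that make up one batch."""
    samples = sample_rate_hz * batch_ms // 1000
    return samples * bytes_per_sample


def batch_pcm(chunks: Iterable[bytes], batch_bytes: int) -> Iterator[bytes]:
    """Re-chunk raw PCM into fixed-size batches.

    A trailing partial batch is flushed at the end so no audio is dropped.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= batch_bytes:
            yield bytes(buffer[:batch_bytes])
            del buffer[:batch_bytes]
    if buffer:
        yield bytes(buffer)
```

For example, seven 1,000-byte capture chunks become batches of 3,200, 3,200, and 600 bytes. Keeping the batch size a multiple of the sample width means no batch boundary splits a sample.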
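
The summary describes a server-side scene detector that turns transcript analysis into events such as sound changes or text updates, without saying how the analysis works. As a simple stand-in, the sketch below matches keyword rules against each transcript and emits an event only when the value for that event kind actually changes, so repeated mentions do not retrigger the same sound. The rules, the SceneEvent type, and the "sound_change"/"text_update" kind names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneEvent:
    kind: str     # hypothetical kinds: "sound_change" or "text_update"
    payload: str  # e.g. a sound-effect query or a scene label


# Hypothetical rules: regex over the transcript -> event to emit.
RULES: list[tuple[re.Pattern[str], SceneEvent]] = [
    (re.compile(r"\b(storm|thunder|lightning)\b", re.I),
     SceneEvent("sound_change", "thunderstorm")),
    (re.compile(r"\b(tavern|inn)\b", re.I),
     SceneEvent("sound_change", "tavern ambience")),
    (re.compile(r"\b(draw|attack|fight)s?\b", re.I),
     SceneEvent("text_update", "combat")),
]


class SceneDetector:
    def __init__(self, rules: list[tuple[re.Pattern[str], SceneEvent]] = RULES) -> None:
        self.rules = rules
        self.last: dict[str, str] = {}  # latest payload per event kind

    def process(self, transcript: str) -> list[SceneEvent]:
        """Return events triggered by this transcript, skipping repeats."""
        events = []
        for pattern, event in self.rules:
            if pattern.search(transcript) and self.last.get(event.kind) != event.payload:
                self.last[event.kind] = event.payload
                events.append(event)
        return events
```

A production detector might use an LLM or classifier instead of regexes; the de-duplication by event kind applies either way.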
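
A story arc that evolves through phases and produces phase-appropriate encounter seeds can be modeled as a small state machine. The sketch below uses a fixed phase list and string templates as stand-ins for however GM-Genie actually generates seeds (which the summary does not describe). The phase names, SEED_TEMPLATES, StoryArc, and the encounters_per_phase pacing rule are all hypothetical.

```python
import random
from dataclasses import dataclass, field

# Hypothetical phases and seed templates; the summary does not list them.
PHASES = ["setup", "rising_action", "climax", "resolution"]

SEED_TEMPLATES: dict[str, list[str]] = {
    "setup": ["a stranger asks the party for help with {hook}"],
    "rising_action": ["agents of {villain} ambush the party",
                      "a clue about {villain} surfaces"],
    "climax": ["the party confronts {villain} at last"],
    "resolution": ["the town reacts to the fall of {villain}"],
}


@dataclass
class StoryArc:
    villain: str
    hook: str
    encounters_per_phase: int = 2  # hypothetical pacing rule
    phase_index: int = 0
    encounters_in_phase: int = 0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    @property
    def phase(self) -> str:
        return PHASES[self.phase_index]

    def next_encounter_seed(self) -> str:
        """Draw a seed for the current phase, then advance if the phase is done."""
        template = self.rng.choice(SEED_TEMPLATES[self.phase])
        seed = template.format(villain=self.villain, hook=self.hook)
        self.encounters_in_phase += 1
        # The final phase is sticky: the arc never advances past it.
        if (self.encounters_in_phase >= self.encounters_per_phase
                and self.phase_index < len(PHASES) - 1):
            self.phase_index += 1
            self.encounters_in_phase = 0
        return seed
```

The seeded random.Random keeps runs reproducible for testing; the resulting seed string would then be handed to the model as a prompt hint rather than read out verbatim.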

Read the full article at DEV Community

