
RASPBERRY PI AI ASSISTANT

CODING · 3D PRINTING · 2026-03-22
ai · raspberry pi · zordon

Building Zordon: A Raspberry Pi AI Command Center That Talks Back

About a month ago, I set up an instance of OpenClaw on a Raspberry Pi 5 with a simple goal: a personal assistant to help manage email, calendar, fitness tracking, and real estate stuff. I named the main orchestrator agent "Zordon" and themed the whole thing around Power Rangers just for fun. What I didn't expect was how immediately useful it would be.

It didn't take long before I wanted a way to check on things at a glance without messaging my agent over Slack. So I dug out a spare 10" touchscreen, plugged it in, and had Zordon put together a basic dashboard. It took a matter of minutes. The dashboard worked great, but it wasn't exactly fun or exciting.

I wanted something my kids would actually want to interact with. Something alive. I decided to try turning it into a voice assistant with a 3D avatar: Zordon as a floating blue head in a tube, just like in the show. The only problem was I had no idea how to pull that off. I know my way around CAD software for mechanical design, but organic 3D modeling and animation? Totally different world.

After some searching, I got lucky and stumbled upon a free, rigged 3D model of a bald man that was close enough to work with. I used Blender to strip out the unnecessary parts (reducing the file size enough to run on a Pi) and shifted the skin texture to give him a blue tint, with my agent walking me through the process. Then came the software side.


The Tech Stack

The UI runs as a Node.js/Express server on the Pi. The frontend is built on Three.js with the TalkingHead.js library handling the 3D facial animation (lip sync, eye movement, expressions) all driven by audio playback. The GLB model is served locally and rendered in a fullscreen browser kiosk.

For speech recognition, I'm running a local whisper.cpp server on the Pi that handles voice-to-text transcription. For text-to-speech, I'm using Piper (a fast local TTS engine) with custom audio post-processing via FFmpeg: a slight pitch shift, tempo adjustment, and a short echo effect to give Zordon's voice a slightly otherworldly quality. There's a fallback to espeak-ng if Piper fails, and a TTS cache that pre-warms common phrases so greetings and fillers play back instantly.
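The voice post-processing can be sketched as a function that builds the FFmpeg argument list. This is a minimal illustration, not the actual implementation: the exact pitch ratio, tempo, and echo parameters here are my assumptions, chosen to show the shape of the `asetrate`/`atempo`/`aecho` filter chain described above.

```javascript
// Build FFmpeg args for Zordon's "otherworldly" voice treatment.
// Values (pitch ratio, tempo, echo timing) are illustrative assumptions.
function zordonVoiceArgs(inFile, outFile, { pitch = 0.9, tempo = 1.05 } = {}) {
  const filter = [
    `asetrate=22050*${pitch}`, // lower the pitch by resampling
    "aresample=22050",         // restore the original sample rate
    `atempo=${tempo}`,         // compensate the playback speed
    "aecho=0.8:0.7:40:0.25",   // short echo for the "tube" effect
  ].join(",");
  return ["-y", "-i", inFile, "-af", filter, outFile];
}
```

The args would then be passed to `ffmpeg` via `child_process.spawn`, with a fallback path that calls espeak-ng instead when Piper fails.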


The Tiered Routing System

One of the most important architectural decisions was how to handle different types of prompts without burning cloud API calls or introducing unnecessary latency. I ended up with a three-tier routing system:

  • Tier 1: Pi Local (instant): Simple rule-based responses handled entirely on the Pi with no model involved. Greetings, the current time/date, basic arithmetic, short conversational follow-ups. Zero latency.
  • Tier 2: LAN Ollama (fast): A Qwen 2.5 14B model running on a spare gaming PC on my local network handles general knowledge questions and longer conversations. It also knows to self-escalate: if a question requires real-time data (weather, email, etc.), it returns an escalation signal rather than hallucinating an answer.
  • Tier 3: OpenClaw / Claude (full capabilities): Anything that requires actual tooling (e.g. checking email, reading the calendar, adjusting the Philips Hue lights, looking up packages) gets routed through OpenClaw, which has access to all of those integrations. This tier is slower but handles the things that actually matter.
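The routing logic boils down to a cascade of cheap checks before anything touches a model. Here's a hedged sketch of that cascade; the rule patterns, the `NEEDS_TOOLS` keyword list, and the return shape are all illustrative assumptions rather than the real router:

```javascript
// Tier 1: rule-based instant replies, no model involved.
const TIER1_RULES = [
  { match: /^(hi|hello|hey)\b/i, reply: "Greetings. I am Zordon." },
  { match: /\btime\b/i, reply: () => new Date().toLocaleTimeString() },
];

// Prompts that need real tooling go straight to Tier 3 (OpenClaw).
const NEEDS_TOOLS = /\b(email|calendar|lights|package)\b/i;

function routePrompt(text) {
  for (const rule of TIER1_RULES) {
    if (rule.match.test(text)) {
      const reply = typeof rule.reply === "function" ? rule.reply() : rule.reply;
      return { tier: 1, reply };   // instant, handled on the Pi
    }
  }
  if (NEEDS_TOOLS.test(text)) return { tier: 3 }; // OpenClaw / Claude
  return { tier: 2 };                             // LAN Ollama by default
}
```

In practice Tier 2 can still bounce a prompt upward: if the Ollama model returns its escalation signal (e.g. for real-time data like weather), the request gets re-sent to Tier 3.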

The router also maintains conversation continuity: if you're mid-conversation with a Tier 2 or Tier 3 response, short follow-ups ("do it," "yes," "cancel") stay on that same tier instead of being re-routed. A separate Ollama vision model handles scene understanding: a background process captures camera frames at a set interval and sends them for analysis, so Zordon has a description of what's happening in the room as context for any conversation.
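The follow-up "stickiness" can be expressed as a single check that runs before the router. This is a sketch under my own assumptions: the three-word threshold and the 90-second window are invented here to illustrate the idea, not taken from the actual code.

```javascript
// If the reply is short and the last exchange was recent, stay on that tier.
// Thresholds (3 words, 90 s) are illustrative assumptions.
function stickyTier(text, last, now = Date.now()) {
  const isShort = text.trim().split(/\s+/).length <= 3;
  const recent = last && now - last.at < 90_000;
  if (isShort && recent) return last.tier; // "do it", "yes", "cancel"
  return null;                             // null → fall through to the router
}
```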

The Hardware Build

After a while, the setup still looked like a screen with wires hanging out of it and an exposed Pi mounted to the back. So I modeled a case in Autodesk Fusion and 3D printed it over a weekend. Having a proper enclosure made a huge difference in making it look like an actual product, and it's a lot more durable against my small children who had pressing questions to ask Zordon such as "6-7" and "do you poop?"


The camera mounted in the housing opened the door to a few more features:

  • Gaze detection: Zordon notices when someone approaches and delivers a greeting (with a cooldown so he doesn't say hello every 30 seconds)
  • Head tracking: The avatar moves its eyes and head to follow whoever is in frame
  • Scene context: The vision model describes the room and feeds that into Zordon's context window
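The greeting cooldown mentioned above is simple debounce logic. A minimal sketch, assuming a five-minute window (the real interval isn't something I've stated precisely, beyond "not every 30 seconds"):

```javascript
// Greet when a face appears, then stay quiet for the cooldown window.
// The 5-minute window is an illustrative assumption.
const COOLDOWN_MS = 5 * 60 * 1000;
let lastGreeting = 0;

function maybeGreet(faceDetected, now = Date.now()) {
  if (!faceDetected) return null;
  if (now - lastGreeting < COOLDOWN_MS) return null; // still cooling down
  lastGreeting = now;
  return "Welcome to my Command Center.";
}
```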

The last few features I added were more for fun than utility. There's an image generation mode powered by the OpenAI Images API where you describe what you want and it generates it, with an option to email the image directly to yourself. And I connected Spotify, which lets Zordon control music playback (complete with head bobbing/swaying to the tempo of whatever's playing).

What should I add next?

DANNY HINES
