Build Along

A voice-guided assembly assistant prototype that turns manual-style instructions into short, structured steps and reads them aloud with ElevenLabs text-to-speech.

Active prototype: voice-guided assembly assistant

ElevenLabs, Text-to-speech, Python, Next.js, TypeScript, React

Overview

I built Build Along to explore how voice can make physical assembly tasks easier to follow hands-free. The prototype turns manual-style instructions into structured step cards that can be read aloud using ElevenLabs, with controls for moving through steps and replaying the current instruction.

Built

A step-card interface for moving through structured assembly instructions one step at a time.
Read-aloud and repeat-step controls for hands-free guidance.
A FastAPI backend endpoint that calls ElevenLabs text-to-speech and returns playable audio.
Static demo audio support for public portfolio deployments where live API usage is not required.
A deliberately small V1 that proves the core loop from instruction step to playable spoken guidance.
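
As a rough illustration of the step-card loop described above, the sketch below models steps as simple records with a cursor that supports next, previous, and repeat. The names `Step` and `StepDeck` and the sample instructions are hypothetical and are not taken from the project's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One assembly instruction shown on a card (hypothetical shape)."""
    number: int
    text: str


@dataclass
class StepDeck:
    """Holds the ordered steps plus a cursor for the card on screen."""
    steps: list[Step]
    index: int = 0

    def current(self) -> Step:
        # "Repeat step" simply re-reads the card at the cursor.
        return self.steps[self.index]

    def next(self) -> Step:
        # Clamp at the last step instead of wrapping around.
        self.index = min(self.index + 1, len(self.steps) - 1)
        return self.current()

    def previous(self) -> Step:
        # Clamp at the first step.
        self.index = max(self.index - 1, 0)
        return self.current()


# Sample data, invented for illustration.
deck = StepDeck([
    Step(1, "Lay out all panels and count the screws."),
    Step(2, "Attach the side panels to the base."),
    Step(3, "Slide the back panel into the groove."),
])
```

Clamping at both ends is one possible choice; it keeps a repeated "next" at the final step from looping back to the start mid-build.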

Engineering decisions

I kept V1 focused on one complete product loop: structured instruction step, UI card, backend speech endpoint, ElevenLabs audio generation, and playable output.
I treated the project as an assembly guidance prototype rather than a generic text-to-speech wrapper.
I separated the frontend and backend so the boundary between the UI, orchestration, and speech generation stays explicit.
I added static demo audio support so the project can be shown publicly without relying on live paid API calls.
I avoided PDF parsing, camera input, voice commands, authentication, and persistence in V1 so the core interaction could be tested quickly.
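
The static-audio decision can be pictured as a small switch at the speech boundary: in demo mode the UI gets a path to a pre-generated file, otherwise the backend prepares a live text-to-speech request. This is a sketch under assumptions: the function names, the `/demo-audio/` path, the placeholder voice ID, the default model ID, and the fallback to static audio when no key is set are all illustrative. The request shape follows the publicly documented ElevenLabs REST endpoint and should be verified against the current API reference. The request is only built here, not sent.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, api_key: str, voice_id: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Prepare (but do not send) a live text-to-speech request."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def audio_source(step_number: int, text: str, demo_mode: bool,
                 api_key: str = "", voice_id: str = "VOICE_ID_PLACEHOLDER"):
    """Return a static file path in demo mode, else a prepared live request."""
    if demo_mode or not api_key:
        # Hypothetical layout: one pre-generated MP3 per step.
        # Assumption: a missing key also falls back to static audio.
        return f"/demo-audio/step-{step_number}.mp3"
    return build_tts_request(text, api_key, voice_id)
```

In a FastAPI backend, the live branch would typically sit behind a route that sends this request and returns the audio bytes to the browser, which keeps the API key on the server side.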

Demonstrates

Product thinking around AI-assisted physical workflows.
Voice UI and text-to-speech integration.
FastAPI backend design.
Next.js frontend development.
API boundary design between frontend and backend.
Scoped MVP delivery.
Pragmatic handling of public demo constraints.

Trade-offs

Using ElevenLabs gives the prototype high-quality speech output quickly, but it introduces API cost and deployment considerations.
Using sample structured instructions keeps the prototype focused, but it avoids the harder problem of parsing arbitrary manuals.
Static demo audio makes the portfolio demo safer and cheaper to host, but it is less impressive than fully dynamic speech generation.
Keeping V1 small limits functionality, but it makes the project easier to finish, explain, and improve.

Next steps

Add voice commands for next, previous, repeat, and pause.
Introduce manual ingestion and structured step extraction.
Add richer step metadata such as tools, parts, warnings, and estimated duration.
Explore camera-based progress checks for visual confirmation during assembly.
Add a shareable QR-code flow where a product can link directly to its guided build experience.
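
To make two of the ideas above more concrete, here is a hedged sketch of what richer step metadata and a spoken-command mapping might look like. The field names, the extra synonyms ("back", "again", "stop"), and the `parse_command` helper are hypothetical. Speech recognition itself is out of scope and is assumed to deliver a plain text transcript.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RichStep:
    """A step card extended with optional metadata (hypothetical fields)."""
    number: int
    text: str
    tools: list[str] = field(default_factory=list)
    parts: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    estimated_minutes: Optional[float] = None

    def spoken_text(self) -> str:
        """Put warnings first so they are heard before the action."""
        prefix = "".join(f"Warning: {w} " for w in self.warnings)
        return f"{prefix}Step {self.number}. {self.text}"


# Spoken word -> action. Synonyms beyond the four core commands are assumptions.
COMMANDS = {
    "next": "next",
    "previous": "previous", "back": "previous",
    "repeat": "repeat", "again": "repeat",
    "pause": "pause", "stop": "pause",
}


def parse_command(transcript: str) -> Optional[str]:
    """Map a recognised transcript to an action, or None if nothing matches."""
    for word in transcript.lower().split():
        action = COMMANDS.get(word.strip(".,!?"))
        if action:
            return action
    return None
```

Keyword matching like this is deliberately naive; it returns the first known word in the transcript, so a real voice layer would likely need confidence thresholds and handling for phrases such as "not next".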