Build Along
A voice-guided assembly assistant prototype that turns manual-style instructions into short, structured steps with ElevenLabs text-to-speech.
Active prototype: voice-guided assembly assistant
ElevenLabs · Text-to-speech · Python · Next.js · TypeScript · React
Overview
I built Build Along to explore how voice can make physical assembly tasks easier to follow hands-free. The prototype turns manual-style instructions into structured step cards that can be read aloud using ElevenLabs, with controls for moving through steps and replaying the current instruction.
Built
- A step-card interface for moving through structured assembly instructions one step at a time.
- Read-aloud and repeat-step controls for hands-free guidance.
- A FastAPI backend endpoint that calls ElevenLabs text-to-speech and returns playable audio.
- Static demo audio support for public portfolio deployments where live API usage is not required.
- A deliberately small V1 that proves the core loop from instruction step to playable spoken guidance.
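The core loop above (structured step, card navigation, text handed to speech) can be sketched in a few lines of Python. This is an illustrative model only: `Step`, `StepNavigator`, `spoken_text`, and the sample table steps are hypothetical names and data, not taken from the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One assembly instruction, short enough to read aloud in one go."""
    number: int
    title: str
    instruction: str


class StepNavigator:
    """Tracks the current step card and clamps movement at both ends."""

    def __init__(self, steps: list[Step]) -> None:
        if not steps:
            raise ValueError("at least one step is required")
        self._steps = steps
        self._index = 0

    @property
    def current(self) -> Step:
        return self._steps[self._index]

    def next(self) -> Step:
        # Stay on the last card rather than wrapping around.
        self._index = min(self._index + 1, len(self._steps) - 1)
        return self.current

    def previous(self) -> Step:
        # Stay on the first card rather than going negative.
        self._index = max(self._index - 1, 0)
        return self.current

    def repeat(self) -> Step:
        # Repeating does not move; it returns the same card for replay.
        return self.current


def spoken_text(step: Step) -> str:
    """Text that would be sent to the speech endpoint for a step."""
    return f"Step {step.number}. {step.title}. {step.instruction}"


# Hypothetical sample data standing in for manual-style instructions.
SAMPLE_STEPS = [
    Step(1, "Attach the legs", "Screw each leg into a corner bracket until snug."),
    Step(2, "Fit the crossbar", "Slide the crossbar between the two rear legs."),
    Step(3, "Mount the top", "Lower the tabletop onto the frame and tighten all bolts."),
]
```

Clamping at the first and last card keeps "next" and "previous" safe to trigger repeatedly, which matters when hands are busy and a control may be pressed more than once.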
Engineering decisions
- I kept V1 focused on one complete product loop: structured instruction step, UI card, backend speech endpoint, ElevenLabs audio generation, and playable output.
- I treated the project as an assembly guidance prototype rather than a generic text-to-speech wrapper.
- I separated the frontend and backend so the boundaries between the UI, orchestration, and speech generation stay explicit.
- I added static demo audio support so the project can be shown publicly without relying on live paid API calls.
- I avoided PDF parsing, camera input, voice commands, authentication, and persistence in V1 so the core interaction could be tested quickly.
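One way the static-demo decision could look in code is a single function that chooses between a pre-recorded clip and live generation. This is a framework-agnostic sketch under stated assumptions: `AudioResult`, `resolve_audio`, the `demo_mode` flag, and the clip paths are hypothetical, and the live speech call is injected as a plain callable rather than written against the ElevenLabs API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AudioResult:
    source: str             # "static" or "live"
    url: Optional[str]      # set when a pre-recorded demo clip is used
    audio: Optional[bytes]  # set when speech was generated on demand


def resolve_audio(
    step_number: int,
    text: str,
    *,
    demo_mode: bool,
    static_clips: dict[int, str],
    synthesize: Callable[[str], bytes],
) -> AudioResult:
    """Return a static clip in demo mode, otherwise generate speech live.

    `synthesize` stands in for whatever function wraps the paid
    text-to-speech call; it is never invoked in demo mode.
    """
    if demo_mode:
        url = static_clips.get(step_number)
        if url is None:
            raise LookupError(f"no demo clip recorded for step {step_number}")
        return AudioResult(source="static", url=url, audio=None)
    return AudioResult(source="live", url=None, audio=synthesize(text))
```

Injecting `synthesize` keeps the paid dependency behind one seam, so a public deployment can run with `demo_mode=True` and no API key, and the function can be tested without network access.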
Demonstrates
- Product thinking around AI-assisted physical workflows.
- Voice UI and text-to-speech integration.
- FastAPI backend design.
- Next.js frontend development.
- API boundary design between frontend and backend.
- Scoped MVP delivery.
- Pragmatic handling of public demo constraints.
Trade-offs
- Using ElevenLabs gives the prototype high-quality speech output quickly, but it introduces API cost and deployment considerations.
- Using sample structured instructions keeps the prototype focused, but it avoids the harder problem of parsing arbitrary manuals.
- Static demo audio makes the portfolio demo safer and cheaper to host, but it is less impressive than fully dynamic speech generation.
- Keeping V1 small limits functionality, but it makes the project easier to finish, explain, and improve.
Next
- Add voice commands for next, previous, repeat, and pause.
- Introduce manual ingestion and structured step extraction.
- Add richer step metadata such as tools, parts, warnings, and estimated duration.
- Explore camera-based progress checks for visual confirmation during assembly.
- Add a shareable QR-code flow where a product can link directly to its guided build experience.
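As a rough illustration of the voice-command item, the sketch below maps a speech-recognition transcript to one of the four planned commands. It assumes a transcript string is already available from some recognizer; the `Command` enum, the phrase table, and `parse_command` are hypothetical and only show the matching step, not audio capture.

```python
from __future__ import annotations

import re
from enum import Enum
from typing import Optional


class Command(Enum):
    NEXT = "next"
    PREVIOUS = "previous"
    REPEAT = "repeat"
    PAUSE = "pause"


# Hypothetical phrase table; a real one would be tuned against user testing.
_PHRASES: dict[Command, tuple[str, ...]] = {
    Command.NEXT: ("next", "continue"),
    Command.PREVIOUS: ("previous", "go back", "back"),
    Command.REPEAT: ("repeat", "again"),
    Command.PAUSE: ("pause", "stop", "wait"),
}


def parse_command(transcript: str) -> Optional[Command]:
    """Return the first command whose phrase appears as whole words."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    words = re.sub(r"[^a-z ]", " ", transcript.lower()).split()
    # Pad with spaces so "next" does not match inside "context".
    padded = " " + " ".join(words) + " "
    for command, phrases in _PHRASES.items():
        if any(f" {phrase} " in padded for phrase in phrases):
            return command
    return None
```

Returning `None` for unrecognized speech lets the interface ignore background talk instead of guessing, which seems the safer default while someone is holding tools.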