Build Along

A voice-guided assembly assistant prototype that turns manual-style instructions into short, structured steps with ElevenLabs text-to-speech.

Active prototype: voice-guided assembly assistant
ElevenLabs · Text-to-speech · Python · Next.js · TypeScript · React

Overview

I built Build Along to explore how voice can make physical assembly tasks easier to follow hands-free. The prototype turns manual-style instructions into structured step cards that can be read aloud using ElevenLabs, with controls for moving through steps and replaying the current instruction.

Built

  • A step-card interface for moving through structured assembly instructions one step at a time.
  • Read-aloud and repeat-step controls for hands-free guidance.
  • A FastAPI backend endpoint that calls ElevenLabs text-to-speech and returns playable audio.
  • Static demo audio support for public portfolio deployments where live API usage is not required.
  • A deliberately small V1 that proves the core loop from instruction step to playable spoken guidance.
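The step-card loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `Step` and `StepSession` names, the field names, and the sample instructions are all assumptions made up for the example. It shows one way to hold structured steps and clamp next/previous navigation so the user can never move past either end of the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One structured assembly instruction (field names are illustrative)."""
    number: int
    text: str


class StepSession:
    """Tracks the current step card and clamps navigation at both ends."""

    def __init__(self, steps: list[Step]) -> None:
        if not steps:
            raise ValueError("a session needs at least one step")
        self._steps = steps
        self._index = 0

    @property
    def current(self) -> Step:
        return self._steps[self._index]

    def next(self) -> Step:
        # Stay on the last step instead of running off the end.
        self._index = min(self._index + 1, len(self._steps) - 1)
        return self.current

    def previous(self) -> Step:
        # Stay on the first step instead of going negative.
        self._index = max(self._index - 1, 0)
        return self.current

    def repeat(self) -> Step:
        # Repeating does not move the cursor; it re-returns the same card.
        return self.current


# Made-up sample data standing in for manual-style instructions.
steps = [
    Step(1, "Attach the two side panels to the base."),
    Step(2, "Insert the four wooden dowels."),
    Step(3, "Slide the shelf into place."),
]
session = StepSession(steps)
```

Whatever `current` returns is the text that would be sent for read-aloud, so "repeat step" is simply a second request for the same card.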

Engineering decisions

  • I kept V1 focused on one complete product loop: structured instruction step, UI card, backend speech endpoint, ElevenLabs audio generation, and playable output.
  • I treated the project as an assembly guidance prototype rather than a generic text-to-speech wrapper.
  • I separated the frontend and backend so the boundaries between the UI, orchestration, and speech generation stay explicit.
  • I added static demo audio support so the project can be shown publicly without relying on live paid API calls.
  • I avoided PDF parsing, camera input, voice commands, authentication, and persistence in V1 so the core interaction could be tested quickly.
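The static-demo decision above can be illustrated with a small audio resolver. This is a sketch under stated assumptions, not the project's implementation: the `audio_for_step` function, the `step-<n>.mp3` file naming, and the `synthesize` callable are hypothetical. The live text-to-speech call is injected as a plain callable so that no ElevenLabs API details are assumed here; a public deployment would simply pass nothing and serve pre-generated clips.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical signature: takes instruction text, returns encoded audio bytes.
Synthesizer = Callable[[str], bytes]


def audio_for_step(
    step_number: int,
    text: str,
    demo_dir: Path,
    synthesize: Optional[Synthesizer] = None,
) -> bytes:
    """Return audio for a step.

    With a synthesizer, speech is generated live from the step text.
    Without one (demo mode), a pre-generated clip is read from demo_dir.
    """
    if synthesize is not None:
        return synthesize(text)

    # Assumed naming convention for static clips: step-1.mp3, step-2.mp3, ...
    clip = demo_dir / f"step-{step_number}.mp3"
    if not clip.is_file():
        raise FileNotFoundError(f"no demo clip for step {step_number}")
    return clip.read_bytes()
```

Keeping the choice in one function means the UI calls the same interface in both modes and never needs to know whether the audio was generated on demand.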

Demonstrates

  • Product thinking around AI-assisted physical workflows.
  • Voice UI and text-to-speech integration.
  • FastAPI backend design.
  • Next.js frontend development.
  • API boundary design between frontend and backend.
  • Scoped MVP delivery.
  • Pragmatic handling of public demo constraints.

Trade-offs

  • Using ElevenLabs gives the prototype high-quality speech output quickly, but it introduces API cost and deployment considerations.
  • Using sample structured instructions keeps the prototype focused, but it avoids the harder problem of parsing arbitrary manuals.
  • Static demo audio makes the portfolio demo safer and cheaper to host, but the public demo plays pre-generated clips and is less impressive than fully dynamic speech generation.
  • Keeping V1 small limits functionality, but it makes the project easier to finish, explain, and improve.

Next

  • Add voice commands for next, previous, repeat, and pause.
  • Introduce manual ingestion and structured step extraction.
  • Add richer step metadata such as tools, parts, warnings, and estimated duration.
  • Explore camera-based progress checks for visual confirmation during assembly.
  • Add a shareable QR-code flow where a product can link directly to its guided build experience.
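As a rough idea of how the planned voice commands might work once a speech recognizer supplies a transcript, here is a small keyword matcher. Everything in it is an assumption for illustration: the `Command` enum, the `parse_command` function, and the phrase table are hypothetical, and the speech-to-text step itself is out of scope.

```python
from enum import Enum
from typing import Optional


class Command(Enum):
    NEXT = "next"
    PREVIOUS = "previous"
    REPEAT = "repeat"
    PAUSE = "pause"


# Hypothetical phrase table; a real one would come from testing with users.
_PHRASES = {
    "next": Command.NEXT,
    "continue": Command.NEXT,
    "previous": Command.PREVIOUS,
    "back": Command.PREVIOUS,
    "repeat": Command.REPEAT,
    "again": Command.REPEAT,
    "pause": Command.PAUSE,
    "stop": Command.PAUSE,
}


def parse_command(transcript: str) -> Optional[Command]:
    """Return the first recognized command word in a transcript, else None."""
    for word in transcript.lower().split():
        # Strip trailing punctuation that a recognizer may attach to words.
        command = _PHRASES.get(word.strip(".,!?"))
        if command is not None:
            return command
    return None
```

Matching on the first recognized keyword is deliberately simple; it would misfire on phrases like "don't go back", so a real version would likely need confirmation or a stricter grammar.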