Overview
Current product scope, partner-evaluation path, validated metrics, and roadmap. M0 is complete. M1 platform work is complete and partner evaluations are active. M2 has just started.
Architecture
Three layers (two shipped)
- Infrastructure (complete): GPU phoneme alignment in ~20ms and word-boundary detection in <1ms. Vendor-neutral and already live in production.
- Signals (complete): acoustic scoring in 3.4ms (contract: <10ms). The batch product is live and public; streaming remains a selective beta and is not the lead product surface today.
- Intelligence (in development): custom ML models trained natively in Rust. L1-adaptive scoring and coaching-quality upgrades come next.
Today's public product scope is the alignment + signals wedge: reference-guided speech scoring for guided workflows. Intelligence is the expansion layer, not the claim we lead with.
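As a rough illustration of how the two shipped layers fit together, the sketch below assumes that alignment emits timed phoneme segments tagged with a word index (from word-boundary detection), and that the signals layer rolls per-phoneme scores up to word level. The types `PhonemeSegment` and `WordScore`, the 0 to 1 score scale, and the averaging rule are all hypothetical; this is not the platform's data model or scoring method.

```typescript
// HYPOTHETICAL infrastructure-layer output: one timed phoneme segment,
// tagged with the index of the word it belongs to (from word-boundary detection).
interface PhonemeSegment {
  phoneme: string;
  startMs: number;
  endMs: number;
  wordIndex: number;
  score: number; // hypothetical per-phoneme acoustic score in [0, 1]
}

// HYPOTHETICAL signals-layer output: one aggregate score per word.
interface WordScore {
  wordIndex: number;
  startMs: number;
  endMs: number;
  score: number;
}

// Group segments by word, then average their scores (an assumed rule,
// used here only to show where aggregation sits in the data flow).
function scoreWords(segments: PhonemeSegment[]): WordScore[] {
  const byWord = new Map<number, PhonemeSegment[]>();
  for (const seg of segments) {
    const group = byWord.get(seg.wordIndex) ?? [];
    group.push(seg);
    byWord.set(seg.wordIndex, group);
  }
  return Array.from(byWord.entries())
    .sort(([a], [b]) => a - b) // emit words in utterance order
    .map(([wordIndex, group]) => ({
      wordIndex,
      startMs: Math.min(...group.map((s) => s.startMs)),
      endMs: Math.max(...group.map((s) => s.endMs)),
      score: group.reduce((sum, s) => sum + s.score, 0) / group.length,
    }));
}
```

Averaging is only a placeholder; whatever aggregation the signals layer actually applies would occupy the same position between alignment output and word-level results.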
Performance
| Metric | Value | Notes |
|---|---|---|
| Pipeline latency (p50) | ~15ms | Target: <30ms |
| Acoustic scoring | 3.4ms | Full sentence; per-word: 165ns–822ns |
| Streaming cadence | 500–1000ms | P95 jitter: 55ms |
| Phoneme recognition rate (PRR) | 93–94.7% | Validated on L2-ARCTIC |
| Cost vs Azure (Q1 2026) | ~5x cheaper | $0.004/min vs ~$0.022/min |
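The p50 and P95 figures are percentiles over measured samples, and the cost row is a per-minute ratio. The sketch below shows one way to reproduce both kinds of figure from raw data. It assumes the nearest-rank percentile definition, which may differ from the estimator behind the published numbers, and the helper names `percentile` and `costUsd` are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are <= it. One common definition, assumed here for illustration.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Total cost in USD for a volume of audio minutes at a per-minute rate.
function costUsd(minutes: number, ratePerMinute: number): number {
  return minutes * ratePerMinute;
}
```

At the listed rates, 10,000 minutes of audio costs about $40 versus about $220, a ratio of 5.5 that the table rounds down to ~5x. Applying `percentile` to deviations from the expected streaming cadence would yield a P95 jitter figure, assuming jitter is defined that way.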
Developer experience
- TypeScript SDK (`@prosody/sdk`): full TypeScript types, runtime validation, automatic retries, and browser audio utilities. Score single recordings and batch guided workflows from one client; streaming remains a selective beta.
- REST API (simple JSON endpoints): `POST /v1/scores` for single recordings and `/v1/scores/batch` for up to 100 recordings at once. WebSocket streaming exists for real-time beta integrations, but batch is the default external evaluation path today.
- Playground (try without a key): upload or record audio in the browser. A batch-first proof of the product, with no signup required.
npm install @prosody/sdk
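For teams calling the REST API directly instead of through the SDK, the sketch below splits a large job into requests of at most 100 recordings (the documented `/v1/scores/batch` limit) and retries each request with exponential backoff. The `Recording` shape, the `sendBatch` callback, and the retry parameters are hypothetical; this is a generic pattern, not the SDK's retry policy or the API's request schema.

```typescript
// HYPOTHETICAL request item; the real /v1/scores/batch schema is not shown here.
interface Recording {
  id: string;
  audioBase64: string;
  referenceText: string;
}

// Documented per-request limit for /v1/scores/batch.
const MAX_BATCH_SIZE = 100;

// Split items into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Retry an async call, waiting baseDelayMs, then 2x, 4x, ... between attempts.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Send one batch request per chunk, sequentially. `sendBatch` is a
// caller-supplied (hypothetical) function, e.g. a fetch POST to /v1/scores/batch.
async function submitAll<R>(
  recordings: Recording[],
  sendBatch: (batch: Recording[]) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(recordings, MAX_BATCH_SIZE)) {
    results.push(await withRetries(() => sendBatch(batch)));
  }
  return results;
}
```

Injecting `sendBatch` keeps the helper transport-agnostic: in practice it would wrap an authenticated `fetch` call, and in tests it can be swapped for a stub.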
Roadmap
| Milestone | Focus | Status |
|---|---|---|
| M0 | Audio pipeline, alignment, and core scoring infrastructure | Done |
| M1 | Batch scoring platform, playground, and partner-evaluation foundation | Done |
| M2 | Custom model quality upgrades and L1-adaptive scoring | Just started |
| M3 | L1 detection, noise robustness, intelligibility | Planned |
| M4–M8 | Learner modeling, on-device inference, extended platform | Planned |
Here, M1 refers to the shipped batch platform and partner-evaluation foundation. Live design-partner evaluations and outbound outreach are running on top of that base.
About
Prosody Studio, LLC is a speech infrastructure company based in New York. We build the alignment + pronunciation signals layer for guided speech workflows: coaching, assessment, QA, and other products that need structured feedback rather than generic transcription.
The platform comprises 14 Rust crates and 4,100+ tests, spanning GPU alignment, scoring services, APIs, observability, and the TypeScript SDK. Founded January 2026.