Architecture

Three layers (two shipped)

The three layers are alignment, signals, and intelligence. Today's public product scope is the alignment + signals wedge: reference-guided speech scoring for guided workflows. Intelligence is the expansion layer, not the claim we lead with.
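Reference-guided scoring presumably means the expected text, and therefore the expected phoneme sequence, is known in advance. The learner's recognized phonemes can then be aligned against that reference instead of being transcribed open-ended. The sketch below shows that comparison with a standard Levenshtein (edit-distance) alignment. The `alignPhonemes` function, the ARPAbet-style symbols, and the accuracy formula 1 − (S + D + I) / N are illustrative assumptions. They are not Prosody's actual scoring method or part of the `@prosody/sdk` API.

```typescript
// Hypothetical sketch: align recognized phonemes against a reference sequence.
// Names, phoneme symbols, and the accuracy formula are illustrative assumptions.

type Op = "match" | "sub" | "del" | "ins";

interface Alignment {
  ops: Op[]; // one operation per aligned position, in reference order
  accuracy: number; // 1 - (S + D + I) / N; can drop below 0 with many insertions
}

function alignPhonemes(reference: string[], hypothesis: string[]): Alignment {
  const n = reference.length;
  const m = hypothesis.length;
  // dp[i][j] = minimum edits turning reference[0..i) into hypothesis[0..j)
  const dp: number[][] = Array.from({ length: n + 1 }, (_, i) =>
    Array.from({ length: m + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = reference[i - 1] === hypothesis[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j - 1] + cost, // match or substitution
        dp[i - 1][j] + 1, // deletion: reference phoneme not spoken
        dp[i][j - 1] + 1, // insertion: extra phoneme spoken
      );
    }
  }
  // Backtrace from the bottom-right cell to recover the operations.
  const ops: Op[] = [];
  let i = n;
  let j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0) {
      const cost = reference[i - 1] === hypothesis[j - 1] ? 0 : 1;
      if (dp[i][j] === dp[i - 1][j - 1] + cost) {
        ops.push(cost === 0 ? "match" : "sub");
        i--;
        j--;
        continue;
      }
    }
    if (i > 0 && dp[i][j] === dp[i - 1][j] + 1) {
      ops.push("del");
      i--;
    } else {
      ops.push("ins");
      j--;
    }
  }
  ops.reverse();
  const errors = dp[n][m];
  const accuracy = n === 0 ? (m === 0 ? 1 : 0) : 1 - errors / n;
  return { ops, accuracy };
}
```

Take "think" (TH IH NG K) spoken as S IH NG K. The alignment is one substitution followed by three matches, which gives an accuracy of 0.75. Production pronunciation scoring typically works on acoustic scores rather than hard phoneme strings. This sketch covers only the alignment step.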

Performance

Metric                   Value         Notes
Pipeline latency (p50)   ~15 ms        Target: <30 ms
Acoustic scoring         3.4 ms        Full sentence; per word: 165–822 ns
Streaming cadence        500–1000 ms   P95 jitter: 55 ms
Phoneme recognition      93–94.7% PRR  Validated on L2-ARCTIC
Cost vs Azure (Q1 2026)  ~5x cheaper   $0.004/min vs ~$0.022/min
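The latency and jitter rows report percentile statistics over many measurements, and the cost row is per-minute arithmetic. The sketch below computes both. It assumes the nearest-rank percentile definition, since the page does not say which definition it uses. The helpers `percentile` and `audioCost` are hypothetical names, and the sample latencies in the comment are made up.

```typescript
// Sketch: percentile statistics and per-minute cost arithmetic.
// Assumes the nearest-rank percentile definition; helper names are hypothetical.

/** Nearest-rank percentile, p in (0, 100], over a set of samples (e.g. latencies in ms). */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // numeric, not lexicographic, sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank, always >= 1 here
  return sorted[rank - 1];
}

/** Cost in USD of processing `minutes` of audio at a per-minute price. */
function audioCost(minutes: number, usdPerMinute: number): number {
  return minutes * usdPerMinute;
}

// Made-up latency samples (ms): p50 is 15, p95 is 30.
const latencies = [12, 15, 14, 30, 16, 13, 15, 18, 14, 15];
console.log(percentile(latencies, 50), percentile(latencies, 95));
```

At the listed prices, 10,000 minutes of audio costs about $40 versus about $220. That is a ratio of 5.5, which the table rounds to ~5x.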
14 Rust crates
4,100+ tests
Batch playground live
Design-partner phase
$0 external speech APIs
@prosody/sdk available

Developer experience

npm install @prosody/sdk

Roadmap

Milestone  Status        Focus
M0         Done          Audio pipeline, alignment, and core scoring infrastructure
M1         Done          Batch scoring platform, playground, and partner-evaluation foundation
M2         Just started  Custom model quality upgrades and L1-adaptive scoring
M3         Planned       L1 detection, noise robustness, intelligibility
M4–M8      Planned       Learner modeling, on-device inference, extended platform

M1 covers the shipped batch platform and the partner-evaluation foundation. Live design-partner evaluations and outbound sales outreach are running on top of that base.

About

Prosody Studio, LLC is a speech infrastructure company based in New York. We build the alignment + pronunciation signals layer for guided speech workflows — coaching, assessment, QA, and products that need structured feedback rather than generic transcription.

The platform spans GPU alignment, scoring services, APIs, observability, and the TypeScript SDK, with 14 Rust crates and 4,100+ tests. Prosody Studio was founded in January 2026.

francois@prosody.studio