Prosody API (Public)
Consumer-facing endpoints only: health, scoring, alignment, languages, auth helpers, preview history/results, and beta streaming. Internal admin and debug routes are excluded.
Reference-guided speech feedback for guided workflows. Send audio + expected text, get aligned words, phonemes, and structured signals back.
Score a guided speech recording in under 60 seconds.
# Score a recording with curl
# tr strips the line breaks that some base64 implementations insert
curl -X POST https://api.prosody.studio/v1/scores \
  -H "X-API-Key: $PROSODY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_data": "'"$(base64 < recording.wav | tr -d '\n')"'",
    "sample_rate": 16000,
    "language": "en-US",
    "reference_text": "The quick brown fox"
  }'
// TypeScript SDK
import { readFileSync } from "node:fs";
import { ProsodyClient } from "@prosody/sdk";

const client = new ProsodyClient({ apiKey: process.env.PROSODY_API_KEY });

// Read the recording from disk and send it as base64
const result = await client.score({
  audio: readFileSync("recording.wav").toString("base64"),
  language: "en-US",
  referenceText: "The quick brown fox"
});
# Score a recording with Python
import base64
import os

import requests

# Read the API key from the environment
PROSODY_API_KEY = os.environ["PROSODY_API_KEY"]

with open("recording.wav", "rb") as f:
    audio = base64.b64encode(f.read()).decode()

resp = requests.post(
    "https://api.prosody.studio/v1/scores",
    headers={"X-API-Key": PROSODY_API_KEY},
    json={
        "audio_data": audio,
        "sample_rate": 16000,
        "language": "en-US",
        "reference_text": "The quick brown fox",
    },
)
result = resp.json()
{
"scores": {
"pronunciation": 72.4,
"script_adherence": 100.0,
"overall": 72.4
},
"words": [
{
"word": "the",
"status": "match",
"acoustic_match": 68.1,
"timing": { "start": 0.12, "end": 0.24, "duration_ms": 120 },
"phonemes": [
{ "detected": "DH", "acoustic_match": 71.2, "timing": { "start": 0.12, "end": 0.18 } },
{ "detected": "AH", "acoustic_match": 65.0, "timing": { "start": 0.18, "end": 0.24 } }
]
}
]
}
Works with any HTTP client. Public examples here use English (en-US) today. See the full POST /v1/scores reference below, or try the playground, no key required.
Import the public Postman collection, pair it with the production environment, set api_key, and start with GET /health and POST /v1/scores.
The collection is preconfigured for https://api.prosody.studio. Set api_key, and the collection script injects the X-API-Key header automatically.
Recommended consumer flow: use api_key for external evaluation. JWT login remains available for local/dev and user-session testing. WebSocket streaming is available but should be treated as beta in Postman. Stored result lookup is a manual flow that uses score_result_id, which is distinct from the request_id in the scoring response. GET /v1/history and GET /v1/results are preview endpoints and currently return mock data.
Prosody aligns every phoneme in the speaker's audio against a reference text. The alignment engine runs on GPU and produces per-phoneme timing boundaries in ~20ms. This is the foundation layer — it tells the system what was said and when.
On top of alignment, Prosody generates acoustic scores that measure how well each phoneme was pronounced. Scores use a 0–100 scale across three perspectives:
All scoring requires a reference_text: the sentence the learner was asked to read. The system aligns what was spoken against what was expected, then scores the match. This is the core pattern for guided speech products: the user sees a prompt, records audio, and the product needs aligned feedback in return.
Single — Score one recording at a time. Best for feedback after a speaker finishes talking.
Batch — Score up to 100 recordings in a single request. Best for grading homework sets, running test suites, or processing guided speech sessions at scale.
Streaming — Send audio over WebSocket and receive word-by-word scores as the learner speaks. 500–1000ms tick cadence. Best for live coaching interfaces. The public product still leads with batch; streaming is available in beta.
Base URL: https://api.prosody.studio
All API requests use HTTPS. HTTP requests are rejected.
Authenticate requests with an API key via the X-API-Key header.
curl -X POST https://api.prosody.studio/v1/scores \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '...'
Trial mode is available without a key — 10 requests per day per IP. Try the playground to test without signing up.
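The difference between keyed and trial requests can be sketched as a small header builder. This is a minimal illustration, assuming trial mode means simply omitting the X-API-Key header; `build_headers` is a hypothetical helper name, not part of the Prosody SDK.

```python
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Build request headers for the scoring API.

    Assumption: a trial-mode request is one sent without X-API-Key.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Authenticated request: attach the key in the documented header.
        headers["X-API-Key"] = api_key
    return headers


trial_headers = build_headers()
keyed_headers = build_headers("your_api_key")
```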
Score a single audio recording against a reference text.
{
"audio_data": "<base64-encoded audio>",
"sample_rate": 16000,
"language": "en-US",
"reference_text": "The quick brown fox"
}
| Field | Type | Required | Description |
|---|---|---|---|
| audio_data | string | Yes | Base64-encoded audio data |
| sample_rate | integer | Yes | Audio sample rate in Hz (e.g. 16000) |
| language | string | Yes | Language code: en-US |
| reference_text | string | Yes | The expected text the speaker should have read |
| Parameter | Type | Default | Description |
|---|---|---|---|
| detail | string | standard | Response detail level: summary, standard, or full |
| silence_threshold_ms | integer | 120 | Silence gap (ms) for word boundary detection |
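As a rough sketch, the required fields can be assembled from raw audio bytes as follows. `build_score_payload` is a hypothetical helper name used only for illustration, and the validation it performs is an assumption about what a client might want to check before sending, not documented server behavior. The optional parameters are left out because this page does not state how they are passed.

```python
import base64


def build_score_payload(audio_bytes: bytes, sample_rate: int,
                        language: str, reference_text: str) -> dict:
    """Return a JSON-serializable body with the four required fields."""
    if not audio_bytes:
        raise ValueError("audio_bytes is empty")
    if not reference_text.strip():
        raise ValueError("reference_text is required")
    return {
        # The API expects base64 text, not raw bytes.
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "sample_rate": sample_rate,
        "language": language,
        "reference_text": reference_text,
    }


# Two placeholder bytes stand in for real audio here.
payload = build_score_payload(b"\x00\x01", 16000, "en-US", "The quick brown fox")
```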
{
"scores": {
"pronunciation": 72.4,
"script_adherence": 100.0,
"overall": 72.4
},
"words": [
{
"word": "the",
"status": "match",
"acoustic_match": 68.1,
"timing": { "start": 0.12, "end": 0.24, "duration_ms": 120 },
"phonemes": [
{
"expected": "DH",
"detected": "DH",
"acoustic_match": 71.2
},
{
"expected": "AH",
"detected": "AH",
"acoustic_match": 65.0
}
]
}
]
}
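One way a client might use this response is to surface the phonemes that scored lowest. The sketch below walks the words and phonemes arrays shown above; the function name and the cutoff of 70 are assumptions chosen for illustration, not recommended values.

```python
def weakest_phonemes(response: dict, threshold: float = 70.0) -> list:
    """Return (word, expected phoneme, score) tuples below the threshold, lowest first."""
    flagged = []
    for word in response.get("words", []):
        for phoneme in word.get("phonemes", []):
            score = phoneme.get("acoustic_match")
            if score is not None and score < threshold:
                # Fall back to the detected label if no expected label is present.
                label = phoneme.get("expected", phoneme.get("detected"))
                flagged.append((word["word"], label, score))
    return sorted(flagged, key=lambda item: item[2])


# The sample response from this section, trimmed to the fields used here.
sample = {
    "words": [
        {
            "word": "the",
            "phonemes": [
                {"expected": "DH", "detected": "DH", "acoustic_match": 71.2},
                {"expected": "AH", "detected": "AH", "acoustic_match": 65.0},
            ],
        }
    ]
}

flagged = weakest_phonemes(sample)
```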
SDK equivalent: client.score()
Score multiple recordings in a single request (up to 100 items).
{
"items": [
{
"item_id": "sentence-1",
"audio_data": "<base64>",
"sample_rate": 16000,
"language": "en-US",
"reference_text": "Hello world"
}
],
"parallel": true,
"max_concurrency": 4
}
{
"success_count": 1,
"failure_count": 0,
"total_time_ms": 1240,
"results": [
{
"item_id": "sentence-1",
"success": true,
"result": { /* same as POST /v1/scores response */ }
}
]
}
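Because a batch is capped at 100 items, larger workloads need to be split on the client side. The following is a minimal sketch of that split, plus a helper that collects the item_id of each failed result; both function names are hypothetical, and the parallel and max_concurrency options are left out for brevity.

```python
def chunk_items(items: list, max_items: int = 100) -> list:
    """Split items into batch request bodies of at most max_items each."""
    return [
        {"items": items[start:start + max_items]}
        for start in range(0, len(items), max_items)
    ]


def failed_item_ids(batch_response: dict) -> list:
    """Collect the item_id of every result whose success flag is false."""
    return [
        result["item_id"]
        for result in batch_response.get("results", [])
        if not result.get("success")
    ]


# 250 placeholder items become three request bodies.
items = [{"item_id": f"sentence-{i}"} for i in range(250)]
bodies = chunk_items(items)
sizes = [len(body["items"]) for body in bodies]
```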
SDK equivalent: client.scoreBatch()
Stream audio for real-time scoring over a persistent WebSocket connection. Results arrive as words are recognized, with a 500–1000ms tick cadence.
wss://api.prosody.studio/v1/stream?language=en-US&reference_text=The+quick+brown+fox
Send {"type":"end"} to signal end of audio.
SDK equivalent: client.stream(). Streaming scoring is in beta. Contact us for access and integration guidance.
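The connection URL carries the language and reference text as query parameters, so the reference text has to be URL-encoded. A minimal sketch using only the standard library follows; `build_stream_url` is a hypothetical helper name, and opening the WebSocket itself is out of scope here.

```python
from urllib.parse import urlencode

STREAM_BASE = "wss://api.prosody.studio/v1/stream"


def build_stream_url(language: str, reference_text: str) -> str:
    """Return the streaming URL with both query parameters encoded."""
    # urlencode encodes spaces as "+", matching the example URL above.
    query = urlencode({"language": language, "reference_text": reference_text})
    return f"{STREAM_BASE}?{query}"


url = build_stream_url("en-US", "The quick brown fox")
```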
| Field | Type | Description |
|---|---|---|
| pronunciation | float | Goodness of Pronunciation (GOP) quality for words matched to the script. 0–100 scale. |
| script_adherence | float | How closely the speaker followed the reference text. 100 means all expected words were detected. 0–100 scale. |
| overall | float | Combined score from acoustic match and script adherence. 0–100 scale. |
| Field | Type | Description |
|---|---|---|
| word | string | The expected word from the reference text |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| status | string | match, mismatch, or missing |
| acoustic_match | float | Per-word acoustic match score. 0–100 scale. |
| phonemes | array | Per-phoneme detail (see below) |
| Field | Type | Description |
|---|---|---|
| expected | string | Expected phoneme (ARPAbet notation, e.g. DH) |
| detected | string | Detected phoneme (ARPAbet notation, e.g. DH) |
| acoustic_match | float | Per-phoneme acoustic match score. 0–100 scale. |
| start | float | Start time in seconds |
| end | float | End time in seconds |
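The status field lends itself to a quick summary of which expected words were matched, mispronounced, or skipped. The sketch below groups words by the three documented status values; the function name and the sample words are made up for illustration and are not real API output.

```python
def words_by_status(words: list) -> dict:
    """Group expected words by status: match, mismatch, or missing."""
    grouped = {"match": [], "mismatch": [], "missing": []}
    for entry in words:
        # setdefault keeps the sketch tolerant of any status not listed above.
        grouped.setdefault(entry["status"], []).append(entry["word"])
    return grouped


# Illustrative data only.
example_words = [
    {"word": "the", "status": "match"},
    {"word": "quick", "status": "mismatch"},
    {"word": "brown", "status": "missing"},
    {"word": "fox", "status": "match"},
]

summary = words_by_status(example_words)
```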
Audio is sent as base64-encoded data in the audio_data field.
| Format | Supported | Notes |
|---|---|---|
| WAV (PCM 16-bit) | Recommended | Lossless, best quality |
| WebM (Opus) | Yes | Browser recording default |
| MP3 | Yes | Auto-detected and converted |
| FLAC | Yes | Lossless compressed |
| OGG (Vorbis) | Yes | Auto-detected and converted |
All audio is internally converted to 16kHz mono PCM for alignment. Providing 16kHz mono WAV avoids conversion overhead.
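For clients that already hold raw samples, the recommended format can be produced with Python's standard wave module. This is a minimal sketch assuming the input is 16-bit mono PCM already sampled at 16 kHz; it wraps the samples in a WAV container and does not resample. Both helper names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def wav_info(wav_bytes: bytes) -> tuple:
    """Return (channels, sample width in bytes, sample rate) from a WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


# One second of silence: 16000 samples of two zero bytes each.
wav_bytes = pcm_to_wav_bytes(b"\x00\x00" * 16000)
info = wav_info(wav_bytes)
```

Reading the header with `wav_info` before upload is also a simple way to fill in the sample_rate field correctly.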
Errors return a JSON body with error and message fields.
{
"error": "bad_request",
"message": "reference_text is required"
}
| Status | Error | Description |
|---|---|---|
| 400 | bad_request | Invalid audio, missing reference_text, or bad parameters |
| 401 | unauthorized | Missing or invalid API key (not applicable in trial mode) |
| 429 | rate_limit_exceeded | Too many requests. Check the retry-after header |
| 500 | internal | Server error during scoring |
| 503 | service_unavailable | Alignment engine temporarily unavailable |
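A client can turn the error body into a typed exception so callers can branch on the status. The sketch below is one possible shape; `ProsodyAPIError` is a hypothetical class, and treating only 429 and 503 as retryable is an assumption for illustration, not documented guidance.

```python
class ProsodyAPIError(Exception):
    """Hypothetical exception carrying the documented error and message fields."""

    def __init__(self, status: int, error: str, message: str):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message

    @property
    def retryable(self) -> bool:
        # Assumption: rate limiting and temporary unavailability may succeed later.
        return self.status in (429, 503)


def error_from_response(status: int, body: dict) -> ProsodyAPIError:
    """Build an exception from an HTTP status and the parsed JSON error body."""
    return ProsodyAPIError(status, body.get("error", "unknown"), body.get("message", ""))


err = error_from_response(
    400, {"error": "bad_request", "message": "reference_text is required"}
)
```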
| Tier | Limit | Auth |
|---|---|---|
| Trial | 10 requests / day | None (IP-based) |
| Developer | 60 requests / min | API key |
| Growth | Custom | API key |
Rate limit status is returned in response headers:
| Header | Description |
|---|---|
| x-ratelimit-limit | Maximum requests allowed in the current window |
| x-ratelimit-remaining | Requests remaining in the current window |
| x-ratelimit-reset | Unix timestamp when the window resets |
| retry-after | Seconds to wait (only present on 429 responses) |
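These headers are enough to compute how long to back off after a 429. A minimal sketch follows, assuming a client prefers retry-after when present and otherwise falls back to x-ratelimit-reset; the function name and the fallback order are illustrative choices.

```python
import time
from typing import Optional


def seconds_until_retry(headers: dict, now: Optional[float] = None) -> float:
    """Return how many seconds to wait before retrying, based on response headers."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "retry-after" in lowered:
        return max(0.0, float(lowered["retry-after"]))
    if "x-ratelimit-reset" in lowered:
        current = time.time() if now is None else now
        # x-ratelimit-reset is a Unix timestamp; wait until it has passed.
        return max(0.0, float(lowered["x-ratelimit-reset"]) - current)
    return 0.0


wait = seconds_until_retry({"Retry-After": "12"})
```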
The official TypeScript SDK (@prosody/sdk) wraps the Prosody HTTP API with full type safety, automatic retries, Zod schema validation, and browser audio utilities.
Install it directly, or use the HTTP API if you want a language-agnostic integration.
npm install @prosody/sdk
For GDPR-specific questions or a Data Processing Agreement, contact francois@prosody.studio.