Vocal Analysis — nefesh.ai Docs

Vocal Signals

Voice is a strong indicator of stress. Under stress, vocal pitch tends to rise, speech rate changes, and tonal quality shifts. Nefesh accepts pre-extracted vocal features and fuses them with other signals for a more complete picture.

Accepted Fields

Field	Type	Values / Unit	Description
`tone`	string	calm, focused, hesitant, tense, frustrated, anxious, hostile, excited, neutral	Classified vocal tone
`speech_rate`	float	words/min	Speaking speed; elevated rates may indicate agitation
`pitch_variability`	float	Hz	Variation in fundamental frequency; low variability can indicate monotone or suppressed speech
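
The table above can be turned into a client-side check before anything is sent. The following is a minimal sketch, not part of any Nefesh SDK: the function name `validate_vocal_fields` is hypothetical, and it assumes that every field is optional and that negative numbers are invalid, neither of which this page states.

```python
# Tone labels copied from the Accepted Fields table.
ALLOWED_TONES = frozenset({
    "calm", "focused", "hesitant", "tense", "frustrated",
    "anxious", "hostile", "excited", "neutral",
})

NUMERIC_FIELDS = ("speech_rate", "pitch_variability")


def validate_vocal_fields(fields: dict) -> dict:
    """Check the vocal fields that are present and return the dict unchanged.

    Hypothetical helper. Assumption: missing fields are allowed, so only
    the fields that are present get checked.
    """
    tone = fields.get("tone")
    if tone is not None and tone not in ALLOWED_TONES:
        raise ValueError(f"unknown tone: {tone!r}")

    for name in NUMERIC_FIELDS:
        value = fields.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number, got {type(value).__name__}")
        # Assumption: negative rates or variabilities are treated as invalid.
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")

    return fields
```

Calling `validate_vocal_fields({"tone": "angry"})` raises `ValueError`, because `angry` is not one of the listed tone values.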

How It Works

Nefesh does not process raw audio. You send pre-extracted vocal features (from your own speech analysis pipeline or a third-party service), and Nefesh fuses them with cardiovascular, visual, and textual signals.
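
As an illustration of what such a pipeline might compute, the sketch below derives the two numeric fields from values a speech analysis step would typically already have. It rests on assumptions: the function names are hypothetical, `pitch_variability` is taken to be the sample standard deviation of the fundamental frequency (this page does not define the exact measure), and a value of 0 Hz is taken to mark an unvoiced frame.

```python
import statistics
from typing import Sequence


def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Speaking speed in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return 60.0 * word_count / duration_seconds


def pitch_std_hz(f0_hz: Sequence[float]) -> float:
    """Sample standard deviation of voiced F0 values, in Hz.

    Assumption: standard deviation is used as the variability measure,
    and frames with F0 <= 0 are unvoiced and ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return statistics.stdev(voiced)
```

The tone classification itself would come from a separate classifier; it is not sketched here because its design depends entirely on the pipeline in use.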

Of the three vocal fields, `tone` has the strongest impact on the stress score. Values like tense, anxious, and hostile shift the score upward, while calm and neutral shift it downward.
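
The stated directions can be captured in a small lookup, for example to sanity-check results on the client side. This is only a sketch of the direction of the effect as described above; it does not reproduce Nefesh's fusion logic, the function name `tone_direction` is hypothetical, and tones whose direction this page does not state return `None`.

```python
from typing import Optional

# Directions taken from the paragraph above; no weights are documented.
RAISES_STRESS = frozenset({"tense", "anxious", "hostile"})
LOWERS_STRESS = frozenset({"calm", "neutral"})


def tone_direction(tone: str) -> Optional[int]:
    """Return +1 if the tone shifts the score upward, -1 if downward.

    Returns None when this page does not state a direction for the tone.
    """
    if tone in RAISES_STRESS:
        return 1
    if tone in LOWERS_STRESS:
        return -1
    return None
```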

Example Payload

{
  "session_id": "sess_abc123",
  "timestamp": "2026-03-30T14:30:00Z",
  "tone": "tense",
  "speech_rate": 168.5,
  "pitch_variability": 42.3
}
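
A payload of this shape could be assembled as follows. This is a minimal sketch: `build_payload` is a hypothetical helper, the timestamp format is inferred from the example above (UTC with a trailing `Z`), and the request itself is left out because this page does not document an endpoint or authentication scheme.

```python
import json
from datetime import datetime, timezone


def build_payload(session_id: str, tone: str, speech_rate: float,
                  pitch_variability: float, when: datetime) -> str:
    """Serialize vocal features to a JSON string shaped like the example."""
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Convert to UTC and format as in the example payload.
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    payload = {
        "session_id": session_id,
        "timestamp": timestamp,
        "tone": tone,
        "speech_rate": speech_rate,
        "pitch_variability": pitch_variability,
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    moment = datetime(2026, 3, 30, 14, 30, tzinfo=timezone.utc)
    print(build_payload("sess_abc123", "tense", 168.5, 42.3, moment))
```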

Privacy

No audio is sent to or stored by Nefesh. Only the extracted features (tone classification, speech rate, pitch variability) are transmitted. Raw audio stays on the client.
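
One way to enforce this on the client is to filter every outgoing record against an allowlist, so that raw audio or other local data cannot be transmitted by accident. The sketch below is an assumption-laden illustration: `strip_to_features` is a hypothetical helper, and the allowlist contains only the keys shown in the example payload on this page, so it would need extending for other signal types.

```python
# Keys taken from the example payload on this page.
ALLOWED_KEYS = frozenset({
    "session_id", "timestamp", "tone", "speech_rate", "pitch_variability",
})


def strip_to_features(record: dict) -> dict:
    """Return a copy of record containing only allowlisted keys."""
    return {key: value for key, value in record.items() if key in ALLOWED_KEYS}
```

For example, a record that carries both `tone` and a local `audio_bytes` buffer comes back with only `tone`.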