MAITRI


Data acquisition pipelines

Live view of the audio and video feature vectors fed into the emotion models. Everything runs on-device.

Audio · A_feat
Energy (RMS)
Pitch (Hz)
ZCR
MFCC (13)
Pipeline: PCM → noise gate → framing + Hann window → FFT → mel filterbank → log → DCT → MFCC, combined with pitch, energy, and ZCR. Vector length 16 (13 MFCCs + pitch + energy + ZCR).
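
The time-domain part of the audio pipeline can be sketched as below. This is a minimal illustration, not the app's actual code: the function names, the autocorrelation pitch estimator, the 60 to 500 Hz search range, and the ordering of entries inside the 16-element vector are all assumptions. The FFT, mel filterbank, and DCT stages are omitted, so the 13 MFCCs are passed in precomputed.

```typescript
// Hann window, applied to each frame before the FFT stage (FFT not shown here).
function hannWindow(n: number): Float64Array {
  const w = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    w[i] = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (n - 1)));
  }
  return w;
}

// Root-mean-square energy of one frame.
function rms(frame: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Fraction of adjacent sample pairs whose sign differs.
function zeroCrossingRate(frame: Float64Array): number {
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Assumed pitch method: pick the lag with the largest autocorrelation.
// Returns 0 when no lag in the search range has positive correlation.
function estimatePitchHz(
  frame: Float64Array,
  sampleRate: number,
  minHz = 60,
  maxHz = 500
): number {
  const minLag = Math.max(1, Math.floor(sampleRate / maxHz));
  const maxLag = Math.min(Math.floor(sampleRate / minHz), frame.length - 1);
  let bestLag = 0;
  let bestCorr = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i + lag < frame.length; i++) corr += frame[i] * frame[i + lag];
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? sampleRate / bestLag : 0;
}

// Assemble A_feat. Assumed layout: [energy, pitch, ZCR, mfcc[0..12]].
function buildAudioFeatures(
  frame: Float64Array,
  sampleRate: number,
  mfcc: Float64Array
): Float64Array {
  if (mfcc.length !== 13) throw new Error("expected 13 MFCC coefficients");
  const out = new Float64Array(16);
  out[0] = rms(frame);
  out[1] = estimatePitchHz(frame, sampleRate);
  out[2] = zeroCrossingRate(frame);
  out.set(mfcc, 3);
  return out;
}
```

Energy, pitch, and ZCR are computed on the unwindowed frame here; only the spectral branch that produces the MFCCs would use the Hann window.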
Video · V_feat
Face
Brightness
Motion
Aligned 8×8 embedding
Pipeline: frame → face detection (FaceDetector API with heuristic fallback) → alignment → resize + lighting normalization → 8×8 luma embedding + landmarks. Vector length 66.
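
The resize and lighting-normalization steps might look like the sketch below. It assumes the face has already been detected, aligned, and cropped to an RGBA pixel buffer (the layout of `ImageData.data`), and it assumes Rec. 601 luma weights, average pooling, and zero-mean unit-variance normalization; the page does not say which choices the app makes. The function name `lumaEmbedding8x8` is hypothetical. Only the 64 embedding values are produced; the remaining two entries of the 66-element vector are not specified on the page and are left out.

```typescript
// Average-pool an RGBA face crop to an 8x8 luma grid, then normalize lighting.
function lumaEmbedding8x8(
  rgba: Uint8ClampedArray,
  width: number,
  height: number
): Float64Array {
  const grid = 8;
  const sums = new Float64Array(grid * grid);
  const counts = new Float64Array(grid * grid);

  for (let y = 0; y < height; y++) {
    const gy = Math.min(grid - 1, Math.floor((y * grid) / height));
    for (let x = 0; x < width; x++) {
      const gx = Math.min(grid - 1, Math.floor((x * grid) / width));
      const p = (y * width + x) * 4; // 4 bytes per pixel: R, G, B, A
      // Rec. 601 luma weights (an assumption), scaled to [0, 1].
      const luma = (0.299 * rgba[p] + 0.587 * rgba[p + 1] + 0.114 * rgba[p + 2]) / 255;
      sums[gy * grid + gx] += luma;
      counts[gy * grid + gx] += 1;
    }
  }

  // Mean luma per cell; cells with no pixels (crop smaller than 8x8) stay 0.
  const cells = new Float64Array(grid * grid);
  let mean = 0;
  for (let i = 0; i < cells.length; i++) {
    cells[i] = counts[i] > 0 ? sums[i] / counts[i] : 0;
    mean += cells[i];
  }
  mean /= cells.length;

  // Lighting normalization: subtract the mean and divide by the standard deviation.
  let variance = 0;
  for (let i = 0; i < cells.length; i++) {
    const d = cells[i] - mean;
    variance += d * d;
  }
  const std = Math.sqrt(variance / cells.length);
  const scale = std > 1e-8 ? 1 / std : 0; // a flat image maps to all zeros
  for (let i = 0; i < cells.length; i++) {
    cells[i] = (cells[i] - mean) * scale;
  }
  return cells;
}
```

With this normalization, a uniformly darker or brighter copy of the same crop yields the same embedding, which is the property a lighting-normalization step is meant to provide.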
Fused feature flow
A_feat (16)  →  ┐
                ├─►  multimodal fusion  →  emotion + severity
V_feat (66)  →  ┘
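
The page does not say how the two vectors are fused. One common option, shown here purely as an assumption, is early fusion by concatenation, which would give the downstream emotion and severity model an 82-element input. The name `fuseFeatures` and the length constants are hypothetical.

```typescript
const A_FEAT_LEN = 16; // audio vector length stated on the page
const V_FEAT_LEN = 66; // video vector length stated on the page

// Early fusion by concatenation (assumed): [A_feat | V_feat], length 82.
function fuseFeatures(aFeat: Float64Array, vFeat: Float64Array): Float64Array {
  if (aFeat.length !== A_FEAT_LEN) {
    throw new Error("A_feat must have " + A_FEAT_LEN + " values");
  }
  if (vFeat.length !== V_FEAT_LEN) {
    throw new Error("V_feat must have " + V_FEAT_LEN + " values");
  }
  const fused = new Float64Array(A_FEAT_LEN + V_FEAT_LEN);
  fused.set(aFeat, 0);
  fused.set(vFeat, A_FEAT_LEN);
  return fused;
}
```

Checking the lengths up front catches a mismatched pipeline stage before it reaches the model.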