Preprocessing

Data acquisition pipelines

Live view of the audio and video feature vectors fed into the emotion models. Everything runs on-device.

Audio · A_feat

Live readouts: Energy (RMS), Pitch (Hz), ZCR, MFCC (13).

Pipeline: PCM → noise gate → framing + Hann window → FFT → mel filterbank → log → DCT → MFCC + pitch + energy + ZCR. Vector length 16 (13 MFCCs plus pitch, energy, and ZCR).
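The audio pipeline above can be sketched for a single frame as follows. This is a minimal illustration, not the app's actual code: the sample rate, frame length, number of mel filters, noise-gate threshold, autocorrelation pitch estimator, and the order of entries in the output vector are all assumptions, and a naive DFT stands in for the FFT to keep the example dependency-free. Only the 13 MFCCs and the total length of 16 come from the description.

```python
import cmath
import math

# Assumed settings; only N_MFCC = 13 and the length-16 output come from the text.
SAMPLE_RATE = 16000
FRAME_LEN = 512
N_MELS = 26
N_MFCC = 13
GATE_RMS = 0.01  # hypothetical noise-gate threshold


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (stand-in for an FFT)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(abs(acc) ** 2 / n)
    return out


def mel_filterbank(n_bins, n_mels=N_MELS, sr=SAMPLE_RATE):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = (sr / 2) / (n_bins - 1)
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_out)]


def pitch_hz(frame, sr=SAMPLE_RATE, fmin=80.0, fmax=400.0):
    """Autocorrelation pitch estimate (assumed method); 0.0 for silent frames."""
    if not any(frame):
        return 0.0
    lags = range(int(sr / fmax), int(sr / fmin) + 1)
    best = max(lags, key=lambda lag: sum(a * b for a, b in zip(frame, frame[lag:])))
    return sr / best


def audio_features(pcm):
    """Return [13 MFCCs, pitch_hz, rms, zcr] for one frame of float PCM in [-1, 1]."""
    frame = list(pcm[:FRAME_LEN])
    frame += [0.0] * (FRAME_LEN - len(frame))  # zero-pad short input
    rms = math.sqrt(sum(x * x for x in frame) / FRAME_LEN)
    if rms < GATE_RMS:  # noise gate: treat quiet frames as silence
        frame, rms = [0.0] * FRAME_LEN, 0.0
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (FRAME_LEN - 1)))
                for i, x in enumerate(frame)]  # Hann window
    spec = power_spectrum(windowed)
    mel = [sum(w * p for w, p in zip(row, spec)) for row in mel_filterbank(len(spec))]
    mfcc = dct2([math.log(e + 1e-10) for e in mel], N_MFCC)
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (FRAME_LEN - 1)
    return mfcc + [pitch_hz(frame), rms, zcr]
```

Calling `audio_features` on 512 samples of a 200 Hz tone gives a 16-entry list whose pitch entry lands close to 200 Hz. A frame below the gate threshold gives zero pitch, energy, and ZCR.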

Video · V_feat

Live readouts: Face, Brightness, Motion, Aligned 8×8 embedding.

Pipeline: frame → face detection (FaceDetector API with a heuristic fallback) → alignment → resize + lighting normalization → 8×8 luma embedding + landmarks. Vector length 66 (the 8×8 embedding contributes 64 values).
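The resize, lighting normalization, and embedding steps can be sketched as below. This is a rough illustration under several assumptions: face detection and alignment are taken as already done (the caller passes a face box that lies inside the frame), resizing is done by block averaging, lighting normalization is zero-mean and unit-variance scaling, and luma uses the BT.601 weights. The description does not say what the two non-embedding entries hold, so the hypothetical `landmarks` argument simply supplies two values to reach length 66.

```python
import math

GRID = 8  # 8×8 embedding, as stated in the pipeline description


def luma(pixel):
    """BT.601 luma from an (r, g, b) pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def video_features(frame, box, landmarks):
    """frame: rows of (r, g, b); box: (x, y, w, h); landmarks: 2 numbers (hypothetical)."""
    x0, y0, w, h = box
    if w < GRID or h < GRID:
        raise ValueError("face box must be at least 8x8 pixels")
    if len(landmarks) != 2:
        raise ValueError("this sketch expects exactly 2 landmark values")
    # Crop the face region and convert it to luma.
    crop = [[luma(px) for px in row[x0:x0 + w]] for row in frame[y0:y0 + h]]
    # Resize to 8×8 by averaging the pixels that fall into each grid cell.
    cells = []
    for gy in range(GRID):
        rows = crop[gy * h // GRID:(gy + 1) * h // GRID]
        for gx in range(GRID):
            block = [v for row in rows for v in row[gx * w // GRID:(gx + 1) * w // GRID]]
            cells.append(sum(block) / len(block))
    # Lighting normalization: zero mean, unit variance (flat crops stay at zero).
    mean = sum(cells) / len(cells)
    std = math.sqrt(sum((c - mean) ** 2 for c in cells) / len(cells))
    if std < 1e-6:
        std = 1.0
    return [(c - mean) / std for c in cells] + [float(v) for v in landmarks]
```

Normalizing after the resize keeps the 64 embedding values comparable across bright and dim frames, which is presumably the purpose of the lighting step in the pipeline.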

Fused feature flow

A_feat (16)  →  ┐
                ├─►  multimodal fusion  →  emotion + severity
V_feat (66)  →  ┘
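One simple way to read the flow above is early fusion: concatenate the two vectors into a single 82-entry input and pass it to a model with two outputs. The sketch below assumes that reading. The page does not describe the fusion model, so the linear head, its weights, and the use of a class index for the emotion are all hypothetical placeholders.

```python
import math

A_LEN, V_LEN = 16, 66  # vector lengths stated in the pipeline descriptions


def fuse(a_feat, v_feat):
    """Early fusion by concatenation (an assumed strategy)."""
    if len(a_feat) != A_LEN or len(v_feat) != V_LEN:
        raise ValueError("expected A_feat of length 16 and V_feat of length 66")
    return list(a_feat) + list(v_feat)


def predict(fused, emotion_weights, severity_weights):
    """Toy linear head: argmax over emotion scores, sigmoid for severity.

    emotion_weights: one weight row (length 82) per emotion class (hypothetical).
    severity_weights: a single weight row of length 82 (hypothetical).
    """
    scores = [sum(w * x for w, x in zip(row, fused)) for row in emotion_weights]
    emotion = max(range(len(scores)), key=scores.__getitem__)
    z = sum(w * x for w, x in zip(severity_weights, fused))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well-behaved
    severity = 1.0 / (1.0 + math.exp(-z))
    return emotion, severity
```

For example, `predict(fuse(a, v), weights, sev_weights)` returns an emotion class index and a severity between 0 and 1. The length check in `fuse` catches a mismatch between the two pipelines early.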

Maitri
