Preprocessing
Data acquisition pipelines
Live view of the audio and video feature vectors fed into the emotion models. Everything runs on-device.
Audio · A_feat
Energy (RMS)
—
Pitch (Hz)
—
ZCR
—
MFCC (13)
Pipeline: PCM → noise gate → framing + Hann window → FFT → mel filterbank → log → DCT → MFCC (13), concatenated with pitch, energy, and ZCR. Vector length 16 (13 + 3).
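The audio pipeline can be illustrated for a single frame with the standard-library sketch below. It is not the app's implementation. The function names, the 8 kHz sample rate, the 26-filter mel bank, the frame-level noise gate, the autocorrelation pitch estimate, and the output order (MFCC, pitch, energy, ZCR) are all assumptions made for the example. Only the stage order and the 13 + 3 = 16 layout come from the pipeline description.

```python
import cmath
import math

def hann(n):
    # Hann window of length n (n >= 2)
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    # Naive O(n^2) DFT; a real pipeline would use an FFT routine
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_energies(power, sample_rate, n_filters=26):
    # Triangular filters spaced evenly on the mel scale (filter count is assumed)
    n_bins = len(power)
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    out = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k in range(n_bins):
            if lo < k <= mid:
                e += power[k] * (k - lo) / (mid - lo)
            elif mid < k < hi:
                e += power[k] * (hi - k) / (hi - mid)
        out.append(e)
    return out

def dct2(values, n_out):
    # Unnormalized DCT-II, keeping the first n_out coefficients
    n = len(values)
    return [sum(v * math.cos(math.pi * i * (j + 0.5) / n) for j, v in enumerate(values))
            for i in range(n_out)]

def pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    # Autocorrelation peak within an assumed 80-400 Hz search range
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(frame) - 1)
    best = max(range(lo, hi + 1),
               key=lambda lag: sum(a * b for a, b in zip(frame, frame[lag:])))
    return sample_rate / best

def a_feat(frame, sample_rate=8000, gate=0.01):
    # Returns 13 MFCCs + pitch + RMS energy + ZCR (order is an assumption)
    n = len(frame)
    energy = math.sqrt(sum(x * x for x in frame) / n)
    if energy < gate:
        # Assumed gate behavior: treat quiet frames as silence
        return [0.0] * 16
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / (n - 1)
    pitch = pitch_hz(frame, sample_rate)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    log_mel = [math.log(e + 1e-10) for e in mel_energies(power_spectrum(windowed), sample_rate)]
    return dct2(log_mel, 13) + [pitch, energy, zcr]
```

Passing a 256-sample frame of a 200 Hz sine at the assumed 8 kHz rate yields a 16-element vector whose pitch slot lands close to 200 Hz. A frame below the gate threshold returns all zeros.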
Video · V_feat
Face
—
Brightness
—
Motion
—
Aligned 8×8 embedding
Pipeline: frame → face detection (FaceDetector API with heuristic fallback) → alignment → resize + lighting normalization → 8×8 luma embedding + landmarks. Vector length 66.
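The resize, lighting normalization, and 8×8 luma embedding stages can be sketched as follows, starting from a face crop that has already been detected and aligned. This is a sketch under stated assumptions, not the app's code. The function name `luma_embedding`, the block-average resize, and the zero-mean, unit-variance normalization are choices made for the example. The luma weights are the standard BT.601 coefficients. The sketch produces only the 64 embedding values; the page does not say what fills the remaining 2 slots of the 66-element vector, so they are left out.

```python
import math

def luma_embedding(crop, size=8):
    """crop: 2D list of (r, g, b) tuples for an aligned face region.
    Returns size*size lighting-normalized luma values (64 for size=8)."""
    h, w = len(crop), len(crop[0])
    # BT.601 luma from RGB
    luma = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in crop]

    # Resize by averaging each block of pixels into one cell (assumed method)
    cells = []
    for cy in range(size):
        y0 = cy * h // size
        y1 = max((cy + 1) * h // size, y0 + 1)
        for cx in range(size):
            x0 = cx * w // size
            x1 = max((cx + 1) * w // size, x0 + 1)
            block = [luma[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))

    # Lighting normalization: subtract the mean, divide by the standard deviation
    mean = sum(cells) / len(cells)
    std = math.sqrt(sum((c - mean) ** 2 for c in cells) / len(cells))
    return [(c - mean) / (std + 1e-6) for c in cells]
```

With this normalization, the same face under uniformly brighter lighting maps to nearly the same embedding, which is the usual motivation for normalizing before feeding a model.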
Fused feature flow
A_feat (16) ─┐
             ├─► multimodal fusion → emotion + severity
V_feat (66) ─┘
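The page does not say how the two vectors are fused. One common option is early fusion by concatenation, sketched below as an illustration only. The function name `fuse` and the concatenation strategy are assumptions. The lengths 16 and 66 come from the diagram, and the model that maps the fused vector to emotion and severity is not shown.

```python
A_LEN = 16  # audio feature length stated in the diagram
V_LEN = 66  # video feature length stated in the diagram

def fuse(a_feat, v_feat):
    """Early fusion by concatenation (an assumed strategy, not confirmed by the source).
    Returns a single vector of length A_LEN + V_LEN = 82."""
    if len(a_feat) != A_LEN or len(v_feat) != V_LEN:
        raise ValueError(
            f"expected lengths {A_LEN} and {V_LEN}, got {len(a_feat)} and {len(v_feat)}"
        )
    return list(a_feat) + list(v_feat)
```

Checking lengths at the fusion boundary catches a mismatch between the two pipelines early, before a malformed vector reaches the downstream model.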
