Engine v1.0 · Multimodal Fusion
Perceiving the unseen currents of human emotion.
MAITRI continuously analyzes your facial expressions and speech, fuses them into a single emotional baseline, and gently surfaces moments that deserve attention, without ever storing your raw audio or video.
Stream
Session 084.A9
Live
Resonance
Valence: 0.82 ↑
Arousal: −0.14 ∼
Coherence: 94%
Methodology
Four modules. One continuous baseline.
Face stream
Frame-by-frame visual signal with on-device preprocessing.
Speech stream
Mel-spectrogram features over short rolling windows.
Late fusion
Weighted blending of per-modality probabilities, with graceful fallback when only one modality is available.
Trend analytics
A negative-affect score with threshold alerts and helpful next steps.
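
For readers curious how late fusion and threshold alerts might fit together, here is a minimal sketch in Python. It is illustrative only and not MAITRI's actual implementation: the label set, the choice of which labels count as negative affect, the 0.6 face weight, the 0.5 threshold, and the three-sample window are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

# Hypothetical label set; the classes MAITRI actually predicts are not stated.
LABELS = ("happy", "neutral", "sad", "angry")
# Assumed definition of "negative affect" for this sketch.
NEGATIVE = {"sad", "angry"}


def late_fusion(face: Optional[Sequence[float]],
                speech: Optional[Sequence[float]],
                face_weight: float = 0.6) -> List[float]:
    """Blend two probability vectors; fall back to whichever one is present."""
    if face is None and speech is None:
        raise ValueError("at least one modality is required")
    if face is None:          # camera unavailable: speech only
        return list(speech)
    if speech is None:        # microphone unavailable: face only
        return list(face)
    blended = [face_weight * f + (1.0 - face_weight) * s
               for f, s in zip(face, speech)]
    total = sum(blended)      # renormalize; inputs are assumed to sum to 1
    return [p / total for p in blended]


def negative_affect(probs: Sequence[float]) -> float:
    """Sum the probability mass assigned to the negative labels."""
    return sum(p for label, p in zip(LABELS, probs) if label in NEGATIVE)


def should_alert(scores: Sequence[float],
                 threshold: float = 0.5,
                 window: int = 3) -> bool:
    """Alert when the mean of the last `window` scores exceeds the threshold."""
    recent = list(scores)[-window:]
    return len(recent) == window and sum(recent) / window > threshold


fused = late_fusion([0.7, 0.1, 0.1, 0.1], [0.2, 0.2, 0.5, 0.1])
score = negative_affect(fused)
```

Averaging over a short window rather than alerting on a single sample is one way to keep a momentary expression from triggering a notification; the real system may smooth its trend differently.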
Privacy first
Your raw video and audio never leave your device.
MAITRI processes camera and microphone frames locally and stores only derived emotion scores tied to your account. You can export or delete your history at any time.

