Unlocking the Next Level of Learning: A Conversation on Multimodal AI
Published on: December 11, 2025 | Updated on: December 11, 2025 | Reading Time: 2 mins
In our most recent episode of EdTech Connect: Innovators in Conversation, I sat down with Dipesh, our VP of Growth, and Luyen Chou, CEO and co‑founder of DeweyLearn, to discuss multimodal AI.
We explored why education has struggled to turn its data into meaningful insight, what principles should guide the responsible use of emerging AI capabilities in learning environments, and more.
What Is Multimodal AI?
As Luyen explained, multimodal AI refers to systems that can interpret more than text. They process audio, visual cues, gestures, tone, and other behavioral signals, which are the same signals humans use to understand one another.
The same statement can carry very different meanings depending on the facial expression, tone, or hesitation that accompanies it. This is why text‑only interaction with AI will soon feel outdated. The next generation of learning tools will be able to read context the way a skilled educator does.
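To make this concrete, here is a rough, purely illustrative Python sketch of how a single multimodal observation might be represented and turned into a simple judgment. The field names, labels, and thresholds are assumptions for illustration, not anything Luyen described.

```python
from dataclasses import dataclass

@dataclass
class MultimodalObservation:
    """One hypothetical snapshot of a learner, combining several signal types."""
    transcript: str       # what the learner said or typed
    tone: str             # e.g. "confident" or "hesitant" (label from an audio model)
    expression: str       # e.g. "engaged" or "confused" (label from a vision model)
    pause_seconds: float  # hesitation before responding

def needs_support(obs: MultimodalObservation) -> bool:
    """Toy heuristic: the text alone may look fine, but tone, expression, and
    hesitation together can suggest the learner is struggling."""
    struggle_cues = 0
    if obs.tone == "hesitant":
        struggle_cues += 1
    if obs.expression == "confused":
        struggle_cues += 1
    if obs.pause_seconds > 5.0:
        struggle_cues += 1
    return struggle_cues >= 2

# Example: a correct answer delivered hesitantly, after a long pause.
obs = MultimodalObservation("The answer is 42.", "hesitant", "confused", 7.5)
print(needs_support(obs))  # True: the context says more than the text alone
```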
Why Education Has the Data, but Not the Insight
Education has generated data for decades, but most of it has served administrative or compliance purposes rather than pedagogical ones.
Dipesh noted that the intent behind data collection shapes the value of the data. If the goal is compliance, we get completion rates and clicks. If the goal is understanding behavior and learning processes, we need different inputs entirely.
Historically, the industry was also constrained by:
- Limited ways to capture meaningful learning signals
- Fragmented systems with siloed data
- Tools incapable of interpreting complex, unstructured information
Standardized tests, for example, were never designed to measure the learning process. Multimodal AI enables richer data capture (micro‑expressions, engagement cues, and more), immediate interpretation of those signals in context, and insights educators can act on.
Educators can finally access the “in‑between” moments of learning: the struggle, the uncertainty, the cognitive load, not just the final answer.
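As a hedged illustration of what capturing those “in‑between” moments could look like in data, the sketch below shows a hypothetical process‑level event record. The schema and field names are assumptions, not an actual product format.

```python
import json
from datetime import datetime, timezone

# Hypothetical event schema for process-level learning signals,
# as opposed to completion rates and clicks.
event = {
    "learner_id": "anon-1042",                  # pseudonymous identifier
    "activity": "fractions-quiz-item-3",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "signals": {
        "answer_changed": 3,                    # reworked the answer three times
        "hesitation_seconds": 12.4,             # pause before the first attempt
        "engagement_cue": "leaning-in",         # label from a vision model
        "self_talk_detected": True,             # label from an audio model
    },
    "outcome": {"correct": True},
}

# The same item recorded simply as "correct" would hide all of the struggle above.
print(json.dumps(event, indent=2))
```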
Privacy: The Non‑Negotiable Variable
Both Dipesh and Luyen emphasized a fundamental principle: privacy is not binary. It is contextual, and it is built on transparency.
People are more open to sharing data when:
- They understand exactly what is being collected
- They see a clear, tangible benefit
- They know how the data will be stored, used, and discarded
This mirrors what we already experience with biometric systems in airports. When there is visibility into process and purpose, trust increases.
In education, this means:
- Transparent communication with students, educators, and parents
- Clear documentation outlining purpose and data lifecycle
- Stringent safeguards and boundaries
- Communicating value before asking for access
Framing multimodal capture as “surveillance” will shut down conversations. Demonstrating how it improves learning outcomes opens them up.
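To sketch what “clear documentation outlining purpose and data lifecycle” might look like in practice, here is a minimal, purely illustrative example. The policy fields, retention period, and wording are assumptions and do not reflect any actual DeweyLearn policy.

```python
# Illustrative data-lifecycle declaration; nothing here reflects a real product policy.
DATA_POLICY = {
    "purpose": "Detect confusion and cognitive load to adapt lesson pacing",
    "collected": ["facial-expression labels", "voice-tone labels", "response timing"],
    "not_collected": ["raw video", "raw audio recordings"],
    "retention_days": 30,             # derived labels deleted after 30 days
    "consent": "opt-in, revocable at any time",
    "shared_with": [],                # no third parties
}

def consent_summary(policy: dict) -> str:
    """Render the policy as the plain-language summary shown before asking for access."""
    return (
        f"We collect {', '.join(policy['collected'])} to "
        f"{policy['purpose'].lower()}. We never store "
        f"{' or '.join(policy['not_collected'])}, data is deleted after "
        f"{policy['retention_days']} days, and participation is {policy['consent']}."
    )

print(consent_summary(DATA_POLICY))
```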
The Future of Multimodal AI and Learning
Multimodal AI marks a major shift in how we understand learning because it captures expression, emotion, engagement, confusion, and confidence: the entire process of learning, not just its outcomes.
Until now, that information has been invisible to digital systems, but it is now measurable. Dipesh and Luyen made it clear that the institutions that begin building responsible, transparent, context‑aware frameworks for multimodal AI today will be the ones leading the next evolution in learning.
FAQs
What is multimodal AI?
An AI that reads more than text. It interprets audio, visuals, gestures, and tone so tools can understand context like a skilled educator.
Why has education struggled to turn its data into insight?
Most data was collected for compliance, not teaching, and systems were too siloed to capture meaningful learning signals.
What signals can multimodal AI capture?
Micro‑expressions, engagement cues, hesitation, and other behavioral indicators that reveal the “in‑between” moments of learning.
How can schools address privacy concerns?
Use clear purpose statements, strict safeguards, and transparent communication so people know what is collected and why.
How should institutions prepare for multimodal AI?
Define use cases, document data lifecycles, align with transparency principles, and pilot context‑aware features with opt‑in participation.
Get In Touch
Reach out to our team with your questions, and our representatives will get back to you within 24 hours on working days.