AI Hallucinates 100% of the Time – So How Do We Build for Education?
- Published on: April 16, 2026
- Updated on: April 16, 2026
- Reading Time: 4 mins
There’s a lot of talk about keeping a “human in the loop” when it comes to AI. In most conversations, that means humans in an oversight capacity, reviewing outputs, flagging errors, and signing off before something goes live. It’s a reasonable instinct. But after sitting down with Vida Williams, Chief Data Officer at Stride K-12, and Harish Agarwal, Head of Data and AI Solutions at Magic EdTech, I walked away realizing we’re putting the human in the wrong place.
The harder question isn’t whether humans are checking AI’s work at the end. It’s whether human judgment is embedded in the design from the start – in the data architecture, in the pedagogical assumptions, in the way we define what “working” even means.
AI Hallucinates 100% of the Time and Gets It Right Occasionally
Vida said that casually during our conversation, but I haven’t stopped thinking about it. Most discussions about AI hallucination treat it as a bug: something that happens some percentage of the time and needs to be managed. Vida’s framing flips that. These systems are probabilistic by design. They generate responses based on statistical patterns, not understanding. Getting it right is the outcome we’re hoping for, not the starting condition.
That reframe matters because it changes what you build for. If you treat hallucination as an edge case, you design for monitoring and correction. If you accept that the entire mechanism is probabilistic, you design for accuracy from the ground up: modular systems where a misbehaving model can be taken offline without pulling everything else down. As Harish explained, with a non-AI system you deploy and monitor, but with AI you have to keep testing, because the model can stop getting it right at any time. And in education specifically, as Vida put it, the AI needs to be right more often than a teacher is going to be wrong.
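To make the modular idea concrete, here is a minimal sketch of what "keep testing, and take a misbehaving model offline" could look like. It is an illustration, not anything Vida or Harish described building: the names ModelModule, run_health_check, and answer, the 0.9 accuracy threshold, and the toy "models" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

FALLBACK = "This feature is temporarily unavailable."


@dataclass
class ModelModule:
    """One independently switchable AI component (hypothetical structure)."""
    name: str
    predict: Callable[[str], str]
    min_accuracy: float  # threshold chosen by people who can define "right"
    online: bool = True
    history: List[float] = field(default_factory=list)


def evaluate(module: ModelModule, test_set: List[Tuple[str, str]]) -> float:
    """Re-run a human-curated test set; return the fraction answered correctly."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(1 for prompt, expected in test_set
                  if module.predict(prompt) == expected)
    return correct / len(test_set)


def run_health_check(modules: Dict[str, ModelModule],
                     test_sets: Dict[str, List[Tuple[str, str]]]) -> None:
    """Meant to run on a schedule, not once at deploy time.

    Each module is judged on its own test set, so one failing module
    goes offline without affecting the others.
    """
    for name, module in modules.items():
        accuracy = evaluate(module, test_sets[name])
        module.history.append(accuracy)
        module.online = accuracy >= module.min_accuracy


def answer(modules: Dict[str, ModelModule], name: str, prompt: str) -> str:
    """Serve the module's answer only while it is passing its checks."""
    module = modules[name]
    return module.predict(prompt) if module.online else FALLBACK


# Toy stand-ins for real models (assumptions for illustration only).
modules = {
    "fractions_tutor": ModelModule(
        "fractions_tutor", lambda p: {"1/2 + 1/4": "3/4"}.get(p, "?"), 0.9),
    "spelling_helper": ModelModule(
        "spelling_helper", lambda p: "recieve", 0.9),
}
test_sets = {
    "fractions_tutor": [("1/2 + 1/4", "3/4")],
    "spelling_helper": [("Correct the spelling: recieve", "receive")],
}

run_health_check(modules, test_sets)
print(answer(modules, "fractions_tutor", "1/2 + 1/4"))
print(answer(modules, "spelling_helper", "Correct the spelling: recieve"))
```

In this sketch the spelling module fails its check and is switched off while the fractions module keeps serving students. The expected answers in the test sets are where human judgment lives: someone who knows what a good outcome looks like has to write them.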
I think that’s actually a more optimistic starting point than it sounds. When you’re honest about what the technology is doing under the hood, you make better decisions about where to use it and how much to trust it.
Can Your Team Drive a Stick?
Vida used an analogy I keep coming back to. She said she asks her leaders, “Can your team drive a stick?” Most people can drive an automatic. But if you’re building new technology on top of existing technology, which is essentially what AI implementation is, and your people don’t have the core skill of designing systems from fundamentals, the work will erode over time.
This pushes back on the popular narrative that AI can act as a great equalizer for teams, that it can compensate for gaps in expertise. Maybe in some contexts. But in education, someone still needs to know what a good outcome looks like. Someone needs to understand how a seven-year-old learns differently from a fifteen-year-old, how state standards map to curriculum design, and where cognitive rigor should be enforced rather than shortcut. AI can accelerate the work of people who already understand these things. It can’t stand in for the understanding.
Across industries, we need humans who can define “right” before we ask AI to get there faster. In education, the stakes are just higher; you’re not producing a bad report, you’re shaping how a kid understands a subject.
Don’t Put AI on Top. Rethink the Design.
Harish raised something that I think a lot of technical leaders already feel but haven’t fully acted on. He said the biggest problem he sees is organizations sticking AI on top of existing products and processes. His challenge to clients: imagine you didn’t have this product and you were designing it today. How would you build it?
That’s an uncomfortable question for organizations with decades of student-tested material and research-backed strategies. Nobody wants to throw that away. But the question isn’t about starting over – it’s about whether the architecture can support what AI actually needs to function well: clean data pipelines, systems that talk to each other, and structures designed for the way students interact with technology now, not ten years ago.
The risk of layering AI onto legacy systems isn’t just technical debt. In education, it’s the difference between a tool that genuinely personalizes learning and one that repackages existing content with a chatbot interface. Nobody wants to put AI-generated filler in front of students.
Data Leadership Belongs at the Strategy Table
Harish also made an observation that I think deserves more attention: at many institutions, data leadership is still housed within IT and remains primarily focused on cybersecurity, governance, and compliance. All important work, but it’s defensive by nature. What’s missing is data leadership that’s involved in instructional strategy, product design, and procurement decisions.
After spending time with Vida, I see why this matters so much. Her approach at Stride isn’t just about protecting data – it’s about understanding the implications of using data about people to build tools for those same people. That’s a different way of thinking about the CDO role than treating it as a compliance function. It means asking what data is being collected from students, what’s being given back, and whether that exchange actually serves learning or just generates metrics.
Harish noted that when institutions buy an LMS, many still aren’t asking about data ownership, portability, or how the platform consumes and shares data. Those are the questions that determine whether AI can eventually work well in that ecosystem – and right now, they’re often afterthoughts. CDOs who are positioned to influence these decisions early, rather than clean up after them, will be the ones who make AI in education actually work.
Optimism Requires Honesty
Vida said something toward the end of our conversation that stuck with me: “It’s exciting work. It’s scary work.” I think she’s right on both counts. AI can genuinely change how students learn, but only if we’re honest about what we don’t yet know how to measure and where our systems aren’t ready. The human expertise isn’t the checkpoint at the end – it’s the reason to build any of this in the first place.
Watch the full conversation with Vida Williams and Harish Agarwal on EdTech Connect: Innovators in Conversation.
FAQs
The article argues that post-output review is too late to be the primary control. The stronger move is to build around constrained use cases, modular architecture, and a clear instructional definition of what "right" means. Human judgment should shape the system design, not just catch failures after they appear.
The first step is to assess the underlying system itself. If the data flows, the content structure, and the product development process were not designed with adaptability or probabilistic behavior in mind, adding a chatbot will provide only superficial novelty.
Moving data leadership closer to product and instructional strategy doesn't create risk; it brings concerns that already existed into the design conversation earlier. When data leaders help shape procurement, product choices, and learning goals from the start, institutions are less likely to discover structural problems only after deployment.
Three things must be aligned: the criteria for learner success, the data that is acceptable to collect, and the failure modes that are unacceptable. Without consensus on all three, the work will emphasize technical delivery rather than building trust around learning outcomes. Product, pedagogy, and data leaders need to work in lockstep rather than sequentially.
That is usually an execution gap, not a strategy gap. In cases like this, a partner such as Magic EdTech can help operationalize the work across content structure, data workflows, and evaluation design, while the institution or product team still owns the definition of "right." The core principle remains the same: outside support only helps if human judgment stays embedded in the system from the start.