A year nine student walks into class full of ideas, but when it is time to contribute, the tools around them do not listen. Their speech is difficult for standard voice systems to recognise, typing feels slow and exhausting, and the lesson moves on without their voice being heard. The challenge is not a lack of ability but a lack of access.
Across the world, millions of learners face communication barriers. Some live with apraxia of speech or dysarthria, others with limited mobility, hearing differences, or neurodiverse needs. When speaking, writing, or pointing is unreliable or tiring, participation becomes limited, feedback is lost, and confidence slowly erodes. This is not a rare exception but an everyday reality in classrooms.
These barriers appear in very practical ways. Students are skipped or misunderstood when they cannot respond quickly. Their ability is under-measured because their means of expression are constrained. Teachers struggle to maintain the pace of lessons while making individual accommodations. Peers interact less often, reducing opportunities for social belonging.
Assistive technologies have helped over the years, with tools like text-to-speech, symbol boards, and simple gesture inputs. Yet most of these tools are designed for a single mode of interaction. They assume the learner will either speak, or type, or tap. Real communication, however, is fluid. Learners naturally combine gestures, partial speech, symbols, and context to share meaning, especially when fatigue, anxiety, or motor challenges come into play.
This is where modern AI changes the picture. We are beginning to move beyond single-solution tools into multimodal systems that can understand speech, even when it is disordered, interpret gestures and visual symbols, combine signals to infer intent, and adapt in real time as the learner's abilities develop or change.
AI is reshaping accessibility in education by shifting from isolated tools to multimodal and adaptive systems. These systems combine gesture, speech, and intelligent feedback to meet learners where they are, while also supporting their growth over time.
In this article, we will explore what this shift looks like in practice, how it can unlock participation, and how adaptive feedback personalises support. We will also build a hands-on multimodal demo that turns these ideas into a classroom-ready tool.
The past few years have shown how AI can make classrooms more inclusive when we focus on accessibility. Developers, educators, and researchers are already experimenting with tools that bridge communication gaps.
In my first freeCodeCamp tutorial, I built a gesture-to-text translator using MediaPipe. This project demonstrated how computer vision can track hand movements and convert them into text in real time. For learners who rely on gestures, this kind of system can provide a bridge to participation.
Here is a simplified example of how MediaPipe detects hand landmarks:
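The sketch below uses the MediaPipe Hands solution API with a webcam feed; the single-hand setting and the printed wrist landmark are illustrative choices rather than part of any particular project.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Open the default webcam and track a single hand.
cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # MediaPipe expects RGB input; OpenCV captures frames in BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # Each detected hand has 21 landmarks with normalised x, y, z values.
                wrist = hand.landmark[0]
                print(f"Wrist position: x={wrist.x:.2f}, y={wrist.y:.2f}")

cap.release()
```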
This small piece of code shows how MediaPipe processes a video frame and extracts hand landmarks. From there, you can classify gestures and map them to text.
You can explore the full project on GitHub or read the complete tutorial on freeCodeCamp.
In another freeCodeCamp article, I demonstrated how to build AI accessibility tools with Python, such as speech recognition and text-to-speech. These projects provided readers with a foundation for building their own inclusive tools, and you can find the full source code in the repository.
Beyond these individual projects, the wider field has also made significant progress. Advances in sign language recognition have improved accuracy in capturing complex hand shapes and movements. Text-to-speech systems have become more natural and adaptive, giving users voices that sound closer to human speech. Mobile and desktop accessibility apps have brought these capabilities into everyday classrooms.
These achievements are encouraging, but they remain limited. Most of today's tools are still designed for a single mode of communication. A system may work for gestures, or for speech, or for text, but not all of them together.
The next step is clear: we need multimodal, adaptive AI tools that can blend gestures, speech, and feedback into unified systems. This is where the most exciting opportunities in accessibility lie, and it is where we will turn next.
Figure 1: Comparison of isolated single-modality systems with unified multimodal AI systems.
One of my first projects in this area focused on translating Makaton into English.
Makaton is a language programme that uses signs and symbols to support people with speech and language difficulties. It is widely used in classrooms where learners may not rely fully on speech. The challenge is that while a learner communicates in Makaton, their teachers and peers often work in English, which creates a communication gap.
The system followed a clear pipeline:
Camera Input → Hand Landmark Detection → Gesture Classification → English Translation Output
Figure 2: AI workflow for translating Makaton gestures into English.
Here is a simplified version of how this workflow might look in code:
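In the sketch below, the sign labels, the phrase dictionary, and the classify_gesture stub are placeholders for a trained classifier; only the MediaPipe landmark extraction reflects a real API.

```python
import cv2
import mediapipe as mp

# Hypothetical mapping from recognised Makaton signs to English output.
MAKATON_TO_ENGLISH = {
    "thank_you": "Thank you",
    "help": "I need help",
    "drink": "I would like a drink",
}

def classify_gesture(landmarks):
    """Placeholder classifier: in a real system this would be a trained
    model that maps the 21 hand landmarks to a Makaton sign label."""
    return "thank_you"

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)

with mp_hands.Hands(max_num_hands=1) as hands:
    ok, frame = cap.read()
    if ok:
        # Detect hand landmarks in the captured frame.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # Classify the sign, then bridge it into English.
            sign = classify_gesture(results.multi_hand_landmarks[0].landmark)
            print(MAKATON_TO_ENGLISH.get(sign, "Unrecognised sign"))

cap.release()
```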
This is a simplified example, but it shows the core idea: map gestures to meaning and then bridge that meaning into English.
Imagine a student signing thank you in Makaton and the system instantly displaying the words on screen. Teachers can check understanding, peers can respond naturally, and the learner's contribution becomes visible to everyone.
The key takeaway is that AI can bridge symbol- and gesture-based languages with mainstream spoken and written communication. Instead of forcing learners to adapt to rigid systems, we can design systems that adapt to the way they already communicate.
Another project I worked on is called AURA, the Apraxia of Speech Adaptive Understanding and Relearning Assistant. The idea was to design a system that not only recognises speech but also supports learners with speech disorders by detecting errors, adapting feedback, and offering multimodal alternatives.
Most commercial speech recognition systems fail when a person's speech does not follow typical patterns. This is especially true for people with apraxia of speech, where motor planning difficulties make pronunciation inconsistent. The result is frequent misrecognition, frustration, and exclusion from tools that rely on voice input.
The AURA prototype used a layered architecture:
Speech Input → Wav2Vec2 (fine-tuned for disordered speech) → CNN + BiLSTM Error Detection → Reinforcement Learning Feedback → Multimodal Output (Speech + Gesture)
Figure 3: Workflow of the AURA prototype, combining speech, error detection, adaptive feedback, and multimodal outputs.
Here's a simplified view of how an error detection module could be structured:
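The PyTorch sketch below is illustrative rather than the actual AURA model: the feature dimension assumes Wav2Vec2 hidden states, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ErrorDetector(nn.Module):
    """CNN + BiLSTM articulation-error detector (sketch).
    Input: acoustic features of shape (batch, time, feature_dim),
    e.g. Wav2Vec2 hidden states. Output: per-frame error probability."""

    def __init__(self, feature_dim=768, hidden_dim=128):
        super().__init__()
        # 1D convolutions extract local acoustic patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(feature_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # A bidirectional LSTM models how errors unfold across the utterance.
        self.bilstm = nn.LSTM(128, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(hidden_dim * 2, 1)

    def forward(self, features):
        x = self.cnn(features.transpose(1, 2)).transpose(1, 2)  # (batch, time, 128)
        x, _ = self.bilstm(x)
        return torch.sigmoid(self.classifier(x)).squeeze(-1)    # (batch, time)

# Example: score two utterances of 100 frames of Wav2Vec2-sized features.
scores = ErrorDetector()(torch.randn(2, 100, 768))
print(scores.shape)  # torch.Size([2, 100])
```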
This snippet shows the heart of the error detection pipeline: combining CNN layers for feature extraction with BiLSTMs for sequence modeling. The model can flag articulation errors, which then guide the feedback loop.
With AURA, the goal was not just to recognise what someone said, but to help them communicate more effectively. The prototype adapted in real time, offering corrective feedback, suggesting gestures, or switching modes when speech became difficult.
The takeaway is that AI can evolve from being a passive recognition tool into an active partner in learning and communication.
The two projects we explored, translating Makaton into English and building the AURA prototype, highlight a much larger transformation underway. Accessibility technology is moving away from isolated, single-purpose applications toward multimodal platforms that bring together speech, gestures, text, and adaptive AI into one seamless system.
The benefits of this shift are profound: learners can be met where they are, communication becomes more natural, and support adapts as abilities develop or change.
And the impact of multimodal accessibility extends well beyond the classroom, reaching healthcare, workplaces, and daily life.
This demo combines both use cases: a Makaton to English classroom tool and the AURA assistive speech path. It prioritizes gesture when a sign is detected, falls back to speech when it isn't, and produces a unified English output (with optional text-to-speech). We'll focus on the translation layer, multimodal fusion, and a simple Streamlit UI.
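One possible project layout for the demo (all file names here are illustrative):

```text
makaton-demo/
├── app.py              # Streamlit UI, multimodal fusion, and translation logic
├── requirements.txt    # Python dependencies for the demo
└── assets/             # Optional sample sign images and reference data
```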
The structure above outlines one possible organization of the project directory for a multimodal Makaton to English translator demo built with Streamlit; the inline comments note what each component is responsible for.
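Before installing anything, it helps to isolate the demo's dependencies. One common way to do that (commands shown for macOS/Linux) is:

```bash
# Create a virtual environment in the project folder and activate it
python -m venv venv
source venv/bin/activate    # On Windows: venv\Scripts\activate
```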
The commands above create and activate a Python virtual environment: a self-contained directory with its own Python interpreter and installed packages, which keeps the demo's dependencies separate from the rest of your system.
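With the environment active, install the dependencies. The exact set depends on your implementation, but a plausible one for this demo is:

```bash
pip install streamlit opencv-python mediapipe SpeechRecognition pyttsx3
# streamlit          - the web UI and tab layout
# opencv-python      - camera frames and image handling for the gesture path
# mediapipe          - hand landmark detection
# SpeechRecognition  - the speech fallback path
# pyttsx3            - optional offline text-to-speech output
```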
The command above installs all of the required packages at once; the comments alongside each package note the role it plays in the demo.
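Below is a compact sketch of what app.py might look like. The sign labels, the MAKATON_TO_ENGLISH phrase mapping, and the use of SpeechRecognition's Google recogniser are illustrative assumptions; in a full build, the simulated sign selector would be replaced by the MediaPipe classification pipeline shown earlier.

```python
import streamlit as st
import speech_recognition as sr

# Hypothetical mapping from Makaton sign labels to English phrases.
MAKATON_TO_ENGLISH = {
    "thank_you": "Thank you",
    "help": "I need help, please",
    "drink": "I would like a drink",
    "good": "That is good",
}

def fuse(sign, transcript):
    """Multimodal fusion: prefer the gesture channel, fall back to speech."""
    if sign:
        return MAKATON_TO_ENGLISH.get(sign)
    if transcript:
        return transcript.capitalize()
    return None

st.title("Makaton to English Translator (Demo)")
sign_tab, speech_tab = st.tabs(["Simulated Sign", "Speech"])

sign, transcript = None, None

with sign_tab:
    # A full build would classify live MediaPipe landmarks here;
    # the demo simulates the sign with a simple selector instead.
    choice = st.selectbox("Choose a sign", ["(none)"] + list(MAKATON_TO_ENGLISH))
    if choice != "(none)":
        sign = choice

with speech_tab:
    st.caption("Speech is only used when no sign is detected.")
    if st.button("Listen"):
        recogniser = sr.Recognizer()
        try:
            with sr.Microphone() as source:
                audio = recogniser.listen(source, timeout=5)
            transcript = recogniser.recognize_google(audio)
        except Exception as error:  # no microphone, no network, or unclear audio
            st.warning(f"Speech recognition unavailable: {error}")

# Unified English output, with optional text-to-speech.
output = fuse(sign, transcript)
if output:
    st.success(f"English output: {output}")
    if st.button("Speak it aloud"):
        import pyttsx3
        engine = pyttsx3.init()
        engine.say(output)
        engine.runAndWait()
else:
    st.info("Waiting for a sign or spoken input...")
```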
The code above creates a Streamlit application that combines gesture recognition and speech recognition to translate Makaton signs into English. It captures a sign (simulated in this demo) or a spoken phrase, maps it to an English phrase, and displays the result, giving gesture priority and using speech only as a fallback.
This application demonstrates a multimodal approach by integrating both gesture and speech recognition to facilitate communication for users who rely on Makaton.
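To launch the demo (assuming the entry point is app.py, as in the layout above):

```bash
streamlit run app.py
```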
The command above is used to launch a Streamlit application. When executed, it starts a local web server and opens the specified Python script in a web browser, allowing you to interact with the app's user interface. This command is typically run in a terminal or command prompt.
Figure 4: App interface, showing the Simulated Sign tab before any input.
You have developed a multimodal translator that integrates both gesture recognition (specifically Makaton signs) and speech recognition to produce a unified English output. The system is designed to prioritize gesture input, using speech as a fallback when gestures are not detected.
User Interface
The application is built using Streamlit and features two main tabs: a Simulated Sign tab for gesture input and a Speech tab that serves as the fallback when no sign is detected.
While the promise of multimodal accessibility tools is exciting, building them responsibly requires us to confront several challenges. These are not only technical problems but also ethical ones that affect how learners, teachers, and communities experience AI.
Training AI systems requires large, diverse datasets. But when it comes to disordered speech or symbol systems like Makaton, the data is limited. Without enough examples, models risk being inaccurate or biased toward a narrow group of users. Collecting more data is essential, but it must be done ethically, with consent and respect for the communities involved.
AI systems often work better for some groups than others. A model trained mostly on fluent English speakers may fail to recognise learners with strong accents or speech difficulties. Similarly, gesture recognition may not account for differences in motor ability. Fairness means designing models that work across abilities, accents, and cultures, so that no group is excluded by design.
Speech and video data are highly sensitive, especially when collected in schools. Protecting this data is not optional; it is a requirement. Systems must anonymize or encrypt recordings and store them securely. Transparency is also key: learners, parents, and teachers should know exactly how data is being used and who has access to it.
Ironically, many "accessibility tools" remain inaccessible because they are expensive, require powerful hardware, or are too complex to use. For AI to truly reduce barriers, solutions must be affordable, lightweight, and easy for teachers to set up in real classrooms, not just in research labs.
These challenges remind us that accessibility in AI is not only a technical question but also an ethical and social responsibility. To build tools that genuinely help learners, we need collaboration between developers, educators, policymakers, and the communities who will use the systems.
Any look at the future of AI accessibility tools is necessarily speculative, but the possibilities are both exciting and necessary. What we have now are prototypes and early systems. What lies ahead are tools that could reshape how classrooms, and society more broadly, approach communication and inclusion.
One promising direction is the ability to translate Makaton across multiple languages. A learner in the UK could sign in Makaton and see their contribution appear not just in English but in French, Spanish, or Yoruba. This would open up international classrooms and give learners access to global opportunities that are often closed off by language barriers.
Imagine a classroom assistant powered by AI that adapts in real time. If a learner struggles with speech, it could switch to gesture recognition. If gestures become tiring, it could prompt the learner with symbol-based options. These AI tutors would not only support communication but also guide learning, adapting to each student's strengths and challenges over time.
The rise of lightweight hardware makes it possible to imagine wearable AI assistants that provide instant translation and support. Glasses could capture gestures and overlay text, while earbuds could translate disordered speech into clear audio for peers and teachers. Instead of bulky setups, accessibility would become portable, personal, and ever-present.
These innovations go beyond technology alone. They align with the United Nations Sustainable Development Goals (SDGs), in particular the goals on quality education and reduced inequalities.
The journey from single-modality tools to multimodal, adaptive systems is still in its early stages. But if we continue to push forward with creativity, ethics, and inclusivity at the center, AI accessibility tools will not only change classrooms; they will change lives.
AI accessibility tools are no longer just optional add-ons for a few learners. They are becoming core enablers of inclusion in education, healthcare, workplaces, and daily life.
The journey from early gesture recognition systems to multimodal, adaptive prototypes like Makaton translation and AURA shows what is possible when technology is designed around people rather than forcing people to adapt to technology. These innovations break down communication barriers and open up new opportunities for learners who have too often been left on the margins.
But the future of accessibility is not automatic. It depends on choices we make now as developers, educators, researchers, and policymakers. Building tools that are open, ethical, and affordable requires collaboration and commitment.
The vision is clear: a world where every learner, regardless of ability, can express themselves fully, be understood by others, and participate with confidence.