When I started my research on AI systems that could translate Makaton (a sign and symbol language designed to support speech and communication), I wanted to bridge a gap in accessibility for learners with speech or language difficulties.
Over time, this academic interest evolved into a working prototype that combines on-device AI and cloud AI to describe images and translate them into English meanings. The idea was simple: I wanted to build a lightweight web app that recognized Makaton gestures or symbols and instantly provided an English interpretation.
In this article, I'll walk you through how I built my Makaton AI Companion, a single-page web app powered by Gemini Nano (on-device) and the Gemini API (cloud). You'll see how it works, how I solved common issues like CORS and API model errors, and how this small project became part of my journey toward AI for accessibility.
By the end of this article, you will be able to:
To build the Makaton AI Companion, I wanted something lightweight, fast to prototype, and easy for anyone to run without complicated dependencies. I chose a plain web stack with a focus on accessibility and transparency.
Now let's dive into how the Makaton AI Companion works under the hood. This project follows a simple but effective flow: Upload an image → Describe (AI) → Map to Meaning → Speak or Copy the result
We'll go through each part step by step.
You don't need any complex setup. Just create a new folder and add these files:
If you prefer a ready-to-run version, you can serve everything from one zip (I'll share a GitHub link at the end).
The HTML file defines the interface where users upload an image, click Describe, and view the results.
This interface is intentionally minimal: no frameworks, no build tools, just clear HTML.
The mapping file holds a simple keyword-based dictionary. When the AI describes an image (like "a raised open hand"), the app searches for keywords that match known Makaton signs.
It's simple but effective enough to simulate real symbol-to-language translation for demo purposes.
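Here's a minimal sketch of the idea. The entries and names below are illustrative placeholders, not the exact contents of my file, and the example meanings aren't authoritative Makaton definitions:

```javascript
// Keyword → meaning lookup (illustrative entries only).
const MAKATON_MAP = [
  { keywords: ["raised open hand", "open palm"], meaning: "Hello" },
  { keywords: ["flat hand on chest"], meaning: "Please" },
  { keywords: ["fist tapping the chest"], meaning: "Help" },
];

// Return the first meaning whose keywords appear in the AI description.
function mapDescriptionToMeaning(description) {
  const text = description.toLowerCase();
  const match = MAKATON_MAP.find(entry =>
    entry.keywords.some(keyword => text.includes(keyword))
  );
  return match ? match.meaning : null; // null → "no mapping found"
}
```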
The AI module connects to Gemini Nano (on-device) or the Gemini API (cloud). If Nano isn't available, the app falls back to the cloud model. And if that fails, it lets users type a description manually.
Note: this retry system matters because the available Gemini model versions differ between accounts, so many users hit 404 model errors on endpoints that simply don't exist for their key.
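In outline, the fallback chain looks something like this. The helper names (`describeWithNano`, `describeWithGeminiApi`, `promptUserForManualDescription`) are placeholders for functions defined elsewhere in the project; this is only the shape of the logic, not my exact code:

```javascript
// Simplified fallback chain: on-device Nano → cloud Gemini API → manual input.
async function getDescription(imageFile, apiKey) {
  try {
    // Nano is text-only here (the app reports "on-device AI (text) for best guess"),
    // so this branch can only offer a best guess rather than a true image description.
    const nanoGuess = await describeWithNano(
      "Give your best guess at describing a Makaton sign image."
    );
    if (nanoGuess) return nanoGuess;
  } catch (err) {
    console.warn("On-device model unavailable:", err);
  }

  if (apiKey) {
    try {
      return await describeWithGeminiApi(imageFile, apiKey); // cloud, multimodal
    } catch (err) {
      console.warn("Cloud request failed:", err);
    }
  }

  // Last resort: let the user type a description by hand.
  return promptUserForManualDescription();
}
```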
This script ties everything together: file upload, AI call, meaning mapping, and output display.
Let's break down the main sections of the script for the Makaton AI Companion, as there's a lot going on here:
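Condensed into a single sketch, the flow looks roughly like this. It reuses the helpers from the earlier sketches, and the element IDs and localStorage key name are placeholders of my own:

```javascript
// Wire the UI: upload → describe (AI) → map to meaning → display.
document.getElementById("describeBtn").addEventListener("click", async () => {
  const fileInput = document.getElementById("imageInput");
  const output = document.getElementById("output");

  const imageFile = fileInput.files[0];
  if (!imageFile) {
    output.textContent = "Please choose an image first.";
    return;
  }

  output.textContent = "Describing image...";
  const apiKey = localStorage.getItem("geminiApiKey");
  const description = await getDescription(imageFile, apiKey);
  const meaning = mapDescriptionToMeaning(description);

  output.textContent = meaning
    ? `Description: ${description}\nMakaton meaning: ${meaning}`
    : `Description: ${description}\nNo Makaton mapping found yet.`;
});
```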
This script effectively ties together the user interface, file handling, AI processing, and output display, providing a seamless experience for translating Makaton signs into English meanings.
While working on this project, I started appreciating how computer vision and language understanding complement each other in multimodal systems like this one.
This realization reshaped how I think about accessibility: the best assistive technologies often emerge not from smarter models alone, but from the interaction between modalities like seeing, describing, and reasoning in context.
To make the app more accessible, I added speech output and a quick copy button:
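Both features build on standard browser APIs: the Web Speech API for speaking and the async Clipboard API for copying. A sketch, again with placeholder element IDs:

```javascript
// Read the result aloud using the Web Speech API.
function speakResult(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = "en-GB";
  window.speechSynthesis.speak(utterance);
}

// Copy the result to the clipboard.
async function copyResult(text) {
  await navigator.clipboard.writeText(text);
}

document.getElementById("speakBtn").addEventListener("click", () =>
  speakResult(document.getElementById("output").textContent)
);
document.getElementById("copyBtn").addEventListener("click", () =>
  copyResult(document.getElementById("output").textContent)
);
```

Note that `navigator.clipboard` only works in secure contexts (HTTPS or localhost), which is yet another reason to serve the app from a local server rather than opening it straight from the file system.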
This gives users both visual and auditory feedback, especially helpful for learners or educators.
No AI or web integration project runs smoothly the first time - and that's okay. Here's a breakdown of the main issues I faced while building the Makaton AI Companion, how I diagnosed them, and how I fixed each one.
These lessons will help anyone trying to integrate Gemini APIs, on-device AI, or local web apps without a full backend.
That single step, serving the app from a local web server instead of opening the HTML file directly from disk, fixed all the CORS errors and allowed my ES modules to load correctly.
The next big challenge came from the Gemini API. Even though I had a valid API key, my console showed this error:
It turns out Google's API endpoints can vary slightly depending on your project setup and key permissions.
✅ Fix: I rewrote my script to automatically try multiple Gemini model endpoints until it found one that worked. Something like this:
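(The model names below are only examples; which ones your key can access varies and changes over time.)

```javascript
// Example candidate models, in order of preference.
const CANDIDATE_MODELS = ["gemini-1.5-flash", "gemini-1.5-pro", "gemini-pro-vision"];

// Try a single model endpoint; resolve with JSON on success, throw on failure.
async function tryModel(model, apiKey, requestBody) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${apiKey}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(requestBody),
  });
  if (!response.ok) throw new Error(`Model ${model} returned ${response.status}`);
  return response.json();
}
```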
And I wrapped it in a loop that stopped once one endpoint succeeded.
Later, I improved it further by listing the available models dynamically from the API and automatically trying whichever ones could handle image inputs.
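A sketch of that discovery step, assuming the public ListModels endpoint; the filter below is my own simplification of "supports image prompts":

```javascript
// Ask the API which models this key can use, then keep the ones that
// support generateContent (the method used for image + text prompts).
async function discoverModels(apiKey) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models?key=${apiKey}`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`ListModels failed: ${response.status}`);

  const { models = [] } = await response.json();
  return models
    .filter(m => (m.supportedGenerationMethods || []).includes("generateContent"))
    .map(m => m.name.replace("models/", "")); // "models/gemini-1.5-flash" → "gemini-1.5-flash"
}
```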
That dynamic discovery approach fixed the 404 errors permanently.
Once I got everything working, I wanted a version that others could test easily without installing Node.js or running build tools.
Everything runs locally in the browser, no server-side code required. This also makes it perfect for demos, classrooms, and so on.
Another subtle issue appeared when I noticed this red message:
That line told me exactly what was wrong: my import and export function names didn't match. The fix was straightforward:
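The names below are illustrative, but the point is that the exported and imported identifiers must match exactly (or be aliased explicitly):

```javascript
// makaton-map.js — what the module actually exports:
export function mapDescriptionToMeaning(description) { /* ... */ }

// app.js — the import must use the same name:
import { mapDescriptionToMeaning } from "./makaton-map.js";
// ❌ import { mapMeaning } from "./makaton-map.js"; // no such export → red console error
```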
Using the browser console as my debugging dashboard turned out to be the most powerful tool of all. Every fix started by reading and reasoning about those red error lines.
Let's see the Makaton AI Companion in action and understand what's happening under the hood.
Once you've downloaded or cloned the project folder, open your terminal in that directory, start a local development server (any simple static file server will do), and then open the served URL in your browser.
You should see the Makaton AI Companion interface:
This means your key is stored locally in localStorage and is only accessible from your own browser.
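Under the hood that's just the Web Storage API; a quick sketch, where the storage key name is my placeholder:

```javascript
// Save the key once the user pastes it in, and reuse it on later visits.
function saveApiKey(key) {
  localStorage.setItem("geminiApiKey", key);
}

function loadApiKey() {
  return localStorage.getItem("geminiApiKey"); // null if the user never set one
}
```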
If you're using Chrome Canary, you can run Gemini Nano locally without internet access. This allows the Makaton AI Companion to generate text even when the API key isn't set.
Visit the official Chrome Canary download page and install it on your Windows or macOS system. Chrome Canary is a special version of Chrome designed for developers and early adopters, offering the latest features and updates.
Open Chrome Canary and type chrome://flags in the address bar.
Locate the "Prompt API for Gemini Nano" flag in the list. Set this flag to Enabled. This action allows Chrome Canary to support the Gemini Nano model for on-device AI processing.
After enabling the flag, relaunch Chrome Canary to apply the changes.
Open a new tab in Chrome Canary and enter chrome://components in the address bar.
Scroll down to find the "Optimization Guide On Device Model" component and click Check for update. This initiates the download of the Gemini Nano model, which is needed for running AI tasks locally without an internet connection.
Once the Gemini Nano model is installed, the Makaton AI Companion app will automatically detect it. You should see a message indicating that the app is using on-device AI: "No API key found. Using on-device AI (text) for best guess..."
This confirmation means that the app can now generate text descriptions using the Gemini Nano model without needing an API key or internet access.
By following these detailed steps, you ensure that the Gemini Nano model is correctly set up and ready to use for on-device AI processing in the Makaton AI Companion.
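For reference, the on-device path boils down to feature-detecting the Prompt API and prompting a local session. The exact API surface has changed across Canary releases, so treat the names below as an assumption and check the current Prompt API documentation for your browser version:

```javascript
// Sketch only: the shape shown here (window.ai + createTextSession) is one
// historical form of the Prompt API and may differ in your Canary build.
async function describeWithNano(promptText) {
  if (!("ai" in window)) return null; // no built-in model → caller falls back to cloud/manual

  const session = await window.ai.createTextSession();
  try {
    return await session.prompt(promptText);
  } finally {
    session.destroy?.();
  }
}
```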
Click Choose File to upload any Makaton image (for example, the "help" sign), then press Describe (Cloud or Nano). You'll immediately see console logs confirming that the app is running correctly and connecting to the Gemini API:
When no mapping is found, the app still shows the result: the AI description is accurate but doesn't yet match a known Makaton keyword.
This demonstrates how accessible, AI-assisted tools can support communication for people who rely on Makaton. Even when a gesture isn't recognized, the system provides a structured output and allows users or educators to expand the mapping list, making the tool smarter over time.
Building this project turned out to be much more than a coding exercise for me.
It was a meaningful experiment in combining accessibility, natural language processing, and computer vision. These three fields, when brought together, can create real social impact.
While working on it, I began to understand how computer vision and language understanding complement each other in practice. The vision model perceives the world by identifying shapes, gestures, and spatial patterns, while the language model interprets what those visuals mean in human terms.
In this project, the artificial intelligence system first sees the Makaton sign, then describes it, and finally maps it to an English word that carries intent and meaning.
This interaction between perception and semantics is what makes multimodal artificial intelligence so powerful. It is not only about recognizing an image or generating text; it is about building systems that connect understanding across different forms of information to make technology more inclusive and human centered.
This realization changed how I think about accessibility technology. True innovation happens not only through smarter models but through the harmony between seeing and understanding, between what an artificial intelligence system observes and how it communicates that observation to help people.
Working on this project reminded me that accessibility isn't just about compliance or assistive devices. It's also about inclusion. A simple AI system that can describe a hand gesture or symbol in real time can empower teachers, parents, and students who communicate using Makaton or similar systems.
By mapping AI-generated descriptions to meaningful phrases, the app demonstrates how AI can support inclusive education, even at small scales. It bridges the communication gap between verbal and nonverbal learners, which is something that traditional translation systems often overlook.
On the technical side, this project showed me how naturally computer vision and language understanding complement each other. The Gemini API's multimodal models were able to analyze an image and produce coherent natural-language sentences, something that older APIs couldn't do without chaining multiple tools.
By feeding that output into a lightweight NLP mapping function, I was able to simulate a very early-stage symbol-to-language translator, the core of my broader research interest in automatic Makaton-to-English translation.
While the cloud models are powerful, experimenting with Gemini Nano revealed something exciting:
on-device AI can make accessibility tools faster, safer, and more private.
In classrooms or therapy sessions, you often can't rely on stable internet connections or share sensitive student data. Running inference locally means learners' gestures or symbol images never leave the device, a crucial step toward privacy-preserving accessibility AI.
And since Nano runs directly inside Chrome Canary, it shows how AI is becoming embedded at the browser level, lowering barriers for teachers and developers to build inclusive solutions without needing large infrastructure.
This prototype is just a starting point. Future iterations could integrate gesture recognition directly from camera input, support multiple symbol sets, or even learn from user feedback to expand the dictionary automatically.
Most importantly, it reinforces a central belief in my research and teaching journey:
Accessibility innovation doesn't require massive systems. It starts with curiosity, empathy, and a few lines of purposeful code.
Building the Makaton AI Companion has been one of the most rewarding projects in my AI journey - not just because it worked, but because it proved how accessible innovation can be.
With just a browser, a few lines of JavaScript, and the right API, I was able to combine computer vision, language understanding, and accessibility design into a working system that translates symbols into meaning. It's a small step toward a future where anyone, regardless of speech or language ability, can be understood through technology.
The project also reinforced something deeply personal to me as a researcher and educator: that AI for accessibility doesn't need to be complex, expensive, or centralized. It can be lightweight, open, and built with empathy by anyone who's willing to learn and experiment.
If this project inspires you, I'd love to see your own experiments and improvements. Can you make it support live webcam gestures? Could you adapt it for other symbol systems, like PECS or BSL?
Share your ideas in the comments or tag me if you publish your own version. Together, we can grow a small prototype into a community-driven accessibility tool and continue exploring how AI can give more people a voice.