Thomas Edison, way back before computers, artificial intelligence (AI), streaming, and Marvel movies, once said, "There is no expedient to which a man will not go to avoid the real labor of thinking."
I know that probably 99.99% of you are here because you think this article will tell you how to bypass content checkers. I know that because when I did a search on "How do AI checkers work?" I didn't get technical explanations about how the technology works. Instead, I got hundreds of YouTube videos from people who can't be bothered to do their own writing, showing others how to cheat using AI.
Also: I tested 7 AI content detectors - they're getting dramatically better at identifying plagiarism
This process is called "humanizing" text, and cheaters use it to sprinkle a little bit of special sauce into the cold and heartless verbiage generated by the great Landru in the sky.
But for those who actually care about the technology, let's discuss the actual meat and potatoes of how an AI checker works.
AI checkers today use a variety of techniques, starting with text analysis. As with all prompt queries, the submitted text is broken down into tokens and then normalized, removing punctuation and other non-essential indicators. They then use a technique called vectorization, which converts the text into numerical vectors that can be compared mathematically to other text.
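To make that a little more concrete, here's a minimal sketch of the tokenize-normalize-vectorize step. It uses scikit-learn's TfidfVectorizer, which is my choice for illustration; the sample sentences are made up, and no actual checker's pipeline is this simple.

```python
# A minimal sketch of tokenization, normalization, and vectorization.
# This is an illustration, not any specific AI checker's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

suspect_text = "The quick brown fox jumps over the lazy dog."
reference_text = "A quick brown fox leaped over a lazy dog."

# TfidfVectorizer lowercases the text and strips punctuation (normalization),
# splits it into tokens, and turns each document into a numerical vector.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
vectors = vectorizer.fit_transform([suspect_text, reference_text])

# Once both documents are vectors, comparing them is just math.
similarity = cosine_similarity(vectors[0], vectors[1])[0][0]
print(f"Similarity score: {similarity:.2f}")
```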
Also: Grammarly to roll out a new AI content detector tool. Here's how it works
Of course, all this normalization could remove clues of reprehensible human behavior, like using two spaces after a period or misusing the Oxford comma. But fortunately, AI checkers have more tools up their virtual sleeves -- or, as in Landru's case, under their holographic togas.
See that? AIs aren't going to generate tangentially relevant callbacks to earlier bits of their shtick (as I just did with holographic togas). AI content checkers use contextual awareness to examine the context in which various phrases are used, identify common phrases, and assign them weight. Uncommon contextual connections, like some of those above, are weighted as more likely to be human-written.
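One common way to put a number on "how expected is this phrasing" is perplexity: run the text through a language model and measure how surprised the model is by each word. Highly predictable text leans AI; quirky, uncommon phrasing leans human. Here's a minimal sketch using GPT-2 via Hugging Face's transformers library, a model chosen purely for illustration, not because any particular checker uses it.

```python
# A minimal perplexity sketch: lower perplexity = more predictable text,
# which detectors tend to treat as a hint the text was machine-generated.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Feed the text in as both input and label; the loss is the average
    # negative log-likelihood per token, and exp(loss) is the perplexity.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(outputs.loss).item()

print(perplexity("The cat sat on the mat."))                        # very predictable
print(perplexity("Landru's holographic togas rustled knowingly."))  # much less predictable
```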
The same idea extends to semantic analysis, where AI checkers attempt to understand the meaning of the text rather than just examine sequences of words. This allows them to balance contextual awareness with an understanding of what the writer is trying to say.
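For the semantic side, checkers typically lean on embeddings that capture meaning rather than surface wording. Here's a rough sketch using the sentence-transformers library and a common open-source model; both are my picks for illustration, since real checkers use their own models.

```python
# A rough semantic-analysis sketch: sentence embeddings let a checker
# compare what two passages *mean*, not just which words they share.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

suspect = "The feline rested comfortably upon the rug."
reference = "The cat sat on the mat."

embeddings = model.encode([suspect, reference], convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Semantic similarity: {score:.2f}")  # high despite few shared words
```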
Also: How does ChatGPT actually work?
One reason that the OpenAI content detector (once it's re-released) is expected to perform so well is that it will be able to run summaries of the meaning of a submitted piece against the entire ChatGPT knowledge base, in order to see if the suspect text shows a degree of similarity to what ChatGPT itself would have produced.
From an algorithmic perspective, content checkers may use n-grams, which are sequences of words, to extract context and meaning. Grammatical structure can also be examined to find patterns that reflect content written by an AI.
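An n-gram is just a sliding window of n consecutive words. Here's a tiny, self-contained sketch of extracting them and counting the overlap between two passages; the example sentences are invented.

```python
# A tiny n-gram sketch: slide a window of n words across the text.
# Overlapping n-grams between two documents are one signal checkers can use.
def ngrams(text: str, n: int = 3) -> list[tuple[str, ...]]:
    words = text.lower().split()
    return list(zip(*(words[i:] for i in range(n))))

suspect = "the quick brown fox jumps over the lazy dog"
reference = "a quick brown fox jumps over a sleepy dog"

overlap = set(ngrams(suspect)) & set(ngrams(reference))
print(overlap)
# {('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'), ('fox', 'jumps', 'over')}
```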
Then there's the comparison process, where whatever text is being checked gets compared to the entire internet. This can be done using traditional search algorithms, which look for exact matches, paraphrased text, and even fuzzy matches (near matches, synonyms, and rephrased content).
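Exact matching is trivial; the interesting part is fuzzy matching. Here's a standard-library-only sketch using Python's difflib, which scores how closely two passages align even after light rewording. Real checkers use far more sophisticated matching; this just shows the idea.

```python
# A stdlib-only fuzzy-matching sketch. SequenceMatcher returns a 0-1 ratio
# of how much two strings align, which catches lightly rephrased or
# near-duplicate passages that an exact-match search would miss.
from difflib import SequenceMatcher

original = "AI checkers compare submitted text against a huge corpus of web content."
paraphrase = "AI detectors compare the submitted text with a large corpus of online content."

ratio = SequenceMatcher(None, original.lower(), paraphrase.lower()).ratio()
print(f"Fuzzy match ratio: {ratio:.2f}")  # well above what unrelated text would score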
OpenAI also has an advantage here. Given that ChatGPT was trained on pretty much all accessible human knowledge, innuendo, fiction, and any database that stood still long enough, comparisons for similar text or text with similar base n-grams can be added to the content scoring process.
This challenge would favor OpenAI, Google, Microsoft, Meta, and other data-rich players in the AI field.
Also: Beware of AI 'model collapse': How training on synthetic data pollutes the next generation
Other content checkers probably don't have as vast a database for comparison. As a quick test, I copied a number of paragraphs from various articles I read today (some old, some new) and dumped them into plagiarism checkers. Almost all of them failed to note that text I copied and pasted from other sites was, in fact, copied and pasted from other sites. So, clearly, for content checkers to be able to use online content comparison, they have to have a big enough pool of comparison data to pull from.
Once the comparison is done, most content checkers will provide some kind of report to the user. Ideally, this would be more than just a numerical score or the phrase "likely human written." It would also highlight the areas of the document the content checker deemed suspect, so that evaluators can look further into aspects of the content that may have been generated by an AI.
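As a rough illustration of what that per-passage reporting might look like, here's a sketch that scores each sentence and flags the suspect ones. The scoring function and threshold are invented stand-ins for whatever model a real checker runs under the hood.

```python
# A rough sketch of per-sentence reporting: score each sentence and flag
# the suspect ones, rather than returning a single document-level number.
def score_sentence(sentence: str) -> float:
    # Hypothetical stand-in: a real checker would return a 0-1 probability
    # that the sentence is AI-generated.
    return 0.5  # placeholder value

def build_report(text: str, threshold: float = 0.7) -> list[dict]:
    report = []
    for sentence in text.split(". "):
        score = score_sentence(sentence)
        report.append({
            "sentence": sentence,
            "ai_probability": round(score, 2),
            "flagged": score >= threshold,
        })
    return report

for entry in build_report("First sentence here. Second sentence here."):
    print(entry)
```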
Understand, though, that AI checkers are improving. When I first did my test of AI checkers in early 2023, most of them failed to differentiate human-written from AI-generated text. But, by mid-2024, about half of them got it right. So even the technology I'm talking about here will change over time.
That's especially true because this is an arms race. As AI checkers get better, some AI services will sprinkle in human foibles and styles to help the cheaters cheat.
Also: The best AI chatbots of 2024: ChatGPT, Copilot, and worthy alternatives
Then, AI checkers will get better and look for that.
Then, the AI cheating services will add more techniques.
And on it goes.
What about you? Are you a teacher or an editor trying to make sure the submission you got was written by the person who submitted it? Or are you a troublesome little cheater trying to find more ways to get out of work and create fake content? In either case, what has your experience been with AI content checkers? Let us know in the comments below.