The last couple of months have been absolutely crazy, all thanks to the release of ChatGPT. Now, in saying that, AI tools that can write text and content have existed for a long time, and having tried a good chunk of them myself - I immediately knew that ChatGPT had taken things to another level.
Unlike other tools that struggle to create basic lists or structure cohesive sentences - ChatGPT can apply logic and reasoning to its output. It genuinely feels like a conversational experience. Not only that, but ChatGPT can also write code, create formulas for Excel, and "impersonate" certain people or situations when given the right prompts.
All of this is great, but it also means that in the coming years - there is going to be a lot of content on the Web that has either been written by ChatGPT (and any upcoming models that can compete with it) or in some way altered/modified by it.
Why bother with detecting AI content in the first place?
Tools like ChatGPT and similar AI assistants are not perfect. Ted Chiang did an excellent write-up in The New Yorker on why these tools are treading murky waters: the information they output can neither be verified nor truly trusted.
But there are other reasons, too. Namely, you can never be one hundred percent sure the information is valid because (at least for the time being) these tools do not cite sources. Search engines like Google and Bing will likely cite sources in their own implementations of these assistants.
And then there are areas like education. There have already been reports of schools within the United States and elsewhere in the world banning ChatGPT. In this context, it goes a long way for educators to be able to check student submissions and see whether or not the text is genuine.
This brings me to my next point.
Is it even possible to detect AI-generated text reliably?
Yes and no. If you ask an AI writing tool to write an article about "Marketing tips for small business owners" - the output of said prompt is going to be so generic that any tool built to detect AI text should be able to catch it immediately.
And I have seen this firsthand with ChatGPT, too; the model is ultimately limited by the information it has access to, so a lot of the time, the text that comes out of it is easily recognizable (because it's repetitive).
So, let's do an experiment.
The text I used for testing these tools
In order to do an actual live test of these AI detection tools, I prepared this prompt:
Write me a detailed summary about the principles of agile methodology.
And here is the response from ChatGPT:
This response is going to be the basis of our review.
So, let's take a look at how well AI detection tools can handle it.
AI Text Classifier (OpenAI)
The first tool on our list is the AI Text Classifier from OpenAI (the company behind ChatGPT). In their own words,
The AI Text Classifier is a fine-tuned GPT model that predicts how likely it is that a piece of text was generated by AI from a variety of sources, such as ChatGPT.
You need to have an OpenAI account to access this tool, and it's straightforward to use. On the page, you have a text submission form where you can enter your text and click Submit to analyze it. Here is the result from our response:
In this case, the answer is: "The classifier considers the text to be possibly AI-generated."
OpenAI labels each analysis as very unlikely, unlikely, unclear if it is, possibly, or likely AI-generated. In our case, we know the text we provided was AI-generated, yet the classifier still had doubts and marked it only as possibly.
I went ahead and entered the first couple of paragraphs I wrote for this article, and this time the response was unlikely. But not "very unlikely".
GPT Radar
The next tool I tried is called GPT Radar. I've seen high praise for this tool across various sources, but to my surprise - it did not catch our response as AI-generated and instead marked it as Likely Human Generated.
It could also be that ChatGPT is simply that good! Perhaps that is exactly why there has been so much buzz about it. In this tool's defense, it did report a perplexity score of 13, which leans toward AI-written content, since human-written text tends to have much higher perplexity scores.
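To make the perplexity idea concrete, here is a toy sketch (my own illustration, not GPT Radar's actual method). Perplexity is the exponential of the average negative log-probability a language model assigns to each token, so text the model finds predictable scores low - which is exactly the pattern detectors associate with AI-generated writing:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigned to each token in the text."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities from a language model:
predictable = [0.9, 0.8, 0.85, 0.9]   # model finds every token likely
surprising = [0.1, 0.05, 0.2, 0.08]   # model is frequently "surprised"

print(round(perplexity(predictable), 2))  # low score, AI-like
print(round(perplexity(surprising), 2))   # high score, human-like
```

The probability values here are made up purely for illustration; real tools run the text through an actual language model to obtain them.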
The first 5 lookups with this tool are free, and any subsequent requests are priced at $0.02 per credit, which is enough to check 120 or so words.
Draft & Goal
Draft & Goal is a GPT content detector built specifically for detecting ChatGPT content. And it did an excellent job marking our example response as 100% AI content.
And yes, I also tested it with the text I have written in this article, to which it responded: "Based on our Analysis your test has been most likely written by a Human."
I'm not exactly sure who the DNG group behind this tool is, but for the time being - it is free to use and can check both English and French texts.
GPTZero
The GPTZero tool comes with a few interesting features. For one, rather than giving a flat "yes or no" - it highlights the parts that the model thinks were generated with AI.
GPTZero also provides perplexity and burstiness scores for the analyzed text.
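Burstiness, roughly speaking, measures how much the "surprise" level varies across a text. GPTZero's exact formula isn't public, so the sketch below is only my own toy interpretation: treating burstiness as the spread of per-sentence perplexity, on the intuition that human writing mixes plain and surprising sentences while AI text stays more uniform:

```python
import statistics

def burstiness(sentence_perplexities):
    """Toy burstiness: standard deviation of per-sentence perplexity.
    High spread suggests human writing; a flat profile suggests AI."""
    return statistics.pstdev(sentence_perplexities)

# Hypothetical per-sentence perplexity scores:
human_like = [12.0, 85.0, 30.0, 140.0]  # uneven -> "bursty"
ai_like = [14.0, 16.0, 13.0, 15.0]      # uniform -> flat

print(burstiness(human_like) > burstiness(ai_like))  # True
```

Again, the numbers are invented for illustration; a real detector would first compute per-sentence perplexity with a language model.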
By the looks of it, this tool is built specifically to be used by educators to check students' work, and they also make it absolutely clear on their homepage that the results should be taken with a grain of salt.
When I fed the tool actual human-written text, it accurately reported: "Your text is likely to be written entirely by a human".
Detecting AI Text: Summary & conclusion
Google recently updated its guidelines for content that has been altered or primarily generated with AI tools. Their stance is that AI content is acceptable as long as its primary purpose is not to manipulate search engine rankings - e.g. churning out content purely for SEO.
I think there can be a lot of great uses for these AI content tools, but I can also see how it might end up creating even more problems when people start to get too comfortable with using these tools to write blogs, articles, or publish news.
We're in the very early stages of this, and we have yet to see the full extent of these types of assistants being integrated into search engines and other software.
And as far as detection goes, as we saw while testing the various detection tools - it can be hit-and-miss. Ultimately, the goal would be that any human-written text you put into these tools reliably comes back marked as written by a human.
If we look at it from that point of view, I can see how these tools could be useful, especially as a sanity check when you know for a fact that the text was written using AI.