Several researchers have investigated the consequences of using ChatGPT in the education sector, and their findings raised doubts regarding the probable effects that ChatGPT may have on academia. As such, the present study aimed to assess the ability of three methods, namely: (1) academicians (senior and young), (2) three AI detectors (GPT-2 output detector, Writefull GPT detector, and GPTZero), and (3) one plagiarism detector, to differentiate between human- and ChatGPT-written abstracts. A total of 160 abstracts were assessed using these three methods. Two senior and two young academicians used a newly developed rubric to assess the type and quality of 80 human-written and 80 ChatGPT-written abstracts. The results were statistically analysed using crosstabulation and chi-square analysis, and the bivariate correlation and accuracy of the methods were assessed. The findings demonstrated that all three methods made a variety of incorrect classifications. An academician's level of experience may play a role in detection ability, with senior academician 1 demonstrating superior accuracy. The GPTZero AI and similarity detectors were very good at accurately identifying the abstracts' origins. In terms of abstract type, every variable was positively correlated, except in the case of the similarity detector (p < 0.05). Human-AI collaboration may significantly benefit the identification of abstract origins.
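The kind of crosstabulation, chi-square, and accuracy analysis described above can be sketched as follows. This is a minimal illustration only: the confusion counts in `table` are hypothetical and do not reproduce the study's data, and the choice of `scipy.stats.chi2_contingency` is an assumption about tooling, not a statement of the authors' actual analysis pipeline.

```python
# Hypothetical crosstabulation for one rater classifying 160 abstracts
# (80 human-written, 80 ChatGPT-written). Counts are illustrative only.
from scipy.stats import chi2_contingency

# Rows: true origin (human, ChatGPT); columns: rater's call (human, ChatGPT)
table = [[68, 12],   # human-written abstracts
         [20, 60]]   # ChatGPT-written abstracts

# Chi-square test of independence between true origin and rater's call
chi2, p, dof, expected = chi2_contingency(table)

# Overall accuracy: correct classifications / total abstracts
accuracy = (table[0][0] + table[1][1]) / 160

print(f"chi-square = {chi2:.2f}, p = {p:.4g}, accuracy = {accuracy:.0%}")
```

With these illustrative counts, the association between true origin and the rater's classification is significant, and the same crosstabulation yields the rater's overall accuracy directly.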
The advent of artificial intelligence (AI)-based applications significantly affects individuals as well as a plethora of organisations and societies. By combining linguistic and computer science models, AI aims to build computer models that can perform tasks that would otherwise require human intelligence (HI). These include learning, adapting, rationalising, understanding, and grasping abstract concepts, as well as being responsive to complex human traits, such as attentiveness, emotion, and innovation. As ChatGPT has gained immense popularity worldwide over the past year, it has led to widespread discussion of its implications. As ChatGPT is an AI-based language model, it has undergone extensive training on numerous text-based datasets in multiple languages. OpenAI, the developer of ChatGPT, describes it as a chatbot that uses the Generative Pre-trained Transformer (GPT) architecture to generate responses to user-provided text-based inputs. According to Brown et al., the GPT architecture analyses natural language using a neural network and generates replies based on the input text's context. As such, when given an input by a user, it can generate text-based responses that closely resemble those of a trained human.
The advent of ChatGPT was met with mixed reactions from the scientific and academic communities, and it reignited the long-standing debate on the potential advantages and disadvantages of adopting cutting-edge AI-based technologies. ChatGPT excels at various conversational and written tasks, as it increases the speed and quality of the produced work. However, many users have raised concerns over the possibility of bias arising from the datasets used to train ChatGPT; such bias may hinder its performance and yield erroneous answers that nonetheless appear scientifically accurate. The challenge of distinguishing between human- and AI-written content has also sparked several concerns in professional and education-related communities and renewed discussion on the importance of content written using HI. The controversy surrounding ChatGPT was therefore inevitable. The possibility that it could produce factually erroneous content, as well as the ethical aspects of using and abusing AI-based technologies to produce content, warrant careful consideration, especially since the produced content could cause misinformation in healthcare practice and academic publications.
Furthermore, the effects of ChatGPT extend beyond academic and educational activities. Previous scholarly publications have even named ChatGPT as a "contributing" author. However, many experts believe that AI does not satisfy all the requirements for authorship.
HI has several advantages over AI, namely, biological evolution, flexibility, creativity, emotional intelligence, and the capability to comprehend abstract ideas. Nevertheless, combining HI and AI could prove beneficial, but only if the latter's output can be guaranteed to be both accurate and dependable. To date, no study has examined the effects of an academician's level of experience on their ability to identify the origin of scientific abstracts, nor has any study compared this aspect with the use of AI detection tools.
When using ChatGPT, or any AI model, for academic purposes, it is crucial to exercise caution, particularly with regard to the ethical and societal consequences, because the internal operations of AI models lack transparency. It is therefore essential to recognise that AI models are opaque tools that yield outputs in response to user-inputted queries; as such, the accuracy of those outputs cannot be guaranteed.
Factual inaccuracies, ethical concerns, and the possibility of misuse, particularly the spread of false information, are crucial considerations in both healthcare practice and academic writing. These risks can be mitigated by maintaining awareness of these possibilities and by using appropriate tools to distinguish between human- and AI-written manuscripts.
Evaluating the detection capabilities of academicians against AI tools can determine whether human expertise has a distinct advantage or whether AI excels at recognising its own outputs. Human judgement depends on experience, intuition, and contextual understanding, whereas AI detection tools rely on statistical patterns and probabilities. Understanding their respective strengths and weaknesses can enhance detection strategies. While previous studies have explored the human ability to differentiate between AI-generated and authentic abstracts, few have examined the influence of experience level on detection accuracy. Investigating whether senior academicians, with their extensive experience, outperform their junior counterparts, or whether both groups struggle equally, offers valuable insight into how experience level interacts with the ability to detect AI-generated text. Thus, the present study aimed to examine the ability and accuracy of four blinded human academicians of different experience levels to differentiate between and evaluate the quality of human- and AI-written content, in conjunction with three AI output detectors and a plagiarism detector.