Today, Gen-AI tools are used for various purposes, ranging from everyday tasks, such as summarizing texts, to high-level solutions tailored to a company's needs. Trustworthy, high-quality datasets are the most important component in building models for all artificial intelligence-based solutions. In some specific areas, creating a large dataset manually can be challenging, so various techniques can be used to expand existing datasets. Therefore, in this research, Gen-AI tools were used to augment an educational context text dataset that can be used to detect students who used text generators to answer open-ended questions. An experimental investigation has been performed to evaluate the effectiveness of three Gen-AI tools in augmenting the existing dataset: OpenAI ChatGPT, Google Gemini, and Microsoft Copilot. During the augmentation process, the number of texts increased from 1079 to 7982. To determine the efficiency of each Gen-AI tool and of their combinations, the dataset has been divided into various subsets. All subsets were used to train several machine-learning algorithms. Additionally, the text has been converted into numerical data using two methods: bag-of-words and sBERT. A total of 15,296 models have been trained, tested, and evaluated. The results of the research have shown that text augmentation using Gen-AI tools increased the models' accuracy.
People are creative beings and have always sought ways to simplify their work. This phenomenon is mainly observed in academic fields, where students often use prepared cheat sheets during assessments. In addition to students, researchers can also forget about academic integrity. In any case, presenting the work of others as one's own is considered plagiarism. The fast development of technology and the rise of artificial intelligence (AI) raise concerns for academic integrity. Before the public availability of generative AI technologies (Gen-AI) in 2022, plagiarism appeared clearly defined and well understood in the academic context. Now, students can submit text created by large language models (LLMs) as their own work. This type of cheating is often referred to as artificial intelligence-generated plagiarism. Most of the plagiarism controversy stems from large language models that operate based on machine learning (ML) and natural language processing (NLP) principles.
Artificial intelligence technologies remain among the fastest-developing innovations, changing various areas of society. AI technologies such as ML, deep learning (DL), and NLP are now widely used in various fields, including education. Gen-AI tools pose new challenges in the context of plagiarism in the academic world, and this topic has been actively studied by various researchers. Plagiarism is a serious academic and ethical violation and can have legal consequences. It is necessary to have clear guidelines or regulations defining how to use Gen-AI tools to avoid plagiarism and ensure the ethical use of these tools. This should include not only technical solutions to ensure the originality of generated content but also educational initiatives that promote responsible use of Gen-AI tools. In light of this, Higher Education Institutions (HEIs) should review their policies on academic integrity. Perking and Roe, in their research, have analyzed how the issue of academic integrity is addressed in HEIs around the world in relation to students' use of Gen-AI tools. The authors emphasized the need for continuous training of academic staff to identify cases of academic violations related to the use of Gen-AI tools, as well as for regularly educating students about the ethical use of such tools. The ethical use of Gen-AI is one of the popular topics being studied. Ten key ethical issues have been identified and prioritized in the research by Bukar et al. Aspects such as academic ethics, incorrect referencing and citation practices, and plagiarism should be considered. Another study by Jain and Raghuram emphasizes the importance of integrating Gen-AI into educational and learning processes, considering both the advantages and challenges. Gen-AI has the potential to transform the learning process and improve the quality of education. However, its application must carefully balance moral considerations against overreliance on technology.
The emergence of Gen-AI tools, such as OpenAI ChatGPT or Google Gemini, has created new challenges in the academic world. Previously, plagiarism was most often carried out by submitting a document that included paragraphs from other sources without references. Now, however, students can use an LLM to produce the text they want or to complete the assignment they need. According to Nicolic et al., the main challenge of using this technology is determining the most effective method to evaluate student work. The authors investigated five different Gen-AI tools, including OpenAI ChatGPT, Google Gemini, and Microsoft Copilot, to identify their differences in performance. The research showed strengths and weaknesses for each tool, but the authors concluded that OpenAI ChatGPT performed better than the others in various types of tasks. However, the rapid progress of OpenAI ChatGPT and the growing influence of Microsoft Copilot and Google Gemini motivate scientists to carry out continuous research in this field. In addition to investigating how to include Gen-AI in the education process, studies on detecting the usage of these tools are also being performed. According to Chaka, AI detection is not reliable, and it is recommended to use a combined methodology to distinguish between AI-generated and human-authored texts: both modern AI detectors and traditional antiplagiarism detection tools, along with human reviewers. Another study by Orenstrakh et al. investigated the effectiveness of eight publicly available LLM-generated text detectors. The authors used human-written and OpenAI ChatGPT-generated text, and the results were described from various aspects. According to the study, the highest accuracy of 97.06% was shown by the CopyLeaks solution. However, false positive results have also been obtained, raising some concerns related to the usage of these tools for plagiarism detection in academic papers.
Many researchers have shown that detecting the use of Gen-AI in educational texts is a difficult task, so new techniques should be investigated continually to improve it. It is also worth mentioning that the accuracy of Gen-AI usage detection decreases for texts written in a non-English language.
In our research, Lithuanian educational context text data has been investigated. The morphology of the Lithuanian language is complex, making the detection of Gen-AI usage even more difficult. Because Lithuanian is a less widely used language, the size of available datasets is also limited. Additional effort to extend the dataset or to apply dataset augmentation methods is needed when higher model accuracy is required. Gen-AI could be used to augment text data; however, it is not clear how efficient it would be for Lithuanian educational context text data. Therefore, the main contributions of the paper are as follows:
1) The influence of Gen-AI tools on text data augmentation has been experimentally investigated. Various combinations of Gen-AI tools were analyzed, which helps to determine the efficiency of each tool separately and in combination. The most popular Gen-AI tools at this time were analyzed: OpenAI ChatGPT, Google Gemini, and Microsoft Copilot.
2) The efficiency of Gen-AI tools to augment the dataset has been evaluated by training different machine learning algorithms: multilayer perceptron, random forest, gradient-boosted trees, k-nearest neighbors, decision trees, and naive Bayes. All models have been trained using the same conditions. The results obtained have shown not only the efficiency of Gen-AI tools in text augmentation, but also which machine learning model is most suitable for text generator usage detection for educational purposes.
3) During the experimental investigation, hyperparameter optimization has been used. In this way, a deep analysis was performed to find the most suitable parameters for each machine learning algorithm. Also, two different techniques to transform text data into numerical expressions have been used. A total of 15,296 models have been created and evaluated.
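As an illustration of the simpler of the two text-to-number techniques mentioned above, the following is a minimal bag-of-words sketch in plain Python. It is not the implementation used in this research: the whitespace tokenizer, the function names, and the toy English sentences are all assumptions made for the example, and a real pipeline for Lithuanian text would likely need punctuation handling and lemmatization. sBERT embeddings require a pretrained model and are not sketched here.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace only. A real pipeline
    # would also strip punctuation and normalize inflected word forms.
    return text.lower().split()


def build_vocabulary(texts):
    # Map every distinct token to a column index; sorting keeps the
    # column order stable between runs.
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    # Count how often each vocabulary token occurs in the text.
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        idx = vocabulary.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vector[idx] = n
    return vector


# Hypothetical toy answers, not taken from the dataset.
answers = ["the model learns from data", "the data trains the model"]
vocabulary = build_vocabulary(answers)
vectors = [bag_of_words(a, vocabulary) for a in answers]
```

Each answer becomes a fixed-length vector of token counts, which is the kind of numerical input that classifiers such as random forest or naive Bayes can be trained on.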
The research conducted is valuable in two ways: it establishes the suitability of Gen-AI for text data augmentation and the suitability of machine learning models for detecting the use of Gen-AI to answer open-ended questions.
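One plausible way to organize subsets by Gen-AI tool combination, as described in the first contribution, can be sketched as follows. The exact subset construction used in the experiments is not specified in this section, so the scheme below (the original texts plus the texts generated by every non-empty combination of the three tools) is an assumption, and the function names and placeholder texts are hypothetical.

```python
from itertools import combinations

TOOLS = ("ChatGPT", "Gemini", "Copilot")


def tool_combinations(tools):
    # All non-empty combinations of the tools: singles, pairs, and the triple.
    combos = []
    for size in range(1, len(tools) + 1):
        combos.extend(combinations(tools, size))
    return combos


def build_subsets(original, augmented_by_tool):
    # Assumption: every subset contains the original texts, extended with
    # the texts generated by one combination of tools.
    subsets = {"original": list(original)}
    for combo in tool_combinations(tuple(augmented_by_tool)):
        texts = list(original)
        for tool in combo:
            texts.extend(augmented_by_tool[tool])
        subsets["original+" + "+".join(combo)] = texts
    return subsets


# Hypothetical placeholder texts, not taken from the dataset.
original = ["answer A", "answer B"]
augmented = {tool: [f"{tool} paraphrase of answer A"] for tool in TOOLS}
subsets = build_subsets(original, augmented)
```

With three tools this yields seven augmented subsets plus the original one, so the contribution of each tool can be compared both in isolation and in combination with the others.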
The structure of the manuscript is as follows. In "Related works", related works are reviewed. In "The methodology of experimental research", the methodology of the research is presented, and the original dataset, the augmented dataset, and the subsets of the new dataset used in the experimental investigation are described. The results of the experimental investigation are presented in "Experimental investigation". In "Discussion", the performed experimental investigation is discussed, together with the limitations and future work. The last section, "Conclusions and future work", concludes the paper.