Curated by THEOUTPOST
On Wed, 7 May, 12:08 AM UTC
3 Sources
[1]
AI may speed up the grading process for teachers
Grading can be a time-consuming task for many teachers. Artificial intelligence tools may help ease the strain, according to a new study from the University of Georgia published in Technology, Knowledge and Learning.

Many states have adopted the Next Generation Science Standards, which emphasize the importance of argumentation, investigation and data analysis. But teachers following the curriculum face challenges when it's time to grade students' work.

"Asking kids to draw a model, to write an explanation, to argue with each other are very complex tasks," said Xiaoming Zhai, corresponding author of the study and an associate professor and director of the AI4STEM Education Center in UGA's Mary Frances Early College of Education. "Teachers often don't have enough time to score all the students' responses, which means students will not be able to receive timely feedback."

AI is fast but bases grading on shortcuts

The study explored how large language models grade students' work compared to humans. LLMs are a type of AI trained on large amounts of information, usually from the internet, which they use to "understand" and generate human language.

For the study, the LLM Mixtral was presented with written responses from middle school students. One question asked students to create a model showing what happens to particles when heat energy is transferred to them. A correct answer would indicate that molecules move slower when cold and faster when hot. Mixtral then constructed rubrics to assess student performance and assign final scores.

The researchers found that LLMs could grade responses quickly, but they often used shortcuts, such as spotting certain keywords and assuming a student understands the topic. This, in turn, lowered their accuracy in assessing students' grasp of the material.

The study suggests that LLMs could be improved by providing them with rubrics that reflect the deep, analytical thought humans use when grading. These rubrics should include specific rules on what the grader is looking for in a student's response. The LLM could then evaluate the answer based on the rules the human set.

"The train has left the station, but it has just left the station," said Zhai. "It means we still have a long way to go when it comes to using AI, and we still need to figure out which direction to go in."

LLMs and human graders differ in their scoring process

Traditionally, LLMs are trained on both the students' answers and the human grader's scores. In this study, however, the LLM was instructed to generate its own rubric to evaluate student responses.

The researchers found that the rubrics generated by LLMs had some similarities to those made by humans. LLMs generally understand what a question is asking of students, but they can't reason the way humans do. Instead, they rely mostly on shortcuts, such as what Zhai called "over-inferring": assuming a student understands something when a human teacher wouldn't. For example, an LLM will mark a student's response as correct if it includes certain keywords, but it can't evaluate the logic the student is using.

"Students could mention a temperature increase, and the large language model interprets that all students understand the particles are moving faster when temperatures rise," said Zhai. "But based upon the student writing, as a human, we're not able to infer whether the students know whether the particles will move faster or not."
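As a rough illustration of the two-step setup described above - the model drafting its own rubric and then applying it to student responses - the following sketch assumes Mixtral is served behind an OpenAI-compatible endpoint. The endpoint URL, model name, prompts, and 0-2 score scale are all assumptions made for illustration, not the study's actual protocol.

```python
# Minimal sketch of the self-generated-rubric workflow described above.
# Assumes Mixtral is served behind an OpenAI-compatible endpoint (e.g. a
# local vLLM server); the URL, model name, prompts, and 0-2 score scale
# are illustrative assumptions, not the study's protocol.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed deployment name

QUESTION = ("Create a model showing what happens to particles "
            "when heat energy is transferred to them.")

def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's reply text."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: the LLM drafts its own rubric for the question.
rubric = ask(
    "Write a scoring rubric (levels 0-2, with an explicit rule for each "
    f"level) for this middle school science question:\n{QUESTION}"
)

# Step 2: the LLM applies that rubric to a student response.
def grade(response: str) -> str:
    return ask(
        f"Question:\n{QUESTION}\n\nRubric:\n{rubric}\n\n"
        f"Student response:\n{response}\n\n"
        "Assign a score from 0 to 2 and justify it using only the rubric."
    )

print(grade("When it gets hotter the particles start to move around faster."))
```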
LLMs are especially reliant on shortcuts when presented with examples of graded responses without explanations of why certain papers received the grades they did.

Humans still have a role in automated scoring

Despite the speed of LLMs, the researchers warn against replacing human graders completely. Human-made rubrics often contain a set of rules reflecting what the instructor expects of student responses. Without such rubrics, LLMs reach only a 33.5% accuracy rate; with access to human-made rubrics, accuracy jumps to just over 50%.

If the accuracy of LLMs can be improved further, though, educators may be open to using the technology to streamline their grading.

"Many teachers told me, 'I had to spend my weekend giving feedback, but by using automatic scoring, I do not have to do that. Now, I have more time to focus on more meaningful work instead of some labor-intensive work,'" said Zhai. "That's very encouraging for me."
[2]
Grading with AI: Faster than teachers, but not smarter - Earth.com
Grading student work can take up countless hours for teachers, especially when assignments call for deep thinking, explanations, or scientific modeling. However, a new study suggests that artificial intelligence (AI) could help ease that burden - but only if used carefully and alongside human input.

The research was led by Xiaoming Zhai, associate professor and director of the AI4STEM Education Center at the University of Georgia's Mary Frances Early College of Education. The study explores how well large language models (LLMs) can assess student work compared to human graders.

"Asking kids to draw a model, to write an explanation, to argue with each other are very complex tasks," Zhai said. "Teachers often don't have enough time to score all the students' responses, which means students will not be able to receive timely feedback."

The study focused on middle school students' responses to science questions aligned with the Next Generation Science Standards. One question, for example, asked students to create a model showing how particles behave when heat energy is added. The correct answer would explain that molecules speed up as they heat and slow down when cooled.

The research team fed student answers into an LLM called Mixtral and asked it to grade them. But unlike most AI grading studies, where the AI is trained using examples of human-scored answers, this study took a different approach: the LLM had to create its own grading rubric and apply it to student work.

The researchers found that Mixtral could grade responses very quickly. However, it tended to rely on shortcuts, such as looking for specific keywords, rather than assessing the actual depth of the students' understanding.

"Students could mention a temperature increase, and the large language model interprets that all students understand the particles are moving faster when temperatures rise," Zhai explained. "But based upon the student writing, as a human, we're not able to infer whether the students know whether the particles will move faster or not."

In other words, the AI might give points to a student simply for mentioning the right terms, even if the reasoning behind the answer is unclear or incorrect.

The study suggests that LLMs need better guidelines to match human grading standards. Specifically, AI models perform better when they use detailed rubrics created by teachers, which outline exactly what to look for in a good response. Without these rubrics, the AI reached only about 33.5% accuracy when compared with human grading. With access to human-created rubrics, that accuracy jumped to just over 50%.

"The train has left the station, but it has just left the station," Zhai said. "It means we still have a long way to go when it comes to using AI, and we still need to figure out which direction to go in."

One key difference between human graders and LLMs is how they handle complex or incomplete answers. According to the researchers, an LLM will mark a student's response as correct if it includes certain keywords, but it cannot evaluate the logic the student is using. This happens because LLMs tend to "over-infer," assuming a student understands a concept based on surface clues. Human teachers, by contrast, look for evidence of clear thinking and accurate reasoning. Without explanations for why certain answers earned specific grades, the AI lacks the context to make fine-tuned decisions.

Despite these limitations, many teachers are interested in using AI tools to speed up routine grading.
"Many teachers told me, 'I had to spend my weekend giving feedback, but by using automatic scoring, I do not have to do that. Now, I have more time to focus on more meaningful work instead of some labor‑intensive work,'" Zhai said. "That's very encouraging for me." Rather than fully replacing human graders, the researchers suggest that AI systems should serve as assistants. This way, it frees teachers to concentrate on tasks that require human judgment, creativity, and connection. The research highlights both the promise and the challenges of using AI in classrooms. While current LLMs can process large batches of student work quickly, they still need better instructions, oversight, and refinement to deliver meaningful feedback. Teachers remain essential for guiding the use of these tools, setting expectations, and ensuring fairness. As AI technologies continue to evolve, the hope is that future models will become more adept at understanding not just keywords but the quality of student reasoning. With thoughtful design and human‑AI collaboration, these tools could become powerful allies in helping teachers support student learning - without sacrificing weekends to piles of ungraded papers. The study is published in the journal Technology, Knowledge and Learning. -- - Like what you read? Subscribe to our newsletter for engaging articles, exclusive content, and the latest updates.
[3]
AI May Speed Up the Grading Process for Teachers | Newswise
Newswise -- Grading can be a time-consuming task for many teachers. Artificial intelligence tools may help ease the strain, according to a new study from the University of Georgia.

Many states have adopted the Next Generation Science Standards, which emphasize the importance of argumentation, investigation and data analysis. But teachers following the curriculum face challenges when it's time to grade students' work.

"Asking kids to draw a model, to write an explanation, to argue with each other are very complex tasks," said Xiaoming Zhai, corresponding author of the study and an associate professor and director of the AI4STEM Education Center in UGA's Mary Frances Early College of Education. "Teachers often don't have enough time to score all the students' responses, which means students will not be able to receive timely feedback."

The study explored how large language models grade students' work compared to humans. LLMs are a type of AI trained on large amounts of information, usually from the internet, which they use to "understand" and generate human language.

For the study, the LLM Mixtral was presented with written responses from middle school students. One question asked students to create a model showing what happens to particles when heat energy is transferred to them. A correct answer would indicate that molecules move slower when cold and faster when hot. Mixtral then constructed rubrics to assess student performance and assign final scores.

The researchers found that LLMs could grade responses quickly, but they often used shortcuts, such as spotting certain keywords and assuming a student understands the topic. This, in turn, lowered their accuracy in assessing students' grasp of the material.

The study suggests that LLMs could be improved by providing them with rubrics that reflect the deep, analytical thought humans use when grading. These rubrics should include specific rules on what the grader is looking for in a student's response. The LLM could then evaluate the answer based on the rules the human set.

"The train has left the station, but it has just left the station," said Zhai. "It means we still have a long way to go when it comes to using AI, and we still need to figure out which direction to go in."

Traditionally, LLMs are trained on both the students' answers and the human grader's scores. In this study, however, the LLM was instructed to generate its own rubric to evaluate student responses. The researchers found that the rubrics generated by LLMs had some similarities to those made by humans. LLMs generally understand what a question is asking of students, but they can't reason the way humans do. Instead, they rely mostly on shortcuts, such as what Zhai referred to as "over-inferring" - assuming a student understands something when a human teacher wouldn't. For example, an LLM will mark a student's response as correct if it includes certain keywords, but it can't evaluate the logic the student is using.

"Students could mention a temperature increase, and the large language model interprets that all students understand the particles are moving faster when temperatures rise," said Zhai. "But based upon the student writing, as a human, we're not able to infer whether the students know whether the particles will move faster or not."
LLMs are especially reliant on shortcuts when presented with examples of graded responses without explanations of why certain papers received the grades they did.

Despite the speed of LLMs, the researchers warn against replacing human graders completely. Human-made rubrics often contain a set of rules reflecting what the instructor expects of student responses. Without such rubrics, LLMs reach only a 33.5% accuracy rate; with access to human-made rubrics, accuracy jumps to just over 50%.

If the accuracy of LLMs can be improved further, though, educators may be open to using the technology to streamline their grading.

"Many teachers told me, 'I had to spend my weekend giving feedback, but by using automatic scoring, I do not have to do that. Now, I have more time to focus on more meaningful work instead of some labor-intensive work,'" said Zhai. "That's very encouraging for me."

The study was published in Technology, Knowledge and Learning and was co-authored by Xuansheng Wu, Padmaja Pravin Saraf, Gyeonggeon Lee, Eshan Latif and Ninghao Liu.
A new study from the University of Georgia explores the potential of AI in grading student work, highlighting both its speed and limitations in understanding complex responses.
A recent study from the University of Georgia, published in Technology, Knowledge and Learning, explores the potential of artificial intelligence (AI) to streamline the grading process for teachers. The research, led by Xiaoming Zhai, associate professor and director of the AI4STEM Education Center, investigates how large language models (LLMs) compare to human graders when assessing student work [1].

With many states adopting the Next Generation Science Standards, which emphasize argumentation, investigation, and data analysis, teachers face increasing challenges in grading complex student responses. "Asking kids to draw a model, to write an explanation, to argue with each other are very complex tasks," Zhai explains. "Teachers often don't have enough time to score all the students' responses, which means students will not be able to receive timely feedback" [2].

The study utilized Mixtral, an LLM, to grade middle school students' written responses to science questions. Unlike traditional AI grading studies, this research required the LLM to create its own grading rubric and apply it to student work [3].
While the AI demonstrated impressive speed in grading responses, it often relied on shortcuts that compromised accuracy:

- Spotting certain keywords and assuming the student understands the topic, rather than assessing the depth of their reasoning
- "Over-inferring" - crediting understanding based on surface clues, as when a mention of a temperature increase is taken as proof that the student knows particles move faster when temperatures rise
The researchers found that providing LLMs with human-made rubrics significantly improved their performance (see the sketch below):

- Without human-made rubrics, the LLM's scores agreed with human grading only about 33.5% of the time
- With access to human-made rubrics, that accuracy jumped to just over 50%
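For a sense of what such accuracy figures measure, here is a toy calculation of exact-match agreement between model scores and human scores. The score lists are fabricated for illustration; the study's actual data and metric may be more involved than simple exact-match agreement.

```python
# Toy illustration of the accuracy figures above: the share of responses
# where the model's score exactly matches the human grader's. The score
# lists are fabricated for illustration, not the study's data.
human = [2, 1, 0, 2, 1, 1, 0, 2, 2, 1]
llm_no_rubric = [2, 2, 1, 2, 2, 2, 1, 2, 1, 2]    # hypothetical scores
llm_with_rubric = [2, 1, 1, 2, 2, 2, 0, 2, 1, 2]  # hypothetical scores

def agreement(pred: list[int], gold: list[int]) -> float:
    """Fraction of items where the predicted score equals the human score."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

print(f"without rubric: {agreement(llm_no_rubric, human):.0%}")    # 30%
print(f"with rubric:    {agreement(llm_with_rubric, human):.0%}")  # 50%
```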
Zhai suggests that future improvements could come from providing LLMs with rubrics that reflect the deep, analytical thought processes used by human graders [1].

Despite the current limitations, many teachers express interest in using AI tools to speed up routine grading tasks. Zhai notes, "Many teachers told me, 'I had to spend my weekend giving feedback, but by using automatic scoring, I do not have to do that. Now, I have more time to focus on more meaningful work instead of some labor-intensive work'" [3].

However, the researchers caution against completely replacing human graders. They suggest that AI systems should serve as assistants, freeing teachers to concentrate on tasks that require human judgment, creativity, and connection [2].

While the study demonstrates the potential of AI in education, it also highlights the need for continued development. "The train has left the station, but it has just left the station," Zhai says. "It means we still have a long way to go when it comes to using AI, and we still need to figure out which direction to go in" [1].
As AI technologies evolve, the hope is that future models will become more adept at understanding not just keywords but the quality of student reasoning, potentially becoming powerful allies in supporting student learning and teacher efficiency.
Reference

[1] "AI may speed up the grading process for teachers"
[2] "Grading with AI: Faster than teachers, but not smarter," Earth.com
[3] "AI May Speed Up the Grading Process for Teachers," Newswise