Analysis of Final Exam Essay Answer Accuracy: The Role of Artificial Intelligence in Automatic Assessment

Rony Kriswibowo, Johan Suryo Prayogo, Rusina Widha Febriana, Agung Budi Setyawan, Yeeryzkhe Githasari Lieztyanto, Putri Ariatna Alia

Abstract


This research investigates the accuracy of artificial intelligence (AI) scoring of essay answers, the challenges of validity and fairness in automated scoring, potential biases in the algorithms used, and how these affect academic assessment results compared with human scoring. The method consists of several systematic steps. First, data were collected by posing relevant questions to AI platforms such as Copilot, Gemini, and Blackbox.AI. Next, the resulting answers were analyzed using machine learning algorithms and natural language processing (NLP) to ensure the quality and depth of the analysis. The answers were then tested to evaluate their accuracy and their relevance to the context of the question asked. Finally, the analysis results were visualized as graphs or tables. The research shows that AI has great potential in the automated assessment of essay answers. Copilot obtained the highest score at 63.4%, followed by Blackbox.AI at 56.9% and Gemini at 50.5%. However, the answer accuracy of every platform remains below 70%, meaning that the answers from the AI platforms are still far from expectations. Future research should retest the AI platforms using other methods to obtain accurate and consistent results.
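
The abstract does not state which measure was used to test answer accuracy. As a rough illustration only, the Python sketch below assumes that each platform answer is compared with a reference answer by bag-of-words cosine similarity and that the per-platform score is the mean similarity expressed as a percentage. The function names, the platform labels, and the sample texts are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the reference
    return dot / (norm_a * norm_b)


def platform_accuracy(reference_answers, platform_answers):
    """Mean similarity to the reference answers, as a percentage."""
    scores = [
        cosine_similarity(ref, ans)
        for ref, ans in zip(reference_answers, platform_answers)
    ]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# Hypothetical answer key and platform answers (assumed, not study data).
reference = ["A stack is a last in first out data structure."]
answers = {
    "platform_a": ["A stack is a last in first out data structure."],
    "platform_b": ["A queue serves items in arrival order."],
}

for name, given in answers.items():
    print(f"{name}: {platform_accuracy(reference, given):.1f}%")
```

Word-count cosine similarity rewards shared vocabulary rather than meaning, so a correct answer phrased differently from the reference would score low. A study of this kind would likely need a semantic measure or human rating alongside it.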



DOI: https://doi.org/10.29040/ijcis.v5i4.204


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License