A review of automatic item generation techniques leveraging large language models
Year 2025, Volume: 12 Issue: 2, 317 - 340, 01.06.2025
Bin Tan, Nour Armoush, Elisabetta Mazzullo, Okan Bulut, Mark Gierl
Abstract
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We conducted a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the LLMs most commonly used in the current AIG literature, their specific applications in the AIG process, and the characteristics of the generated items. We found that LLMs are flexible and effective in generating various types of items across different languages and subject domains. However, many studies overlooked the quality of the generated items, indicating the lack of a solid educational foundation. We therefore share two suggestions for strengthening the educational foundation for leveraging LLMs in AIG, and we advocate interdisciplinary collaboration to exploit the utility and potential of LLMs.
Kaynakça
- Ackerman, R., & Balyan, R. (2023). Automatic multilingual question generation for health data using LLMs. In International Conference on AI-generated Content (pp. 1-11). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-7587-7_1
- Agrawal, A., & Shukla, P. (2023). Context aware automatic subjective and objective question generation using Fast Text to text transfer learning. International Journal of Advanced Computer Science and Applications, 14(4), 456-463.
- Aigo, K., Tsunakawa, T., Nishida, M., & Nishimura, M. (2021). Question generation using knowledge graphs with the T5 language model and masked self-attention. In 2021 IEEE 10th Global Conference on Consumer Electronics (pp. 85 87). IEEE. https://doi.org/10.1109/GCCE53005.2021.9621874
- Akyön, F.Ç., Cavusoglu, A.D.E., Cengiz, C., Altinuç, S.O., & Temizel, A. (2022). Automated question generation and question answering from Turkish texts. Turkish Journal of Electrical Engineering and Computer Sciences, 30(5), 1931 1940. https://doi.org/10.55730/1300-0632.3914
- Alsubait, T., Parsia, B., & Sattler, U. (2016). Ontology-based multiple choice question generation. KI-Künstliche Intelligenz, 30, 183-188. https://doi.org/10.1007/s13218-015-0405-9
- Alves, C.B., Gierl, M.J., & Lai, H. (2010, April). Using automated item generation to promote test design and development [Paper presentation]. American Educational Research Association Annual Meeting, Denver, CO, United States.
- Arksey, H., & O'malley, L. (2005). Scoping studies: towards a methodological framework. International Journal of Social Research Methodology, 8(1), 19 32. https://doi.org/10.1080/1364557032000119616
- Attali, Y., Runge, A., LaFlair, G.T., Yancey, K., Goodwin, S., Park, Y., & Von Davier, A.A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, 903077. https://doi.org/10.3389/frai.2022.903077
- Berger, G., Rischewski, T., Chiruzzo, L., & Rosá, A. (2022). Generation of English question answer exercises from texts using transformers-based models. In 2022 IEEE Latin American Conference on Computational Intelligence (pp. 1-5). IEEE. https://doi.org/10.1109/LA-CCI54402.2022.9981171
- Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102
- Bulathwela, S., Muse, H., & Yilmaz, E. (2023). Scalable educational question generation with pre-trained language models. In International Conference on Artificial Intelligence in Education (pp. 327-339). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-36272-9_27
- Bulut, O., & Yildirim-Erbasli, S.N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9(Special Issue), 72-87. https://doi.org/10.21449/ijate.1124382
- Bulut, O., Gorgun, G., Yildirim‐Erbasli, S.N., Wongvorachan, T., Daniels, L.M., Gao, Y., ... & Shin, J. (2023). Standing on the shoulders of giants: Online formative assessments as the foundation for predictive learning analytics models. British Journal of Educational Technology, 54(1), 19-39. https://doi.org/10.1111/bjet.13276
- Ch, D.R., & Saha, S.K. (2018). Automatic multiple choice question generation from text: A survey. IEEE Transactions on Learning Technologies, 13(1), 14 25. https://doi.org/10.1109/TLT.2018.2889100
- Chiang, S.H., Wang, S.C., & Fan, Y.C. (2024). Cdgp: Automatic cloze distractor generation based on pre trained language model. arXiv preprint arXiv:2403.10326. https://doi.org/10.18653/v1/2022.findings-emnlp.429
- Chughtai, R., Azam, F., Anwar, M.W., But, W.H., & Farooq, M.U. (2022). A lecture centric automated distractor generation for post-graduate software engineering courses. In 2022 International Conference on Frontiers of Information Technology (FIT) (pp. 100-105). IEEE. https://doi.org/10.1109/FIT57066.2022.00028
- Chung, H.L., Chan, Y.H., & Fan, Y.C. (2020). A BERT-based distractor generation scheme with multi tasking and negative answer training strategies. arXiv preprint arXiv:2010.05384. https://arxiv.org/abs/2010.05384
- Dalby, D., & Swan, M. (2019). Using digital technology to enhance formative assessment in mathematics classrooms. British Journal of Educational Technology, 50(2), 832-845. https://doi.org/10.1111/bjet.12606
- Dembitzer, L., Zelikovitz, S., & Kettler, R.J. (2017). Designing computer-based assessments: Multidisciplinary findings and student perspectives. International Journal of Educational Technology, 4(3), 20 31. https://educationaltechnology.net/ijet/index.php/ijet/article/view/47
- Desai, T. (2021). Discourse parsing and its application to question generation [Unpublished dissertation]. The University of Texas at Dallas.
- Devlin, J., Chang, M.W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805
- Dijkstra, R., Genç, Z., Kayal, S., & Kamps, J. (2022). Reading comprehension quiz generation using generative pre-trained transformers. In S. Sosnovsky, P. Brusilovsky, & A. Lan (Eds.), Proceedings of the Fourth International Workshop on Intelligent Textbooks 2022 (pp. 4–7). CEUR-WS. http://ceur-ws.org/Vol-3192/
- Drori, I., Zhang, S., Shuttleworth, R., Tang, L., Lu, A., Ke, E., ... & Strang, G. (2022). A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences, 119(32), e2123433119. https://doi.org/10.1073/pnas.2123433119
- Falcão, F., Costa, P., & Pêgo, J.M. (2022). Feasibility assurance: a review of automatic item generation in medical assessment. Advances in Health Sciences Education, 27(2), 405-425. https://doi.org/10.1007/s10459-022-10092-z
- Femi, J.G., & Nayak, A.K. (2022). EQGTL: An Ensemble Model for Relevant Question Generation using Transfer Learning. In 2022 International Conference on Machine Learning, Computer Systems and Security (pp. 253-258). IEEE. https://doi.org/10.1109/MLCSS57186.2022.00054
- Fuadi, M., & Wibawa, A.D. (2022). Automatic question generation from indonesian texts using text-to-text transformers. In 2022 International Conference on Electrical and Information Technology (IEIT) (pp. 84-89). IEEE. https://doi.org/10.1109/IEIT56384.2022.9967858
- Fung, Y.C., Kwok, J.C.W., Lee, L.K., Chui, K.T., & U, L.H. (2020). Automatic question generation system for english reading comprehension. In Technology in Education. Innovations for Online Teaching and Learning: 5th International Conference, ICTE 2020, Macau, China, August 19-22, 2020, Revised Selected Papers 5 (pp. 136-146). Springer Singapore. https://doi.org/10.1007/978-981-33-4594-2_12
- Fung, Y.C., Lee, L.K., & Chui, K.T. (2023). An automatic question generator for Chinese comprehension. Inventions, 8(1), 31. https://doi.org/10.3390/inventions8010031
- Ghanem, B., Coleman, L.L., Dexter, J.R., von der Ohe, S.M., & Fyshe, A. (2022). Question generation for reading comprehension assessment by modeling how and what to ask. arXiv preprint arXiv:2204.02908. https://doi.org/10.48550/arXiv.2204.02908
- Gierl, M.J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12(3), 273 298. https://doi.org/10.1080/15305058.2011.635830
- Gierl, M.J., & Lai, H. (2015). Automatic item generation. In Handbook of test development (pp. 410-429). Routledge.
- Gierl, M.J., & Lai, H. (2016). A process for reviewing and evaluating generated test items. Educational Measurement: Issues and Practice, 35(4), 6 20. https://doi.org/10.1111/emip.12129
- Gierl, M.J., Lai, H., & Tanygin, V. (2021). Advanced methods in automatic item generation. Routledge.
- Godslove, J.F., & Nayak, A.K. (2023). Generative model for formulating relevant questions and answers using transfer learning. In AIP Conference Proceedings (Vol. 2819, No. 1). AIP Publishing. https://doi.org/10.1063/5.0136892
- Gopal, A. (2022). Automatic question generation for Hindi and Marathi. In 2022 International Conference on Advanced Learning Technologies (ICALT) (pp. 19-21). IEEE. https://doi.org/10.1109/ICALT55010.2022.00012
- Goyal, R., Kumar, P., & Singh, V.P. (2023). Automated question and answer generation from texts using text-to-text transformers. Arabian Journal for Science and Engineering, 1-15. https://doi.org/10.1007/s13369-023-07840-7
- Granić, A. (2022). Educational technology adoption: A systematic review. Education and Information Technologies, 27(7), 9725-9744. https://doi.org/10.1007/s10639-022-10951-7
- Grover, K., Kaur, K., Tiwari, K., Rupali, & Kumar, P. (2021). Deep learning based question generation using t5 transformer. In Advanced Computing: 10th International Conference, IACC 2020, Panaji, Goa, India, December 5–6, 2020, Revised Selected Papers, Part I 10 (pp. 243-255). Springer Singapore. https://doi.org/10.1007/978-981-16-0401-0_18
- Han, Z. (2022). Unsupervised multilingual distractor generation for fill-in-the-blank questions [Unpublished thesis]. Uppsala University.
- Jiao, Y., Shridhar, K., Cui, P., Zhou, W., & Sachan, M. (2023). Automatic educational question generation with difficulty level controls. In International Conference on Artificial Intelligence in Education (pp. 476-488). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-36272-9_39
- Kalpakchi, D., & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. arXiv preprint arXiv:2108.03973. https://doi.org/10.48550/arXiv.2108.03973
- Kasakowskij, R., Kasakowskij, T., & Seidel, N. (2022). Generation of multiple true false questions. 20. Fachtagung Bildungstechnologien. https://doi.org/10.18420/delfi2022-026
- Khandait, K., Bhura, S., & Asole, S.S. (2022). Automatic question generation through word vector synchronization using lamma. Indian Journal of Computer Science and Engineering, 13(4), 1083-1095. https://doi.org/10.21817/indjcse/2022/v13i4/221304046
- Kosh, A.E., Simpson, M.A., Bickel, L., Kellogg, M., & Sanford‐Moore, E. (2019). A cost–benefit analysis of automatic item generation. Educational Measurement: Issues and Practice, 38(1), 48-53. https://doi.org/10.1111/emip.12237
- Kumar, A., Kharadi, A., Singh, D., & Kumari, M. (2021). Automatic question-answer pair generation using deep learning. In 2021 Third International Conference on Inventive Research in Computing Applications (pp. 794 799). IEEE. https://doi.org/10.1109/ICIRCA51532.2021.9544654
- Kumar, N.S., Mali, R., Ratnam, A., Kurpad, V., & Magapu, H. (2022). Identification and addressal of knowledge gaps in students. In 2022 3rd International Conference for Emerging Technology (pp. 1-6). IEEE. https://doi.org/10.1109/INCET54531.2022.9824483
- Kumar, S., Chauhan, A., & Kumar C, P. (2022). Learning enhancement using question-answer generation for e-book using contrastive fine-tuned T5. In International Conference on Big Data Analytics (pp. 68 87). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-24094-2_5
- Kumari, V., Keshari, S., Sharma, Y., & Goel, L. (2022). Context-based question answering system with suggested questions. In 2022 12th International Conference on Cloud Computing, Data Science & Engineering (pp. 368 373). IEEE. https://doi.org/10.1109/Confluence52989.2022.9734207
- Kuo, C.Y., & Wu, H.K. (2013). Toward an integrated model for designing assessment systems: An analysis of the current status of computer-based assessments in science. Computers & Education, 68, 388-403. https://doi.org/10.1016/j.compedu.2013.06.002
- Kurdi, G., Leo, J., Parsia, B., Sattler, U., & Al-Emari, S. (2020). A systematic review of automatic question generation for educational purposes. International Journal of Artificial Intelligence in Education, 30, 121-204. https://doi.org/10.1007/s40593-019-00186-y
- Lai, H., Alves, C., & Gierl, M.J. (2009). Using automatic item generation to address item demands for CAT. In D.J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. www.psych.umn.edu/psylabs/CATCentral/
- Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942. https://doi.org/10.48550/arXiv.1909.11942
- Lee, H., Chung, H.Q., Zhang, Y., Abedi, J., & Warschauer, M. (2020). The effectiveness and features of formative assessment in US K-12 education: A systematic review. Applied Measurement in Education, 33(2), 124 140. https://doi.org/10.1080/08957347.2020.1732383
- Lim, Y.S. (2019). Students’ perception of formative assessment as an instructional tool in medical education. Medical Science Educator, 29(1), 255 263. https://doi.org/10.1007/s40670-018-00687-w
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. https://doi.org/10.48550/arXiv.1907.11692
- Lu, O.H.T., Huang, A.Y.Q., Tsai, D.C.L., & Yang, S.J.H. (2021). Expert-authored and machine-generated short- answer questions for assessing students’ learning performance. Educational Technology & Society, 24(3), 159–173. https://www.jstor.org/stable/27032863
- Maheen, F., Asif, M., Ahmad, H., Ahmad, S., Alturise, F., Asiry, O., & Ghadi, Y.Y. (2022). Automatic computer science domain multiple-choice questions generation based on informative sentences. PeerJ Computer Science, 8, e1010. https://doi.org/10.7717/peerj-cs.1010
- Malhar, A., Sawant, P., Chhadva, Y., & Kurhade, S. (2022). Deep learning-based Answering Questions using T5 and Structured Question Generation System. In 2022 6th International Conference on Intelligent Computing and Control Systems (pp. 1544-1549). IEEE. https://doi.org/10.1109/ICICCS53718.2022.9788264
- Mathur, A., & Suchithra, M. (2022). Application of abstractive summarization in multiple choice question generation. In 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (pp. 409 413). IEEE. https://doi.org/10.1109/CISES54857.2022.9844396
- Matsumori, S., Okuoka, K., Shibata, R., Inoue, M., Fukuchi, Y., & Imai, M. (2023). Mask and cloze: Automatic open cloze question generation using a masked language model. IEEE Access, 11, 9835-9850. https://doi.org/10.1109/ACCESS.2023.3239005
- Maurya, K.K., & Desarkar, M.S. (2020). Learning to distract: A hierarchical multi-decoder network for automated generation of long distractors for multiple-choice questions for reading comprehension. In Proceedings of the 29th ACM international conference on information & knowledge management (pp. 1115 1124). https://doi.org/10.1145/3340531.3411997
- Mazzullo, E., Bulut, O., Wongvorachan, T., & Tan, B. (2023). Learning Analytics in the Era of Large Language Models. Analytics, 2(4), 877 898. https://doi.org/10.3390/analytics2040046
- Min, B., Ross, H., Sulem, E., Veyseh, A.P.B., Nguyen, T.H., Sainz, O., ... & Roth, D. (2023). Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys, 56(2), 1-40. https://doi.org/10.1145/3605943
- Muse, H., Bulathwela, S., & Yilmaz, E. (2022). Pre-training with scientific text improves educational question generation. arXiv preprint arXiv:2212.03869. https://doi.org/10.48550/arXiv.2212.03869
- Newton, P.E. (2007). Clarifying the purposes of educational assessment. Assessment in education, 14(2), 149-170. https://doi.org/10.1080/09695940701478321
- Nguyen, H.A., Bhat, S., Moore, S., Bier, N., & Stamper, J. (2022). Towards generalized methods for automatic question generation in educational domains. In European conference on technology enhanced learning (pp. 272-284). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-16290-9_20
- Nittala, S., Agarwal, P., Vishnu, R., & Shanbhag, S. (2022). Speaker Diarization and BERT-Based Model for Question Set Generation from Video Lectures. In Information and Communication Technology for Competitive Strategies ICT: Applications and Social Interfaces (pp. 441 452). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-19-0095-2_42
- Offerijns, J., Verberne, S., & Verhoef, T. (2020). Better distractions: Transformer-based distractor generation and multiple-choice question filtering. arXiv preprint arXiv:2010.09598. https://doi.org/10.48550/arXiv.2010.09598
- Pochiraju, D., Chakilam, A., Betham, P., Chimulla, P., & Rao, S.G. (2023). Extractive summarization and multiple-choice question generation using XLNet. In 2023 7th International Conference on Intelligent Computing and Control Systems (pp. 1001-1005). IEEE. https://doi.org/10.1109/ICICCS56967.2023.10142220
- Pugh, D., De Champlain, A., Gierl, M., Lai, H., & Touchie, C. (2020). Can automated item generation be used to develop high quality MCQs that assess application of knowledge?. Research and Practice in Technology Enhanced Learning, 15, 1 13. https://doi.org/10.1186/s41039-020-00134-8
- Qiu, X., Xue, H., Liang, L., Xie, Z., Liao, S., & Shi, G. (2021). Automatic generation of multiple-choice cloze-test questions for lao language learning. In 2021 International Conference on Asian Language Processing (pp. 125 130). IEEE. https://doi.org/10.1109/IALP54817.2021.9675153
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67. https://doi.org/10.48550/arXiv.1910.10683
- Raina, V., & Gales, M. (2022). Multiple-choice question generation: Towards an automated assessment framework. arXiv preprint arXiv:2209.11830. https://doi.org/10.48550/arXiv.2209.11830
- Ratcheva, M.G., Navale, R., & Desai, B.C. (2022). An online MCQ sub-system for CrsMgr. In Proceedings of the 26th International Database Engineered Applications Symposium (pp. 128-133). https://doi.org/10.1145/3548785.3548789
- Rodriguez-Torrealba, R., Garcia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-xhoice questions using text-to-text transfer transformer models. Expert Systems with Applications, 208, 118258. https://doi.org/10.1016/j.eswa.2022.118258
- Rudner, L.M. (2009). Implementing the graduate management admission test computerized adaptive test. In Elements of adaptive testing (pp. 151-165). Springer New York. https://doi.org/10.1007/978-0-387-85461-8_8
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. https://doi.org/10.48550/arXiv.1910.01108
- Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43(1), 5 18. https://doi.org/10.1111/emip.12590
- Shan, J., Nishihara, Y., Maeda, A., & Yamanishi, R. (2022). Question generation for reading comprehension test complying with types of question. Journal of Information Science & Engineering, 38(3). https://doi.org/10.6688/JISE.202205_38(3).0005
- Shan, J., Nishihara, Y., Yamanishi, R., & Maeda, A. (2019). Question generation for reading comprehension of language learning test: A method using Seq2Seq approach with transformer model. In 2019 International Conference on Technologies and Applications of Artificial Intelligence (pp. 1-6). IEEE. https://doi.org/10.1109/TAAI48200.2019.8959903
- Shridhar, K., Macina, J., El-Assady, M., Sinha, T., Kapur, M., & Sachan, M. (2022). Automatic generation of socratic subquestions for teaching math word problems. arXiv preprint arXiv:2211.12835. https://doi.org/10.48550/arXiv.2211.12835
- Singh, J., McCann, B., Socher, R., & Xiong, C. (2019). BERT is not an interlingua and the bias of tokenization. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019) (pp. 47-55). https://doi.org/10.18653/v1/D19-6106
- Spector, J.M., Ifenthaler, D., Samspon, D., Yang, L., Mukama, E., Warusavitarana, A., … Gibson, D.C. (2016). Technology enhanced formative assessment for 21st century learning. Educational Technology & Society, 19(3), 58 71. https://www.jstor.org/stable/jeductechsoci.19.3.58
- Srivastava, M., & Goodman, N. (2021). Question generation for adaptive education. arXiv preprint arXiv:2106.04262. https://doi.org/10.48550/arXiv.2106.04262
- Steuer, T., Filighera, A., & Rensing, C. (2020). Exploring artificial jabbering for automatic text comprehension question generation. In Addressing Global Challenges and Quality Education: 15th European Conference on Technology Enhanced Learning, EC-TEL 2020, Heidelberg, Germany, September 14–18, 2020, Proceedings 15 (pp. 1-14). Springer International Publishing. https://doi.org/10.1007/978-3-030-57717-9_1
- Tsai, D.C., Chang, W., & Yang, S. (2021). Short answer questions generation by Fine-Tuning BERT and GPT-2. In Proceedings of the 29th International Conference on Computers in Education Conference (Vol. 64). https://icce2021.apsce.net/wp-content/uploads/2021/12/ICCE2021-Vol.II-PP.-508-514.pdf
- von Davier, M. (2019). Training Optimus prime, MD: Generating medical certification items by fine-tuning OpenAI's gpt2 transformer model. arXiv preprint arXiv:1908.08594. https://doi.org/10.48550/arXiv.1908.08594
- Vu, N., & Van Nguyen, K. (2022). Enhancing Vietnamese question generation with reinforcement learning. In Asian Conference on Intelligent Information and Database Systems (pp. 559 570). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-21743-2_45
- Wang, B., Yao, T., Chen, W., Xu, J., & Wang, X. (2021). Multi-lingual question generation with language agnostic language model. In Findings of the Association for Computational Linguistics: ACL IJCNLP 2021 (pp. 2262 2272). https://aclanthology.org/2021.findings-acl.199.pdf
- Wang, H.C., Maslim, M., & Kan, C.H. (2023). A question–answer generation system for an asynchronous distance learning platform. Education and Information Technologies, 28(9), 12059-12088. https://doi.org/10.1007/s10639-023-11675-y
- Wang, Z., Valdez, J., Basu Mallick, D., & Baraniuk, R.G. (2022). Towards human-like educational question generation with large language models. In International conference on artificial intelligence in education (pp. 153-166). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-11644-5_13
- Wylie, E.C., & Lyon, C.J. (2015). The fidelity of formative assessment implementation: Issues of breadth and quality. Assessment in Education: Principles, Policy & Practice, 22(1), 140-160. https://doi.org/10.1080/0969594X.2014.990416
- Xie, J., Peng, N., Cai, Y., Wang, T., & Huang, Q. (2021). Diverse distractor generation for constructing high-quality multiple choice questions. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 280 291. https://doi.org/10.1109/TASLP.2021.3138706
- Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., & Le, Q.V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32. https://dl.acm.org/doi/10.5555/3454287.3454804
- Yen, Y.-C., Ho, R.-G., Liao, W.-W., & Chen, L.-J. (2012). Reducing the impact of inappropriate items on reviewable computerized adaptive testing. Educational Technology & Society, 15(2), 231–243. https://www.jstor.org/stable/jeductechsoci.15.2.231
- Zhang, C. (2023). Automatic Generation of Multiple-Choice Questions. arXiv preprint arXiv: 2303.14576v1. https://doi.org/10.48550/arXiv.2303.14576
- Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J.R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223. https://doi.org/10.48550/arXiv.2303.18223
- Zhao, Z., Hou, Y., Wang, D., Yu, M., Liu, C., & Ma, X. (2022). Educational question generation of children storybooks via question type distribution learning and event-centric summarization. arXiv preprint arXiv:2203.14187. https://doi.org/10.48550/arXiv.2203.14187
Kaynakça
- Ackerman, R., & Balyan, R. (2023). Automatic multilingual question generation for health data using LLMs. In International Conference on AI-generated Content (pp. 1-11). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-7587-7_1
- Agrawal, A., & Shukla, P. (2023). Context aware automatic subjective and objective question generation using Fast Text to text transfer learning. International Journal of Advanced Computer Science and Applications, 14(4), 456-463.
- Aigo, K., Tsunakawa, T., Nishida, M., & Nishimura, M. (2021). Question generation using knowledge graphs with the T5 language model and masked self-attention. In 2021 IEEE 10th Global Conference on Consumer Electronics (pp. 85 87). IEEE. https://doi.org/10.1109/GCCE53005.2021.9621874
- Akyön, F.Ç., Cavusoglu, A.D.E., Cengiz, C., Altinuç, S.O., & Temizel, A. (2022). Automated question generation and question answering from Turkish texts. Turkish Journal of Electrical Engineering and Computer Sciences, 30(5), 1931 1940. https://doi.org/10.55730/1300-0632.3914
- Alsubait, T., Parsia, B., & Sattler, U. (2016). Ontology-based multiple choice question generation. KI-Künstliche Intelligenz, 30, 183-188. https://doi.org/10.1007/s13218-015-0405-9
- Alves, C.B., Gierl, M.J., & Lai, H. (2010, April). Using automated item generation to promote test design and development [Paper presentation]. American Educational Research Association Annual Meeting, Denver, CO, United States.
- Arksey, H., & O'malley, L. (2005). Scoping studies: towards a methodological framework. International Journal of Social Research Methodology, 8(1), 19 32. https://doi.org/10.1080/1364557032000119616
- Attali, Y., Runge, A., LaFlair, G.T., Yancey, K., Goodwin, S., Park, Y., & Von Davier, A.A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5, 903077. https://doi.org/10.3389/frai.2022.903077
- Berger, G., Rischewski, T., Chiruzzo, L., & Rosá, A. (2022). Generation of English question answer exercises from texts using transformers-based models. In 2022 IEEE Latin American Conference on Computational Intelligence (pp. 1-5). IEEE. https://doi.org/10.1109/LA-CCI54402.2022.9981171
- Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102
- Bulathwela, S., Muse, H., & Yilmaz, E. (2023). Scalable educational question generation with pre-trained language models. In International Conference on Artificial Intelligence in Education (pp. 327-339). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-36272-9_27
- Bulut, O., & Yildirim-Erbasli, S.N. (2022). Automatic story and item generation for reading comprehension assessments with transformers. International Journal of Assessment Tools in Education, 9(Special Issue), 72-87. https://doi.org/10.21449/ijate.1124382
- Bulut, O., Gorgun, G., Yildirim‐Erbasli, S.N., Wongvorachan, T., Daniels, L.M., Gao, Y., ... & Shin, J. (2023). Standing on the shoulders of giants: Online formative assessments as the foundation for predictive learning analytics models. British Journal of Educational Technology, 54(1), 19-39. https://doi.org/10.1111/bjet.13276
- Ch, D.R., & Saha, S.K. (2018). Automatic multiple choice question generation from text: A survey. IEEE Transactions on Learning Technologies, 13(1), 14 25. https://doi.org/10.1109/TLT.2018.2889100
- Chiang, S.H., Wang, S.C., & Fan, Y.C. (2024). Cdgp: Automatic cloze distractor generation based on pre trained language model. arXiv preprint arXiv:2403.10326. https://doi.org/10.18653/v1/2022.findings-emnlp.429
- Chughtai, R., Azam, F., Anwar, M.W., But, W.H., & Farooq, M.U. (2022). A lecture centric automated distractor generation for post-graduate software engineering courses. In 2022 International Conference on Frontiers of Information Technology (FIT) (pp. 100-105). IEEE. https://doi.org/10.1109/FIT57066.2022.00028
- Chung, H.L., Chan, Y.H., & Fan, Y.C. (2020). A BERT-based distractor generation scheme with multi tasking and negative answer training strategies. arXiv preprint arXiv:2010.05384. https://arxiv.org/abs/2010.05384
- Dalby, D., & Swan, M. (2019). Using digital technology to enhance formative assessment in mathematics classrooms. British Journal of Educational Technology, 50(2), 832-845. https://doi.org/10.1111/bjet.12606
- Dembitzer, L., Zelikovitz, S., & Kettler, R.J. (2017). Designing computer-based assessments: Multidisciplinary findings and student perspectives. International Journal of Educational Technology, 4(3), 20 31. https://educationaltechnology.net/ijet/index.php/ijet/article/view/47
- Desai, T. (2021). Discourse parsing and its application to question generation [Unpublished dissertation]. The University of Texas at Dallas.
- Devlin, J., Chang, M.W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805
- Dijkstra, R., Genç, Z., Kayal, S., & Kamps, J. (2022). Reading comprehension quiz generation using generative pre-trained transformers. In S. Sosnovsky, P. Brusilovsky, & A. Lan (Eds.), Proceedings of the Fourth International Workshop on Intelligent Textbooks 2022 (pp. 4–7). CEUR-WS. http://ceur-ws.org/Vol-3192/
- Drori, I., Zhang, S., Shuttleworth, R., Tang, L., Lu, A., Ke, E., ... & Strang, G. (2022). A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences, 119(32), e2123433119. https://doi.org/10.1073/pnas.2123433119
- Falcão, F., Costa, P., & Pêgo, J.M. (2022). Feasibility assurance: a review of automatic item generation in medical assessment. Advances in Health Sciences Education, 27(2), 405-425. https://doi.org/10.1007/s10459-022-10092-z
- Femi, J.G., & Nayak, A.K. (2022). EQGTL: An Ensemble Model for Relevant Question Generation using Transfer Learning. In 2022 International Conference on Machine Learning, Computer Systems and Security (pp. 253-258). IEEE. https://doi.org/10.1109/MLCSS57186.2022.00054
- Fuadi, M., & Wibawa, A.D. (2022). Automatic question generation from indonesian texts using text-to-text transformers. In 2022 International Conference on Electrical and Information Technology (IEIT) (pp. 84-89). IEEE. https://doi.org/10.1109/IEIT56384.2022.9967858
- Fung, Y.C., Kwok, J.C.W., Lee, L.K., Chui, K.T., & U, L.H. (2020). Automatic question generation system for english reading comprehension. In Technology in Education. Innovations for Online Teaching and Learning: 5th International Conference, ICTE 2020, Macau, China, August 19-22, 2020, Revised Selected Papers 5 (pp. 136-146). Springer Singapore. https://doi.org/10.1007/978-981-33-4594-2_12
- Fung, Y.C., Lee, L.K., & Chui, K.T. (2023). An automatic question generator for Chinese comprehension. Inventions, 8(1), 31. https://doi.org/10.3390/inventions8010031
- Ghanem, B., Coleman, L.L., Dexter, J.R., von der Ohe, S.M., & Fyshe, A. (2022). Question generation for reading comprehension assessment by modeling how and what to ask. arXiv preprint arXiv:2204.02908. https://doi.org/10.48550/arXiv.2204.02908
- Gierl, M.J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12(3), 273-298. https://doi.org/10.1080/15305058.2011.635830
- Gierl, M.J., & Lai, H. (2015). Automatic item generation. In Handbook of test development (pp. 410-429). Routledge.
- Gierl, M.J., & Lai, H. (2016). A process for reviewing and evaluating generated test items. Educational Measurement: Issues and Practice, 35(4), 6-20. https://doi.org/10.1111/emip.12129
- Gierl, M.J., Lai, H., & Tanygin, V. (2021). Advanced methods in automatic item generation. Routledge.
- Godslove, J.F., & Nayak, A.K. (2023). Generative model for formulating relevant questions and answers using transfer learning. In AIP Conference Proceedings (Vol. 2819, No. 1). AIP Publishing. https://doi.org/10.1063/5.0136892
- Gopal, A. (2022). Automatic question generation for Hindi and Marathi. In 2022 International Conference on Advanced Learning Technologies (ICALT) (pp. 19-21). IEEE. https://doi.org/10.1109/ICALT55010.2022.00012
- Goyal, R., Kumar, P., & Singh, V.P. (2023). Automated question and answer generation from texts using text-to-text transformers. Arabian Journal for Science and Engineering, 1-15. https://doi.org/10.1007/s13369-023-07840-7
- Granić, A. (2022). Educational technology adoption: A systematic review. Education and Information Technologies, 27(7), 9725-9744. https://doi.org/10.1007/s10639-022-10951-7
- Grover, K., Kaur, K., Tiwari, K., Rupali, & Kumar, P. (2021). Deep learning based question generation using t5 transformer. In Advanced Computing: 10th International Conference, IACC 2020, Panaji, Goa, India, December 5–6, 2020, Revised Selected Papers, Part I 10 (pp. 243-255). Springer Singapore. https://doi.org/10.1007/978-981-16-0401-0_18
- Han, Z. (2022). Unsupervised multilingual distractor generation for fill-in-the-blank questions [Unpublished thesis]. Uppsala University.
- Jiao, Y., Shridhar, K., Cui, P., Zhou, W., & Sachan, M. (2023). Automatic educational question generation with difficulty level controls. In International Conference on Artificial Intelligence in Education (pp. 476-488). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-36272-9_39
- Kalpakchi, D., & Boye, J. (2021). BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset. arXiv preprint arXiv:2108.03973. https://doi.org/10.48550/arXiv.2108.03973
- Kasakowskij, R., Kasakowskij, T., & Seidel, N. (2022). Generation of multiple true false questions. 20. Fachtagung Bildungstechnologien. https://doi.org/10.18420/delfi2022-026
- Khandait, K., Bhura, S., & Asole, S.S. (2022). Automatic question generation through word vector synchronization using lamma. Indian Journal of Computer Science and Engineering, 13(4), 1083-1095. https://doi.org/10.21817/indjcse/2022/v13i4/221304046
- Kosh, A.E., Simpson, M.A., Bickel, L., Kellogg, M., & Sanford‐Moore, E. (2019). A cost–benefit analysis of automatic item generation. Educational Measurement: Issues and Practice, 38(1), 48-53. https://doi.org/10.1111/emip.12237
- Kumar, A., Kharadi, A., Singh, D., & Kumari, M. (2021). Automatic question-answer pair generation using deep learning. In 2021 Third International Conference on Inventive Research in Computing Applications (pp. 794-799). IEEE. https://doi.org/10.1109/ICIRCA51532.2021.9544654
- Kumar, N.S., Mali, R., Ratnam, A., Kurpad, V., & Magapu, H. (2022). Identification and addressal of knowledge gaps in students. In 2022 3rd International Conference for Emerging Technology (pp. 1-6). IEEE. https://doi.org/10.1109/INCET54531.2022.9824483
- Kumar, S., Chauhan, A., & Kumar C, P. (2022). Learning enhancement using question-answer generation for e-book using contrastive fine-tuned T5. In International Conference on Big Data Analytics (pp. 68-87). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-24094-2_5
- Kumari, V., Keshari, S., Sharma, Y., & Goel, L. (2022). Context-based question answering system with suggested questions. In 2022 12th International Conference on Cloud Computing, Data Science & Engineering (pp. 368-373). IEEE. https://doi.org/10.1109/Confluence52989.2022.9734207
- Kuo, C.Y., & Wu, H.K. (2013). Toward an integrated model for designing assessment systems: An analysis of the current status of computer-based assessments in science. Computers & Education, 68, 388-403. https://doi.org/10.1016/j.compedu.2013.06.002
- Kurdi, G., Leo, J., Parsia, B., Sattler, U., & Al-Emari, S. (2020). A systematic review of automatic question generation for educational purposes. International Journal of Artificial Intelligence in Education, 30, 121-204. https://doi.org/10.1007/s40593-019-00186-y
- Lai, H., Alves, C., & Gierl, M.J. (2009). Using automatic item generation to address item demands for CAT. In D.J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. www.psych.umn.edu/psylabs/CATCentral/
- Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942. https://doi.org/10.48550/arXiv.1909.11942
- Lee, H., Chung, H.Q., Zhang, Y., Abedi, J., & Warschauer, M. (2020). The effectiveness and features of formative assessment in US K-12 education: A systematic review. Applied Measurement in Education, 33(2), 124-140. https://doi.org/10.1080/08957347.2020.1732383
- Lim, Y.S. (2019). Students’ perception of formative assessment as an instructional tool in medical education. Medical Science Educator, 29(1), 255-263. https://doi.org/10.1007/s40670-018-00687-w
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692. https://doi.org/10.48550/arXiv.1907.11692
- Lu, O.H.T., Huang, A.Y.Q., Tsai, D.C.L., & Yang, S.J.H. (2021). Expert-authored and machine-generated short-answer questions for assessing students’ learning performance. Educational Technology & Society, 24(3), 159–173. https://www.jstor.org/stable/27032863
- Maheen, F., Asif, M., Ahmad, H., Ahmad, S., Alturise, F., Asiry, O., & Ghadi, Y.Y. (2022). Automatic computer science domain multiple-choice questions generation based on informative sentences. PeerJ Computer Science, 8, e1010. https://doi.org/10.7717/peerj-cs.1010
- Malhar, A., Sawant, P., Chhadva, Y., & Kurhade, S. (2022). Deep learning-based Answering Questions using T5 and Structured Question Generation System. In 2022 6th International Conference on Intelligent Computing and Control Systems (pp. 1544-1549). IEEE. https://doi.org/10.1109/ICICCS53718.2022.9788264
- Mathur, A., & Suchithra, M. (2022). Application of abstractive summarization in multiple choice question generation. In 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (pp. 409-413). IEEE. https://doi.org/10.1109/CISES54857.2022.9844396
- Matsumori, S., Okuoka, K., Shibata, R., Inoue, M., Fukuchi, Y., & Imai, M. (2023). Mask and cloze: Automatic open cloze question generation using a masked language model. IEEE Access, 11, 9835-9850. https://doi.org/10.1109/ACCESS.2023.3239005
- Maurya, K.K., & Desarkar, M.S. (2020). Learning to distract: A hierarchical multi-decoder network for automated generation of long distractors for multiple-choice questions for reading comprehension. In Proceedings of the 29th ACM international conference on information & knowledge management (pp. 1115-1124). https://doi.org/10.1145/3340531.3411997
- Mazzullo, E., Bulut, O., Wongvorachan, T., & Tan, B. (2023). Learning Analytics in the Era of Large Language Models. Analytics, 2(4), 877-898. https://doi.org/10.3390/analytics2040046
- Min, B., Ross, H., Sulem, E., Veyseh, A.P.B., Nguyen, T.H., Sainz, O., ... & Roth, D. (2023). Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys, 56(2), 1-40. https://doi.org/10.1145/3605943
- Muse, H., Bulathwela, S., & Yilmaz, E. (2022). Pre-training with scientific text improves educational question generation. arXiv preprint arXiv:2212.03869. https://doi.org/10.48550/arXiv.2212.03869
- Newton, P.E. (2007). Clarifying the purposes of educational assessment. Assessment in Education, 14(2), 149-170. https://doi.org/10.1080/09695940701478321
- Nguyen, H.A., Bhat, S., Moore, S., Bier, N., & Stamper, J. (2022). Towards generalized methods for automatic question generation in educational domains. In European conference on technology enhanced learning (pp. 272-284). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-16290-9_20
- Nittala, S., Agarwal, P., Vishnu, R., & Shanbhag, S. (2022). Speaker Diarization and BERT-Based Model for Question Set Generation from Video Lectures. In Information and Communication Technology for Competitive Strategies ICT: Applications and Social Interfaces (pp. 441-452). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-19-0095-2_42
- Offerijns, J., Verberne, S., & Verhoef, T. (2020). Better distractions: Transformer-based distractor generation and multiple-choice question filtering. arXiv preprint arXiv:2010.09598. https://doi.org/10.48550/arXiv.2010.09598
- Pochiraju, D., Chakilam, A., Betham, P., Chimulla, P., & Rao, S.G. (2023). Extractive summarization and multiple-choice question generation using XLNet. In 2023 7th International Conference on Intelligent Computing and Control Systems (pp. 1001-1005). IEEE. https://doi.org/10.1109/ICICCS56967.2023.10142220
- Pugh, D., De Champlain, A., Gierl, M., Lai, H., & Touchie, C. (2020). Can automated item generation be used to develop high quality MCQs that assess application of knowledge? Research and Practice in Technology Enhanced Learning, 15, 1-13. https://doi.org/10.1186/s41039-020-00134-8
- Qiu, X., Xue, H., Liang, L., Xie, Z., Liao, S., & Shi, G. (2021). Automatic generation of multiple-choice cloze-test questions for lao language learning. In 2021 International Conference on Asian Language Processing (pp. 125-130). IEEE. https://doi.org/10.1109/IALP54817.2021.9675153
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67. https://doi.org/10.48550/arXiv.1910.10683
- Raina, V., & Gales, M. (2022). Multiple-choice question generation: Towards an automated assessment framework. arXiv preprint arXiv:2209.11830. https://doi.org/10.48550/arXiv.2209.11830
- Ratcheva, M.G., Navale, R., & Desai, B.C. (2022). An online MCQ sub-system for CrsMgr. In Proceedings of the 26th International Database Engineered Applications Symposium (pp. 128-133). https://doi.org/10.1145/3548785.3548789
- Rodriguez-Torrealba, R., Garcia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-choice questions using text-to-text transfer transformer models. Expert Systems with Applications, 208, 118258. https://doi.org/10.1016/j.eswa.2022.118258
- Rudner, L.M. (2009). Implementing the graduate management admission test computerized adaptive test. In Elements of adaptive testing (pp. 151-165). Springer New York. https://doi.org/10.1007/978-0-387-85461-8_8
- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. https://doi.org/10.48550/arXiv.1910.01108
- Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43(1), 5-18. https://doi.org/10.1111/emip.12590
- Shan, J., Nishihara, Y., Maeda, A., & Yamanishi, R. (2022). Question generation for reading comprehension test complying with types of question. Journal of Information Science & Engineering, 38(3). https://doi.org/10.6688/JISE.202205_38(3).0005
- Shan, J., Nishihara, Y., Yamanishi, R., & Maeda, A. (2019). Question generation for reading comprehension of language learning test: A method using Seq2Seq approach with transformer model. In 2019 International Conference on Technologies and Applications of Artificial Intelligence (pp. 1-6). IEEE. https://doi.org/10.1109/TAAI48200.2019.8959903
- Shridhar, K., Macina, J., El-Assady, M., Sinha, T., Kapur, M., & Sachan, M. (2022). Automatic generation of socratic subquestions for teaching math word problems. arXiv preprint arXiv:2211.12835. https://doi.org/10.48550/arXiv.2211.12835
- Singh, J., McCann, B., Socher, R., & Xiong, C. (2019). BERT is not an interlingua and the bias of tokenization. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019) (pp. 47-55). https://doi.org/10.18653/v1/D19-6106
- Spector, J.M., Ifenthaler, D., Sampson, D., Yang, L., Mukama, E., Warusavitarana, A., … Gibson, D.C. (2016). Technology enhanced formative assessment for 21st century learning. Educational Technology & Society, 19(3), 58-71. https://www.jstor.org/stable/jeductechsoci.19.3.58
- Srivastava, M., & Goodman, N. (2021). Question generation for adaptive education. arXiv preprint arXiv:2106.04262. https://doi.org/10.48550/arXiv.2106.04262
- Steuer, T., Filighera, A., & Rensing, C. (2020). Exploring artificial jabbering for automatic text comprehension question generation. In Addressing Global Challenges and Quality Education: 15th European Conference on Technology Enhanced Learning, EC-TEL 2020, Heidelberg, Germany, September 14–18, 2020, Proceedings 15 (pp. 1-14). Springer International Publishing. https://doi.org/10.1007/978-3-030-57717-9_1
- Tsai, D.C., Chang, W., & Yang, S. (2021). Short answer questions generation by Fine-Tuning BERT and GPT-2. In Proceedings of the 29th International Conference on Computers in Education Conference (Vol. 64). https://icce2021.apsce.net/wp-content/uploads/2021/12/ICCE2021-Vol.II-PP.-508-514.pdf
- von Davier, M. (2019). Training Optimus prime, MD: Generating medical certification items by fine-tuning OpenAI's gpt2 transformer model. arXiv preprint arXiv:1908.08594. https://doi.org/10.48550/arXiv.1908.08594
- Vu, N., & Van Nguyen, K. (2022). Enhancing Vietnamese question generation with reinforcement learning. In Asian Conference on Intelligent Information and Database Systems (pp. 559-570). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-21743-2_45
- Wang, B., Yao, T., Chen, W., Xu, J., & Wang, X. (2021). Multi-lingual question generation with language agnostic language model. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 2262-2272). https://aclanthology.org/2021.findings-acl.199.pdf
- Wang, H.C., Maslim, M., & Kan, C.H. (2023). A question–answer generation system for an asynchronous distance learning platform. Education and Information Technologies, 28(9), 12059-12088. https://doi.org/10.1007/s10639-023-11675-y
- Wang, Z., Valdez, J., Basu Mallick, D., & Baraniuk, R.G. (2022). Towards human-like educational question generation with large language models. In International conference on artificial intelligence in education (pp. 153-166). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-11644-5_13
- Wylie, E.C., & Lyon, C.J. (2015). The fidelity of formative assessment implementation: Issues of breadth and quality. Assessment in Education: Principles, Policy & Practice, 22(1), 140-160. https://doi.org/10.1080/0969594X.2014.990416
- Xie, J., Peng, N., Cai, Y., Wang, T., & Huang, Q. (2021). Diverse distractor generation for constructing high-quality multiple choice questions. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 280-291. https://doi.org/10.1109/TASLP.2021.3138706
- Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., & Le, Q.V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32. https://dl.acm.org/doi/10.5555/3454287.3454804
- Yen, Y.-C., Ho, R.-G., Liao, W.-W., & Chen, L.-J. (2012). Reducing the impact of inappropriate items on reviewable computerized adaptive testing. Educational Technology & Society, 15(2), 231–243. https://www.jstor.org/stable/jeductechsoci.15.2.231
- Zhang, C. (2023). Automatic Generation of Multiple-Choice Questions. arXiv preprint arXiv:2303.14576v1. https://doi.org/10.48550/arXiv.2303.14576
- Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J.R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223. https://doi.org/10.48550/arXiv.2303.18223
- Zhao, Z., Hou, Y., Wang, D., Yu, M., Liu, C., & Ma, X. (2022). Educational question generation of children storybooks via question type distribution learning and event-centric summarization. arXiv preprint arXiv:2203.14187. https://doi.org/10.48550/arXiv.2203.14187