20th Workshop on Innovative Use of NLP for Building Educational Applications: Schedule

Time Zone
Europe/Vienna: CEST (Central European Summer Time), UTC+2
Location
In-person: Room 1.85–86
Virtual: Underline.io (TBD)

Thursday, July 31, 2025

Time Description
09:00 - 10:30 Tutorial Session A
10:30 - 11:00 Coffee Break
11:00 - 12:30 Tutorial Session B
12:30 - 14:00 Lunch Break / Birds of a Feather
14:00 - 15:30 Oral Session A
14:00 - 14:15 A Bayesian Approach to Inferring Prerequisite Structures and Topic Difficulty in Language Learning (Anh-Duc Vu, Jue Hou, Anisia Katinskaia, Ching-Fan Sheu, Roman Yangarber)
ORAL MAIN
14:15 - 14:30 Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection (Chatrine Qwaider, Bashar Alhafni, Kirill Chirkunov, Nizar Habash, Ted Briscoe)
ORAL MAIN
14:30 - 14:45 Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring (Mina Almasi, Ross Kristensen-McLachlan)
ORAL MAIN
14:45 - 15:00 You Shall Know a Word’s Difficulty by the Family It Keeps: Word Family Features in Personalised Word Difficulty Classifiers for L2 Spanish (Jasper Degraeuwe)
ORAL MAIN
15:00 - 15:15 Assessing Critical Thinking Components in Romanian Secondary School Textbooks: A Data Mining Approach to the ROTEX Corpus (Madalina Chitez, Liviu Dinu, Marius Micluta-Campeanu, Ana-Maria Bucur, Roxana Rogobete)
ORAL MAIN
15:15 - 15:30 Unsupervised Automatic Short Answer Grading and Essay Scoring: A Weakly Supervised Explainable Approach (Felipe Urrutia, Cristian Buc, Roberto Araya, Valentin Barriere)
ORAL MAIN
15:30 - 16:00 Coffee Break
16:00 - 17:30 Poster Session A
18 A Survey on Automated Distractor Evaluation in Multiple-Choice Tasks (Luca Benedetto, Shiva Taslimipoor, Paula Buttery)
POSTER MAIN
43 Increasing the Generalizability of Similarity-Based Essay Scoring Through Cross-Prompt Training (Marie Bexte, Yuning Ding, Andrea Horbach)
POSTER MAIN
34 Automatic concept extraction for learning domain modeling: A weakly supervised approach using contextualized word embeddings (Kordula De Kuthy, Leander Girrbach, Detmar Meurers)
POSTER MAIN
44 Automated Scoring of a German Written Elicited Imitation Test (Mihail Chifligarov, Jammila Laâguidi, Max Schellenberg, Alexander Dill, Anna Timukova, Anastasia Drackert, Ronja Laarmann-Quante)
POSTER MAIN
56 Challenges for AI in Multimodal STEM Assessments: a Human-AI Comparison (Aymeric de Chillaz, Anna Sotnikova, Patrick Jermann, Antoine Bosselut)
POSTER MAIN
64 Don’t Score too Early! Evaluating Argument Mining Models on Incomplete Essays (Nils-Jonathan Schaller, Yuning Ding, Thorben Jansen, Andrea Horbach)
POSTER MAIN
72 LangEye: Toward ‘Anytime’ Learner-Driven Vocabulary Learning From Real-World Objects (Mariana Shimabukuro, Deval Panchal, Christopher Collins)
POSTER MAIN
80 Explaining Holistic Essay Scores in Comparative Judgment Assessments by Predicting Scores on Rubrics (Michiel De Vrindt, Renske Bouwer, Wim Van Den Noortgate, Marije Lesterhuis, Anaïs Tack)
POSTER MAIN
109 Name of Thrones: How Do LLMs Rank Student Names in Status Hierarchies Based on Race and Gender? (Annabella Sakunkoo, Jonathan Sakunkoo)
POSTER MAIN
136 Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems (Sahar Yarmohammadtoosky, Yiyun Zhou, Victoria Yaneva, Peter Baldwin, Saed Rezayi, Brian Clauser, Polina Harik)
POSTER MAIN
150 EduCSW: Building a Mandarin-English Code-Switched Generation Pipeline for Computer Science Learning (Ruishi Chen, Yiling Zhao)
POSTER MAIN
158 Paragraph-level Error Correction and Explanation Generation: Case Study for Estonian (Martin Vainikko, Taavi Kamarik, Karina Kert, Krista Liin, Silvia Maine, Kais Allkivi, Annekatrin Kaivapalu, Mark Fishel)
POSTER MAIN
167 Can LLMs Reliably Simulate Real Students’ Abilities in Mathematics and Reading Comprehension? (KV Aditya Srivatsa, Kaushal Maurya, Ekaterina Kochmar)
POSTER MAIN
32 Transformer Architectures for Vocabulary Test Item Difficulty Prediction (Lucy Skidmore, Mariano Felice, Karen Dunn)
POSTER MAIN
5 Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features (Hakyung Sung, Karla Csuros, Min-Chang Sung)
POSTER MAIN
12 MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks (Marius Dumitran, Mihnea Buca, Theodor Moroianu)
POSTER MAIN
71 Investigating Methods for Mapping Learning Objectives to Bloom’s Revised Taxonomy in Course Descriptions for Higher Education (Zahra Kolagar, Frank Zalkow, Alessandra Zarcone)
POSTER MAIN
108 Using NLI to Identify Potential Collocation Transfer in L2 English (Haiyin Yang, Zoey Liu, Stefanie Wulff)
POSTER MAIN
122 Improving In-context Learning Example Retrieval for Classroom Discussion Assessment with Re-ranking and Label Ratio Regulation (Nhat Tran, Diane Litman, Benjamin Pierce, Richard Correnti, Lindsay Clare Matsumura)
POSTER MAIN
144 Comparing Behavioral Patterns of LLM and Human Tutors: A Population-level Analysis with the CIMA Dataset (Aayush Kucheria, Nitin Sawhney, Arto Hellas)
POSTER MAIN
155 UPSC2M: Benchmarking Adaptive Learning from Two Million MCQ Attempts (Kevin Shi, Karttikeya Mangalam)
POSTER MAIN
36 Multilingual Grammatical Error Annotation: Combining Language-Agnostic Framework with Language-Specific Flexibility (Mengyang Qiu, Tran Minh Nguyen, Zihao Huang, Zelong Li, Yang Gu, Qingyu Gao, SILIANG LIU, Jungyeul Park)
POSTER MAIN
70 Automatic Generation of Inference Making Questions for Reading Comprehension Assessments (Wanjing (Anya) Ma, Michael Flor, Zuowei Wang)
POSTER MAIN
107 Lessons Learned in Assessing Student Reflections with LLMs (Mohamed Elaraby, Diane Litman)
POSTER MAIN
69 Automated L2 Proficiency Scoring: Weak Supervision, Large Language Models, and Statistical Guarantees (Aitor Arronte Alvarez, Naiyi Xie Fincham)
POSTER MAIN
74 Advances in Auto-Grading with Large Language Models: A Cross-Disciplinary Survey (Tania Amanda Nkoyo Frederick Eneye, Chukwuebuka Fortunate Ijezue, Ahmad Imam Amjad, Maaz Amjad, Sabur Butt, Gerardo Castañeda-Garza)
POSTER MAIN
123 Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues (Fareya Ikram, Alexander Scarlatos, Andrew Lan)
POSTER MAIN
145 Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic (Zhenjiang Mao, Artem Bisliouk, Rohith Nama, Ivan Ruchkin)
POSTER MAIN
18:00 - 21:00 Workshop Dinner

Friday, August 1, 2025

Time Description
09:00 - 09:45 Keynote Talk by Kostia Omelianchuk
09:45 - 10:30 Oral Session B
09:45 - 10:00 LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection (Alexey Sorokin, Regina Nasyrova)
ORAL MAIN
10:00 - 10:15 Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors (Ekaterina Kochmar, Kaushal Maurya, Kseniia Petukhova, KV Aditya Srivatsa, Anaïs Tack, Justin Vasselli)
ORAL SHARED TASK
10:15 - 10:30 MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors (Baraa Hikal, Mohmaed Basem, Islam Oshallah, Ali Hamdi)
ORAL SHARED TASK
10:30 - 11:00 Coffee Break
11:00 - 12:30 Poster Session B
20 Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests (Stefan Dascalescu, Marius Dumitran, Mihai Alexandru Vasiluta)
POSTER MAIN
31 Is Lunch Free Yet? Overcoming the Cold-Start Problem in Supervised Content Scoring using Zero-Shot LLM-Generated Training Data (Marie Bexte, Torsten Zesch)
POSTER MAIN
35 Towards a Real-time Swedish Speech Analyzer for Language Learning Games: A Hybrid AI Approach to Language Assessment (Tianyi Geng, David Alfter)
POSTER MAIN
50 LEVOS: Leveraging Vocabulary Overlap with Sanskrit to Generate Technical Lexicons in Indian Languages (Karthika N J, Krishnakant Bhatt, Ganesh Ramakrishnan, Preethi Jyothi)
POSTER MAIN
62 The Need for Truly Graded Lexical Complexity Prediction (David Alfter)
POSTER MAIN
66 Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting (Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Käser, Mrinmaya Sachan)
POSTER MAIN
73 Costs and Benefits of AI-Enabled Topic Modeling in P-20 Research: The Case of School Improvement Plans (Syeda Sabrina Akter, Seth Hunter, David Woo, Antonios Anastasopoulos)
POSTER MAIN
93 Are Large Language Models for Education Reliable Across Languages? (Vansh Gupta, Sankalan Pal Chowdhury, Vilém Zouhar, Donya Rooein, Mrinmaya Sachan)
POSTER MAIN
138 Span Labeling with Large Language Models: Shell vs. Meat (Phoebe Mulcaire, Nitin Madnani)
POSTER MAIN
153 STAIR-AIG: Optimizing the Automated Item Generation Process through Human-AI Collaboration for Critical Thinking Assessment (Euigyum Kim, Seewoo Li, Salah Khalil, Hyo Jeong Shin)
POSTER MAIN
164 End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models (Kamel Nebhi, Amrita Panesar, Hans Bantilan)
POSTER MAIN
175 bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning (Jihyeon Roh, Jinhyun Bang)
POSTER SHARED TASK
185 K-NLPers at BEA 2025 Shared Task: Evaluating the Quality of AI Tutor Responses with GPT-4.1 (Geon Park, Jiwoo Song, Gihyeon Choi, Juoh Sun, Harksoo Kim)
POSTER SHARED TASK
190 IALab UC at BEA 2025 Shared Task: LLM-Powered Expert Pedagogical Feature Extraction (Sofía Correa Busquets, Valentina Córdova Véliz, Jorge Baier)
POSTER SHARED TASK
188 TBA at BEA 2025 Shared Task: Transfer-Learning from DARE-TIES Merged Models for the Pedagogical Ability Assessment of LLM-Powered Math Tutors (Sebastian Gombert, Fabian Zehner, Hendrik Drachsler)
POSTER SHARED TASK
30 COGENT: A Curriculum-oriented Framework for Generating Grade-appropriate Educational Content (Zhengyuan Liu, Stella Xin Yin, Dion Hoe-Lian Goh, Nancy Chen)
POSTER MAIN
88 Analyzing Interview Questions via Bloom’s Taxonomy to Enhance the Design Thinking Process (Fatemeh Kazemi Vanhari, Christopher Anand, Charles Welch)
POSTER MAIN
110 Exploring LLM-Based Assessment of Italian Middle School Writing: A Pilot Study (Adriana Mirabella, Dominique Brunato)
POSTER MAIN
129 Beyond Linear Digital Reading: An LLM-Powered Concept Mapping Approach for Reducing Cognitive Load (Junzhi Han, Jinho D. Choi)
POSTER MAIN
179 BLCU-ICALL at BEA 2025 Shared Task: Multi-Strategy Evaluation of AI Tutors (Jiyuan An, Xiang Fu, Bo Liu, Xuquan Zong, Cunliang Kong, Shuliang Liu, Shuo Wang, Zhenghao Liu, Liner Yang, Hanghang Fan, Erhong Yang)
POSTER SHARED TASK
173 Jinan Smart Education at BEA 2025 Shared Task: Dual Encoder Architecture for Tutor Identification via Semantic Understanding of Pedagogical Conversations (Lei Chen)
POSTER SHARED TASK
176 CU at BEA 2025 Shared Task: A BERT-Based Cross-Attention Approach for Evaluating Pedagogical Responses in Dialogue (Zhihao Lyu)
POSTER SHARED TASK
178 SYSUpporter Team at BEA 2025 Shared Task: Class Compensation and Assignment Optimization for LLM-generated Tutor Identification (Longfeng Chen, Zeyu Huang, Zheng Xiao, Yawen Zeng, Jin Xu)
POSTER SHARED TASK
181 Emergent Wisdom at BEA 2025 Shared Task: From Lexical Understanding to Reflective Reasoning for Pedagogical Ability Assessment (Raunak Jain, Srinivasan Rengarajan)
POSTER SHARED TASK
186 Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation (Henry Pit)
POSTER SHARED TASK
192 TutorMind at BEA 2025 Shared Task: Leveraging Fine-Tuned LLMs and Data Augmentation for Mistake Identification (FATIMA DEKMAK, Christian Khairallah, Wissam Antoun)
POSTER SHARED TASK
198 BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses (Shadman Rohan, Ishita Sur Apan, Muhtasim Shochcho, Md Fahim, Mohammad Rahman, AKM Mahbubur Rahman, Amin Ali)
POSTER SHARED TASK
168 LLM-Assisted, Iterative Curriculum Writing: A Human-Centered AI Approach in Finnish Higher Education (Leo Huovinen, Mika Hämäläinen)
POSTER MAIN
12:30 - 14:00 Lunch Break / Birds of a Feather
14:00 - 15:30 Poster Session C
27 Can LLMs Effectively Simulate Human Learners? Teachers’ Insights from Tutoring LLM Students (Daria Martynova, Jakub Macina, Nico Daheim, Nilay Yalcin, Xiaoyu Zhang, Mrinmaya Sachan)
POSTER MAIN
29 Adapting LLMs for Minimal-edit Grammatical Error Correction (Ryszard Staruch, Filip Gralinski, Daniel Dzienisiewicz)
POSTER MAIN
53 Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? (Andreas Säuberli, Diego Frassinelli, Barbara Plank)
POSTER MAIN
63 Towards Automatic Formal Feedback on Scientific Documents (Louise Bloch, Johannes Rückert, Christoph Friedrich)
POSTER MAIN
67 Transformer-Based Real-Word Spelling Error Feedback with Configurable Confusion Sets (Torsten Zesch, Dominic Gardner, Marie Bexte)
POSTER MAIN
75 Unsupervised Sentence Readability Estimation Based on Parallel Corpora for Text Simplification (Rina Miyata, Toru Urakawa, Hideaki Tamori, Tomoyuki Kajiwara)
POSTER MAIN
100 Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs (Stefano Banno, Kate Knill, Mark Gales)
POSTER MAIN
128 Improving AI assistants embedded in short e-learning courses with limited textual content (Jacek Marciniak, Marek Kubis, Michał Gulczyński, Adam Szpilkowski, Adam Wieczarek, Marcin Szczepański)
POSTER MAIN
134 GermDetect: Verb Placement Error Detection Datasets for Learners of Germanic Languages (Noah-Manuel Michael, Andrea Horbach)
POSTER MAIN
147 Automated Scoring of Communication Skills in Physician-Patient Interaction: Balancing Performance and Scalability (Saed Rezayi, Le An Ha, Yiyun Zhou, Andrew Houriet, Angelo D’Addario, Peter Baldwin, Polina Harik, Ann King, Victoria Yaneva)
POSTER MAIN
156 Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays? (Veronica Schmalz, Anaïs Tack)
POSTER MAIN
166 A Framework for Proficiency-Aligned Grammar Practice in LLM-Based Dialogue Systems (Luisa Ribeiro-Flucht, Xiaobin Chen, Detmar Meurers)
POSTER MAIN
184 RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation? (Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá)
POSTER SHARED TASK
194 Archaeology at BEA 2025 Shared Task: Are Simple Baselines Good Enough? (Ana Roșu, Iani Ispas, Sergiu Nisioi)
POSTER SHARED TASK
195 NLIP at BEA 2025 Shared Task: Evaluation of Pedagogical Ability of AI Tutors (Trishita Saha, Shrenik Ganguli, Maunendra Sankar Desarkar)
POSTER SHARED TASK
40 LLM-based post-editing as reference-free GEC evaluation (Robert Östling, Murathan Kurfali, Andrew Caines)
POSTER MAIN
91 Estimation of Text Difficulty in the Context of Language Learning (Anisia Katinskaia, Anh-Duc Vu, Jue Hou, Ulla Vanhatalo, Yiheng Wu, Roman Yangarber)
POSTER MAIN
115 Exploring task formulation strategies to evaluate the coherence of classroom discussions with GPT-4o (Yuya Asano, Beata Beigman Klebanov, Jamie Mikeska)
POSTER MAIN
137 EyeLLM: Using Lookback Fixations to Enhance Human-LLM Alignment for Text Completion (Astha Singh, Mark Torrance, Evgeny Chukharev)
POSTER MAIN
148 Decoding Actionability: A Computational Analysis of Teacher Observation Feedback (Mayank Sharma, Jason Zhang)
POSTER MAIN
199 Thapar Titan/s: Fine-Tuning Pretrained Language Models with Contextual Augmentation for Mistake Identification in Tutor–Student Dialogues (Harsh Dadwal, Sparsh Rastogi, Jatin Bedi)
POSTER SHARED TASK
174 Wonderland_EDU@HKU at BEA 2025 Shared Task: Fine-tuning Large Language Models to Evaluate the Pedagogical Ability of AI-powered Tutors (Deliang Wang, Chao Yang, Gaowei Chen)
POSTER SHARED TASK
177 BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors (Yuming Fan, Chuangchuang Tan, Wenyu Song)
POSTER SHARED TASK
183 SmolLab_SEU at BEA 2025 Shared Task: A Transformer-Based Framework for Multi-Track Pedagogical Evaluation of AI-Powered Tutors (Md. Abdur Rahman, MD AL AMIN, Sabik Aftahee, Muhammad Junayed, Md Ashiqur Rahman)
POSTER SHARED TASK
189 LexiLogic at BEA 2025 Shared Task: Fine-tuning Transformer Language Models for the Pedagogical Skill Evaluation of LLM-based tutors (Souvik Bhattacharyya, Billodal Roy, Niranjan M, Pranav Gupta)
POSTER SHARED TASK
197 DLSU at BEA 2025 Shared Task: Towards Establishing Baseline Models for Pedagogical Response Evaluation Tasks (Maria Monica Manlises, Mark Edward Gonzales, Lanz Lim)
POSTER SHARED TASK
58 LookAlike: Consistent Distractor Generation in Math MCQs (Nisarg Parikh, Alexander Scarlatos, Nigel Fernandez, Simon Woodhead, Andrew Lan)
POSTER MAIN
77 From End-Users to Co-Designers: Lessons from Teachers (Martina Galletti, Valeria Cesaroni)
POSTER MAIN
15:30 - 16:00 Coffee Break
16:00 - 17:15 Oral Session C
16:00 - 16:15 Down the Cascades of Omethi: Hierarchical Automatic Scoring in Large-Scale Assessments (Fabian Zehner, Hyo Jeong Shin, Emily Kerzabi, Andrea Horbach, Sebastian Gombert, Frank Goldhammer, Torsten Zesch, Nico Andersen)
ORAL MAIN
16:15 - 16:30 Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback (Charles Koutcheme, Nicola Dainese, Arto Hellas)
ORAL MAIN
16:30 - 16:45 Advancing Question Generation with Joint Narrative and Difficulty Control (Bernardo Leite, Henrique Lopes Cardoso)
ORAL MAIN
16:45 - 17:00 Intent Matters: Enhancing AI Tutoring with Fine-Grained Pedagogical Intent Annotation (Kseniia Petukhova, Ekaterina Kochmar)
ORAL MAIN
17:00 - 17:15 LLMs Protégés: Tutoring LLMs with Knowledge Gaps Improves Student Learning Outcome (Andrei Kucharavy, Cyril Vallez, Dimitri Percia David)
ORAL MAIN
17:15 - 17:30 Closing Remarks