09:00 - 09:45 | Keynote Talk by Kostia Omelianchuk
09:45 - 10:30 | Oral Session B
09:45 - 10:00 | LLMs in alliance with Edit-based models: advancing In-Context Learning for Grammatical Error Correction by Specific Example Selection (Alexey Sorokin, Regina Nasyrova)
10:00 - 10:15 | Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors (Ekaterina Kochmar, Kaushal Maurya, Kseniia Petukhova, KV Aditya Srivatsa, Anaïs Tack, Justin Vasselli)
10:15 - 10:30 | MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors (Baraa Hikal, Mohamed Basem, Islam Oshallah, Ali Hamdi)
10:30 - 11:00 | Coffee Break
11:00 - 12:30 | Poster Session B
20 | Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests (Stefan Dascalescu, Marius Dumitran, Mihai Alexandru Vasiluta)
31 | Is Lunch Free Yet? Overcoming the Cold-Start Problem in Supervised Content Scoring using Zero-Shot LLM-Generated Training Data (Marie Bexte, Torsten Zesch)
35 | Towards a Real-time Swedish Speech Analyzer for Language Learning Games: A Hybrid AI Approach to Language Assessment (Tianyi Geng, David Alfter)
50 | LEVOS: Leveraging Vocabulary Overlap with Sanskrit to Generate Technical Lexicons in Indian Languages (Karthika N J, Krishnakant Bhatt, Ganesh Ramakrishnan, Preethi Jyothi)
62 | The Need for Truly Graded Lexical Complexity Prediction (David Alfter)
66 | Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting (Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Käser, Mrinmaya Sachan)
73 | Costs and Benefits of AI-Enabled Topic Modeling in P-20 Research: The Case of School Improvement Plans (Syeda Sabrina Akter, Seth Hunter, David Woo, Antonios Anastasopoulos)
93 | Are Large Language Models for Education Reliable Across Languages? (Vansh Gupta, Sankalan Pal Chowdhury, Vilém Zouhar, Donya Rooein, Mrinmaya Sachan)
138 | Span Labeling with Large Language Models: Shell vs. Meat (Phoebe Mulcaire, Nitin Madnani)
153 | STAIR-AIG: Optimizing the Automated Item Generation Process through Human-AI Collaboration for Critical Thinking Assessment (Euigyum Kim, Seewoo Li, Salah Khalil, Hyo Jeong Shin)
164 | End-to-End Automated Item Generation and Scoring for Adaptive English Writing Assessment with Large Language Models (Kamel Nebhi, Amrita Panesar, Hans Bantilan)
175 | bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning (Jihyeon Roh, Jinhyun Bang)
185 | K-NLPers at BEA 2025 Shared Task: Evaluating the Quality of AI Tutor Responses with GPT-4.1 (Geon Park, Jiwoo Song, Gihyeon Choi, Juoh Sun, Harksoo Kim)
190 | IALab UC at BEA 2025 Shared Task: LLM-Powered Expert Pedagogical Feature Extraction (Sofía Correa Busquets, Valentina Córdova Véliz, Jorge Baier)
188 | TBA at BEA 2025 Shared Task: Transfer-Learning from DARE-TIES Merged Models for the Pedagogical Ability Assessment of LLM-Powered Math Tutors (Sebastian Gombert, Fabian Zehner, Hendrik Drachsler)
30 | COGENT: A Curriculum-oriented Framework for Generating Grade-appropriate Educational Content (Zhengyuan Liu, Stella Xin Yin, Dion Hoe-Lian Goh, Nancy Chen)
88 | Analyzing Interview Questions via Bloom’s Taxonomy to Enhance the Design Thinking Process (Fatemeh Kazemi Vanhari, Christopher Anand, Charles Welch)
110 | Exploring LLM-Based Assessment of Italian Middle School Writing: A Pilot Study (Adriana Mirabella, Dominique Brunato)
129 | Beyond Linear Digital Reading: An LLM-Powered Concept Mapping Approach for Reducing Cognitive Load (Junzhi Han, Jinho D. Choi)
179 | BLCU-ICALL at BEA 2025 Shared Task: Multi-Strategy Evaluation of AI Tutors (Jiyuan An, Xiang Fu, Bo Liu, Xuquan Zong, Cunliang Kong, Shuliang Liu, Shuo Wang, Zhenghao Liu, Liner Yang, Hanghang Fan, Erhong Yang)
173 | Jinan Smart Education at BEA 2025 Shared Task: Dual Encoder Architecture for Tutor Identification via Semantic Understanding of Pedagogical Conversations (Lei Chen)
176 | CU at BEA 2025 Shared Task: A BERT-Based Cross-Attention Approach for Evaluating Pedagogical Responses in Dialogue (Zhihao Lyu)
178 | SYSUpporter Team at BEA 2025 Shared Task: Class Compensation and Assignment Optimization for LLM-generated Tutor Identification (Longfeng Chen, Zeyu Huang, Zheng Xiao, Yawen Zeng, Jin Xu)
181 | Emergent Wisdom at BEA 2025 Shared Task: From Lexical Understanding to Reflective Reasoning for Pedagogical Ability Assessment (Raunak Jain, Srinivasan Rengarajan)
186 | Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation (Henry Pit)
192 | TutorMind at BEA 2025 Shared Task: Leveraging Fine-Tuned LLMs and Data Augmentation for Mistake Identification (Fatima Dekmak, Christian Khairallah, Wissam Antoun)
198 | BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses (Shadman Rohan, Ishita Sur Apan, Muhtasim Shochcho, Md Fahim, Mohammad Rahman, AKM Mahbubur Rahman, Amin Ali)
168 | LLM-Assisted, Iterative Curriculum Writing: A Human-Centered AI Approach in Finnish Higher Education (Leo Huovinen, Mika Hämäläinen)
12:30 - 14:00 | Lunch Break / Birds of a Feather
14:00 - 15:30 | Poster Session C
27 | Can LLMs Effectively Simulate Human Learners? Teachers’ Insights from Tutoring LLM Students (Daria Martynova, Jakub Macina, Nico Daheim, Nilay Yalcin, Xiaoyu Zhang, Mrinmaya Sachan)
29 | Adapting LLMs for Minimal-edit Grammatical Error Correction (Ryszard Staruch, Filip Gralinski, Daniel Dzienisiewicz)
53 | Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? (Andreas Säuberli, Diego Frassinelli, Barbara Plank)
63 | Towards Automatic Formal Feedback on Scientific Documents (Louise Bloch, Johannes Rückert, Christoph Friedrich)
67 | Transformer-Based Real-Word Spelling Error Feedback with Configurable Confusion Sets (Torsten Zesch, Dominic Gardner, Marie Bexte)
75 | Unsupervised Sentence Readability Estimation Based on Parallel Corpora for Text Simplification (Rina Miyata, Toru Urakawa, Hideaki Tamori, Tomoyuki Kajiwara)
100 | Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs (Stefano Banno, Kate Knill, Mark Gales)
128 | Improving AI assistants embedded in short e-learning courses with limited textual content (Jacek Marciniak, Marek Kubis, Michał Gulczyński, Adam Szpilkowski, Adam Wieczarek, Marcin Szczepański)
134 | GermDetect: Verb Placement Error Detection Datasets for Learners of Germanic Languages (Noah-Manuel Michael, Andrea Horbach)
147 | Automated Scoring of Communication Skills in Physician-Patient Interaction: Balancing Performance and Scalability (Saed Rezayi, Le An Ha, Yiyun Zhou, Andrew Houriet, Angelo D’Addario, Peter Baldwin, Polina Harik, Ann King, Victoria Yaneva)
156 | Can GPTZero’s AI Vocabulary Distinguish Between LLM-Generated and Student-Written Essays? (Veronica Schmalz, Anaïs Tack)
166 | A Framework for Proficiency-Aligned Grammar Practice in LLM-Based Dialogue Systems (Luisa Ribeiro-Flucht, Xiaobin Chen, Detmar Meurers)
184 | RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation? (Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá)
194 | Archaeology at BEA 2025 Shared Task: Are Simple Baselines Good Enough? (Ana Roșu, Iani Ispas, Sergiu Nisioi)
195 | NLIP at BEA 2025 Shared Task: Evaluation of Pedagogical Ability of AI Tutors (Trishita Saha, Shrenik Ganguli, Maunendra Sankar Desarkar)
40 | LLM-based post-editing as reference-free GEC evaluation (Robert Östling, Murathan Kurfali, Andrew Caines)
91 | Estimation of Text Difficulty in the Context of Language Learning (Anisia Katinskaia, Anh-Duc Vu, Jue Hou, Ulla Vanhatalo, Yiheng Wu, Roman Yangarber)
115 | Exploring task formulation strategies to evaluate the coherence of classroom discussions with GPT-4o (Yuya Asano, Beata Beigman Klebanov, Jamie Mikeska)
137 | EyeLLM: Using Lookback Fixations to Enhance Human-LLM Alignment for Text Completion (Astha Singh, Mark Torrance, Evgeny Chukharev)
148 | Decoding Actionability: A Computational Analysis of Teacher Observation Feedback (Mayank Sharma, Jason Zhang)
199 | Thapar Titan/s: Fine-Tuning Pretrained Language Models with Contextual Augmentation for Mistake Identification in Tutor–Student Dialogues (Harsh Dadwal, Sparsh Rastogi, Jatin Bedi)
174 | Wonderland_EDU@HKU at BEA 2025 Shared Task: Fine-tuning Large Language Models to Evaluate the Pedagogical Ability of AI-powered Tutors (Deliang Wang, Chao Yang, Gaowei Chen)
177 | BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors (Yuming Fan, Chuangchuang Tan, Wenyu Song)
183 | SmolLab_SEU at BEA 2025 Shared Task: A Transformer-Based Framework for Multi-Track Pedagogical Evaluation of AI-Powered Tutors (Md. Abdur Rahman, Md Al Amin, Sabik Aftahee, Muhammad Junayed, Md Ashiqur Rahman)
189 | LexiLogic at BEA 2025 Shared Task: Fine-tuning Transformer Language Models for the Pedagogical Skill Evaluation of LLM-based tutors (Souvik Bhattacharyya, Billodal Roy, Niranjan M, Pranav Gupta)
197 | DLSU at BEA 2025 Shared Task: Towards Establishing Baseline Models for Pedagogical Response Evaluation Tasks (Maria Monica Manlises, Mark Edward Gonzales, Lanz Lim)
58 | LookAlike: Consistent Distractor Generation in Math MCQs (Nisarg Parikh, Alexander Scarlatos, Nigel Fernandez, Simon Woodhead, Andrew Lan)
77 | From End-Users to Co-Designers: Lessons from Teachers (Martina Galletti, Valeria Cesaroni)
15:30 - 16:00 | Coffee Break
16:00 - 17:15 | Oral Session C
16:00 - 16:15 | Down the Cascades of Omethi: Hierarchical Automatic Scoring in Large-Scale Assessments (Fabian Zehner, Hyo Jeong Shin, Emily Kerzabi, Andrea Horbach, Sebastian Gombert, Frank Goldhammer, Torsten Zesch, Nico Andersen)
16:15 - 16:30 | Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback (Charles Koutcheme, Nicola Dainese, Arto Hellas)
16:30 - 16:45 | Advancing Question Generation with Joint Narrative and Difficulty Control (Bernardo Leite, Henrique Lopes Cardoso)
16:45 - 17:00 | Intent Matters: Enhancing AI Tutoring with Fine-Grained Pedagogical Intent Annotation (Kseniia Petukhova, Ekaterina Kochmar)
17:00 - 17:15 | LLMs Protégés: Tutoring LLMs with Knowledge Gaps Improves Student Learning Outcome (Andrei Kucharavy, Cyril Vallez, Dimitri Percia David)
17:15 - 17:30 | Closing Remarks