| 09:00 - 10:30 | Oral Session B |
| 09:00 - 09:15 | The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors (Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo) |
| 09:15 - 09:30 | Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues (Shuyan Huang, Alexander Scarlatos, Jaewook Lee, Andrew Lan) |
| 09:30 - 09:45 | Measuring Optimal Challenge: Trajectory-Based Difficulty Alignment in Open-Ended Language Tutoring (Ziqi Shu, Shuman Wang, Michael Hardy) |
| 09:45 - 10:00 | Findings of the BEA 2026 Shared Task on Vocabulary Difficulty Prediction for English Learners (Mariano Felice, Lucy Skidmore) |
| 10:00 - 10:15 | Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult? (Adam Nohejl, Xuanxin Wu, Yusuke Ide, Maria Riera Machin, Yi-Ning Chang) |
| 10:15 - 10:30 | Report on the BEA 2026 Shared Task on Rubric-based Short Answer Scoring for German (Sebastian Gombert, Zhifan Sun, Fabian Zehner, Jannik Lossjew, Tobias Wyrwich, Berrit Czinczel, David Bednorz, Sascha Bernholt, Knut Neumann, Ute Harms, Aiso Heinze, Hendrik Drachsler) |
| 10:30 - 11:00 | Coffee Break |
| 11:00 - 12:30 | Oral Session C |
| 11:00 - 11:15 | EduMUSE: A Multimodal Educational Dataset with Automatically Extracted Instructional Context (Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu, Danielle McNamara) |
| 11:15 - 11:30 | Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most (Tahreem Yasir, Wenbo Li, Sam Gilson, Sutapa Tithi, Xiaoyi Tian, Tiffany Barnes) |
| 11:30 - 11:45 | Towards Just-in-Time Adaptive Feedback: Enhancing Student Learning via Knowledge-Grounded LLM (Younghun Lee, Amir Bralin, Nobel Sanjay Rebello, Dan Goldwasser) |
| 11:45 - 12:00 | Evaluating LLM Workflows for Generating Clinical Communication Assessment Items: A Comparative Study with Subject-Matter Experts (Christopher Runyon, Peter Baldwin, Ian Micir, Kevin Frome, Stephanie Mann, Saed Rezayi, Keelan Evanini, Victoria Yaneva) |
| 12:00 - 12:15 | Zero Shot Phonics: Evaluating Constraint-Adherent Phonics Story Generation in Large Language Models (Maria Monica Manlises, Ethel Ong) |
| 12:15 - 12:30 | Evaluating Adaptive Personalization of Educational Readings with Simulated Learners (Ryan Woo, Anmol Rao, Aryan Keluskar, Yinong Chen) |
| 12:30 - 14:00 | Lunch Break / Birds of a Feather |
| 14:00 - 15:30 | Poster Session B |
| | Investigating Context-aware CTC for Pronunciation Assessment: Mitigating Peaky Behavior and Context Independency Assumption (Jiun-Ting Li, Tien-Hong Lo, Bi-Cheng Yan, Shih-Hsuan Chiu, Fu-An Chao, Berlin Chen) |
| | A Survey of Automated Presentation Coaching: Systems, Methods, and Open Challenges (Wen Liang, Li Siyan, Zackary Rackauckas, Julia Hirschberg) |
| | Criterial Features in German: Towards Interpretable NLP in Readability Assessment (Denise Loefflad, Sofia Kathmann, Heiko Holz, Detmar Meurers) |
| | RABIT: Rationale-Based Distillation Towards Interpretable Automatic Speaking Assessment via a Small Language Model (Bi-Cheng Yan, Hong-Yun Lin, Fu-An Chao, Jiun-Ting Li, Berlin Chen) |
| | Challenges in Machine Translation of Interactive Multimodal Exercises (Lucie Polakova, Miroslav Hrabal, Věra Kloudová, Michal Novák, Mariia Anisimova, Martin Popel) |
| | Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs (Stefano Banno, Kate Knill, Mark Gales) |
| | Assessing the Quality and Consistency of Automated Knowledge Component Generation using Instructor-generated Questions and LLMs (Jordan Esiason, Priyanka Khare, Wookhee Min, Seung Lee, Gamze Ozogul, Xiaoying Zheng, Yeil Jeong) |
| | Intent vs. Surface: Recovering Acoustic Realization from Modern ASR for Pronunciation Training (Seongjin Park) |
| | Opportunities and Challenges of LLMs in Education: An NLP Perspective (Sowmya Vajjala, Bashar Alhafni, Stefano Banno, Kaushal Maurya, Ekaterina Kochmar) |
| | Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation (Abigail Gurin Schleifer, Moriah Ariely, Beata Beigman Klebanov, Asaf Salman, Giora Alexandron) |
| | Using LLMs for item creation: Validating the potential of automatically generated sentence repetition test items for language assessment (Sarah Löber, Björn Rudzewitz, Yuan Chu, Mengyuan He, Shiqin Liu, Yushan Ye, Xiaobin Chen) |
| | FinnGEC: Benchmarking Grammatical Error Correction for Finnish (Anh-Duc Vu, Mikhail Zolotilin, Jue Hou, Anisia Katinskaia, Yiheng Wu, Roman Yangarber) |
| | From Metrics to Meaning: Rule-Grounded LLM Explanations for Data Literacy in the Case of Youth Football (Tomasz Piłka, Tomasz Kuczyński, Mateusz Czajka) |
| | Classification of Student Struggle in Mathematics (Hannah Levin, Madhura Padwal, Nchimunya Mwiinga) |
| | PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs (Ravi Kumar, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou) |
| | Data-lean fine-tuning of models for evaluating teacher performance in a GenAI-led elicitation simulation (Beata Beigman Klebanov, Andrew Hoang, Jamie Mikeska, Benny Longwill, Sanjna Kashyap, Shreyashi Halder, Aakanksha Bhatia) |
| | Noise Steering for Controlled Text Generation: Improving Diversity and Reading-Level Fidelity in Arabic Educational Story Generation (Haziq Khalid, Salsabeel Shapsough, Imran Zualkernan) |
| | PeerMathDial: A Middle School Dialogue Dataset for Student Collaborative Math Problem Solving (Murong Yue, Desmond Mcglone, Emily Slutz, Wenhan Lyu, Yixuan Zhang, Jennifer Suh, Ziyu Yao) |
| | Evaluating LLM-Generated Formative Feedback for Undergraduate Mathematics Through the Lens of Feedback Theory (Aron Gohr, Marie-Amelie Lawn, Kevin Gao, Inigo Serjeant, Stephen Heslip) |
| | Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education (Mragisha Jain, Tirth Bhatt, Griffin Pitts, Aum Pandya, Peter Brusilovsky, Narges Norouzi, Arto Hellas, Juho Leinonen, Bita Akram) |
| | Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction (Takumi Goto, Yusuke Sakai, Taro Watanabe) |
| | AIDA at BEA 2026 Shared Task 1: A Two-Stage Framework for L1-Aware Vocabulary Difficulty Prediction with Representation Diversity and Residual Calibration (Seok Hyeon Cho, JunHyeok Choi, Sangeun Ji, Sung Won Han) |
| | BoostedCats at BEA 2026 Shared Task 1: What Makes a Word Hard to Learn? Modeling L1 Influence on English Vocabulary Difficulty (Jonas Mayer Martins, Zhuojing Huang, Aaricia Herygers, Lisa Beinborn) |
| | uogal at BEA 2026 Shared Task 1: Ensemble of Multilingual Encoders with NMT Augmentation for L1-Aware Vocabulary Difficulty Prediction (bernardo stearns, John P. McCrae, Thomas Gaillat, Jefkine Kafunah) |
| | Jinnie’s Lab at BEA 2026 Shared Task 1: Precalibration of Vocabulary Item Difficulty with Multilingual Transformers and Multi-Task Learning (Zhe Li, Pauline Aguinalde, Jinnie Shin) |
| | IWM-DKM at BEA 2026 Shared Task 2: Supplementing Supervised Fine-Tuning for Rubric-Based Short Answer Scoring (Kate Belcher, Marius De Kuthy Meurers, Kordula De Kuthy, Detmar Meurers) |
| | RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German (Ignacio Sastre, Ignacio Remersaro, Facundo Díaz, Nicolás De Horta, Luis Chiruzzo, Aiala Rosá, Santiago Góngora) |
| | UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction (Nouran Khallaf, Serge Sharoff) |
| 15:30 - 16:00 | Coffee Break |
| 16:00 - 16:45 | Panel |
| 16:45 - 17:15 | Oral Session D |
| 16:45 - 17:00 | Incentives Of EdTech: A Systematic Review Of EduNLP Research (Gabrielle Gaudeau, Aoife O’Driscoll, Jasper Degraeuwe, Andrew Caines, Donya Rooein, Zeerak Talat) |
| 17:00 - 17:15 | Effects of Varying LLM Access on Essay Writing Behavior (Julia Christenson, Karin de Langis, Shirley Anugrah Hayati, Dongyeop Kang) |
| 17:15 - 17:30 | Closing Remarks |