14th Workshop on Innovative Use of NLP for Building Educational Applications


Quick Info
Co-located with: ACL 2019
Location: Florence, Italy
Deadline: Friday, April 26, 2019, 11:59pm EST; extended to Monday, April 29, 2019, 11:59pm EST
Date: Friday, August 2, 2019 (Hall 2)
Organizers: Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, and Torsten Zesch
Contact: bea.nlp.workshop@gmail.com

Workshop Proceedings

The Proceedings of the 14th Workshop on Innovative Use of NLP for Building Educational Applications are now available online.

Workshop Description

The BEA Workshop is a leading venue for NLP innovation in the context of educational applications. It is one of the largest one-day workshops in the ACL community, with over 80 attendees in recent years. The growing interest in educational applications and the diverse community of researchers involved led to the creation of the Special Interest Group in Educational Applications (SIGEDU) in 2017, which currently has 177 members.

The workshop’s continuing growth highlights the alignment between societal needs and technological advances. NLP capabilities can now support an array of learning domains, including writing, speaking, reading, science, and mathematics, as well as the related intra-personal (e.g., self-confidence) and inter-personal (e.g., peer collaboration) skills. Within these areas, the community continues to develop and deploy innovative NLP approaches for use in educational settings. In the writing and speech domains, automated writing evaluation (AWE) and speech scoring applications, respectively, are commercially deployed in high-stakes assessment and in instructional contexts (e.g., Massive Open Online Courses (MOOCs) and K-12 classrooms). Commercially deployed plagiarism detection is also commonly used in both K-12 and higher education settings. For writing, the focus is on innovations that support writing tasks requiring source use, argumentative discourse, and factual content accuracy. For speech, there is an interest in advancing automated scoring to include the evaluation of discourse and content features in responses to spoken assessments. General advances in speech technology have prompted a renewed interest in spoken dialog and multimodal systems for instruction and assessment, for instance, for workplace interviews and simulated teaching environments. The explosive growth of mobile applications for game-based and simulation-based learning is another area where NLP has begun to play a large role, especially for language learning.

NLP for educational applications has gained visibility outside of the NLP community. First, the Hewlett Foundation reached out to the public and private sectors and sponsored two competitions: one for automated essay scoring and the other for scoring of short response items. The motivation driving these competitions was to engage the larger scientific community in this enterprise. Learning @ Scale is a relatively new venue for NLP research in education. MOOCs now incorporate AWE systems to manage the several thousand assignments that may be received during a single course. MOOCs for Refugees have more recently emerged in response to current social circumstances; these include language learning courses, and we can imagine that AWE and other NLP capabilities could support such coursework. Another breakthrough for educational applications within the CL community has been the emergence of a number of shared-task competitions over the past several years, including three shared tasks on grammatical error detection and correction. NLP/Education shared tasks have also opened up new areas of research, such as the Automated Evaluation of Scientific Writing at BEA 11, Native Language Identification at BEA 12, and Second Language Acquisition Modelling and Complex Word Identification, both at BEA 13. These competitions have increased the visibility of, and interest in, our field.

The 14th BEA workshop will have oral presentation sessions and a large poster session in order to maximize the amount of original work presented. We expect that the workshop will continue to highlight novel technologies and opportunities for educational NLP in English as well as other languages. The workshop will solicit both full papers and short papers for either oral or poster presentation. We will solicit papers that incorporate NLP methods, including, but not limited to: automated scoring of open-ended textual and spoken responses; game-based instruction and assessment; educational data mining; intelligent tutoring; peer review; grammatical error detection and correction; learner cognition; spoken dialog; multimodal applications; tools for teachers and test developers; and use of corpora. Specific topics include:

Automated scoring/evaluation for written student responses (across multiple genres)

  • Content analysis for scoring/assessment
  • Detection and correction of grammatical and other types of errors
  • Argumentation, discourse, sentiment, stylistic analysis, & non-literal language

Intelligent Tutoring (IT), Collaborative Learning Environments

  • Educational Data Mining: Collection of user log data from educational applications
  • Game-based learning
  • Multimodal communication (including dialog systems) between students and computers

Learner cognition

  • Assessment of learners’ language and cognitive skill levels
  • Systems that detect and adapt to learners’ cognitive or emotional states
  • Tools for learners with special needs

Use of corpora in educational tools

  • Data mining of learner and other corpora for tool building
  • Annotation standards and schemas / annotator agreement

Tools and applications for classroom teachers and/or test developers

  • NLP tools for second and foreign language learners
  • Semantic-based access to instructional materials to identify appropriate texts
  • Tools that automatically generate test questions

Shared Task on Grammatical Error Correction

Task Description

Grammatical error correction (GEC) is the task of automatically correcting grammatical errors in text, e.g., [I follows his advices → I followed his advice]. It can be used not only to help language learners improve their writing skills, but also to alert native speakers to accidental mistakes or typos.
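Concretely, a correction like the one above can be viewed as a set of span-level edits over the original token sequence, which is also how gold annotations are typically stored in GEC corpora. Below is a minimal illustrative sketch; the Edit representation and the apply_edits helper are hypothetical conveniences for this example, not drawn from any particular GEC toolkit:

```python
# Minimal sketch: a GEC correction as span-level token edits.
# The Edit tuple and apply_edits helper are hypothetical illustrations.
from typing import List, Tuple

# (start, end, replacement): replace original tokens [start:end) with replacement
Edit = Tuple[int, int, List[str]]

def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply non-overlapping edits right to left so earlier offsets stay valid."""
    out = list(tokens)
    for start, end, replacement in sorted(edits, reverse=True):
        out[start:end] = replacement
    return out

original = "I follows his advices".split()
edits = [
    (1, 2, ["followed"]),  # subject-verb agreement / tense: follows -> followed
    (3, 4, ["advice"]),    # noun number: advices -> advice ("advice" is uncountable)
]

print(" ".join(apply_edits(original, edits)))  # -> "I followed his advice"
```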

GEC gained significant attention in the HOO and CoNLL shared tasks between 2011 and 2014 (Dale and Kilgarriff, 2011; Dale et al., 2012; Ng et al., 2013; Ng et al., 2014), but has since become much more difficult to evaluate given a lack of standardised experimental settings. In particular, recent systems have been trained, tuned and tested on different combinations of corpora using different metrics (Yannakoudakis et al., 2017; Chollampatt and Ng, 2018a; Ge et al., 2018; Junczys-Dowmunt et al., 2018). One of the main aims of this shared task is hence to once again provide a platform where different approaches can be evaluated under the same test conditions.
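For reference, the metric most commonly used to compare GEC systems since the CoNLL-2014 shared task is edit-level F0.5, which weights precision twice as heavily as recall, on the grounds that a miscorrection is more harmful to a writer than a missed error. A minimal sketch, assuming true positive, false positive and false negative edit counts have already been obtained from a scorer such as M2 or ERRANT (the counts below are invented):

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta over edit counts; beta < 1 favours precision over recall."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

# Invented counts: a precise-but-conservative system vs. a noisy one.
print(f_beta(tp=40, fp=10, fn=50))  # P = 0.80, R ~ 0.44 -> F0.5 ~ 0.69
print(f_beta(tp=50, fp=50, fn=40))  # P = 0.50, R ~ 0.56 -> F0.5 ~ 0.51
# Under F0.5 the conservative system scores higher despite its lower recall.
```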

Another significant problem facing the field is that system performance is still primarily benchmarked against the CoNLL-2014 test set, even though this five-year-old dataset represents only a very narrow domain: essays by first-year, South-East Asian undergraduates in Singapore. This means systems have increasingly overfit to a very specific type of English and so do not generalise well to other domains. This shared task hence introduces a new dataset that represents a much more diverse cross-section of English language domains.

More information can be found on the task webpage.

Task Organizers

  • Christopher Bryant, University of Cambridge
  • Mariano Felice, University of Cambridge
  • Øistein Andersen, University of Cambridge
  • Ted Briscoe, University of Cambridge

Important Dates

  • Submission Deadline: Friday, April 26, 2019, 11:59pm EST; extended to Monday, April 29, 2019, 11:59pm EST
  • Notification of Acceptance: Friday, May 24, 2019
  • Camera-ready Papers Due: Monday, June 3, 2019
  • Workshop: Friday, August 2, 2019

Schedule

August 2, 2019
8:30–9:00 Loading of Oral Presentations
9:00–9:15 Opening Remarks
9:15–9:40 The many dimensions of algorithmic fairness in educational applications. Anastassia Loukina, Nitin Madnani and Klaus Zechner.
9:40–10:05 Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam. Le An Ha, Victoria Yaneva, Peter Baldwin and Janet Mee.
10:05–10:30 [Ambassador paper] Effects of the self-view window during video-mediated survey interviews: An eye-tracking study. Shelley Feuer.
10:30–11:00 Coffee Break
11:00–11:25 An Intelligent Testing Strategy for Vocabulary Assessment of Chinese Second Language Learners. Wei Zhou, Renfen Hu, Feipeng Sun and Ronghuai Huang.
11:25–11:50 Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays. Zarah Weiß, Anja Riemenschneider, Pauline Schröter and Detmar Meurers.
11:50–12:10 Analysing Rhetorical Structure as a Key Feature of Summary Coherence. Jan Šnajder, Tamara Sladoljev-Agejev and Svjetlana Kolić-Vehovec.
12:10–12:30 [Shared Task Report] The BEA-2019 Shared Task on Grammatical Error Correction. Christopher Bryant, Mariano Felice, Øistein E. Andersen and Ted Briscoe.
12:30–14:00 Lunch
14:00–15:30 BEA14 Poster and Shared Task Sessions
14:00–14:45 Poster Session A
  • A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling Correction. Michael Flor, Michael Fried and Alla Rozovskaya.
  • Content Modeling for Automated Oral Proficiency Scoring System. Su-Youn Yoon and Chong Min Lee.
  • Regression or classification? Automated Essay Scoring for Norwegian. Stig Johan Berggren, Taraka Rama and Lilja Øvrelid.
  • Context is Key: Grammatical Error Detection with Contextual Word Representations. Samuel Bell, Helen Yannakoudakis and Marek Rei.
  • How to account for mispellings: Quantifying the benefit of character representations in neural content scoring models. Brian Riordan, Michael Flor and Robert Pugh.
  • The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction. Dimitris Alikaniotis, Vipul Raheja and Joel Tetreault.
  • (Almost) Unsupervised Grammatical Error Correction using a Synthetic Comparable Corpus. Satoru Katsumata and Mamoru Komachi.
  • Learning to combine Grammatical Error Corrections. Yoav Kantor, Yoav Katz, Leshem Choshen, Edo Cohen-Karlik, Naftali Liberman, Assaf Toledo, Amir Menczel and Noam Slonim.
  • Erroneous data generation for Grammatical Error Correction. Shuyao Xu, Jiehao Zhang, Jin Chen and Long Qin.
  • The LAIX Systems in the BEA-2019 GEC Shared Task. Ruobing Li, Chuan Wang, Yefei Zha, Yonghong Yu, Shiman Guo, Qiang Wang, Yang Liu and Hui Lin.
  • The CUED's Grammatical Error Correction Systems for BEA-2019. Felix Stahlberg and Bill Byrne.
  • The AIP-Tohoku System at the BEA-2019 Shared Task. Hiroki Asano, Masato Mita, Tomoya Mizumoto and Jun Suzuki.
  • CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction. Jakub Náplava and Milan Straka.
  • Noisy Channel for Low Resource Grammatical Error Correction. Simon Flachs, Ophélie Lacroix and Anders Søgaard.
  • The BLCU System in the BEA 2019 Shared Task. Liner Yang, Chencheng Wang, Tianxin Liao and Erhong Yang.
  • TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track. Masahiro Kaneko, Kengo Hotate, Satoru Katsumata and Mamoru Komachi.
  • A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning. Yo Joong Choe, Jiyeon Ham, Kyubyong Park and Yeoil Yoon.
  • Neural and FST-based approaches to grammatical error correction. Zheng Yuan, Felix Stahlberg, Marek Rei, Bill Byrne and Helen Yannakoudakis.
  • Improving Precision of Grammatical Error Correction with Cheat Sheet. Mengyang Qiu, Xuejiao Chen, Maggie Liu, Krishna Parvathala, Apurva Patil and Jungyeul Park.
  • Multi-headed Architecture Based on BERT for Grammatical Errors Correction. Julia Shaptala and Bohdan Didenko.
  • Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data. Roman Grundkiewicz, Marcin Junczys-Dowmunt and Kenneth Heafield.
  • The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction. Phu Mon Htut and Joel Tetreault.
14:45–15:30 Poster Session B
15:30–16:00 Mid-Afternoon Snacks
16:00–16:25 Automated Essay Scoring with Discourse-Aware Neural Models. Farah Nadeem, Huy Nguyen, Yang Liu and Mari Ostendorf.
16:25–16:50 Modeling language learning using specialized Elo ratings. Jue Hou, Maximilian W. Koppatz, José María Hoya Quecedo, Nataliya Stoyanova, Mikhail Kopotev and Roman Yangarber.
16:50–17:15 Rubric Reliability and Annotation of Content and Argument in Source-Based Argument Essays. Yanjun Gao, Alex Driban, Brennan Xavier McManus, Elena Musi, Patricia M. Davies, Smaranda Muresan and Rebecca J. Passonneau.
17:15–17:30 Closing Remarks
19:00–22:00 Post-workshop Dinner at Trattoria Coco Lezzone

Submission Information

We will be using the ACL Submission Guidelines for the BEA Workshop this year. Authors are invited to submit a full paper of up to eight (8) pages of content, plus unlimited references; final versions of long papers will be given one additional page of content (up to nine (9) pages) so that reviewers’ comments can be taken into account. We also invite short papers of up to four (4) pages of content, plus unlimited references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers’ comments in their final versions.

Authors of papers that describe systems are also invited to give a demo of their system. If you would like to present a demo in addition to presenting the paper, please make sure to select either “full paper + demo” or “short paper + demo” under “Submission Category” on the START submission page.

Previously published papers cannot be accepted. The submissions will be reviewed by the program committee. As reviewing will be blind, please ensure that papers are anonymous. Self-references that reveal the author’s identity, e.g., “We previously showed (Smith, 1991) …”, should be avoided. Instead, use citations such as “Smith previously showed (Smith, 1991) …”.

We have also included a conflict of interest field in the submission form. You should mark all potential reviewers who have been authors on the paper, are from the same research group or institution, or have seen versions of the paper or discussed it with you.

We will be using the START conference system to manage submissions: https://www.softconf.com/acl2019/bea/

Double Submission Policy

We will follow the official ACL double-submission policy. Specifically:

Papers being submitted both to BEA and another conference or workshop must:

  • Note on the title page the other conference or workshop to which they are being submitted.
  • State on the title page that if the authors choose to present their paper at BEA (assuming it was accepted), then the paper will be withdrawn from other conferences and workshops.

Sponsors

Gold

Duolingo, Grammarly, NBME, iLexIR

Silver

ETS

Bronze

Newsela

Organizing Committee

  • Helen Yannakoudakis
  • Ekaterina Kochmar
  • Claudia Leacock
  • Nitin Madnani
  • Ildikó Pilán
  • Torsten Zesch

Program Committee

  • Tazin Afrin, University of Pittsburgh
  • David Alfter, University of Gothenburg
  • Dimitrios Alikaniotis, Grammarly
  • Rajendra Banjade, Audible Inc.
  • Timo Baumann, Carnegie Mellon University
  • Lee Becker, Pearson
  • Beata Beigman Klebanov, Educational Testing Service
  • Kay Berkling, Cooperative State University Karlsruhe, Germany
  • Suma Bhat, University of Illinois at Urbana-Champaign
  • Sameer Bhatnagar, Polytechnique Montreal
  • Joachim Bingel, University of Copenhagen
  • Karim Bouzoubaa, Mohammed V University in Rabat
  • Chris Brew, Facebook
  • Ted Briscoe, University of Cambridge
  • Julian Brooke, University of British Columbia
  • Dominique Brunato, Institute for Computational Linguistics, ILC-CNR, Pisa, Italy
  • James Bruno, Educational Testing Service
  • Christopher Bryant, University of Cambridge
  • Paula Buttery, University of Cambridge
  • Aoife Cahill, Educational Testing Service
  • Andrew Caines, University of Cambridge
  • Mei-Hua Chen, Department of Foreign Languages and Literature, Tunghai University
  • Martin Chodorow, ETS & City University of New York
  • Shamil Chollampatt, National University of Singapore
  • Mark Core, University of Southern California
  • Vidas Daudaravicius, UAB VTEX
  • Kordula De Kuthy, University of Tübingen
  • Carrie Demmans Epp, University of Alberta
  • Yo Ehara, Faculty of Informatics, Shizuoka Institute of Science and Technology
  • Keelan Evanini, Educational Testing Service
  • Mariano Felice, University of Cambridge
  • Michael Flor, Educational Testing Service
  • Thomas François, Université catholique de Louvain
  • Yoko Futagi, Educational Testing Service
  • Michael Gamon, Microsoft Research
  • Dipesh Gautam, The University of Memphis
  • Christian Gold, University of Bergen
  • Sian Gooding, University of Cambridge
  • Jonathan Gordon, Vassar College
  • Cyril Goutte, National Research Council Canada
  • Iryna Gurevych, UKP Lab, TU Darmstadt
  • Binod Gyawali, Educational Testing Service
  • Na-Rae Han, University of Pittsburgh
  • Jiangang Hao, Educational Testing Service
  • Homa Hashemi, Microsoft
  • Trude Heift, Simon Fraser University
  • Derrick Higgins, American Family Insurance
  • Heiko Holz, LEAD Graduate School & Research Network at the University of Tuebingen
  • Andrea Horbach, University Duisburg-Essen
  • Chung-Chi Huang, Frostburg State University
  • Yi-Ting Huang, Academia Sinica
  • Radu Tudor Ionescu, University of Bucharest
  • Lifeng Jin, The Ohio State University
  • Pamela Jordan, University of Pittsburgh
  • Taraka Kasicheyanula, University of Oslo
  • Elma Kerz, RWTH Aachen
  • Fazel Keshtkar, St. John’s University
  • Mamoru Komachi, Tokyo Metropolitan University
  • Lun-Wei Ku, Academia Sinica
  • Chong Min Lee, Educational Testing Service
  • Ji-Ung Lee, UKP Lab, Technische Universität Darmstadt
  • John Lee, City University of Hong Kong
  • Lung-Hao Lee, National Central University
  • Ben Leong, Educational Testing Service
  • James Lester, North Carolina State University
  • Chen Liang, Facebook
  • Diane Litman, University of Pittsburgh
  • Yang Liu, Laix
  • Peter Ljunglöf, University of Gothenburg
  • Anastassia Loukina, Educational Testing Service
  • Xiaofei Lu, Pennsylvania State University
  • Luca Lugini, University of Pittsburgh
  • Nabin Maharjan, University of Memphis
  • Jean Maillard, University of Cambridge
  • Shervin Malmasi, Harvard Medical School
  • Montse Maritxalar, University of the Basque Country
  • Ditty Mathew, IIT Madras
  • Julie Medero, Harvey Mudd College
  • Beata Megyesi, Uppsala University
  • Detmar Meurers, Universität Tübingen
  • Margot Mieskes, University of Applied Sciences, Darmstadt
  • Elham Mohammadi, CLaC Laboratory, Concordia University
  • Maria Moritz, German Research Center for Artificial Intelligence
  • William Murray, Pearson
  • Courtney Napoles, Grammarly
  • Diane Napolitano, LCSR, Rutgers University
  • Hwee Tou Ng, National University of Singapore
  • Huy Nguyen, LingoChamp
  • Rodney Nielsen, University of North Texas
  • Nobal Niraula, Boeing Research and Technology
  • Yoo Rhee Oh, Electronics and Telecommunications Research Institute (ETRI)
  • Constantin Orasan, University of Wolverhampton
  • Ulrike Pado, HFT Stuttgart
  • Alexis Palmer, University of North Texas
  • Martí Quixal, Universität Tübingen
  • Vipul Raheja, Grammarly
  • Zahra Rahimi, University of Pittsburgh
  • Lakshmi Ramachandran, Amazon Search
  • Vikram Ramanarayanan, Educational Testing Service
  • Hanumant Redkar, Indian Institute of Technology Bombay
  • Marek Rei, University of Cambridge
  • Robert Reynolds, Brigham Young University
  • Brian Riordan, Educational Testing Service
  • Kat Robb, University of Leeds
  • Andrew Rosenberg, Google
  • Mark Rosenstein, Pearson
  • Alla Rozovskaya, City University of New York
  • C. Anton Rytting, University of Maryland
  • Keisuke Sakaguchi, Allen Institute for Artificial Intelligence
  • Allen Schmaltz, Harvard University
  • Mat Schulze, San Diego State University
  • Burr Settles, Duolingo
  • Serge Sharoff, University of Leeds
  • Swapna Somasundaran, Educational Testing Service
  • Richard Sproat, Google
  • Helmer Strik, Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen
  • Jan Švec, NTIS, University of West Bohemia
  • Anaïs Tack, UCLouvain & KU Leuven
  • Joel Tetreault, Grammarly
  • Yuen-Hsien Tseng, National Taiwan Normal University
  • Giulia Venturi, Institute for Computational Linguistics “A. Zampolli”, Italy
  • Aline Villavicencio, Federal University of Rio Grande do Sul (Brazil) and University of Essex (UK)
  • Carl Vogel, Trinity College Dublin
  • Elena Volodina, University of Gothenburg
  • Shuting Wang, Facebook Inc
  • Xinhao Wang, Educational Testing Service
  • Michael White, The Ohio State University
  • Michael Wojatzki, LTL, University of Duisburg-Essen
  • Magdalena Wolska, Eberhard Karls Universität Tübingen
  • Huichao Xue, LinkedIn
  • Victoria Yaneva, National Board of Medical Examiners / University of Wolverhampton
  • Zheng Yuan, University of Cambridge
  • Marcos Zampieri, University of Wolverhampton (UK)
  • Klaus Zechner, Educational Testing Service
  • Fan Zhang, Google
  • Haoran Zhang, University of Pittsburgh
  • Ramon Ziai, University of Tübingen