Kharlamova, D. (2026). Discovering the Potential of Automated Phraseological Interference Error Detection: A Transformer-Based Approach. Journal of the European Second Language Association, 10(1), 1–16. DOI: https://doi.org/10.22599/jesla.141
Part of Collection: Multiple views on phraseology in second language acquisition
Abstract
Formulaic language may help learners in second language (L2) acquisition. However, interference from the first language (L1) can also cause errors in L2 production. The present paper explores the possibility of detecting L1 Russian interference errors involving phraseologisms in English learner texts with a fine-tuned Transformer-based neural network. In a dataset of 3,600 erroneous sentences drawn from essays in the REALEC corpus, the mistakes were classified into three categories: Synonyms, Copying Expression, and Tense Semantics. The Transformers were trained with the spaCy library and RoBERTa-base. Two versions of the neural network were built, and in both the data was split into a training set and a development set at a 70/30 ratio. The first was a pipeline of three separately trained Transformers, one for each tag. It detected the majority of mistakes but also confused categories and produced false positives. The second was a single Transformer that detected all three types of mistakes. While it missed many of the errors, its predictions were mostly correct. The main conclusion is that, given a sufficiently large training dataset, Transformers can effectively detect L1-motivated phraseological mistakes.
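The 70/30 split and the contrast between the two models (many detections with false positives, versus fewer detections that are mostly correct) can be read in terms of precision and recall. The Python sketch below is illustrative only and is not the author's code: the function names `split_70_30` and `precision_recall`, the fixed shuffle seed, and the sentence-level evaluation over sets of tags are all assumptions, since the abstract does not say how the split was drawn or how predictions were scored.

```python
import random
from typing import List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and split it into training (70%) and development (30%) sets.

    The fixed seed is an assumption for reproducibility; the paper does not
    state how its split was drawn.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def precision_recall(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-sentence tag sets (hypothetical scoring).

    A predicted tag absent from the gold set is a false positive; a gold tag
    the model missed is a false negative. Confusing one category for another
    therefore counts as both.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder sentence IDs standing in for the 3,600 REALEC sentences.
    train, dev = split_70_30(range(3600))
    print(len(train), len(dev))  # 2520 1080
```

In these terms, a model that flags many tags (as the three-Transformer pipeline reportedly did) tends toward higher recall and lower precision, while a more conservative model (as the single Transformer reportedly was) tends toward the opposite.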