The role of Event-Related Potentials (ERPs) as sensitive measures in L2 vocabulary acquisition research

Neurocognitive measures have only scarcely been used in second language (L2) vocabulary research. Traditionally, L2 vocabulary knowledge has been gauged by using off-line measures that allow for conscious thinking and attentional control. Yet, it has been argued that more research is warranted on the role of measures that have the sensitivity to tap into on-line lexical processing. Recording Event-Related Potentials (ERPs) may be an effective technique in order to refine our understanding of L2 vocabulary knowledge. In the current article, we provide a comprehensive review of the relevant literature in order to examine the extent to which ERP research may be valuable to L2 vocabulary research. This review focuses on the potential of ERPs to address the multifaceted nature of vocabulary knowledge. It also examines the role of ERPs to elucidate the neurocognitive mechanisms underlying the incremental nature of L2 vocabulary learning. Finally, this paper discusses the extent to which ERPs might contribute to understanding factors that affect L2 vocabulary learning.


Introduction
Research on second language (L2) vocabulary acquisition has shown that solid lexical knowledge is key to efficient L2 performance in all skill areas (Webb & Nation, 2017). While a learner's vocabulary size has been shown to be a reliable predictor of overall L2 proficiency (Schmitt, 2010), lexical knowledge also involves the degree to which a wide variety of form- (e.g., orthography), meaning- (e.g., associates) and use-related (e.g., collocations) aspects of word knowledge are mastered (Nation, 2013). In addition, lexical knowledge includes the speed at which all of these facets can be retrieved and used during L2 performance (Pellicer-Sánchez, 2015). Furthermore, these form-, meaning-, and use-related aspects are said to develop incrementally as a result of repeated encounters with lexical items (Schmitt, 2010). Taken together, becoming lexically proficient in an L2 implies the ability to process form-, meaning-, and use-related aspects of word knowledge with increasing speed and accuracy.
Numerous studies have measured word knowledge by using tests that allow for conscious efforts to retrieve knowledge, such as paper-and-pencil multiple-choice meaning recognition tests. By contrast, few L2 vocabulary studies have used measures that are not open to learners' conscious control and that, therefore, are said to tap into the word knowledge that underlies online L2 processing (Godfroid, 2020; Pellicer-Sánchez, 2015). Moreover, few L2 vocabulary learning studies have used techniques that are sensitive enough to reveal small knowledge gains (for a discussion, see Schmitt, 2010). Event-Related Potentials (ERPs) obtained through Electro-Encephalography (EEG) may address this gap. First, they can provide a fine-grained insight into the neurocognitive mechanisms that underlie ongoing L2 processing (Elgort et al., 2015). Furthermore, they have the sensitivity to track small amounts of knowledge and provide insights into the early stages of the incremental lexical learning process (Schmitt, 2010, p. 115).
The aim of this review is to provide a comprehensive overview of the existing literature on the value of ERP measures for L2 vocabulary research. By doing so, we aim to contribute to the growing interest in the benefits of sensitive measures for L2 vocabulary research. In the same vein, two recently published comprehensive reviews discuss the role of reaction times (Godfroid, 2020) and eye-tracking (Pellicer-Sánchez & Siyanova-Chanturia, 2018) as sensitive measures in L2 vocabulary research.

ERPs and L2 research
In L2 research, ERPs are used to map the neurocognitive processes underlying online language processing. ERPs are obtained through EEG, that is, the recording of electrical activity in the brain by means of electrodes placed on the scalp (Kaan, 2007). ERP refers to the sequence of positive- and negative-going waveform deflections that are modulated by the onset of a critical stimulus, that is, the event (e.g., the visual or auditory presentation of a word).
ERP components are specific portions of the brainwave that reflect cognitive processes related to the event (Kaan, 2007). The N400 component, for instance, reflects semantic processing (Kutas & Federmeier, 2011). The 'N' means that the component is a negative-going waveform and '400' indicates that the component usually peaks at about 400 milliseconds (ms) post stimulus, within the 300-500 ms time window. The N400 is usually largest at the central and posterior electrodes over both the left and right hemispheres. Hence, ERP components contain several dimensions and can be described with respect to (a) amplitude (i.e., the measured voltage of the oscillation), expressed in microvolts (µV); (b) latency (i.e., the time course during which they occur), expressed in milliseconds (ms); and (c) topography (i.e., the electrode site(s) at which the EEG signal is recorded). Variations in each of these dimensions may call for a functionally different interpretation. For instance, at the first stages of L2 word learning, the latency of the N400 can be delayed in comparison to the canonical latency (e.g., into the 550-850 ms time window), as was the case for the low proficiency learners in Ojima, Nakata, and Kakigi (2005). With respect to topography, newly learnt words can yield N400 effects over frontal electrode sites, instead of over the canonical centro-parietal sites (Elgort et al., 2015). Both the delayed latency and the frontal distribution have been explained by lower proficiency and more effortful processing.
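The amplitude and latency dimensions described above can be made concrete with a minimal sketch. It is purely illustrative: the waveforms are synthetic Gaussian deflections rather than recorded data, and the helper names (`make_waveform`, `mean_amplitude`, `peak_latency`) as well as the 4 ms sampling step are assumptions introduced here, not taken from any of the reviewed studies.

```python
import math

def make_waveform(peak_ms, peak_uv, width_ms=60.0, step_ms=4, end_ms=1000):
    """Synthetic ERP: a single Gaussian deflection (illustrative data only)."""
    times = list(range(0, end_ms + 1, step_ms))
    volts = [peak_uv * math.exp(-((t - peak_ms) ** 2) / (2 * width_ms ** 2))
             for t in times]
    return times, volts

def mean_amplitude(times, volts, start_ms, end_ms):
    """Mean voltage (in µV) over the samples inside [start_ms, end_ms]."""
    window = [v for t, v in zip(times, volts) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

def peak_latency(times, volts, start_ms, end_ms, negative=True):
    """Latency (in ms) of the most negative (or most positive) sample in the window."""
    pairs = [(v, t) for t, v in zip(times, volts) if start_ms <= t <= end_ms]
    best = min(pairs) if negative else max(pairs)
    return best[1]

# A canonical N400-like deflection and a hypothetical delayed one.
canonical_t, canonical_v = make_waveform(peak_ms=400, peak_uv=-5.0)
delayed_t, delayed_v = make_waveform(peak_ms=700, peak_uv=-5.0)

# Peak latency searched over a broad window.
canonical_peak = peak_latency(canonical_t, canonical_v, 200, 900)
delayed_peak = peak_latency(delayed_t, delayed_v, 200, 900)

# Mean amplitude in the canonical 300-500 ms window.
canonical_in_window = mean_amplitude(canonical_t, canonical_v, 300, 500)
delayed_in_window = mean_amplitude(delayed_t, delayed_v, 300, 500)
```

In this sketch, the delayed deflection contributes almost nothing to the canonical 300-500 ms window, which suggests why a fixed analysis window may miss a component whose latency has shifted, as in the delayed N400 described above.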
Finally, an ERP effect refers to any reliable difference with respect to latency, amplitude and/or topography when two or more conditions are compared (Morgan-Short . Figure 1 exemplifies how semantic and syntactic ERP effects were detected in Weber and Lavric (2008). In this example, a group of native English speakers were presented with sentences that were either correct (black waveform), ended in a semantic violation (red waveform), or ended in a morpho-syntactic violation (blue waveform).
While the semantic effect was indexed by the above-discussed N400 component, the morpho-syntactic effect was indexed by the P600 component, that is, a positive-going (P in P600) brainwave peaking between 600-900 ms post stimulus and sensitive to rule-governed (morpho-) syntactic violations (Morgan-Short & Tanner, 2014). As can be seen from Figure 1, the amplitude of the red waveform (semantic violation) was more negative-going than the black waveform (no violation) in the 300-500 ms time window and was indicative of an N400 effect. Additionally, in the 600-900 ms time window, the blue waveform (syntactic violation) was more positive-going than the black waveform (no violation), and was a P600 effect.
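The logic of an ERP effect as a difference between conditions within a time window can be sketched as follows. This is a hedged illustration using simulated single trials, not a reconstruction of the Weber and Lavric (2008) analysis: the component shapes, noise level, trial count and all function names are assumptions made for the example.

```python
import math
import random

def gaussian(t, centre_ms, peak_uv, width_ms=60.0):
    """Value at time t of a Gaussian deflection centred on centre_ms."""
    return peak_uv * math.exp(-((t - centre_ms) ** 2) / (2 * width_ms ** 2))

def simulate_trials(times, components, n_trials=40, noise_uv=2.0, seed=0):
    """Hypothetical single-trial data: fixed components plus Gaussian noise."""
    rng = random.Random(seed)
    return [[sum(gaussian(t, c, p) for c, p in components) + rng.gauss(0.0, noise_uv)
             for t in times]
            for _ in range(n_trials)]

def average_trials(trials):
    """Point-by-point average across trials gives the ERP for one condition."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

def window_mean(times, volts, start_ms, end_ms):
    """Mean voltage over the samples inside [start_ms, end_ms]."""
    window = [v for t, v in zip(times, volts) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

def effect(times, violation, control, start_ms, end_ms):
    """ERP effect: violation minus control mean amplitude in a time window."""
    return (window_mean(times, violation, start_ms, end_ms)
            - window_mean(times, control, start_ms, end_ms))

times = list(range(0, 1001, 4))  # assumed 4 ms sampling step
control = average_trials(simulate_trials(times, [], seed=1))
semantic = average_trials(simulate_trials(times, [(400, -5.0)], seed=2))
syntactic = average_trials(simulate_trials(times, [(750, 5.0)], seed=3))

# Negative difference in 300-500 ms: N400-like effect.
n400_effect = effect(times, semantic, control, 300, 500)
# Positive difference in 600-900 ms: P600-like effect.
p600_effect = effect(times, syntactic, control, 600, 900)
```

Averaging across trials reduces the simulated noise, so the violation-minus-control difference emerges as a negative value in the earlier window and a positive value in the later one, mirroring the pattern described for Figure 1.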
Figure 1: Example of N400 (semantic violation) and P600 (syntactic violation). Positivity plotted upwards. Adapted from "Syntactic anomaly elicits a lexico-semantic (N400) ERP effect in second language but not the first" (Weber & Lavric, 2008).
While the ERP technique has proven to be beneficial for understanding language processing, it also presents some limitations. First, as ERP data collection is sensitive to noise, participants are instructed to relax and to minimize movements and eye blinks during experimental trials. Consequently, insights gained from ERP research usually reflect receptive knowledge obtained through the presentation of visual or auditory stimuli (Morgan-Short & Ullman, 2014). However, ERPs have also been elicited through productive paradigms. Wang, Chen, and Schiller (2019) investigated whether the classifier feature (a feature comparable to grammatical gender) was activated during overt naming of bare nouns in Chinese first-language (L1) speakers. Picture stimuli were presented and participants were required to name as accurately and quickly as possible what they saw in the picture by using a bare noun. Results suggested that lexico-semantic features were activated in bare noun production. Another limitation concerns the fact that ERP studies typically compare averaged brainwaves across trials and individuals. It has been argued that this averaging procedure may obscure the heterogeneity that exists at the individual level. A further limitation of ERPs is that they are informative about the time course of linguistic processes, but not about the brain areas that are most active during those processes. By contrast, neuroimaging techniques such as functional Magnetic Resonance Imaging (fMRI) have an excellent spatial resolution. fMRI reveals blood oxygenation changes in targeted brain structures that are thought to reflect increased or decreased neural activity due to, for example, cognitive processing (Morgan-Short & Ullman, 2014).
Therefore, in order to better understand the functional significance of ERP components, an alignment approach has been suggested (Brouwer & Hoeks, 2013), in which EEG measures first identify neurocognitive processes of interest, before neuroimaging techniques such as fMRI are used to pinpoint the brain areas hosting these processes.
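Returning to the averaging limitation noted earlier in this section, the following minimal sketch shows how a grand average can hide individual-level heterogeneity. The participant labels and effect values are invented for illustration and do not come from any study reviewed here.

```python
from statistics import mean, stdev

# Hypothetical per-participant effects (violation minus control, in µV)
# measured in a single time window; values are illustrative only.
participant_effects = {
    "p01": -3.1, "p02": -2.7, "p03": 2.9,
    "p04": 3.2, "p05": -2.8, "p06": 2.6,
}

values = list(participant_effects.values())
grand_mean = mean(values)   # what a group-level average would report
spread = stdev(values)      # variability across individuals

# Participants grouped by the polarity of their individual effect.
negative_responders = sorted(p for p, e in participant_effects.items() if e < 0)
positive_responders = sorted(p for p, e in participant_effects.items() if e > 0)
```

Here the grand mean is close to zero although every simulated participant shows a sizeable effect, because effects of opposite polarity cancel out in the average.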

The multifaceted nature of word knowledge
The lexico-semantic N400 and the morpho-syntactic P600 are arguably the most studied ERP components in L2 ERP research (Morgan-Short, 2014). In this way, the existing body of ERP studies echoes the traditional vocabulary-grammar dichotomy, that is, item-learning has been associated with vocabulary and rule-learning with grammar (Siyanova-Chanturia et al., 2017). Yet, word knowledge has been conceptualized as a multifaceted construct comprising not only item-based and rule-governed aspects (Godfroid, 2020), but also frequency-based instances such as multi-word units (Siyanova-Chanturia et al., 2017). Consequently, an increasing number of L2 vocabulary studies have measured different aspects of word knowledge (Pellicer-Sánchez, 2015). The most comprehensive model of the multifaceted nature of L2 word knowledge (Figure 2) is Nation's (2013) framework, in which word knowledge entails the receptive and productive mastery of aspects of knowledge related to form (e.g., orthography), meaning (e.g., associates), and use (e.g., grammatical functions). As a result, Nation's framework blurs the lines between components traditionally assigned to either lexis or grammar (Godfroid, 2020).
Taking into account these different aspects, various ERP components may be informative to vocabulary research. With respect to orthography, N170 has been shown to be sensitive to both alphabetic and logographic script processing (Yum et al., 2014), while Mismatch Negativity (MMN, i.e., a negative deflection between 150 and 200 ms) reflects phonological processing in speech perception (Kaan, 2007). On the level of collocational knowledge, P300 may reflect internalized lexical templates, which may be relevant for the study of multi-word units (Siyanova-Chanturia et al., 2017). Further, the (morpho-) syntactic P600 is sometimes accompanied by the (Early) Left Anterior Negativity ((E)LAN). This negative-going component is left and/or anteriorly distributed, can begin at 100 ms (ELAN), and peaks at about 300-500 ms (LAN). (E)LAN has been linked to automatic processing of phrase structure and high proficiency. Apart from morpho-syntax (for an overview of P600, see Swaab et al., 2012, pp. 419-426), P600 is also sensitive to other rule-governed aspects of L2 processing, such as phonotactic rules (Osterhout et al., 2019). Moreover, a semantic P600 effect has been found in L1 speakers and L2 learners (Zheng & Lemhöfer, 2019) for sentences in which the thematic roles of agent and patient were reversed. Finally, P600 may also be a marker of word form recognition. In this context, P600 has been termed Late Positive Complex (LPC) (Perfetti et al., 2005).
This section has revealed that ERP components are sensitive to the diverse aspects that make up word knowledge. The next section therefore deals with how paradigms used in previous L1 and L2 ERP research can address the form-, meaning-, and use-related aspects of word knowledge described in Nation's framework.

Form-related aspects
Nation (2013) distinguishes between knowledge about the written form of the word, the spoken form of the word, and knowledge about word parts. With respect to the written word form, research has revealed that P600/LPC is a marker of recognition of recently learned word forms. Perfetti et al. (2005) and Balass et al. (2010) investigated whether rare English words learned during treatment would signal a recognition effect in comparison with either unlearned rare English words or familiar words that were not recently presented. English native-speaking high- and less-skilled readers were presented with word pairs and were asked to judge the meaning relatedness of the words after presentation of the second word. Importantly, ERPs were also recorded upon presentation of the first word (i.e., the newly-learnt rare word). Results in both studies indicated that newly-learnt words yielded more positive amplitudes for P600/LPC than unlearned rare words and familiar words that had not been encountered recently. However, the ability to distinguish between recently learned words and other words was only found in the high-skilled group.
Yum et al. (2014) investigated the first stages of vocabulary acquisition in L1 English learners of L2 Chinese. Hence, learners were confronted with two different orthographic systems (i.e., the L1 alphabetic and the L2 logographic system). The Chinese logographic system uses characters as graphic units that represent meaning-bearing syllables instead of phonemes. One of the targeted components was the script-sensitive N170. It was found that fast and slow learners showed differences with respect to N170 latency and topography, in that fast learners showed a left-lateralized N170 sensitivity, whereas slow learners showed a right-lateralized sensitivity. The authors tentatively argued that fast learners were able to develop a structural representation of Chinese words, that is, they relied on the efficient detection of relevant word parts and the spatial relations between them. By contrast, slow learners were believed to use the entire information contained in the Chinese word, which resulted in qualitatively different word processing. It was concluded that fast learners were less affected by script complexity and word length.
Finally, on the level of knowledge about word parts (for an overview of morphological processing in the brain, see Leminen et al., 2019), McKinnon et al. (2003) studied whether morphologically complex words would be processed as unanalysed wholes or as the combination of their constituent parts. It was hypothesized that words with bound stems, that is, stems (-ceive) that must be attached to another morpheme (con-) in order to become meaningful (the meaning of conceive is not predictable from the combination of con- and -ceive), were stored and processed as unanalysed wholes rather than as complex morphemes. Morphologically complex words consisting of a prefix and a bound stem (retain, intrude) were compared to pseudo-words consisting of the same prefixes and bound stems (*intain, *retrude) and pseudo-words containing no morphemes (*flermuf). If L1 English readers treated pseudo-words as unanalysed wholes, then pseudo-words would elicit larger N400s than real words. Conversely, if words and pseudo-words were decomposed into their constituent morphemes, then real words and complex pseudo-words might elicit similar N400 amplitudes. No N400 differences were found between the words and the morphologically complex pseudo-words. In contrast, the unanalysable pseudo-words yielded large N400s, suggesting that decomposition is indeed a processing mechanism for morphologically complex words.

Meaning-related aspects
With respect to meaning, Nation's (2013) framework entails knowledge about the form-meaning link, concepts and referents, and associational knowledge. Most ERP research on meaning-related knowledge has focused on the N400 component (for an in-depth review, see Kutas & Federmeier, 2011). N400 is said to reflect processing difficulty related to semantic integration of a stimulus in the preceding context (e.g., a sentence or another word) but has also been shown to be sensitive to other features, such as word frequency, concreteness, plausibility, expectancy, lexical status, lexical neighbourhood size, and meaning associates (Kutas & Federmeier, 2011). Previous L2 research has elicited N400 through different types of stimuli such as sentences (Bowden et al., 2013), word pairs (McLaughlin et al., 2004), L1/L2 equivalents (Guo et al., 2012), pictures (Ojima et al., 2011), and the out-of-context presentation of words or letter strings (Laszlo & Federmeier, 2011).
In an L2 word learning study, Elgort et al. (2015) investigated whether reading contexts were amenable to learning L2 English rare words in low and high proficiency learners. The newly learnt words were used in sentence-final position and served as primes for a subsequently presented related or unrelated word (e.g., … arguments from both sides were so COGENT, followed by the related probe convincing). ERPs were calculated twice: the first time upon presentation of the sentence-final critical stimulus and the second time upon presentation of the (un)related meaning probe. For the first ERP measurement, results showed a non-canonical frontal N400 effect in both proficiency groups, which was taken to index effortful meaning processing while reading sentences. In the semantic relatedness test, however, the high proficiency learners showed a solid canonical centro-parietal N400 effect. The authors therefore concluded that the ability to learn word meanings from context was contingent upon L2 proficiency.
Proficiency effects and N400 were also the focus of studies that compared L2 learners with native speakers. Ojima et al. (2005) compared neural processing in L1 and in L2 learned after childhood. To this end, both semantic and syntactic anomalies were compared with correct sentences in three groups: (a) high and (b) intermediate proficiency adult Japanese learners of L2 English, and (c) English natives. With respect to the semantic aspect, critical words were embedded in sentence contexts (e.g., This house has ten *CITIES in total). Although large N400 amplitudes were detected in all groups, the latency of the effect was delayed (550-850 ms) in the low proficiency group. These results suggested that semantic processing in early stages of L2 learning is already robust as indexed by amplitude but requires more processing time in comparison with high proficiency learners and natives. It was concluded that with increasing proficiency, L1 and L2 semantics are processed in a similar way. Similarly, Bowden et al. (2013) investigated semantic and syntactic processing differences between L2 Spanish learners with immersion experience abroad, native Spanish speakers, and low proficiency L2 Spanish learners. For the semantic aspect of the study, critical words were placed in sentence-final position in order to allow for semantic build-up (e.g., La profesora espera ir en autobus a la *SEMANA. 'The professor hopes to go by the *WEEK'). In all groups, N400 was detected in the canonical 300-500 ms time window. It was concluded that the neurocognitive mechanisms that guided L1 and L2 semantic processing were very similar, irrespective of previous language experience. Laszlo and Federmeier (2011) investigated whether and to what extent the presentation of a written input would elicit N400 in the absence of a preceding context. 
More specifically, it was investigated whether semantic activation of a written input either followed or happened in parallel with word recognition in L1 English speakers. In staged models of word processing, recognition (i.e., comparing orthographic inputs to internal representations of items in the mental lexicon) is hypothesized to take place before semantic access. If this is true, N400 responses could only be yielded by items with a lexical representation. However, the authors argued that previous research had shown that N400 could be elicited by items without lexical representation, such as pseudo-words and illegal letter strings. Critical items in this study were words, pseudo-words, acronyms, and illegal letter strings. The item features that were manipulated had proven to be conducive to N400 elicitation (i.e., the number of orthographic neighbours and lexical associates, as well as the frequency of the lexical neighbours and associates). N400 effects were found for all manipulated features in all stimuli types, although with varying amplitudes. These results suggested that stimulus recognition and semantic access take place in parallel, not in a serial way. Moreover, all types of orthographic input seem to attempt lexical access, even in the absence of context or a preceding word. A similar out-of-context word presentation paradigm was used in Soskey et al.'s (2016) longitudinal L2 word learning study. L1 English learners of L2 Spanish were followed over the course of a trimester in order to investigate whether and to what extent the neural mechanisms involved in L2 word processing would evolve over time. In three experimental sessions spread over one trimester, a lexical decision task (LDT) was used in order to compare N400 responses on L1 words and L2 words that were taught in the curriculum. In line with previous findings, the N400 amplitude in L2 was reduced when compared to L1 during the first experimental session. 
However, this difference declined over the course of L2 learning. It was concluded that in the course of one semester, with increasing proficiency, the processing of newly learnt L2 words evolves towards the L1 pattern. The two previous studies point to the usefulness of N400 for both word recognition and word meaning. As such, word-learning studies could be designed in such a way that they investigate to what extent newly learnt words elicit P600/LPC, which would be indicative of word form recognition only, and/or N400, which would also indicate the degree of facilitated access to meaning.

Use-related aspects
On the level of use, Nation's (2013) framework distinguishes between grammatical functions (i.e., the patterns in which a word is used), collocations, usage constraints, register and frequency. With respect to grammatical functions, an ample body of studies using sentence violation paradigms has shown that ERPs are sensitive to morpho-syntactic aspects. Some of these aspects (for critical reviews about morpho-syntactic processing in the brain, see Kaan, 2009; Kotz, 2009; and Steinhauer & Drury, 2012) relate to the grammatical patterns in which a word occurs. Examples include agreement violations between article and noun (e.g., A *BOOKS are on the table; Tanner et al., 2013), erroneous inflections (Davidson & Indefrey, 2009) and word-class violations (e.g., The man in the *DRINKS a coffee; Rossi et al., 2006). Morgan-Short, Steinhauer et al. (2012) studied the impact of implicit and explicit instruction on the learning of an artificial language called Brocanto2, consisting of 13 pseudo-words (1 article, 4 nouns, 2 adjectives, 4 verbs and 2 adverbs). Word order in noun phrases is based on the word category (i.e., noun-[adjective]-determiner). In the explicit condition, participants were exposed to metalinguistic input structured around word categories, followed by examples and practice. Sentences were manipulated by violating the expected grammatical pattern. Among other results, explicit metalinguistic training yielded P600 in highly proficient learners. Yet, the highly proficient learners in the implicitly-trained group showed a processing pattern that was typical of native speakers (i.e., an anterior negativity (AN) followed by a P600 and a late AN).
With respect to collocations, Siyanova-Chanturia et al. (2017) investigated the ERP signature of the processing of L1 multi-word units (i.e., the co-occurrence of words in specific linguistic configurations (p. 111)). Due to their prevalence in everyday language use, predictability is a main characteristic of multi-word units. Previous studies on predictability and idioms had focused on the N400 component and found a smaller N400 for idiomatic phrases, when compared to literal, nonsensical and violated phrases. This was considered a marker of the processing ease of formulaic language. However, previous studies (cited in Siyanova-Chanturia et al., 2017) also demonstrated that a P300 component, linked to expectancy confirmation and template matching, could be elicited in the context of idiomatic sentence completions (e.g., en cuerpo y ALMA, 'in body and SOUL') and highly constraining contexts (e.g., the opposite of black is … WHITE). Siyanova-Chanturia and colleagues ran two experiments. In experiment 1, participants were presented with three stimulus types: (a) frequent binomial expressions (e.g., knife and fork), (b) infrequent but associated nouns (e.g., spoon and fork) and (c) non-associated semantic violations (e.g., theme and fork). It was hypothesized that the conjunction and would activate the lexical template in the first stimulus type and elicit a P300 response. In experiment 2, identical stimuli were used, except for the conjunction and. Hence, stimulus types were: (a) knife-fork, (b) spoon-fork and (c) theme-fork. It was hypothesized that in the absence of the conjunction 'and', the template and the concomitant P300 would not be activated. Results confirmed the hypotheses, in that P300 was elicited in experiment 1 (idiomatic phrases with the conjunction and) but absent in experiment 2 (word pairs consisting of the same nouns, but without the conjunction and). It was concluded that P300 was a marker for template matching in idiomatic language use.
Register and discourse types may also modulate L2 processing. Berger and Coch (2010) compared semantic processing in texted messages and standard English. Texted English differs from standard language, in that it is informal and a hybrid form of spoken and written discourse. Moreover, texted English contains abbreviations, misspellings, acronyms and symbols or digits that may represent syllables. Further, aspects such as subject pronouns and punctuation are regularly omitted. Therefore, as texted and standard English have a distinct lexicon and syntax, it has been argued that they may be considered as separate languages (Berger & Coch, 2010, p. 136). It was hypothesized that texted English sentences could yield N400 responses when compared to standard English. In line with the hypotheses, semantic anomalies in texted English showed a delayed N400 and an extended duration into the 500-700 ms time window, which pointed to the sensitivity to register.
Finally, ERP research has also addressed frequency effects. In an L1 ERP study, Perfetti et al. (2005) created word pairs in such a way that the first word was either a trained rare word, an untrained rare word, or an untrained familiar word. Participants were required to make relatedness judgements. Comparison of the ERPs of the second word in either related or unrelated word pairs showed, amongst other results, an N400 effect for trained rare words and untrained familiar words, not for the untrained rare words. These results were explained as a learning effect for the trained rare words and a frequency effect for the untrained familiar words.
Taken together, the results presented in this section show that ERP components (e.g., N170, P300, N400, P600 and LPC), can address form-, meaning-, and use-related aspects of Nation's framework. In the next section, it will be discussed that ERPs may also be indicative of the degree to which these aspects of word knowledge are mastered.

The incremental nature of word knowledge
L2 vocabulary research has shown that not all aspects of word knowledge are acquired simultaneously. Initial exposure to a lexical item may instantly leave a trace of a word's written or phonological form, while use-related aspects may be the last aspects to be mastered (Schmitt, 2010). Moreover, word knowledge is attrition-prone and needs recurrent encounters in order to be consolidated (Webb & Nation, 2017). In this section, we argue that ERP measures may shed light on how word knowledge arises and develops over time. In this respect, Borovski et al. (2012) claim that ERPs have the potential of "assessing more subtle 'in progress' aspects of word learning" (p. 280).
Some studies revealed that neural signatures of lexical learning in L1 were quickly detectable after instruction. Borovski et al. (2010) and Borovski et al. (2012) investigated how fast meanings of artificial words could be extracted from linguistic contexts in highly (e.g., He tried to put the pieces of the broken plate back together with GLUE) and low constraining sentences (e.g., She walked across the room to Mike's messy desk to return his GLUE). Both studies found that novel words used in highly constraining sentences needed only a single exposure in order to reduce the N400. Similarly, Mestres-Missé et al. (2007) used pseudowords to investigate how much exposure to newly learnt words was needed in order to detect first traces of learning. Learning consisted in one-word-at-a-time self-paced reading of sentence triplets in two conditions; that is, sentences that allowed for meaning derivation (M+), and sentences in which meaning derivation was not possible (M-). In a relatedness judgement task, it was found that three M+ exposures sufficed to yield N400 effects that were identical to known words. Participants in Perfetti et al. (2005) and Balass et al. (2010) did not derive word meanings from contexts but were required to study L1 definitions of artificial words. A relatedness judgement task showed that studying definitions was conducive to meaning integration processes immediately after training.
While the aforementioned L1 learning studies showed that integration of novel words is possible immediately after learning, other studies investigated the effects of a post-learning consolidation period. Consolidation refers to the process during which words represented as episodic memory traces in the hippocampal memory system are integrated in the more stable neocortical memory system (Davis & Gaskell, 2009), whereby sleep seems to play an important role (Tamminen et al., 2013). Bakker et al. (2015) investigated whether a 24-hour period would lead to the consolidation of newly learnt pseudo-words. One of the targeted components was N400. A reduction of the N400 difference between existing L1 English words and pseudo-words that were learnt 24 hours before EEG recording, when compared to pseudo-words learnt just before EEG recording, would be indicative of lexical consolidation. One critical item set was learned on day 1, the other set on day 2. ERPs and reaction times (RT) were recorded during a relatedness task on day 2, so that half of the items had undergone a 24-hour consolidation period. For the set of items learned on day 1, a reduced N400 response was observed when compared to items learned on day 2. The authors concluded that a 24-hour consolidation period had led to deeper lexicalization. Likewise, Havas et al. (2017) tested the effects of semantic and morphological learning one day after the learning phase. L1 Spanish or Catalan participants learned the meaning of pseudo-words on day 1. Without the knowledge of the participants, target words contained not only a stem (i.e., semantic learning) but also a gender-marking suffix (i.e., morphological learning). For instance, participants saw target pseudo-words referring to animals, paired with a picture of that animal dressed as a man or dressed as a woman. In the recognition task on day 2, participants were presented with pictures and letter strings, including the newly learned pseudo-words. 
Stimuli were created in such a fashion that (a) both stem and suffix matched the picture, (b) only the stem matched, (c) only the suffix matched, or (d) neither morpheme matched. Lexical violations, indexed by stem violations, yielded enhanced N400 responses, and morphological violations, indexed by suffix violations, elicited P600 responses. It was concluded that one learning cycle followed by a 24-hour consolidation period sufficed for lexical and morphological learning.
Knowledge increments have also been studied from a longitudinal perspective. In their hallmark L2 vocabulary learning study, McLaughlin et al. (2004) sought to determine how much L2 exposure was needed before brain responses would show learning effects. First-year L1 English learners of L2 French were presented with an LDT consisting of prime-target pairs of letter strings. Pairs were L2 French related words, unrelated words or pairs with a pseudo-word target. Learners were tested three times: after 14 hours, 60 hours and 140 hours of instruction. In the first test session, unrelated word pairs did not show an N400 effect, but pseudo-words elicited larger N400s than real words, although the participants' behavioural performance was at chance level. The authors interpreted this finding as evidence of the participants' ability to distinguish between existing and non-existent L2 word forms after 14 hours of L2 instruction. For unrelated word pairs, N400s were observed after 60 hours of instruction. In the last ERP session, N400s on pseudo-words and unrelated word pairs patterned with the typical L1 profile, even though behavioural performance remained near-chance. The authors concluded that L2 form-related word knowledge accrues with remarkable speed and is established before meaning knowledge. Moreover, this study showed that behavioural results might underestimate the learning that takes place on the neurocognitive level.
In a study carried out over the course of three years, Ojima et al. (2011) investigated the development of L2 word knowledge in Japanese children. Through a semantic relatedness paradigm, it was found that the developmental changes indexed by the N400 followed a trajectory identical to N400 changes in L1. While no N400-like activity was detectable at the initial stage of word learning, the final stage was characterized by an N400 followed by an LPC, a pattern usually found in high-proficiency learners and indicative of qualitatively L1-like processing. Osterhout et al. (2019) investigated how first-year adult learners of L2 Finnish and native speakers of Finnish processed L2 letter strings that either obeyed or violated the Finnish vowel harmony system. In a visual LDT, participants were presented with real Finnish words (tuoli, 'chair'), pseudo-words that followed the phonotactic vowel harmony rule (*louti) and pseudo-words violating the vowel harmony system (*tyoli). Native speakers showed a P600 effect for pseudo-words that violated the vowel harmony system, indicating that native-like phonotactic processing was indexed by the P600. L2 Finnish immersion students were tested three times throughout a period of nine months. It was found that in early stages of learning, violations caused an N400 effect, which was explained as a pseudoword effect. However, near the end of the immersion period, vowel harmony violations elicited a robust P600 effect, similar to that of native speakers. It was concluded that the shift from N400 to P600 reflected a gradual process towards native-like rule processing. Remarkably, a subset of participants was tested more than nine months after instruction, and the results again showed the N400 effect, which was interpreted as attrition of the vowel harmony rule.
A developmental trajectory has also been proposed for morpho-syntactic features. A number of studies addressed the question of whether learners could achieve native-like neural processing (Bowden et al., 2013; Morgan-Short, Steinhauer et al., 2012) and whether ERP patterns changed over the course of L2 acquisition. Findings in those studies have suggested a developmental model consisting of discrete phases (Steinhauer et al., 2009). In early stages or at low proficiency, violations are indexed by an N400, which suggests that the morpho-syntactic anomaly is perceived as a lexical problem. This mechanism may reflect a semantic compensatory processing strategy or the interference of explicit rule knowledge. It may indicate that new linguistic rule-based items are first processed as unanalysed forms. A small and/or delayed P600 component emerges as grammaticalization of new forms begins. At full proficiency, L2 learners show the biphasic pattern that characterizes native processing (i.e., (E)LAN followed by P600).
Recently, the generalizability of this developmental model has been questioned, as P600s were observed in relatively low-proficiency L2 learner groups, but not in relatively proficient L2 learners (for an overview, see Tanner et al., 2014). Similarly, in contrast to the predictions of the developmental model, N400s have been reported in high-proficiency groups. For these reasons, it has been argued that learners show a preference for either an N400 or a P600 processing stream, in that N400-dominant individuals might rely more heavily on memory-based heuristics while P600-dominant individuals might preferentially rely on combinatorial processing.
Taken together, previous investigations have shown that ERP research may be an effective technique for investigating the incremental nature of vocabulary knowledge. First, ERPs have the capability of indicating different strengths of knowledge representation, and second, ERPs can show patterns that reflect developmental stages of L2 learning.

ERP research and L2 instruction
Previous studies demonstrated that ERPs may be sensitive to variables related to L2 instruction, such as proficiency (Elgort et al., 2015), individual differences (Tanner et al., 2013), immersion (Bowden et al., 2013) and the mid- and long-term effects of L2 classroom instruction (Soskey et al., 2016). Yet, while it is widely accepted in L2 instruction that an explicit focus on grammatical and lexical items yields the best learning effects (Spada & Tomita, 2010; Webb & Nation, 2017), remarkably few ERP studies sought to shed light on knowledge that has been acquired through either explicit or implicit instruction (i.e., the presence or absence of an attentional focus on L2 features). Batterink and Neville (2011) conducted a study in which pseudo-words were either contextually embedded in a reading text (implicit) or learned through rote memorization (explicit). N400s recorded during an LDT showed that contextual embeddedness led to more robust lexical representations of new items. Conversely, in a 9-week vocabulary study, Chun et al. (2012) compared L2 English vocabulary learning in adult L1 Korean speakers through extensive reading and paired-associate learning. In this study, behavioural and neurocognitive N400 findings converged and pointed towards the superiority of paired-associate learning for long-term vocabulary retention. Yet, in a study on learning L2 English words in reading passages or through word lists, Choi et al. (2014) found no ERP evidence of learning, although word-list learning yielded better behavioural results.
Some L2 laboratory studies also compared implicit and explicit training on the morpho-syntactic level. Batterink and Neville (2013) immersed two groups of participants, without previous experience with L2 French, for one hour in a miniature version of the target language. The group with implicit instruction was asked to try to understand L2 French sentences without additional metalinguistic information. In contrast, the group with explicit instruction was exposed to the same sentences but also learned the rules related to article-noun agreement, subject-verb agreement and subject-verb-object word order. On the behavioural level, the explicitly trained group outperformed the implicit group. However, in both conditions, only learners who were capable of identifying violated sentences (i.e., article-noun, subject-verb and word order violations) on the behavioural level showed P600 effects, indicating that the P600 correlated with behavioural proficiency. Morgan-Short, Steinhauer et al. (2012) studied the effects of implicit (i.e., approximating immersion settings) versus explicit training (i.e., approximating traditional L2 classroom settings) of the artificial language Brocanto2. Participants were administered three training sessions followed by a behavioural and ERP test session at the end of the third session. Stimuli were sentences that either followed or violated the Brocanto2 word order rules that are based on word category. Results showed that neither group outperformed the other on the behavioural level. However, ERPs showed that implicit learning led to developing an N400 in low-proficiency learners and a native-like biphasic AN-P600 pattern in high-proficiency learners. In contrast, in the explicit condition, low-proficiency learners did not show any significant ERP pattern and high-proficiency learners showed a late anterior positivity followed by a P600, pointing to the development of morpho-syntactic knowledge, albeit without the native-like AN-P600 pattern.
In a follow-up experiment, in which 21 participants were tested again after 3 to 6 months without exposure (Morgan-Short, Finger et al., 2012), it was found that behavioural performance had remained stable. On the neurological level, however, findings suggested that periods with no L2 exposure could have beneficial effects, in that increased native-like ERP responses were found for both implicit and explicit instruction. Findings in the studies discussed above show diverging outcomes with respect to the ERP signature of implicit and explicit instruction. One explanation for the discrepancy between ERP and behavioural results is that ERPs may only reflect a subset of the cognitive processes that contribute to L2 processing (Kutas & Federmeier, 2011). Another explanation is that ERPs may be markers of very early knowledge development that has not yet reached the threshold for behavioural detection (e.g., McLaughlin et al., 2004).
In order to efficiently address L2 pedagogy-driven research questions, more interaction between neurolinguists, SLA experts and language teachers has been advocated (Morgan-Short & Ullman, 2014; Rastelli, 2018). Ullman and Lovelett (2018), for instance, suggest how the declarative/procedural model can be used to empirically test the impact of L2 instruction in the context of both grammar and vocabulary. The model postulates two different but interacting memory systems (i.e., a declarative and a procedural memory system). In both L1 and L2, the declarative system is said to underlie knowledge available to conscious awareness and is thought to represent non-derivable information (e.g., form-meaning mappings, sound-meaning mappings, irregular morphology, idiomatic knowledge). The procedural system hosts knowledge that is not available to conscious awareness and subserves cognitive aspects such as categories, sequences and rules. In both L1 and L2, all types of knowledge are assumed to be learnt first in declarative memory, while, in parallel, the procedural system gradually acquires knowledge related to regularities, sequences and categories. Importantly, both systems independently acquire knowledge that may be analogous to some extent. It is assumed that learning and consolidation in the procedural memory system may peak during childhood and then decline, while declarative memory develops during childhood and improves until early adulthood. Therefore, L1 acquisition and L2 learning through instruction may use the two memory systems differently, in that instructed L2 learners may show more reliance on their declarative system, especially at the initial stages of learning.
Consequently, in order to optimize vocabulary learning and retention in the declarative system, investigating the neural underpinnings of, for example, spaced repetition (providing temporal gaps between repeated encounters with identical items) and retrieval practice (retrieving learned information from memory) may be a valuable research avenue.
The previous paragraphs have shown that variables related to L2 instruction, such as the role of explicit attention and proficiency, may actually have an impact on L2 neurocognitive processing. However, due to mixed patterns in the findings, it is not yet clear whether and how instruction and neurocognitive processing may correlate. Therefore, it has been suggested that future ERP research might also adopt an explanatory approach and focus on a deeper understanding of the observed processes (Morgan-Short & Ullman, 2014). Likewise, Brouwer and Hoeks (2013) state that the functional interpretation of language-related ERP components is not yet agreed upon. They therefore advocate a combination with other neuroimaging techniques such as fMRI, which could provide information about the brain areas that underlie the processes detected by ERPs. Additionally, complementary insights into learning processes may be gained through recent advances in technology such as portable EEG systems that allow for exploring real-world classroom dynamics (Dikker et al., 2017).

Conclusion
The studies reviewed in this article show that ERPs may be a valuable sensitive measure in L2 vocabulary research for two main reasons. First, ERPs may address form-, meaning-, and use-related aspects of word knowledge. Second, ERPs may refine our understanding of the incremental nature of lexical knowledge. Furthermore, it has been argued that future ERP research might be guided by L2 pedagogy-driven research questions and focus on a more thorough understanding of the processes involved in L2 word processing. In sum, the present review adds to the growing interest of L2 vocabulary researchers in sensitive measurement techniques, which are expected to inform further research into L2 vocabulary learning and to refine our understanding of it.

Funding information
This work was funded by the Research Foundation - Flanders (FWO), grant G064116N.