Mental simulation of object orientation and size: A conceptual replication with second language learners

Previous research suggests that native (L1) speakers employ “mental simulations” for language comprehension. Empirical work shows that intrinsic object properties (shape, size and color) are indeed simulated, but the evidence for extrinsic properties (orientation) is less convincing. There is little work on simulation in second language (L2) learners, but since they have perceptual experiences similar to those of L1 speakers, there is good reason to think that L2 learners, too, use simulation to comprehend L2 sentences. This paper aims to conceptually replicate previous simulation studies of object size and orientation with L2 learners (N = 223) and two L1 speaker control groups (N = 64). An important difference from previous work is that we use language-specific forms indicating size (Spanish augmentative suffixes) and orientation (German placement verbs). We expected that these language-specific forms would lead to simulation of both the intrinsic and the extrinsic property under investigation. We employed a sentence-picture verification task and analyzed Yes/No responses and reaction times (RTs). RT results on match and mismatch trials reveal no orientation effect but a size match effect. These findings are consistent with previous null results for orientation and add support for size simulation. We suggest that future studies examine whether L2 learners simulate both implied and explicitly marked properties, whether they simulate with or without prior language instruction, and whether they also simulate shape and color.


Introduction
Successful second language (L2) learners can comprehend written text in their L2. Yet how is this accomplished? Until about fifteen years ago, the mainstream view was that the human mind handles language as a computer does: it combines abstract, amodal and arbitrary symbols (i.e., words) by means of syntactic rules (e.g., Burgess & Lund, 1997; Chomsky, 1980; Fodor, 2000; Kintsch, 1988; Pinker, 1994). The main problem with this conceptualization of cognition is that the symbols have no connection to actual experience. A classic illustration of this problem is a variant of the "Chinese Room" argument. Suppose a foreign visitor lands at a Chinese airport without knowing the local language, but carrying a monolingual Chinese dictionary. When trying to interpret the airport signs, the traveler becomes stuck in an endless loop of abstract symbols, as every definition in the dictionary refers only to other symbols. This has been referred to as the "symbol grounding problem" (cf. Harnad, 1990). In recent years, "grounded cognition" theory has proposed another perspective on cognition: human thought and language are shaped by our bodily actions and grounded in our perceptual experiences with the world (Barsalou, 1999a, 2008; Glenberg, 1997; Glenberg & Robertson, 1999, 2000; Lakoff, 1987).
From a grounded cognition perspective, Barsalou (1999a, 1999b) suggested that language comprehension is driven by so-called "mental simulations". For example, when reading the word cup, the human conceptual system construes the perceived word as an instance of the physical object. To accomplish this, the conceptual system binds the token in perception (i.e., the word cup) to knowledge about general types in memory (i.e., concepts) (Barsalou, 1999b). This process involves a reactivation of neural states that carry information about our experiences with cups in the real world (e.g., their shape, size, color and position) and is referred to as mental simulation. Empirical work has supported the idea that native (L1) speakers simulate intrinsic object properties, such as shape (Zwaan, Stanfield & Yaxley, 2002; Zwaan & Pecher, 2012), color (Connell, 2005, 2007; Zwaan & Pecher, 2012; Hoeben Mannaert, Dijkstra & Zwaan, 2017) and size (Koning et al., 2016; Koning et al., 2017). Empirical results for orientation, an extrinsic property, either support simulation (Stanfield & Zwaan, 2001; Zwaan & Pecher, 2012) or do not (Rommers et al., 2013; Koning et al., 2017). In these studies, participants read sentences such as "She looked at the bone of a dinosaur". They then see a bone that matches (large) or mismatches (small) the object size implied by the sentence (Koning et al., 2016). They are asked whether the depicted object was mentioned in the preceding sentence (see Figure 1). It is argued that faster reaction times (RTs) to matching pictures support the notion that speakers have mentally simulated object size during sentence comprehension. In other words, comparing a picture with a mental simulation that matches it takes less time than comparing it with a simulation that mismatches it.
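The match-effect logic can be made concrete with a minimal sketch. The function name `match_advantage` and the RT values below are hypothetical and purely illustrative; they are not taken from any of the studies cited. A positive difference, where mismatch trials are slower than match trials, is the pattern that is read as support for simulation. In practice, such differences are tested statistically across participants and items rather than inspected for a single person.

```python
from statistics import mean


def match_advantage(match_rts, mismatch_rts):
    """Return mean mismatch RT minus mean match RT, in milliseconds.

    A positive value means match trials were verified faster, which is
    the pattern that simulation accounts predict.
    """
    return mean(mismatch_rts) - mean(match_rts)


# Hypothetical RTs (ms) for one participant; illustrative values only.
match_rts = [650, 700, 720, 690]
mismatch_rts = [710, 760, 740, 750]

print(match_advantage(match_rts, mismatch_rts))  # 50
```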
As L2 learners have perceptual experiences with objects similar to those of L1 speakers, there is good reason to think that L2 learners also use mental simulation to comprehend L2 sentences. Yet, to our knowledge, only two studies so far have addressed simulation in L2 learners. Vukovic and Williams (2014) found that advanced Dutch learners of L2 English simulate L1 meanings of interlingual homophones while comprehending L2 English. For example, participants heard "On the plate in front of you/at the far end of the table, you can see a bone". They then saw a bean (Dutch boon /bo:n/) that varied in size (large/small), so that it matched or mismatched the distance implied by the different sentence introductions. Participants were slower to reject critical items (e.g., a bean instead of a bone) when perceptual features matched the implied distance. This suggests that L2 learners activated task-irrelevant meanings of interlingual homophones, and that mental simulation in the L1 may take place during L2 processing. Using a variation on the sentence-picture verification (SPV) task from previous work, Tomczak and Ewert (2015) studied how advanced Polish learners of L2 English process sentences describing fictive and real motion. They presented participants with prime words (e.g., a verb indicating horizontal or vertical motion) that matched or mismatched a subsequent Polish or English sentence. They asked participants to make meaning judgments about these sentences and recorded Yes/No answers and RTs. They found longer RTs for fictive motion trials than for real motion trials in both Polish and English, and interpreted this result as evidence of mental simulation of motion in L2 learners.
The aim of this study is to examine whether L2 learners simulate object orientation and size, by conceptually replicating previous studies with L1 speakers. Conceptual replications test the underlying hypothesis of the original study by using a different method or measure (Leow, 1995) and are needed to validate and extend previous findings (Marsden et al., 2018). Our hypothesis is that, like L1 speakers, L2 learners simulate object orientation and size during sentence comprehension. As in previous studies (e.g., Stanfield & Zwaan, 2001), we employ an SPV task. Unlike previous studies, we do not use sentences in which properties are implied by the context (e.g., "The carpenter hits the nail into the floor", implied orientation of the nail: vertical). Instead, we use sentences with linguistic forms that explicitly indicate a property. For object orientation, these are the German verbs legen/stellen [lay/stand], which indicate the end position of an object being placed (Berthele, 2012). We expect that, contrary to previous null results for implied orientation, these explicit verbs lead to univocal simulations of object orientation. To examine object size, we employ Spanish augmentative suffixes, which indicate large object size (Gooch, 1967). We expect that these suffixes lead to univocal simulations of object size, comparable to previous results for implied size. Another important difference from previous studies with L1 speakers is that the L2 learners in this study had metalinguistic knowledge of the German verbs and Spanish augmentative suffixes. The learners were instructed on these forms, as it was essential to ensure their comprehension before they began the SPV task.

What Affects Simulations?
The claim that readers activate sensory and motor information while comprehending language is supported by a growing number of studies (Barsalou, 2008; Pulvermüller, 2013; Lupyan & Bergen, 2015). Moreover, in a meta-analysis, Kiefer and Pulvermüller (2012) show evidence that action and perception circuits in the brain, which contribute to comprehension, are interdependent. Tomasello, Garagnani, Wennekers and Pulvermüller (2017) showed with a neurocomputational model that the activation of such areas is probably fast and near-simultaneous. Koning and Schoot (2013) point out that much empirical work has focused on the visual modality, although mental representations should involve information from all sensory modalities. Recently, individual differences in simulation have gained attention. Vukovic and Williams (2015) examined whether their participants preferred egocentric or allocentric reference frames. In an egocentric frame, one represents the location of objects in space relative to one's own body axes (left-right, front-back, up-down), whereas an allocentric frame encodes the location of one object with respect to other objects. In one of their experiments, using the SPV task, they found a match effect for simulation of linguistically marked perspective only in the egocentric group. Simulation of multiple properties within subjects has also been examined recently. Koning et al. (2017) applied a within-subjects design to the SPV task and investigated whether the same participants simulated color, shape, size and orientation. Match effects were strongest for color, followed by shape and then size, and there was no effect for orientation (the latter result aligns with Rommers et al., 2013).
Importantly, several studies have started to unravel which components of an utterance drive simulation. Research has identified a critical role for lexical items (nouns and verbs) and for sentential context. Regarding sentential context, several authors have theorized a prominent role for grammar (Kaschak & Glenberg, 2000; Feldman, 2006). Bergen and Wheeler (2010), for example, argue that grammatical aspect affects mental simulation. Using the Action-sentence Compatibility Effect (ACE) methodology (Glenberg & Kaschak, 2002), they found that progressive sentences about hand motion facilitate manual actions in the same direction, whereas perfect sentences do not. It is well established that lexical items become active during language processing (Colunga & Smith, 2005; Pulvermüller, 2013). Sato et al. (2013) examined the effect of motion verbs on simulations of object shape. In contrast with head-initial languages like English, head-final languages like Japanese typically place verbs in sentence-final position. The authors found that Japanese speakers initiate mental simulations early in a sentence on the basis of semantic context. However, they rapidly modified these representations if the prior context was followed by a motion verb that implied a certain object shape (e.g., The kimono was torn/hung to dry). This shows that speakers rapidly update simulations during sentence comprehension. In the current study, we look at German placement verbs that indicate object orientation and Spanish augmentatives that indicate object size. To our knowledge, no previous studies have examined the role of these linguistic forms in mental simulation.

Placement Verbs and Augmentatives
To examine orientation simulation in the present study, we employed German placement verbs. Placement verbs describe motion events, which have received much attention in cross-linguistic studies (Kopecka & Narasimhan, 2012). The main reason for this attention is that "putting" actions are part of everyday human experience, and verbs describing these actions are among the most frequent and earliest learned verbs in a language (Levinson, 2012). Research shows that Germanic languages (e.g., German, Dutch, Danish, Swedish) employ a set of colloquial placement verbs that indicate the position (horizontal vs. vertical) of a given object with respect to a surface. In German, these are legen [lay] and stellen [stand] (Fagan, 1991; Berthele, 2012). In contrast, Romance languages (e.g., Spanish, French) do not employ such verbs. In Spanish, for example, which is the L1 of the learners in the present study, verbs like poner [put] or dejar [leave in a place] are used, and these do not indicate object orientation in relation to a surface (Cadierno, Ibarratxe-Antuñano & Hijazo-Gascón, 2016). German placement verbs indicate object position more explicitly than the sentences in previous studies, where orientation was only implied. We therefore expect to find support for orientation simulation with an SPV task (Hypothesis 1).
To examine size simulation in this study, we employed Spanish augmentative suffixes. Augmentatives are morphological forms of a word that are primarily used to indicate large size (Gooch, 1967). They are related to diminutives, which are primarily used to indicate small size and are frequently used (Savickiene & Dressler, 2007), although their usage frequency may differ among Spanish-speaking communities (Butt & Benjamin, 2005). Some languages (e.g., German, Dutch, English) use prefixes to indicate augmentation. For example, in German, which is the L1 of the learners in our study, one adds nouns such as Riese(n)-, Bombe(n)- or Spitze(n)- to the base noun one wishes to augment, as in Riesenstadt [very large city] (Lohde, 2006). Other languages (e.g., Spanish, Greek, Romanian) employ suffixes to augment nouns. Suffixes are morphemes added to the end of a word to form a derivative. In Spanish, large size is indicated by adding an augmentative suffix such as -ón, -azo or -ote to masculine nouns, or -ona, -aza or -ota to feminine nouns (Gooch, 1967). For example, un libro [a book] becomes un librote when it refers to a large, heavy book. As Spanish augmentatives indicate size more explicitly than the sentences used in previous studies, we expect to find clear support for size simulation with an SPV task (Hypothesis 2).

Simulating L2 Forms
A prerequisite for completing an SPV task is knowing what the linguistic forms mean. L1 speakers acquire language implicitly through naturalistic exposure, in situations where caregivers scaffold development. Most foreign language learners, by contrast, acquire their L2 through classroom instruction (Ellis & Laporte, 1997). Norris and Ortega (2001) concluded from a meta-analysis of over 40 studies that instruction is effective in helping L2 learners establish form-meaning connections (irrespective of type of instruction). Previous simulation studies with L2 learners do not report whether or how the authors ensured that learners knew the critical forms (Vukovic & Williams, 2014; Tomczak & Ewert, 2015). Presumably, the authors assumed this knowledge because they worked with advanced learners. For the present study, we carried out two pilot studies, one with learners of L2 German (N = 34) and one with learners of L2 Spanish (N = 28). These studies showed that only three learners knew the meaning of legen/stellen [lay/stand] and only four knew the augmentative suffixes, irrespective of proficiency level.
We reasoned that it makes little sense to study simulation based on forms that many participants would not understand. We therefore decided to instruct learners in the present study on the critical L2 forms before presenting them with the SPV task (see Table 1 for pre- and post-instruction scores on an 18-item test). We realized that instructing the L2 learners jeopardizes the comparability of our results with previous L1 simulation studies. In our study, L2 learners were primed to think about orientation and size before the SPV task, whereas L1 speakers in previous studies and in the present study were not. In previous work with L1 speakers, mental simulations are taken to be performed in an unconscious, routine-like manner that is outside the comprehender's control (Zwaan & Pecher, 2012). Barsalou (1999b: 577), however, has written that "A perceptual state can contain two components: an unconscious neural representation of physical input, and an optional conscious experience." We acknowledge that the optional, conscious experience Barsalou describes may arise if the L2 learners in the present study make perceptual simulations. At the same time, we point out that this optional, conscious experience is ecologically valid to at least some degree, as metalinguistic awareness is an inherent quality of L2 learners (Pavlenko, 2016).

Method
In Experiment 1 we investigated whether L1 German speakers and Spanish learners of L2 German simulate object orientation. In Experiment 2 we explored whether L1 Spanish speakers and German learners of L2 Spanish simulate object size. All participants completed an SPV task. L2 learners received instruction on the meaning of German placement verbs (Experiment 1) and Spanish augmentative suffixes (Experiment 2) before the SPV task.

Participants
For Experiment 1, we recruited 122 Spanish learners of L2 German and a control group of 30 L1 German speakers. The sample size of the L1 group approached that of the original orientation study by Stanfield and Zwaan (2001). We admitted learners with proficiency level A2 and higher to test sessions, as these learners can understand simple instructions and process simple sentences in the L2 without problems. The aim was to test learners at beginner (A2), intermediate (B1) and advanced (B2+) levels and to form three groups of similar size. The L1 speakers were recruited at the University of Bremen in Germany (none knew Spanish); the L2 learners studied German at the University of Seville or the University of Granada in Spain. For Experiment 2, we recruited 100 German learners of L2 Spanish and a control group of 34 L1 Spanish speakers. The sample size of the L1 Spanish group was roughly matched to that of the L1 German group. Again, we admitted learners with proficiency level A2 and higher. The aim was to form a sample roughly comparable to the German sample. The L1 speakers were students at the University of Seville in Spain (none knew German); the L2 learners studied Spanish at the University of Münster or Humboldt University Berlin in Germany. Participants were paid a nominal fee. Details of participants are given in Table 1.

Materials
Trials were presented in random order. Each consisted of a sentence followed by a black-and-white drawing of an object. On each trial, participants answered Yes or No to the question whether the preceding sentence had mentioned the pictured object (see Figure 1). In a pilot study, eight critical objects were identified as orientation-free, i.e., objects that can appear in both vertical and horizontal position (lipstick, battery, flashlight, bell, spool, deodorant, tube, glue stick). In addition, the critical objects could appear on a 17-inch computer screen both in prototypical, real-life size (3.5 × 3.5 inches) and in large (screen-filling) size. In the black-and-white drawings, the eight critical objects were presented in varying positions (horizontal, vertical) and sizes (large and normal). We then created critical sentences describing placement events (e.g., Mary puts the lipstick on the table). In German, these sentences marked object orientation with either legen [lay] or stellen [stand]. Spanish sentences marked object size by including an object noun either with or without an augmentative suffix. See Figure 2 for examples of critical sentence-picture pairs. Finally, we created filler sentences followed by black-and-white drawings of objects in their prototypical position. The filler sentences described people performing an action with different objects (e.g., lemon, sponge) and did not indicate object orientation or size.

Design
We aimed to discover whether linguistic information that matches or mismatches depicted objects affects recognition time for these objects. For each object, we created four German sentences following a 2 (match/mismatch) × 2 (vertical/horizontal position) design. We also created four Spanish sentences following a 2 (match/mismatch) × 2 (large/normal size) design. This yielded 32 (4 × 8 objects) critical trials per language. To reduce the duration of the experiment, we distributed these trials over two lists (lists A and B) with 16 trials each. In the German version, each list contained four match-horizontal, four mismatch-horizontal, four match-vertical and four mismatch-vertical trials. For each object, either two matching or two mismatching sentences were selected per list. Since the pilot study showed that objects were neutral with respect to orientation, we did not include object as a factor in the analysis. The 16 critical trials per list were Yes trials for the eight critical objects. We expected participants to answer Yes to both match and mismatch sentences, since the mentioned object was depicted. We added to each list 16 further Yes trials for the same eight objects, yielding 32 Yes trials for these objects. We further added 32 No trials for the same eight critical objects. On these trials, the picture showed an object different from the one previously mentioned, so the expected answer was No. Finally, we included 64 filler trials per list for eight different objects; again, Yes answers were expected on half of these trials and No answers on the other half. In total, each list contained 128 trials. Table 2 shows the distribution of participants across the lists.
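The list bookkeeping described above can be checked with a small sketch. It simply re-derives the stated counts for one German list. The category labels and the function name `german_list_composition` are our own hypothetical names. The Spanish lists would follow the same scheme, with large/normal size in place of horizontal/vertical position.

```python
from itertools import product

N_OBJECTS = 8  # critical objects listed in the Materials section


def german_list_composition():
    """Trial counts for one German list (A or B), as stated in the text."""
    counts = {}
    # 2 (match/mismatch) x 2 (horizontal/vertical) critical cells, four trials each
    for fit, position in product(("match", "mismatch"), ("horizontal", "vertical")):
        counts[f"critical {fit}-{position} (Yes)"] = 4
    counts["additional Yes, critical objects"] = 16
    counts["No, critical objects"] = 32
    counts["filler Yes"] = 32
    counts["filler No"] = 32
    return counts


# 4 sentences per object x 8 objects = 32 critical sentences per language,
# split over two lists of 16 each.
critical_sentences = 4 * N_OBJECTS
counts = german_list_composition()
yes_trials = sum(v for k, v in counts.items() if "Yes" in k)
print(critical_sentences, critical_sentences // 2, sum(counts.values()), yes_trials)
```

Counting Yes and No trials separately makes it easy to confirm that the expected answers are balanced within a list.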

Procedures
Participants took part in the computer experiment in groups of 5-20 in a quiet computer room. L2 learners were instructed in the L2 and L1 speakers in their L1. L2 participants had the option of asking questions in the L1. The instructions were: 'In this experiment you will read sentences, followed by pictures of objects. Your task is to determine whether the object shown was mentioned in the sentence you read before. Please answer the question "Was this object mentioned in the previous sentence?" Give your answer by pressing Q (Yes) or P (No).' Participants were told that RTs were being measured. They were asked to make decisions about the pictures as quickly as possible and to keep their fingers on the Q and P keys during the whole experiment. Before the computer experiment, L2 learners took part in two instructional activities. These were to ensure that all learners knew the meaning of the target object nouns, verbs and suffixes. First, they received teacher-fronted instruction in the L2, led by the experimenter, in which the relevant forms (placement verbs or augmentative suffixes) and their meanings were discussed (see Appendices 1 and 2 for the instruction and Table 1 for results). Second, learners studied a randomized list of 30 L2 object nouns (15 target words, 15 distractors) with L1 translations (Appendices 3 and 4). After the instructional activities, the learners completed the computer experiment as described above (Appendices 1-4 are available through IRIS).

Results
Previous studies of mental simulation have analyzed aggregated RTs to match and mismatch trials with ANOVA. Yet there are several good statistical reasons to use linear mixed models (LMMs) instead (see Luke, 2004). A key reason is that multilevel data must be aggregated to perform ANOVA, whereas LMMs consider all observations in a leveled or nested manner. In 4.1 and 4.2, we therefore use LMMs to analyze RTs (in milliseconds) to pictures in orientation (Experiment 1) and size (Experiment 2) trials. We considered the variables language group (L1, L2-A2, L2-B1, L2-B2+), trial type (match/mismatch), list (A/B), trial and subject. If trial type was a significant factor in the LMM and RTs to match trials were faster than those to mismatch trials, this could be interpreted as support for mental simulation. Our model included the random effects "subject nested in list" and "trial nested in type and list"; the other factors were treated as fixed effects. Reporting p values for LMMs is controversial (see Bates et al., 2015). Nevertheless, we report p values for significant predictors using Type III ANOVA tests with the Satterthwaite approximation. As in previous studies, we only analyze Yes responses to match and mismatch trials, but the percentage of No responses is reported below. To enable a more direct comparison with previous studies, we also provide mean and median RTs, p values and effect sizes calculated with ANOVA in 4.3. For reasons of space, we only report the most important results, but full analyses and R scripts are accessible online.
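To illustrate what aggregation discards, the following minimal Python sketch collapses trial-level RTs into one mean per subject and trial type. This is the kind of input an ANOVA on aggregated data uses. The data and the helper name `aggregate` are hypothetical, and this is not the authors' R code. An LMM would instead be fitted to all trial-level rows, with random effects capturing subject- and item-level variation.

```python
from collections import defaultdict
from statistics import mean


def aggregate(observations):
    """Collapse (subject, trial_type, rt) rows into one mean RT per subject x trial type."""
    cells = defaultdict(list)
    for subject, trial_type, rt in observations:
        cells[(subject, trial_type)].append(rt)
    return {key: mean(rts) for key, rts in cells.items()}


# Hypothetical trial-level data (RTs in ms).
obs = [
    ("s1", "match", 600), ("s1", "match", 640), ("s1", "mismatch", 700),
    ("s2", "match", 800), ("s2", "mismatch", 820), ("s2", "mismatch", 860),
]
agg = aggregate(obs)
print(len(obs), "->", len(agg))
```

Six observations become four cell means. The trial-to-trial variability within each cell, which an LMM can model, is no longer available to the ANOVA.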

Experiment 1: Orientation in L2/L1 German
No responses to match and mismatch trials amounted to 5% for L1 speakers and 14.3% for L2 speakers, and these were separated from the Yes responses. After removal of subjects with missing values, analyses were performed on 1986 observations from 151 subjects. The RT distribution was strictly positive and right-skewed, with a long right tail. We therefore log-transformed RTs, which improved our diagnostic plots. L1 speakers reacted significantly faster to match and mismatch trials than L2 speakers. The fixed effect "trial type" did not improve the model at a significance level of .05, which was confirmed by a likelihood ratio test. There were no significant interactions. Figure 3 shows that RTs for match trials did not differ significantly from those for mismatch trials. All in all, this analysis does not support Hypothesis 1 (simulation of orientation).
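The effect of a log transformation on a right-skewed RT distribution can be checked with a simple moment-based skewness statistic. The sketch below uses hypothetical RTs (not the study's data) and the population skewness estimator g1 = m3 / m2^1.5. The authors judged the transformation by diagnostic plots, so this is only an illustrative numeric analogue, not their procedure.

```python
import math


def skewness(xs):
    """Population (moment-based) skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical RTs (ms) with a long right tail; not data from this study.
rts = [500, 550, 600, 620, 650, 700, 750, 900, 1400, 2600]
log_rts = [math.log(rt) for rt in rts]

print(f"raw skewness: {skewness(rts):.2f}, log skewness: {skewness(log_rts):.2f}")
```

On these values the skewness drops after the transformation but stays positive. Log-transforming reduces, rather than removes, the influence of a few very slow responses.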

Experiment 2: Size in L2/L1 Spanish
No responses to match and mismatch trials amounted to 11.6% for L1 speakers and 18.4% for L2 speakers, and these were separated from the Yes responses. Analyses were performed on 1629 observations from 134 subjects. The RT distribution was similar to that in Experiment 1, so we again log-transformed RTs. L1 speakers reacted significantly faster to match and mismatch trials than L2 speakers. The fixed effect "trial type" also improved the model significantly (p = .0475), which was supported by a likelihood ratio test. There were no significant interactions. Figure 4 shows that RTs for mismatch trials were slower than RTs for match trials. All in all, this analysis supports Hypothesis 2 (simulation of size).
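The likelihood ratio test mentioned here compares the model with and without the fixed effect "trial type". A minimal sketch follows. It assumes that the two models are nested and differ by exactly one parameter, since trial type has two levels. The log-likelihoods are hypothetical, because none are reported, and `lrt_df1` is an illustrative name. With one degree of freedom, the chi-square tail probability has a closed form via the complementary error function, so only the Python standard library is needed. When models differ in their fixed effects, they are usually compared on maximum-likelihood rather than REML fits.

```python
import math


def lrt_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the LR statistic 2 * (logLik_full - logLik_reduced) and its
    p value under a chi-square distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Hypothetical log-likelihoods for the models without and with "trial type".
stat, p = lrt_df1(-1000.0, -998.0)
print(f"LR = {stat:.2f}, p = {p:.4f}")
```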

Comparison with previous studies
To enable a more direct comparison with previous studies, we also ran a mixed ANOVA on aggregated data for orientation and size (see Table 3). We crossed trial type (match/mismatch) with language group (L1, A2, B1, B2+) and looked for main effects of trial type for orientation and size, with faster reaction times for match trials. We analyzed median orientation RTs from untrimmed data to enable comparison with Stanfield and Zwaan (2001) and Zwaan and Pecher (2012). We also performed orientation and size analyses with mean RTs. For these, we removed RTs faster than 300 ms or slower than 3000 ms, as well as responses more than 2 SDs from the participant's mean in that condition. For orientation, trimming removed 0% of the data for L1 speakers and 8.1% for L2 speakers. For size, it removed 4.4% of the data for L1 speakers and 12.5% for L2 speakers. This enabled a comparison with Rommers et al. (2013), Koning et al. (2016) and Koning et al. (2017). See Table 3 for results.
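The trimming rule can be sketched in a few lines of Python. The text does not specify several details, so the sketch makes these assumptions:

- The absolute cutoffs are applied first and are inclusive.
- The SD criterion uses the sample SD of the remaining RTs within each participant × condition cell.
- Cells with fewer than two RTs skip the SD step.

The helper name `trim_rts` and the data are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts_by_cell, lower=300, upper=3000, n_sd=2.0):
    """Trim RTs per (participant, condition) cell.

    Assumed order: absolute cutoffs first (inclusive), then removal of RTs
    more than n_sd sample SDs from the cell mean of the remaining RTs.
    """
    trimmed = {}
    for cell, rts in rts_by_cell.items():
        kept = [rt for rt in rts if lower <= rt <= upper]
        if len(kept) >= 2:  # an SD needs at least two values
            m, s = mean(kept), stdev(kept)
            kept = [rt for rt in kept if abs(rt - m) <= n_sd * s]
        trimmed[cell] = kept
    return trimmed


# Hypothetical data for one participant in two conditions.
data = {
    ("p1", "match"): [250, 600, 620, 640, 660, 680, 700, 2000, 3500],
    ("p1", "mismatch"): [700, 3200],
}
trimmed = trim_rts(data)
print(trimmed)
```

In the match cell, 250 and 3500 fall outside the absolute window. The value 2000 is then more than 2 SDs above the mean of the remaining RTs and is also dropped. Because the outlier itself inflates the SD, the SD criterion is fairly lenient in small cells.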
In line with the analyses in 4.1 and 4.2, we found no main effect of trial type for orientation (p > .176), but we found a main effect for size (p = .007). Participants reacted faster to size match trials (M = 1084 ms) than to mismatch trials (M = 1211 ms). We found no significant interactions. In both Experiments 1 and 2, L1 speakers reacted faster than L2 speakers (p < .010), but there were no significant RT differences between the L2 groups. Table 3 shows pooled L1 and L2 speaker RTs, since the main effects of trial type are based on pooled RTs. Orientation RTs in our sample with L2 speakers are not much higher than those in previous studies with L1 speakers. In contrast, size RTs in our sample with L2 speakers are higher than those of L1 speakers in previous studies, but comparable to those of L1-speaking children (Koning et al., 2016). Note also that RTs for object size are generally higher than RTs for object orientation. The previously reported effect sizes (d and partial η²) for both orientation and size effects correspond to ours and indicate small effects.
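Partial η² can be recovered from an F ratio as F·df_effect / (F·df_effect + df_error). Cohen's conventional benchmarks for η² are about .01 (small), .06 (medium) and .14 (large). The sketch below applies that identity to hypothetical values, since the F statistics are not reported in this section. It also spells out the 127 ms size match advantage from the means reported above. The function name `partial_eta_squared` is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Size match advantage from the reported means (ms): mismatch minus match.
size_match_advantage = 1211 - 1084

# Hypothetical F test with hypothetical degrees of freedom, to show the conversion only.
eta = partial_eta_squared(4.0, 1, 96)
print(size_match_advantage, round(eta, 3))
```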

Discussion
The aim of the present study was to examine whether L2 learners simulate object orientation and size, by conceptually replicating previous studies with L1 speakers. As in previous studies, we employed an SPV task. Unlike previous studies, we did not use sentences in which object orientation or size was implied by the context. Instead, we used sentences with linguistic forms that explicitly indicated the object property. For object orientation, these were the German verbs legen/stellen [lay/stand]; for object size, these were Spanish augmentative suffixes. We predicted that these explicit linguistic forms would lead to univocal simulations of object orientation and size in L2 learners. Unlike the L1 speakers in previous studies, our learners were instructed on the critical forms before the SPV task to ensure their comprehension.

Language-Specific Forms and Simulation
Our results did not support the hypothesis that German placement verbs lead to simulation of object orientation. This holds even though these verbs make orientation explicit, and even though L2 learners were instructed on their meaning before the SPV task. Our findings add to the negative evidence for implied orientation simulation in L1 speakers reported by Rommers et al. (2013) and Koning et al. (2017). However, our results are not in line with the match effects reported by Stanfield and Zwaan (2001) and Zwaan and Pecher (2012). Nor are they in line with the match effects for motion verbs that indicate shape reported by Sato et al. (2013), or with the effects of motion simulation through motion verbs reported by Tomczak and Ewert (2015). How, then, can we explain the different findings in simulation studies that focus on object orientation? Zwaan (2014) has argued that failures to replicate (e.g., Rommers et al., 2013) are due to deviations from Stanfield and Zwaan's (2001) original orientation experiment (e.g., different design, different stimuli, lack of comprehension questions and insufficient power), so that no meaningful comparison is possible. However, the study by Koning et al. (2017) contained four SPV tasks (for object color, shape, orientation and size) that were comparable in design and stimuli and followed the same procedures. It is thus unlikely that the absence of an orientation effect in their study stems from differences between tasks. A more likely explanation, in line with Koning et al. (2017) and Connell (2005, 2007), is that orientation, in contrast with size, is an extrinsic property, and its role in simulation is therefore less important than previously thought. The absence of an orientation match effect in our study, even with verbs that explicitly mark orientation, adds support to this idea. An alternative explanation, discussed by Koning et al. (2017), is the following.
Yaxley and Zwaan (2007) have suggested that extrinsic visual object properties initially do elicit automatic activations, but that the resulting mismatches are nullified through additional processes such as mental rotation. Koning et al. (2017), however, argue that the rotation explanation is less plausible, as a previous study showed no reliable correlation between mental rotation and the magnitude of the orientation match advantage (Stanfield & Zwaan, 2001). The authors also argue that additional time-consuming processes are inconsistent with the fact that, in their study, overall RTs were faster for orientation than for the other visual properties. The current study contributes to this debate in that we, too, found faster RTs for orientation than for size. Tomasello et al. (2017) have also shown with neurocomputational modelling that activation of different brain mechanisms should be fast and near-simultaneous, which supports the argument by Koning et al. (2017). Future research could try to resolve the issue by directly examining the recruited brain areas with time-sensitive measures (e.g., MEG/EEG). How can we explain the discrepancies with previous studies on motion verbs? Results for motion simulation through motion verbs by Tomczak and Ewert (2015) are not directly comparable to ours, as they employed a meaning judgment task rather than an SPV task. Yet it is not surprising that they found different RTs, as the literature shows that motion verbs have a considerable neurological impact (see Pulvermüller, 2005; Vigliocco et al., 2011 for overviews). Wallentin et al. (2011), for example, showed that motion verbs in sentences activate the temporal cortex despite a static context (e.g., "The path comes into the garden"), whereas static verbs do not. Future studies need to establish whether placement verbs, too, affect neurological processes (to our knowledge, no such studies exist so far).
Yet even if such neurological effects can be documented, they do not necessarily imply mental simulation. Thus far, it is unknown which types of neurological processes are involved in mental simulation (but see Pulvermüller, 2013). Clearly, this theoretical issue should be discussed as (conflicting) empirical evidence accumulates. The fact that Sato et al. (2013) found effects for verbs that describe object shape is unsurprising. Simulation effects for object shape have been consistent in the literature (Zwaan, Stanfield & Yaxley, 2002; Zwaan & Pecher, 2012; Koning et al., 2017), whereas those for object orientation have been inconsistent. Shape is an intrinsic property and therefore ranks higher on the simulation priority list than orientation, even when orientation is explicitly marked through placement verbs.
Our results do support the hypothesis that Spanish augmentatives lead to simulation of object size. This adds to the evidence in favor of implied size simulation (Koning et al., 2016, 2017). Note that we used screen size as the size referent in the present study, whereas previous studies used another object (e.g., a table) as the referent. Our findings are also consistent with arguments in the literature that size is an intrinsic object property and thus an important candidate for simulation (Connell, 2005, 2007; Koning et al., 2017). This study suggests that L2 learners, and not only L1 speakers, make size simulations. This is in line with findings by Vukovic and Williams (2014) and Tomczak and Ewert (2015), who report distance and motion simulation effects for advanced L2 learners. Our study suggests that, in addition to advanced L2 learners, beginning and intermediate L2 learners also make (size) simulations. The longer RTs we found for advanced L2 learners compared with L1 speakers are consistent with the longer RTs for advanced learners reported by Vukovic and Williams (2014). These findings also agree with research on L2 processing that attributes longer RTs in L2 learners to the simultaneous activation of their two (or more) languages (Grosjean, 2001).
Future work on simulation in L2 learners can advance in at least three directions. First, it would be interesting to replicate the studies with implied size (Koning et al., 2016, 2017) with L2 learners. A replication of the implied size effects with L2 learners would extend the validity of simulation theory to a population of L2 speakers. Further, a study addressing simulation for both implied and explicit sentences in L1 and L2 speakers could provide insight into the richness of size simulations (see Hoeben Mannaert et al., 2017, for an investigation of the richness of color simulations). A second direction could examine the effect of diminutives, whose primary function is to indicate small size, on size simulations. Because diminutives are well known and highly frequent in many languages, size simulation effects could probably be documented for L2 speakers without instructing them prior to testing. Finally, future studies should investigate whether L2 readers also simulate object shape and color, as has been reported for L1 readers (Zwaan & Pecher, 2012; Hoeben Mannaert, Zwaan & Dijkstra, 2017; Koning et al., 2017).

Note on Simulating L2 Forms
In section 2.3, we noted that the instruction L2 learners received before the experiment is possibly problematic. The instruction could even explain the high percentage of No responses for L2 learners (especially in Experiment 2). Because priming may have led participants to treat object properties consciously, some of them may have responded along the lines of "No, an object in this position/with this size was not mentioned". Alternatively, the No responses could be explained by learners not knowing the appropriate nouns for the critical objects (although L2 learners did study the nouns for as long as they needed before the experiment). However, because L1 Spanish speakers also showed high percentages of No responses, the higher-degree-of-consciousness explanation is more plausible. Future work could address the instruction issue by testing L2 learners on their comprehension of the critical L2 forms after the experiment and excluding participants as needed. It is advisable to choose a linguistic form that L2 learners can be expected to know already (e.g., diminutives). In addition, it could be interesting to compare groups of learners with and without instruction and examine whether they respond differently on the SPV task. If learners receive no instruction and show no knowledge of the critical forms after the experiment, one would expect no match effects.

Conclusions
The key finding of this study is that L2 German learners did not simulate object orientation, even when orientation was explicitly marked by placement verbs, whereas there is support that L2 Spanish learners mentally simulate object size through Spanish augmentatives. Despite its limitations (see Note 4, section 5.2), the current study thus suggests that previous size simulation findings can be extended to an L2 population. We suggest that future studies examine whether L2 learners make simulations for both implied and explicit sentences, whether they simulate with or without prior language instruction, and whether they also simulate shape and color. If empirical evidence for simulation as a comprehension mechanism accumulates for L2 readers, the challenge will be to construct a model of L2 comprehension based on simulation, or to integrate simulation into existing models of L2 comprehension (Thomas & Van Heuven, 2005; Tomasello et al., 2017). In addition, it will be critical to investigate whether L2 learners simulate in extended discourse (for L1 research, see Ditman et al., 2010). These extensions will provide important insights into the processes at work in second language comprehension.

Notes
1 Augmentatives have a secondary function of adding an emotional tone to a given word and may be used as pejoratives to express negative connotations (Butt & Benjamin, 2005; Hualde, Olarrea, Escobar & Travis, 2010).
2 The Common European Framework of Reference states that at A2 level, for listening comprehension, learners can "catch the main point in short, clear, simple messages"; and for reading comprehension, learners can "read very short, simple texts" (Council of Europe, 2011: 26).
3 As pointed out by an anonymous reviewer, using only eight objects leads to low power. Future studies should include a larger number of objects. In addition, comprehension questions should be added to ensure that readers read for meaning.
4 As pointed out by Stephanie Wassenburg and Diane Pecher (personal communication), the fact that participants saw previously encountered sentences and objects is problematic, because memory may have reduced the effect. When an object is presented a second time, the orientation or size described in the first presentation may be reactivated and interfere with the presentation of the object after the second sentence. Future studies should present each object only once.
5 Materials and datasets can be accessed through: https://www.iris-database.org/iris/app/home/detail?id=york:934307.
6 Analyses and R scripts can be accessed through: https://github.com/belzebuu/LanguageStudy.