Can variation in input explain variation in typical spoken target-language features during study abroad?


Anita Thomas

University of Fribourg, CH
Rosamond Mitchell

University of Southampton, GB
Many studies have confirmed the belief that a stay abroad (SA) is beneficial for second language (L2) development. However, substantial variation in learning outcomes has long been acknowledged. Research has identified a range of factors that explain variability in target-language development. However, few studies have focused on the linguistic characteristics of the overall L2 input available to SA participants. In this study, we examine the role of the input situation during SA in explaining variation in typical spoken-language features. We present six case studies from the longitudinal LANGSNAP corpus of French L2 learners at a university in the United Kingdom, who undertook a two-semester SA in France. For each participant, we established an input profile based on data from questionnaires and interviews collected before, during and following SA. The analysis of learner production over time examined three typical French spoken-language phenomena: the negative particle ne, the pronoun on as a first-person-plural subject, and discourse markers. Since these phenomena are variable in both formal and informal French, we compared the learners’ production to variation found in three different reference corpora. Overall, results show a convincing relationship between the ongoing input situation and the production of typical spoken-language features. Analyses of the reference corpora show differences in the proportions of the studied phenomena according to the level of formality of language use. For the L2 learners, limited engagement in everyday and informal interaction in French corresponds to less frequent use of the phenomena.

How to Cite: Thomas, A., & Mitchell, R. (2022). Can variation in input explain variation in typical spoken target-language features during study abroad? Journal of the European Second Language Association, 6(1), 60–77.
Published on 12 Dec 2022
Accepted on 03 Nov 2022
Submitted on 30 May 2022

1. Introduction

Due to complex currents in global student mobility, widespread access to digital media, and the growing use of English as a lingua franca in higher education, study abroad (SA) today is an experience which no longer corresponds to straightforward immersion in a given target language (TL). Recent SA studies underline the high degree of variation found in participants’ language practices and in language learning outcomes (Mitchell et al., 2017). While many participants improve in fluency or lexis, only some learners also develop in the use of typical spoken-language features and sociolinguistic aspects of the TL. A large body of studies on the effect of SA on TL proficiency has identified a range of explanatory factors, confirming that presence in a TL-using region is insufficient by itself. A particularly challenging factor to study is the influence of the language to which the learners are exposed (i.e., the input situation). Research on second-language (L2) input suggests a relatively strong link between the language to which learners are exposed and their own production. However, this relation is often difficult to establish empirically.

In this study we examine the role of the input situation during SA in the development of typical spoken-language features of the TL. We hypothesize that learners who have regular access to informal interactions will use these features more frequently. To test this hypothesis, we investigate six individual case-study learners of L2 French from the LANGSNAP longitudinal corpus who spent two semesters in France. We focus on three typical French spoken-language features, namely the deletion of the negative particle ne, the use of the pronoun on instead of nous and the development of the repertoire of discourse markers (DMs). These variable features are often treated as sociolinguistic features in the second language acquisition (SLA) literature (e.g., Regan et al., 2009). However, research on French spoken language shows that they are pervasive in speech (see, for example, discussion in Giroud & Surcouf, 2016; Massot & Rowlett, 2013; Meisner & Pomino, 2014). Our methodology has two components. First, we analyse three reference corpora that could be representative of the more formal and less formal language to which learners are exposed during their stay: a corpus of lectures, a corpus of administrative interactions at the university and the first-language (L1) component of the LANGSNAP corpus. Second, using LANGSNAP interview and questionnaire data, we establish an input profile for each participant. This allows us to relate the input situation to the production of the three spoken features over time.

2. Background

2.1. SA and L2 development

Many studies have confirmed the belief that SA is beneficial for L2 development, particularly in respect of fluency and vocabulary (Llanes, 2011; Yang, 2016). However, substantial variation in learning outcomes has long been acknowledged (Isabelli-García et al., 2018). Attempts to identify factors explaining this variability include:

However, from a theoretical SLA perspective, the effects of all of these proposed explanatory factors are clearly indirect, and their overall function is to moderate the extent and nature of sojourner engagement with the TL while abroad. It is language experience itself (i.e., L2 input and L2 interaction) that must be the prime source of L2 development.

2.2. Input: Influence and measures

There is longstanding interest in the role of input in L2 learning. Research within the usage-based approach suggests that there is a tight relation between the language to which the learners are exposed and their representation of that language (Barlow & Kemmer, 2000; Bybee, 2010). This relation is strengthened through repeated exposure, in other words by the frequency of a specific construction in the input flow to the learners. However, this relation between learners’ input and output is not straightforward. Ellis and colleagues (e.g., Ellis et al., 2016) have identified a range of factors that influence input processing such as type and token frequency of a given construction in the input, the level of availability of a structure in the input flow (saliency), or the strength and reliability of the relation between a specific form and a function in the TL (contingency, multifunctionality).

An important issue here is input quality and quantity. While classroom input has many advantages for L2 learning, it remains limited in scale and register. Research on the role of interaction in L2 learning suggests that input becomes particularly salient in conversation (Gass et al., 2018; Long, 1996), where the interlocutors have to process what is said and are expected to actively contribute. Interaction encourages the mobilization of attentional processes, and, through co-construction, it also enables negotiation of meaning and feedback which increase input comprehensibility. A process of mutual adaptation may promote experimentation with new expressions (e.g., Michel & Cappellini, 2019). Accordingly, learners who have the opportunity to practice their L2 in interaction should develop forms typical of this context, such as phonological reductions or oral DMs (e.g., Kennedy Terry, 2017).

The relation between input and output is usually difficult to establish because we most often only have access to small samples of naturalistic interaction. This means that the input situation has to be inferred from data that could be representative of this situation. Studies that have examined the role of quantity and quality of input in L2 learning have tried to measure indirectly how much access a learner has to input and to opportunities to actively use the TL, as well as the quality of this input, and have then compared learner production across the different reported input situations. For example, Ågren et al. (2014) established “individual input profiles” for children who had learned French from birth (simultaneous bilinguals) or as an L2, based on information derived from interviews about the children’s exposure to French outside school. The results showed that the input profile of some L2 children could be very similar to that of simultaneous bilingual children. These input profiles explained some differences in the development of morphosyntactic phenomena.

Such measures try to capture how often a learner has the opportunity to actively use the TL in their linguistic environment. However, in order to have a complete idea of the input situation, it is also necessary to examine the characteristics of the input to which the learners are exposed, since language use does not always correspond to our intuitions (e.g., Tracy-Ventura & Cuesta Medina, 2018; Wulff et al., 2009). What is needed are corpora of naturalistic language use that can be considered representative of the kind of input to which the learners are exposed (Mitchell, 2021; Thomas & Ädel, 2021).

In the context of SA, the input situation has commonly been captured using indirect measures such as language-use questionnaires, diaries and interviews (Mitchell, 2021). However, these rely mainly on self-reported data and provide only an estimate of learners’ L2 engagement. Social-networks questionnaires also try to capture the level of access to L2 input and interaction but similarly cannot provide a detailed picture of input quality. This could be found in direct measures such as recordings of interactions, found in some small-scale, qualitative case studies (Mitchell, 2021). Examples include studies of interactions with sojourners’ host families (e.g., Kinginger et al., 2016; McMeekin, 2017), language exchanges (Bryfonski & Sanz, 2018; Fernández, 2016; Fernández-García & Martínez-Arbelaiz, 2014), service encounters (Ning, 2020; Shively, 2018) and relatively unstructured leisure talk (Behrent, 2007; Hasegawa, 2019; Kinginger & Wu, 2018). However, interaction involving short-term sojourners in academic sites such as university classrooms has been little studied.

In contrast to the concern with interaction outlined above, hardly any research has focused more broadly on the linguistic nature of the overall L2 input available to SA participants. Such research presents its own methodological challenges, most obviously the need to identify and/or create substantial input corpora. These could take various forms, ranging from corpora documenting the actual linguistic experience of individual mobile students (as have been created in child L1 acquisition research) to reference corpora reflecting the discourse genres to which SA participants are exposed, as called for by Taguchi and Collentine (2018).

In the present study, we combine two methodologies to capture the SA input situation. Firstly, we use indirect evidence to describe the global input situation of the learners in terms of input profiles, and secondly, we analyse the use of specific linguistic features in reference corpora from relevant genres and compare them to learner production over time. In order to test the impact of input in interaction, we concentrate on three features that are specific to spoken language.

2.3. Development of spoken-language features of TL during SA

In SLA, the phenomena we are interested in have often been studied in the context of sociolinguistic competence (Regan et al., 2009). However, today, these phenomena are considered as characteristics of spoken language as opposed to written (see Giroud & Surcouf, 2016, for an overview). The preverbal ne is obligatory in written language and is sometimes maintained in spoken language. However, ne deletion is common in both informal and formal spoken French but in different proportions (Riegel et al., 2021, p. 64, 703s). Likewise, the use of on, instead of nous, for the expression of first-person-plural subjects is much more usual in spoken than in written language (see review in Regan et al., 2009; Riegel et al., 2021, p. 364). As for DMs, these are central to the management of spoken interaction (e.g., bon ‘well’, du coup ‘so, then…’, enfin ‘well, finally’). They are defined by Crible and Degand (2019) as “Markers of structure and interaction that speakers [use to] convey not only the coherence of their intended message but also their attitude towards this message and towards the interlocutor” (p. 3). Typically, they are words and short phrases with some or all of the following characteristics:

  • syntactic optionality
  • weak clause association
  • high degree of grammaticalization
  • discourse-level scope and
  • procedural meaning (Crible & Degand, 2019, p. 13).

Again, stylistic variation is evident in the use of oral DMs in L1 French (e.g., Beeching, 2007). Traditionally however, spoken-language norms and usage have been largely ignored in classroom teaching, especially for French, in favour of written language norms. As a consequence, L2 learners of French may find themselves lost when confronted with informal spoken French (Surcouf & Ausoni, 2018).

For reasons of space and because research on these phenomena is well known, we present here only the main results of previous research in the SA literature on ne deletion (Dewaele, 2004; Gautier & Chevrot, 2012; Regan et al., 2009) and the use of on instead of nous for first-person-plural subjects (Dewaele, 2004; Regan et al., 2009; van Compernolle, 2016). All in all, these studies show a positive relation between authentic interaction abroad and an increase in ne deletion in L2 learners of French, alongside a very low level of ne deletion among learners with little contact with spoken French. They also show a positive correlation between the increased use of on and exposure to authentic interaction, though learners have ongoing difficulties with situational alternation, for example in preferring nous in formal written situations.

Regarding the L2 acquisition of DMs, some relevant studies have been undertaken in non-SA settings. The study of Sankoff et al. (1997) of L2 French users in the language contact setting of Quebec examined range and frequency of DMs used. A DM range similar to that of local L1 users was found only for learners who had extensive informal exposure to French, while overall frequency of DMs was related to general proficiency. Lyrigkou (2021) studied the acquisition of DMs by adolescent Greek learners of L2 English; she found that active engagement with informal learning activities promoted both range and frequency of DMs. In a corpus-based study, Gilquin (2016) drew similar conclusions regarding the importance of exposure to naturalistic speech for the acquisition of DMs in L2 English. In another corpus study, Borreguero Zuloaga and De Marco (2021) compared speech production of learners in L2 Italian immersion and non-immersion settings, finding that the immersion background promoted use of a greater variety of DM types. In an SA context, however, DM research is very limited. Arvidsson et al. (2019) studied the frequency of production of a small number of French DMs by two case-study Swedish students in oral interviews. The two students had very different patterns of engagement with French; Vera studied French assiduously at home but socialised during leisure time through Swedish or English, while June had a very active French-using social network. Yet unlike in the non-SA studies just discussed, both participants modestly increased their frequency of DM use to a similar extent. Arvidsson et al. speculated that the very high frequency of DMs in the wider SA language environment may override differences in participant networking patterns. However, strong conclusions cannot be drawn from this small and exploratory SA study. Furthermore, of these DM studies, only Sankoff et al. (1997) took account of possible variation in DM selection, relating to speech style (more/less formal).

Based on this literature review, the current study aims to address two research questions: 1) Can we observe variation between formal and informal spoken French for the chosen target phenomena in corpora representing the TL input of SA students in France? and 2) To what extent can we associate the SA students’ production of typical spoken French phenomena over time and the type of input they encounter during their SA? Our general hypothesis is that the students with a stronger and more varied input situation will display a higher frequency of use of these phenomena over time than those with a weaker input situation, whatever their starting proficiency level.

3. Method

3.1. Overview

The empirical study investigates the changing use of selected spoken language features by six advanced learners of L2 French sojourning in France. The learner data is drawn from the publicly available LANGSNAP corpus (Mitchell et al., 2017:

The LANGSNAP corpus studied the development of L2 French or L2 Spanish in a cohort of British undergraduates (N = 57) undertaking a 2-semester SA during the third year of a specialist languages degree. Data were collected pre-sojourn (year 2, one data collection point), in-sojourn (year 3, three points), and post-sojourn (year 4, two points). A range of tasks was used each time to collect language-production data, including elicited imitation (EI, collected at three times only), picture-based narrative and oral interview, transcribed using CHILDES/CHAT conventions (MacWhinney, 2000). Ten L1 speakers of French also completed the narrative and interview tasks. Data on language use and exposure were collected indirectly through interviews in the L1 and L2 and through questionnaires (social networking questionnaire, language engagement questionnaire). The main data-collection instruments are available through IRIS (Mackey & Marsden, 2016).

For this study, six case-study participants were selected from among the L2 French learner group. Individual input profiles were developed from interview and social networking questionnaire data. Progress in oral production was tracked through analysis of the sequence of six oral interviews in French available for each participant.

3.2. Selection of reference corpora

From research on contemporary French, we know that although the TL is often presented as homogenous, there is considerable variation both between and within locutors and situations (Beeching et al., 2009; Detey et al., 2016). Accordingly, to get an approximation of the input to which SA learners are exposed, we have to look at reference corpora representing different facets of language use. For the present study, we selected three different corpora. The first corpus is the oral interview data from the L1 speakers of the LANGSNAP project (14,000 words in CHAT, here called LANGSNAP B). This group consists of 10 French exchange students at a British university; the data represent the kind of input the sojourners might have during informal conversations with local peers. The second is a corpus of lecture extracts published in a guide to French for academic purposes, representing the input available to sojourners in the lecture setting (Mangiante & Parpette, 2011, here called LECTURES). We selected the data from disciplines relevant for LANGSNAP participants, namely economics, French as foreign language, law, linguistics, stylistics (5 lecturers, 13,000 words converted to CHAT). Lectures in France are typically ex cathedra, where the teacher delivers a well-prepared monologue, that might be more formal than the French used in conversation. The third corpus is FLEURON ( that was designed to introduce future exchange students to administrative interaction on campus in France (nine hours of conversation, approximately 81,000 words). This corpus represents mainly the input from administrative contacts. There are also clips about how to buy a train ticket and some conversations with and between L2 speakers of French, retained to represent another kind of campus French.

3.3. Participants

The six participants were selected from the LANGSNAP L2 French learner subset; the dataset of their interviews comprises about 44,000 words. Participant selection was based on several criteria. First, in order to examine possible differences in the influence of input according to proficiency level, we selected learners with different entry levels of French proficiency, using the results of the EI test administered pre-sojourn (diamond symbols in Figure 1). The test involved repetition of 30 French sentences of different lengths (7 to 19 syllables) with increasing syntactic complexity in French (see Tracy-Ventura et al., 2014, for details). When presented by decreasing order of pre-sojourn scores, the learners fell broadly into three proficiency groups, highlighted with the thin lines in Figure 1. We excluded two learners with a near-L1 level.

Figure 1 EI test results and participant selection.

Reproduced from Anglophone Students Abroad: Identity, social relationships and language learning (p. 79) by R. Mitchell, N. Tracy-Ventura & K. McManus, Routledge, 2017. Copyright the authors 2017, reproduced with permission.

Second, we wanted to have learners who develop differently during SA, to examine the possible relation between the level of development and the input situation. Therefore, we considered gains over time in the repeated EI test scores, from pre-sojourn to post-sojourn (cross symbols in Figure 1). The highlighted strips in Figure 1 show the six selected learners: Two from each proficiency level, one with strong and one with weak development.

Most of the lower proficiency learners worked as teaching assistants when abroad, except participant 107 who was an exchange student (the weakest participant but also one of the top 10 developing learners). In order to keep the global input situation comparable for participants with similar proficiency pre-sojourn, we chose to take two teaching assistants from this level. The others were all exchange students attending university.

3.4. Determining the participants’ input situation

The learners’ input situation was operationalized in terms of individual input profiles, based on data from the oral-interview series with each participant and their answers to the social networking and language engagement questionnaires. Unlike Ågren et al. (2014), we did not give points for specific activities, but we systematically analysed what the learners told us about their contact with French compared to other languages. We considered especially their housing situation and friendship networks. We also noted self-evaluations and comments about the situation in France and their level of motivation for learning French. The more diversified the learners’ contact with French, the stronger their input situation was considered to be. In addition, the analysis took into account situational variation over time. We observed two patterns. One is stable when the situation regarding contact with French did not change over time; the other is mixed when there were major changes during the stay. The analysis was cross-checked and agreed between the two authors; the complete analysis is available on Open Science Framework (OSF,

Table 1 presents the six selected participants with key elements of their input situation and overall input profile, as well as their level of French before departure and their development as measured through EI tests (see Figure 1 above).

Table 1

Participants and input situation.

P108: Woman, L1 Finnish, exchange student, high level of proficiency at pre-test, strong development

P108 lives with two French people and with the landlord. She is very active in a local athletic club and makes several close friends there. She has some contact with English and Finnish people because family and friends visit her. She has some friends within the community of exchange students (the Erasmus program) but she spends most of her time with her friends from the athletic club.
strong and stable input situation

P104: Man, L1 English and some French at home, exchange student, high level of proficiency at pre-test, weak development

P104 lives in a French host family and spends a lot of time with them. He shares the flat with several people, including an English-speaking person. He has many international friends, so that he speaks both French and English.
strong and mixed input situation

P118: Woman, L1 English, exchange student, intermediate level of proficiency at pre-test, strong development

P118 lives with Erasmus students in a student residence. She speaks French and English with them. She has everyday contact with her family and boyfriend in the United Kingdom (UK) and spent two weeks in the UK for Christmas. She finds it difficult to make friends with French people. She watches TV in English but reads a lot in French.
weak and stable input situation

P121: Woman, L1 English, exchange student, intermediate level of proficiency at pre-test, weak development

P121 lives with international students and spends most of her time with Erasmus students. She attends lectures but the teachers speak all the time and there is little participation from the students. She is already back in the UK at Visit 3.
very weak and stable input situation

P105: Woman, L1 English, teaching assistant, low level of proficiency at pre-test, strong development

P105 initially lives with English-speaking colleagues; at Visit 3 she lives with a French woman. She has an active social life and has French and international friends.
strong and mixed input situation

P119: Woman, L1 English, teaching assistant, low level of proficiency at pre-test, weak development

P119 lives first with international students, but at Visit 2 she lives with a French man (who wants to speak English). She is struggling with French and actively seeks opportunities to speak the language. She finally has some success, and at Visit 3 she has made many French friends.
weak and mixed input situation

For most participants the input situation corresponds to overall proficiency development, but not for P118 who shows strong development but has a weak input situation, nor for P104 who has weak development but a rather strong input situation. The aim of our study is to examine whether the input situation influences more specifically the production of typical spoken French features.
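
The comparison made in the preceding paragraph can be restated as a small sketch. It encodes the labels from Table 1 as plain data and lists the participants whose input strength and EI development do not line up. The class, the field names and the binary reduction of input strength (treating "very weak" as not strong) are illustrative assumptions, not part of the original analysis, which was qualitative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    input_strength: str  # "strong", "weak" or "very weak" (label from Table 1)
    input_pattern: str   # "stable" or "mixed" (label from Table 1)
    development: str     # "strong" or "weak" EI development (label from Table 1)


# Labels copied from Table 1; nothing here is newly measured.
TABLE_1 = [
    Participant("P108", "strong", "stable", "strong"),
    Participant("P104", "strong", "mixed", "weak"),
    Participant("P118", "weak", "stable", "strong"),
    Participant("P121", "very weak", "stable", "weak"),
    Participant("P105", "strong", "mixed", "strong"),
    Participant("P119", "weak", "mixed", "weak"),
]


def mismatches(participants):
    """Return IDs where input strength and development point in different directions."""
    return [
        p.pid
        for p in participants
        # Assumption: only the label "strong" counts as a strong input situation.
        if (p.input_strength == "strong") != (p.development == "strong")
    ]


print(mismatches(TABLE_1))  # ['P104', 'P118']
```

The two IDs returned are the two exceptions named in the text; the stable/mixed dimension is stored but deliberately not used in this comparison.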

Information about participants’ input situation provided in the oral interviews corresponds well with responses to the social networking questionnaire (Mitchell, 2015, p. 25). Figure 2 shows the social networks for the participants with the strongest (P108) and the weakest (P121) input situation at Visit 2. The nodes represent different social contacts, and node colour represents the main language used with each contact. Arrow breadth indicates the strength of the network relationship. The diagram for P121 confirms that her main source of French comes from university professors and classmates and that her contact with English remains strong in-sojourn. In contrast, P108’s social network shows a variety of French input sources and much weaker contact with other languages.

Figure 2 Schematic representation of the mid-sojourn social network for participants 121 (weak input) and 108 (strong input).

3.5. Selection and analysis of spoken-language features

One of the expected activities during a SA is to engage with the TL in its real usage, different from the normative language often taught in class (Freed et al., 2004). In particular, learners are exposed to variation: The more formal language used in lectures is partly different from the language used in informal conversations. For this reason, we concentrate on the three phenomena characteristic of spoken French introduced in Section 2.3. We analyse and compare the occurrence of these phenomena in our three reference corpora and in learner productions. We recognise that the learner recordings only give a partial picture of what they are able to produce and that the context of the interviews may influence their usage. However, the style of interaction is similar for all the oral interviews, which allows us to observe the evolving production of the target features in this context over time. Analysis procedures for each individual phenomenon are described within each section below. Details are provided on OSF.

4. Results

4.1. Negation

The analysis of negation concentrated on the most basic and common structure <(ne) Vfinite pas> (je ne pense pas versus je pense pas ‘I don’t think so’). Only structures with a finite verb were considered, including tokens of the imperative (n’hésitez pas ‘don’t hesitate’, mainly found in FLEURON and LECTURES). Tokens including the subject pronoun on and followed by a verb form starting with a vowel (on n’est pas ‘we are not’) were excluded, as the presence/absence of ne is hardly perceptible. All data were coded manually as <pas> or <ne_pas> for the target structure <(ne) Vfinite pas> from a CLAN/COMBO output (for LANGSNAP B and LECTURES) or with the concordance of FLEURON.

Figure 3 shows the results for negation in the reference corpora. The graphs show the proportion of <ne Vfinite pas> out of all tokens of the structure <(ne) Vfinite pas>. It appears that ne is more often maintained in the two more formal corpora, with around 50% for LECTURES (27/53 tokens) and around 30% for the administrative French in FLEURON (241/819), than in informal conversation (LANGSNAP B, around 10%, with ne expressed in 20/219 tokens). These data confirm that ne is frequently dropped in spoken L1 French, but also that it is retained in the more formal data. It can thus be hypothesized that learners who have more contact with L1 speakers in informal contexts will drop the ne more than those who are mainly exposed to formal French.
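
As a quick check on the rounded percentages, the proportions can be recomputed from the token counts quoted above. This is a minimal sketch; the dictionary and function names are hypothetical, and only the counts come from the text.

```python
# (tokens with ne, all tokens of the structure <(ne) Vfinite pas>), as quoted in the text
NE_COUNTS = {
    "LECTURES": (27, 53),
    "FLEURON": (241, 819),
    "LANGSNAP B": (20, 219),
}


def ne_retention(with_ne, total):
    """Percentage of <(ne) Vfinite pas> tokens in which ne is expressed."""
    if total == 0:
        raise ValueError("no tokens of the target structure")
    return 100 * with_ne / total


for corpus, (with_ne, total) in NE_COUNTS.items():
    print(f"{corpus}: {ne_retention(with_ne, total):.1f}% ne retained")
# LECTURES: 50.9% ne retained
# FLEURON: 29.4% ne retained
# LANGSNAP B: 9.1% ne retained
```

The unrounded values are consistent with the approximate figures of 50%, 30% and 10% given in the paragraph.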

Figure 3 Results for negation: Proportion of ne in (ne) Vfinite pas in reference corpora.

Figure 4 shows the results for the learners at each data-collection time in chronological order. First, there is a clear drop in the production of ne after a few weeks in France (Visit 1) in all learners. For some learners (P104, P105 and P121) production increases again post-sojourn, but for most, this is to a lower extent than at the pre-test. For example, P105 had 71% (5/7) of ne at pre-sojourn, only 15% (4/26) at Visit 3 and an increase to 45% (5/11) at post-sojourn 2. This suggests that regular contact with French, even in formal situations, influences the level of ne deletion among learners.

Figure 4 Results for negation: Proportion of ne in (ne) Vfinite pas – Learners.

4.2. Nous versus on

The analysis of the use of on for the expression of first-person-plural subjects is based on a manual coding of the two pronouns in a CLAN/COMBO output. We only computed the production of the pronouns in their function as a subject clitic (on parle ‘one speaks’, nous avons ‘we have’). This means that the use of nous as a strong pronoun (nous on explique bien ‘we we explain well’), the use of the pronouns without any recognizable verb (on j’ai ‘one I have’), the use of nous after a preposition (chez nous ‘at our place’) and nous as an object pronoun (ils nous ont coupé la ligne ‘they cut us off’) were not considered. However, all tokens of nous or on followed by a verb were computed even in repetitions or in combination with an unexpected verb form as in (1).

    (1) <nous [/] nous avons euh> [//] nous sont restés dans une [/] une cha(let ?) (P105, Pre)
        <we [/] we have uh> [//] we stayed-3PL in a [/] cabin (two tokens)

Figure 5 shows the proportion of nous out of the sum of nous and on as clitic subject pronouns in the reference corpora. There are somewhat more tokens of nous in the more formal French, especially in FLEURON (10%, 91/977), where many interlocutors speak in the name of the administration they represent. In LECTURES, there are 9 tokens of nous against 231 of on. Among the L1 speakers of LANGSNAP B, nous is never produced as a subject pronoun (0/241). For this feature, the difference between formal and informal use of French is thus confirmed.

Figure 5 Results nous vs on: Proportion of nous in reference corpora.

Figure 6 shows nous/on results for the learner data. First, the three learners with the weakest input situation sometimes produce neither pronoun during the oral interview, especially while in the UK (P118 and P119 at pre-sojourn, P121 at post-sojourn), and two learners (P104 and P118) do not use nous. For the others, once the use of nous starts to decrease, the drop is strong and lasting (see, for example, P119, whose use of nous falls from 95% to 25%), except for P121, who has the weakest input situation of all and hardly produces either pronoun. This suggests that the stay in France, and especially informal contact, favours the use of on instead of nous.

Figure 6 Results nous vs on: Proportion of nous in learner data.

4.3. Discourse markers

To identify DMs in our dataset, we adopted the identification criteria of Crible and Degand (2019), presented above in Section 2.3, which they applied in a corpus-based study of DM use in informal L1 French conversations (in which they identified 33 DM types). Using a detailed analysis protocol based on these criteria, we identified 30 DM types across the L2 learner, LANGSNAP B and LECTURES corpora (we did not attempt an analysis of DMs in FLEURON). The protocol is available on OSF. It allowed us to distinguish between DM and non-DM uses of some items, as in the examples in Table 2. Note the tendency for DMs to occur in clusters, which is also evident in these examples.

Table 2

Examples of DM and Non-DM use.

Item: bon
    DM use:      ah bon alors maintenant qu’est-ce qu’il est devenu?
                 ‘ah well now what has become of him?’
    Non-DM use:  le bon choix; c’est bon
                 ‘the right choice; it’s good’

Item: là
    DM use:      du coup là je pense appeler la police
                 ‘so now I’m thinking of calling the police’
    Non-DM use:  depuis que je suis là
                 ‘since I’m here’

The Crible and Degand (2019) definition includes as DMs a group of high-frequency conjunctions (et ‘and’, mais ‘but’, ou ‘or’, parce que ‘because’), which we excluded from further analysis as they showed little variation across the various corpora. We also excluded non-lexical items such as euh, bah, beuh (as do Crible & Degand, 2019). We concentrated our analysis on a subset of DMs that met a minimum frequency threshold of 10+ tokens in at least one corpus. These are alors (que) ‘so that’, après ‘then’, ben ‘well’, bon ‘well’, d’accord ‘ok’, donc ‘so’, du coup ‘so, then…’, en fait ‘in fact’, enfin ‘well, finally’, et tout ‘and everything’, hein, là ‘then, at this point’, (et) puis ‘(and) then’, quoi ‘what’, sinon ‘otherwise’, and voilà ‘that’s it’.

We first present an overview of the range of DMs used within each corpus (with the L2 corpus broken down by data session). Figure 7 shows that the range of DM types used by participants increased from a low level pre-sojourn to a mean of 6 at Visits 2 and 3, but then declined again on return to the home setting. However, the mean range never approaches that of LANGSNAP B (M = 9.5) or of LECTURES (M = 8).

Figure 7 Number of different discursive markers (types).

Next, we present the sets of DMs most commonly used in the different corpora. Table 3 shows the DMs with 10+ tokens found in the LECTURES corpus.

Table 3

Most common DMs (10+ tokens) in LECTURES corpus (5 speakers, 13,245 words).

DM             Tokens   Per 1,000 words   Speakers
donc              179              13.8          5
hein              114               8.6          4
alors (que)        55               4.2          5
bon                32               2.4          3
(et) puis          21               1.6          4
là                 14               1.1          4
enfin              11               0.8          3

Within the LECTURES corpus, just 7 DMs had frequencies of 10 or more; the markers hein and là are unique to this corpus. Inter-lecturer variation was high; for example, a single lecturer contributed 88/114 tokens of hein. The range of DM types used by each individual speaker was somewhat lower than in LANGSNAP B (see Figure 7). Most DMs with 10+ tokens in the LECTURES corpus (bolded in Table 3) also had 10+ tokens in the LANGSNAP B corpus (Table 4). Extract (2) below provides a flavour of DM use in the lecture setting.
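The normalised frequencies reported alongside the raw token counts appear to be rates per 1,000 words (the unit used in the Discussion). As a hedged illustration, the sketch below recomputes a few of the LECTURES values from the token counts and the corpus size given in the table caption. The rounding to one decimal place and all names in the code are assumptions made for this example, not a description of the authors' procedure.

```python
# Corpus size from the Table 3 caption; token counts from the same table.
# Names and the one-decimal rounding are assumptions made for this sketch.
LECTURES_WORDS = 13_245
LECTURES_TOKENS = {"hein": 114, "alors (que)": 55, "bon": 32, "enfin": 11}


def per_thousand(tokens: int, corpus_words: int) -> float:
    """Normalise a raw token count to a rate per 1,000 words of corpus."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be positive")
    return round(1000 * tokens / corpus_words, 1)


normalised = {dm: per_thousand(n, LECTURES_WORDS) for dm, n in LECTURES_TOKENS.items()}

for dm, rate in normalised.items():
    print(f"{dm}: {rate} per 1,000 words")
```

Normalising in this way is what makes the counts comparable across corpora of different sizes, such as LECTURES (13,245 words) and the L2 learner corpus (47,353 words).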

Table 4

Most common DMs (10+ tokens) in the LANGSNAP B corpus (10 participants; 14,882 words).

DM             Tokens   Per 1,000 words   Participants
donc              185              12.4             10
enfin              64               4.3              8
voilà              64               4.3             10
ben                39               2.6              7
en fait            39               2.6              7
alors (que)        34               2.3             10
du coup            32               2.2              5
sinon              32               2.2              8
après              28               1.9              9
puis               27               1.8              6
bon                26               1.8              5
quoi               18               1.2              5
et tout            14               0.9              5

    (2) bon alors (.) pour aujourd’hui on va euh donc terminer (.) le chapitre 4 (.), bien alors petit rappel sur ce qu’on a fait la semaine dernière pour les happy few (.), euh donc je pense qu’on est dans la partie 1.2 (.), non partie 1.1 (.), je vous rappelle la perspective du chapitre (.), donc (.) encore une fois (.) ce qu’on cherche à comprendre euh (.) c’est (.) bon c’est un phénomène dont on a parlé à plusieurs reprises (.), c’est le phénomène du chômage persistant (.) hein [ECONOMICS, Mangiante & Parpette, 2011].
        ‘fine then (.) for today we will euh so finish (.) chapter 4 (.), well then a little reminder on what we did last week with the happy few (.), euh so I think we are in section 1.2 (.), no section 1.1 (.), I’ll remind you about the point of view of the chapter (.) so (.) once again (.) what we are trying to understand euh (.) it’s (.) fine it is a phenomenon we have spoken about several times (.) it is the phenomenon of persistent unemployment (.) eh’

Table 4 shows the range of DMs with 10+ tokens found in the informal L1 interviews of LANGSNAP B (13 in all). The small number also found in LECTURES have been bolded; the rest seem distinctive to informal conversation. Therefore, the range of DM types used was higher than in LECTURES, with a mean of 9.9 DM types per speaker (see Figure 7); intra-speaker variation was also high, both for DM selection and for DM frequency. Extract (3) illustrates DM use from this corpus.

    (3) et en fait euh mon propriétaire, bon je vais vous raconter ma vie, il a quatre maisons qui sont sur cette même rue, donc tout est à côté, donc en fait on est tous ensemble, enfin pas tous ensemble mais une bonne bande voilà (P138, L1)
        ‘and in fact my landlord, well I’ll tell you my life, he has four houses on this same street, so everything is next door, so in fact we’re all together, well not all together but a good group here’

Table 5 lists the DMs found in the L2 learner corpus, with a frequency of 10 tokens or more. Comparisons with Tables 3 and 4 suggest that learners had a preference for DMs typical in the L1 French of the more formal lecture register. They also made relatively frequent use of the DM juste, perhaps influenced by its English cognate just. Figure 8 provides a more detailed picture of the DMs used by individual learners at each data-collection point. Individual learners used a reasonable range of DMs (with a mean range of 11 marker types, over all 6 interviews). However, the figure also shows a tendency to rely heavily on a relatively small number of DMs, making infrequent use of others. For example, P105 used donc 121 times, which dwarfed the rest of her DM usage (56 tokens across all other DMs she used). Extract (4) illustrates reliance in L2 speech on a relatively limited number of DMs.

Table 5

Most common DMs in the French L2 learner corpus (6 participants, 47,353 words).

DM             Tokens   Per 1,000 words   Participants
donc              327               6.9              6
alors             138               2.9              4
enfin             100               2.1              4
après              88               1.9              6
en fait            84               1.8              5
bon                61               1.3              4
et tout            36               0.8              3
voilà              32               0.7              4
ben                23               0.5              4
quoi               22               0.5              4
(et) puis          17               0.4              5
d’accord           10               0.2              4
sinon              10               0.2              4
du coup            10               0.2              2

Figure 8 Tokens for individual DMs used by learners over time.

    (4) INV: alors pour commencer qu’est-ce qui s’est passé depuis ta dernière visite en février ?
        P119: d’accord euh beaucoup en fait, euh quand je suis revenue à City j’ai décidé que je n’avais… je n’étais pas très content avant, comme j’ai dit, euh hum parce que la situation et les choses comme ça, donc j’ai décidé que je dois profiter des dernières semaines à City, donc euh j’ai rencontré les autres amis, et j’ai rencontré les autres amis français qui est très important pour moi, euh il y a en fait une une petite groupe.
        ‘INV: so first of all what’s happened since you last came back in February?
        P119: okay um a lot actually, um when I came back to the City I decided that I didn’t… I wasn’t very happy before, like I said, um because the situation and things like that, so I decided that I have to enjoy the last few weeks in the City, so um I met the other friends, and I met the other French friends which is very important to me, um there’s actually a small group.’
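The reliance on a single marker noted above for P105 (121 tokens of donc against 56 tokens across all her other DMs) can be expressed as a share of her total DM tokens. This is a minimal sketch of that arithmetic; the function name is an illustrative assumption, and the measure itself is not one reported in the study.

```python
def top_marker_share(top_tokens: int, other_tokens: int) -> float:
    """Return the percentage of all DM tokens accounted for by the most frequent DM."""
    total = top_tokens + other_tokens
    if total == 0:
        raise ValueError("no DM tokens to compute a share from")
    return 100 * top_tokens / total


# Counts for P105 as reported in the text: donc vs. all other DMs combined.
p105_donc_share = top_marker_share(121, 56)
print(f"P105: donc accounts for {p105_donc_share:.1f}% of DM tokens")
```

On these counts, donc makes up roughly two thirds of P105's DM tokens, which puts a figure on the description of heavy reliance on a small number of markers.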

5. Discussion

Our analysis of three reference corpora confirmed the existence of variable usage in different French speech genres (research question 1). The most informal genre (L1 student conversation, LANGSNAP B) was characterized by very low ne retention (10%), consistent use of on as first-person-plural subject pronoun (100%), and a wide range of discourse markers (13 items with a frequency of 0.9 or above per 1,000 words). The Lectures genre was characterized by relatively high ne retention (50%), some on/nous alternation (4% nous), and a more limited set of discourse markers (7 with frequencies of 0.8 or higher per 1,000 words). The administrative FLEURON corpus showed moderate ne retention (30%), and the greatest use of nous (10%). These patterns broadly confirm other accounts of variation in contemporary French (Beeching et al., 2009; Crible & Degand, 2019; Massot & Rowlett, 2013; Riegel et al., 2021).

As a group, the six case-study learners showed an overall tendency to move toward informal speech norms during SA (decline in use of ne and of nous, from high starting levels; some increase in range of discourse markers). This change is in line with findings from other studies for ne and for nous/on (Dewaele, 2004; Regan et al., 2009), and with DM studies concerned with naturalistic exposure more generally (Sankoff et al., 1997; Lyrigkou, 2021). The strongest short-term change was evident at the first interview in France. In addition to confirming this overall trend, however, our approach allowed us to propose a clear relationship between participants’ individual input situation and movement towards spoken French norms (research question 2). For the three participants with strong or strong and mixed input profiles, for example, use of ne declined below 25% during the sojourn, while for two of the three participants with weak input it remained above 40%. Those learners who used nous/on in interviews showed a large decline for nous during their stay. This change was maintained once back in the UK, except for P121 (the weakest of all for input). Unlike in the sole previous SA study by Arvidsson et al. (2019), the input situation also impacted DMs: DM use by L2 learners with strong or strong and mixed input was high in both types and tokens, while the learners with weak input mainly relied on the most frequent marker, donc. However, the DM types most typically used by learners reflected the more formal choices found in LECTURES, rather than the informal choices of LANGSNAP B (which may also partly explain the findings of Arvidsson et al., 2019). These findings add new detail to the more general findings in the SA literature, where aspects of the input situation have repeatedly been shown to influence overall fluency, pragmatic and lexical development, but where it has proved difficult to demonstrate more fine-grained linguistic impact (Llanes, 2011; Yang, 2016).

6. Conclusion

From a methodological perspective, this study shows that analyses of relevant reference corpora provide useful insights into the types of input that are likely available to SA learners and that prompt them to develop control of features in spoken French. The corpora available for a language such as French are still limited, however, and there is clearly scope for creation of larger and more varied corpora reflecting the full range of SA settings, which would have positive applications in advanced language pedagogy as well as in research.

Our findings demonstrate a convincing relationship between the input situation and development of less formal speech features. High grammatical proficiency does not predict control of variation and vice versa. Instead, it is clear that to develop control of speech features, learners need experience with a range of oral genres. When abroad, students need encouragement to develop the sense of agency, persistence and resilience that will facilitate access to different genres. Given the decline in some informal features that was evident following return to the UK, it also seems clear that to maintain control of these features, learners need ongoing exposure to less formal genres. A corpus such as FLEURON provides a positive example of the kind of pedagogical materials that could promote multi-genre exposure at home. Testing the effect of such material could be the focus of future research.


