1. Introduction

Students of different ages learning a foreign language or second language (L2) at different levels experience that speaking fluently is hard to achieve and that anxiety while speaking can be an intense experience (). Anxious students experience more difficulty demonstrating the skills and knowledge that they possess (). A body of research including meta-analyses has shown that foreign language anxiety (FLA) is negatively related to foreign language or L2 performance, including speaking performance (; ; ). However, most of the research on the relationship between FLA and language performance is correlational in nature, which means that there is little evidence that FLA causally affects language performance. It has been noted (see , for an overview) that the relationship is most likely bi-directional in nature: anxiety causing difficulties in performance and learning as well as anxiety as a result of problems encountered during performance and learning. Because anxiety has been theorized to use up cognitive and attentional resources (), fluency in speaking (speaking rate and pausing) would particularly be affected by diminished cognitive resources. The current study, thus, (1) experimentally manipulates FLA within participants to directly investigate the effect of anxiety on speaking fluency and (2) compares reflections by learners on their dysfluencies in a low versus high anxiety condition to investigate how increased anxiety is related to cognitive processes during speaking. Comparing the reflections in the two conditions will provide more insight into how the cognitive processes of conceptualizing, encoding, and monitoring (), which underlie speech production, are affected by feelings of anxiety.

2. Background

2.1. L2 fluency

In teaching and evaluating L2 speech production, both educators and researchers consider speech fluency, as it reflects an L2 learner’s ability to formulate a message in their L2, within strict time constraints. Speech fluency reflects the ease of production and efficient, automatic language processing (). Lennon () distinguished two definitions of fluency: the broad sense, in which fluency refers to an L2 learner’s general oral proficiency, and the narrow sense, in which fluency is a component of oral proficiency. The current study will adopt Lennon’s narrow sense of fluency and Segalowitz’s () operationalisation for L2 fluency.

Segalowitz () divides fluency into three subdomains: cognitive fluency, utterance fluency, and perceived fluency (). Cognitive fluency involves the speaker’s ability to efficiently carry out underlying cognitive processes that enable speech production. Cognitive fluency results, at least in part, from a speaker’s underlying L2 knowledge and processing abilities. The second dimension, utterance fluency, can be measured objectively from speech. It is often operationalised quantitatively from speech characteristics, such as speech rate, pausing, and dysfluencies. Thirdly, perceived fluency encompasses inferences made by listeners based on their perceptions of a speaker or speech sample. Previous research has shown that ratings on fluency (perceived fluency) are strongly related to different aspects of utterance fluency (). The relationship between cognitive fluency and utterance fluency is less thoroughly researched.

To investigate this relationship, Segalowitz and Freed (), De Jong et al. (), and Kahng () related utterance fluency measures to aspects of cognitive fluency quantitatively, such as (L2-specific) lexical access speed, lexical selection, and articulation. Kahng () included a qualitative methodology in her study and compared stimulated recall on aspects of fluency by lower and higher proficient speakers in their L2. Her study shows that aspects of L2 utterance fluency reflect processing difficulties in speech production. Moreover, the study found that higher proficiency learners reported different cognitive processes of their L2 performance than lower proficiency learners. Higher proficient learners reported conceptual difficulties resulting in disfluencies, whereas lower proficient learners mentioned difficulty in formulation leading to disfluencies in their L2 performance. Kahng () thus showed how a stimulated recall method can elucidate (differences in) cognitive processes leading to disfluencies in speaking by L2 speakers.

2.2. FLA

Building on a large body of literature, FLA is characterised as a type of anxiety that is situation-specific and linked to a language-learning setting (). The Foreign Language Classroom Anxiety Scale (FLCAS) () is the most widely adopted means of both capturing and measuring this type of anxiety among L2 learners ().

A consensus has been reached about the negative impact of L2 anxiety on L2 language achievement. In meta-analyses (; ; ), L2 anxiety is negatively associated with L2 language achievement, even more so with learner’s self-perceived achievement (). More recent research presents FLA as a type of anxiety with both internal (personal) and external (social, circumstantial) dimensions, moving away from the purely situation-specific type of construct and investigating other (emotional) factors that interact with FLA on L2 performance, such as foreign language enjoyment (FLE) and willingness to communicate (; ; ; ; ).

Other approaches looking into the impact of FLA on L2 performance were argued for by MacIntyre (). He raises “[…] a pressing need for additional experiments to help clarify causal connections between language anxiety and performance” (). Also, he elaborates on how a more dynamic approach toward investigating the effect of FLA along with other interactive factors on language learning and development can help achieve this. This dynamic approach can include both qualitative and quantitative data, ranging from emotional, behavioural, cognitive, and physical systems. Adopting such an idiodynamic approach, Boudreau et al. () investigated the relationship between anxiety and FLE within individuals on a second-by-second basis over a period of two minutes and qualitatively investigated to what the participants attributed fluctuations in their emotions. They found that the fluctuations in emotion varied highly between individuals and that individuals were able to reflect on these fluctuations, which led to a richer understanding of the role of emotions in language performance.

Dewaele and Dewaele () also investigated both FLE and FLA. They used a within-participants design (N = 40) comparing emotions during sessions by two teachers. They found that FLE was influenced by the teacher, whereas FLA was stable across teachers. These findings suggest that FLA is more strongly linked to learner-internal factors, as opposed to FLE being more strongly linked to learner-external factors.

2.3. Previous research relating L2 fluency to FLA

Concerning speaking and drawing upon Processing Efficiency Theory (; ), Kormos () theorised how anxiety results in depleting cognitive and attentional resources (e.g., affecting working memory and the inhibition and switching functions of attentional control). Following from this, Kormos () made further specific predictions in that high levels of anxiety may be related to more effort in retrieving words from the mental lexicon and encoding syntactic structures in the L2. In addition, inhibiting the activation of L1 items and constructions may be less efficient. Finally, efficient switching between the different processes of conceptualization, formulation, and monitoring (see Levelt’s () model of speech production) might also be more demanding when anxiety is high. In turn, higher levels of anxiety are hypothesized to be related to lower levels of fluency during speaking performance (; , ; ; ; ).

This section continues to review the literature that has investigated the claim that anxiety negatively influences speaking fluency empirically. MacIntyre and Gardner () were among the first to examine the extent of the negative effect of FLA on cognitive language processing in L2 learners. They distinguished between an input, processing, and output stage for anxiety and performance measures. In a sample with 97 first-language (L1) English L2 French learners, they found that at all stages, (output) anxiety correlated significantly and negatively with most of the performance measures, including L2 speaking fluency. Also as predicted, input and processing anxieties were significantly related to performance at the input and processing stages of tasks. They thus concluded that the effects of FLA build up across cognitive processing stages and affect both learners’ L2 knowledge and their ability to demonstrate it.

Pérez Castillejo () specifically investigated the link between anxiety and utterance fluency through correlational and regression analyses. Thirty-eight L1 American English learners of L2 Spanish completed the FLCAS, performed a Spanish-elicited imitation test to measure overall L2 oral proficiency, and were recorded during their final oral exam of a Spanish college course. These final oral exams were manually analysed on utterance fluency measures for breakdown and speed. Pérez Castillejo () found that FLA predicted the utterance fluency measures significantly, whereas proficiency, measured through elicited imitation, did not. Concerning the relation between FLA and fluency, results showed that, compared to the lower anxious learners, highly anxious learners paused more mid-utterance and had shorter runs of speech between pauses, longer pauses between utterances, and a lower phonation time ratio. Pérez Castillejo suggested that FLA interfered with speech processing by hindering formulation and encoding more than conceptualization of the message, although this conclusion did not entirely follow from the finding that pause durations between AS units were longer for speakers with higher anxiety, a finding suggesting that the conceptualization stage in speaking may also be influenced by anxiety.

Bielak () followed up on Pérez Castillejo () and included measures of FLE. He studied FLE, FLA, and fluency in a sample of 43 proficient L1 Polish L2 English majors on a creative decision-making oral task. With respect to fluency, the results tentatively suggest that FLA is a somewhat stronger predictor of fluency than FLE, as more of the correlations between FLA and the fluency measures reached significance than of the correlations between FLE and the fluency measures.

Lastly, Aubrey () investigated the relationship between anxiety, enjoyment, and pausing in speech (breakdown fluency) with idiodynamic ratings and stimulated recall interviews. In a sample of 4 L1 Cantonese advanced L2 English speakers, Aubrey found that anxiety and enjoyment both independently fluctuated in participants’ speaking performance at a per-second timescale and that feelings of anxiety impacted fluency breakdown more than enjoyment. From the stimulated recalls it was found that most participants reported difficulty in remembering words, an issue of linguistic formulation, which was related to peaks in anxiety, lowered enjoyment, and increased pausing.

To summarize, the aforementioned studies have established a link between FLA and L2 fluency, and in the literature it is assumed that FLA impedes L2 learners in accessing L2 knowledge and poses constraints on efficiency in cognitive fluency, mostly on encoding and formulation of L2 processes. However, the relationship between FLA and fluency has not been investigated directly or experimentally by manipulating FLA. The aforementioned studies, moreover, have not yet investigated how L2 learners’ FLA levels impact L2 cognitive fluency, except perhaps for MacIntyre and Gardner (), who showed that different stages of processing are affected by FLA.

3. The current study

The current mixed-methods study answers MacIntyre’s () call to clarify causal connections between FLA and language performance, focusing on speaking fluency and adopting the dynamic approach to FLA (; ). We (1) experimentally manipulated FLA in a within-participants design (as in ) and (2) asked participants to reflect on their disfluencies using a stimulated recall method in order to tap into cognitive fluency, following Kahng (). With this setup, we can shed more light on how FLA interferes with L2 processing during L2 speech production. We aim to answer the following research questions:

  1. To what extent do changes in FLA lead to changes in L2 learners’ L2 utterance fluency?
  2. To what extent do changes in FLA lead to changes in L2 learners’ L2 cognitive fluency?

3.1. Method

3.1.1. Overall design and experimental manipulation

The current study adopted a within-subjects experimental design with two conditions, manipulated through grade and interlocutor. Both the assessment () and the interlocutor () may trigger feelings of anxiety in L2 learners. In the high anxiety condition, participants performed a speaking task with an external teacher (also referred to as experimenter or researcher in this section) that would be graded, whereas in the low anxiety condition, participants performed a speaking task with a peer that would only affect their grade minimally.

Because the participants in this study were underage, parental consent was obtained through the secondary school. Parents and guardians were informed about the research through a letter. They could object to letting their children participate in the experiment, but no guardian or parent did so. All procedures were discussed with the secondary school beforehand, all obtained data was anonymised directly and privacy of data during storage was ensured. To carry out the experiment and uphold the manipulation in the two conditions, participants were not explicitly told that they could withdraw from the experiment while taking part. Immediately after the experiment, however, all participants were fully informed and they were told that their performances did not result in an actual grade. Because speaking was planned to be assessed as part of the curriculum after the experiment, the speaking tasks in the experiment served as extra practice for the participants.

3.1.2. Participants

Due to data loss (no recording, poor recording quality, background noise), seven of the thirty participants were excluded. One native speaker of English was additionally excluded. The 22 remaining participants were all Dutch L2 English learners (12 men, 10 women) with a mean age of 15.50 years (SD = 0.78). They were in their fourth year of secondary school at the pre-university level. One student reported Polish as L1 and had started learning Dutch at the age of two. Their English proficiency level was at B1/B2, according to their English teacher; students started learning English at a mean age of 8.07 years (SD = 3.48) with on average 7.27 years of exposure (SD = 3.36). For the follow-up stimulated recall experiment, seven students chose to participate and they received a gift card (€5) as compensation.

3.1.3. Materials

Participants completed two speaking tasks, two adjusted FLCAS questionnaires, and a background questionnaire. Seven students also participated in stimulated recalls.

The first speaking task asked participants to compare three strategies for studying, shown in pictures on the computer screen. The second speaking task was a scene-building exercise. Both participants saw an outcome picture on their screen (i.e., coffee being spilt) and each participant had different pictures that they could use to tell a story that led to the same outcome. The tasks (see appendix A on https://osf.io/qdr9t/) were designed and delivered in Qualtrics (). In both tasks, when participants were interacting with the experimenter, the interaction was minimized because the experimenter first took a long turn, after which the participant started with their long turn.

The speaking tasks used everyday situations and topics (e.g., coffee being spilt, studying styles) that could be done as simply or elaborately as the participants wanted. Therefore, these tasks could be seen as somewhat level-independent and task complexity would not play a role in speaking performance.

The background questionnaire, also administered through Qualtrics, was based on the LEAP-Q questionnaire () and asked participants about their language experience and proficiency (i.e., which languages they mastered, their exposure to learned languages, and the age of onset of learning those languages, see Appendix B on https://osf.io/qdr9t/).

The Qualtrics survey also contained an adjusted FLCAS () to obtain information about perceived anxiety levels by all participants in both experimental conditions. Participants needed to indicate on a five-point Likert scale whether they completely disagreed, disagreed, were neutral, agreed, or strongly agreed with the statements. In addition, the questionnaire included a question about their own perceived performance in order to be able to replicate one of the strongest correlates with anxiety (self-perceived performance, see ). To minimise the effect of participants becoming aware of the research topic, the questions were called reflection questions.

Six items out of the 33-item FLCAS were included in the survey (see Appendix C on https://osf.io/qdr9t/), which specifically tapped into the speaker’s experience while speaking in English. Excluded questions involved the English classroom setting or atmosphere in general, fear of negative evaluation, trait anxiety, and teacher interaction. This way, we only asked about anxiety during speaking and drastically shortened the questionnaire. For the sake of comprehensibility, the reflection questions were in Dutch.

3.1.4. Procedure

Before commencing the experiment, the experimenter and two students joined a video call in Microsoft Teams and carried out sound checks in the Qualtrics survey. The experimenter supervised and guided the students through the entire procedure via Microsoft Teams. When a participant was not involved in a speaking task (i.e., when their peer was doing a speaking task with the experimenter), they were asked to hang up and were later invited to join the video call again.

Firstly, participants performed a speaking task either with a peer or with the researcher. After this speaking task, participants filled out the adjusted shortened FLCAS. Then, the other speaking task was performed with the other interlocutor followed by the questionnaire. The speaking tasks per condition and the order of conditions were both counterbalanced across participants.

Students were told that the speaking task with the experimenter would be graded, whereas the one with the peer was for practice and only counted minimally for their grade. To emphasize the difference between the two conditions, the experimenter interacted little with the student and used only a few supportive fillers or hedges. Students were asked to speak for a couple of minutes. Within the Qualtrics survey, there was a time constraint of five minutes for both speaking tasks. In total, the whole procedure took approximately 15 minutes per duo.

3.1.5. Materials and procedure for stimulated recalls

Stimulated recall is an introspective method that can be conducted retrospectively. In our experiment, participants were presented with their recorded speech sample and reflected on it, leading to information on the speech production process that quantitative data cannot reveal (; ).

Following Kahng (), the stimulated recalls were obtained by playing back recorded speaking tasks to the participants (n = 7, 4 men, 3 women). Due to the set-up of the experiment, none of the participants performed the stimulated recalls immediately after their speaking tasks. Instead, they performed them between six and 24 hours later.

To ensure that the participants could fully explain themselves, the stimulated recalls were conducted in their L1, Dutch. The experimenter explained that the students were going to comment on their thinking processes during speaking, in particular on what they were thinking while pausing or hesitating. The recording was paused when there was a silent pause, a filled pause, or a hesitation on which students could comment. Both the experimenter and participant could give directions to pause the recording. The experimenter’s interference was reduced to only pointing out pauses. The stimulated recalls between participant and experimenter took place using the screen-sharing function in Microsoft Teams. The participants could see the sound waves of their recordings in the phonetics computer program PRAAT () and they could see when the recording was put on pause. The participants’ responses were recorded on a mobile phone by the participant and sent through Microsoft Teams to the experimenter.

Four participants had started with the high anxiety condition and three participants had started with the low anxiety condition in the speaking tasks, and this order was adopted in the stimulated recall procedure. The stimulated recall procedure took about 25 minutes.

3.1.6. Analysis

Participants’ voices in both speaking tasks were recorded using microphones on their headphones. The WAV files were edited in PRAAT () so that the instructions from the researcher at the beginning and end of the audio files were cut out of all recordings. Likewise, long silences around turn takes and backchannels were deleted. These only occurred in the low anxiety condition. After thus editing the audio files, automatic measurement of fluency was conducted in PRAAT using two scripts described in De Jong et al. (). These scripts indicate syllables, filled pauses, and silent pauses automatically and save them into a so-called TextGrid. The threshold for silent pauses was set to 250 ms or longer ().

After this automatic analysis, the TextGrids were checked, marked and coded manually for all temporal measures (i.e., silent pauses, filled pauses, repetitions, and corrections). The selection of temporal utterance fluency measures was based on Kahng () and De Jong et al. (). The following transcription of an excerpt from a participant in the experimental condition shows how repetitions and corrections were marked:

“Okay, I see eh pictures of a cat ehm that’s laying and eh by a window and looking at the ca- at the camera. Eh, the second picture is the picture of a b- of a bed in a bedroom with a lot of stuff in it- on it, like eh pillows and a blanket and just overall a lot of stuff.”

“at the ca- at the camera” and “the picture of a b- of a bed” are both marked as repetitions, because words or parts of words are repeated by the participant. In contrast, “a lot of stuff in it- on it” is marked as a correction, as the preposition is changed and corrected.

After recoding two positively worded items, such that higher scores on all items now indicated higher anxiety, we calculated internal consistency (summability, ) and reliability (Cronbach’s alpha) of the items in the FLA questionnaires. Then, we carried out a manipulation check, comparing the averages of the FLA questions in both conditions. As an additional manipulation check, we compared the self-perceived performance. To investigate the effect of the manipulation on utterance fluency measures, paired-sample t-tests were carried out, comparing the two conditions on all utterance fluency measures mentioned in Table 1. To answer the first research question (pertaining to the extent that changes in FLA lead to changes in L2 learners’ L2 utterance fluency) in another way, we also calculated difference measures for additional analyses. We calculated these difference (D-) measures between conditions per participant for anxiety on the one hand, and for all fluency measures, on the other, by subtracting scores/measures for the low anxiety condition from the high anxiety condition. The manipulation would likely lead to heightened levels of anxiety, but not for all participants to the same degree. For instance, some participants may not be very impressed by the grading or by the interlocutor and her unsupportive behaviour. Establishing the correlation between the degree of heightened anxiety and the degree of differences in fluency is thus an additional way to answer the first research question.
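The recoding and reliability steps described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the responses are made up, the positions of the two positively worded items (`POSITIVE_ITEMS`) are hypothetical, and only Cronbach's alpha is shown (the summability index is not reproduced here).

```python
from statistics import variance


def reverse_code(score, low=1, high=5):
    """Mirror a Likert score so that a higher value always means more anxiety."""
    return low + high - score


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical zero-based positions of the two positively worded items.
POSITIVE_ITEMS = (1, 4)

# Made-up responses of five participants to six items (five-point scale).
raw = [
    [2, 4, 2, 1, 5, 2],
    [4, 2, 4, 3, 2, 4],
    [3, 3, 3, 3, 3, 2],
    [5, 1, 4, 5, 1, 5],
    [1, 5, 2, 1, 4, 1],
]

recoded = [
    [reverse_code(s) if i in POSITIVE_ITEMS else s for i, s in enumerate(row)]
    for row in raw
]
alpha = cronbach_alpha(recoded)
mean_anxiety = [sum(row) / len(row) for row in recoded]  # one score per participant
```

The per-participant mean in `mean_anxiety` corresponds to the single anxiety score per condition that the manipulation check compares.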

Table 1

Overview of utterance fluency measures.


| UTTERANCE FLUENCY | MEASURE | DEFINITION AND OPERATIONALISATION |
|---|---|---|
| Speed | Mean syllable duration | Total spoken time/total number of syllables |
| Pausing | Number of silent pauses | Number of silent pauses/spoken time |
| | Number of filled pauses | Number of filled pauses/spoken time |
| | Mean duration of silent pauses | Total silent time/number of silent pauses |
| | Mean duration of filled pauses | Total filled pause time/number of filled pauses |
| Repair | Number of corrections | Number of corrections/spoken time |
| | Number of repetitions | Number of repetitions/spoken time |
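The definitions in Table 1 amount to a handful of ratios. The sketch below computes them for one speech sample; it is an illustration under assumptions rather than the PRAAT scripts used in the study. The `SpeechSample` container and its field names are hypothetical, the example values are made up, and "spoken time" is simply taken as given because its exact delimitation is not spelled out in the table. The 250 ms threshold for silent pauses is taken from the analysis description.

```python
from dataclasses import dataclass
from typing import Dict, List

SILENT_PAUSE_THRESHOLD = 0.250  # seconds; shorter silences are not counted as pauses


@dataclass
class SpeechSample:
    spoken_time: float          # seconds, the denominator used in Table 1
    n_syllables: int
    silences: List[float]       # duration of every detected silence, in seconds
    filled_pauses: List[float]  # duration of every filled pause, in seconds
    n_corrections: int
    n_repetitions: int


def _mean(values: List[float]) -> float:
    """Mean of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def utterance_fluency(s: SpeechSample) -> Dict[str, float]:
    """Return the seven Table 1 measures for one sample."""
    silent = [d for d in s.silences if d >= SILENT_PAUSE_THRESHOLD]
    return {
        "mean_syllable_duration": s.spoken_time / s.n_syllables,
        "silent_pauses_per_s": len(silent) / s.spoken_time,
        "filled_pauses_per_s": len(s.filled_pauses) / s.spoken_time,
        "mean_silent_pause_duration": _mean(silent),
        "mean_filled_pause_duration": _mean(s.filled_pauses),
        "corrections_per_s": s.n_corrections / s.spoken_time,
        "repetitions_per_s": s.n_repetitions / s.spoken_time,
    }


# Made-up sample: 60 s of speech, 200 syllables, three silences (one below threshold).
sample = SpeechSample(60.0, 200, [0.10, 0.50, 1.00], [0.20, 0.40], 2, 3)
measures = utterance_fluency(sample)
```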

Finally, to answer the second research question on how anxiety leads to changes in cognitive fluency, all stimulated recalls were transcribed verbatim. The transcripts were then categorized based on Kahng’s () categories, which are content of message, vocabulary, grammar, phonology, and other issues. The categorized responses were counted and compared per experimental condition using a chi-square test. Representative examples of the different categories in the two conditions are reported and described in the results section.

3.2. Results

3.2.1. Quantitative analyses: Utterance fluency and anxiety measures

An analysis of the summability of the questionnaire items on FLA revealed that questions 1 through 6 showed consistency (summability of 0.41) and can be summarized into a single score. Reliability was also sufficiently high (Cronbach’s alpha = 0.81). Table 2 lists the overview of the utterance fluency measures, as well as the mean score for anxiety, separate for the two conditions.

Table 2

Overview of utterance fluency and questionnaire scores across anxiety conditions and paired t-test results.


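| MEASURE | HIGH ANXIETY (n = 22) M | HIGH ANXIETY (n = 22) SD | LOW ANXIETY (n = 22) M | LOW ANXIETY (n = 22) SD | t-VALUE | p-VALUE | d-VALUE |
|---|---|---|---|---|---|---|---|
| Utterance fluency | | | | | | | |
| Mean syllable duration | 0.304 | 0.040 | 0.301 | 0.043 | 0.50 | 0.621 | 0.11 |
| # silent pauses | 0.501 | 0.116 | 0.504 | 0.258 | 1.21 | 0.240 | 0.26 |
| # filled pauses | 0.345 | 0.185 | 0.274 | 0.148 | 2.73 | 0.013* | 0.58 |
| Mean duration of silent pauses | 0.990 | 0.263 | 0.875 | 0.193 | 1.76 | 0.093 | 0.38 |
| Mean duration of filled pauses | 0.186 | 0.093 | 0.155 | 0.104 | 1.61 | 0.121 | 0.34 |
| # corrections | 0.032 | 0.023 | 0.029 | 0.022 | 0.511 | 0.615 | 0.11 |
| # repetitions | 0.030 | 0.037 | 0.039 | 0.034 | –1.18 | 0.252 | 0.25 |
| Questionnaire | | | | | | | |
| Self-assessment | 6.400 | 1.132 | 6.909 | 0.714 | –2.48 | 0.021* | 0.53 |
| Anxiety | 2.992 | 0.743 | 2.561 | 0.710 | –4.03 | 0.001* | 0.86 |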

Note: Mean syllable duration and mean duration of silent and filled pauses reported in seconds; frequency of silent and filled pauses/repetitions/repairs reported per second.
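A paired-samples comparison of the kind summarised in Table 2 can be sketched as follows. This is an illustration with made-up scores, not the study's data or script. The effect size is computed here as the mean difference divided by the standard deviation of the differences, which equals t/√n; that formula is an assumption, although the reported values are consistent with it (for anxiety, 4.03/√22 ≈ 0.86). The p-value would follow from a t-distribution with n − 1 degrees of freedom and is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(high, low):
    """Paired-samples t statistic, Cohen's d for paired data, and degrees of freedom."""
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)            # sample SD of the pairwise differences
    t = mean_diff / (sd_diff / sqrt(n))
    d = mean_diff / sd_diff           # identical to t / sqrt(n)
    return t, d, n - 1


# Made-up filled-pause rates (per second) for four participants in each condition.
high_anxiety = [0.30, 0.40, 0.50, 0.60]
low_anxiety = [0.20, 0.20, 0.40, 0.40]

t_value, d_value, df = paired_t(high_anxiety, low_anxiety)
```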

Before conducting the inferential analyses, we ascertained the normality of all variables. Only for silent pauses per second could normality not reasonably be assumed, due to an outlier. In the reported analyses, this outlier is removed (note that the same conclusions would be drawn when performing non-parametric analyses with this participant included). Table 2 shows the outcomes of the paired-sample t-tests. First, it was confirmed that the high anxiety condition (conversation with the external teacher) led to higher feelings of anxiety compared to the low anxiety condition (p = 0.001; d = 0.86). Additionally, as expected, the self-perceived achievement was higher when conversing with a peer compared to with a teacher (p = 0.021; d = 0.53). Following Plonsky and Oswald’s () benchmarks for small (.60), medium (1.00) and large (1.40) effect sizes for within-participant measurements, we can conclude that the experimental manipulation led to a small to medium effect on perceived anxiety and a small effect on self-perceived achievement. Concerning the utterance fluency measures, a significant difference was found for filled pauses per second only (p = 0.013; d = 0.58), with more filled pauses per second in the high anxiety condition compared to the low anxiety condition (a small effect). None of the other differences reached significance.

To ascertain to what extent heightened anxiety led to lowered utterance fluency measures per participant, we calculated the difference measure (D-measure) for the average anxiety, as well as the utterance fluency measures for each participant. Indeed, participants varied in their perceived anxiety differences, with three participants even feeling more anxious in the low compared to the high anxiety condition. We checked the assumptions for carrying out correlations and found that normality could not be assumed for the D-measure of silent pauses per second, due to one outlier. Table 3 shows the results for the correlations, with this participant deleted for this particular correlation (note that including this participant and performing a nonparametric alternative led to the same conclusions). We follow Plonsky and Oswald’s () benchmarks for effect sizes for correlations (rs close to 0.25 small, 0.40 medium, and 0.60 large). Firstly, D-anxiety was strongly related to the difference measures in perceived performance, which can be seen as a validation check of the D-measures for anxiety. Concerning the fluency measures, there was a medium to strong relationship between heightened anxiety (higher D for anxiety) and more silent pauses per second (higher D for silent pauses per second). None of the other correlations reached significance.

Table 3

Correlations between D-measures of anxiety and of fluency.


| CORRELATION WITH D-ANXIETY | r-VALUE | p-VALUE |
|---|---|---|
| Utterance fluency | | |
| D-Mean syllable duration | –0.240 | 0.282 |
| D-Number of silent pauses | 0.492 | 0.023* |
| D-Number of filled pauses | –0.032 | 0.889 |
| D-Mean duration of silent pauses | 0.089 | 0.695 |
| D-Mean duration of filled pauses | –0.208 | 0.354 |
| D-Number of corrections | –0.332 | 0.131 |
| D-Number of repetitions | 0.233 | 0.296 |
| Questionnaire | | |
| D-Self-assessment | –0.649 | 0.001* |
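The D-measure analysis behind Table 3 can be sketched as follows: each D-measure is the high anxiety score minus the low anxiety score per participant, and the D-measure for anxiety is then correlated with the D-measure for each fluency variable. The numbers are made up and the function names are hypothetical; the sketch shows the computation only, not the study's results.

```python
from math import sqrt
from statistics import mean


def d_measure(high, low):
    """Per-participant difference: high anxiety condition minus low anxiety condition."""
    return [h - l for h, l in zip(high, low)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


# Made-up scores for five participants in each condition.
anxiety_high = [3.5, 3.0, 2.5, 4.0, 2.0]
anxiety_low = [2.5, 2.8, 2.7, 2.5, 2.0]
silent_high = [0.60, 0.50, 0.45, 0.70, 0.40]
silent_low = [0.50, 0.48, 0.50, 0.45, 0.41]

d_anxiety = d_measure(anxiety_high, anxiety_low)
d_silent = d_measure(silent_high, silent_low)
r = pearson_r(d_anxiety, d_silent)
```

A negative value in `d_anxiety` corresponds to a participant who felt more anxious in the low than in the high anxiety condition, a pattern the study reports for three participants.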

3.2.2. Stimulated recalls

Table 4 lists an overview of the number of responses in the different categories. It shows that overall, there were more reported issues in the high anxiety condition compared to the low anxiety condition, primarily issues regarding the content of participants’ messages. To compare the number of times issues of content versus form (collapsing over vocabulary, grammar, and phonology) were mentioned in both conditions, we performed a chi-square test. It turned out that the number of issues reported on content, compared to form, was significantly higher in the high anxiety condition (46 on content versus 14 on form) than in the low anxiety condition (17 on content versus 19 on form; χ2 = 7.39, p = .006). In what follows, for each category, typical responses are elucidated through translated examples (following transcripts of original task performances).

Table 4

Overview of reported stimulated recalls across anxiety conditions.


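| RESPONSE CATEGORY | HIGH ANXIETY (n = 7) # | HIGH ANXIETY (n = 7) % | LOW ANXIETY (n = 7) # | LOW ANXIETY (n = 7) % |
|---|---|---|---|---|
| Content of message | 46 | 73.0 | 17 | 38.6 |
| Vocabulary | 11 | 17.5 | 17 | 38.6 |
| Grammar | 1 | 1.6 | 2 | 4.5 |
| Phonology | 2 | 3.2 | 0 | 0 |
| Other | 3 | 4.8 | 8 | 18.2 |
| Total | 63 | 100 | 44 | 100 |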
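The content-versus-form comparison can be recomputed from the counts in Table 4 (form = vocabulary + grammar + phonology). The sketch below is an independent check, not the authors' script. The reported χ2 = 7.39 appears to correspond to a 2 × 2 test with Yates' continuity correction; that the correction was applied is an inference from the numbers and is not stated in the text. For one degree of freedom, the p-value can be obtained from the complementary error function.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=True):
    """Chi-square statistic and p-value (df = 1) for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(table[i][j] - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)  # continuity correction
            chi2 += deviation ** 2 / expected
    p = erfc(sqrt(chi2 / 2))  # survival function of chi-square with df = 1
    return chi2, p


# Rows: high anxiety, low anxiety. Columns: content, form (counts from Table 4).
counts = [[46, 14], [17, 19]]

chi2_corrected, p_corrected = chi_square_2x2(counts, yates=True)
chi2_uncorrected, p_uncorrected = chi_square_2x2(counts, yates=False)
```

With the correction, the statistic comes out at roughly 7.39, matching the reported value; without it, the statistic is larger (about 8.65), so the conclusion does not depend on the correction.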

Typically, participants reported that during pauses or hesitations, they were trying to work out what to say next. Responses that fit the category content of message occurred most frequently in the high anxiety condition (73.0%). In the low anxiety condition, an equal number of responses fell into the content and vocabulary categories (38.6% each). The following translated examples are prototypical for the content of message category:

  • 1. Speaking task performance participant 6 in high anxiety condition:
    eh, eh, okay, I maybe eh there was a person who has to go to his work and he eh was in a hurry because he eh overslept and the dog was still sleeping eh and he eh was eh maybe eh he-he-he couldn’t find his clothes so he eh messed up his whole eh bedroom and w-with the cat still laying there
    Translated response: “The runs of speech in which I do not say ‘uhm’ are the runs I had made up while saying ‘uhm’. So, during many of the ‘uhms’ that I say, I am thinking about how I should continue. Because honestly, I did not know where my story was going.”
  • 2. Speaking task performance participant 5 in low anxiety condition:
    well, can give eh distractions to you, because, ehm because you have eh much eh things going on there
    Translated response: “And here I think I was thinking about why it would be distracting.”

During pausing, particularly in the high but also often in the low anxiety condition, participants reported that they were trying to anticipate what to say next at the abstract level of content, rather than the formal level of language use. Nevertheless, participants did report formal difficulties, such as word retrieval. As already mentioned, in the low-anxiety condition, vocabulary issues and issues of content were reported equally often. With respect to issues reported on vocabulary, some participants drew upon translating from Dutch to English, as becomes evident from the following examples:

  • 3. Speaking task performance participant 2 in high anxiety condition:
    and the third picture is oh eh a clock which makes a lot of noise ‘cause you see the ehm yeah the sound waves eh by it so it means yeah that it’s ringing
    Translated response: “I had totally forgotten the word [Dutch ‘alarm’] in English and I was looking for a solution for this.”
  • 4. Speaking task performance participant 5 in low anxiety condition:
    I like ehm eh ehm take notes in my notebook and eh give things another colour because then I eh can really see the difference between eh things
    Translated response: “Yes, I remember this point. At this point, I was thinking whether I knew what the word [Dutch ‘highlight’] in English was, but I couldn’t think of it, so I had to think of something else.”

In addition, participants also explained that they were looking for words in English in general:

  • 5. Speaking task performance participant 4 in high anxiety condition:
    eh on the second picture people are – people are studying together which means that which means that you can work out problems together and come to insights you normally wouldn’t get alone
    Translated response: “At this point, I had to think about in what way I could say that together, you can arrive at ideas that you would not think of yourself. I was not translating from Dutch or anything but I was trying to find the words for this in English.”
  • 6. Speaking task performance participant 5 in low anxiety condition:
    I think that ehm eh learning eh with eh a computer is in eh eh is eh well can give eh distractions to you because ehm because you have eh much eh things going on there and eh not only the thing you are supposed to do I think
    Translated response: “At this point, I was thinking of a word. About how I would say that it is both disadvantageous and advantageous.”

Responses that fell into the grammar category appeared twice in the low-anxiety condition and once in the high-anxiety condition. In the high anxiety condition, participant 5 reported on the use of tense:

  • 7. Speaking task performance participant 5 in high anxiety condition:
    but ehm when eh the lady walked away the the dog eh came back to the man and ehm the man ha- didn’t see it coming the dog
    Translated response: “Yes, at this point I was not sure how I would say this. So I am thinking about if it should be past tense or if it could stay present tense.”

The other two grammar comments were in the low anxiety condition and had to do with word formation, for instance:

  • 8. Speaking task performance participant 1 in low anxiety condition:
    yes but do you think you could eh you-you’ll get eh better marks if you would eh put more ehm if you start summarizing or marking if you put more work in it
    Translated response: “I tried to transform a Dutch sentence into an English sentence but that did not go well.”

Only two participants in the high anxiety condition reported issues related to the phonology category:

  • 9. Speaking task performance participant 2 in high anxiety condition:
    okay I see eh pictures of a cat ehm that’s laying and eh by a window and looking at the ca- at the camera eh
    Translated response: “I forgot how to pronounce the word ‘camera’. Where you are supposed to put the stress.”
  • 10. Speaking task performance participant 6 in high anxiety condition:
    eh in the first picture of mine you see eh someone’s watching eh his watch
    Translated response: “This is going to sound very silly, because of watching and watch. So yes, but I had to say it anyway.”

Responses that could not be assigned to one of the categories were placed under the other label. For instance, some participants reported that they could not remember why they paused (n = 4) or explained that they were looking at the pictures without anticipating their speech (n = 2).

  • 11. Speaking task performance participant 5 in low anxiety condition:
    yeah well I-I don’t like eh study together bu-because I eh rather do it on my own
    Translated response: “Yes, I do not know actually. I think about what I wanted to say. I do not think I struggled with a word, anyway.”

In example (11), participant 5 tries to come up with an explanation for her pausing. However, in the first instance, she claims to have no memory, and therefore, it was categorized as other. In addition, in two instances, participants attributed the pausing to the turn-taking of their peer (note that participants listened to the full conversation in the stimulated recall). Finally, responses in this category involved participants reporting stress and not feeling comfortable (n = 2) and one response concerned external noise.

4. Discussion

This study sought to investigate the effect of FLA on utterance fluency and cognitive fluency. Anxiety was manipulated experimentally within participants by having learners interact with a peer where their performance would only slightly impact their grade (low anxiety condition) and by having the same participants interact with an external teacher where their performance would be graded (high anxiety condition). Utterance fluency was operationalized through measurements from their speaking performances and cognitive fluency was operationalized through stimulated recall on (dis)fluency episodes. Firstly, it was found that self-perceived anxiety was indeed affected as predicted. Additionally, self-perceived performance was, as expected, lower in the high anxiety condition.

Concerning utterance fluency, an effect of the manipulation was established on filled pause usage only. On average, the participants used more filled pauses when speaking with the external teacher than when speaking to a peer. Additionally, after calculating difference measures for both anxiety and utterance fluency, we computed correlations between these measures. This provided additional information to answer the first research question about how changes in FLA are related to changes in L2 learners’ utterance fluency. We found that participants who experienced larger differences in perceived anxiety tended to show larger silent pause differences (more silent pauses in the high anxiety condition compared to the low anxiety condition) than learners who experienced only a small difference in anxiety due to the manipulation. Both findings are in line with previous research, for instance with Pérez Castillejo (), who also showed that aspects of breakdown fluency were related to levels of anxiety.
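
The difference-score analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes Pearson's r (the source does not say which correlation coefficient was used), the helper names `difference_scores` and `pearson_r` are hypothetical, and all numbers are invented placeholders rather than data from the study.

```python
import math


def difference_scores(high, low):
    """Per-participant difference: high anxiety condition minus low anxiety condition."""
    return [h - l for h, l in zip(high, low)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL placeholder values for seven learners (not the study's data).
anxiety_high = [3.8, 4.2, 2.9, 3.5, 4.6, 3.1, 3.9]           # self-rated anxiety
anxiety_low = [2.1, 2.0, 2.5, 2.2, 2.4, 2.6, 2.3]
pauses_high = [0.42, 0.51, 0.30, 0.38, 0.55, 0.29, 0.44]     # silent pauses per second
pauses_low = [0.31, 0.33, 0.28, 0.30, 0.35, 0.27, 0.32]

anxiety_diff = difference_scores(anxiety_high, anxiety_low)
pause_diff = difference_scores(pauses_high, pauses_low)

r = pearson_r(anxiety_diff, pause_diff)
print(f"r between anxiety change and silent-pause change: {r:.2f}")
```

A positive r here would mean that learners whose anxiety rose more between conditions also showed a larger increase in silent pauses; as with any correlation, it says nothing about the direction of the effect.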

Concerning the qualitative analysis carried out for the second research question, which investigated how anxiety affects L2 cognitive fluency, this study found that in the high anxiety condition, participants reported most issues in the category content of the message. In the low anxiety condition, on the other hand, content-related and form-related issues were reported almost equally often. Our finding that heightened FLA primarily led to issues in the conceptualization stage runs counter to the hypothesis that FLA chiefly impacts encoding and formulation (; ; ; ), which has been supported by previous research (; ). The current results are more in line with Kormos’ () prediction that conceptualization can also be affected because switching attention between the different stages in speech production would become less efficient (including switching to and from the stage of conceptualization).

Whereas the number of issues reported on content versus form was different between the two conditions, we did not find qualitative differences in the reported issues. In other words, the type of issues related to content did not differ when speaking in either condition, and neither did the type of issues related to form differ qualitatively between the two conditions.

This study is not without limitations. Firstly, the way in which anxiety was manipulated may include a confound: the external teacher minimized the amount of interaction in the high anxiety condition, whereas the peer in the low anxiety condition collaborated in the interaction. It could be that participants are more fluent when the setting is more like a dialogue (see ), which would make the degree of dialogue a confounding variable. However, the significant, medium-to-strong correlation between the between-condition difference in silent pauses per second and the between-condition difference in anxiety cannot be attributed to confounding variables and is additional evidence that anxiety is related to fluency. It should be noted, however, that this correlation could be bi-directional: participants may pause more because of heightened anxiety, and participants may become more anxious because of their numerous pauses. Another limitation concerning the experimental manipulation is that anxiety was operationalized by manipulating two variables at the same time (interlocutor and grade). Therefore, the finding that participants used more filled pauses per second in the high anxiety condition may be attributable to either of the manipulated variables as well as to the confounding variable. A third limitation of the current study may lie in the way FLA and utterance fluency were measured. Because we adopted a dynamic and situated approach to FLA (; ), the items in the questionnaire referred to the two specific speaking situations. This is in contrast to some other studies, which adopted a more trait-like, stable approach to FLA (). Whether the relationship between fluency and the state-like anxiety measured here differs from the relationship found in previous studies measuring trait-like anxiety could not be investigated. Concerning measures of utterance fluency, because the current study used automatic measures, the locations of pauses were not taken into account, in contrast to recent studies such as Bielak (). Lastly, the time between the speaking tasks and the stimulated recall was longer than usually advised (; ). Retrospective and introspective methods like stimulated recall hinge on individuals’ capacity to remember and verbalize thought processes and are therefore personal and subjective in nature.

5. Conclusion

To conclude, to the best of our knowledge, this study is the first to experimentally manipulate FLA to measure effects on fluency in a within-subjects design, using a mixed-methods approach. In line with previous research, we found that an aspect of fluency (filled pause use) was indeed affected by the manipulation. Additionally, larger increases in anxiety were related to larger increases in silent pause use. Finally, the qualitative analyses showed that in the high anxiety condition, learners remembered more issues related to content planning than issues related to linguistic encoding and formulation. We call for future experimental research to further investigate the discrepancy between our results and previous research, which observed that heightened anxiety leads to issues during encoding and formulation rather than conceptualization.

We finish the paper by highlighting some pedagogical notes. This study used stimulated recall to investigate cognitive fluency in L2 learners, but we wish to note that this method also has potential in foreign language teaching in general. Learners and teachers gain insights into students’ learning needs when learners reflect on their speech production processes. With the help of their teacher, learners might learn and think about strategies to improve their proficiency. For instance, if learners tend to fall silent when they cannot retrieve specific vocabulary, they can work on circumvention and on expanding their lexicon. Moreover, the activity of sharing difficulties while speaking may incite feelings of recognition and relief among peers, which in turn may reduce feelings of anxiety. Teachers, too, can share their difficulties during speaking and show that hesitations in speaking are natural and are used by L1 speakers as well. Currently, resources in and outside the L2 classroom provide ample opportunities to let students and teachers record, review, and reflect on their speaking performances. Newly developed didactics have indeed used such resources to have students reflect on their speaking performances (). A final pedagogical implication comes more directly from the current and previous research findings: because lower levels of anxiety are related to higher levels of fluency, teachers may strive to maximize low-anxiety situations in the classroom to optimize positive speaking experiences.