1. Introduction

This study examines the effect of formulaic expressions (henceforth FEs) on the development of L2 syntax in a longitudinal learner corpus. There is a large body of literature in Applied Linguistics concerned with the identification and role of FEs in SLA (e.g., , ; ), as well as a growing interest in the interaction of input and usage on the acquisition of modular linguistic knowledge more generally (e.g., , ). Longitudinal studies investigating the relationship between FE use and later syntactic development have been more widely explored within usage-based (UBL) frameworks, regarding the extent to which learners’ L2 utterances at later stages of acquisition can be traced back ontogenetically to previously used FEs which embody the same utterance schemas and/or schematic patterns (; ; ). Studies of this nature feature less within the generative framework. Those few that have explored this interaction within a classroom context, however, have found evidence for L2 learners using syntactically complex FEs as building blocks towards creative language use of a similar functionality (). More specifically, Hammond and Gil () recently analysed longitudinal production data and found that the use of fixed wh-expressions (henceforth FEswh) at the initial state seemed to ‘bootstrap’ learners into an incremental development of L2 phrase structure (i.e., from Verb Phrase (VP) to Tense Phrase (TP) to Complementiser Phrase (CP)). Learners who interacted more with these expressions showed a better L2 knowledge of functional categories T(ense) and C(omplementiser) more generally by the end of the data collection period. Studies of this kind question the consensus held within the generative tradition that, despite FEs being an effective communicative tool, the creative language process develops independently of their use and/or analysis (; ).

The present study takes a novel approach to investigating the role of FEs in learners’ syntactic development, arguing that a combination of usage-based and generative analyses as outlined above can offer a better insight into this phenomenon than either model can independently. Analysing a subset of the Barcelona English Language Corpus (BELC), we show that learners’ initial use of memorised FEswh facilitates their later syntactic development, both in terms of utterance-schema extraction and in terms of knowledge of the associated computational mechanisms more generally. We show how applying both approaches is useful for understanding the role of input and usage in the acquisition of formal linguistic features, and discuss the significant role that memorised formulaic language can play in this process.

Section 2 first outlines both generative and usage-based approaches to SLA, specifying the perceived role of FEs in each framework. Section 3 presents the data, and Section 4 analyses the identified FEs as products of abstract computational derivation (generative) and abstract utterance schemas (usage-based). Section 5 presents the results, and Section 6 gives a discussion of these. Section 7 concludes.

2. Formulaic expressions in generative vs usage-based approaches to SLA

2.1. Generative approaches to FEs in SLA

Under generative models, language is modular. Syntax is formalised as ‘Merge’, which, via the operation ‘Select’, takes items from the lexicon and forms composed elements through recursive computational procedures (). These procedures, namely, computational properties, are driven by features on functional categories and result in a variety of overt surface forms. Merge and Select are universal syntactic operations, a part of Universal Grammar (UG), which is taken to be an innate endowment of human beings (). Generative second language acquisition (GenSLA) is largely concerned with the interplay between UG, knowledge that comes from the L1, and knowledge that comes from exposure to the target language (the L2) (). There are competing theories within the paradigm as to how these aspects interact. For example, there are those models that claim full transfer from the L1 at the initial stages of SLA (known as the Strong Continuity Hypothesis) () and others that assume an incremental development of phrase structure where the L2 initial state is largely lexical in nature (known as the Weak Continuity Hypothesis) ().

Regardless of the Strong/Weak continuity debate, how exactly L2 input and usage can trigger modular syntactic knowledge is an ongoing line of investigation. Despite an increased interest in exploring this interaction in instructional/classroom contexts (e.g., ), there has been little focus from generative studies to investigate the role of FEs in this capacity, despite these constituting a significant proportion of L2 classroom input (). An exception is Myles and colleagues (; ), who analysed spoken production data of English classroom adolescent learners of L2 French over a period of 2 years. The authors note how at the early stages of data collection, the same learners produced syntactically complex FEs such as quel âge as tu? [how old are you?], while at the same time producing ungrammatical sentences in similar functional environments, such as *il age frère?- [he age brother?] (how old is your brother?) that lacked wh-fronting and inversion in the L2. They then checked how learners overextended and modified these expressions over the course of the data collection period to produce similar functional structures. For example, learners were shown to add NPs such as la fille [the girl] to the formulaic expression (1a) which led to overextensions such as (1b) before modification led to the correct structure (1c):

(1) a.   comment  t’appelles     tu?
         how      call-yourself  you
         ‘what is your name?’
    b.  *comment  t’appelles     tu   la   fille?
         how      call-yourself  you  the  girl
         (lit.) ‘what is your name the girl?’
    c.   comment  s’appelle      la   fille?
         how      call-herself   the  girl
         ‘what’s the girl’s name?’

The authors concluded that FEs provided learners with a databank of complex structures beyond their initial state grammars, and that learners kept ‘working on’ these until their current generative grammar (which developed in an incremental fashion) was compatible with them.

In a similar study, Hammond and Gil () recently analysed the spoken production data of 9 classroom longitudinal Spanish/Catalan learners of English over a period of 7 years. They found that learners across the data collection period also made extensive use of highly prototypical wh-expressions derived from their classroom input; ‘what is your name?’, ‘where are you from?’, ‘how old are you?’ and ‘where do you live?’. Like the anglophone learners in Myles et al., (), at the initial stages of data collection these expressions were produced in advance of knowledge of their associated syntactic derivations (wh-movement, inversion etc.). Unlike Myles’ learners, however, Hammond and Gil () found no evidence of learners overextending or modifying these expressions erroneously in similar functional structures. Rather, those learners that interacted more with these expressions at the initial stages of data collection (ages 10 and 12) were quicker to develop a more complex L2 grammar (e.g., VP-TP-CP). Hammond and Gil () interpret syntactically complex fixed expressions as ‘bootstrapping’ mechanisms into higher syntactic categories, using processing models of SLA (e.g., MOGUL) to explain their results. However, the authors did not conduct a usage-based traceback analysis of the data, so it was unclear whether some of the observed syntactic development can be accounted for via utterance schema extraction and generalisation of the model FEwh forms.

2.2. Usage-based approaches to FEs in SLA

Rather than a dichotomy of syntax and lexicon, UBL propose a lexicon in which ‘abstract grammatical patterns and the lexical instantiations of those patterns are jointly included, and which may consist of many different levels of abstraction’ (, pp. 228–229). For UBL, formulaic expressions that are high in frequency, functionality and prototypicality play a central role in SLA. It is argued that a learner’s long-term knowledge of such expressions can serve as the ‘database’ for their language acquisition (e.g., ). The proposed usage-based learning pattern for both L1 and L2 acquisition is from formulaic expression to utterance schema (known also as semi-fixed or slot-and-frame pattern) to fully productive schematic pattern (; ). For example, through frequent exposure and usage of the prototypical formulaic exemplar ‘where do you live?’, learners can derive the utterance schema in (2a) before finally acquiring the fully schematic wh-question pattern in (2b):

(2) a. [where do you VERB]
       e.g., where do you go?, where do you pay?
    b. [WH + AUX DO + PRN + VERB]
       e.g., what does he do?, when do you go?

As UBL frameworks perceive fluidity among linguistic patterns and the abstraction of any generalities within recurring, prototypical exemplars (), any utterance schema that a formulaic expression exemplifies is derivable from its abstract schematic construction. For instance, the utterance schema [do you + X] is just as derivable as [where do + X] from the exemplar ‘where do you live?’. Utterance schemas can be lexically [what do + X] or categorically [WH + AUX DO + X] specific to their formulaic exemplars, where lexically specific schemas maintain some of the same lexical items and categorically specific ones the more general grammatical category sequencing. One example for L2 acquisition is Eskildsen (), who investigated the longitudinal development of L2 English question formation and deduced that the subjects were constructing wh-questions based on more general [WH + COPULAR + X] and [WH + AUX DO + X] utterance schemas derived from their usage. Some example utterances that exemplified these schemas are shown in (3) and (4):

(3) [WH + COPULAR + X]
    a. what is your name?
    b. when is your birthday?
    c. when were you born?

(4) [WH + AUX DO + X]
    a. where do you live?
    b. how do you say?
    c. where does she work?

A significant component of L2 learning in UBL is therefore the abstraction and subsequent generalisation of FEs, which can be understood as the gradual expansion of varied utterance schema use (). Importantly, FEs that are identified as having initiated schematic development must precede all other instantiations ontogenetically in longitudinal learner data (). That is, learners must be shown to produce the proposed FEs in advance of any other instantiation of related utterance schemas and/or fully schematic patterns. For example, to reliably argue that ‘where do you live?’ has instantiated the utterance schemas [where do + X] or [do you + X] for a particular learner, ‘where do you live?’ must appear in this learner’s data before all other utterances which embody these schematic frames.

2.3. Research Questions

The present study analyses a subset of the Barcelona English Language Corpus (BELC) to examine how learners’ use of fixed wh-expressions (FEswh) interacts with their corresponding L2 syntactic development. To further explore the trends observed in Hammond and Gil () with a novel analysis that considers both generative and usage-based frameworks, we distinguish the following research questions:

  1. Does use of identified FEswh lead to better L2 knowledge of the expressions’ underlying computational properties as conceptualised under generative frameworks?
  2. Can learners’ L2 interrogatives be traced back to utterance schemas of FEswh in learners’ production data ontogenetically?

From the results of Hammond and Gil (), we can predict that the current study will observe a correlation between FEwh use and better L2 knowledge of the specific computational properties involved in the expressions’ generation (i.e., wh-movement, T-C movement, A-movement etc.), despite the general consensus amongst generative studies that there is no relationship between FE use and L2 acquisition. This is because Hammond and Gil () found that learners who used FEswh more frequently were the ones whose L2 grammars developed more quickly, moving incrementally from a bare VP to a TP to a CP stage. From the results of past usage-based longitudinal studies, we can predict that learners’ L2 interrogatives can be traced back to utterance schemas of previously used FEswh in their production data.

The current paper aims to bring these two analyses together to show that the most comprehensive account of learners’ syntactic development seeded by FEwh use is achieved by combining the results derived from both approaches.

3. Methodology

3.1. The Barcelona English Language Corpus (BELC)

Our data comes from transcripts in the spoken longitudinal Barcelona English Language Corpus (BELC) (). Nine balanced bilingual Spanish/Catalan EFL Catalonian state-school beginner learners of English participated in naturalistic L2 interview tasks across four rounds of data collection (Table 1). These rounds can be split into two groups: early years (ages 10 and 12) and later years (16 and 17), as seen in Table 1.

Table 1

The four rounds of data collection for the 9 learners under analysis and corresponding hours of classroom English instruction (accumulative).


               AGE    HOURS OF INSTRUCTION
early years    10     200
               12     416
later years    16     726
               17     826

To make an observation on the learners’ progression across different rounds, nine learners were chosen for analysis out of the 55 that constitute the entire corpus, as these were the only learners that participated across at least three rounds of data collection. Spoken tasks consisted of an interview, narrative, and role-play. The interviews were semi-guided, beginning with a series of questions about the learner’s family, daily life and hobbies and included a section whereby learners were required to ask questions to the interviewer. The narrative task was elicited from a series of six pictures that learners could freely look at before and during their telling of the story to the interviewer. Finally, the role-play task was performed in randomly chosen pairs, where one of the students was given the role of the parent and the other the child, which they would swap after completing an interaction. The learner acting as the child was required to ask permission to have a party at home, and both students were asked to negotiate arrangements such as time setting and choice of activities.

Importantly, beginner learners with only school exposure to English fulfilled the conditions for comparison in the data. For example, it was not the case that any of these pupils had more hours of instruction via extracurricular exposure or retaking a course grade. Controlling for these factors meant that the learners’ linguistic environment was homogenous and therefore highly predictable, making them an ideal test ground for comparison.

As in Hammond and Gil (in press), we extracted the four most frequent expressions that were presented holistically to learners in spoken tasks from two local and two global EFL textbooks. These were the following wh-questions:

(5) a. what’s/is your name?
    b. how old are you?
    c. where do you live?
    d. where are you from?

3.2. Learner productions of the fixed wh-expressions (FEswh)

A manual analysis of the corpus revealed that all nine learners produced the extracted FEswh and the overall distribution of them can be seen in Table 2. Note that ‘NT’ stands for ‘no transcript’ and indicates that the learner did not participate in that round of data collection. A dash ‘-’ means that a learner participated but was not shown to produce an FEwh.

Table 2

BELC learners’ productions of the identified FEswh.


LEARNER    AGE 10    AGE 12    AGE 16    AGE 17

2Hm <what are> [\\] what [\\] what is your name how old are youwhat’s your name

5what’s your namewhat’s your namewhat’s your name

7what’s your namewhat’s your name how old are you where do you livewhat’s your name first of all how old are you (x2) well and where do you live

13~~what’syour name where do you live what’s her namewhat’s your name and where do you live how old are you

18what’s your name how old are you *what do you live~

27NThow oldare you where do you livehow old are youwhere do you live what is your name

38NTwhat’s your name *where you livewhat’s your namewhere do you live now

42NTwhat is your name where are you fromwhat is your name *where is you from

47how old are you what’s yournameNThow old are you where do you livehow old are you (x2) what’s your name where are you from

At the age learners are first shown to produce an FEwh, the overwhelming majority of other L2 utterances outside of these expressions are ungrammatical and/or of a much lower syntactic complexity (6a-c, 7a-c) and they still rely heavily on the L1 (6d, 7d). Some example utterances from Learner 2 and Learner 5’s transcripts are given below to demonstrate:

(6) Learner 2: Age 16: what is your name?, how old are you?
    a. *study
    b. *going to excursion
    c. *the mother (.) hm read the map
    d.  the dog sales de (.) de las cestas [SPANISH]
        the dog comes-out of of the basket
        ‘the dog jumps out of the basket’

(7) Learner 5: Age 10: what’s your name?
    a. *the mum it’s
    b.  I study
    c.  girl and boy see
    d.  mi ho pots reptir? [CATALAN]
        me you can repeat
        ‘can you repeat it for me?’

The FEswh can therefore be confidently categorised as ‘formulaic’ and salient for our learners, and when first produced are of a higher syntactic complexity than the majority of other L2 utterances produced by the same learners. Section 4 now presents our analysis. It first outlines the FEs’wh syntactic derivation under a generative model and then presents how these would be conceptualised as abstract schematic constructions under usage-based models.

4. Analysis

4.1. The fixed wh-expressions as products of computational derivation

Under mainstream generative grammar, the derivation of the FEswh involves the Merging of lexical items via computational procedures driven by features on functional categories T and C. All are wh-questions, involving the computational properties A-movement, wh-movement, T-C movement, and V-raising, and ‘where do you live?’ also involves do-support. A syntactic tree is given in Figure 1 for ‘what is your name?’ to exemplify this derivation.

Figure 1 

Assumed syntactic structure of ‘what is your name?’.

These computational properties have the potential to manifest overtly via a variety of surface structures. Following Hammond and Gil (), Table 3 outlines the surface phenomena that we take as evidence for their manifestation.

Table 3

FEs’wh computational properties and reliable surface structures that evidence their manifestation.


COMPUTATIONAL PROPERTY and surface structure evidence

wh-MOVEMENT
  • wh-words occupying a clause-initial position in root interrogatives
  • exclamative clauses
  • relative clauses
  • interrogative complement clauses/embedded wh-clauses

T-C MOVEMENT
  • inversion of the subject and (auxiliary) verb

do-SUPPORT
  • via negation
  • via question formation

A-MOVEMENT
overt subjects used with structures that imply a TP projection, including:
  • corresponding finite verbal inflection [TNS, NUM, AGR] in declaratives (not including is/are)
  • modal/auxiliary verbs (including dummy do)
  • wh-movement
  • T-C movement
  • ‘infinitival to’

Note that we are conservative in what we accept as surface structure evidence, to measure the manifestation of these properties as reliably as possible. A-movement, for example, is only assumed when overt subjects appear with other overt evidence for functional category T (such as an inflectional morpheme or auxiliary verb), and excluded from the count are highly frequent irregular conjugations which are often rote-learned in the EFL classroom (i.e., present simple clauses with be (I am, you are) and have (you have, he has)). We also measure learners’ L2 accuracy of these properties as a relative percentage out of all production possibilities, as learners have the potential to realise a given utterance during the data collection period in the L1, via translanguaging, accurately in the L2 or inaccurately in the L2. An example with do-support can be used to illustrate this procedure. Say that in a learner’s transcript at age 16 there were 9 contexts, as shown in (8a-i), which require do-support in English, and our example learner realised these as below (where the intended English output is given in square brackets []).

(8) a. I don’t go to school
    b. He not like it [He doesn’t like it]
    c. No se [I don’t know]
    d. No se [I don’t know]
    e. No want eat tonight [I don’t want to eat tonight]
    f. Want go there tonight? [Do you want to go there tonight?]
    g. He doesn’t gustar la comida [He doesn’t like the meal]
    h. Te gusta la musica? [Do you like music?]
    i. Do you speak English?

Out of these 9 contexts where do-support should manifest, 3 of these are realised in the L1 (c, d and h), 1 via translanguaging (g) and 5 are attempted in the L2 (a, b, e, f and i). Out of these 5 L2 attempts, only 2 of these utterances are accurate (i.e., grammatical) (a and i). This learner’s L2 accuracy rate of do-support at age 16 is therefore 22%, as they realise 2 accurate utterances in the L2 out of a possible 9 contexts.
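The relative accuracy measure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the coding labels and the function name `l2_accuracy_rate` are hypothetical and are not taken from the original coding scheme.

```python
from collections import Counter

# Hypothetical labels for how a learner realises an obligatory context.
L1 = "L1"
TRANSLANG = "translanguaging"
L2_ACCURATE = "L2-accurate"
L2_INACCURATE = "L2-inaccurate"


def l2_accuracy_rate(codings):
    """Share of ALL obligatory contexts that are realised accurately in the L2."""
    if not codings:
        raise ValueError("no obligatory contexts coded")
    counts = Counter(codings)
    return counts[L2_ACCURATE] / len(codings)


# Codings for the nine do-support contexts in (8a-i), as described in the text.
example_8 = [
    L2_ACCURATE,    # a. I don't go to school
    L2_INACCURATE,  # b. He not like it
    L1,             # c. No se
    L1,             # d. No se
    L2_INACCURATE,  # e. No want eat tonight
    L2_INACCURATE,  # f. Want go there tonight?
    TRANSLANG,      # g. He doesn't gustar la comida
    L1,             # h. Te gusta la musica?
    L2_ACCURATE,    # i. Do you speak English?
]

rate = l2_accuracy_rate(example_8)
print(f"{rate:.0%}")  # 2 accurate utterances out of 9 contexts
```

Dividing by all nine contexts, rather than by the five L2 attempts, is what makes the score a relative one: L1 and translanguaged realisations lower the rate even though they are not L2 errors.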

In Section 4.2, we now analyse the FEswh as abstract schematic constructions under usage-based models and outline associated utterance schemas which are potentially extractable and generalisable across similar functional structures.

4.2. The fixed wh-expressions as abstract schematic constructions

Rather than a computational system, the level of ultimate abstractness for UBL consists of schematic knowledge of symbolic units, that is, the storage of lexical items as a range of fully schematic constructions. Following Eskildsen (), the FEswh would represent the fully schematic constructions below.

(9) a. what’s/is your name?   [WH + COPULA + PossDET + NOUN]
    b. how old are you?       [WH + ADJ + COPULA + PRN]
    c. where do you live?     [WH + AUX DO + PRN + VERB]
    d. where are you from?    [WH + COPULA + PRN + PREP]

Usage-based models posit an acquisition of fully schematic constructions and/or utterance schemas through the analysis and subsequent generalisation of prototypical, formulaic expressions that exemplify these constructions. Due to their saliency, prototypicality and formulaicity for all learners under analysis, the FEswh are good candidates for acquisitional seeds in this proposed developmental sequence. They are also all produced in isolation and in advance of any other grammatical L2 utterance of a similar complexity (see Section 3). Adopting this learning strategy, for example, learners could gradually move from the FEwh [what is your name?] to a derived utterance schema (a fixed part and open slot) [what is + PossDET + NOUN], to the fully schematic construction [WH + COPULA + PossDET + NOUN], as schematised in Figure 2.

Figure 2 

A usage-based developmental trajectory of the schematic construction [WH + COPULA + PossDET + NOUN] derived from the formulaic exemplar ‘what’s your name?’.

Equally, as past studies on English L2 interrogative development have suggested (see Section 2.2), learners can use FEswh to derive more general ‘wh-question’ utterance schemas. Utterance schemas based on fixed wh-questions traditionally comprise the [WH + VERB] element, based on evidence that a learner’s earliest wh-questions produced with an auxiliary and/or copula can be explained with reference to formulaic patterns that begin with a limited range of these schemas (; ). Based on the FEswh, this would give the following utterance schemas, which have the potential to be lexically (10) and/or categorically (11) specific.

(10) a. what’s/is your name?   [what is/’s + X]
     b. how old are you?       [how old are + X]
     c. where do you live?     [where do + X]
     d. where are you from?    [where are + X]

(11) a. what’s/is your name?   [WH + COPULA + X]
     b. how old are you?       [WH + ADJ + COPULA + X]
     c. where do you live?     [WH + AUX DO + X]
     d. where are you from?    [WH + COPULA + X]

As any utterance schema is potentially extractable from formulaic exemplars, learners could also extract the FEswh’ [VERB + SUBJ] utterance schemas and omit the wh-element to derive yes/no questions. These lexically and categorically specific yes/no question utterance schemas are shown in (12) and (13) respectively.

(12) a. what’s/is your name?   [is your + X]
     b. how old are you?       [are you + X]
     c. where are you from?    [are you + X]
     d. where do you live?     [do you + X]

(13) a. what’s/is your name?   [COPULA + PossDP + X]
     b. how old are you?       [COPULA + PRN + X]
     c. where are you from?    [COPULA + PRN + X]
     d. where do you live?     [AUX DO + PRN + X]

To examine whether learners’ L2 questions shared an utterance schema/fully schematic pattern with a previously used FEwh in their production data, we adopted a traceback methodology and created individual learner tables documenting their FEwh productions and L2 questions across the four rounds of data collection (ages 10, 12, 16 and 17). Underneath each FEwh and L2 question, we specified their lexically (i) and categorically (ii) specific utterance schemas, as well as their fully schematic patterns (iii). We then underlined instances where those of an L2 question matched those of a previously produced FEwh. Learner 13’s wh-questions can be seen in Table 4 as an example. Note that where FEswh are not shown for a certain age, this means that the learner did not produce an FEwh at this age. ‘NT’ refers to ‘no transcript’, meaning that the learner did not participate in that round of data collection, and a dash ‘-’ indicates that learners did participate but were not shown to produce any wh-questions in the L2 at this stage.

Table 4

Traceback methodology: Learner 13’s FEwh productions and L2 wh-questions.


#    AGE 10: wh-Q    AGE 12: wh-Q    AGE 16: FEwh    AGE 16: wh-Q    AGE 17: FEwh    AGE 17: wh-Q

13
what’s your name
  1. [what’s] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PossDET + NOUN]
where do you live
  1. [where do] + X
  2. [WH + AUX DO] + X
  3. [WH + AUX DO + PRN + VERB]
what’s her name
  1. [what’s] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PossDET + NOUN]
what’s your name
  1. [what’s] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PossDET + NOUN]
where do you go the last weekend
  1. [where do] + X
  2. [WH + AUX DO] + X
  3. [WH + AUX DO + PRN + VERB]
*what language talk you in the house
  1. [what language talk] + X
  2. [WH + N + V] + X
  3. [WH + NOUN + VERB + PRN + Prep + Det + NOUN]

Table 4, for example, shows that one L2 wh-question in Learner 13’s transcripts shares the same wh-question utterance schema and fully schematic pattern as a previously produced FEwh. This is ‘where do you go the last weekend?’, produced at age 17 after the learner used ‘where do you live?’ one year earlier at age 16; both share the fully schematic pattern [WH + AUX DO + PRN + VERB].
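The matching step of the traceback procedure can be sketched as follows. This is a hedged illustration, not the authors’ implementation: the `Production` record and the `traceback` function are hypothetical names, the schema strings are normalised by hand, and ‘previously produced’ is approximated as ‘produced at an earlier age’, since ordering within a round is not modelled here. Only a subset of Learner 13’s entries from Table 4 is included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    age: int
    text: str
    is_fe: bool        # True for an identified FEwh, False for another L2 question
    lexical: str       # (i) lexically specific utterance schema
    categorical: str   # (ii) categorically specific utterance schema
    full: str          # (iii) fully schematic pattern


def traceback(productions):
    """Map each non-FE question to the schema levels it shares with an earlier FEwh."""
    matches = {}
    for q in productions:
        if q.is_fe:
            continue
        # Assumption: only FEswh from an earlier round count as 'previous'.
        earlier = [p for p in productions if p.is_fe and p.age < q.age]
        levels = [
            level
            for level in ("lexical", "categorical", "full")
            if any(getattr(p, level) == getattr(q, level) for p in earlier)
        ]
        if levels:
            matches[q.text] = levels
    return matches


learner_13 = [
    Production(16, "what's your name", True,
               "[what's] + X", "[WH + COPULA] + X",
               "[WH + COPULA + PossDET + NOUN]"),
    Production(16, "where do you live", True,
               "[where do] + X", "[WH + AUX DO] + X",
               "[WH + AUX DO + PRN + VERB]"),
    Production(17, "where do you go the last weekend", False,
               "[where do] + X", "[WH + AUX DO] + X",
               "[WH + AUX DO + PRN + VERB]"),
    Production(17, "what language talk you in the house", False,
               "[what language talk] + X", "[WH + N + V] + X",
               "[WH + NOUN + VERB + PRN + Prep + Det + NOUN]"),
]

matches = traceback(learner_13)
print(matches)
```

Comparing schema strings for exact equality keeps the sketch simple; a fuller treatment would need a consistent schema notation across coders, because a stray space or bracket would otherwise hide a genuine match.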

5. Results

Section 5 presents the results of both the generative and usage-based analyses of the data, before bringing these together in Section 6. We begin with the generative analysis.

5.1. FEwh use and later knowledge of associated computational properties

Although all learners are shown to produce the FEswh across the data collection period, they differ in their frequency of FEwh productions and in the age at which they first produce an FEwh. We test the effect of these two variables on learners’ L2 accuracy of associated computational properties at the later stages of data collection (ages 16 and 17). ‘Age of first FEwh production’ refers to the age at which a learner first produces an FEwh in the corpus (e.g., 10, 12, 16 or 17) and ‘frequency of FEwh production’ refers to the number of FEswh learners produce at the early ages (ages 10 & 12), not including repetitions. We measure learners’ L2 accuracy of the computational properties at the later ages as a mean average between their relative accuracy score at age 16 and that of age 17. Table 5 demonstrates this with Learner 47.

Table 5

Learner 47 L2 accuracy rates of computational properties at the later ages.


LEARNER 47

AGE     wh-MOVE        T-C MOVE       do-SUPPORT     A-MOVE           MEAN
16      (2/4) 50%      (5/7) 71%      (6/11) 55%     (36/52) 69%      61%
17      (3/3) 100%     (6/6) 100%     (7/7) 100%     (29/29) 100%     100%
mean    75%            86%            78%            85%              81%
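The averaging behind Table 5 can be sketched as below, using the counts reported for Learner 47. The function names are hypothetical. Note that this sketch averages the unrounded rates, so an individual figure can differ from the table by a percentage point (do-support comes out at roughly 77% here rather than 78%), which may reflect the table averaging already-rounded percentages.

```python
from statistics import mean

# (accurate, contexts) counts for Learner 47, copied from Table 5.
counts = {
    16: {"wh-move": (2, 4), "T-C move": (5, 7),
         "do-support": (6, 11), "A-move": (36, 52)},
    17: {"wh-move": (3, 3), "T-C move": (6, 6),
         "do-support": (7, 7), "A-move": (29, 29)},
}


def rates(age):
    """Relative L2 accuracy per computational property at one age."""
    return {prop: acc / total for prop, (acc, total) in counts[age].items()}


def later_age_means():
    """Mean of the age-16 and age-17 rates per property, plus the overall mean."""
    r16, r17 = rates(16), rates(17)
    per_property = {prop: mean([r16[prop], r17[prop]]) for prop in r16}
    overall = mean(list(per_property.values()))
    return per_property, overall


per_property, overall = later_age_means()
for prop, value in per_property.items():
    print(f"{prop}: {value:.1%}")
print(f"overall: {overall:.1%}")
```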

The two variables, age of first FEwh production and frequency of FEwh production, are discussed in Section 5.1.1 and Section 5.1.2 respectively below.

5.1.1. Age of first FEwh production

Figure 3 displays a scatterplot showing the learners’ age of first FEwh production (y-axis) and their mean L2 computational accuracy rates in all required contexts (calculated as a combined average between wh-movement, T-C movement, A-movement and do-support) at the end of the data collection period (x-axis, ages 16 and 17). The scatterplot shows a negatively sloped regression line, which indicates a degree of linearity between a younger age of first FEwh production and a higher L2 computational accuracy rate at the later ages. Those learners who produce an FEwh for the first time at age 16 are clustered towards accuracy rates between 20–40%, whereas those who produce them at ages 10 and 12 are largely between 80–100%.

Figure 3 

Scatterplot showing learners’ age of first FEwh production and mean L2 accuracy of computational properties at later ages (16 & 17).

To investigate this linearity further, we ran correlations between these variables, shown in Table 6. Correlations were run between age of first FEwh production and each computational property individually, as well as with these individual accuracy rates combined as a mean average (as in the scatterplot above). Following recent developments in the application of statistics in SLA which question assumptions of significance traditionally derived by p values (; ), we have included confidence intervals (CIs) in tandem with bootstrapping to give a more accurate picture of the r effect sizes. We have also adjusted the alpha level to .15 (from the traditional .05) to compensate for small SLA data samples (; ), and measure effect sizes for SLA following Plonsky & Oswald () as r = .2 as a small effect, r = .4 as a medium effect and r = .6 as a large effect.
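To make the reported quantities concrete, the sketch below computes a Pearson r, a bootstrap confidence interval and an effect-size label using only the Python standard library. Several points are assumptions rather than the study’s procedure: the interval is a simple percentile bootstrap (not the BCa interval reported in the tables), the cut-offs .2/.4/.6 are treated as lower bounds on |r|, the function names are hypothetical, and the nine data points are invented purely for illustration and are not the BELC learners’ scores.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs, ys, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    estimates = []
    while len(estimates) < n_resamples:
        sample = [rng.choice(pairs) for _ in pairs]
        sx, sy = zip(*sample)
        try:
            estimates.append(pearson_r(sx, sy))
        except ValueError:
            continue  # skip degenerate resamples with no variance
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


def effect_size_label(r):
    """Label |r| with the small/medium/large cut-offs cited in the text."""
    size = abs(r)
    if size >= 0.6:
        return "large"
    if size >= 0.4:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Invented illustrative data: age of first FEwh production vs. later accuracy.
ages = [10, 10, 10, 12, 12, 12, 16, 16, 16]
accuracy = [0.91, 0.84, 0.78, 0.88, 0.69, 0.73, 0.35, 0.52, 0.27]

r = pearson_r(ages, accuracy)
low, high = bootstrap_ci(ages, accuracy)
print(f"r = {r:.3f}, 95% CI [{low:.3f}, {high:.3f}], {effect_size_label(r)} effect")
```

With nine observations the interval is wide whatever resampling scheme is used, which is one reason to report the interval alongside the p value rather than relying on the significance threshold alone.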

Table 6

Correlation coefficient between age of first FEwh production and L2 accuracy of computational properties at the later ages.


L2 ACCURACY (AGES 16 & 17)    AGE OF FIRST FEwh PRODUCTION    SIG. (2-TAILED)    BOOTSTRAP (BCA) 95% CONFIDENCE INTERVAL
Mean                          –.689*                          .040**             –.944/–.265
wh-movement                   –.683*                          .062**             –.954/–.294
T-C movement                  –.593*                          .093**             –.903/–.028
do-support                    –.600*                          .088**             –.946/–.099
A-movement                    –.779*                          .013**             –.985/–.320

* Significant at the adjusted p < .15 for small sample sizes in SLA.

** CI effect for a relationship among variables.

The negative effect sizes indicate that a learner’s earlier production of the FEswh shows strong, significant correlations with their later L2 accuracy of all related computational properties and these combined as a mean average. Taken together, these figures show that those learners who produce the FEswh for the first time at younger ages show a higher L2 accuracy rate of their associated computational properties at the end of the data collection period.

5.1.2. Frequency of FEwh production

Figure 4 shows a scatterplot of the learners’ frequency of FEwh production at the early ages (y-axis) and their mean L2 computational accuracy rates in all required contexts (calculated as a combined average between wh-movement, T-C movement, A-movement and do-support) at the end of the data collection period (x-axis, ages 16 and 17). The scatterplot shows a positively sloped regression line, indicating linearity between a higher number of FEswh produced at the early ages and a higher L2 accuracy of their associated computational properties at the later ages.

Figure 4 

Scatterplot showing learners’ frequency of FEwh production at the early ages (10 & 12) and mean L2 accuracy of computational properties at later ages (16 & 17).

Correlations were run to investigate this relationship further, comparing frequency of FEwh production at the early ages with L2 accuracy at the later ages for each computational property individually and then for these as a mean average. These are shown in Table 7. A learner’s higher number of FEwh productions at the early ages shows strong significant correlations with their later L2 accuracy of wh-movement, T-C movement, and the four computational properties as a mean average. Individually, A-movement and do-support show medium correlations and fail to reach significance (p = .156 and p = .263 respectively).

Table 7

Correlation coefficient between number of FEswh produced at the early ages (10 & 12) and L2 accuracy of computational rules at the later ages.


L2 ACCURACY (AGES 16 & 17) | NUMBER OF FEwh PRODUCTIONS (AGES 10 & 12) | SIG. (2-TAILED) | BOOTSTRAP (BCA) 95% CONFIDENCE INTERVAL
Mean | .578* | .123** | .001/.967
wh-movement | .615* | .104** | .143/.968
T-C movement | .520* | .150 | –.168/.966
do-support | .418 | .263 | –.199/.969
A-movement | .515 | .156 | –.095/.945

* Significant at the adjusted p < .15 for small sample sizes in SLA.

** CI effect for a relationship among variables.
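
The kind of analysis summarised in Table 7 can be illustrated with a minimal sketch. The code below computes a correlation coefficient (assumed here to be Pearson’s r), a two-tailed permutation p-value and a bootstrap confidence interval using only the Python standard library. It is an illustration under stated assumptions, not the procedure behind the reported figures: the table reports bias-corrected and accelerated (BCa) intervals, for which the simpler percentile method is substituted here; the permutation test stands in for whichever significance test produced the Sig. column; and the nine data points and all function names are invented for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(xs, ys, n_perm=5000, seed=0):
    """Two-tailed p-value: share of shuffles whose |r| reaches the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_perm


def percentile_bootstrap_ci(xs, ys, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r (a simpler stand-in for BCa)."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    estimates = []
    while len(estimates) < n_resamples:
        # Resample learners (pairs) with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        sx, sy = zip(*sample)
        # Skip degenerate resamples in which either variable is constant.
        if len(set(sx)) > 1 and len(set(sy)) > 1:
            estimates.append(pearson_r(sx, sy))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data for nine learners (NOT the BELC values).
early_fe_counts = [0, 1, 1, 2, 3, 3, 4, 5, 6]
later_accuracy = [0.20, 0.35, 0.10, 0.50, 0.45, 0.80, 0.60, 0.90, 0.75]

r = pearson_r(early_fe_counts, later_accuracy)
p = permutation_p(early_fe_counts, later_accuracy)
ci = percentile_bootstrap_ci(early_fe_counts, later_accuracy)
print(f"r = {r:.3f}, p = {p:.3f}, 95% CI = {ci[0]:.3f}/{ci[1]:.3f}")
```

With only nine learners, bootstrap intervals of this kind are wide, which is consistent with the breadth of the intervals reported in Tables 7 and 8.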

Taken together, learners’ higher L2 accuracy of the FEs’wh associated computational properties at the later ages (16 and 17) correlates with a younger age of first FEwh production and a higher number of FEwh productions at the early ages (10 and 12). Note that this relationship between learners’ FEwh use and L2 accuracy of associated computational properties seems to be developmental; that is, we find a clear linear relationship between learners’ differing use of these expressions at the early stages of data collection and differing L2 accuracy rates at the later stages. By contrast, if we count learners’ individual FEwh productions across the entire data collection period (ages 10, 12, 16 and 17) and compare these frequencies with their L2 computational accuracy rates at the later ages, we find no relationship. The scatterplot of these variables in Figure 5 has a relatively flat regression line, and Table 8 shows that overall frequency of FEwh production across the four rounds of data collection does not correlate with later L2 accuracy of any associated computational property individually or of these as a mean.

Figure 5 

Scatterplot showing learners’ frequency of FEwh production across all ages and mean L2 accuracy of computational rules at later ages (16 & 17).

Table 8

Correlation coefficient between total number of FEswh produced across all ages and L2 accuracy of computational rules at the later ages (16 & 17).


L2 ACCURACY (AGES 16 & 17) | TOTAL NUMBER OF FEswh PRODUCED (ACROSS ALL AGES) | SIG. (2-TAILED) | BOOTSTRAP (BCA) 95% CONFIDENCE INTERVAL
Mean | .009 | .982 | –.693/.663
wh-movement | .062 | .883 | –.682/.840
T-C movement | –.018 | .963 | –.817/.680
do-support | .018 | .962 | –.689/–.665
A-movement | .009 | .982 | –.711/.655

Therefore, a better L2 accuracy of associated computational properties seems to correlate specifically with a more frequent production of the FEswh at the early stages of data collection, rather than with a frequent production of the expressions across the entire data collection period. This is suggestive of a developmental relationship between early use of these expressions and a better knowledge of related computational derivations.

5.2. Learners’ use of the FEswh and later knowledge of their schematic constructions

Moving now to test if the usage-based developmental sequence is applicable to the present dataset, we identified all learners’ L2 root interrogatives across the data collection period to see if they embodied the same schematic patterns/utterance schemas of previously produced FEswh, starting with learners’ wh-questions.

5.2.1. Wh-questions

As discussed previously, the FEswh have the potential to represent lexically and categorically specific wh-question utterance schemas and fully schematic patterns. Following the procedure outlined in Section 4.2, our usage-based analysis reveals that a total of 20 wh-questions are produced by all 9 learners across the data collection period. Of these 20 wh-questions, 17 appear after an FEwh ontogenetically in learners’ production data. Of these 17, 9 embody the same categorically specific utterance schema as a previously produced FEwh; 3 of these 9 also embody the same lexically specific utterance schema and 4 show the same fully schematic pattern. This accounts for 53% of learners’ total wh-questions that follow FEwh use in the longitudinal data. An example is Learner 38, who produces ‘what’s your name?’ at age 12 and ‘why are you doing this kind?’ and ‘why are you doing this work?’ at ages 16 and 17 respectively, all of which share the utterance schema [WH + COPULA] + X. They also produce another FEwh erroneously at age 12 (*‘where you live?’) and seem to adopt its [WH + PRN] + X utterance schema, which leads to the ungrammatical wh-question *‘what you wanna say?’ at age 16. Their productions across the data collection period are presented in Table 9.

Table 9

Traceback methodology: Learner 38’s FEwh productions and L2 wh-questions.


Learner 38

AGE 10: NT

AGE 12: wh-Q: (none)

AGE 12: FEwh:
what’s your name
  1. [what’s] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PossDET + NOUN]
*where you live
  1. [where + you] + X
  2. [WH + PRN] + X
  3. [WH + PRN + VERB] + X

AGE 16: wh-Q:
why are you doing this kind?
  1. [why + are] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PRN + VERB-ing + DET + NOUN]
*what you wanna say?
  1. [what + you] + X
  2. [WH + PRN] + X
  3. [WH + PRN + AUX + VERB]

AGE 16: FEwh:
what’s your name
  1. [what’s] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PossDET + NOUN]

AGE 17: wh-Q:
why are you doing this work?
  1. [why + are] + X
  2. [WH + COPULA] + X
  3. [WH + COPULA + PRN + VERB-ing + DET + NOUN]

AGE 17: FEwh:
where do you live
  1. [where + do] + X
  2. [WH + AUX DO] + X
  3. [WH + AUX DO + PRN + VERB]
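
The traceback step can be sketched in code. The following minimal example assumes that each production is annotated with its age of recording and its lexically and categorically specific utterance schemas, and treats an FEwh as ‘previously produced’ when it was recorded at a strictly earlier age; both the data structure and that ordering criterion are simplifying assumptions made for illustration, not a description of the procedure in Section 4.2. The entries are the Learner 38 productions discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    age: int
    text: str
    kind: str         # "FE" = fixed wh-expression, "Q" = creative wh-question
    lexical: str      # lexically specific utterance schema
    categorical: str  # categorically specific utterance schema


def trace_back(productions):
    """Pair each question with the first earlier FE sharing its categorical schema."""
    results = []
    for question in productions:
        if question.kind != "Q":
            continue
        source = next(
            (
                fe
                for fe in productions
                if fe.kind == "FE"
                and fe.age < question.age  # assumption: earlier age = earlier use
                and fe.categorical == question.categorical
            ),
            None,
        )
        results.append((question, source))
    return results


# Learner 38's age-12 FEwh and later wh-questions, as described in the text.
LEARNER_38 = [
    Production(12, "what's your name", "FE", "[what's] + X", "[WH + COPULA] + X"),
    Production(12, "*where you live", "FE", "[where + you] + X", "[WH + PRN] + X"),
    Production(16, "why are you doing this kind?", "Q", "[why + are] + X", "[WH + COPULA] + X"),
    Production(16, "*what you wanna say?", "Q", "[what + you] + X", "[WH + PRN] + X"),
    Production(17, "why are you doing this work?", "Q", "[why + are] + X", "[WH + COPULA] + X"),
]

for question, source in trace_back(LEARNER_38):
    origin = source.text if source else "no earlier FEwh"
    print(f"age {question.age}: {question.text!r} <- {origin!r}")
```

Matching on the categorical schema alone mirrors the broadest criterion in the analysis; a stricter lexical match could be added by also comparing the `lexical` fields.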

5.2.2. Yes/No questions

As well as the wh-question utterance schemas presented above, the FEswh have the potential to represent lexically and categorically specific yes/no-question utterance schemas. A total of 23 yes/no questions are produced by all 9 learners across the data collection period. Of these 23 yes/no questions, 21 follow an FEwh in learners’ data ontogenetically, of which 11 (52%) embody the same categorically specific utterance schema as a previously produced FEwh. All 11 of these yes/no questions also share the same lexically specific utterance schemas as the FEswh. An example is Learner 18, who makes use of the [are you] + X utterance schema in ‘are you studying?’ at age 17 after producing the FEwh ‘how old are you?’ at age 12. They also produce the erroneous FEwh *‘what do you live?’ at age 12 and go on to produce five yes/no questions with the [do you] + X utterance schema at ages 16 and 17, including ‘do you like your job?’, ‘do you live in Barcelona?’ and ‘do you have any brothers or sisters?’. Their overextension of this utterance schema in the ungrammatical *‘do you born in Spain?’ is further evidence of its productive use. Their production data are shown below in Table 10.

Table 10

Traceback methodology: Learner 18’s FEwh productions and L2 yes/no questions.


# | AGE 10: Y/N Q | AGE 10: FEwh | AGE 12: Y/N Q | AGE 12: FEwh | AGE 16: Y/N Q | AGE 17: Y/N Q

18
do you have any pets?
  1. [do you] + X
  2. [AUX DO + PRN] + X
do you like pets?
  1. [do you] + X
  2. [AUX DO + PRN] + X
what’s your name
  1. [is your] + X
  2. [COPULA + PossDET] + X
how old are you
  1. [are you] + X
  2. [COPULA + PRN] + X
*what do you live
  1. [do you] + X
  2. [AUX DO + PRN] + X
do you born in Spain?
  1. [do you] + X
  2. [AUX DO + PRN] + X
do you like your job?
  1. [do you] + X
  2. [AUX DO + PRN] + X
are you studying?
  1. [are you] + X
  2. [COPULA + PRN] + X
do you live in Barcelona?
  1. [do you] + X
  2. [AUX DO + PRN] + X
do you live on your own?
  1. [do you] + X
  2. [AUX DO + PRN] + X
do you have any brothers or sisters?
  1. [do you] + X
  2. [AUX DO + PRN] + X

To summarise, 53% (20/38) of the L2 interrogatives (both wh- and yes/no) that follow an FEwh in learners’ production data can be traced back to utterance schemas of previously produced FEswh. All these utterance schemas are categorically specific to preceding FEswh (20/20), 70% are lexically specific (14/20) and 44% of the traced wh-interrogatives (4/9) share the same fully schematic patterns. The most productive wh-question utterance schema is [WH + COPULA] + X (4/9) and the most productive yes/no question utterance schema is [do you] + X (9/11). For no learner can every L2 interrogative be linked back to utterance schemas of previously used FEswh.
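
The summary percentages follow directly from the counts reported in Sections 5.2.1 and 5.2.2. As a quick arithmetic check, the short sketch below recomputes them; the variable and function names are illustrative only, and the counts are those given in the text.

```python
# Counts reported in Sections 5.2.1 (wh-questions) and 5.2.2 (yes/no questions).
WH = {"after_fe": 17, "categorical": 9, "lexical": 3, "fully_schematic": 4}
YN = {"after_fe": 21, "categorical": 11, "lexical": 11}


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


eligible = WH["after_fe"] + YN["after_fe"]      # interrogatives following an FEwh
traced = WH["categorical"] + YN["categorical"]  # interrogatives traced to an FEwh
lexical = WH["lexical"] + YN["lexical"]         # traced and lexically specific

summary = {
    "traced": (traced, eligible, percent(traced, eligible)),
    "lexically_specific": (lexical, traced, percent(lexical, traced)),
    "fully_schematic_wh": (
        WH["fully_schematic"],
        WH["categorical"],
        percent(WH["fully_schematic"], WH["categorical"]),
    ),
}

for label, (part, whole, pct) in summary.items():
    print(f"{label}: {part}/{whole} = {pct}%")
```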

6. Discussion

In Section 5.1 we adopted a generative model to address research question (i), finding that higher L2 accuracy rates of the FEs’wh associated computational properties at the end of the data collection period correlate with a younger age of first FEwh production and a higher number of FEwh productions at the early ages. This supports the trends observed in Hammond and Gil (), whereby those learners who interacted more with the FEswh were quicker to move from VP- to TP- to CP-based grammars. In Section 5.2, we adopted a usage-based schematic model to address research question (ii) and discovered that 53% of learners’ L2 interrogatives can be traced back ontogenetically to utterance schemas of FEswh previously used in their spoken transcripts. This supports those longitudinal usage-based studies that have been able to trace productive use of complex L2 utterances back to model formulaic exemplars in learners’ production data.

The discussion now compares how each model can account for the observed L2 development over the longitudinal data collection period, and argues that the most comprehensive description is achieved by combining the results of both analyses.

6.1. Fixed wh- expressions: Databases of computational properties or schematic patterns?

Both generative and usage-based analyses of the longitudinal data can distinguish a relationship between learners’ use of identified FEswh and associated L2 syntactic development, which highlights the central role that formulaic language can play in L2 development. It can be said that conceptualising the FEswh as databases for the acquisition of more general associated computational properties can account for a larger range of corresponding L2 development, rather than limiting the expressions to databases for the acquisition of L2 interrogative utterance schemas only. This is somewhat unsurprising, given that these properties have the potential to manifest via a larger range of related surface structures. For example, a gradual acquisition of the underlying syntactic mechanisms necessary to construct interrogatives in the L2 can account for 100% of learners’ L2 interrogatives across the corpus, including the 47% that constitute different utterance schemas than those of the FEswh. An acquisition of the FEs’wh computational properties can also, of course, account for grammatical L2 utterances outside of learners’ interrogatives. For example, an acquisition of the feature specifications necessary to constrain wh-movement in the L2, as influenced by early and frequent FEwh use, is also exemplified by learners’ comparative use of relative clauses and interrogative complement clauses. Table 11 shows that the only learners who produce these structures in the L2 are those that show early FEwh usage.

Table 11

Learners’ early FEwh use and later L2 productions of relative and interrogative complement clauses.


LEARNER | AGE 10: FE | AGE 12: FE | AGE 16: RELATIVE/INTERROGATIVE COMPLEMENT CLAUSE | AGE 17: RELATIVE/INTERROGATIVE COMPLEMENT CLAUSE

2

5what’s your name

7what’s your namewhat’s your name how old are you where do you liveI need some food and somebody who play the music

13

18what’s your name how old are you *what do you live when I have homework I do the homework and [\] and go [\\] or watch TV
when the mother is [/] (.) is (.) telling <what *is the>[/] (.) what *is the [/] (.) the *street they are [//] (.) have <to to> [/] to go the dog (.) came to [//] <into the> [/] into the basket>
when they (.) arrive in the mountain they have a surprise that (.) the dog *(.) eat <all the> [/] all the (.) food
and when they: [\] they go to eat the breakfast the dog they see that the dog *eat [\\] eats the: [\]

27no transcript

38no transcriptwhat’s your name where you live

42no transcript

47how old are you what’s your nameno transcriptswhen the sister and brother arrive to the [\] to the forest they looked that her dog mm *_was [\] was appeared

However, utterance schema extraction and generalisation based on previous FEwh use is clearly a productive learning strategy, as this can account for over half of learners’ total interrogatives produced in the L2 across the corpus. Therefore, the most unified account of the observed syntactic development must incorporate this strategy within the development of associated underlying syntactic mechanisms more generally. Section 6.2 now discusses some theoretical concepts which are compatible with this combination of results derived from both approaches.

6.2. The interaction of usage-based and generative approaches to SLA

We posit that the usage-based notion of utterance schema extraction and generalisation can facilitate the acquisition of the underlying computational properties that their surface forms exemplify. The FEswh for all learners are first produced in advance of associated L2 competence, so must be taken as memorised products of holistic retrieval via working/phonological memory. This is also an indication that the FEswh constitute learners’ intake rather than input (), as they are the expressions that learners rely on in response to functional contextual cues. At these initial stages, the FEswh as recalls from working memory are analogous to what some models of L1/L2 acquisition term ‘perceptual intake’ () or ‘perceptual output structures’ (). Importantly, when processing these perceptual strings, learners construct an associated linguistic representation which contains information about the L2 syntactic feature specifications. Thus, an increased interaction with the FEswh may more quickly engender a restructuring of learners’ L1 grammar based on this new L2 linguistic information, as they are better exposed to this in model form.

It follows that if learners can extract utterance schemas from prototypical formulaic exemplars (via general cognitive means) and extend these to similar functional structures, they can interact with more surface forms which exemplify the same L2 linguistic information, leading to a better identification of the abstract representations realised in these L2 surface forms. In our data, utterance schema extraction and generalisation has likely facilitated the production of a large proportion of L2 interrogatives (53%), which exemplify the L2 functional categories and related computational properties necessary to construct interrogatives in the L2, irrespective of their specific schematic surface forms ([WH + COPULA], [WH + AUX DO], etc.). With every interaction, or more specifically, with every use of these complex L2 surface forms, the learner is better equipped to make inferences about the underlying grammar that generated them, based on their pre-existing (UG-based) knowledge of the computational component. This concept of increased interaction and usage of L2 surface forms is compatible with Paradis’s () notion of ‘practicing’. They state:

“The repeated practicing of a target form may eventually lead to the internalization of the implicit computational procedures that result in the automatic comprehension and production of that form. It is not the instruction and resulting knowledge that affect competence, but the extra practice provided by the use of the correct form.”

We believe that utterance schema extraction and generalisation is best analysed as a productive learning strategy that can lead to the further, repeated ‘practicing’ of complex L2 surface forms, allowing for a quicker restructuring of corresponding L2 linguistic representations in the computational component. Utterance schema extraction alone based on previous FEwh use cannot account for all L2 question forms in our corpus, nor can it account for the development of related syntactic phenomena outside of interrogatives. This strategy is best interpreted as facilitating the gradual acquisition of related underlying computational properties and corresponding feature specifications more generally. It is in this scaffolding, facilitative sense that we propose the usage and deconstruction of classroom input-derived formulaic expressions can interact with the development of modular syntactic knowledge.

Finally, a note on the limitations of the data is in order. It is necessary to reiterate that the BELC is only a snapshot of these learners’ L2 capabilities at particular points in time. As for all corpus studies, a learner’s production of a specific form at these recorded intervals may not necessarily reflect their L2 competence and, as such, the absence of a form does not entail a learner’s lack of knowledge (; ). Similarly, learners will likely have been exposed to other holistically taught prototypical expressions in their EFL classroom that simply did not surface in their transcripts. However, the aim of this paper is to account for the developmental trends that are observable in the available production data, using both generative and usage-based frameworks. The identified FEswh are clearly salient, as all learners are shown to produce these same expressions in response to the same contextual cues, and initially in advance of associated L2 competence. No other formulaic material was identifiable in the transcripts alongside these expressions. This salience in learners’ production data, along with their inherent prototypicality and functionality, places the FEswh as prime candidates for acquisitional seeds under all usage-based accounts, and we observe a clear relationship between early and frequent use of these expressions and learners’ corresponding L2 development.

7. Conclusion

This study has adopted a novel approach by combining usage-based and generative analyses of longitudinal learner production data to discover an effect of formulaic expressions on L2 syntactic development. We have argued that positing the FEswh as databases for the underlying computational properties/L2 feature specifications that they exemplify can account for a large range of corresponding syntactic development, and that utterance schema extraction and generalisation is a productive learning strategy that likely facilitates this process. More generally, we believe that the study of FEs and their relationship with syntactic development is an ideal testing ground for the integration of usage-based and generative models of SLA, which can help us to better understand the interplay between input, usage and modular linguistic knowledge.

Additional File

The additional file for this article can be found as follows:

Appendix 1

All learners’ L2 accuracy rates of computational properties at the later ages. DOI: https://doi.org/10.22599/jesla.100.s1