Productive collocation knowledge and advanced CEFR-levels in Swedish as a second language: A conceptual replication of Forsberg Lundell, Lindqvist & Edmonds (2018)

This study constitutes a conceptual replication of Forsberg Lundell et al. (2018), who investigated whether productive collocation knowledge – a linguistic feature known to be indicative of high-level L2 proficiency – developed between the B2 and C1 levels of the Common European Framework of Reference for Languages scale in second-language (L2) French. The results showed significant development. The present study set out to replicate these findings in L2 Swedish, in order to investigate whether the reported development would withstand cross-linguistic validation. To this end, a test of productive collocation knowledge in L2 Swedish was developed based on 22 separate computerized newspaper corpora of Swedish, searchable via the corpus tool Korp at Språkbanken Text (Swedish Language Bank). The item selection method was identical to that of Forsberg Lundell et al. (2018), but the replication could only be conceptual, since the reference corpora differ and come from different languages. The test was administered to participants at the B2 and C1 levels in Swedish (N = 60). The results replicated the original study, confirming a significant difference in productive collocation knowledge between the B2 and C1 levels. In addition to the replication, the study explored frequency and Mutual Information score as potential factors for collocation item difficulty. No significant effects were found for frequency or for Mutual Information score. Finally, the impact of cross-linguistic similarity was investigated by grouping the results for participants with Germanic and non-Germanic first languages. This analysis did not point to any noteworthy effects.


INTRODUCTION
The present study covers a phenomenon that has been known to be one of the most characteristic features of high-level second-language (L2) proficiency: formulaic language (e.g., Erman et al., 2018). Formulaic language comes in many different shapes (Schmitt & Carter, 2004), and the object of the present study is the sub-category of collocations, one of the most frequently occurring categories of formulaic language (Cowie, 1992). Collocations are combinations of words that frequently occur together in a given language, for example keep a secret (for an exact definition in the present study, see section 2.1). Knowledge of these conventionalized word combinations is known to be crucial for communicative competence, receptive and productive fluency, and idiomaticity in a second language. At the same time, it develops late, in both a first language (L1) and an L2, and it presents some of the biggest challenges for L2 learners, especially in production (Eyckmans, 2009; Henriksen, 2013; Nesselhauf, 2003, 2005; Vasiljevic, 2014; Wray, 2002). In the words of Nesselhauf (2003, p. 223), "collocations are of particular importance for learners striving for a high degree of competence in the second language, but they are also of some importance for learners with less ambitious aspirations, as they not only enhance accuracy but also fluency".
Knowledge, processing and production of collocations in an L2 are affected by different factors. One of the most researched factors is cross-linguistic influence. Research on different modalities has shown that L1 and L2 congruence in collocations confers processing and/or acquisition advantages (Lesniewska & Witalisz, 2007; Peters, 2016; Wolter & Gyllstad, 2011). From the perspective of corpus linguistics, studies have also shown that L1 transfer is a frequently occurring phenomenon in the written production of L2 collocations (e.g., Paquot, 2014, 2017). Other factors that have been shown to affect the use and processing of collocations are frequency and Mutual Information (MI) score (Durrant & Schmitt, 2009; Simpson-Vlach & Ellis, 2010). The MI score is a measure derived from information theory that quantifies the mutual dependency between two variables. It uses a logarithmic scale to express the ratio between the observed frequency of the collocation and the frequency with which the two words in the combination would be expected to co-occur by chance (Church & Hanks, 1990).
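The MI measure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard pointwise formulation with a base-2 logarithm; the function name, parameter names and counts are hypothetical, and corpus tools may apply additional adjustments (e.g., for the size of the search window) that are not modelled here.

```python
import math

def mi_score(pair_freq: int, word1_freq: int, word2_freq: int, corpus_size: int) -> float:
    """Pointwise MI (log base 2) of a two-word combination."""
    # Co-occurrence frequency expected if the two words were independent.
    expected = word1_freq * word2_freq / corpus_size
    # Log ratio of observed to expected co-occurrence.
    return math.log2(pair_freq / expected)

# Hypothetical counts: the pair is observed 8 times where chance predicts 1.
print(mi_score(pair_freq=8, word1_freq=100, word2_freq=100, corpus_size=10_000))  # 3.0
```

On this scale, a score of 3 means that the combination occurs eight times more often than chance co-occurrence would predict.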
Yet another factor is collocation transparency. Previous research by Revier (2009) suggests that knowledge of different collocation types, according to degree of semantic transparency, develops differently over time. The study indicated that learners in earlier stages of development have a significantly greater knowledge of transparent collocations, where both constituents are used in their literal sense (e.g., make tea), than of less transparent collocations (e.g., make trouble). At the highest proficiency level in the study, the overall collocation knowledge had increased significantly and, moreover, the difference between knowledge of transparent collocations and what Revier calls semi-transparent collocations had decreased and was no longer statistically significant (Revier, 2009, p. 135).
In view of the importance of collocations for L2 proficiency, Forsberg Lundell et al. (2018) set out to create a test of productive collocation knowledge in L2 French. The main aim of the study was to investigate whether it was possible to discern a development of productive collocation knowledge between the B2 and the C1 levels of the Common European Framework of Reference for Languages (CEFR). The CEFR has had considerable influence on language teaching and assessment, especially in the European context. It is characterized by a functional theory of language proficiency, containing a number of Can Do statements for each proficiency level. However, the scale also contains linguistic descriptors for different linguistic domains such as vocabulary and grammar. Although the scale is widely used, Hulstijn (2007) points out that the levels described in the CEFR need to be taken with some caution. This is because the empirical foundation of the CEFR is not evidence from developmental studies but rather experienced teachers' assessments and practice (i.e., teachers' perceptions of what learners at different levels are capable of). In other words, many experienced teachers would probably agree that idiomatic features of language are characteristic of the C1 level, but this is not based on empirical research that has compared B-level and C-level learner performance.
Mastery of target-like formulaic language, according to CEFR descriptors, is relegated largely to the very highest levels of proficiency (i.e., C1 and C2), with the exception of unanalyzed chunks, which are supposedly used at the very early stages of acquisition. Therefore, we find numerous references to the use and comprehension of formulaic language starting from the C1 level, including "a good command of idiomatic expressions and colloquialisms" (Council of Europe, 2001, p. 112). This presentation suggests that such mastery characterizes only the highest levels of proficiency and says nothing about the potential development of phraseological knowledge prior to the C1 level.
To this end, Forsberg Lundell et al. (2018) created a productive collocation test in L2 French, tested with 152 participants, including L1 and L2 speakers. The final version was tested on 47 L2 speakers in France, from varying L1 backgrounds. The results showed that there was a significant development of productive collocation knowledge between the B2 and C1 levels, confirming that knowledge of collocations continues to develop at the most advanced levels.
The present study constitutes a conceptual replication of this study, using the same methodology for item selection and testing, but focusing on another language, Swedish, which also implies that the database for item selection will necessarily be different. In labelling this study a "conceptual replication", we are following the recommendations put forward by Marsden et al. (2018). Based on a systematic review of 67 replication studies (Marsden et al., 2018, p. 340), the authors suggest a clear distinction between three types of replication studies to make the labelling in publications more consistent and relationships between initial studies and replications more transparent: direct replications make no intentional change to the initial study and seek to confirm methods, data and analysis; partial replications introduce one principled change to a key variable in the initial study to test generalizability in a clearly defined way; and conceptual replications introduce more than one change to one or more significant variables (pp. 366-367).
Conceptual replications can also extend their aim in relation to the initial study (ibid, p. 366), which has been done in the current article.
Through the development of a test of productive collocation knowledge in L2 Swedish following the same methodology as Forsberg Lundell et al. (2018), we seek the answer to this research question: Is there a progression in productive collocation knowledge between the B2 level and C1 level of the CEFR in L2 Swedish, just like in L2 French?
In addition to replicating Forsberg Lundell et al. (2018), which is the main aim of the present study, the paper also endeavours to explore one of the suggestions made in the initial study: "Is there a hierarchy in terms of collocation difficulty? If yes, can such a hierarchy be explained by factors such as frequency or transparency?" (Forsberg Lundell et al., 2018, p. 644). In the present study, we will therefore look into the impact of frequency and MI score on collocation difficulty, leaving the factor of transparency for future studies. Furthermore, cross-linguistic similarity will also be explored, since cognateness has proved to have an impact on collocation difficulty (cf. Peters, 2016; Wolter & Gyllstad, 2011).
Besides responding to a theoretical interest in the linguistic correlates of the CEFR, the study aims at providing a test that could be useful in language assessment and language teaching of L2 Swedish. In addition to its importance for fluency and idiomatic language use in L2 acquisition, collocation knowledge plays an important role in genre and domain specific language and has been described as characteristic for academic writing that aims at "clarity, precision and lack of ambiguity" (Henriksen & Westbrook, 2017, p. 34; see also Howarth, 1998). This suggests a need for research on collocation knowledge at higher proficiency levels (e.g., in order to develop teaching materials and assessment tools that can be of use in preparing L2 students of Swedish for the linguistic demands of higher education and academia). The present study includes a few extensions compared to the replicated study, examining factors affecting collocation difficulty. This will add to our general understanding regarding the acquisition of collocations.

METHODOLOGY: DEVELOPING THE INSTRUMENT FOR THE L2 SWEDISH TEST
In this section, we provide details on the development of the collocation test in L2 Swedish. The methodology for test development in Forsberg Lundell et al. (2018) was first developed in Forsberg Lundell and Lindqvist (2014) and then modified. Based on the experiences from these two studies, instrument development could be synthesized into three distinct phases in the current study. Accordingly, the instrument development procedure for the Swedish test can be considered more efficient: some steps that were explored previously proved unnecessary for item selection and were therefore discarded. The three phases included were:
Item identification in corpora
Testing with L1 speakers (in order to ascertain relevance for high-level L2 proficiency)
Testing with L2 speakers
These three phases are accounted for in turn below.

ITEM IDENTIFICATION IN CORPORA
The items for the first version of the test were selected in three steps. First, a list of the 3000 most frequent words was extracted, based on 22 separate newspaper corpora in the Swedish Language Bank (https://spraakbanken.gu.se). The corpora included material from a variety of Swedish newspapers between 1965 and 2011, comprising about 288 million tokens in total. Based on this list, 150 frequent nouns were randomly chosen (see Forsberg Lundell & Lindqvist, 2014). Second, we searched for verbs collocating with the 150 nouns in the same 22 newspaper corpora from which the 3000 words had been extracted. Third, in order to limit the material to a reasonable number of relevant test items, selection criteria similar to those in Forsberg Lundell et al. (2018) were applied (i.e., the MI score for the verb-noun combinations had to be 3 or higher for the item to be considered a collocation and accordingly a relevant test item). The raw frequency of a relevant item had to be at least 288 in the searched material (a minimum of 1 occurrence per one million words). The selection process resulted in a final list of 67 items with a range of MI scores between 2.73 and 9.64 and a frequency range between 301 and 16,487. These 67 test items comprised the pilot test.
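The third selection step can be sketched as a simple filter. This is an illustrative sketch only: the candidate items and all names are hypothetical, and the thresholds are the ones named in the text (MI of 3 or higher; at least one occurrence per million words in a corpus of about 288 million tokens). Since the criteria are described as "similar" to those of the original study, the thresholds actually applied may have differed in detail.

```python
from typing import NamedTuple

CORPUS_TOKENS = 288_000_000  # approximate corpus size given in the text
MIN_PER_MILLION = 1          # minimum occurrences per million words
MIN_MI = 3.0                 # MI threshold named in the text

class Candidate(NamedTuple):
    verb_noun: str
    frequency: int  # raw frequency in the corpora
    mi: float       # MI score of the verb-noun combination

def min_raw_frequency(corpus_tokens: int, per_million: int) -> int:
    """Convert a per-million threshold into a raw frequency threshold."""
    return per_million * corpus_tokens // 1_000_000

def select_items(candidates, min_freq, min_mi):
    """Keep candidates that meet both the frequency and the MI criterion."""
    return [c for c in candidates if c.frequency >= min_freq and c.mi >= min_mi]

# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("item_a", 16_487, 5.2),
    Candidate("item_b", 150, 7.9),  # fails the frequency criterion
    Candidate("item_c", 900, 1.4),  # fails the MI criterion
]
threshold = min_raw_frequency(CORPUS_TOKENS, MIN_PER_MILLION)
selected = select_items(candidates, threshold, MIN_MI)
print(threshold, [c.verb_noun for c in selected])
```

With these inputs the raw frequency threshold comes out at 288, matching the figure given above, and only the first hypothetical item passes both criteria.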
The test format was the same as in Forsberg Lundell et al. (2018): a fill-in-the-gap test based on Mizrahi and Laufer (2010). The participants were asked to supply the verb, and the first letter of the verb was provided in order to limit the number of possible alternatives. Examples are provided in (5-7).
(5) 'This is why Swedish couples travel abroad, where women are allowed to g__________ birth (to a child) for somebody else.'
(6) 'GP [Göteborgs Posten] is the first of the foreign media getting a chance to a__________ a question at the press conference.'
(7) 'At a press conference he refused to a__________ a question from a female BBC reporter just because she was a woman.'
To ensure that the collocations were presented in an authentic context in the test, we conducted corpus searches in the newspaper material searchable via the search tool Korp (https://spraakbanken.gu.se/korp). We chose sentences that provided an unambiguous context for the collocation in question and that were judged to be at an adequate level of difficulty for the target group.

TESTING WITH L1 SPEAKERS OF SWEDISH
The test was administered to 59 university students at the Universities of Stockholm and Gothenburg. The participants were all L1 speakers of Swedish. The main aim of testing L1 speakers was to eliminate items that were not clearly recognized as collocations by L1 speakers of Swedish. We therefore excluded all items that were not recognized as a collocation by at least 90% of the L1 Swedish participants. This means that all items that were answered in the expected way by fewer than 54 participants were eliminated from the test. However, we allowed for some variation in the answers, as long as the variant could still be confirmed as a collocation (based on an MI score > 3) according to the corpus material, as in example (8).
The testing with L1 Swedish speakers resulted in a remaining list of 39 collocations, meaning that 28 of the 67 items in the pilot were recognized as a collocation by fewer than 90% of the participants and were consequently discarded. These 39 items were included in the test that was distributed to a group of L2 speakers of Swedish, the target population for the present test.
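The 90% criterion above translates into a cutoff of 54 out of 59 participants, since 90% of 59 is 53.1. A minimal sketch of this arithmetic follows; the function names and the per-item counts are hypothetical, and the counts are assumed to already include accepted variants.

```python
import math

def min_correct(n_participants: int, share: float = 0.90) -> int:
    """Smallest number of participants that amounts to at least `share`."""
    return math.ceil(share * n_participants)

def retained(correct_counts: dict, n_participants: int) -> list:
    """Items answered as expected by at least 90% of the L1 participants."""
    cutoff = min_correct(n_participants)
    return [item for item, n in correct_counts.items() if n >= cutoff]

# Hypothetical counts of expected answers per item (accepted variants included).
counts = {"item_a": 58, "item_b": 54, "item_c": 53}
print(min_correct(59))       # 54
print(retained(counts, 59))  # ['item_a', 'item_b']
```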

TESTING WITH L2 SPEAKERS
The test was distributed in pen-and-paper format. The first part that the participants were asked to fill out was a short demographic survey, providing information on their sex, year of birth, time of residency in Sweden, educational background, spoken languages, use of Swedish and the type of instruction in Swedish they had received. The second part was a CEFR test designed by Folkuniversitetet (Swedish open university), which has developed standardized placement tests for the CEFR levels. The test assesses lexical and grammatical knowledge. Its first part is a multiple-choice, fill-in-the-gap test targeting word order, prepositions, temporality, connectors and pronouns. The second part tests word knowledge and morphology (e.g., converting a noun into an adjective). The third part is a cloze test and the last section targets the past tense of verbs. This test was chosen because tests developed according to the same principles exist for both French and Swedish, allowing for the same design as in Forsberg Lundell et al. (2018). It has also been used earlier in SLA research (Falk et al., 2015). Finally, the third part consisted of the 39 collocation items described in section 2.2.
Thus, the participants were instructed to first take the CEFR level test and then the collocation test. In some cases, a researcher was present while a bigger group of participants took the test. In other cases, the test was distributed individually by mail, with instructions on how to take the test, which specified that participants should not receive help from a third party or any other resources, such as dictionaries.

L2 PARTICIPANTS
The target group for the current study was L2 speakers with different L1s at proficiency levels of Swedish corresponding approximately to the CEFR levels B2 and C1. The authors searched for potential participants within their different networks of university students and university employees and within other groups to whom they had access. The search for participants was also posted on both authors' Facebook accounts and on the second author's Twitter account to reach as many potential participants as possible. In this way, a final group of 60 participants was recruited (compared to 47 participants in Forsberg Lundell et al., 2018). As shown in Table 1, the group was composed of 37 female and 23 male participants, with a mean age at testing of 40.4 years (SD = 8.8). Their average length of Swedish studies was 5.4 months (SD = 9.9), and their average length of residence was 10.7 years (SD = 7.5).

RESULTS
In this section, we provide a summary of the key results in Forsberg Lundell et al. (2018), followed by an account of the results of the present study.

KEY RESULTS IN FORSBERG LUNDELL ET AL. (2018)
The participants in Forsberg Lundell et al. (2018) were divided into two groups, based on their scores on the CEFR test designed by Folkuniversitetet (see Table 2). There were 26 participants in the B2 group and 21 participants in the C1 group. The mean collocation score was 19.7 (SD = 4.7) in the B2 group and 25.7 (SD = 4.1) in the C1 group. The maximum score on the test was 30. The difference observed between the two groups on the collocation test, based on a t-test, was significant (t = −4.64, p < 0.0001). This suggested that productive collocation knowledge, as measured by this test, showed a significant progression between B2 and C1 levels. A Pearson correlation analysis also yielded a significant, moderate correlation between the two variables of collocation score and CEFR level (r = 0.647, p < 0.001). With respect to the test's reliability, a Cronbach's alpha analysis was also conducted, internal consistency being a frequently used measure of reliability. The analysis showed that the test demonstrated good internal consistency (α = 0.844).

In the present study, just as in Forsberg Lundell et al. (2018), the participants were separated according to their designated CEFR level (see Table 3). There were 22 participants in the B2 group and 38 participants in the C1 group. The mean collocation score was 27.27 (SD = 6.08) in the B2 group and 35.53 (SD = 2.45) in the C1 group, the maximum score of the test being 39. The standard deviation was higher in the B2 group than in the C1 group, suggesting that there was greater variability in collocation knowledge among the less proficient participants. Just as in Forsberg Lundell et al., the difference between the B2 group and the C1 group was significant (t = 7.4277, p < 0.00001). A Pearson correlation analysis yielded a significant, strong correlation between collocation test result and CEFR level (r = 0.6982, p < 0.00001). The correlation was moderate in Forsberg Lundell et al. (2018), but it was strong in the present study, possibly due to the larger pool of participants and somewhat larger pool of test items. A Cronbach's alpha was calculated for the Swedish test in order to ascertain the test's reliability. The analysis showed that the test demonstrates good internal consistency, bordering on excellent (α = 0.88), and somewhat better than that of the French test.
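The group comparison for the Swedish data can be approximately reconstructed from the summary statistics reported above. The sketch below assumes an independent-samples Student's t-test with pooled variance; the source does not state which variant of the t-test was used, so this is an assumption, and the function name is hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    # Pooled variance, weighting each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# Reported values for the present study: C1 group first, then B2 group.
t_swedish = pooled_t(35.53, 2.45, 38, 27.27, 6.08, 22)
print(round(t_swedish, 2))
```

Under this assumption the statistic comes out at roughly 7.43, close to the reported t = 7.4277; a small gap of this size is consistent with rounding in the published means and standard deviations.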

THE IMPORTANCE OF FREQUENCY AND MI SCORE FOR COLLOCATION DIFFICULTY
In order to follow one of the suggestions made in Forsberg Lundell et al. (2018), we decided to investigate the relationship between the (absolute) frequency of the items in the corpora and their relative difficulty. Collocation difficulty was operationalized as the number of speakers providing a correct answer to an item (according to the definition of a correct answer in 2.2). As stated above, the test contained 39 items and was taken by a total of 60 participants. Correct answers per item ranged from 18 to 59 (M = 49.8, SD = 8.99). A Pearson correlation test was conducted between the number of speakers who provided a correct answer on an item and the absolute frequency of the item. This analysis did not yield a significant correlation (r = 0.1105, p = 0.503064).
Furthermore, the relationship between collocation difficulty and MI score was investigated. MI scores for the 39 items had a range of 2.73-9.64 (M = 5.51; SD = 1.67). A Pearson correlation test was conducted between the number of speakers who provided a correct answer on an item and the MI score. Again, the analysis did not yield a significant correlation (r = 0.1652, p = 0.31488).
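The non-significance of the two correlations can be checked with the standard conversion of a Pearson r into a t statistic, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below applies it to the reported coefficients with n = 39 items; the function name is hypothetical, and the critical value of about 2.03 (two-tailed, α = 0.05, 37 degrees of freedom) is taken from standard t tables rather than from the source.

```python
import math

def r_to_t(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

N_ITEMS = 39
T_CRITICAL = 2.03  # approximate two-tailed critical value, alpha = 0.05, df = 37

t_frequency = r_to_t(0.1105, N_ITEMS)  # item difficulty vs. absolute frequency
t_mi = r_to_t(0.1652, N_ITEMS)         # item difficulty vs. MI score
print(round(t_frequency, 2), round(t_mi, 2))
print(abs(t_frequency) > T_CRITICAL, abs(t_mi) > T_CRITICAL)
```

Both statistics (about 0.68 and 1.02) fall well below the critical value, in line with the reported p-values.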

THE IMPORTANCE OF CROSS-LINGUISTIC DIFFERENCE/SIMILARITY FOR COLLOCATION DIFFICULTY
As stated above, the participant group is linguistically relatively heterogeneous. The 18 different L1 groups represented differed in size, which makes it difficult to compare the results between these groups. To be able to say something about cross-linguistic influence on the test results, we therefore compared the results for the participants with Germanic languages as their L1 (thus more similar to Swedish) with those for the participants with non-Germanic L1s. Table 4 shows the mean of test scores for the participant group with Germanic L1s compared with the group with non-Germanic L1. As we can see, the group with the Germanic L1s has slightly higher scores (M = 33.63, SD = 5.48) than the group with the non-Germanic L1s (M = 31.27, SD = 5.85), a rather small difference which can hardly be seen as any clear evidence of an effect of cross-linguistic differences/similarities in the results of our study. The table also shows that the two groups differ regarding the distribution of CEFR levels. In the Germanic L1 group, 74% of participants qualified as C1 in the initial placement test, whereas 55% of the participants in the non-Germanic L1 group were placed in the C1 group. To get a better picture of the difference in results between the two groups, we divided them according to CEFR levels, which is shown in Table 5.
In both L1 groups, the C1 participants scored higher than the participants at the B2 level. In addition, the mean test scores at the B2 and C1 levels, respectively, were similar across the L1 groups. For the B2 level, the mean score for the participants in the Germanic L1 group was 27.29 (SD = 7.18) and the mean score for the participants in the non-Germanic L1 group was 27.33 (SD = 6.23). For the C1 level, the mean scores for the Germanic L1 group and non-Germanic L1 group were 35.85 (SD = 2.21) and 34.56 (SD = 2.73), respectively. Table 5 illustrates the similarities between the L1 groups: the test scores for B2 and C1, respectively, are on a similar level, with a similar progression between CEFR levels. Looking at the standard deviation, there is a higher degree of individual variation at the B2 level than at the C1 level in both L1 groups. This difference in standard deviation between the B2 group (SD = 6.08) and the C1 group (SD = 2.45) is also shown in Table 3. The similarity between the two L1 groups can be interpreted as an indication that proficiency level constitutes an important factor for collocation knowledge. In other words, collocation knowledge appears to be a distinguishing factor between proficiency levels, whereas the L1 differences of the participants in our data do not seem to play a major role for the participants' test scores. These results are in line with the findings of Forsberg Lundell et al. (2018), which showed a significant difference in collocation knowledge between participants who were placed at B2 and C1 level, respectively, according to the CEFR test.
The importance of cross-linguistic differences as a factor for the mastery of collocations on different proficiency levels will, however, have to be investigated further and more systematically, by collecting data from a more balanced sample of participants with different L1s, in order to draw general conclusions regarding this matter.

DISCUSSION AND CONCLUSION
This paper set out to replicate, to the extent possible between two different target languages, Forsberg Lundell et al.'s (2018) study on productive collocation knowledge at advanced CEFR levels in L2 French. The main finding from Forsberg Lundell et al. was that productive collocation knowledge is significantly different between the B2 and C1 level groups, with significantly better scores being seen in the C1 level group. The results from Forsberg Lundell et al. thus warranted cross-linguistic validation. They also concur with the descriptors in the CEFR, in that mastery of idiomaticity is considered characteristic of the C1 level. It should, however, be noted that mastery of collocations is not an all-or-nothing affair with respect to progression from B2 to C1 level. C1 participants were likely to have a higher productive knowledge of collocations than B2 participants, but B2 participants also displayed extensive knowledge of collocations. This suggests that mastery of formulaic knowledge is also characteristic of the B2 level, an observation that is not evident from the CEFR descriptors. A most welcome step, regarding both L2 French and L2 Swedish, would be to investigate productive collocation knowledge at all CEFR levels, using the same test in order to examine its progression across CEFR levels.
The present study was conducted on a different language than the one investigated in Forsberg Lundell et al. (2018) and found almost exactly the same results, the only difference being that the correlation between collocation test score and CEFR level was stronger in the Swedish dataset. Furthermore, we also set out to explore some of the suggestions from Forsberg Lundell et al. One of those was to investigate the impact of item frequency on collocation difficulty. This analysis did not yield significant results (i.e., for the present test, frequency effects do not seem to play a prominent role). We also conducted an analysis to investigate the relationship between MI score and collocation difficulty, but this analysis did not yield significant results either. It could be relevant to take into consideration that frequencies and MI scores came from a written corpus. It is possible that the corresponding figures from an oral corpus would show an effect, since speakers are naturally exposed to both written and oral input, which will influence their productive collocation knowledge. Furthermore, looking into cross-linguistic differences potentially impacting the results, we divided the participants and their test scores into two groups according to whether or not their L1 was a Germanic language. This analysis indicated that CEFR level was more strongly associated with test scores in our data than cross-linguistic differences or similarities between Swedish and the participants' L1. This is in line with the results of our replication discussed above. At the same time, our sample is not representative of different L1 groups, and further research is needed in order to draw conclusions about the effect of L1 influence, in relation to proficiency level, on the mastery of collocations.
In other words, this conceptual replication confirms the results of Forsberg Lundell et al., emphasizing the importance of collocation knowledge for high-level L2 proficiency (e.g., Eyckmans, 2009;Nesselhauf, 2003;Wray, 2002). However, it cannot confirm or contradict research that has shown that L1 influence and distributional properties such as frequency and MI score are important for the L2 acquisition and use of collocations (cf. Durrant & Schmitt, 2009;Simpson-Vlach & Ellis, 2010).
In view of the current study's findings, it would also be interesting, in the future, to conduct an analysis that focuses on semantic properties such as transparency (cf. Revier, 2009). This would add to our understanding of L2 collocation difficulty.