This project examines the inventory of recurrent word-combinations and formulaic language in corpora of native and non-native English speech, inspired by the corpus-driven approach to recurrent word-combinations presented in Altenberg (1998) and De Cock (2004). The analysis draws on usage-based theories, which consider recurring patterns of language to be reflective of fundamental properties of language competence. It further considers how we may best identify and describe recurring language patterns and their functions in naturally occurring speech, with a particular focus on learner language.
The primary material for this study comes from one native speaker corpus, the Louvain Corpus of Native English Conversation (LOCNEC), and two subcorpora of the non-native speaker corpus LINDSEI (the Louvain International Database of Spoken English Interlanguage), which contain speech produced by advanced Swedish and Norwegian learners of English.
The study shows some of the strengths and weaknesses of employing a hypothesis-finding, corpus-driven approach to the identification and description of formulaic language. It confirms the pervasiveness of recurrent language in both native and non-native speech, and presents quantitative results showing how particular word-combinations are under- and overrepresented in the learner material as compared to the native speaker corpus. The more qualitatively grounded discussion draws on concepts derived from cognitive linguistics to explain how recurrent patterns of words occur, function and change, and thus aims to position quantitative findings from corpus linguistics within this theoretical framework.
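
As an illustration of the kind of comparison described above, the following is a minimal sketch of how recurrent word-combinations might be counted in two corpora and checked for over- or underrepresentation. It is not the procedure used in the study: the whitespace tokenisation, the frequency threshold, the use of a log-likelihood score, the function names and the toy utterances are all assumptions made for the example.

```python
import math
from collections import Counter


def count_ngrams(utterances, n=2):
    """Count contiguous n-word sequences, never crossing utterance boundaries."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()  # assumption: naive whitespace tokenisation
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def recurrent(counts, min_freq=2):
    """Keep only combinations occurring at least min_freq times (threshold is an assumption)."""
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Two-corpus log-likelihood score; 0 means identical relative frequency."""
    combined = freq_a + freq_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score


def direction(freq_a, total_a, freq_b, total_b):
    """Report whether corpus A over- or underuses an item relative to corpus B."""
    rate_a, rate_b = freq_a / total_a, freq_b / total_b
    if rate_a > rate_b:
        return "over"
    if rate_a < rate_b:
        return "under"
    return "equal"


# Invented toy utterances, not taken from LOCNEC or LINDSEI.
learner = ["i think it is good", "i think so", "i think it is fine"]
native = ["you know it is good", "you know i think so", "you know"]

learner_counts = count_ngrams(learner)
native_counts = count_ngrams(native)
learner_total = sum(learner_counts.values())
native_total = sum(native_counts.values())

for gram in sorted(recurrent(learner_counts)):
    f_l, f_n = learner_counts[gram], native_counts[gram]
    print(" ".join(gram),
          direction(f_l, learner_total, f_n, native_total),
          round(log_likelihood(f_l, learner_total, f_n, native_total), 2))
```

In a sketch like this, the choice of threshold and of n directly shapes which combinations count as recurrent, which is one reason a corpus-driven inventory still calls for qualitative interpretation.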