Samlaren: tidsskrift för forskning om svensk och annan nordisk litteratur. 2019, 140, 281-304
This article presents the data-mining technique sub-corpus topic modeling and argues that it is well suited for capturing data that can be used for discourse analysis. The epistemological implications of digital methods, and of the related concept of “data-mining” in the humanities, are discussed with reference to Emily Apter’s 2017 essay “Overburden.” “Mining” as a metaphor may suggest that knowledge is something to be extracted and refined, as metals are from mines. If so, data-mining may appear incompatible with the “surface reading” on which discourse analysis depends. This article argues, however, that sub-corpus topic modeling, with some adjustments, is particularly good at preserving the “surface” of the research objects, and demonstrates this through a detailed but accessible presentation of the method. Sub-corpus topic modeling uses themes modeled from a “sub-corpus” to search for the same themes in a larger corpus. Through “Bokhylla” (the “Digital Bookshelf”), a service of the National Library of Norway, a large number of texts are available digitally. As an example of how sub-corpus topic modeling can be used to capture data for discourse analysis, the Digital Bookshelf constitutes the larger corpus, while texts by the Swedish author Fredrika Bremer constitute the sub-corpora.
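The search step described above can be illustrated with a minimal sketch. It is not the article's implementation: it assumes that a topic has already been modeled from the sub-corpus and is available as a mapping from words to weights, and it replaces the actual modeling with a simple weighted word match. The topic, the document texts, and the names `tokenize`, `score_document`, and `search_corpus` are all hypothetical and serve only as illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zåäöæø]+", text.lower())


def score_document(topic, text):
    """Average topic weight per token: how densely the topic's words occur."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(topic.get(tok, 0.0) for tok in tokens) / len(tokens)


def search_corpus(topic, corpus, top_n=3):
    """Rank documents of the larger corpus by their score for one topic."""
    scored = [(score_document(topic, text), doc_id)
              for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    # Keep only documents in which at least one topic word occurs.
    return [(doc_id, round(score, 3))
            for score, doc_id in scored[:top_n] if score > 0]


# Hypothetical topic: word weights of the kind a topic model might
# produce from a sub-corpus (invented values, not taken from the article).
topic = {"home": 0.4, "daughter": 0.3, "duty": 0.2, "marriage": 0.1}

# Hypothetical stand-in for the larger corpus.
corpus = {
    "doc_a": "The daughter left her home before the marriage",
    "doc_b": "The ship sailed north along the coast",
    "doc_c": "Duty kept her at home",
}

hits = search_corpus(topic, corpus)
print(hits)
```

Because the score only counts words as they appear in the text, each hit can be traced back to the passages that produced it, which is one way to read the claim that the method keeps the “surface” of the texts in view.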