Abstract
With the amount of data available today, the need to label and categorise data is greater than ever. Many machine learning and deep learning models are well equipped to classify data, but finding the best model for a specific task is not simple. The goal of this thesis is to compare deep learning and traditional machine learning on text classification. Deep learning currently receives much attention and appears to be the go-to solution for many problems. However, it has pitfalls and challenges that the user might not be aware of. To see whether deep learning is the silver bullet, the deep learning models RNN (recurrent neural network) and CNN (convolutional neural network) are compared with the traditional machine learning models SVM (support vector machine) and Naive Bayes on text classification. The models are compared on classification performance, training time and setup complexity. For the comparison, a dataset is built from football articles on VG.no and TV2.no, and each model learns to classify paragraphs from these articles. The optimised models are then tested in an application on new data, and their performance is analysed and compared. The conclusion is that for a limited dataset with few samples, the traditional machine learning models are the better alternative. Their classification performance is only slightly lower than that of the deep learning models, while their training time is much shorter, their setup is more straightforward, and they require less maintenance. For a larger dataset, however, the overhead of deep learning might be worth the extra performance.