An unmanned ground vehicle (UGV) that operates autonomously both on roads and in unstructured terrain is equipped with sensors that perceive its environment. The information from these sensors can be combined into a high-level description of the scene. This thesis treats the computer vision task of semantic segmentation using camera images. The objective of semantic segmentation is to recognise objects and their spatial, pixel-level extent in an image. The semantic representation of the scene provides the local path-planning system with knowledge of obstacles and potentially drivable surfaces. Convolutional neural networks (CNNs) have recently shown prominent results for a range of computer vision recognition tasks, including semantic segmentation. This success of deep learning is driven mainly by the availability of large datasets and by parallel computation, even in compact embedded formats. Several large-scale datasets exist for vehicle-based scene understanding; however, they are limited to specific domains and typically cover only structured urban scenes. Moreover, creating datasets for semantic segmentation requires considerable human effort. This project investigates techniques that aim to reduce the work involved in annotating such datasets. To this end, a novel active learning framework is proposed and evaluated on the Cityscapes benchmark dataset. The proposed algorithm jointly trains a convolutional neural network on the small labelled portion of the training set and regularly queries the human annotator to label a small sample of the most informative images in the dataset. Moreover, a pseudo-labelling procedure is proposed to let the model exploit the unlabelled part of the dataset during training. The experiments suggest that careful selection of informative images for annotation benefits model performance when the labelled images constitute a relatively small proportion of the dataset.
In closing, several enhancements and extensions to this framework are proposed for future work.
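The abstract describes two mechanisms: querying the annotator for the most informative unlabelled images, and pseudo-labelling confidently predicted examples. The following is a minimal illustrative sketch of that loop, not the thesis implementation; the entropy acquisition function, the per-image `model` interface (returning a class-probability vector), and the confidence threshold are all assumptions made here for illustration.

```python
import math

def entropy(probs):
    # Shannon entropy of a class-probability vector; higher means
    # the model is more uncertain about this image.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(model, pool, k):
    # Uncertainty sampling: rank unlabelled images by predictive
    # entropy and return the k most uncertain ones to send to the
    # human annotator.
    scores = {i: entropy(model(x)) for i, x in pool.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def pseudo_label(model, pool, threshold=0.9):
    # Confidence-based pseudo-labelling: keep only predictions whose
    # top class probability exceeds the threshold, so the model can
    # train on the unlabelled portion of the dataset.
    labels = {}
    for i, x in pool.items():
        probs = model(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labels[i] = best
    return labels

# Toy pool: each "image" is already a probability vector, and the
# toy model is the identity, purely to exercise the loop.
pool = {0: [0.5, 0.5], 1: [0.95, 0.05], 2: [0.7, 0.3]}
queried = select_most_informative(lambda x: x, pool, k=2)
pseudo = pseudo_label(lambda x: x, pool)
```

In a real semantic-segmentation setting the score would be aggregated over pixels (e.g. mean pixel entropy) rather than computed per image, and pseudo-labels would be per-pixel class maps.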