Predicting the future using deep learning is a research field of increasing interest. Most contributions concern architectural designs for predictive models; however, established methods for evaluating their predictive abilities are lacking. Images and videos are intended for human observers, and since humans perceive the world individually, the evaluation of videos should take subjectivity into account. In the absence of appropriate evaluation methods, measuring the performance of predictive models and comparing different model architectures is challenging. In this thesis, I present a protocol for evaluating predictive models using subjective data. The evaluation method is applied in an experiment to measure the realism and accuracy of predictions of a visual traffic environment. These predictions are generated by a proposed model architecture, which produces discrete latent representations of the environment. Applying the evaluation method reveals that the proposed deep learning model is capable of producing accurate predictions ten seconds into the environment’s future. The predictive model is also shown to be robust across the different image types used to describe the environment. The results of the proposed evaluation method are shown to be uncorrelated with those of the predominant approach for evaluating predictive models, which is a frame-wise comparison between predictions and ground truth. These findings emphasise the importance of using subjective data when assessing the predictive abilities of models, and open up a new alternative for evaluating predictive deep learning models.