A benchmark for evaluating Deep Learning based Image Analytics

Ikram, Chaudhry Rehan

Master thesis

Year

2019

Abstract

Deep learning based systems are on the rise because they have shown tremendous potential to extract concealed patterns from data. Today, deep learning systems surpass human-level performance on some vision tasks, which has led to the widespread adoption of deep learning for image classification and object detection. There is a wide variety of hardware, deep learning frameworks, model architectures and algorithms to choose from when implementing a deep learning based system, but a fair apples-to-apples comparison to aid selection remains a considerable challenge. This study aims to serve as a guide for selecting a deep learning framework and an object detection algorithm with an appropriate backbone feature extractor, such that the combination provides the optimal speed and accuracy trade-off. The proposed solution, ImageMark, has two parts. The first part is a classification benchmark that provides a comparative study of seven state-of-the-art GPU-accelerated deep learning software tools (MXNet, TensorFlow, PyTorch, CNTK, MXNet(Gluon), Keras, Theano) by executing the same Convolutional Neural Network workload on each of them. All of them achieve nearly the same accuracy, while some differences in training and inference speed are observed. MXNet is the fastest in terms of training speed, Theano is the leader in terms of inference speed, and all of the frameworks utilize the GPU very efficiently. The second part is an object detection benchmark that evaluates state-of-the-art object detection algorithms, which we view as “meta-architectures”, i.e. Faster-RCNN, RFCN and SSD, using seven different base feature extractor CNN architectures for high-level feature extraction from the input image. We use a small, more practical dataset on road damage detection for the workloads.
Faster-RCNN based object detectors generally provide better accuracy than RFCN based models, while RFCN based models, in turn, perform better than SSD based models. SSD based models provide higher inference and training speed than RFCN and Faster-RCNN based models. We study the speed-accuracy trade-off curve by keeping the hyper-parameters the same across all models, apply multi-objective optimization over speed and accuracy, and present the range of object detectors that lie on the Pareto front. The Faster-RCNN based model with the PNasNet base feature extractor achieves the highest mAP and F1-score, but it takes much more time to train and is impractical for scenarios that require a high frame rate during inference. At the other extreme, we present an SSD based model with an Inception base feature extractor, which takes the least amount of time for training and inference and still provides decent accuracy.
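
The idea of presenting detectors on a Pareto front can be illustrated with a minimal sketch. It is not the thesis's implementation. The `Detector` type, the helper names `dominates` and `pareto_front`, and every number below are hypothetical placeholders chosen for illustration, not measured results. A detector is kept if no other detector is at least as fast and at least as accurate while being strictly better on one of the two.

```python
from typing import List, NamedTuple


class Detector(NamedTuple):
    """Hypothetical record for one trained object detector."""
    name: str
    inference_ms: float  # inference time per image, lower is better
    map_score: float     # mean average precision, higher is better


def dominates(a: Detector, b: Detector) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    and strictly better on at least one of them."""
    no_worse = a.inference_ms <= b.inference_ms and a.map_score >= b.map_score
    strictly_better = a.inference_ms < b.inference_ms or a.map_score > b.map_score
    return no_worse and strictly_better


def pareto_front(detectors: List[Detector]) -> List[Detector]:
    """Keep only the detectors that no other detector dominates,
    sorted from fastest to slowest."""
    front = [
        d for d in detectors
        if not any(dominates(other, d) for other in detectors)
    ]
    return sorted(front, key=lambda d: d.inference_ms)


# Placeholder values only; these are NOT results from the thesis.
candidates = [
    Detector("detector_a", 30.0, 0.50),
    Detector("detector_b", 80.0, 0.62),
    Detector("detector_c", 90.0, 0.58),   # slower and less accurate than detector_b
    Detector("detector_d", 200.0, 0.70),
    Detector("detector_e", 40.0, 0.45),   # slower and less accurate than detector_a
]

front_names = [d.name for d in pareto_front(candidates)]
print(front_names)
```

In this sketch the two ends of the front play the roles described above: the fastest detector with acceptable accuracy at one extreme and the most accurate but slowest detector at the other. Choosing among the detectors on the front depends on the frame rate the application needs.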