Adaptive context encoding module for semantic segmentation

Wang, Congcong; Alaya Cheikh, Faouzi; Beghdadi, Azeddine; Elle, Ole Jacob

Journal article; Accepted version; Peer reviewed


Year

2020

Original version

IS&T International Symposium on Electronic Imaging Science and Technology. 2020, 2020 (10), 27-1 to 27-7, DOI: https://doi.org/10.2352/ISSN.2470-1173.2020.10.IPAS-027

Abstract

The sizes of objects in images vary widely; therefore, capturing context information at multiple scales is essential for semantic segmentation. Existing context aggregation methods, such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP), use different pooling sizes or atrous rates so that information at multiple scales is captured. However, these pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations in the convolution operation can give the network a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution are learnable. Our ACE module can easily be embedded into other convolutional neural networks (CNNs) for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All experimental studies confirm the effectiveness of the proposed module compared with state-of-the-art methods.
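The observation that learnable sampling locations yield a learnable field-of-view can be illustrated with a minimal pure-Python sketch of one output value of a single-channel 3x3 deformable convolution, using the standard formulation with bilinear sampling at fractional positions. This is not the authors' ACE implementation: the function names `bilinear` and `deform_conv_at`, the 7x7 toy input, and the all-ones kernel are hypothetical, and the offsets are supplied by hand rather than predicted and learned as in a trained network. The sketch shows that an atrous (rate 2) convolution is just one fixed choice of sampling offsets that a deformable kernel can also express.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample the 2-D list `img` at fractional position (y, x).

    Pixels outside the image are treated as zero (zero padding).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    # Blend the four integer neighbours, weighting each by its proximity.
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                wy = 1.0 - abs(y - yy)
                wx = 1.0 - abs(x - xx)
                val += wy * wx * img[yy][xx]
    return val


# Regular 3x3 sampling grid p_k, in row-major order.
GRID = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]


def deform_conv_at(img, weight, p0, offsets, dilation=1):
    """One output value of a single-channel 3x3 deformable convolution.

    y(p0) = sum_k weight[k] * x(p0 + dilation * p_k + offsets[k])

    With all offsets zero this reduces to an ordinary convolution
    (an atrous one when dilation > 1). Here the offsets are passed in
    by hand; in a real network they would be predicted and learned.
    """
    out = 0.0
    for k, (i, j) in enumerate(GRID):
        dy, dx = offsets[k]
        y = p0[0] + dilation * i + dy
        x = p0[1] + dilation * j + dx
        out += weight[k] * bilinear(img, y, x)
    return out


if __name__ == "__main__":
    img = [[7 * y + x for x in range(7)] for y in range(7)]  # hypothetical 7x7 input
    w = [1.0] * 9                                             # hypothetical all-ones kernel
    zero = [(0.0, 0.0)] * 9

    # Atrous convolution with rate 2: a fixed, hand-chosen field-of-view.
    atrous = deform_conv_at(img, w, (3, 3), zero, dilation=2)

    # The same field-of-view reached by a rate-1 kernel whose offsets push
    # each tap outward by one pixel, i.e. offsets[k] = p_k.
    spread = [(float(i), float(j)) for i, j in GRID]
    deform = deform_conv_at(img, w, (3, 3), spread, dilation=1)

    print(atrous, deform)  # 216.0 216.0
```

Because the offsets are continuous and enter the output through the bilinear weights, they can take any value between such fixed choices (an offset of 0.5, for instance, samples halfway between two pixels), which is what makes the receptive field adjustable by training rather than fixed by a chosen atrous rate.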