Submission ID: deepjoker (7th place during the validation phase before 1st May, 2018)
One of the most prevalent architectures for binary semantic segmentation is U-Net, which fuses multi-level feature maps to hierarchically increase the spatial resolution of the output probability map. The encoder exploited in the original U-Net is built by plain neural units, while the feature representation capability of plain neural unit is overtaken by residual neural unit. More recently, a multi-layer dense block was proposed in DenseNet architecture, where each layer is connected to every other layer. Taking advantage of this type of connection, the forward feature propagation ability is strengthened, and the number of parameters can be reduced as the network going deeper. Inspired by this, we seek to investigate whether the U-Net encoders can be made of dense blocks and if an improvement of semantic segmentation can be achieved. To this end, as illustrated as follows, a U-Dense network is utilized. Its encoder part is mainly composed of four dense blocks and three transition layers, and the decoder part is constructed based on five plain neural units. To be specific, the dense block and transition layer are explained as follows.
Combination of DenseNet encoding block and U-Net architecture
Given the input images
Correspondingly, the joint loss function is formulated as:
where
Different from other kinds of images, remote sensing images always have large-scale spatial sizes; e.g.,
Learning curves of validation dataset