This paper compares four top-performing crowd counting models (DSNet, SFANet, MANet, and Bayesian Loss) and evaluates their strengths based on their performance. DSNet proposes a dense dilated convolution block network, in which the dilated convolution layers are densely connected to one another to preserve information across continuously varying scales. Three such blocks are cascaded and linked by dense residual connections to widen the range of scales covered by the network, and a novel multi-scale density level consistency loss is introduced to improve performance. SFANet uses a VGG backbone CNN as the front-end feature extractor and a dual-path multi-scale fusion network as the back-end that generates the density map. One path highlights the crowd regions in the image, while the other fuses multi-scale features and generates the final high-quality density map. MANet (Multi-scale Attention Network) introduces a soft attention mechanism that learns a set of masks, together with a scale-aware loss function that regularizes the different branches and guides each one to specialize on a particular scale. Bayesian Loss introduces a novel loss function that constructs a density contribution probability model from the point annotations. We also analyze the results of the four convolutional neural networks, identify common patterns in their network structures, and point out promising directions for researchers in this fast-growing area.
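To make the Bayesian Loss idea more concrete, here is a minimal sketch of the basic formulation as we understand it, not the authors' implementation. Assumptions: each annotated point contributes a Gaussian likelihood with a shared bandwidth `sigma`, all points have equal priors, pixel `(row, col)` sits at coordinate `(x=col, y=row)`, and no background term is modeled. Each pixel's estimated density is split among the points by posterior probability, giving an expected count per point, and the loss is the sum of `|1 - expected count|`. The function name `bayesian_loss` and the default `sigma` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # annotated head position (x, y)


def bayesian_loss(density: List[List[float]], points: List[Point], sigma: float = 8.0) -> float:
    """Sketch of a Bayesian-style counting loss: sum over points of |1 - E[c_n]|.

    density: estimated density map, indexed density[row][col].
    points:  annotated positions; pixel (row, col) is taken at (x=col, y=row) (assumption).
    sigma:   shared Gaussian bandwidth (hypothetical default).
    """
    if not points:
        # The basic formulation needs at least one annotation to define posteriors.
        raise ValueError("at least one annotated point is required")

    expected = [0.0] * len(points)  # E[c_n] for each annotated point
    for r, row in enumerate(density):
        for c, d in enumerate(row):
            # Log of unnormalized Gaussian likelihoods N(x_m; z_n, sigma^2 I); the
            # normalizing constant cancels because every component shares sigma.
            logs = [-((c - px) ** 2 + (r - py) ** 2) / (2.0 * sigma ** 2) for px, py in points]
            mx = max(logs)  # subtract the max for numerical stability
            weights = [math.exp(v - mx) for v in logs]
            total = sum(weights)
            for n, w in enumerate(weights):
                # Posterior p(y_n | x_m) under equal priors, times the estimated density.
                expected[n] += (w / total) * d

    # Each annotated point should receive an expected count of exactly one.
    return sum(abs(1.0 - e) for e in expected)
```

Because the posteriors at each pixel sum to one, the expected counts always add up to the total predicted density, so the loss can only reach zero when the predicted count equals the number of annotations and that mass is assigned to the right points.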