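In a sense, deep learning gets its results by being fed large amounts of data. Small-object detection performance can likewise be improved by increasing the variety and number of small-object samples in the training set. The article "Handling Imbalanced Samples in Deep Learning" (《深度学习中不平衡样本的处理》) [2] already introduced many data augmentation schemes. Those schemes mainly address imbalance in sample counts between classes. Part of what makes small-object detection hard, however, is that datasets often contain far fewer small objects than large ones. Many of those schemes therefore also work for augmenting small-object samples, and they are not repeated here. In addition, the 2019 paper Augmentation for small object detection (https://arxiv.org/abs/1902.07296) proposed two simple, brute-force methods: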
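As that paper describes them, the two methods are:

- oversampling the images that contain small objects;
- copy-pasting small objects several times within those images.

The sketch below shows both ideas in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation. It assumes the following:

- images are `H x W x C` arrays;
- boxes are integer `(x1, y1, x2, y2)` tuples lying inside the image;
- "small" follows the COCO convention of an area below 32 x 32 pixels.

The names `oversample` and `copy_paste_small` and their parameters are hypothetical. The sketch also skips the overlap checks and blending a real pipeline would likely add.

```python
import random

import numpy as np

# COCO convention: an object is "small" if its area is below 32 x 32 pixels.
SMALL_AREA = 32 * 32


def box_area(box):
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def oversample(samples, factor=2, small_area=SMALL_AREA):
    """Hypothetical helper: repeat every (image, boxes) sample that contains
    at least one small box `factor` times; keep the other samples once."""
    out = []
    for image, boxes in samples:
        has_small = any(box_area(b) < small_area for b in boxes)
        out.extend([(image, boxes)] * (factor if has_small else 1))
    return out


def copy_paste_small(image, boxes, copies=2, small_area=SMALL_AREA, rng=None):
    """Hypothetical helper: paste each small object `copies` times at random
    positions; return the augmented image and the extended box list.
    Simplification: pasted patches may overlap existing objects or each other,
    and boxes are assumed to lie inside the image."""
    rng = rng or random.Random(0)
    out = image.copy()
    new_boxes = list(boxes)
    h, w = image.shape[:2]
    for box in boxes:
        x1, y1, x2, y2 = box
        bw, bh = x2 - x1, y2 - y1
        if box_area(box) >= small_area or bw <= 0 or bh <= 0:
            continue  # only copy non-degenerate small objects
        patch = image[y1:y2, x1:x2]  # copy from the original, unmodified image
        for _ in range(copies):
            nx = rng.randint(0, w - bw)  # randint is inclusive, so nx + bw <= w
            ny = rng.randint(0, h - bh)
            out[ny:ny + bh, nx:nx + bw] = patch
            new_boxes.append((nx, ny, nx + bw, ny + bh))
    return out, new_boxes
```

`copy_paste_small` returns the new boxes together with the image. The pasted copies are therefore labeled as ground truth too, so they contribute extra small-object targets to the training loss.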
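An important principle in machine learning is that the distribution a model is pre-trained on should be as close as possible to the distribution of the test inputs. A model trained at high resolution (for example, the common 224 x 224) is therefore poorly suited to images that are low-resolution to begin with and are upsampled before being fed to the model. The options rank as follows:

- If the inputs are low-resolution images, the best choice is to train the model on low-resolution images.
- Failing that, take a model trained on high-resolution images and fine-tune it on low-resolution images.
- The worst choice is to apply a model trained on high-resolution images directly to low-resolution images that have been upsampled.

This ranking holds only in the ideal case, where the training sets are equally large and equally diverse. In practice, many datasets are severely short of small objects. As a result, upsampling the input image and fine-tuning a model pre-trained on high-resolution images on small images works better in practice than training a classifier dedicated to small objects.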
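The fine-tuning route above needs training inputs that look like the test inputs: small images enlarged to the network's input size. A minimal way to produce them from high-resolution data is to downsample each image and then upsample it back to its original size, as sketched below in Python with NumPy. Average pooling and nearest-neighbour upsampling are assumptions chosen to keep the example dependency-free; a real pipeline would more likely use a bilinear resize from an image library. All function names are hypothetical.

```python
import numpy as np


def downsample(img, factor):
    """Average-pool an H x W (x C) image by an integer factor.
    Assumes H and W are divisible by `factor`."""
    h, w = img.shape[:2]
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling: repeat each pixel factor x factor times."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)


def simulate_low_res(img, factor):
    """Hypothetical helper: turn a high-resolution image into a
    low-resolution-then-upsampled one at the original size, mimicking small
    test inputs that are enlarged before entering the network."""
    return upsample_nearest(downsample(img, factor), factor)


def make_finetune_set(images, factor):
    """Hypothetical helper: fine-tuning inputs for a model pre-trained on `images`."""
    return [simulate_low_res(img, factor) for img in images]
```

A model pre-trained on the original high-resolution images would then be fine-tuned on the output of `make_finetune_set`. The value of `factor` would be chosen so that the effective resolution matches the small inputs expected at test time.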
1. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks, https://arxiv.org/abs/1604.02878
2. Handling Imbalanced Samples in Deep Learning (《深度学习中不平衡样本的处理》), https://github.com/Captain1986/CaptainBlackboard/blob/master/D%230016-深度学习中不平衡样本的处理/D%230016.md
3. Augmentation for small object detection, https://arxiv.org/pdf/1902.07296.pdf
4. Feature Pyramid Networks for Object Detection, https://arxiv.org/abs/1612.03144
5. RetinaFace: Single-stage Dense Face Localisation in the Wild, https://arxiv.org/pdf/1905.00641.pdf
6. SSH: Single Stage Headless Face Detector, https://arxiv.org/pdf/1708.03979.pdf
7. An Analysis of Scale Invariance in Object Detection - SNIP, https://arxiv.org/abs/1711.08189
8. R-FCN: Object Detection via Region-based Fully Convolutional Networks, https://arxiv.org/abs/1605.06409
9. SNIPER: Efficient Multi-Scale Training, https://arxiv.org/pdf/1805.09300.pdf
10. SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection, https://arxiv.org/pdf/1808.04974.pdf
11. ScratchDet: Training Single-Shot Object Detectors from Scratch, https://arxiv.org/pdf/1810.08425.pdf
12. FaceBoxes: A CPU Real-time Face Detector with High Accuracy
13. S3FD: Single Shot Scale-Invariant Face Detector, http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhang_S3FD_Single_Shot_ICCV_2017_paper.pdf
14. Perceptual Generative Adversarial Networks for Small Object Detection, https://arxiv.org/abs/1706.05274
15. PyramidBox: A Context-assisted Single Shot Face Detector, https://arxiv.org/abs/1803.07737
16. Relation Networks for Object Detection, https://arxiv.org/abs/1711.11575