[Deep Learning] Resize, Still a Headache

Author: 王小二 @ Zhihu (republished with permission)

Source: https://zhuanlan.zhihu.com/p/666296180

Editor: 极市平台

Overview

Break free from the tyranny of Resize!

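1. How it started

My own inference implementation came out 0.924 points off PyTorch when evaluated on ImageNet. The gap is not large, but it was baffling.

I started by checking the two most likely culprits:

In Python, I ran the torch model with PyTorch and the ONNX file with onnxruntime and compared the outputs: they matched.
In C++, I ran onnxruntime and my own inference code and compared the outputs: they matched.

Isn't that strange?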

A == B and B == C

So how can A ≠ C? What is this, JavaScript...?

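So why not compare A and C directly? I fed identical data to PyTorch and to my own inference code and found small differences. Tracing step by step showed that the data already diverged after preprocessing, so that is where I focused first.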

For PyTorch training, the usual preprocessing is:

Read and decode the image
resize
Convert to tensor
norm
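As a rough sketch of the last two steps (convert to tensor, norm), the pure-Python code below scales 8-bit values to [0, 1], reorders HWC to CHW, and normalizes each channel. The article does not list its mean and std; the ImageNet values used here are the commonly quoted ones and are an assumption, as are the helper names `to_tensor` and `normalize`. Decoding and resize are left out, since resize is exactly the step under investigation.

```python
# Assumed values: the widely used ImageNet statistics, not taken from the article.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_tensor(image_hwc):
    """Turn an H x W x 3 image of 0..255 ints into a 3 x H x W list of floats in [0, 1]."""
    height, width = len(image_hwc), len(image_hwc[0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(3)
    ]


def normalize(tensor_chw, mean, std):
    """Apply (value - mean) / std independently to each channel plane."""
    return [
        [[(value - m) / s for value in row] for row in plane]
        for plane, m, s in zip(tensor_chw, mean, std)
    ]


# A 1 x 1 image with a single RGB pixel, just to exercise the two steps.
pixel_image = [[(255, 0, 128)]]
tensor = to_tensor(pixel_image)
normalized = normalize(tensor, IMAGENET_MEAN, IMAGENET_STD)
print(tensor[0][0][0], normalized[1][0][0])
```

Both steps are simple arithmetic, so any mismatch between two implementations has to come from what is fed into them, which is why the resize output matters so much.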

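These all look like routine operations that should be easy to align. Checking them one by one nevertheless turned up two discrepancies:

resize behaves differently
round behaves differently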

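2. A perennial problem

See the post by @大缺弦 (https://zhuanlan.zhihu.com/p/107761106): Resize gives even an expert like him a headache. That post is mainly about Resize in deep learning.

Another author abroad goes as far as using the word "dangers"; that article is mainly about Resize for images in general.
https://zuru.tech/blog/the-dangers-behind-image-resizing

Whether in deep learning or in ordinary image processing, everyone is at the mercy of Resize.

The next two sections, however, focus on one more specific point: the gap between Pillow's resize and OpenCV's resize.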

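3. A quick search and analysis

Why Pillow? Because torchvision uses Pillow by default for image resizing.

At this point I already suspected that Pillow's resize disagreed with OpenCV's, so I searched Google and found that plenty of people had hit the same problem, which meant I was on the right track.

https://github.com/python-pillow/Pillow/issues/2718

https://github.com/python-pillow/Pillow/issues/4445

https://github.com/python-pillow/Pillow/issues/4476

The Pillow issue tracker shows this being raised as early as 2017, and someone even tried to submit a PR to change it, yet nothing was ever changed. Why?

Because Pillow inherited its resize logic from the original PIL, the maintainers see no need to imitate OpenCV's behavior, and the implementation they chose differs from what most people expect.

https://zh.wikipedia.org/zh-cn/%E5%8F%8C%E7%BA%BF%E6%80%A7%E6%8F%92%E5%80%BC

What most people learn is the version described on Wikipedia: bilinear interpolation computes a value from the 4 neighboring grid points around a location, and implementations differ only in how many multiplications they use.

Pillow's bilinear interpolation, however, uses a more elaborate two-pass resize, so the set of pixels it samples is different. Here is real data copied from one of the issues above (a 10×10 input downscaled to 5×5):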

INPUT:
 [[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24. 25. 26. 27. 28. 29.]
 [30. 31. 32. 33. 34. 35. 36. 37. 38. 39.]
 [40. 41. 42. 43. 44. 45. 46. 47. 48. 49.]
 [50. 51. 52. 53. 54. 55. 56. 57. 58. 59.]
 [60. 61. 62. 63. 64. 65. 66. 67. 68. 69.]
 [70. 71. 72. 73. 74. 75. 76. 77. 78. 79.]
 [80. 81. 82. 83. 84. 85. 86. 87. 88. 89.]
 [90. 91. 92. 93. 94. 95. 96. 97. 98. 99.]]
Pillow: 
 [[ 7.857143  9.642858 11.642858 13.642858 15.428572]
 [25.714285 27.5      29.5      31.5      33.285713]
 [45.714287 47.5      49.5      51.5      53.285713]
 [65.71429  67.5      69.5      71.5      73.28571 ]
 [83.57143  85.35714  87.35714  89.35714  91.14285 ]]
OpenCV: 
 [[ 5.5  7.5  9.5 11.5 13.5]
 [25.5 27.5 29.5 31.5 33.5]
 [45.5 47.5 49.5 51.5 53.5]
 [65.5 67.5 69.5 71.5 73.5]
 [85.5 87.5 89.5 91.5 93.5]]

The OpenCV result is easy to understand. Take the first value: 5.5 = (0 + 1 + 10 + 11) / 4
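The 4-point rule can be sketched in pure Python. This is the textbook bilinear formula with half-pixel centers, which reproduces the OpenCV numbers above for this float input; it is not OpenCV's actual code, and the function name `bilinear_resize` is made up for illustration.

```python
import math


def bilinear_resize(src, out_h, out_w):
    """Textbook bilinear resize: each output value blends its 4 nearest inputs."""
    in_h, in_w = len(src), len(src[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w
    out = []
    for i in range(out_h):
        # Map the output pixel center back into input coordinates, clamped to the image.
        fy = min(max((i + 0.5) * scale_y - 0.5, 0.0), in_h - 1)
        y0 = int(math.floor(fy))
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for j in range(out_w):
            fx = min(max((j + 0.5) * scale_x - 0.5, 0.0), in_w - 1)
            x0 = int(math.floor(fx))
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# The same 10 x 10 ramp as in the issue: value = 10 * row + column.
src = [[float(10 * r + c) for c in range(10)] for r in range(10)]
out = bilinear_resize(src, 5, 5)
print(out[0])  # [5.5, 7.5, 9.5, 11.5, 13.5]
```

With a scale factor of exactly 2, every output center falls midway between two input pixels in each direction, so all four weights are 0.25 and pixels further away are never touched.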

The Pillow result is less familiar. Here is the official explanation from the issue:

Regarding the first post, Pillow performs two passes over the image - horizontal, and then vertical.

So why is the first value 7.857143?

The coefficients generated by our bilinear function are 0.428571, 0.428571 and 0.142857.

Applying that horizontally,

0 * 0.428571 + 1 * 0.428571 + 2 * 0.142857 = 0.714285
10 * 0.428571 + 11 * 0.428571 + 12 * 0.142857 = 10.714275
20 * 0.428571 + 21 * 0.428571 + 22 * 0.142857 = 20.714265
and then vertically,

0.714285 * 0.428571 + 10.714275 * 0.428571 + 20.714265 * 0.142857 = 7.85712714287
You suggest that Pillow should only consider 0, 1, 10 and 11 to get the first pixel value. Instead, Pillow is also considering 2, 12, 20, 21 and 22. That is different to OpenCV, but I see no reason why it should be thought of as incorrect. 0, 1, 10 and 11 are still all considered equally.

https://github.com/python-pillow/Pillow/blob/b4bf2885f365b23e16772380173e971f89b208bf/src/libImaging/Resample.c#L655

https://github.com/python-pillow/Pillow/blob/b4bf2885f365b23e16772380173e971f89b208bf/src/libImaging/Resample.c#L20-L29

The exact kernel computation is in the links above.
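To make the quoted coefficients concrete, here is a pure-Python sketch modeled on the coefficient computation in the linked Resample.c: a triangle filter whose support is stretched by the downscale factor, applied horizontally and then vertically. It works on floats only and ignores Pillow's fixed-point path for 8-bit images; the function names are illustrative and not Pillow's API.

```python
def triangle(x):
    """Bilinear (triangle) filter with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def pillow_style_coeffs(in_size, out_size, base_support=1.0):
    """For each output index, return (first input index, normalized weights)."""
    scale = in_size / out_size
    filterscale = max(scale, 1.0)          # the filter is only widened when downscaling
    support = base_support * filterscale
    coeffs = []
    for xx in range(out_size):
        center = (xx + 0.5) * scale
        # int() truncates toward zero, like the C cast in the original code.
        xmin = max(int(center - support + 0.5), 0)
        xmax = min(int(center + support + 0.5), in_size)
        weights = [triangle((x - center + 0.5) / filterscale) for x in range(xmin, xmax)]
        total = sum(weights)
        coeffs.append((xmin, [w / total for w in weights]))
    return coeffs


def resample_line(line, coeffs):
    return [sum(w * line[xmin + k] for k, w in enumerate(ws)) for xmin, ws in coeffs]


def pillow_style_resize(src, out_h, out_w):
    horizontal = pillow_style_coeffs(len(src[0]), out_w)
    vertical = pillow_style_coeffs(len(src), out_h)
    rows = [resample_line(row, horizontal) for row in src]              # pass 1
    cols = [resample_line(list(col), vertical) for col in zip(*rows)]   # pass 2
    return [list(r) for r in zip(*cols)]


src = [[float(10 * r + c) for c in range(10)] for r in range(10)]
first_weights = pillow_style_coeffs(10, 5)[0][1]
out = pillow_style_resize(src, 5, 5)
print([round(w, 6) for w in first_weights])  # [0.428571, 0.428571, 0.142857]
print(round(out[0][0], 6))                   # 7.857143
```

At the image border the window is clipped to three pixels, which is where the uneven 3/7, 3/7, 1/7 weights come from. Interior output pixels get a four-pixel window with weights 0.125, 0.375, 0.375, 0.125.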

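Intuitively, in this example OpenCV sticks to 4-point sampling, whereas Pillow applies 3 weights in each of its two passes (6 in total), so the first output pixel draws on a 3×3 block of input pixels. That is why the two results cannot match exactly.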

Pillow actually does this for antialiasing; a side-by-side comparison of the two outputs shows the gap between Pillow and OpenCV quite clearly.

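4. How to fix it

I won't cover the Python side, because there you can easily install both libraries and convert between them. The discussion below is only about C++ deployment.

Pull the resize code out of Pillow's C sources
Write one by hand
Ask around to see whether someone has already implemented it
Try your luck on Google

Google was feeling lucky that day: a ready-made implementation exists, and it is even built on OpenCV.

https://github.com/zurutech/pillow-resize

Note that the macros in that repository may not match some OpenCV versions and need to be adjusted by hand. Apart from that, it has no problems.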

5. A bonus: round differences

As everyone knows, there are many kinds of rounding.

https://en.wikipedia.org/wiki/Floating-point_arithmetic

The "Rounding modes" section of Wikipedia's floating-point article says:

Alternative rounding options are also available. IEEE 754 specifies the following rounding modes:

round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode)
round to nearest, where ties round away from zero (optional for binary floating-point and commonly used in decimal)
round up (toward +∞; negative results thus round toward zero)
round down (toward −∞; negative results thus round away from zero)
round toward zero (truncation; it is similar to the common behavior of float-to-integer conversions, which convert −3.9 to −3 and 3.9 to 3)

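The most common mode is round to nearest, with ties going to the nearest even value.

Yet across languages and versions, the same number can round differently. For example, the Python 2.7 and Python 3.5 documentation describe the rounding operation differently. When porting to C++, check carefully which rounding mode the Python side uses, so that Python and C++ do not end up inconsistent.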
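A small sketch of the two behaviors that usually collide: Python 3's built-in `round` sends ties to the nearest even integer, while Python 2's `round` and C++'s `std::round` send ties away from zero. The helper `round_half_away` below is a hypothetical stand-in for the latter, written only to show the difference on tie values.

```python
import math


def round_half_away(x):
    """Round to nearest, ties away from zero (the rule std::round follows in C++)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


samples = [0.5, 1.5, 2.5, -0.5, -2.5]
half_even = [round(x) for x in samples]            # Python 3 built-in: ties to even
half_away = [round_half_away(x) for x in samples]  # ties away from zero

print(half_even)  # [0, 2, 2, 0, -2]
print(half_away)  # [1.0, 2.0, 3.0, -1.0, -3.0]
```

The two rules agree everywhere except on exact ties, but ties are common once pixel values are averaged, so a resize that ends by rounding to 8-bit integers can differ by 1 in some pixels between a Python pipeline and a C++ port.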




    