In segmentation we sometimes use intersection over union (IoU) to measure model performance. The definition: IoU = |A ∩ B| / |A ∪ B|, where A and B are the predicted and ground-truth masks. With this definition in hand we can, for example, decide that a predicted instance counts as a positive when its IoU with an actual instance exceeds 0.5, and on top of that build more macro-level metrics such as F1 or F2. So how do we optimize IoU? ¬_¬ Take binary segmentation as an example: for a baseline you usually throw binary cross-entropy at it first and see how it does, which gives a first implementation in PyTorch.
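A minimal sketch of such a baseline, scoring raw logits with nn.BCEWithLogitsLoss (illustrative only; tensor names are mine):

import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def bce_baseline(logits, masks):
    # logits: raw model outputs of shape (N, 1, H, W); masks: {0, 1} ground truth of the same shape
    return criterion(logits, masks.float())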
The problem this time is instability of the training process. As a model goes from bad to good, we want the loss/metric supervising it to transition smoothly, and naively plugging IoU in as the loss clearly does not do that... which is where Lovász-Softmax comes in: "A tractable surrogate for the optimization of the intersection-over-union measure in neural networks", https://github.com/bermanmaxim/LovaszSoftmax. I won't pretend to know exactly why this loss beats BCE/Jaccard... but from personal experience the effect is remarkable \ (•◡•) / Another interesting detail is this line in the original implementation:
# the Lovász hinge: dot product between the sorted margin errors and the gradient of the Lovász extension of the Jaccard loss
loss = torch.dot(F.relu(errors_sorted), Variable(grad))
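If you just want to use the loss rather than re-derive it, the repo ships ready-made functions; a minimal usage sketch for the binary case (assuming pytorch/lovasz_losses.py from the repo above is importable):

from lovasz_losses import lovasz_hinge  # pytorch/lovasz_losses.py from the repo above

# logits: raw, un-sigmoided scores of shape (N, H, W); masks: binary ground truth of the same shape
loss = lovasz_hinge(logits, masks)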
Another loss worth keeping around is focal loss, here combined with online hard example mining (OHEM) over the per-pixel losses:

def focal_loss(self, output, target, alpha, gamma, OHEM_percent):
    output = output.contiguous().view(-1)
    target = target.contiguous().view(-1)

    # numerically stable binary cross-entropy on logits
    max_val = (-output).clamp(min=0)
    loss = output - output * target + max_val + ((-max_val).exp() + (-output - max_val).exp()).log()

    # This formula gives us the log sigmoid of 1-p if y is 0 and of p if y is 1
    invprobs = F.logsigmoid(-output * (target * 2 - 1))
    focal_loss = alpha * (invprobs * gamma).exp() * loss

    # Online Hard Example Mining: keep only the top x% losses (pixel-wise).
    # Refer to http://www.robots.ox.ac.uk/~tvg/publications/2017/0026.pdf
    OHEM, _ = focal_loss.topk(k=int(OHEM_percent * [*focal_loss.shape][0]))
    return OHEM.mean()
2. Hacking the U-Net
The original U-Net looks like this (Keras):
from keras.models import Model
from keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPooling2D,
                          BatchNormalization, Activation, SpatialDropout2D, concatenate)

def conv_block(neurons, block_input, bn=False, dropout=None):
    conv1 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(block_input)
    if bn: conv1 = BatchNormalization()(conv1)
    conv1 = Activation('relu')(conv1)
    if dropout is not None: conv1 = SpatialDropout2D(dropout)(conv1)
    conv2 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(conv1)
    if bn: conv2 = BatchNormalization()(conv2)
    conv2 = Activation('relu')(conv2)
    if dropout is not None: conv2 = SpatialDropout2D(dropout)(conv2)
    pool = MaxPooling2D((2,2))(conv2)
    return pool, conv2  # returns the block output and the shortcut to use in the uppooling blocks

def middle_block(neurons, block_input, bn=False, dropout=None):
    conv1 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(block_input)
    if bn: conv1 = BatchNormalization()(conv1)
    conv1 = Activation('relu')(conv1)
    if dropout is not None: conv1 = SpatialDropout2D(dropout)(conv1)
    conv2 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(conv1)
    if bn: conv2 = BatchNormalization()(conv2)
    conv2 = Activation('relu')(conv2)
    if dropout is not None: conv2 = SpatialDropout2D(dropout)(conv2)
    return conv2

def deconv_block(neurons, block_input, shortcut, bn=False, dropout=None):
    deconv = Conv2DTranspose(neurons, (3, 3), strides=(2, 2), padding="same")(block_input)
    uconv = concatenate([deconv, shortcut])
    uconv = Conv2D(neurons, (3, 3), padding="same", kernel_initializer='glorot_normal')(uconv)
    if bn: uconv = BatchNormalization()(uconv)
    uconv = Activation('relu')(uconv)
    if dropout is not None: uconv = SpatialDropout2D(dropout)(uconv)
    uconv = Conv2D(neurons, (3, 3), padding="same", kernel_initializer='glorot_normal')(uconv)
    if bn: uconv = BatchNormalization()(uconv)
    uconv = Activation('relu')(uconv)
    if dropout is not None: uconv = SpatialDropout2D(dropout)(uconv)
    return uconv

def build_model(start_neurons, bn=False, dropout=None):
    input_layer = Input((128, 128, 1))

    # 128 -> 64
    conv1, shortcut1 = conv_block(start_neurons, input_layer, bn, dropout)
    # 64 -> 32
    conv2, shortcut2 = conv_block(start_neurons * 2, conv1, bn, dropout)
    # 32 -> 16
    conv3, shortcut3 = conv_block(start_neurons * 4, conv2, bn, dropout)
    # 16 -> 8
    conv4, shortcut4 = conv_block(start_neurons * 8, conv3, bn, dropout)

    # Middle
    convm = middle_block(start_neurons * 16, conv4, bn, dropout)

    # 8 -> 16
    deconv4 = deconv_block(start_neurons * 8, convm, shortcut4, bn, dropout)
    # 16 -> 32
    deconv3 = deconv_block(start_neurons * 4, deconv4, shortcut3, bn, dropout)
    # 32 -> 64
    deconv2 = deconv_block(start_neurons * 2, deconv3, shortcut2, bn, dropout)
    # 64 -> 128
    deconv1 = deconv_block(start_neurons, deconv2, shortcut1, bn, dropout)

    #uconv1 = Dropout(0.5)(uconv1)
    output_layer = Conv2D(1, (1,1), padding="same", activation="sigmoid")(deconv1)

    model = Model(input_layer, output_layer)
    return model
In practice, though, instead of a transposed convolution we would usually go with upsampling followed by a 3x3 convolution; for the reason why, see the Distill article "Deconvolution and Checkerboard Artifacts" (Distill is highly recommended, the quality of their posts is outstanding). Going a step further: when working on real projects there is rarely that much training budget, so we have to find a way to embed pretrained classification models into the U-Net as its encoder. ʕ•ᴥ•ʔ The trick when swapping in a pretrained encoder is how to cleanly pull out the features the pretrained model extracts at different scales, and how to hook them efficiently into the decoder. Inception and MobileNet are commonly grafted this way, but here I'll analyze the more straightforward ResNet/ResNeXt family (a sketch that combines such an encoder with an upsampling decoder follows the snippet below):
def forward(self, x):
    # stem: 7x7 conv + BN + ReLU, output at 1/2 resolution
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)      # 1/4 resolution

    # the four residual stages: feature maps at 1/4, 1/8, 1/16 and 1/32 resolution,
    # exactly the multi-scale shortcuts a U-Net decoder wants
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)

    # classification head: global pooling + fully connected layer,
    # thrown away when the network is used as an encoder
    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    x = self.fc(x)
    return x
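To make the grafting concrete, here is a minimal sketch (assuming torchvision's resnet34; the names DecoderBlock/ResNetUNet and the channel bookkeeping are my own) that taps the four residual stages as shortcuts and decodes them with upsampling + 3x3 convolutions instead of transposed convolutions:

import torch
import torch.nn as nn
import torchvision

class DecoderBlock(nn.Module):
    # upsampling + 3x3 convs instead of a transposed convolution (no checkerboard artifacts)
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels + skip_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.upsample(x)
        x = torch.cat([x, skip], dim=1)   # concatenate with the encoder shortcut
        return self.conv(x)

class ResNetUNet(nn.Module):
    # input H and W should be divisible by 32 so the skip connections line up
    def __init__(self, pretrained=True):
        super().__init__()
        resnet = torchvision.models.resnet34(pretrained=pretrained)
        # encoder: reuse the pretrained stages, drop avgpool/fc
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)  # 1/2 res, 64 ch
        self.maxpool = resnet.maxpool
        self.enc1 = resnet.layer1   # 1/4 res,   64 ch
        self.enc2 = resnet.layer2   # 1/8 res,  128 ch
        self.enc3 = resnet.layer3   # 1/16 res, 256 ch
        self.enc4 = resnet.layer4   # 1/32 res, 512 ch
        # decoder
        self.dec3 = DecoderBlock(512, 256, 256)
        self.dec2 = DecoderBlock(256, 128, 128)
        self.dec1 = DecoderBlock(128, 64, 64)
        self.dec0 = DecoderBlock(64, 64, 64)
        self.final = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(64, 1, kernel_size=1),   # binary-mask logits at full resolution
        )

    def forward(self, x):
        x0 = self.stem(x)                   # 1/2
        x1 = self.enc1(self.maxpool(x0))    # 1/4
        x2 = self.enc2(x1)                  # 1/8
        x3 = self.enc3(x2)                  # 1/16
        x4 = self.enc4(x3)                  # 1/32
        d = self.dec3(x4, x3)               # back to 1/16
        d = self.dec2(d, x2)                # 1/8
        d = self.dec1(d, x1)                # 1/4
        d = self.dec0(d, x0)                # 1/2
        return self.final(d)                # full resolution

For ResNeXt or deeper ResNets the wiring is identical; only the per-stage channel counts change.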
As for training, I really think it is case by case: heuristics that help on task A can easily do nothing (or worse) on task B, so I'll just introduce one trick that works in most settings: cosine annealing with warm restarts plus snapshot ensembles (https://arxiv.org/abs/1704.00109). It sounds fancy, but in practice it just means warm-restarting the learning rate every so often, so that within the same training budget you end up with several converged local minima rather than one, which leaves you with many more models to ensemble. A few plots give a good feel for it; the implementation itself is actually quite simple:
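A sketch of the schedule plus snapshotting using PyTorch's built-in CosineAnnealingWarmRestarts (not the original snippet; model, train_loader and criterion are placeholders, and the cycle length is arbitrary):

import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
cycle_len, n_cycles = 50, 5                      # warm restart every 50 epochs, 5 cycles total
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=cycle_len, eta_min=1e-5)

snapshots = []
for epoch in range(cycle_len * n_cycles):
    for images, masks in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()                             # cosine-anneal the LR, restarting each cycle
    if (epoch + 1) % cycle_len == 0:
        # the LR has just hit its minimum: keep a snapshot before the next warm restart
        snapshots.append({k: v.clone() for k, v in model.state_dict().items()})

# at test time, average the predictions of the snapshot models (the "snapshot ensemble")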