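Adam can fix a pile of odd problems (sometimes the loss refuses to go down, and switching to Adam fixes it immediately), and it can also introduce a pile of odd problems. For example, when word frequencies vary widely, the embeddings of words absent from the current batch still get updated; another example is the complicated interaction between Adam and L2 regularization. Use it boldly but carefully, and if you do run into trouble, look for one of the many modified versions of Adam (e.g. MaskedAdam [14], AdamW, and so on) to rescue the run.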
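Both side effects mentioned above can be reproduced with a scalar toy version of Adam. The sketch below is only an illustration under stated assumptions: `adam_step`, `new_state`, `absent_word_demo`, `first_decay_step` and the `mask_zero_grad` flag are hypothetical names made up for this example. The mask is a simplified stand-in for the idea of skipping parameters without a gradient, not necessarily what MaskedAdam [14] does, and `decoupled_wd` follows the AdamW idea of applying weight decay outside the adaptive update.

```python
import math


def new_state():
    """Fresh optimizer state for one scalar parameter."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0, mask_zero_grad=False):
    """One scalar Adam update; returns the new parameter value."""
    if mask_zero_grad and grad == 0.0:
        # Assumed masking rule: a parameter with no gradient is left alone
        # and its moment estimates are not advanced.
        return param
    g = grad + l2 * param  # classic L2: the penalty is folded into the gradient
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    adaptive = lr * m_hat / (math.sqrt(v_hat) + eps)
    decay = lr * decoupled_wd * param  # AdamW-style decay, not rescaled by v_hat
    return param - adaptive - decay


def absent_word_demo(mask):
    """A word is seen once (grad=1), then is absent from the next batch (grad=0)."""
    state = new_state()
    w_seen = adam_step(0.0, 1.0, state, mask_zero_grad=mask)
    w_absent = adam_step(w_seen, 0.0, state, mask_zero_grad=mask)
    return w_seen, w_absent


def first_decay_step(l2=0.0, decoupled_wd=0.0):
    """First update of a weight of 1.0 when the task gradient is zero."""
    return adam_step(1.0, 0.0, new_state(), l2=l2, decoupled_wd=decoupled_wd)


if __name__ == "__main__":
    print("plain Adam, absent word:", absent_word_demo(mask=False))
    print("masked,     absent word:", absent_word_demo(mask=True))
    print("L2 = 0.01  :", first_decay_step(l2=0.01))
    print("L2 = 0.001 :", first_decay_step(l2=0.001))
    print("decoupled weight decay = 0.01:", first_decay_step(decoupled_wd=0.01))
```

In this toy setting, plain Adam keeps moving the absent word's weight on the second step because the first-moment estimate is still nonzero, while the masked variant leaves it where it was. With L2 folded into the gradient, the first step is close to `lr` for both penalty strengths, since Adam divides by the square root of the second moment. The decoupled decay shrinks the weight only by `lr * decoupled_wd * param`.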
References

[1] How Do You Find A Good Learning Rate: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html
[2] Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186
[3] PyTorch library issue: https://github.com/pytorch/pytorch/issues/4534
[4] Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification: https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf
[5] Understanding the difficulty of training deep feedforward neural networks: http://proceedings.mlr.press/v9/glorot10a.html
[6] Xavier initialization paper: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
[7] He initialization paper: https://arxiv.org/abs/1502.01852
[8] https://arxiv.org/abs/1312.6120: https://link.zhihu.com/?target=https%3A//arxiv.org/abs/1312.6120
[9] fastai中的图像增强技术为什么相对比较好 (Why the image augmentation in fastai is comparatively good): https://oldpan.me/archives/fastai-1-0-quick-study
[10] towardsdatascience.com/transfer-le…: https://towardsdatascience.com/transfer-learning-using-differential-learning-rates-638455797f00
[11] 机器学习算法如何调参?这里有一份神经网络学习速率设置指南 (How to tune machine learning algorithms? A guide to setting neural network learning rates): https://zhuanlan.zhihu.com/p/34236769
[12] SGDR: Stochastic Gradient Descent with Warm Restarts: https://arxiv.org/abs/1608.03983
[13] The nuts and bolts of building applications using deep learning: https://www.youtube.com/watch?v=F1ka6a13S9I
[14] MaskedAdam: https://www.zhihu.com/question/265357659/answer/580469438
[15] http://arxiv.org/abs/1409.2329: https://link.zhihu.com/?target=http%3A//arxiv.org/abs/1409.2329
[16] http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf: https://link.zhihu.com/?target=http%3A//jmlr.org/proceedings/papers/v37/jozefowicz15.pdf
[17] http://arxiv.org/abs/1505.00387: https://link.zhihu.com/?target=http%3A//arxiv.org/abs/1505.00387
[18] 关于训练神经网路的诸多技巧Tricks(完全总结版) (Tricks for training neural networks, complete summary edition): https://juejin.im/post/5be5b0d7e51d4543b365da51
[19] 你有哪些deep learning(rnn、cnn)调参的经验? (What experience do you have tuning deep learning (RNN, CNN) models?): https://www.zhihu.com/question/41631631
[20] Bag of Tricks for Image Classification with Convolutional Neural Networks: https://link.zhihu.com/?target=https%3A//arxiv.org/abs/1812.01187
[21] Must Know Tips/Tricks in Deep Neural Networks: https://link.zhihu.com/?target=http%3A//lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
[22] 33条神经网络训练秘技 (33 secrets for training neural networks): https://zhuanlan.zhihu.com/p/63841572
[23] 26秒单GPU训练CIFAR10 (Training CIFAR10 on a single GPU in 26 seconds): https://zhuanlan.zhihu.com/p/79020733
[24] Batch Normalization: https://link.zhihu.com/?target=https%3A//arxiv.org/abs/1502.03167%3Fcontext%3Dcs
[25] Searching for Activation Functions: https://link.zhihu.com/?target=https%3A//arxiv.org/abs/1710.05941