[1] Bengio Y, LeCun Y. Scaling learning algorithms towards AI[J]. Large-scale kernel machines, 2007, 34(5): 1-41.
[2] Montufar G F, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks[C]//Advances in neural information processing systems. 2014: 2924-2932.
[3] Pascanu R, Montufar G, Bengio Y. On the number of response regions of deep feed forward networks with piece-wise linear activations[J]. arXiv preprint arXiv:1312.6098, 2013.
[4] Bianchini M, Scarselli F. On the complexity of neural network classifiers: A comparison between shallow and deep architectures[J]. IEEE transactions on neural networks and learning systems, 2014, 25(8): 1553-1565.
[5] Raghu M, Poole B, Kleinberg J, et al. On the expressive power of deep neural networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017: 2847-2854.