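From deeplearning.ai:
The general methodology to build a neural network is to:
- Define the neural network structure (number of input units, hidden units, etc.).
- Initialize the model's parameters.
- Loop:
  - Implement forward propagation.
  - Compute the loss.
  - Implement backward propagation to get the gradients.
  - Update the parameters (gradient descent).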
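The steps above can be sketched as a minimal NumPy two-layer binary classifier. This is not the deeplearning.ai assignment code. The tanh hidden layer, the sigmoid output, the binary cross-entropy loss, the XOR toy data and the function name `train_two_layer` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def train_two_layer(X, Y, n_h=4, learning_rate=0.5, num_iterations=2000, seed=0):
    """Hypothetical helper: X has shape (n_x, m), Y has shape (1, m)."""
    rng = np.random.default_rng(seed)
    n_x, m = X.shape

    # Steps 1-2: define the structure (n_x -> n_h -> 1) and initialise parameters.
    W1 = rng.standard_normal((n_h, n_x)) * 0.5
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.5
    b2 = np.zeros((1, 1))

    costs = []
    for _ in range(num_iterations):
        # Step 3a: forward propagation.
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)

        # Step 3b: loss (binary cross-entropy averaged over the m examples).
        cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
        costs.append(float(cost))

        # Step 3c: backward propagation. Each gradient is a derivative of the
        # cost from step 3b; the 1/m of the mean is applied when forming dW/db.
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)  # tanh'(z) = 1 - tanh(z)**2
        dW1 = dZ1 @ X.T / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m

        # Step 3d: gradient-descent update (after all gradients are computed).
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2

    params = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return params, costs


# Toy data: XOR, with one example per column.
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 1.0, 1.0, 0.0]])
params, costs = train_two_layer(X, Y)
```

Here the expression for `dZ2` is the one that follows from the cross-entropy cost chosen in step 3b; a different cost would require a different expression there.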
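How does the loss function affect how the network learns?
For example, below is my implementation of forward and backward propagation. I believe it is correct, because I can use the code below to train a model with acceptable results: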
for i in range(number_iterations):
    # forward propagation
    Z1 = np.dot(weight_layer_1, xtrain) + bias_1
    a_1 = sigmoid(Z1)
    Z2 = np.dot(weight_layer_2, a_1) + bias_2
    a_2 = sigmoid(Z2)
    mse_cost = np.sum(cost_all_examples)
    cost_cross_entropy = -(1.0/len(X_train) * (np.dot(np.log(a_2), Y_train.T) + np.dot(np.log(1-a_2), (1-Y_train).T)))

    # Back propagation
    d_Z2 = np.multiply((a_2 - xtrain), d_sigmoid(a_2))
    d_weight_2 = np.dot(d_Z2, a_1.T)
    d_bias_2 = np.sum(d_Z2, axis=1, keepdims=True)
    # propagate the error to layer 1 using weight_layer_2 *before* it is updated
    d_a_1 = np.dot(weight_layer_2.T, d_Z2)
    d_Z1 = np.multiply(d_a_1, d_sigmoid(a_1))
    d_weight_1 = np.dot(d_Z1, xtrain.T)
    d_bias_1 = np.sum(d_Z1, axis=1, keepdims=True)

    # perform a parameter update in the negative gradient direction to decrease the loss
    weight_layer_2 = weight_layer_2 - learning_rate * d_weight_2
    bias_2 = bias_2 - learning_rate * d_bias_2
    weight_layer_1 = weight_layer_1 - learning_rate * d_weight_1
    bias_1 = bias_1 - learning_rate * d_bias_1
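The loop calls `sigmoid` and `d_sigmoid`, which are not shown. A minimal sketch of both follows, assuming (as the calls `d_sigmoid(a_2)` and `d_sigmoid(a_1)` suggest) that `d_sigmoid` receives the activation `a = sigmoid(z)` rather than `z` itself.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def d_sigmoid(a):
    """Derivative of the sigmoid, written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)
```

With this helper, `d_Z2 = np.multiply((a_2 - xtrain), d_sigmoid(a_2))` is exactly the derivative of ½(a_2 − xtrain)² with respect to `Z2`, that is, the gradient of a squared-error loss.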
Note the lines:
mse_cost = np.sum(cost_all_examples)
cost_cross_entropy = -(1.0/len(X_train) * (np.dot(np.log(a_2), Y_train.T) + np.dot(np.log(1-a_2), (1-Y_train).T)))
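I can use either the MSE loss or the cross-entropy loss to report how well the system is learning. But this is purely informational: the choice of cost function does not affect how the network learns. I believe I am missing something fundamental, because the deep learning literature often states that choosing a loss function is an important step in deep learning. Yet, as the code above shows, I can pick cross-entropy or MSE and it does not change how the network learns. Are the cross-entropy and MSE losses used only for informational purposes?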
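One way to probe this is to compare the gradient each loss implies for a single sigmoid output unit with target `y`. The sketch below uses ½(a − y)² for MSE and binary cross-entropy, and checks each analytic gradient with respect to `z` against a finite-difference estimate. The ½ factor and all function names are illustrative choices, not taken from the question.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mse_loss(z, y):
    a = sigmoid(z)
    return 0.5 * (a - y) ** 2


def cross_entropy_loss(z, y):
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))


def grad_mse(z, y):
    """d(mse_loss)/dz: same form as d_Z2 in the question's loop."""
    a = sigmoid(z)
    return (a - y) * a * (1 - a)


def grad_cross_entropy(z, y):
    """d(cross_entropy_loss)/dz: the sigmoid' factor cancels out."""
    return sigmoid(z) - y


def numerical_grad(loss, z, y, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dz."""
    return (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)


for z in (-6.0, 0.0, 6.0):
    print(f"z={z:+.1f}  MSE grad={grad_mse(z, 1.0):+.4f}  "
          f"cross-entropy grad={grad_cross_entropy(z, 1.0):+.4f}")
```

For a confidently wrong output (target 1, z = −6) the MSE gradient is scaled by a(1 − a) and is close to zero, while the cross-entropy gradient stays close to −1, so the two losses lead to different parameter updates when each is used to derive the backward pass.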
Update:
For example, here is a code snippet from deeplearning.ai that computes the cost:
# GRADED FUNCTION: compute_cost
def compute_cost(A2, Y, parameters):
    """
    Computes the cross-entropy cost given in equation (13)

    Arguments:
    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2 and b2

    Returns:
    cost -- cross-entropy cost given equation (13)
    """
    m = Y.shape[1]  # number of examples

    # Retrieve W1 and W2 from parameters
    ### START CODE HERE ### (≈ 2 lines of code)
    W1 = parameters['W1']
    W2 = parameters['W2']
    ### END CODE HERE ###

    # Compute the cross-entropy cost
    ### START CODE HERE ### (≈ 2 lines of code)
    logprobs = np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2))
    cost = - np.sum(logprobs) / m
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # makes sure cost is the dimension we expect.
                             # E.g., turns [[17]] into 17
    assert(isinstance(cost, float))
    return cost
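This code runs as expected and achieves high accuracy / low cost. Apart from informing the machine learning engineer how well the network is learning, the cost value is not used anywhere in this implementation. This makes me wonder: how does the choice of cost function affect how a neural network learns?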
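Even though the returned cost value is only reported, its formula determines which gradient the backward pass has to compute. As a hedged illustration (not the course's own code), the sketch below rewrites the `compute_cost` formula as a function of the pre-activation `Z2` and checks with finite differences that its gradient is (A2 − Y)/m. The helper names `cross_entropy_cost`, `analytic_dZ2` and `numerical_dZ2` and the random test data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy_cost(Z2, Y):
    """Same formula as compute_cost, expressed in terms of Z2 (A2 = sigmoid(Z2))."""
    A2 = sigmoid(Z2)
    m = Y.shape[1]
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(1 - Y, np.log(1 - A2))
    return float(-np.sum(logprobs) / m)


def analytic_dZ2(Z2, Y):
    """Gradient of cross_entropy_cost with respect to Z2."""
    m = Y.shape[1]
    return (sigmoid(Z2) - Y) / m


def numerical_dZ2(Z2, Y, eps=1e-6):
    """Central finite differences, one example (column) at a time."""
    grad = np.zeros_like(Z2)
    for j in range(Z2.shape[1]):
        step = np.zeros_like(Z2)
        step[0, j] = eps
        grad[0, j] = (cross_entropy_cost(Z2 + step, Y)
                      - cross_entropy_cost(Z2 - step, Y)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
Z2 = rng.standard_normal((1, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
max_err = float(np.max(np.abs(analytic_dZ2(Z2, Y) - numerical_dZ2(Z2, Y))))
print(max_err)
```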