
The implementation of calc_delta_k in rnn.py (recurrent neural network) is incorrect #29

Open
cyrixlin opened this issue Sep 20, 2018 · 2 comments
Comments

cyrixlin commented Sep 20, 2018

First of all, many thanks to hanbingtao, the author of 《零基础入门深度学习》, for the hard work put into providing such a good tutorial and accompanying code.

While working through the material, I found an obvious error in rnn.py, the code for Part 5 of the tutorial (recurrent neural networks), and I have confirmed it with a gradient check. The problem and a fix are given below for the author's reference.

The original method:

def calc_delta_k(self, k, activator):
    '''
    Compute the delta at time k from the delta at time k+1
    '''
    state = self.state_list[k+1].copy()
    element_wise_op(self.state_list[k+1],
                    activator.backward)
    self.delta_list[k] = np.dot(
        np.dot(self.delta_list[k+1].T, self.W),
        np.diag(state[:,0])).T

There are two obvious errors here:

  1. state should be taken from self.state_list[k].copy(), not from element k+1.
  2. After state is copied out, element_wise_op is never applied to it; the element-wise activator.backward operation should be performed on the state variable itself.

Analysis:
Take state = self.state_list[k].copy() and then apply element_wise_op to it, giving the array of activation-function derivatives at step k. Multiplying (the step-k+1 delta times W) by this derivative array yields the delta at step k. (Note that the original code instead mutates self.state_list[k+1] in place, while the unmodified copy is what ends up in the diagonal matrix, so the derivative is never actually applied.)
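For reference, the delta propagation this code is meant to implement is (a sketch in the tutorial's notation, assuming $f$ denotes the activation function and $\mathrm{net}_k$ the value it is applied to at step $k$):

$$\delta_k^T = \delta_{k+1}^T \, W \, \operatorname{diag}\!\left[f'(\mathrm{net}_k)\right]$$

The derivative is evaluated at step $k$, which is why the code must index the state list with k rather than k+1.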

The corrected version:
def calc_delta_k(self, k, activator):
    '''
    Compute the delta at time k from the delta at time k+1
    '''
    state = self.state_list[k].copy()
    element_wise_op(state,
                    activator.backward)
    self.delta_list[k] = np.dot(
        np.dot(self.delta_list[k+1].T, self.W),
        np.diag(state[:,0])).T
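As an aside, right-multiplying a row vector by a diagonal matrix only rescales its components, so the corrected update can also be written as an element-wise product (a hypothetical rewrite for illustration, not part of the proposed fix):

import numpy as np

# Hypothetical equivalent of the corrected update: with delta_next of
# shape (n, 1), W of shape (n, n), and fprime the element-wise
# activator.backward values at step k (shape (n, 1)),
#   np.dot(np.dot(delta_next.T, W), np.diag(fprime[:, 0])).T
# equals
#   np.dot(W.T, delta_next) * fprime
def calc_delta_k_elementwise(delta_next, W, fprime):
    return np.dot(W.T, delta_next) * fprime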

Verification:
The test data was adjusted as follows (the input vectors are now 4-dimensional, and there are now 3 of them):
def data_set():
    x = [np.array([[1], [2], [3], [8]]),
         np.array([[2], [3], [4], [-9]]),
         np.array([[-1], [-2], [4], [3]])]
    d = np.array([[1], [2]])
    return x, d

The gradient-check program was adjusted as follows (4-dimensional inputs, 3 hidden units per layer, 3 input samples):
def gradient_check():
    '''
    Gradient check
    '''
    # Use an error function that simply sums every output node
    error_function = lambda o: o.sum()

    rl = RecurrentLayer(4, 3, IdentityActivator(), 1e-3)

    # Run the forward pass
    x, d = data_set()
    rl.forward(x[0])
    rl.forward(x[1])
    rl.forward(x[2])

    # Build the sensitivity map
    sensitivity_array = np.ones(rl.state_list[-1].shape,
                                dtype=np.float64)
    # Compute the gradient by backpropagation
    rl.backward(sensitivity_array, IdentityActivator())

    # Check the gradient numerically
    epsilon = 10e-4
    for i in range(rl.W.shape[0]):
        for j in range(rl.W.shape[1]):
            rl.W[i,j] += epsilon
            rl.reset_state()
            rl.forward(x[0])
            rl.forward(x[1])
            rl.forward(x[2])
            err1 = error_function(rl.state_list[-1])
            rl.W[i,j] -= 2*epsilon
            rl.reset_state()
            rl.forward(x[0])
            rl.forward(x[1])
            rl.forward(x[2])
            err2 = error_function(rl.state_list[-1])
            expect_grad = (err1 - err2) / (2 * epsilon)
            rl.W[i,j] += epsilon
            print 'weights(%d,%d): expected - actual %f - %f' % (
                i, j, expect_grad, rl.gradient[i,j])
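Each iteration perturbs one weight and applies the central-difference approximation

$$\frac{\partial E}{\partial w_{ij}} \approx \frac{E(w_{ij} + \epsilon) - E(w_{ij} - \epsilon)}{2\epsilon}$$

so the "expected" column below is the numerical estimate and the "actual" column is the backpropagated gradient; the two should agree if calc_delta_k is correct.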

With the original calc_delta_k, the output is:
D:\python_2.7\python.exe D:/python_code/learn_dl-master/rnn.py
weights(0,0): expected - actual 0.000095 - 1.000000
weights(0,1): expected - actual 0.000372 - 1.000000
weights(0,2): expected - actual 0.000512 - 1.000000
weights(1,0): expected - actual 0.000095 - 1.000000
weights(1,1): expected - actual 0.000372 - 1.000000
weights(1,2): expected - actual 0.000512 - 1.000000
weights(2,0): expected - actual 0.000095 - 1.000000
weights(2,1): expected - actual 0.000372 - 1.000000
weights(2,2): expected - actual 0.000512 - 1.000000

Process finished with exit code 0

With the corrected calc_delta_k, the output is:
D:\python_2.7\python.exe D:/python_code/learn_dl-master/rnn.py
weights(0,0): expected - actual -0.001360 - -0.001360
weights(0,1): expected - actual 0.000520 - 0.000520
weights(0,2): expected - actual 0.000452 - 0.000452
weights(1,0): expected - actual -0.001360 - -0.001360
weights(1,1): expected - actual 0.000520 - 0.000520
weights(1,2): expected - actual 0.000452 - 0.000452
weights(2,0): expected - actual -0.001360 - -0.001360
weights(2,1): expected - actual 0.000520 - 0.000520
weights(2,2): expected - actual 0.000452 - 0.000452

Process finished with exit code 0

This confirms that the original calc_delta_k is incorrect and that the corrected version is right.

@GSD-Dreammark

There is indeed a problem here; it clearly does not match Equation 3 in the article. A question: in the gradient_check method in bp.py, why is the network error computed as network_error = lambda vec1, vec2: 0.5 * reduce(lambda a, b: a + b, map(lambda v: (v[0] - v[1]) * (v[0] - v[1]), zip(vec1, vec2)))?
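(For reference, a more readable equivalent of that one-liner, written as a sketch rather than the repository's actual code:

# Half sum of squared errors: E = 0.5 * sum_i (t_i - o_i)^2
def network_error(vec1, vec2):
    return 0.5 * sum((v0 - v1) ** 2 for v0, v1 in zip(vec1, vec2))

The 1/2 factor is the usual convention: it cancels the 2 that appears when the squared error is differentiated during backpropagation.)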


JnuSimba commented Jul 3, 2023

@cyrixlin May I ask whether this corresponds to Equation 4? If so, should self.W be changed to self.U?
