numpy - Wrong values for partial derivatives in a neural network (Python)
I am implementing a simple neural network classifier for the iris dataset. The NN has 3 input nodes, 1 hidden layer with 2 nodes, and 3 output nodes. I have implemented everything, but the values of the partial derivatives are not calculated correctly. I have exhausted myself looking for a solution but couldn't find one. Here is my code for calculating the partial derivatives.
    def derivative_cost_function(self, x, y, thetas):
        '''
        Computes the derivatives of the cost function w.r.t. the input
        parameters (thetas) for the given inputs and labels.

        Input:
        ------
        x: can be either a single d x n-dimensional vector or a
           d x n-dimensional matrix of inputs
        thetas: must be a dk x 1-dimensional vector representing the
           vectors of the k classes
        y: must be a k x n-dimensional label vector

        Returns:
        ------
        partial_thetas: a dk x 1-dimensional vector of partial derivatives
           of the cost function w.r.t. the parameters
        '''
        # forward pass
        a2, a3 = self.forward_pass(x, thetas)

        # now back-propagate
        # unroll thetas
        l1theta, l2theta = self.unroll_thetas(thetas)
        nexamples = float(x.shape[1])

        # compute delta3, l2theta
        a3 = np.array(a3)
        a2 = np.array(a2)
        y = np.array(y)
        a3 = a3.T
        delta3 = (a3 * (1 - a3)) * (((a3 - y) / ((a3) * (1 - a3))))
        l2derivatives = np.dot(delta3, a2)
        # print "layer 2 derivatives shape = ", l2derivatives.shape
        # print "layer 2 derivatives = ", l2derivatives

        # compute delta2, l1 theta
        a2 = a2.T
        dotproduct = np.dot(l2theta.T, delta3)
        delta2 = dotproduct * (a2) * (1 - a2)
        l1derivatives = np.dot(delta2[1:], x.T)
        # print "layer 1 derivatives shape = ", l1derivatives.shape
        # print "layer 1 derivatives = ", l1derivatives

        # remember to exclude the last element of delta2, representing
        # the deltas of the bias terms...
        # i.e. delta2 = delta2[:-1]

        # roll the thetas into a big vector
        thetas = (self.roll_thetas(l1derivatives, l2derivatives)).reshape(thetas.shape)
        # return the same shape as received
        return thetas
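A common way to find out which partial derivatives are off is to compare them against a finite-difference estimate of the gradient. The sketch below is only an illustration of that idea: `numerical_gradient` and `relative_error` are hypothetical helper names that do not appear in the post, and the toy cost function stands in for the network's cost, which is not shown here.

```python
import numpy as np


def numerical_gradient(cost, thetas, eps=1e-5):
    """Central-difference estimate of d cost / d thetas (hypothetical helper)."""
    thetas = np.array(thetas, dtype=float)  # work on a float copy
    grad = np.zeros_like(thetas)
    for i in range(thetas.size):
        step = np.zeros_like(thetas)
        step.flat[i] = eps  # perturb one parameter at a time
        grad.flat[i] = (cost(thetas + step) - cost(thetas - step)) / (2 * eps)
    return grad


def relative_error(a, b):
    """Scale-independent distance between two gradient vectors."""
    return np.linalg.norm(a - b) / max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)


# Toy check: for cost(t) = 0.5 * sum(t**2) the exact gradient is t itself.
t = np.array([0.5, -1.0, 2.0])
estimate = numerical_gradient(lambda v: 0.5 * np.sum(v ** 2), t)
print(relative_error(estimate, t))
```

With the network from the question, `cost` would be a function that takes the rolled `thetas` vector and returns the scalar cost for fixed inputs and labels. A relative error that stays far above rounding level usually points at the analytic derivatives.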
Why not have a look at the implementation in https://github.com/zizhaozhang/simple_neutral_network/blob/master/nn.py
The derivatives are computed here:
    def dcostfunction(self, theta, in_dim, hidden_dim, num_labels, x, y):
        # compute gradient
        t1, t2 = self.uncat(theta, in_dim, hidden_dim)
        a1, z2, a2, z3, a3 = self._forward(x, t1, t2)  # p x s matrix
        # t1 = t1[1:, :]  # remove bias term
        # t2 = t2[1:, :]
        sigma3 = -(y - a3) * self.dactivation(z3)  # do not apply dsigmoid here? should I
        sigma2 = np.dot(t2, sigma3)
        term = np.ones((1, num_labels))
        sigma2 = sigma2 * np.concatenate((term, self.dactivation(z2)), axis=0)
        theta2_grad = np.dot(sigma3, a2.T)
        theta1_grad = np.dot(sigma2[1:, :], a1.T)
        theta1_grad = theta1_grad / num_labels
        theta2_grad = theta2_grad / num_labels
        return self.cat(theta1_grad.T, theta2_grad.T)

Hope it helps.
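For comparison, here is a minimal self-contained sketch of backpropagation for a network with one hidden layer. It is not the code from the question or from the linked repository. It assumes sigmoid activations, a cross-entropy cost averaged over the examples, and a bias unit stored as the first row of each activation matrix; the names `sigmoid`, `forward`, `cost` and `gradients` are made up for illustration. Under these assumptions the output delta simplifies to `a3 - y`, which is what the `delta3` expression in the question reduces to. Two things in the question's code may be worth checking against it, although the post does not confirm either as the cause: `nexamples` is computed but never used to scale the derivatives, and the comment talks about dropping the last element of `delta2` while the code drops the first.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(t1, t2, x):
    """x is d x n, t1 is h x (d+1), t2 is k x (h+1); bias is the first row."""
    n = x.shape[1]
    a1 = np.vstack([np.ones((1, n)), x])                 # (d+1) x n
    a2 = np.vstack([np.ones((1, n)), sigmoid(t1 @ a1)])  # (h+1) x n
    a3 = sigmoid(t2 @ a2)                                # k x n
    return a1, a2, a3


def cost(t1, t2, x, y):
    """Cross-entropy cost averaged over the n examples (an assumption)."""
    _, _, a3 = forward(t1, t2, x)
    return -np.sum(y * np.log(a3) + (1 - y) * np.log(1 - a3)) / x.shape[1]


def gradients(t1, t2, x, y):
    """Analytic partial derivatives of cost w.r.t. t1 and t2."""
    n = x.shape[1]
    a1, a2, a3 = forward(t1, t2, x)
    delta3 = a3 - y                               # k x n
    delta2 = (t2.T @ delta3) * a2 * (1 - a2)      # (h+1) x n
    delta2 = delta2[1:]                           # drop the bias row (first row here)
    return delta2 @ a1.T / n, delta3 @ a2.T / n   # same shapes as t1 and t2


# Example with the layer sizes from the question: 3 inputs, 2 hidden, 3 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
y = np.eye(3)[:, rng.integers(0, 3, size=5)]      # one-hot labels, 3 x 5
t1 = rng.normal(size=(2, 4))
t2 = rng.normal(size=(3, 3))
g1, g2 = gradients(t1, t2, x, y)
print(g1.shape, g2.shape)
```

Which row holds the bias depends on how `forward_pass` builds the activations, so the slice on `delta2` has to match that layout.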