Backpropagation in Practice
You ARE What You DO!
Implementation Note: Unrolling Parameters : article
- with neural networks, we are working with sets of matrics:
(1) (2) (3) Θ , Θ , Θ ... (1) (2) (3) D , D , D ...
Octave Example
- Theta1
Theta1 = ones(10, 11) Theta1 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
- Theta2
Theta2 = 2*ones(10, 11) Theta2 = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
- Theta3
Theta3 = 3*ones(1, 11) Theta3 = 3 3 3 3 3 3 3 3 3 3 3
- thetaVec
thetaVec = [ Theta1(:); Theta2(:); Theta3(:) ] thetaVec = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . 2 2 2 2 2 2 . . . . 3 3 3 3 >> size(thetaVec) ans = 231 1
- reshape
reshape(thetaVec(1:110), 10, 11) ans = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 % reshape % reshape(thetaVec(111:220), 10, 11) ans = 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 % reshape % reshape(thetaVec(221:231), 1, 11) ans = 3 3 3 3 3 3 3 3 3 3 3
- Summary
Gradient checking : article
Numerical estimation of gradients
Gradient Checking
Note:
Octave example
for i = 1:n,
thetaPlus = theta;
thetaPlus(i) = thetaPlus(i) + EPSILON; % ϵ (EPSILON) %
thetaMinus = theta;
thetaMinus(i) = thetaMinus(i) - EPSILON;
gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*EPSILON);
end;
- We previously saw how to calculate the deltaVector. So once we compute our gradApprox vector, we can check that gradApprox ≈ deltaVector.
Random Initialzation : article
我發現這幾篇開始,都跟實作有關西! 所以,我決定要用 python 來練習!
-
zero initialization —> GG 不優
-
Random initialization: Symmetry breaking –> Good! 棒棒~