機器學習  computing parameters analytically
Normal Equation : article
Size (feet^2)  Number of bedrooms  Number of floors  Age of home (years)  Price ($1000) 
2104  5  1  45  460 
1416  3  2  40  232 
1534  3  2  30  315 
852  2  1  36  178 
 take data set add a new column
[x0]  Size (feet^2) [x1]  Number of bedrooms [x2]  Number of floors [x3]  Age of home (years) [x4]  Price ($1000) [y] 
1  2104  5  1  45  460 
1  1416  3  2  40  232 
1  1534  3  2  30  315 
1  852  2  1  36  178 
example: m =4
 1 2104 5 1 45   460 
 1 1416 3 2 40   232 
x =  1 1534 3 2 30  y =  315 
 1 852 2 1 36   178 
m x ( n + 1 ) m  dimensional vector
重要公式： <<< θ = ( x^T * x )^1 * x^T * y >>>
花生什麼事！？

來解釋拉:
m examples (x^(1), y(1)), ..., (x^(m), y^(m)); n features ｜ x1^(i) ｜ ｜ x1^(i) ｜ ｜ x1^(i) ｜ x^(i) = ｜ . ｜ ⊂ R^n+1 ｜ . ｜ ｜ . ｜ ｜ x1^(i) ｜
design matrix: ｜ (x1^(1))^T ｜ ｜ (x1^(2))^T ｜ ｜ (x1^(3))^T ｜ x = ｜ . ｜ (design ｜ . ｜ matrix) ｜ . ｜ ｜ (x1^(m))^T ｜

EX:
 1  if x^(i) =  x1^(i)   1 x1^(1)   y^(1)   1 x1^(2)   y^(2)   1 .   .  x =  1 .  y =  .   1 .   .   1 x1^(m)   y^(m)  θ = ( x^T * x )^1 * x^T * y
θ = ( x^T * x )^1 * x^T * y :
( x^T * x )^1 is inverse of matrix x^T * x

There is NO NEED to do feature scaling with the normal equation.

comparison of gradient descent and the normal equation:
Gradient Descent
Normal Equation
Need to choose alpha No need to choose alpha noeeds many iterations No need to iterate O (kn^2) O (k^3), need to calculate inverse of x^T * x work well when n is large slow if n is very large
Normal Equation Noninvertibility : article
Normal equation:
θ = ( x^T * x )^1 * x^T * y

What if x^T * x is noninvertible ? (sigular/degenerate)

Octave:
pinv(x'*x)*x'*y
What if x^T * x is noninvertible?
 Redundant features (linearly dependent)
Ex: x1 = size in feet^2 x2 = sinze in m^2 (1m = 3.28 feet) ===> x1 = (3.28)^2 * x2
 Too many features (ex: m ≤ n)
 Delete some features, or use regularization