Normal Equation : article

Size (feet^2)    Number of bedrooms    Number of floors    Age of home (years)    Price ($1000)
2104             5                     1                   45                     460
1416             3                     2                   40                     232
1534             3                     2                   30                     315
852              2                     1                   36                     178
  • Take the data set and add a new column x0 whose value is always 1.

Example 1:

x0    Size (feet^2) [x1]    Number of bedrooms [x2]    Number of floors [x3]    Age of home (years) [x4]    Price ($1000) [y]
1     2104                  5                          1                        45                          460
1     1416                  3                          2                        40                          232
1     1534                  3                          2                        30                          315
1     852                   2                          1                        36                          178

In this example, m = 4 (four training examples) and n = 4 (four features).

    | 1 2104 5 1 45 |        | 460 |
    | 1 1416 3 2 40 |        | 232 |
x = | 1 1534 3 2 30 |    y = | 315 |     
    | 1  852 2 1 36 |        | 178 |

      m x ( n + 1 )          m - dimensional vector

Key formula: <<<  θ = ( x^T * x )^-1 * x^T * y  >>>

What is going on here!?

  • Let me explain:

      m examples (x^(1), y^(1)), ..., (x^(m), y^(m)); n features
        
        
              | x0^(i) |
              | x1^(i) |
              | x2^(i) |
      x^(i) = |   .    | ∈ R^(n+1)
              |   .    |
              |   .    |
              | xn^(i) |
        
    
      design matrix:
                       | (x^(1))^T |
                       | (x^(2))^T |
                       | (x^(3))^T |
            x     =    |     .     |
          (design      |     .     |
           matrix)     |     .     |
                       | (x^(m))^T |
    
  • EX:

                 |  1     |
      if x^(i) = | x1^(i) |
        
          | 1  x1^(1) |       | y^(1) |   
          | 1  x1^(2) |       | y^(2) |    
          | 1    .    |       |   .   |   
      x = | 1    .    |   y = |   .   |  
          | 1    .    |       |   .   |   
          | 1  x1^(m) |       | y^(m) | 
        
       θ = ( x^T * x )^-1 * x^T * y
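
As a minimal sketch of the single-feature case above, the following Python snippet builds x with a leading column of ones from the size and price columns of the table and evaluates θ = ( x^T * x )^-1 * x^T * y. The helper names (transpose, matmul, inverse, normal_equation) are illustrative and not from the original notes. Exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse of a square matrix; raises ValueError if singular."""
    n = len(a)
    # Augment the matrix with the identity, converting entries to fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def normal_equation(X, y):
    """theta = (X^T X)^-1 X^T y, with y given as a column (list of 1-element rows)."""
    Xt = transpose(X)
    return matmul(matmul(inverse(matmul(Xt, X)), Xt), y)


# Size (feet^2) and price ($1000) columns from the table in these notes.
sizes = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]

X = [[1, s] for s in sizes]   # x0 = 1 for every example, x1 = size
y = [[p] for p in prices]     # m-dimensional column vector

theta = normal_equation(X, y)
print("theta0 =", float(theta[0][0]))
print("theta1 =", float(theta[1][0]))
```

One way to check the result is that the residuals x * θ - y are orthogonal to every column of x, which is the condition the normal equation encodes.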
    

θ = ( x^T * x )^-1 * x^T * y :

( x^T * x )^-1 is the inverse of the matrix x^T * x.

  • There is NO NEED to do feature scaling with the normal equation.

  • comparison of gradient descent and the normal equation:

    Gradient Descent               Normal Equation
    Need to choose alpha           No need to choose alpha
    Needs many iterations          No need to iterate
    O(kn^2)                        O(n^3), need to calculate the inverse of x^T * x
    Works well when n is large     Slow if n is very large
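
To make the comparison concrete, here is a small sketch in Python that fits the same one-feature model both ways. The toy data, the learning rate alpha = 0.05, and the iteration count 5000 are assumptions chosen for illustration and do not come from the notes. Gradient descent needs both alpha and the loop, while the normal equation solves the 2 x 2 system directly.

```python
# Toy data (hypothetical, not from the notes): y = 1 + 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m = len(xs)


def gradient_descent(alpha, iterations):
    """Batch gradient descent for h(x) = t0 + t1 * x with squared-error cost."""
    t0, t1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(errors) / m                               # d(cost)/d(t0)
        g1 = sum(e * x for e, x in zip(errors, xs)) / m    # d(cost)/d(t1)
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1          # simultaneous update
    return t0, t1


def normal_equation():
    """Closed-form theta = (X^T X)^-1 X^T y written out for one feature."""
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx          # determinant of X^T X
    return (sxx * sy - sx * sxy) / det, (m * sxy - sx * sy) / det


gd_theta = gradient_descent(alpha=0.05, iterations=5000)
ne_theta = normal_equation()
print("gradient descent:", gd_theta)
print("normal equation: ", ne_theta)
```

With a much larger alpha the gradient descent loop diverges on this data, which is the tuning burden the first table row refers to.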

Normal Equation Noninvertibility : article

Normal equation:

θ = ( x^T * x )^-1 * x^T * y

  • What if x^T * x is non-invertible? (singular/degenerate)

  • Octave: use pinv (the pseudo-inverse), which still returns a θ when x^T * x is singular:

     pinv(x'*x)*x'*y
    

What if x^T * x is non-invertible?

  • Redundant features (linearly dependent)
    Ex:
      x1 = size in feet^2
      x2 = size in m^2     (1 m = 3.28 feet)
    ===>  x1 = (3.28)^2 * x2
    
  • Too many features (ex: m ≤ n)
    • Delete some features, or use regularization
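
Both causes can be checked numerically. The sketch below, in Python with exact fractions, computes the determinant of x^T * x for three design matrices: one with the redundant m^2 column, one with that column deleted, and the full 4 x 5 matrix from the housing table (where m = 4 and n = 4, so m ≤ n). The helper names gram and det are illustrative and not from the notes. A determinant of zero means x^T * x has no inverse.

```python
from fractions import Fraction


def gram(X):
    """Return X^T X for a matrix X stored as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det(a):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, v in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * v * det(minor)
    return total


# 1 m = 3.28 feet, so 1 m^2 = 3.28^2 feet^2 (as in the notes).
FT2_PER_M2 = Fraction(328, 100) ** 2
sizes_ft2 = [2104, 1416, 1534, 852]

# Redundant features: x1 = size in feet^2, x2 = the same size in m^2.
X_redundant = [[1, Fraction(s), Fraction(s) / FT2_PER_M2] for s in sizes_ft2]
# Fix: delete the redundant column.
X_reduced = [row[:2] for row in X_redundant]

# Too many features: the full table has m = 4 examples and n + 1 = 5 columns.
X_full = [
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1, 852, 2, 1, 36],
]

det_redundant = det(gram(X_redundant))
det_reduced = det(gram(X_reduced))
det_full = det(gram(X_full))

print("redundant columns:", det_redundant)
print("after deleting x2:", det_reduced)
print("m <= n (full table):", det_full)
```

In floating-point code the determinant is rarely exactly zero, so in practice a pseudo-inverse such as Octave's pinv is a more robust check than testing the determinant.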