1 minute read

Tags:

Multiple features : article

Size (feet^2) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
--------------|--------------------|------------------|---------------------|--------------
         2104 |                  5 |                1 |                  45 |           460
         1416 |                  3 |                2 |                  40 |           232
         1534 |                  3 |                2 |                  30 |           315
          852 |                  2 |                1 |                  36 |           178

Notation:

  • n = number of features
  • m = number of training examples
  • x^(i) = input (features) of the i-th training example
  • x_j^(i) = value of feature j in the i-th training example
    

EX:

         | 1416 |
         |    3 |
x^(2) =  |    2 |
         |   40 |

x_3^(2) = 2 (Number of floors)
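The notation above maps directly onto array indexing. A minimal sketch with NumPy, using the training set from the table (note that code is 0-indexed while the notation is 1-indexed):

```python
import numpy as np

# Training set from the table above: one row per example.
# Columns: size (feet^2), bedrooms, floors, age (years).
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

x_2 = X[1]        # x^(2): the 2nd training example (row index 1)
x_2_3 = X[1, 2]   # x_3^(2): feature 3 of example 2 -> 2 (number of floors)
```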

Hypothesis:

Hypothesis : 
    hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4 

ex: 
    hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4
                 (size)  (bedrooms) (floors) (age)

Hypothesis form:

hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn

for convenience of notation, define x0 = 1  (x_0^(i) = 1)
=> 
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
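With x0 = 1, the hypothesis is just the inner product θᵀx. A minimal sketch (function name `h` is my own), using the example parameters above:

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, with x0 = 1 prepended."""
    x = np.concatenate(([1.0], x))  # define x0 = 1
    return theta @ x

# Parameters from the example: 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4
theta = np.array([80, 0.1, 0.01, 3, -2])
price = h(theta, np.array([2104, 5, 1, 45]))  # predicted price in $1000s
```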

Gradient Descent for Multiple Variables : article

Hypothesis:
    hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
Parameters: 
    θ0,θ1,θ2,θ3, ... ,θn (n+1 dimensional vector)
Cost Function:
    J(θ0, θ1, ... , θn) = 1/(2m) * ∑_{i=1}^{m} ( hθ( x^(i) ) - y^(i) )^2
Gradient descent:
    Repeat {
        θj := θj - α * ∂/∂θj J(θ0, ... ,θn)
    }   (simultaneously update for every j=0,...n)
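The update rule above can be sketched in a few lines of NumPy; the partial derivative for linear regression works out to (1/m) ∑ (hθ(x^(i)) - y^(i)) x_j^(i), and updating the whole θ vector at once gives the simultaneous update (names like `gradient_descent` are my own):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for multivariate linear regression.
    X: (m, n) feature matrix (without the x0 column); y: (m,) targets."""
    m = len(y)
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x0 = 1 to each example
    theta = np.zeros(Xb.shape[1])          # n+1 parameters, start at zero
    for _ in range(iters):
        err = Xb @ theta - y               # h_theta(x^(i)) - y^(i) for all i
        theta -= alpha / m * (Xb.T @ err)  # simultaneous update for every j
    return theta
```

For example, fitting y = 1 + 2x on the points (0,1), (1,3), (2,5), (3,7) recovers θ ≈ [1, 2].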


Gradient Descent in Practice I - Feature Scaling : article

Idea: Make sure features are on a similar scale

Feature Scaling:

Get every feature into approximately a -1 <= xi <= 1 range

Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1.

Mean normalization:

Replace xi with xi - μi to make features have approximately zero mean (Do not apply to x0=1)

EX: 
 x1 = (size - 1000) / 2000
 x2 = (#bedrooms - 2) / 5


x1 := (x1 - μ1) / s1

μ1 --> avg value of x1, in training set

s1 --> range (max-min) (or standard deviation)
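Both steps combine into one pass over the data: subtract each feature's mean, then divide by its range (or standard deviation). A minimal sketch (function name `mean_normalize` is my own), applied to the size and bedroom columns from the table above:

```python
import numpy as np

def mean_normalize(X):
    """Mean normalization: x_j := (x_j - mu_j) / s_j, with s_j = range.
    Do not apply to the x0 = 1 column. (Std dev also works for s_j.)"""
    mu = X.mean(axis=0)                 # mu_j: average value of feature j
    s = X.max(axis=0) - X.min(axis=0)   # s_j: range (max - min)
    return (X - mu) / s, mu, s

# Size and bedroom columns from the training set above
X = np.array([[2104.0, 5], [1416, 3], [1534, 3], [852, 2]])
X_norm, mu, s = mean_normalize(X)
# Each column now has zero mean and values within roughly -1 <= xi <= 1
```

Keep `mu` and `s` around: new inputs must be scaled with the same values before prediction.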