Machine Learning: multivariate linear regression
Multiple features : article
Size (feet^2) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104          | 5                  | 1                | 45                  | 460
1416          | 3                  | 2                | 40                  | 232
1534          | 3                  | 2                | 30                  | 315
852           | 2                  | 1                | 36                  | 178
...           | ...                | ...              | ...                 | ...
Notation:
 m = number of training examples
 n = number of features
 x^(i) = input (features) of the i^th training example.

x_j^(i) = value of feature j in the i^th training example
EX:
         [ 1416 ]
         [    3 ]
x^(2) =  [    2 ]
         [   40 ]

x_3^(2) = 2 (Number of floors)
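This indexing can be sketched with NumPy; the matrix below is just the first four rows of the table above, and the variable names are illustrative:

```python
import numpy as np

# Rows are training examples; columns are the features
# size, #bedrooms, #floors, age (price is the target, not a feature)
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

m, n = X.shape      # m = number of examples, n = number of features
x_2 = X[1]          # x^(2): the 2nd training example (0-based row 1)
x_2_3 = X[1, 2]     # x_3^(2): feature 3 (number of floors) of example 2
print(n, x_2, x_2_3)
```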
Hypothesis:
hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4
ex:
hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 + 2x4
        (x1: size, x2: #bedrooms, x3: #floors, x4: age)
Hypothesis form:
hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn
for convenience of notation, define x0 = 1 (x_0^(i) = 1 for all i)
=>
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
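With x0 = 1 included, the hypothesis is the inner product θᵀx, so predictions for all examples can be computed at once. A minimal NumPy sketch, reusing the example parameters and table rows above:

```python
import numpy as np

theta = np.array([80, 0.1, 0.01, 3, 2])   # example parameters from above

# Training examples: size, #bedrooms, #floors, age
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])
# Prepend the x0 = 1 column to form the design matrix
X = np.hstack([np.ones((X.shape[0], 1)), X])

predictions = X @ theta   # h_theta(x^(i)) for every i at once
print(predictions)
```

For the first row this gives 80 + 0.1*2104 + 0.01*5 + 3*1 + 2*45 = 383.45 (in $1000s).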
Gradient Descent for Multiple Variables : article
Hypothesis:
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
Parameters:
θ0,θ1,θ2,θ3, ... ,θn (n+1 dimensional vector)
Cost Function:
J(θ0, θ1, ... , θn) = 1/(2m) * ∑_{i=1}^{m} ( hθ(x^(i)) - y^(i) )^2
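A minimal vectorized sketch of this cost function; the helper name and the tiny data set are illustrative:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2,
    where X already includes the x0 = 1 column."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

# Tiny check: with a perfect fit (y = 1 + x exactly), the cost is 0
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.array([1.0, 1.0])
print(cost(theta, X, y))   # → 0.0
```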
Gradient descent:
Repeat {
θj := θj - α * ∂/∂θj J(θ0, ... , θn)
} (simultaneously update for every j = 0, ..., n)
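For linear regression the partial derivative works out to ∂/∂θj J(θ) = (1/m) ∑ ( hθ(x^(i)) - y^(i) ) x_j^(i), so the loop above can be sketched as follows (α and the synthetic data are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeatedly apply theta_j := theta_j - alpha * dJ/dtheta_j.
    X must already include the x0 = 1 column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        gradient = (X.T @ (X @ theta - y)) / m   # dJ/dtheta_j for all j
        theta = theta - alpha * gradient         # simultaneous update of every j
    return theta

# Fit y = 1 + 2x on a tiny synthetic set
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)
print(theta)   # approaches [1, 2]
```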
Gradient Descent in practice I - Feature Scaling : article
Idea: Make sure features are on a similar scale.
Feature Scaling:
Get every feature into approximately a -1 <= xi <= 1 range.
Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1
Mean normalization:
Replace xi with xi - μi to make features have approximately zero mean (do not apply to x0 = 1).
EX:
x1 = (size - 1000) / 2000
x2 = (#bedrooms - 2) / 5

In general:
x1 := (x1 - μ1) / s1
μ1 = average value of x1 in the training set
s1 = range (max - min), or the standard deviation
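Mean normalization and range scaling can be combined into one step. A sketch using the size and #bedrooms columns from the table above (the x0 = 1 column is not included, since it must not be normalized):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-1, 1]:
    x := (x - mu) / s, with mu the column mean and s the range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s

# size and #bedrooms from the table
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_norm = mean_normalize(X)
print(X_norm)   # each column now has zero mean and values in [-1, 1]
```

Using the standard deviation for s instead of the range also works; the point is only that both features end up on a similar scale so gradient descent converges faster.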