Machine Learning - multivariate linear regression
Multiple features
Size (feet^2) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000) |
2104 | 5 | 1 | 45 | 460 |
1416 | 3 | 2 | 40 | 232 |
1534 | 3 | 2 | 30 | 315 |
852 | 2 | 1 | 36 | 178 |
… | … | … | … | … |
Notation:
- n = number of features
- m = number of training examples
- x^(i) = input (features) of the i^th training example
- xj^(i) = value of feature j in the i^th training example
EX:
x^(2) = [1416, 3, 2, 40]^T (the features of the 2nd training example, as a column vector)
x3^(2) = 2 (Number of floors)
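The indexing above can be checked directly against the sample table, treating it as a NumPy matrix (variable names here are my own; columns in the order size, bedrooms, floors, age):

```python
import numpy as np

# Training examples from the table above (columns: size, bedrooms, floors, age)
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

x_2 = X[1]       # x^(2): the 2nd training example (0-based row index 1)
x_2_3 = X[1, 2]  # x3^(2): feature 3 (number of floors) of the 2nd example

print(x_2)    # the row [1416, 3, 2, 40]
print(x_2_3)  # 2
```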
Hypothesis:
hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4
ex:
hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4
(x1 = size, x2 = #bedrooms, x3 = #floors, x4 = age)
Hypothesis form:
hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn
For convenience of notation, define x0 = 1 (x0^(i) = 1 for all i)
=>
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
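With x0 = 1, the hypothesis is just the inner product θ^T x. A minimal sketch in NumPy, using the example coefficients from the text (the function name `h` is my own):

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta^T x, where x already includes x0 = 1."""
    return theta @ x

# Coefficients from the example above: h(x) = 80 + 0.1*x1 + 0.01*x2 + 3*x3 - 2*x4
theta = np.array([80, 0.1, 0.01, 3, -2])
x = np.array([1, 2104, 5, 1, 45])  # x0 = 1 prepended to the first training example
print(h(theta, x))
```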
Gradient Descent for Multiple Variables
Hypothesis:
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
Parameters:
θ0,θ1,θ2,θ3, ... ,θn (n+1 dimensional vector)
Cost Function:
J(θ0, θ1, ... , θn) = 1/(2m) * ∑_{i=1}^{m} ( hθ( x^(i) ) - y^(i) )^2
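The cost function translates directly into a vectorized one-liner; a minimal sketch (function and variable names are my own), checked on a toy set where a perfect fit must give zero cost:

```python
import numpy as np

def compute_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum over i of (h_theta(x^(i)) - y^(i))^2.
    X is the (m, n+1) design matrix with a leading column of ones (x0 = 1)."""
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)

# A perfect fit (y = 1 + 2*x1) has zero cost
X = np.array([[1.0, 1], [1, 2], [1, 3]])
y = np.array([3.0, 5, 7])
print(compute_cost(np.array([1.0, 2.0]), X, y))  # → 0.0
```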
Gradient descent:
Repeat {
θj := θj - α * ∂/∂θj * J(θ0, ... ,θn)
} (simultaneously update for every j=0,...n)
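A minimal sketch of the update rule above, using the fact that the partial derivative works out to (1/m) * ∑ ( hθ(x^(i)) - y^(i) ) * xj^(i); the vectorized form updates every θj simultaneously (names and the toy data are my own):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression.
    X: (m, n+1) design matrix whose first column is all ones (x0 = 1).
    y: (m,) targets. Returns the learned theta, shape (n+1,)."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        error = X @ theta - y         # h_theta(x^(i)) - y^(i) for every i
        grad = (X.T @ error) / m      # ∂J/∂θj for every j at once
        theta = theta - alpha * grad  # simultaneous update of all θj
    return theta

# Toy check: data generated by y = 1 + 2*x1 should recover theta ≈ [1, 2]
X = np.array([[1.0, 0], [1, 1], [1, 2], [1, 3]])
y = np.array([1.0, 3, 5, 7])
print(gradient_descent(X, y, alpha=0.1, iters=5000))
```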
Gradient Descent in Practice I - Feature Scaling
Idea: make sure features are on a similar scale
Feature Scaling:
Get every feature into approximately a -1 <= xi <= 1 range
Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1
Mean normalization:
Replace xi with xi - μi to make features have approximately zero mean (do not apply to x0 = 1)
EX:
x1 = (size - 1000) / 2000
x2 = (#bedrooms - 2) / 5
In general: x1 := (x1 - μ1) / s1
μ1 --> average value of x1 in the training set
s1 --> range (max - min), or the standard deviation
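Both steps (mean normalization plus dividing by the range) can be applied per column; a minimal sketch, assuming the range is used for s_j and using two columns of the sample table (function name is my own):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature to roughly [-1, 1] via (x - mean) / range.
    Do not apply this to the x0 = 1 column."""
    mu = X.mean(axis=0)                # mu_j: average value of feature j
    s = X.max(axis=0) - X.min(axis=0)  # s_j: range (max - min); std dev also works
    return (X - mu) / s, mu, s

# size and #bedrooms columns from the sample table
X = np.array([[2104.0, 5], [1416, 3], [1534, 3], [852, 2]])
X_norm, mu, s = mean_normalize(X)
print(X_norm)  # each column now has zero mean and values within [-1, 1]
```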