# Machine Learning - multivariate linear regression

## Multiple features : article

| Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price (\$1000) |
|---|---|---|---|---|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
| … | … | … | … | … |

Notation:

• m = number of training examples
• n = number of features
• x^(i) = input (features) of the i-th training example
• x_j^(i) = value of feature j in the i-th training example

EX:

```
        | 1416 |
x^(2) = |    3 |
        |    2 |
        |   40 |

x_3^(2) = 2 (Number of floors)
```
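One way to hold the training set is as a matrix with one row per example, so the indexing notation above maps directly onto array indexing. A minimal NumPy sketch (the array layout is my own choice, not from the notes; note NumPy indices are 0-based while the notes are 1-based):

```python
import numpy as np

# Training set from the table above: one row per example,
# columns = (size, bedrooms, floors, age).
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [ 852, 2, 1, 36],
])

x2 = X[1]        # x^(2): the 2nd training example (row index 1 in 0-based NumPy)
x2_3 = X[1, 2]   # x_3^(2): feature 3 of example 2 -> 2 (number of floors)
```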

Hypothesis:

```
hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3 + θ4x4

ex:
hθ(x) = 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4
            (size) (bedrooms) (floor) (age)
```

Hypothesis form:

``````hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn

for convenience of notation, define x0 = 1  (x_0^(i) = 1 for all i)
=>
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn
``````
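With x0 = 1 prepended, the hypothesis is just the dot product of the parameter vector with the feature vector. A small sketch (the θ values are the made-up ones from the example above, not fitted parameters):

```python
import numpy as np

# Hypothetical parameters [θ0, θ1, θ2, θ3, θ4] from the example hypothesis.
theta = np.array([80.0, 0.1, 0.01, 3.0, -2.0])

# One training example with x0 = 1 prepended:
# (x0, size, bedrooms, floors, age).
x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

# hθ(x) = θ0*x0 + θ1*x1 + ... + θn*xn is a single dot product.
h = theta @ x
```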

## Gradient Descent for Multiple Variables : article

```
Hypothesis:
hθ(x) = θ0x0 + θ1x1 + θ2x2 + ... + θnxn

Parameters:
θ0, θ1, θ2, ..., θn   (an (n+1)-dimensional vector)

Cost Function:
J(θ0, θ1, ..., θn) = 1/(2m) * ∑_{i=1}^{m} ( hθ(x^(i)) - y^(i) )^2

Gradient Descent:
Repeat {
    θj := θj - α * ∂/∂θj J(θ0, ..., θn)
}   (simultaneously update for every j = 0, ..., n)

Plugging in the cost function, the partial derivative gives:
θj := θj - α * (1/m) * ∑_{i=1}^{m} ( hθ(x^(i)) - y^(i) ) * x_j^(i)
```

## Gradient Descent in practice I - Feature Scaling : article
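Before turning to practical tips, the batch gradient descent loop from the previous section can be sketched in NumPy. This is a minimal illustration, assuming X already has the x0 = 1 column prepended; `alpha` and `num_iters` are hypothetical tuning choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.

    X is m x (n+1) with the x0 = 1 column already prepended.
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        # Simultaneous update: all partial derivatives
        # (1/m) * sum_i (hθ(x^(i)) - y^(i)) * x_j^(i) computed at once.
        grad = (X.T @ (X @ theta - y)) / m
        theta -= alpha * grad
    return theta

# Tiny made-up dataset where y = 1 + 2*x fits exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
```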

Idea: make sure features are on a similar scale

Feature Scaling:

Get every feature into approximately a `-1 <= xi <= 1` range

Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1

Mean normalization:

Replace xi with xi - μi to make features have approximately zero mean (do not apply to x0 = 1)

```
EX:
x1 = (size - 1000) / 2000
x2 = (#bedrooms - 2) / 5

In general:
x1 := (x1 - μ1) / s1

μ1 --> average value of x1 in the training set
s1 --> range (max - min), or the standard deviation
```
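Mean normalization with range scaling can be sketched as a small helper. This is my own arrangement of the formula above, using the table's features; the function name `mean_normalize` is hypothetical:

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature column to roughly [-1, 1] with ~zero mean.

    Uses (x - mean) / (max - min) per column, as in the notes.
    Do not apply this to the x0 = 1 column.
    """
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)  # range; std dev also works
    return (X - mu) / s, mu, s

# Features from the table: (size, bedrooms, floors, age).
X = np.array([
    [2104.0, 5.0, 1.0, 45.0],
    [1416.0, 3.0, 2.0, 40.0],
    [1534.0, 3.0, 2.0, 30.0],
    [ 852.0, 2.0, 1.0, 36.0],
])
X_norm, mu, s = mean_normalize(X)
```

Keeping `mu` and `s` matters in practice: a new example must be scaled with the training set's statistics before its price is predicted.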
