# Machine Learning - Parameter Learning

### Gradient Descent : article

- Gradient Descent → used to minimize J(θ0, θ1)

Outline :

`Have some function J(θ0, θ1); want min over θ0, θ1 of J(θ0, θ1)`

- start with some (θ0, θ1)
- keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

Gradient descent algorithm :
`repeat until convergence { θj := θj − α * ∂/∂θj * J(θ0,θ1) } (for j = 0 and j = 1)`
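
A minimal sketch of this update rule in Python, assuming a simple stand-in cost `J(θ0, θ1) = (θ0 − 1)^2 + (θ1 − 2)^2` (not the course's cost function) so the minimum at (1, 2) is easy to check; the α value and iteration count are illustrative.

```python
# Gradient descent on a stand-in cost J(theta0, theta1) = (theta0 - 1)**2 + (theta1 - 2)**2.
# Its minimum is at (1, 2), so the result of the loop is easy to verify.

def dJ_dtheta0(theta0, theta1):
    return 2 * (theta0 - 1)        # partial derivative of J w.r.t. theta0

def dJ_dtheta1(theta0, theta1):
    return 2 * (theta1 - 2)        # partial derivative of J w.r.t. theta1

alpha = 0.1                        # learning rate
theta0, theta1 = 0.0, 0.0          # start with some (theta0, theta1)

for _ in range(1000):              # "repeat until convergence"
    # theta_j := theta_j - alpha * d/dtheta_j J(theta0, theta1), for j = 0 and j = 1
    theta0, theta1 = (theta0 - alpha * dJ_dtheta0(theta0, theta1),
                      theta1 - alpha * dJ_dtheta1(theta0, theta1))

print(theta0, theta1)              # converges to approximately (1.0, 2.0)
```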

- Assignment : [ computer operation ]
`a := b`, `a := a + 1`

- Truth assertion :
`a = b`
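
A small Python illustration of the difference (the variable values here are arbitrary): `:=` denotes assignment, so `a := a + 1` makes sense, while `a = b` as a truth assertion is a claim that can be true or false, written as a comparison in code.

```python
# Assignment (a computer operation): store the right-hand value into the left-hand name.
a = 5
b = 3
a = b         # a := b      -> a is now 3
a = a + 1     # a := a + 1  -> a is now 4

# Truth assertion: the mathematical statement "a = b" claims the two values are equal;
# in code that claim is just a comparison, which here evaluates to False.
print(a == b)  # False
```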

α : learning rate

In the update `θj := θj − α * ∂/∂θj * J(θ0,θ1)` :

- `α` : the learning rate
- `∂/∂θj * J(θ0,θ1)` : the derivative term

Simultaneous update :

- Simultaneously update θ0 and θ1 : `θj := θj − α * ∂/∂θj * J(θ0,θ1)` (for j = 0 and j = 1); both partial derivatives are evaluated at the old (θ0, θ1) before either parameter is overwritten, as in the sketch below.
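
A sketch of the distinction, using stand-in partial derivatives (assumed here, not the course's cost) for `∂/∂θj * J(θ0,θ1)`: the correct version computes both temporary values before overwriting either parameter.

```python
# Stand-in partial derivatives of a made-up cost, used only to show the update pattern.
def dJ_dtheta0(theta0, theta1):
    return 2 * (theta0 - 1)

def dJ_dtheta1(theta0, theta1):
    return 2 * (theta1 - 2)

alpha = 0.1
theta0, theta1 = 0.0, 0.0

# Correct simultaneous update: both derivatives use the old (theta0, theta1);
# the parameters are only overwritten afterwards.
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect (non-simultaneous) update: theta0 is overwritten first, so the theta1
# update mixes the new theta0 with the old theta1, which is not the rule above.
theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
```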

### Gradient Descent Intuition : article

Simplified:

- Repeat until convergence :
`θ1 := θ1 − α * d/dθ1 * J(θ1)`

slope : `d/dθ1 * J(θ1)`

- When the slope is negative, the value of θ1 increases and when it is positive, the value of θ1 decreases.
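
A quick check of this sign intuition, assuming a one-parameter stand-in cost `J(θ1) = (θ1 − 3)^2` with its minimum at θ1 = 3.

```python
# One step of gradient descent on the stand-in cost J(theta1) = (theta1 - 3)**2.
def dJ_dtheta1(theta1):
    return 2 * (theta1 - 3)   # slope of J at theta1

alpha = 0.1

theta1 = 0.0                                 # left of the minimum: slope is negative (-6)
theta1 = theta1 - alpha * dJ_dtheta1(theta1)
print(theta1)                                # 0.6 -> theta1 increased

theta1 = 5.0                                 # right of the minimum: slope is positive (+4)
theta1 = theta1 - alpha * dJ_dtheta1(theta1)
print(theta1)                                # 4.6 -> theta1 decreased
```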

- Adjust our parameter α to ensure that the gradient descent algorithm converges in a reasonable time : `θ1 := θ1 − α * d/dθ1 * J(θ1)`

- If α is too small, gradient descent can be slow.
- If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge. (See the sketch below.)
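
A sketch of the effect of α, again on the stand-in cost `J(θ1) = (θ1 − 3)^2`; the three α values are purely illustrative.

```python
# One gradient descent step for J(theta1) = (theta1 - 3)**2, whose derivative is 2 * (theta1 - 3).
def step(theta1, alpha):
    return theta1 - alpha * 2 * (theta1 - 3)

for alpha in (0.001, 0.1, 1.1):   # too small, reasonable, too large
    theta1 = 0.0
    for _ in range(50):
        theta1 = step(theta1, alpha)
    print(alpha, theta1)
# alpha = 0.001 : theta1 is still far from 3 after 50 steps (slow)
# alpha = 0.1   : theta1 is very close to 3 (converged)
# alpha = 1.1   : theta1 blows up in magnitude (overshoots and diverges)
```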

- If θ1 is at a local optimum of J(θ1), gradient descent will leave θ1 unchanged, since the derivative there is 0 :
`θ1 := θ1 - α * 0`

- Gradient descent can converge to a local minimum even with the learning rate α fixed : as θ1 approaches the minimum, the derivative term shrinks, so the steps automatically become smaller, as in the sketch below.
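
A short sketch of this point, again with the stand-in `J(θ1) = (θ1 − 3)^2`: the slope shrinks as θ1 approaches the minimum, so the step size `α * d/dθ1 * J(θ1)` shrinks too, without ever changing α.

```python
# Step sizes of gradient descent on J(theta1) = (theta1 - 3)**2 with a fixed alpha.
alpha = 0.1
theta1 = 0.0
for i in range(5):
    slope = 2 * (theta1 - 3)                      # d/dtheta1 J(theta1)
    print(f"step {i}: theta1 = {theta1:.3f}, step size = {abs(alpha * slope):.3f}")
    theta1 = theta1 - alpha * slope
# The printed step sizes shrink every iteration even though alpha never changes;
# at a local optimum the slope is 0 and theta1 would be left unchanged.
```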

### Gradient Descent For Linear Regression : article

When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived:

- Gradient descent algorithm :
`repeat until convergence { θj := θj − α * ∂/∂θj * J( θ0, θ1 ) } (for j = 0 and j = 1)`

- Linear Regression Model :
`hθ( x ) = θ0 + θ1 * x`
`J( θ0, θ1 ) = 1/(2m) * ∑ ( hθ( x^(i) ) - y^(i) )^2`

The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
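
A minimal end-to-end sketch of these equations, with the partial derivatives of J worked out for `hθ(x) = θ0 + θ1 * x` (the standard derived update rules); the tiny dataset, α, and iteration count are made up purely for illustration.

```python
# Gradient descent for linear regression on a made-up dataset generated from y = 1 + 2x,
# so the learned parameters should approach theta0 = 1, theta1 = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05

for _ in range(5000):                                  # repeat until convergence
    # errors h_theta(x^(i)) - y^(i) for the current hypothesis h_theta(x) = theta0 + theta1 * x
    errors = [(theta0 + theta1 * x[i]) - y[i] for i in range(m)]
    # derived partial derivatives of J(theta0, theta1) = 1/(2m) * sum(errors**2):
    #   d/dtheta0 J = (1/m) * sum(errors)
    #   d/dtheta1 J = (1/m) * sum(errors * x)
    grad0 = sum(errors) / m
    grad1 = sum(errors[i] * x[i] for i in range(m)) / m
    # simultaneous update of theta0 and theta1
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)                                  # approaches (1.0, 2.0)
```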