# Machine Learning - Parameter Learning

• Gradient Descent → used to minimize the cost function J(θ0, θ1)

Outline :
keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

```
Have some function J(θ0, θ1)
Want min_{θ0, θ1} J(θ0, θ1)
```

```
repeat until convergence {
    θj := θj − α * ∂/∂θj * J(θ0,θ1)    (for j = 0 and j = 1)
}
```
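
To make the update loop concrete, here is a minimal Python sketch (not from the lecture): it minimizes an illustrative cost J(θ0, θ1) = (θ0 − 1)² + (θ1 − 2)², estimates the derivative term numerically, and performs the simultaneous update of θ0 and θ1.

```python
def J(theta0, theta1):
    # illustrative cost function, minimized at (1, 2)
    return (theta0 - 1) ** 2 + (theta1 - 2) ** 2

def partial(cost, theta0, theta1, j, eps=1e-6):
    """Central-difference estimate of ∂J/∂θj at (θ0, θ1)."""
    if j == 0:
        return (cost(theta0 + eps, theta1) - cost(theta0 - eps, theta1)) / (2 * eps)
    return (cost(theta0, theta1 + eps) - cost(theta0, theta1 - eps)) / (2 * eps)

alpha = 0.1                                # learning rate α (example value)
theta0, theta1 = 0.0, 0.0                  # initial guess
for _ in range(1000):                      # "repeat until convergence" (fixed cap here)
    temp0 = theta0 - alpha * partial(J, theta0, theta1, j=0)
    temp1 = theta1 - alpha * partial(J, theta0, theta1, j=1)
    theta0, theta1 = temp0, temp1          # simultaneous update of θ0 and θ1

print(theta0, theta1)                      # approaches (1, 2), the minimum of this J
```
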
• Assignment : [ computer operation ]
```
a := b
a := a + 1
```
• Truth assertion :
```
a = b
```
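
For comparison, in Python (an illustrative choice, not part of the lecture) assignment and the truth assertion map onto two different operators:

```python
b = 5
a = b           # assignment ("a := b" in the notes): copy the value of b into a
a = a + 1       # fine as an assignment; nonsense if read as a mathematical equation
assert a == 6   # truth assertion ("a = b" style claim): raises an error if it is false
```
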
• α : Learning rate

θj := θj − `α` * ∂/∂θj * J(θ0,θ1)

• ∂/∂θj * J(θ0,θ1) : derivative term

θj := θj − α * `∂/∂θj * J(θ0,θ1)`

Simultaneous update :

• Simultaneously update θ0 and θ1 : compute the new value of each parameter from the current (θ0, θ1) first, e.g. `temp0 := θ0 − α * ∂/∂θ0 * J(θ0,θ1)` and `temp1 := θ1 − α * ∂/∂θ1 * J(θ0,θ1)`, then assign `θ0 := temp0` and `θ1 := temp1`; updating θ0 before computing θ1's derivative would not be a simultaneous update.

### Gradient Descent Intuition : article

Simplified (single parameter θ1):

• Repeat until convergence :
```
θ1 := θ1 − α * d/dθ1 * J(θ1)
```

slope : the derivative term is the slope of the cost J(θ1) at the current θ1

```
d/dθ1 * J(θ1)
```
• When the slope is negative, the value of θ1 increases, and when it is positive, the value of θ1 decreases (e.g., with slope −2 and α = 0.1, the update θ1 := θ1 − 0.1 * (−2) raises θ1 by 0.2).
• Adjusting our parameter α :
to ensure that the gradient descent algorithm converges in a reasonable time.
θ1 := θ1 − α * d/dθ1 * J(θ1)
• if α is too small, gradient descent can be slow.
• if α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge (see the sketch after this block).
• if θ1 is at a local optimum of J(θ1), gradient descent will leave θ1 unchanged, because the derivative term there is zero:
```
θ1 := θ1 − α * 0
```
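
A minimal sketch of how α affects the one-parameter update, using the illustrative cost J(θ1) = θ1² (so d/dθ1 J(θ1) = 2 * θ1); the particular α values are examples only:

```python
# Effect of the learning rate on J(θ1) = θ1², whose derivative is 2·θ1 (illustrative example).

def run(alpha, theta1=1.0, steps=20):
    for _ in range(steps):
        theta1 = theta1 - alpha * (2 * theta1)   # θ1 := θ1 − α * d/dθ1 J(θ1)
    return theta1

print(run(alpha=0.01))   # too small: after 20 steps θ1 is still far from the minimum at 0
print(run(alpha=0.1))    # reasonable: θ1 ends up close to 0
print(run(alpha=1.5))    # too large: each step overshoots and θ1 diverges
```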

• Gradient descent can converge to a local minimum, even with the learning rate α fixed, because the derivative term, and with it the step size, automatically gets smaller as we approach the minimum.

### Gradient Descent For Linear Regression : article

• When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived:

repeat until convergence {
θj := θj − α * ∂/∂θj * J( θ0, θ1 )
(for j=1 and j=0)
}
• Linear Regression Model :
hypothesis : hθ( x ) = θ0 + θ1 * x
cost function : J( θ0, θ1 ) = 1/(2m) * ∑ ( hθ( x^(i) ) − y^(i) )^2   (sum over i = 1..m)
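
Differentiating this cost function with respect to θ0 and θ1 gives the derived update rules the bullet above refers to. The Python below is a minimal sketch of them on a tiny made-up dataset; the data values and variable names are illustrative:

```python
# Gradient descent for linear regression with hθ(x) = θ0 + θ1 * x, on made-up data.
# Derived updates (from ∂J/∂θ0 and ∂J/∂θ1 of the cost function above):
#   θ0 := θ0 − α * (1/m) * ∑ ( hθ(x^(i)) − y^(i) )
#   θ1 := θ1 − α * (1/m) * ∑ ( hθ(x^(i)) − y^(i) ) * x^(i)

x = [1.0, 2.0, 3.0, 4.0]           # illustrative training inputs
y = [3.0, 5.0, 7.0, 9.0]           # targets generated by y = 1 + 2x

m = len(x)
alpha = 0.05                       # learning rate (example value)
theta0, theta1 = 0.0, 0.0          # initial guess for the hypothesis
for _ in range(5000):              # "repeat until convergence"
    errors = [theta0 + theta1 * x[i] - y[i] for i in range(m)]
    grad0 = sum(errors) / m
    grad1 = sum(errors[i] * x[i] for i in range(m)) / m
    # simultaneous update: both gradients were computed from the old (θ0, θ1)
    theta0 = theta0 - alpha * grad0
    theta1 = theta1 - alpha * grad1

print(theta0, theta1)              # approaches (1, 2) for this data
```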

• The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
