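• SGD (stochastic gradient descent):
• Each step looks at only one training instance and moves in the best direction for that instance
• GD vs. SGD: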

• Steps:

• GD: fewer steps
• SGD: more steps
• Computation per step:

• GD: look through all the training instances
• SGD: look at only one training instance
• Pros and Cons of SGD:

• Pros:
• When the training data is large and contains some (near-)redundant instances, SGD usually converges much faster than GD

• Supports online learning: the model can better reflect the relationship between the features and the target variables

• Can sometimes escape a local minimum

• Cons:

• Tends to bounce around the minimum
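
The GD/SGD contrast above can be sketched on a toy problem. This is a minimal illustration, not a reference implementation: the one-feature model `y = w*x + b`, the squared-error loss, the learning rates, the step counts, and the data are all assumptions chosen for the example. Note how the GD update sums over every instance, while the SGD update touches a single instance but runs many more times.

```python
import random

def batch_gd(xs, ys, lr=0.3, steps=200):
    """Batch GD: each step averages the gradient over ALL training instances."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def sgd(xs, ys, lr=0.1, epochs=200, seed=0):
    """SGD: each step uses the gradient of ONE training instance."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the instances in a random order each epoch
        for i in order:
            err = w * xs[i] + b - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b

# Toy, noise-free data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x + 1 for x in xs]

w_gd, b_gd = batch_gd(xs, ys)   # 200 steps, each reading all 5 instances
w_sgd, b_sgd = sgd(xs, ys)      # 1000 steps, each reading 1 instance
print(w_gd, b_gd, w_sgd, b_sgd)
```

Because this toy data has no noise, both runs settle near `w = 2`, `b = 1`. With noisy data and a fixed learning rate, the SGD estimate would keep bouncing around the minimum instead of settling.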

Closed-form solution

• When the problem is multiple linear regression, θ can be computed directly:

• Use the closed-form solution (the normal equation): `θ = (XᵀX)⁻¹Xᵀy`
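
A minimal sketch of the closed-form approach, assuming the usual least-squares setup in which θ satisfies `(XᵀX)θ = Xᵀy`. The helper names (`transpose`, `matmul`, `solve`, `normal_equation`) and the toy data are hypothetical. The sketch solves the linear system by Gaussian elimination rather than forming the inverse explicitly, and it assumes `XᵀX` is invertible.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b for a square matrix a (Gaussian elimination, partial pivoting)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y. X must include a column of 1s for the bias."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Toy data generated from y = 1 + 2*x1 + 3*x2 (made up for illustration).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = normal_equation(X, y)
print(theta)  # approximately [1, 2, 3], up to floating-point rounding
```

No learning rate and no iteration are needed here, but the work is dominated by building and solving a system whose size equals the number of features.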

Closed-form solution vs. gradient descent

• If the number of features is small, the closed-form solution is probably acceptable

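• However, if the number of features is large, gradient descent is more efficient: the closed-form solution requires solving a linear system whose size equals the number of features, which becomes expensive as that number grows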

• Moreover, gradient descent is capable of solving more complex optimization problems

• In many cases, `∂J(θ)/∂θ = 0` has no closed-form solution, but we can still apply gradient descent
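
As one illustration of a problem where `∂J(θ)/∂θ = 0` cannot be solved in closed form, the sketch below applies gradient descent to logistic regression. The notes do not name a specific problem, so the choice of logistic regression is an assumption, as are the one-feature model, the learning rate, the step count, and the toy data. The data is deliberately not perfectly separable so that a finite minimum exists.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Average cross-entropy J(w, b) for a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def logistic_gd(xs, ys, lr=0.5, steps=2000):
    """Batch gradient descent on the log loss; setting the gradient to zero has no closed form."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy labels with some overlap between the classes (made up for illustration).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

loss_before = log_loss(0.0, 0.0, xs, ys)
w, b = logistic_gd(xs, ys)
loss_after = log_loss(w, b, xs, ys)
print(loss_before, loss_after)
```

The update rule has the same shape as in linear regression (prediction error times the input). Only the prediction function changes, which is why the same iterative procedure still applies when no formula for θ is available.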

