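Yesterday (2019/06/24) my progress slipped a bit, because I had spent 5 hours standing in the rain at the march on 2019/06/23!!! Haha, working hard today to catch up :)
Making sure gradient descent is working correctly: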
- For sufficiently small α, J(θ) should decrease on every iteration.
- If α is too small: gradient descent can be slow to converge.
- If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
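To see these rules in action, here is a minimal sketch (my own toy example, not from the course notes) that runs batch gradient descent on a tiny made-up dataset and records J(θ) after every iteration. The data, the three α values, and the helper names `cost`, `gradient_descent`, and `decreases_every_iteration` are all assumptions picked just for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha, iterations):
    """Run batch gradient descent from theta = (0, 0); return J after every step."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return history


def decreases_every_iteration(history):
    """True if J got strictly smaller at every single iteration."""
    return all(later < earlier for earlier, later in zip(history, history[1:]))


# Made-up data lying exactly on y = 1 + 2x (an assumption for this demo).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

tiny = gradient_descent(xs, ys, alpha=0.0001, iterations=50)   # too small: slow
small = gradient_descent(xs, ys, alpha=0.01, iterations=50)    # small enough
large = gradient_descent(xs, ys, alpha=0.5, iterations=50)     # too large: diverges

for name, history in (("tiny", tiny), ("small", small), ("large", large)):
    print(name, "decreases every iteration:", decreases_every_iteration(history),
          "| final J:", history[-1])
```

Printing or plotting J(θ) per iteration like this is a handy debugging habit: if the numbers go up, try a smaller α; if they barely move, try a larger one.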
OK... to be honest! I don't really get... what this part is mainly about XDD. So let me just paste the expert's notes first~
- We can improve our features and the form of our hypothesis function in a couple different ways.
We can combine multiple features into one. For example, we can combine x1 and x2 into a new feature x3 by taking x1 * x2.
- In other words, when the curve produced by the hypothesis function doesn't match what we expect, it's time to tweak things~
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).
- EX: if our hypothesis function is
hθ(x) = θ0 + θ1x1
- then we can create additional features based on x1, to get the quadratic function:
hθ(x) = θ0 + θ1x1 + θ2x1^2
- or the cubic function:
hθ(x) = θ0 + θ1x1 + θ2x1^2 + θ3x1^3
in the cubic version, we have created new features x2 and x3 where x2 = x1^2 and x3 = x1^3
- for a square root function, we can do:
hθ(x) = θ0 + θ1x1 + θ2√x1
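To make the idea concrete for myself, here is a small sketch of building those extra features from a single x1 and plugging them into the hypothesis. The function names (`quadratic_features`, `cubic_features`, `sqrt_features`, `hypothesis`) and the θ values are made up for illustration; they are not from the course.

```python
import math


def quadratic_features(x1):
    """Features for h = theta0 + theta1*x1 + theta2*x1^2."""
    return [x1, x1 ** 2]


def cubic_features(x1):
    """Features for the cubic version: x1, x2 = x1^2, x3 = x1^3."""
    return [x1, x1 ** 2, x1 ** 3]


def sqrt_features(x1):
    """Features for h = theta0 + theta1*x1 + theta2*sqrt(x1); needs x1 >= 0."""
    return [x1, math.sqrt(x1)]


def hypothesis(theta, features):
    """h_theta(x) = theta[0] + theta[1]*f1 + theta[2]*f2 + ..."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))


x1 = 4.0
theta = [1.0, 0.5, 0.25]  # arbitrary example parameters

h_quad = hypothesis(theta, quadratic_features(x1))  # 1 + 0.5*4 + 0.25*16
h_sqrt = hypothesis(theta, sqrt_features(x1))       # 1 + 0.5*4 + 0.25*2
print("quadratic:", h_quad, "| square root:", h_sqrt)
```

The nice part is that the model is still linear in θ, so the same gradient descent works unchanged; only the feature list is different.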
Heads up!! If you reshape your features like this, feature scaling becomes very important! e.g. if x1 has range 1 - 1000, then the range of x1^2 becomes 1 - 1000000 and that of x1^3 becomes 1 - 1000000000.
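Here is a quick sketch of how far apart those ranges get and how scaling pulls them back together. I'm assuming mean normalization, (x - mean) / (max - min), as the scaling method; other choices such as dividing by the standard deviation would work too. The helper name `mean_normalize` is my own.

```python
def mean_normalize(values):
    """Scale values with (x - mean) / (max - min), giving a total spread of 1.

    Assumes the values are not all identical (otherwise max - min is 0).
    """
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


# x1 takes every integer value from 1 to 1000.
x1 = [float(v) for v in range(1, 1001)]
columns = {
    "x1": x1,
    "x1^2": [v ** 2 for v in x1],
    "x1^3": [v ** 3 for v in x1],
}

scaled_columns = {name: mean_normalize(col) for name, col in columns.items()}

for name, col in columns.items():
    scaled = scaled_columns[name]
    print(name,
          "| raw range:", min(col), "-", max(col),
          "| scaled range:", round(min(scaled), 3), "-", round(max(scaled), 3))
```

After scaling, all three features have a spread of 1, so no single feature dominates the gradient and one α can work for all of them.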