Handling Skewed Data
Error Metrics for Skewed Classes
Precision / Recall

Precision
(Of all patients where we predicted y = 1, what fraction actually has cancer?)

Precision = True positives / # predicted positive = True positives / (True positives + False positives)

Recall
(Of all patients that actually have cancer, what fraction did we correctly detect as having cancer?)

Recall = True positives / # actual positive = True positives / (True positives + False negatives)
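The two definitions above can be sketched directly from confusion-matrix counts. This is a minimal illustration; the function name and the example counts are made up for this note, not from the course:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)  # of all predicted positives, how many are real
    recall = tp / (tp + fn)     # of all actual positives, how many we caught
    return precision, recall

# Hypothetical example: 80 true positives, 20 false positives, 40 false negatives
p, r = precision_recall(80, 20, 40)
print(p, r)  # precision = 0.8, recall = 80/120
```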
A classifier with high precision and high recall is actually a good classifier. If a classifier achieves both high precision and high recall, we can be confident that the algorithm is doing well, even when the classes are very skewed.
Trading Off Precision and Recall
 Logistic regression: 0 ≤ hθ(x) ≤ 1
 Predict 1 if hθ(x) ≥ 0.5
 Predict 0 if hθ(x) < 0.5
Higher precision, lower recall: only tell a patient they have cancer when we are quite sure, to avoid causing unnecessary anxiety.
Higher recall, lower precision: tell a patient whenever cancer is even possible, so we don't miss the chance for closer monitoring or treatment.

Both settings can be justified from their own point of view.
Precision / Recall curve

More generally: Predict 1 if hθ(x) ≥ threshold
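Sweeping this threshold is what traces out the precision/recall trade-off: raising it tends to increase precision and decrease recall. A minimal sketch with made-up predicted probabilities and labels (the helper names and toy data are illustrative, not from the course):

```python
def predict(probs, threshold):
    """Predict 1 when the estimated probability h(x) >= threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def precision_recall(y_true, y_pred):
    """Precision and recall from paired true/predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy predicted probabilities (sorted) and true labels, purely illustrative
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.45, 0.65, 0.85):
    p, r = precision_recall(y_true, predict(probs, threshold))
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, moving the threshold from 0.45 up to 0.85 pushes precision from 0.8 to 1.0 while recall falls from 1.0 to 0.25.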

Different classifiers can produce precision/recall curves with different shapes.

F1 Score (F score)

How to compare precision/recall numbers?
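The standard answer is the F1 score, the harmonic mean of precision and recall: F1 = 2PR / (P + R). Unlike a simple average, it is only high when both numbers are high, so a degenerate classifier (e.g. one that always predicts 1, giving recall 1 but tiny precision) scores poorly. A minimal sketch; the function name and example values are illustrative:

```python
def f1_score(precision, recall):
    """F1 score: harmonic mean of precision and recall, 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)

# Illustrative comparison: averaging would rank the always-predict-1
# classifier (P=0.02, R=1.0, average 0.51) above a balanced one
# (P=0.5, R=0.4, average 0.45), but F1 does not.
print(f1_score(0.5, 0.4))   # about 0.444
print(f1_score(0.02, 1.0))  # about 0.039
```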