Handling Skewed Data
Error Metrics for Skewed Classes

Precision / Recall

- Precision
  (Of all patients where we predicted y = 1, what fraction actually has cancer?)
  Precision = True positives / (# predicted positive) = True positives / (True positives + False positives)
- Recall
  (Of all patients that actually have cancer, what fraction did we correctly detect as having cancer?)
  Recall = True positives / (# actual positive) = True positives / (True positives + False negatives)
A classifier with both high precision and high recall is a good classifier: if an algorithm achieves high precision and high recall, we can be confident it is doing well, even when the classes are very skewed (see the sketch below).
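As a quick illustration of these definitions, here is a minimal Python sketch (the function name, NumPy arrays, and example labels are illustrative, not from the lecture) that computes precision and recall from binary labels and predictions:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive class)."""
    true_pos  = np.sum((y_pred == 1) & (y_true == 1))
    false_pos = np.sum((y_pred == 1) & (y_true == 0))
    false_neg = np.sum((y_pred == 0) & (y_true == 1))

    # Precision: of all examples we predicted as 1, how many are actually 1?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else 0.0
    # Recall: of all examples that are actually 1, how many did we catch?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) > 0 else 0.0
    return precision, recall

# Tiny skewed example: only 2 of 10 patients actually have cancer (y = 1)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
print(precision_recall(y_true, y_pred))  # (0.5, 0.5)
```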
Trading Off Precision and Recall
- Logistic regression: 0 ≤ hθ(x) ≤ 1
- Predict 1 if hθ(x) ≥ 0.5
- Predict 0 if hθ(x) < 0.5
Higher precision, lower recall (raise the threshold) - only tell patients they have cancer when we are very confident, so we do not alarm them unnecessarily.
Higher recall, lower precision (lower the threshold) - tell patients whenever there is a chance they have cancer, so they do not miss closer observation or treatment.

Both settings can be justified from their own point of view.
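To make the trade-off concrete, the sketch below sweeps the prediction threshold over a few values; the probabilities standing in for hθ(x) and the labels are made-up illustrative numbers, not data from the lecture:

```python
import numpy as np

def predict_with_threshold(probs, threshold):
    """Predict 1 if h_theta(x) >= threshold, else 0."""
    return (probs >= threshold).astype(int)

# Placeholder values of h_theta(x) for 8 patients (illustrative only)
probs  = np.array([0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.05])
y_true = np.array([1,    1,    0,    1,    0,    0,    1,    0   ])

for threshold in (0.3, 0.5, 0.7, 0.9):
    y_pred = predict_with_threshold(probs, threshold)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Running this shows the pattern described above: a high threshold (0.9) gives precision 1.00 but recall 0.25, while a low threshold (0.3) gives recall 0.75 at the cost of precision 0.50.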
Precision / Recall curve
- More generally: Predict 1 if hθ(x) ≥ threshold

- The precision/recall curve can take on different shapes depending on the classifier.
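One way to trace such a curve is to sweep the threshold over many values and record a (recall, precision) point for each one. The sketch below does this; the function name, the no-positive-prediction convention, and the score/label arrays are assumptions for illustration:

```python
import numpy as np

def pr_curve(probs, y_true, thresholds):
    """Sweep thresholds and collect (recall, precision) points for a PR curve."""
    points = []
    for t in thresholds:
        y_pred = (probs >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        # Convention used here: precision = 1 when nothing is predicted positive
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall    = tp / (tp + fn) if tp + fn else 0.0
        points.append((recall, precision))
    return points

# Illustrative scores and labels (not real data)
probs  = np.array([0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.05])
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
for r, p in pr_curve(probs, y_true, np.linspace(0.05, 0.95, 10)):
    print(f"recall={r:.2f}  precision={p:.2f}")
```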

F1 Score (F score)
- How to compare precision/recall numbers? Use the F1 score, F1 = 2PR / (P + R), which is high only when both precision P and recall R are high.
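A small sketch of comparing classifiers by F1 score; the three (precision, recall) pairs are illustrative values, not results from any real experiment:

```python
def f1_score(precision, recall):
    """F1 = 2 * P * R / (P + R): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Comparing three hypothetical classifiers by their (precision, recall) pairs
for name, (p, r) in {"A": (0.5, 0.4), "B": (0.7, 0.1), "C": (0.02, 1.0)}.items():
    print(name, round(f1_score(p, r), 3))
# A scores highest: F1 rewards a balance of precision and recall,
# unlike the simple average (P + R) / 2, which would favor C
# even though C predicts almost everything as positive.
```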
