Photo OCR (Optical Character Recognition)
Photo OCR: Problem Description and Pipeline
- OCR: Optical Character Recognition
OCR example: Sliding Windows
Artificial data synthesis
Dissussion on getting more data
-
Make sure you have a low bias classifier before expending the effort. (Plot learnign curves) E.g. keep increasing the number of features/number or hidden units neural network until you have a low bias classifier.
-
“How much work would it be to get 10x as much data as we currently have?”
- Artificial data synthesis
-
Collect / label it yourself
# 有時候就真的靜下好好 label 一番, # 仔細算也不過一兩天(幾小時)的事情, # 卻可以讓模型變成好棒棒的兒~ # # ex: M = 1,000 筆數 # 人工 label 一筆 10 秒 # 總共花 1,000 * 10 秒
- “Croed source” (E.g. Amazon Mechanical Turk)
Ceiling analysis: What part of the pipeline to work on next
Face Recognition Example
Conclusion: Summary and Thank you
我好棒棒阿!!!!!!!!!!!!
Supervise Learning
- Linear regression, logistic regression, neural networks, SVMs
Unsupervised Learning
- K-means, PCA, Anomaly detection
Special applications/special topics
- Recommender systems, large scale machine learning
Advice on building a machine learning system
- Bias/variance, regularization; deciding what to work on next: evaluation of learning algorithms, learning curves, error analysis, ceiling analysis