less than 1 minute read

Tags: ,

Photo OCR: Problem Description and Pipeline

  • OCR: Optical Character Recognition

Imgur

Imgur

Imgur

OCR example: Sliding Windows

Imgur

Imgur

Imgur

Imgur

Imgur

Imgur

Imgur

Artificial data synthesis

Imgur

Imgur

Imgur

Imgur

Dissussion on getting more data

  1. Make sure you have a low bias classifier before expending the effort. (Plot learnign curves) E.g. keep increasing the number of features/number or hidden units neural network until you have a low bias classifier.

  2. “How much work would it be to get 10x as much data as we currently have?”

    • Artificial data synthesis
    • Collect / label it yourself

       # 有時候就真的靜下好好 label 一番,
       # 仔細算也不過一兩天(幾小時)的事情,
       # 卻可以讓模型變成好棒棒的兒~ 
       #
       # ex: M = 1,000 筆數
       #     人工 label 一筆 10 秒
       #    總共花 1,000 * 10 秒
      
    • “Croed source” (E.g. Amazon Mechanical Turk)

Ceiling analysis: What part of the pipeline to work on next

Imgur

Imgur

Imgur

Face Recognition Example

Imgur

Imgur

Imgur

Conclusion: Summary and Thank you

我好棒棒阿!!!!!!!!!!!! :heart:

Supervise Learning

  • Linear regression, logistic regression, neural networks, SVMs

Unsupervised Learning

  • K-means, PCA, Anomaly detection

Special applications/special topics

  • Recommender systems, large scale machine learning

Advice on building a machine learning system

  • Bias/variance, regularization; deciding what to work on next: evaluation of learning algorithms, learning curves, error analysis, ceiling analysis