Using Large Data Sets

July 29, 2019

A crazy week begins! Lots to catch up on! Time to get to work, fast~!

Data For Machine Learning

Designing a high-accuracy learning system

The experts who studied this topic: Banko and Brill, 2001

E.g., classify between confusable words:
   {to, two, too}, {then, than}
For breakfast I ate _____ eggs.

Algorithms compared:
Perceptron (logistic regression)
Winnow
Memory-based
Naive Bayes
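The confusable-words task can be treated as ordinary supervised classification. The label is the correct member of the confusion set, and the features are the words around the blank. Below is a minimal sketch that uses Naive Bayes, one of the algorithms listed above. The toy corpus, the bag-of-context-words features, and the function names are hypothetical illustrations. This is not the setup Banko and Brill actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (context words around the blank, correct word).
TRAIN = [
    (["i", "ate", "eggs"], "two"),
    (["bought", "apples"], "two"),
    (["went", "the", "store"], "to"),
    (["want", "go", "home"], "to"),
    (["me", "as", "well"], "too"),
    (["is", "hot"], "too"),
]

def train_naive_bayes(examples):
    """Count label frequencies and per-label context-word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        label_counts[label] += 1
        for w in context:
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict(model, context):
    """Return the confusion-set member with the highest log posterior."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in context:
            # Laplace (add-one) smoothing so an unseen word does not zero out the score.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    print(predict(model, ["ate", "eggs"]))  # "For breakfast I ate _____ eggs."
```

More labelled sentences only change the counts, not the code. This is why count-based learners like this one can absorb a larger training set directly.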
“It’s not who has the best algorithm that wins. It’s who has the most data.”
So when is this true, and when is it not?

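Large-data rationale: assume the features x contain enough information to predict y accurately. A useful test: given the input x, can a human expert confidently predict y?

Use a learning algorithm with many parameters (e.g. logistic regression/linear regression with many features; a neural network with many hidden units). This gives low bias, so the training error J_train(θ) will be small.

Use a very large training set, so the model is unlikely to overfit. This gives low variance: J_train(θ) ≈ J_test(θ). Together, the two make J_test(θ) small.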
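A small simulation can check the two conditions above: a high-capacity model plus a very large training set. The sketch makes three assumptions.

- The problem is a hypothetical two-feature one in which y is fully determined by x, so it passes the human-expert test.
- A 1-nearest-neighbour classifier (a memory-based learner) stands in for the high-capacity model. It stores every training example, so its training error is zero, which is low bias.
- Low variance is checked by whether test error drops toward that as the training set size m grows.

The data-generating rule and all function names are illustrative assumptions.

```python
import random

def target(x1, x2):
    # Hypothetical ground truth: y is fully determined by x, so someone who
    # knew this rule could predict y with confidence.
    return 1 if x1 + x2 > 1.0 else 0

def make_data(m, rng):
    pts = [(rng.random(), rng.random()) for _ in range(m)]
    return [(p, target(*p)) for p in pts]

def predict_1nn(train, x):
    # Memory-based learner: return the label of the closest stored example.
    best = min(train, key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2)
    return best[1]

def error_rate(train, data):
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)

def learning_curve(sizes, n_test=300, seed=0):
    """Return (m, J_train, J_test) for each training-set size m."""
    rng = random.Random(seed)
    test = make_data(n_test, rng)
    results = []
    for m in sizes:
        train = make_data(m, rng)
        results.append((m, error_rate(train, train), error_rate(train, test)))
    return results

if __name__ == "__main__":
    for m, j_train, j_test in learning_curve([10, 100, 1000]):
        print(f"m={m:5d}  J_train={j_train:.3f}  J_test={j_test:.3f}")
```

J_train stays at zero for every m because each training point is its own nearest neighbour. J_test is the number that should shrink toward it as m increases.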


A large training set is unlikely to help when:

The features x do not contain enough information to predict y accurately (such as predicting a house’s price from only its size), and we are using a simple learning algorithm such as logistic regression.
The features x do not contain enough information to predict y accurately (such as predicting a house’s price from only its size), even if we are using a neural network with a large number of hidden units.