Do the right things, don't do the wrong things, and do them seriously.

One seafood a day :fish: ~~~~

Prioritizing What to Work On : article

Machine Learning System Design

  • Building a spam classifier as an example:

    • Supervised learning.

       x = features of email.
       y = spam(1) or not spam(0)
       Features x: Choose 100 words indicative of spam/not spam
            
       # Note: 
       In practice, take the n most frequently occurring words (10,000 to 50,000) 
       in the training set, rather than manually picking 100 words.
      
       ex:
          deal, buy, discount, now, andrew, ...
      
               | 0 | andrew
               | 1 | buy
               | 1 | deal
          x =  | 0 | discount
               | . | .
               | . | .
               | 1 | now
               | . | .
               | . | .          ,  xj = { 1 if word j appears in email
                                        { 0 otherwise
        --------------------------------------------
         From: cheeapsales@buystufffromme.com
         To: ang@cs.stanford.edu
      
         Deal of the week! Buy now!
      
    • Improving the accuracy of this classifier

      • Collect lots of data
      • Develop sophisticated features (ex: using email header data in spam emails)
      • Develop algorithms to process your input in different ways (ex: recognizing misspellings in spam)

      It is difficult to tell which of these options will be most helpful.
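
The word-presence features described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the helper names `tokenize`, `build_vocabulary`, and `featurize` are hypothetical, and the regex tokenizer is an assumption (a real spam filter would likely handle punctuation, headers, and misspellings more carefully).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails, n):
    """Return the n most frequently occurring words in the training emails, sorted."""
    counts = Counter()
    for email in emails:
        counts.update(tokenize(email))
    return sorted(word for word, _ in counts.most_common(n))


def featurize(email, vocabulary):
    """Build x, where x_j = 1 if vocabulary word j appears in the email, else 0."""
    words = set(tokenize(email))
    return [1 if word in words else 0 for word in vocabulary]


# The hand-picked vocabulary and the email body from the example above.
vocabulary = ["andrew", "buy", "deal", "discount", "now"]
x = featurize("Deal of the week! Buy now!", vocabulary)
print(x)  # [0, 1, 1, 0, 1]
```

In practice `build_vocabulary(training_emails, n)` would replace the hand-picked list, with n somewhere in the 10,000 to 50,000 range mentioned in the note.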

Error Analysis : article

Recommended approach

  • Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
  • Plot learning curves to decide if more data, more features, etc. are likely to help.
  • Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See if you spot any systematic trend in what type of examples it is making errors on.
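
The error analysis step can be supported by a small tally once each misclassified cross-validation example has been inspected and labeled by hand. The sketch below is only an illustration: the function name `tally_errors`, the tuple layout, and the category names are hypothetical, and the data is made up.

```python
from collections import Counter


def tally_errors(examples):
    """Count misclassified examples per hand-assigned category.

    Each example is a (label, predicted, category) tuple, where category
    is a tag assigned during manual inspection.
    """
    return Counter(
        category
        for label, predicted, category in examples
        if label != predicted
    )


# Made-up cross-validation results with hypothetical categories.
cv_examples = [
    (1, 0, "pharma"),
    (1, 1, "pharma"),
    (1, 0, "phishing"),
    (1, 0, "pharma"),
    (0, 1, "other"),
    (0, 0, "other"),
]

for category, count in tally_errors(cv_examples).most_common():
    print(f"{category}: {count} errors")
```

A category that dominates the tally is a hint about where new features or preprocessing might pay off first.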

VERY IMPORTANT: get error results as a single numerical value. Otherwise it is difficult to assess your algorithm's performance.
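
As one possible single-number metric, the cross-validation error rate (fraction of misclassified examples) makes it easy to compare two versions of a classifier. The sketch below assumes binary labels; the function name `error_rate` and the prediction lists are made up for illustration.

```python
def error_rate(labels, predictions):
    """Return the fraction of examples where the prediction differs from the label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


# Made-up cross-validation labels and predictions from two hypothetical variants.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
baseline_predictions = [1, 0, 0, 1, 0, 1, 1, 0]
variant_predictions = [1, 0, 1, 1, 0, 1, 1, 0]

print(f"baseline: {error_rate(labels, baseline_predictions):.3f}")  # baseline: 0.250
print(f"variant:  {error_rate(labels, variant_predictions):.3f}")   # variant:  0.125
```

With one number per variant, deciding whether a change helped becomes a direct comparison instead of a judgment call.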