Continuing from the previous post, Introduction-to-machine-learning-1, we now examine how Machine Learning Algorithms are trained.
Training is done with Training Samples (Training Sets); the trained Algorithm is then evaluated with Testing Samples (Testing Sets).
Consider the following (fictitious) data:
| City | Temperature | Ice Cream Price |
| --- | --- | --- |
| Varanasi | 40°C | ₹ 100 |
| Varanasi | 50°C | ₹ 200 |
| Varanasi | 46°C | ₹ 200 |
| Varanasi | 44°C | ₹ 100 |
The Algorithm will quickly learn that a temperature of 44°C or less means an Ice Cream price of ₹ 100, and a temperature of 46°C or more means ₹ 200.
We can create Testing Sets and validate the algorithm.
Underfitting happens when the Training Set is inadequate and the Algorithm has no proper answer for certain situations.
What happens if our Algorithm is asked the Ice Cream price at 45°C? It has no proper answer, but it can make an educated guess: the average of ₹ 100 and ₹ 200 would be reasonable. But what answer should it give for 10°C?
The Algorithm simply isn't trained for that. This is high bias and low variance.
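The rule learnt from the ice-cream data, including the educated guess for the 45°C gap, can be sketched as follows (a minimal illustration; the function name is hypothetical, not from the post):

```python
# A sketch of the rule the Algorithm might learn from the Training Set above.

def predict_price(temp_c):
    """Predict ice cream price (₹) from temperature (°C)."""
    if temp_c <= 44:          # every training row at or below 44°C cost ₹ 100
        return 100
    if temp_c >= 46:          # every training row at or above 46°C cost ₹ 200
        return 200
    # 45°C falls in the gap between the two learned regions:
    # an educated guess is the average of the neighbouring prices.
    return (100 + 200) / 2

print(predict_price(40))   # 100
print(predict_price(45))   # 150.0
print(predict_price(50))   # 200
```

Note that for 10°C this rule still answers ₹ 100, which is exactly the underfitting problem: the Training Set says nothing about cold days.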
The other case is Overfitting.
Consider the following Training Set:
1. Who is the highest scorer in Maths?
A. Pappu 100, B. Appu 34, C. Tappu 76
2. Who is the lowest scorer in Physics?
A. Appu 76, B. Tappu 86, C. Pappu 33
3. Who is the highest scorer in Chemistry?
A. Tappu 23, B. Pappu 79, C. Appu 10
The answer in each of these cases is Pappu.
So, how does the Algorithm answer the following question?
4. Who is the highest scorer in Geography?
A. Tappu 30, B. Pappu 25, C. Appu 77
The Algorithm answers Pappu, even though Appu scored 77: it has mistakenly learnt a link with the name.
This is Overfitting. One of the solutions is cross validation: divide the Training Set into parts, train on one part, then validate with the others. A data set with different names would uncover the problem in this case.
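The splitting behind cross validation can be sketched in a few lines (a minimal k-fold sketch in pure Python; the helper name is hypothetical):

```python
# k-fold cross validation: split the Training Set into k parts, hold one
# part out for validation, train on the rest, and rotate so every part
# gets validated exactly once.

def k_fold_splits(data, k):
    """Yield (train, validation) pairs for k-fold cross validation."""
    fold_size = len(data) // k
    for i in range(k):
        validation = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, validation

samples = list(range(9))           # stand-in for 9 training examples
for train, validation in k_fold_splits(samples, 3):
    print(train, validation)
```

An Algorithm that scores well on its own fold but badly on the held-out folds is overfitting.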
Regularization is another solution.
It essentially means simplification: remove the names here, divide each mark by 100, and reduce everything to values between 0 and 1.
In regression, regularization constrains/shrinks the coefficient estimates towards zero. In other words, the technique discourages learning an overly complex or flexible model, so as to avoid the risk of overfitting.
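The shrinking effect can be shown with the simplest possible case: ridge regression with one feature and no intercept (an illustrative sketch, not from the post; the data and function name are made up):

```python
# One-feature ridge regression, closed form: w = Σxy / (Σx² + λ).
# The penalty λ shrinks the coefficient towards zero; λ = 0 recovers
# ordinary least squares.

def ridge_coefficient(xs, ys, lam):
    """Ridge estimate of w for the model y ≈ w·x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                      # y = 2x exactly
print(ridge_coefficient(xs, ys, 0.0))     # 2.0  (ordinary least squares)
print(ridge_coefficient(xs, ys, 14.0))    # 1.0  (shrunk towards zero)
```

The larger the penalty, the simpler (flatter) the fitted model: that is the regularization trade-off.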
Feature Selection and Dimensionality Reduction
Essentially, this means reducing the number of features being put into the Training Set.
Think about this (fictitious) situation where we try to pick a laptop.
| Model Name | Manufacturer | Price | Installed Memory | Color | Warranty |
| --- | --- | --- | --- | --- | --- |
| MMX-1 | Menovo | ₹ 100 | 5GB | Black | 1 year |
| … | … | … | … | … | … |
We will need to make decisions for all these features. Say you choose between 2 Manufacturers, then between 2 prices: that is already 2 × 2 = 4 combinations. For a product with n such binary features, there are 2^n combinations to consider.
We can reduce the number of features under consideration, i.e. reduce the dimensionality, to achieve a faster, better-tuned Algorithm.
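One common feature-selection heuristic can be sketched as follows (an illustrative example with made-up laptop data; the function names are hypothetical): drop the columns whose values barely vary, since they carry little information.

```python
# Variance-based feature selection: a feature that is the same for every
# sample (e.g. every laptop has a 1-year warranty) cannot help a decision.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_features(rows, names, threshold):
    """Keep only the columns whose variance exceeds the threshold."""
    keep = [i for i in range(len(names))
            if variance([row[i] for row in rows]) > threshold]
    return ([names[i] for i in keep],
            [[row[i] for i in keep] for row in rows])

# Hypothetical numeric features: price, memory (GB), warranty (years).
names = ["price", "memory_gb", "warranty_years"]
rows = [[100, 5, 1],
        [200, 8, 1],
        [150, 16, 1]]    # warranty never varies

kept, reduced = select_features(rows, names, threshold=0.0)
print(kept)      # ['price', 'memory_gb']
```

The warranty column is dropped, and the Algorithm now works in two dimensions instead of three.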
So, how would you go about implementing a Machine Learning Solution?
- Define the Machine Learning Problem to be solved. Let us say we run an e-commerce company, and the question is: how many items, of how many types, should we stock?
- Initial Information Input. Get initial information from an expert, or go exploring: run market surveys etc. and gather some data.
- Get Data from the Information. Process the information: rectify, classify, discover features and dimensions, and create Training Sets and Testing Sets.
- Machine Learning Modeling. Create a Machine Learning Algorithm and train it.
- Machine Learning Algorithm Testing. Test the Algorithm, go back to previous steps if necessary, and test again.
- Deploy the solution.
Some more steps.
- Supply Missing Values
How do we deal with missing values? Say there are no rainfall values for a city for some duration. You can get the data from another source and rectify the gap. If not, try to deduce it from the available data: take the mean of the neighbouring values, perhaps.
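The mean-of-neighbours idea can be sketched like this (an illustrative example with made-up rainfall readings; the function name is hypothetical):

```python
# Mean imputation: fill a missing reading with the mean of its nearest
# known neighbours in the sequence.

def impute_missing(readings):
    """Replace None entries with the mean of the adjacent known values."""
    filled = list(readings)
    for i, value in enumerate(filled):
        if value is None:
            neighbours = [v for v in (filled[i - 1] if i > 0 else None,
                                      readings[i + 1] if i + 1 < len(readings) else None)
                          if v is not None]
            filled[i] = sum(neighbours) / len(neighbours)
    return filled

rainfall_mm = [12.0, None, 18.0, 20.0]    # one missing daily reading
print(impute_missing(rainfall_mm))         # [12.0, 15.0, 18.0, 20.0]
```

This is only one strategy; using a column-wide mean, or data from another source, may suit other situations better.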
- Encoding of Labels
Go back to our example with Pappu, Tappu and the others: use roll numbers instead of names.
Another type of encoding is the following.
Again the data is fictitious here, and the passing mark is 40.
| Roll No | Physics | Chemistry | Maths |
| --- | --- | --- | --- |
| 1 | 45 | 34 | 77 |
| 2 | 33 | 76 | 98 |
The first row can then be encoded as 1101: roll number 1, followed by a pass (1) / fail (0) flag for each subject.
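A minimal sketch of this encoding, assuming the 1101 string is the roll number followed by one pass/fail bit per subject (the helper name is hypothetical):

```python
# Encode each row as: roll number, then pass(1)/fail(0) per subject.

PASS_MARK = 40

def encode_row(roll_no, marks):
    """Turn a marks row into a compact roll-number + pass/fail string."""
    bits = "".join("1" if m >= PASS_MARK else "0" for m in marks)
    return f"{roll_no}{bits}"

print(encode_row(1, [45, 34, 77]))   # '1101'
print(encode_row(2, [33, 76, 98]))   # '2011'
```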
Scale all values into a single range: percentages, or values between 0 and 1.
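Scaling to the 0-to-1 range is usually done with min-max scaling, which can be sketched as (an illustrative example; the function name is hypothetical):

```python
# Min-max scaling: map the smallest value to 0, the largest to 1, and
# everything else linearly in between.

def min_max_scale(values):
    """Scale values so min maps to 0.0 and max maps to 1.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

marks = [33, 45, 76, 98]
print(min_max_scale(marks))   # first value 0.0, last value 1.0
```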
- Specializing or Partitioning
Go back to our e-commerce example. We might need different Machine Learning Algorithms for different geographical areas or for different products.
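The partitioning idea amounts to keeping one specialised model per region and routing each query to the right one; a minimal sketch with hypothetical stand-in models (all names are made up for illustration):

```python
# Partitioning: a separate specialised model per region, plus a dispatcher.

def north_model(product):
    return f"north stock plan for {product}"

def south_model(product):
    return f"south stock plan for {product}"

MODELS = {"north": north_model, "south": south_model}

def predict(region, product):
    """Route the query to the model trained for this region."""
    return MODELS[region](product)

print(predict("north", "umbrella"))   # uses the north region's model
```

In a real system each entry would be a trained Algorithm rather than a stub, but the routing structure is the same.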