Once the data has been segregated and compiled and the problem has been set up, the next step is to apply the right algorithms and tools to solve it. How do we do that?

Machine learning provides the answers. Before we get them, however, a good deal of time has to be invested in choosing the right algorithm, so that tedious problems are solved effectively and quickly.

More than one metric should be tested, because one metric may show desirable results while another exposes a weakness in the model. Theoretical knowledge of what each metric measures therefore always helps in evaluating the model. A machine will always produce an outcome, and we have no way of knowing whether it is the correct one unless the evaluation points it out. A thoughtful evaluation helps us catch such problems and avoid silly mistakes with grave consequences. Evaluation is therefore a step we cannot do away with. Real-life data is open to interpretation and often needs human judgment, so it is better to remove doubt early than to accept a purely mechanical answer from the machine.

**How to Evaluate Machine Learning Algorithms**

It is always better to follow a set pattern when evaluating a problem so that no important concern is missed. A test harness is a spot-checking setup that gives a fair measure of how well candidate algorithms perform on the data set. It involves selecting the training and test datasets and choosing performance measures that make the results meaningful and insightful.

**Test Harness –** To check the effectiveness of a particular solution, we run it through a test harness, which lets us interpret the results with more insight. Running several algorithms against the same problem makes for a better harness. A quick look at the learnability of the problem indicates whether it is worth carrying the problem and its evaluation further: a learnable problem has structure that the machine can read and pick up. A test harness can also use cross-validation to estimate how well the chosen algorithm works.
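
As a rough illustration of the cross-validation idea, here is a minimal k-fold harness written in plain Python. The function names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the toy data are all assumptions made up for this sketch, not part of any particular library.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the remaining one, and return each fold's accuracy."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(X)) if i not in held_out_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

# Hypothetical baseline model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# Toy data invented for the example: 14 samples of class 0, 6 of class 1.
X = [[i] for i in range(20)]
y = [0] * 14 + [1] * 6

scores = cross_validate(X, y, fit_majority, predict_majority, k=5)
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Any real algorithm can be dropped in by supplying its own `fit` and `predict` functions. The baseline is useful mainly as a floor: a candidate that cannot beat it suggests the problem, as framed, is not being learned.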

**The measure of Problem Solving ability –** The performance measure is how we judge whether an algorithm solves the given problem well. Classification, regression, and clustering are some of the common problem types that produce meaningful results, and each has its own standard measures, so finding a good performance measure is usually not that challenging.

**Sets of Training and Tests –** Instead of using the whole data set for a single purpose, we divide it into a training portion and a test portion. Each sample should be a good representative of your population. The model should not be trained on the entire data set, because part of the data has to be held back to test it.
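
A hold-out split can be sketched in a few lines of plain Python. The function name `train_test_split`, the 80/20 proportion, and the seed are assumptions chosen for illustration; shuffling before the cut is one simple way to keep both portions representative when the rows have no meaningful order.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder data: 100 row identifiers standing in for real records.
rows = list(range(100))
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

For time-ordered data or heavily imbalanced classes, a plain shuffle like this may not be appropriate, and a chronological or stratified split would be worth considering instead.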

**Algorithm Test –** After choosing an algorithm based on various parameters, it is always useful to test it, look at the structure it generates, and see whether the machine is capable of learning from the given dataset. Spot checking is one way to do that.
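
One way to picture spot checking is to run a couple of very different candidates on the same hold-out data and compare their scores. The sketch below is an assumption-laden toy: the two candidates (a majority-class baseline and a 1-nearest-neighbour classifier), the helper names, and the data are all invented for the example.

```python
def fit_majority(X, y):
    """Baseline: remember the most common training label."""
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

def fit_1nn(X, y):
    """1-nearest-neighbour: simply store the training data."""
    return list(zip(X, y))

def predict_1nn(model, x):
    # Return the label of the closest stored point (squared Euclidean distance).
    nearest = min(model, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
    return nearest[1]

def accuracy(fit, predict, X_train, y_train, X_test, y_test):
    model = fit(X_train, y_train)
    hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)

# Toy data (made up): the label is 1 when the single feature exceeds 10.
X = [[float(i)] for i in range(20)]
y = [1 if x[0] > 10 else 0 for x in X]

# Deterministic hold-out: every fourth sample goes to the test set.
X_train = [x for i, x in enumerate(X) if i % 4 != 0]
y_train = [t for i, t in enumerate(y) if i % 4 != 0]
X_test = [x for i, x in enumerate(X) if i % 4 == 0]
y_test = [t for i, t in enumerate(y) if i % 4 == 0]

candidates = {
    "majority": (fit_majority, predict_majority),
    "1-NN": (fit_1nn, predict_1nn),
}
results = {
    name: accuracy(fit, predict, X_train, y_train, X_test, y_test)
    for name, (fit, predict) in candidates.items()
}
print(results)
```

On this toy data the nearest-neighbour candidate clearly beats the baseline, which hints that the data has structure a model can learn. On real data the gap is rarely this clean, so the comparison is a first filter, not a final verdict.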

**Metrics generally used to evaluate models**

**Classification metrics –** In binary classification, every prediction falls into one of four outcomes: ‘true positive’, ‘true negative’, ‘false positive’, or ‘false negative’. Placed in a matrix, the counts of these outcomes form what is commonly known as the confusion matrix, which summarizes the working and performance of the given model.
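
The four outcomes can be counted directly from the true and predicted labels. This is a minimal sketch; the function name `confusion_counts`, the choice of `1` as the positive class, and the sample labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Made-up labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```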

Accuracy, precision, and recall are the three main metrics used to evaluate classification models. When the classes are imbalanced, precision and recall are more useful than accuracy. Combining the two gives the F-score, which in its common F1 form is their harmonic mean. When comparing models with the same number of independent variables, the one with the greater F-score is preferred.

No single metric tells the whole story, however, and a better-suited measure is often available for a given problem. Taken alone, a metric can give us a false sense of hope and an incorrect idea of how accurate the model is.
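
To see why imbalanced data calls for precision and recall, it helps to compute the scores from confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_scores` and the counts are made up for the example.

```python
def classification_scores(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented imbalanced case: 1000 samples, only 10 of them positive.
# The model finds 2 positives and raises 3 false alarms.
scores = classification_scores(tp=2, tn=987, fp=3, fn=8)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Here accuracy comes out close to 0.99 while recall is only 0.2 and F1 is roughly 0.27, so the model looks excellent by one metric and poor by the others.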

**Regression metrics –** These evaluation metrics are very different from those used for classification models. Instead of discrete classes, we are dealing with continuous values in this case. Variance, R-squared (coefficient of determination), adjusted R-squared, mean squared error (MSE), and mean absolute error (MAE) are some of the regression measures applied to the analysis of data. Adjusted R-squared is often preferred over plain R-squared because it penalizes additional predictors, so it reflects the marginal rather than the absolute improvement from adding a variable. MSE is said to be advantageous compared with MAE because its gradient is easy to compute. It also helps to be clear from the beginning about the difference between bias and variance: overfitting and underfitting have to be checked for and ruled out.
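
The regression measures named above can be computed from a list of true values and a list of predictions. This is a minimal sketch using the usual textbook definitions; the function name `regression_metrics`, the `n_features` parameter (the number of predictors, needed for adjusted R-squared), and the sample numbers are assumptions for illustration.

```python
def regression_metrics(y_true, y_pred, n_features):
    """Return MSE, MAE, R-squared and adjusted R-squared."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than predictors plus one")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes each additional predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "mae": mae, "r2": r2, "adj_r2": adj_r2}

# Made-up targets and predictions from a model with one predictor.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 11.0]

metrics = regression_metrics(y_true, y_pred, n_features=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these numbers R-squared is 0.9625 and adjusted R-squared is 0.95. The gap between the two widens as predictors are added without a matching drop in error.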

**The area under Curves –** Binary classification problems again make extensive use of this measure. The area under the ROC curve (AUC) summarizes the trade-off between sensitivity and specificity across decision thresholds. Validation curves help us find the balance point between bias and variance, and learning curves are a second tool for diagnosing the two.
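
One way to make the area under the ROC curve concrete is its ranking interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way and also reports sensitivity and specificity at a single threshold. The function names, the 0.5 threshold, and the scores are assumptions for illustration, and the pairwise loop is meant for small examples only.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))

def sensitivity_specificity(y_true, scores, threshold):
    """True positive rate and true negative rate at one decision threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels and model scores for four samples.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, threshold=0.5)
print(auc, sens, spec)  # 0.75 0.5 1.0
```

Moving the threshold trades sensitivity against specificity, and the AUC summarizes that trade-off in a single number that does not depend on any one threshold.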