Machine Learning Algorithms – Regression Metrics

Regression metrics have proven very useful for assessing the quality of models in projects such as optimizing temporary hiring or identifying key customer satisfaction drivers. Basically, when you try to predict a continuous value – a number, to put it bluntly – you use a regression model. So, let’s dive deeper to understand how we can assess the quality of regression models.

Understanding Errors to build regression metrics

Errors play a major role in calculating the regression metrics used to evaluate a model’s quality.

Once a regression model has been trained, it is run on part of the original dataset to predict outcomes, and these predictions are compared to the actual ones. What we call “errors” are the positive or negative differences between these predictions and the actual values.

[Figure: Errors for Regression Metrics]

Each dark blue dot on the diagonal graph is an actual output linked to the input values while light blue dots are the values predicted by the regression model using the same input values. The difference between these two points is an error. 
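As a minimal sketch of what an error is, the snippet below computes one signed error per observation. The values are made up for illustration, and the sign convention (actual minus predicted) is an assumption; the opposite convention works just as well as long as it is applied consistently.

```python
# Hypothetical actual and predicted values, made up for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]


def errors(actual, predicted):
    """Return one signed error per observation (actual minus predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


# Positive error: the model predicted too low. Negative: too high.
print(errors(actual, predicted))  # [0.5, 0.0, -1.5, -1.0]
```

All the metrics discussed in this article are different ways of summarizing this list of errors into a single number.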

When building a regression model, we are attempting to reduce the error the algorithm makes. To do that, we select a function to measure the error, called a cost function. The regression metrics most commonly used to assess prediction results are:

  • Mean Absolute Error (MAE),
  • Mean Absolute Percentage Error (MAPE),
  • Mean Squared Error (MSE),
  • Root-Mean-Squared-Error (RMSE),
  • Maximum Error (ME),
  • R² or Coefficient of Determination.

Mean Absolute Error

In Regression Metrics, Mean Absolute Error (MAE) is the average of the absolute differences between the actual value and the model’s predicted value.

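$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.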

The bigger the MAE, the larger the typical error. The MAE is expressed in the same unit as the predicted variable, e.g., kilometers if a distance is estimated, kilograms for a weight.

Therefore, the MAE cannot be used to compare the performance of regression models built for targets with different units or scales. Because each error contributes in proportion to its size, the MAE is fairly robust to outliers, i.e., extreme values. Hence, it is not suitable for applications where you want to pay more attention to these outliers.
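The definition of the MAE can be sketched in a few lines of plain Python. The function name and the sample values are illustrative assumptions, not part of any particular library.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, e.g. distances in km: the MAE is then also in km.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

print(mean_absolute_error(actual, predicted))  # 0.75
```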

Mean Absolute Percentage Error

Mean Absolute Percentage Error (MAPE) is the average of the absolute differences between the actual value and the value predicted by the model, each divided by the actual value and expressed as a percentage.

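$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.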

Its usage is comparable to that of the MAE. Since it is a percentage, it allows comparison between regression models designed for diverse categories of data. It is undefined when an actual value is zero and becomes very large when actual values are close to zero. It does not give a specific focus to outliers. However, in some cases, we want to use a cost function that emphasizes outliers.
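A minimal sketch of the MAPE in plain Python follows. The function name and sample values are illustrative assumptions, and raising an error when an actual value is zero is one possible design choice among several.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Dividing by a zero actual value is undefined (assumed handling).
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values, made up for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(round(mean_absolute_percentage_error(actual, predicted), 2))  # 22.74

# Rescaling the data (e.g. km to m) leaves the percentage unchanged,
# up to floating-point rounding.
actual_m = [a * 1000 for a in actual]
predicted_m = [p * 1000 for p in predicted]
print(round(mean_absolute_percentage_error(actual_m, predicted_m), 2))  # 22.74
```

The second call illustrates why a percentage allows comparison across datasets with different units, which the MAE does not.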

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)

MSE or Mean Squared Error is one of the most popular regression metrics. It is simply the average of the squared differences between the actual value and the regression model’s predicted value.

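$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.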

As it squares the differences, it penalizes large errors much more heavily than small ones, so a few outliers can dominate the score and make the model look worse than it is on typical data. Its unit is the square of the variable’s unit.

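The RMSE or Root Mean Squared Error is the square root of the MSE, i.e., the square root of the average squared difference between the actual value and the predicted value. Its use is similar to that of the MSE, but it is expressed in the same unit as the predicted variable, which makes it easier to interpret.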

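$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.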
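The relationship between the two metrics can be sketched as follows in plain Python: the RMSE is obtained by taking the square root of the MSE. Function names and sample values are illustrative assumptions.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the unit of the target variable."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical values, made up for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

print(mean_squared_error(actual, predicted))                 # 0.875
print(round(root_mean_squared_error(actual, predicted), 4))  # 0.9354
```

On this sample the largest error (1.5) contributes 2.25 of the 3.5 total squared error, which shows how squaring lets the biggest misses dominate the score.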

Maximum Error

ME or Maximum Error is the largest absolute difference between a predicted value and the corresponding actual value.

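$$\mathrm{ME} = \max_{1 \le i \le n} \lvert y_i - \hat{y}_i \rvert$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.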

It is useful for quickly spotting how well the model handles outliers. Typically, if the Maximum Error is much bigger than the RMSE, it might mean that the model has not correctly predicted outliers.
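A minimal sketch of the Maximum Error in plain Python is shown below; the function name and sample values are illustrative assumptions.

```python
def max_error(actual, predicted):
    """Largest absolute difference between an actual and a predicted value."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical values, made up for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

print(max_error(actual, predicted))  # 1.5
```

Because this metric depends on a single observation, it is best read alongside an average-based metric such as the RMSE rather than on its own.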

R² or Coefficient of Determination – the king of Regression Metrics

R² or Coefficient of Determination is a widely used metric built from two mean squared error calculations. The first is the mean squared difference between each actual value and the average of the observations. The second is the mean squared error between the actual values and the predicted ones.

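$$R^2 = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the average of the actual values, and $n$ the number of observations.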

R² is one minus the ratio of the model’s MSE to the MSE obtained by always predicting the average. The R² score ranges from -∞ to 1. The closer R² is to 1, the better the regression model is. If R² is equal to 0, the model performs no better than a baseline that always predicts the average of the observations.

If R² is negative, the regression model performs worse than that baseline. R² is therefore an excellent tool to evaluate the efficiency of a regression model.
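The three regimes of R² (close to 1, equal to 0, negative) can be illustrated with a short plain-Python sketch. The function name and the sample values are illustrative assumptions; the 1/n factors of the two mean squared errors cancel, so the code works with sums of squares directly.

```python
def r_squared(actual, predicted):
    """One minus the ratio of the model's squared error to that of the mean baseline."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        # All actual values are identical: the ratio is undefined (assumed handling).
        raise ValueError("R² is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values, made up for illustration.
actual = [3.0, 5.0, 2.5, 7.0]

# A reasonable model: R² is fairly close to 1.
print(round(r_squared(actual, [2.5, 5.0, 4.0, 8.0]), 4))  # 0.7241

# A baseline that always predicts the average of the actual values: R² is 0.
baseline = [sum(actual) / len(actual)] * len(actual)
print(r_squared(actual, baseline))  # 0.0

# A model worse than the baseline: R² is negative.
print(r_squared(actual, [7.0, 2.5, 5.0, 3.0]) < 0)  # True
```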

Regression Metrics In Business

Knowing regression metrics helps you better assess your predictive models’ performance on business questions such as price optimization, marketing budget allocation, or similar topics where continuous values must be predicted and optimized. If these topics ring a bell, feel free to contact us!
