Models of equipment failures are based on data referencing observations of past machine runs and failures. We can use Machine Learning approaches to model current situations so that we can predict and anticipate machine failures and schedule maintenance preemptively.
Problems to solve
How to detect when a machine is going to break?
How to anticipate maintenance and prevent downtime?
How to move from preventive to predictive maintenance?
Can machine learning help, and how accurately can predictive models anticipate failures?
Benefits of TADA
Manufacturing, Maintenance and Operation Managers could benefit from predictive models, but they are not data scientists and may have neither the machine learning skills nor the coding experience to build them. Even if the data these professionals handle could be considered Big Data (sensor data, for instance), the events they want to predict, and failures in particular, are Small Data.
In this case, historical data contains at most a few hundred failures, rarely thousands or millions (as in Big Data). Traditional machine learning tools work well with Big Data but perform poorly when predicting Small Data within Big Data (an unbalanced dataset).
MyDataModels allows domain experts to automatically build predictive models from Small Data. They can use their raw data directly: no normalization, outlier handling or feature engineering is required. Thanks to this limited data preparation, the results for this specific dataset were obtained with a few clicks in less than two minutes on a standard laptop.
MyDataModels brings a self-service solution for those who have Small Data and no data scientists.
Manufacturers are constantly under pressure to stay competitive by optimizing processes, improving efficiency of aging infrastructure, reducing unplanned downtime, sudden failures and maintenance costs.
A CXP Group study found that 95% of companies describe their current maintenance processes as not very efficient. As of now, production managers and machine operators operate on scheduled maintenance to prevent downtime. Unfortunately, 50% of these preventive maintenance activities are ineffective.
In this failure detection use case, the predictive model built with MyDataModels reaches an accuracy of about 96%.
By using an automated machine learning solution like TADA, companies can proactively identify problems by running a root cause analysis and push fixes, including spare parts, software, hardware and firmware, to eliminate possible points of failure or degraded performance that end users could experience. This ultimately increases customer satisfaction and competitive advantage.
Automated Machine Learning solutions use historical data to predict future outcomes. To predict a future result, you must provide your descriptive data together with the past results that were observed.
TADA allows you to simply create a relevant predictive model from your data and apply it to future data.
In this case, the descriptive data describe each machine's current operating state.
The goal is to predict whether a machine is broken or not, which makes it a binary task (yes/no).
To generate a model, follow these steps:
- Create your project and load your data as a CSV table (with data in rows and variables in columns).
- Select the variable you want to predict, called Goal. In this case, the Goal is the variable "Target" (a visualization of the variable is provided).
- Select your data for the model generation. This step is called "Creating the Variable set" and allows you to manually select the descriptive variables you want to use. By default, they are all selected. TADA identifies the relevant descriptive variables by itself, but the number of variables affects the calculation time required to create the model: the fewer variables selected, the faster the model creation.
- Create your model. At creation, default values are proposed for the model name, Population and Iteration. You only need to validate the default values to start model generation. ‘Best practices’ are available to guide you in the choice of these parameters. Depending on the size of the file, this step can take between a few seconds and ten minutes.
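For readers who want to see the first steps in code, here is a minimal Python sketch of what "load a CSV table, pick the Goal, build the Variable set" amounts to. It is not how TADA is implemented. The column names (`machine_nbr`, `lifetime`, `team`, `provider`, `broken`), the sample rows and the helper names `load_table` and `make_variable_set` are all assumptions made for illustration.

```python
import csv
import io

# Illustrative rows only: column names and values are assumptions,
# not an extract of the real dataset.
SAMPLE_CSV = """machine_nbr,lifetime,team,provider,broken
1,56,TeamA,Provider4,yes
2,81,TeamC,Provider4,yes
3,60,TeamA,Provider1,no
4,12,TeamB,Provider2,no
"""


def load_table(text):
    """Parse CSV text into a list of dicts: data in rows, variables in columns."""
    return list(csv.DictReader(io.StringIO(text)))


def make_variable_set(rows, goal, exclude=()):
    """Return the descriptive variables: every column except the Goal
    and any column the user chose to leave out."""
    columns = list(rows[0].keys())
    if goal not in columns:
        raise ValueError(f"Goal column {goal!r} not found")
    return [c for c in columns if c != goal and c not in exclude]


rows = load_table(SAMPLE_CSV)
goal = "broken"  # hypothetical name for the column to predict

# Default behaviour described above: all descriptive variables are selected.
variables = make_variable_set(rows, goal)

# Manual selection: drop an identifier column to shrink the Variable set.
reduced = make_variable_set(rows, goal, exclude=("machine_nbr",))
```

Dropping an identifier such as the machine number is only an example of a manual selection; which variables to keep remains a judgment for the domain expert.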
Once the model is created, you can see the results of the model using metrics and charts so you can judge its relevance.
To apply a model that you think is relevant, you can:
- Retrieve the associated mathematical formula and apply it (for instance in Excel).
- Retrieve the source code of the formula and use it yourself (available only with paid TADA plans). The source code is available in R, Java, C++ and soon Python.
- Use the "Predict" feature in the product: upload the file containing the data to be predicted, and you get back a downloadable file containing that data together with the calculated predictions.
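Applying an exported formula to new data follows the same pattern as the "Predict" feature: read a file, compute one prediction per row, and write the rows back with an extra column. The sketch below assumes Python and uses a made-up rule, `predict_broken`, purely as a stand-in; it is not a formula produced by TADA, and the column names and threshold are hypothetical.

```python
import csv
import io


def predict_broken(row):
    """Hypothetical stand-in for an exported formula.
    The threshold of 70 weeks is invented for illustration."""
    return "yes" if float(row["lifetime"]) > 70 else "no"


def add_predictions(csv_text, predict, column="prediction"):
    """Return the input CSV with one extra column holding the prediction."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + [column], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = predict(row)
        writer.writerow(row)
    return out.getvalue()


# Illustrative new data (assumed columns, invented values).
NEW_DATA = "machine_nbr,lifetime\n1001,35\n1002,88\n"
result = add_predictions(NEW_DATA, predict_broken)
```

In practice `predict_broken` would be replaced by the formula or source code retrieved from the product, and the input would be read from a file rather than a string.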
The screenshot below shows an extract of the dataset.
Each row is a machine and each column is a variable.
This dataset comes from a company that uses many machines to build final products. As production is stopped every time a machine has a failure, management would like to create a predictive model that finds which machine is going to fail next.
As we explored the data, we understood that the company is using 1,000 machines. On average, these machines have a failure every 55 weeks. Some of these machines are brand new, others have been running for almost two years. In our dataset, almost 40% of the machines had a failure in the past two years.
Task type: Binary Classification
Number of columns: 9
Number of rows: 1,000 samples
Target variable: (broken) Machine actually broken? yes/no.
Weight: Positive class (broken) 40%, Negative class: 60%
The variables are:
- Machine nbr: from 1 to 1000
- “lifetime”: the number of weeks the machine has been in use
- 3 numeric variables
- 2 variables related to the team using the machine and to the machine’s provider
- “broken”, which is our Goal (Yes or No)
Results
The results of the model are available following the generation of the model.
They present the performance of the predictive model.
The type of predictive model and the measurement indicators of the associated model are related to the Goal (Variable to be predicted) and the values of this variable.
The type of model you make is shown on the model results display.
According to the type of the Goal (in our case, the Goal is "Target"), we can make three types of predictions:
- Binary classification: Discrete value taking only two values (yes / no for instance)
- Multiclass classification: Discrete value taking more than two values (for instance a status of state with values like: On, Risk of breakdown, Down, etc.)
- Regression: Continuous value that can take an infinite number of values (a temperature, a pressure, a turnover, the price of a house, etc.)
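To make the distinction concrete, the sketch below guesses the task type from the values of the Goal column. The rule is a simple heuristic written for this article, assuming Python; the source does not describe how TADA itself decides, and the function names are hypothetical.

```python
def is_number(value):
    """True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_task_type(goal_values):
    """Heuristic only: two distinct values means binary classification,
    more than two numeric values means regression,
    more than two non-numeric values means multiclass classification."""
    distinct = set(goal_values)
    if len(distinct) < 2:
        raise ValueError("the Goal needs at least two distinct values")
    if len(distinct) == 2:
        return "binary classification"
    if all(is_number(v) for v in distinct):
        return "regression"
    return "multiclass classification"


machine_state = guess_task_type(["yes", "no", "no", "yes"])
status = guess_task_type(["On", "Risk of breakdown", "Down"])
temperature = guess_task_type(["20.5", "21.0", "19.8"])
```

A heuristic like this misreads numeric class codes (for example 1, 2, 3) as regression, which is one reason the type shown on the model results display should be checked.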
At the generation of the model and according to the practices and state of the art of Machine Learning, your dataset will be divided into three parts by TADA:
- A training part, which represents 40% of your dataset and is used to train a number of candidate formulas,
- A validation part, which represents 30% of your dataset and is used to validate and select the best formulas found in the previous step,
- A test part, which represents the last 30% of your dataset and is used to test the formulas selected in the preceding step. The performance measurement and evaluation of your model should mainly be done on this partition (standard Machine Learning practice), because its data were not used in the training and validation phases and serve only to measure the model's performance.
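The 40/30/30 division can be sketched in a few lines of Python. This is an illustration under stated assumptions: the rows are shuffled with a fixed seed before cutting, and no stratification by class is applied. The source does not say whether TADA shuffles or stratifies, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(rows, train=0.4, validation=0.3, seed=0):
    """Shuffle the rows, then cut them into training, validation and test parts.
    The test part receives whatever remains (30% with the defaults)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


# One entry per machine, standing in for the 1,000 rows of the dataset.
machines = list(range(1, 1001))
train_part, validation_part, test_part = split_dataset(machines)
```

With 1,000 rows this yields 400 training rows, 300 validation rows and 300 test rows, which matches the 300 predictions counted in the confusion matrix.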
ACC (Accuracy) is the overall accuracy rate of the model, i.e. the percentage of predictions that are correct (here, 95.67% of the predictions are correct).
TPR (True Positive Rate) is the share of actual positive cases, i.e. the "Yes" class, that the model predicts correctly.
TNR (True Negative Rate) is the share of actual negative cases, i.e. the "No" class, that the model predicts correctly.
MCC (Matthews Correlation Coefficient) summarizes the quality of the predictions across both classes in a single value between -1 and 1, where 1 means every prediction is correct. It remains informative when the two classes are unbalanced.
The confusion matrix gives a visual way of interpreting these metrics.
In this case, TADA predicted 119 times that a machine was broken, and was wrong 5 times (TADA put a flag on 5 correctly running machines).
In parallel, TADA predicted 181 times that a machine was not broken and was wrong 8 times (TADA missed 8 machine failures).
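The four metrics can be recomputed from these counts. The Python sketch below uses the standard textbook definitions of accuracy, TPR, TNR and MCC; the variable names are ours, and the values are derived from the counts quoted above rather than copied from the product display.

```python
import math

# Counts taken from the confusion matrix described above.
tp = 119 - 5   # predicted broken and actually broken
fp = 5         # predicted broken but running correctly
tn = 181 - 8   # predicted not broken and actually not broken
fn = 8         # predicted not broken but actually broken (missed failures)

total = tp + fp + tn + fn

acc = (tp + tn) / total   # share of all predictions that are correct
tpr = tp / (tp + fn)      # share of actual failures that were detected
tnr = tn / (tn + fp)      # share of healthy machines correctly left alone
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"ACC={acc:.2%}  TPR={tpr:.2%}  TNR={tnr:.2%}  MCC={mcc:.3f}")
```

The accuracy comes out at 287 correct predictions out of 300, i.e. the 95.67% reported above. From the same counts, TPR is about 93.4%, TNR about 97.2% and MCC about 0.91.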
Ready to use TADA?
You don't have immediate data?
No problem, data are available to make your trial as relevant as possible! Try it now!