Why use artificial intelligence (AI) in your embedded systems?
The idea is gaining ground in all sectors, from aircraft to drones, cars, security equipment, robots, industrial sensors and biomedical monitors. In systems where human life is at stake, every millisecond counts. Processing the data close to the sensors that produce it avoids network latency and improves responsiveness and safety.
Problems to solve
- How can embedded systems benefit from machine learning?
- How can IoT devices make predictions for their applications? For instance, how can you predict remotely whether a person is at risk of cardiovascular disease?
- Can machine learning help in this matter? How accurately can predictive models detect such threats? How easily can these models be embedded in a system?
Benefits of TADA
Domain and data experts are not data scientists. They may not have the machine learning or coding skills required to build predictive models. Moreover, most of the data handled by these professionals is Small Data: their historical data often covers a few hundred or a few thousand patients, but rarely millions (aka Big Data). Traditional machine learning tools work well with Big Data but do not perform well with Small Data.
MyDataModels allows domain experts to automatically build predictive models from Small Data. They can use their raw data directly: no normalization, outlier handling or feature engineering is required. Thanks to this limited data preparation, the results for the dataset presented here were obtained with a few clicks in less than a minute on a standard laptop.
Thanks to its light weight (2 KB), a TADA model can be embedded into device microcontrollers.
Therefore, to use a model, a device does not need to send its data to the cloud for computation: it can run the model locally, in the edge environment.
Edge computing offers lower latency, reduced costs and greater reliability than a more traditional cloud computing approach using APIs.
By processing the data close to the sensors that produce it, network latency times are avoided, with important gains in responsiveness and safety.
Keeping data on the device also eases data security concerns: in applications such as predictive maintenance and healthcare, manufacturers may be unwilling to put their production data in the cloud for analysis.
MyDataModels brings a self-service solution for those who have Small Data and no data scientists.
Now, the medical and healthcare world can combine the use of IoT and machine learning to detect health problems more accurately and faster than ever.
This symbiotic use makes it possible to train the model in the cloud or in a datacenter while using it in the embedded system.
The main benefit of this mode of operation is that you can take advantage of all the computing power of the cloud and advanced machine learning techniques to build models, while still being able to execute them on the remote device. Making a decision, monitoring or raising an alert locally on an embedded device within seconds may avoid serious health complications or even save lives.
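To make the "train in the cloud, run on the device" idea concrete, here is a minimal sketch of an on-device monitoring loop. Everything in it is an assumption made for illustration: the `Reading` fields, the `toy_model` formula and its coefficients are invented and are not a TADA export.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Reading:
    """One sensor sample; both fields are hypothetical examples."""
    heart_rate: float      # beats per minute
    breathing_rate: float  # breaths per minute


def toy_model(reading: Reading) -> bool:
    """Stand-in for a model trained elsewhere and exported as a formula.

    The coefficients and threshold are invented for illustration only.
    """
    score = 0.02 * reading.heart_rate + 0.05 * reading.breathing_rate - 3.0
    return score > 0.0


def monitor(readings: Iterable[Reading],
            model: Callable[[Reading], bool]) -> List[int]:
    """Evaluate the model locally and return the indices that raise an alert."""
    alerts = []
    for index, reading in enumerate(readings):
        if model(reading):  # the decision is made on-device, no network call
            alerts.append(index)
    return alerts


samples = [Reading(70, 14), Reading(130, 28), Reading(85, 18)]
alert_indices = monitor(samples, toy_model)
print(alert_indices)
```

Because the model is a plain function of the sensor values, the loop keeps working even when the device has no connectivity.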
Automated Machine Learning solutions predict future outcomes from historical data. To predict a future result, you must provide your descriptive data together with the past results that were observed.
TADA allows you to simply create a relevant predictive model from your data and apply it to future data.
In this case, the descriptive data is patient information.
The goal for this dataset is to predict whether or not a patient has a cardiovascular problem: it is a binary task (True/False).
To generate a model, the steps are the following:
- Create your project and load your data as a CSV table (with data in rows and variables in columns).
- Select the variable you want to predict, called the Goal. In this case, the Goal is the variable "Class Pathology" (a visualization of the variable is provided).
- Select your data for the model generation. This step is called "Creating the Variable set" and allows you to manually select the descriptive variables you want to use. By default, they are all selected. TADA identifies the relevant descriptive variables by itself, but the number of variables selected affects the calculation time required to create the model: the fewer variables selected, the faster the model creation.
- Create your model. At creation, default values are proposed for the model name, Population and Iteration. You only need to validate the default values to start model generation. 'Best practices' are at your disposal to guide you in the choice of these parameters. Depending on the size of the file, this step can take between a few seconds and ten minutes.
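For readers who prefer to see the data side of these steps, the sketch below mimics the first three of them outside the product: loading a CSV table, naming the Goal, and building a variable set. It is not TADA code. The inline data, the column names other than "Class Pathology", and the `excluded` parameter are assumptions made for illustration.

```python
import csv
import io

# Tiny inline CSV standing in for an uploaded file. Column names other than
# "Class Pathology" and all values are made up for illustration.
RAW_CSV = """Item,Class Pathology,heart_rate,breathing_rate
1,F,72,15
2,T,128,27
3,F,80,16
"""

GOAL = "Class Pathology"


def load_table(text):
    """Parse CSV text into a list of row dictionaries (one per data row)."""
    return list(csv.DictReader(io.StringIO(text)))


def build_variable_set(rows, goal, excluded=()):
    """Return every column except the goal and any explicitly excluded ones."""
    columns = rows[0].keys()
    return [c for c in columns if c != goal and c not in excluded]


rows = load_table(RAW_CSV)
# Dropping the row identifier is a choice made in this sketch only.
variables = build_variable_set(rows, GOAL, excluded=("Item",))
goal_values = [row[GOAL] for row in rows]
print(variables)
print(goal_values)
```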
Once the model is created, you can see the results of the model using metrics and charts so you can judge its relevance.
To apply a model that you think is relevant, you can:
- Retrieve the associated mathematical formula and apply it (for instance in Excel).
- Retrieve the source code of the formula and use it yourself (available only with paid TADA offers). The source code is available in R, Java, C++ and soon Python.
- Use the "Predict" feature in the product: upload a file containing the data to be predicted, and you get back a downloadable file containing that data along with the calculated predictions.
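As a rough illustration of what applying a formula to a file of new data looks like, the sketch below reads CSV rows, evaluates a formula on each one and writes the rows back with an extra prediction column. The `exported_formula` function, its coefficients and the column names are hypothetical placeholders, not an actual TADA export.

```python
import csv
import io

# Hypothetical input file with the data to be predicted.
INPUT_CSV = """heart_rate,breathing_rate
72,15
128,27
"""


def exported_formula(heart_rate, breathing_rate):
    """Placeholder for a model formula; the coefficients are invented."""
    return 0.02 * heart_rate + 0.05 * breathing_rate - 3.0 > 0.0


def predict_file(text):
    """Return CSV text containing the input rows plus a 'prediction' column."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames + ["prediction"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        is_positive = exported_formula(
            float(row["heart_rate"]), float(row["breathing_rate"])
        )
        row["prediction"] = "T" if is_positive else "F"
        writer.writerow(row)
    return out.getvalue()


result = predict_file(INPUT_CSV)
print(result)
```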
The screenshot below shows an extract of the dataset.
Each row is an electrocardiogram signal (ECG) coming from a patient, and each column is a variable which can be used in the model.
- Task type: Binary Classification
- Number of columns: 25
- Number of rows: 240
- Target: class Pathology (T, F)
- Weight: Positive class (T = True Pathology) 17%, Negative class (F = False Pathology) 83%
The Variables are:
1) Item #, from 1 to 240
2) Pathology: target variable, F = False = no pathology detected, T = True = pathology detected
3) to 23) Alarms from electrocardiogram (ECG) signal, heart and breathing rates, temperature, walking speed, etc.
Given the class imbalance in the target, accuracy alone is a poor way to evaluate the model; a better way to assess its performance is to look at the sensitivity (True Positive Rate, TPR).
Since the goal “Pathology” is the positive class, the sensitivity can be seen as the probability that the test is positive given that the patient is sick.
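A short sketch shows why accuracy is misleading here. It assumes 41 positive and 199 negative rows, which is a rounding of the 17% / 83% split over 240 rows, and it scores a trivial baseline that always predicts "no pathology". The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sensitivity(y_true, y_pred):
    """True Positive Rate: share of actual positives predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t]
    return sum(predictions_for_positives) / len(predictions_for_positives)


# Assumed counts: 17% of 240 rows rounded to 41 positives, 199 negatives.
y_true = [True] * 41 + [False] * 199
always_negative = [False] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)
baseline_sensitivity = sensitivity(y_true, always_negative)
print(f"accuracy={baseline_accuracy:.3f}, sensitivity={baseline_sensitivity:.3f}")
```

The baseline reaches roughly 83% accuracy while detecting no sick patient at all, which is why sensitivity is the more informative metric for this dataset.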
The results of the model are available following the generation of the model.
They present the performance of the predictive model.
The type of predictive model and the measurement indicators of the associated model are related to the Goal (Variable to be predicted) and the values of this variable.
The type of model you make is shown on the model results display.
According to the type of the Goal (in our case, the Goal is "Target"), we can make three types of predictions:
- Binary classification: Discrete value taking only two values (yes / no for instance)
- Multiclass classification: Discrete value taking more than two values (for instance a machine status with values like: On, Risk of breakdown, Down, etc.)
- Regression: Continuous value that can take an infinite number of values (a temperature, a pressure, a turnover, the price of a house, etc.)
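The three cases above can be told apart from the values of the Goal column. The heuristic below is only an assumption for illustration, not the rule TADA applies; in particular, it would misread numeric class labels such as 1, 2, 3 as a regression target.

```python
def infer_task_type(goal_values):
    """Guess the prediction type from the distinct values of the goal column."""
    distinct = set(goal_values)
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )
    if len(distinct) == 2:
        return "binary classification"
    if all_numeric:
        return "regression"
    return "multiclass classification"


print(infer_task_type(["T", "F", "F", "T"]))
print(infer_task_type(["On", "Risk of breakdown", "Down"]))
print(infer_task_type([20.5, 21.0, 19.8, 22.3]))
```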
At the generation of the model and according to the practices and state of the art of Machine Learning, your dataset will be divided into three parts by TADA:
- A training part, which represents 40% of your dataset and is used to train a certain number of formulas,
- A validation part, which represents 30% of your dataset and is used to validate and select the best formulas found in the previous step,
- A test part, which represents the last 30% of your dataset and is used to test the formulas approved by the preceding stage. Performance measurement and evaluation of your model should mainly be done on this partition, in line with standard machine learning practice, because its data was not used in the training and validation phases and serves only to measure performance.
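The 40% / 30% / 30% division can be sketched as follows. The random shuffle and the fixed seed are assumptions of this example; how TADA actually assigns rows to each part is not described here.

```python
import random


def split_dataset(rows, train_frac=0.4, valid_frac=0.3, seed=0):
    """Shuffle the rows and cut them into training, validation and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains (about 30%)
    return train, valid, test


# 240 row indices, matching the size of the dataset described above.
train, valid, test = split_dataset(range(240))
print(len(train), len(valid), len(test))
```

With 240 rows this gives 96 training, 72 validation and 72 test rows.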
ACC (Accuracy) represents the overall accuracy rate of the model: the percentage of records assigned to the correct class (here, 79.17% of the predictions are correct).
TPR (True Positive Rate) represents the accuracy rate of the prediction of the positive class, i.e. of the "Yes/T" class.
TNR (True Negative Rate) represents the accuracy rate of the prediction of the negative class, i.e. of the "No/F" class.
MCC (Matthews Correlation Coefficient) summarizes the quality of the predictions as a whole, that is, how well the model separates the two classes.
Here, the confusion matrix represents a visual way to interpret the metrics.
In this case, TADA predicted 24 times that a patient has a pathology and was wrong 14 times (TADA predicted that 14 patients have a pathology, while they actually don’t).
In parallel, TADA predicted 48 times that a patient doesn’t have a pathology and was wrong once (TADA missed only 1 pathology).
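The four metrics can be recomputed from these counts. The sketch below derives the confusion matrix cells from the figures just given (24 positive predictions with 14 errors, 48 negative predictions with 1 error) and applies the standard definitions; the variable names are illustrative.

```python
import math

# Cells derived from the counts in the text.
tp, fp = 10, 14  # 24 positive predictions, 14 of them wrong
tn, fn = 47, 1   # 48 negative predictions, 1 of them wrong

total = tp + fp + tn + fn
acc = (tp + tn) / total   # overall share of correct predictions
tpr = tp / (tp + fn)      # sensitivity: detected pathologies
tnr = tn / (tn + fp)      # specificity: correctly cleared patients
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"ACC={acc:.2%} TPR={tpr:.2%} TNR={tnr:.2%} MCC={mcc:.3f}")
```

The 72 predictions match the 30% test part of the 240 rows, and the accuracy of 57/72 reproduces the 79.17% quoted above. The same counts give a sensitivity of 10/11 (about 91%) and a true negative rate of 47/61 (about 77%).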
Ready to use TADA?
You don't have immediate data?
No problem, data are available to make your trial as relevant as possible! Try it now!