
Health, Operations

Pathology prediction for patient orientation


How can AI help pathology identification and improve patient treatment?

As the number of patients increases, health care professionals have less and less time to determine which pathology each patient suffers from, which increases the risk of medical error.

By using Artificial Intelligence to build predictive models from their existing data, health institutions could better identify their patients’ pathologies and take better care of them all along their care path.

Problems to solve

  • How can the pathology a patient suffers from be identified as quickly as possible?
  • How can health care professionals be helped to improve their patients’ treatment?

Benefits of TADA

Health care professionals could use predictive models to help them diagnose pathologies. However, they are not Data Scientists and lack the Machine Learning and coding skills needed to build such models.

Healthcare professionals handle a small quantity of data, essentially their patients’ health data, which is considered Small Data. Traditional Machine Learning tools don’t handle Small Data well.

In that context, MyDataModels offers TADA, a solution to help healthcare professionals such as doctors, researchers, nurses, etc. automatically create predictive models out of their Small Data.

No Data science knowledge is required to use TADA. Healthcare professionals can use their own data without normalization or preprocessing and get convincing results in less than a minute.

MyDataModels brings a self-service solution for those who have Small Data and no data science knowledge.

Conclusion

Correctly guiding patients and optimizing care paths will be among the biggest challenges that healthcare will face in the coming years. Indeed, ER and call centers receive an increasing number of requests, which they handle with increasing difficulty.

Using TADA and asking patients a few questions would help these services determine with great precision which pathology a patient suffers from and act more quickly, reducing waiting times and improving patient handling.

TADA will never replace healthcare professionals’ expertise, but it can help them to diagnose better, work better and treat their patients better.

Case Study

Solution

Automated Machine Learning tools help users to predict the future thanks to historical data. To predict a future result, you must compile your descriptive data and the past results obtained.

TADA allows you to easily create a relevant predictive model from your data and apply it to future data.

In this use case, the descriptive data is a list of 41 pathologies that are defined by 132 symptoms. TADA will try to predict the right pathology from a combination of symptoms.

You can generate a model in just 4 steps: 

  • Step 1: Create your project and upload your data as a CSV file (with data in rows and variables in columns).

  • Step 2: Select the variable you want to predict, called “Goal”. In this use case, the goal is the “prognosis” variable.

  • Step 3: Select your data for the model generation. This step is called "Creating the Variable set" and allows you to manually select the descriptive variables you want to use. By default, they are all selected.

TADA identifies the relevant descriptive variables by itself, but the number of variables selected affects the calculation time required to create the model.

The fewer variables selected, the faster the model creation.

  • Step 4: Create your model. When creating your model, some default values are proposed for the name of the model, the size of the population and the number of iterations.

You can start your model generation by validating the default values or editing them according to your preferences. The TADA UI provides best practices to guide you in choosing these parameters.

Depending on the size of the file, this step can take between a few seconds and ten minutes. Once the model is created, you have access to metrics and graphs to evaluate its relevance.

How can we go further?

You have various options to put your model into practice:

  • Use the “Predict” feature of TADA: upload a CSV file with the data to predict. In return, TADA will generate a CSV file with the calculated predictions.

  • Retrieve the associated mathematical formula and apply it (for instance in Excel).

  • Retrieve the source code of the mathematical formula and use it in your own apps. The source code is available in R, Java and C++, with Python coming soon. (This option is only available in TADA Premium and Pro.)

Dataset Information

The screenshot below comes from public patient data. Each row is a patient and each column is a symptom, treated as a variable. All variables are binary: the value is 1 if the symptom was present and 0 if it was not.

dataset.png

The following 132 variables are present in the dataset:

itching; skin_rash; nodal_skin_eruptions; continuous_sneezing; shivering; chills; joint_pain; stomach_pain; acidity; ulcers_on_tongue; muscle_wasting; vomiting; burning_micturition; spotting_urination; fatigue; weight_gain; anxiety; cold_hands_and_feets; mood_swings; weight_loss; restlessness; lethargy; patches_in_throat; irregular_sugar_level; cough; high_fever; sunken_eyes; breathlessness; sweating; dehydration; indigestion; headache; yellowish_skin; dark_urine; nausea; loss_of_appetite; pain_behind_the_eyes; back_pain; constipation; abdominal_pain; diarrhea; mild_fever; yellow_urine; yellowing_of_eyes; acute_liver_failure; fluid_overload; swelling_of_stomach; swelled_lymph_nodes; malaise; blurred_and_distorted_vision; phlegm; throat_irritation; redness_of_eyes; sinus_pressure; runny_nose; congestion; chest_pain; weakness_in_limbs; fast_heart_rate; pain_during_bowel_movements; pain_in_anal_region; bloody_stool; irritation_in_anus; neck_pain; dizziness; cramps; bruising; obesity; swollen_legs; swollen_blood_vessels; puffy_face_and_eyes; enlarged_thyroid; brittle_nails; swollen_extremeties; excessive_hunger; extra_marital_contacts; drying_and_tingling_lips; slurred_speech; knee_pain; hip_joint_pain; muscle_weakness; stiff_nec; swelling_joints; movement_stiffness; spinning_movements; loss_of_balance; unsteadiness; weakness_of_one_body_side; loss_of_smell; bladder_discomfort; foul_smell_of_urine; continuous_feel_of_urine; passage_of_gases; internal_itching; toxic_look_(typhos); depression; irritability; muscle_pain; altered_sensorium; red_spots_over_body; belly_pain; abnormal_menstruation; dischromic_patches; watering_from_eyes; increased_appetite; polyuria; family_history; mucoid_sputum; rusty_sputum; lack_of_concentration; visual_disturbances; receiving_blood_transfusion; receiving_unsterile_injections; coma; stomach_bleeding; distention_of_abdomen; history_of_alcohol_consumption; fluid_overload; blood_in_sputum; prominent_veins_on_calf; palpitations; painful_walking; pus_filled_pimples; blackheads; scurring; skin_peeling; silver_like_dusting; small_dents_in_nails; inflammatory_nails; blister; red_sore_around_nose; yellow_crust_ooze; prognosis.

Model type: multiclass classification
Column number: 132
Row number: 3444
Goal : Prognosis
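As a minimal sketch of the layout described above (one patient per row, one binary column per symptom, plus the Goal column), the following Python snippet checks a CSV file before it is uploaded. The sample rows, the pathology labels and the helper name `check_binary_symptoms` are illustrative assumptions; only the column names come from the variable list above.

```python
import csv
import io

# Hypothetical three-patient sample; only the column names come from the dataset.
SAMPLE_CSV = """itching,skin_rash,cough,high_fever,prognosis
1,1,0,0,Pathology A
0,0,1,1,Pathology B
0,0,1,0,Pathology B
"""

def check_binary_symptoms(csv_text, goal="prognosis"):
    """Return (row_count, symptom_columns); raise ValueError on a non-binary cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    symptoms = [name for name in reader.fieldnames if name != goal]
    row_count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name in symptoms:
            if row[name] not in ("0", "1"):
                raise ValueError(f"line {line_no}: {name}={row[name]!r} is not 0 or 1")
        row_count += 1
    return row_count, symptoms

rows, symptoms = check_binary_symptoms(SAMPLE_CSV)
print(rows, symptoms)
```

A check of this kind catches stray values such as empty cells or "yes"/"no" strings before the file reaches the tool.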

Results

The results show how the predictive model performs.

The predictive model type and its metrics are linked to the Goal and its values. The model type is shown on the model results display.

Three types of prediction are possible, depending on the Goal data (here, the Goal is “prognosis”, which calls for a multiclass classification):

  1. Binary classification: a discrete value taking only two values, such as Yes/No.

  2. Multiclass classification: a discrete value with more than two values, such as a system status with values like “On”, “At Risk”, “Down”, etc.

  3. Regression: a continuous value that can take an infinite number of values, such as a temperature, a pressure, a turnover or the price of a house.
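The rule behind these three types can be sketched in a few lines of Python. This is an illustration only, not TADA's actual logic: the function name `model_type` is hypothetical, and the type names CLASS, INTEGER and DOUBLE are those used in the glossary entries for Binary Classification and Regression.

```python
def model_type(goal_type, goal_values):
    """Map a Goal column to one of the three prediction types.

    goal_type is the declared variable type: "CLASS", "INTEGER" or "DOUBLE".
    """
    if goal_type in ("INTEGER", "DOUBLE"):
        return "regression"
    if goal_type == "CLASS":
        distinct = len(set(goal_values))
        if distinct == 2:
            return "binary classification"
        if distinct > 2:
            return "multiclass classification"
        raise ValueError("a CLASS goal needs at least two distinct values")
    raise ValueError(f"unknown goal type: {goal_type!r}")

print(model_type("CLASS", ["On", "At Risk", "Down", "On"]))
```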

When generating the model, TADA follows standard Machine Learning practice and divides your dataset into three parts:

  • Part 1: A Training part which represents 40% of the data and is used to train a certain number of models,

  • Part 2: A Validation part which represents 30% of the data and is used to validate and select the best models found in the previous step,

  • Part 3: A Test part which represents 30% of the data and is used to test the model approved during the validation step. 

The performance measurement and the model evaluation must be done on the Test part (according to Machine Learning standards) as the data used during this phase was not used to build the model and is just used to measure its performance.
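The 40% / 30% / 30% partition can be illustrated with a short, standard-library Python sketch. It assumes a stratified split (the method the glossary associates with classification projects); the function name `stratified_split`, the seed and the toy labels are hypothetical, and the code is not TADA's implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.4, 0.3, 0.3), seed=0):
    """Return three lists of row indices (training, validation, test).

    Rows are shuffled within each class so that every part keeps roughly
    the same class distribution as the whole dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_valid = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]  # the remainder goes to the Test part
    return train, validation, test

# Hypothetical Goal column: 3 classes with 10 patients each.
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))  # 12 9 9
```

Because the Test indices never appear in the other two lists, metrics computed on them measure performance on data the model has not seen.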

metrics.png

The metrics above reach at most 1, which corresponds to a perfect prediction (accuracy ranges from 0 to 1, while MCC and Kappa range from -1 to 1).

Here, every metric is around 0.95, which shows the great quality of prediction of the model.

Accuracy (ACC) shows that the model is right for 95.16% of the predictions.

Matthews Correlation Coefficient (MCC) shows that no class is over-predicted and that pathologies are correctly identified and allocated.

Confusion_matrix1.png

Confusion_matrix2.png

The good results shown in green in the confusion matrix are a consequence of the good quality of data and the good class allocation.

Glossary

Accuracy (ACC) is the overall accuracy rate of the model: the percentage of samples assigned to the correct class (here, 95.16% of predictions are correct).

Matthews Correlation Coefficient (MCC) is an indicator of the general quality of the model and shows the quality of the allocation of the values among the different classes.

Kappa, also known as Cohen’s Kappa, is a statistical measure that shows the reliability of predictions among different classes. Classes are considered well identified when Kappa is above 0.6.

Ready to use TADA?

Don't have data at hand?

No problem, sample data is available to make your trial as relevant as possible!

Try it now!

Detailed information

General

Artificial intelligence: Theories and techniques aiming to simulate intelligence (human, animal or other).

Binary Classification: The problem type when you are trying to predict one of two states, e.g. yes/no, true/false, A/B, 0/1, red/green, etc. This kind of analysis requires the Goal variable to be of type CLASS. Binary Classification analysis also requires that there be only 2 different values in the Goal column. Otherwise, it is not a binary problem (two choices and no more).

Convolutional Neural Network: This type of network is dedicated to object recognition. It is generally composed of several layers of convolutions + pooling followed by one or more FC layers. A convolutional layer can be seen as a filter. Thus, the first layer of a CNN makes it possible to filter corners, curves and segments, and the following layers filter more and more complex shapes.

Data Mining: Field of data science aimed at extracting knowledge and / or information from a body of data.

Deep Learning: Deep Learning is a category of so-called "layered" machine learning algorithms. A deep learning algorithm is a neural network with a large number of layers. The main interest of these networks is their ability to learn models from raw data, thus reducing pre-processing (often important in the case of classical algorithms).

Fully Convolutional Networks: An FCN is a CNN with the last FC layers removed. This type of network is currently not used much but can be very useful if it is succeeded by an RNN network allowing integration of the time dimension in a visual recognition analysis.

GRU (Gated Recurrent Unit): A GRU network is a simplified LSTM introduced in 2014. It has fewer parameters than an LSTM, which makes it easier to parameterize.

LSTM (Long Short-Term Memory): An LSTM is an RNN to which a system has been added to control access to memory cells. We speak of "Gated Activation Function". LSTMs perform better than conventional RNNs.

Machine learning: Subfield of Artificial Intelligence (AI), Machine Learning is the scientific study of algorithms and statistical models that give systems the ability to learn and improve at specific tasks without explicit programming.

Multi Classification: Classification when there are more than two classes in the goal variable, e.g. A/B/C/D, red/orange/green, etc.

Multilayer perceptron: This is a classic neural network. Generally, all the neurons of a layer are connected to all the neurons of the next layer. We are talking about Fully Connected (FC) layers.

RCNN (Regional CNN): This type of network compensates for the shortcomings of a classic CNN and answers the question: what to do when an image contains several objects to recognize? An RCNN makes it possible to extract several labels (each associated with a bounding box) from an image.

Regression: Set of statistical processes to predict a specific number or value. Regression analysis requires the type of Goal variable to be numeric (INTEGER or DOUBLE).

Reinforcement learning: Learning paradigm, distinct from supervised learning, in which a model improves its behavior from the feedback (rewards or penalties) it receives on its own actions or predictions, rather than from labeled examples.

RNN (Recurrent Neural Networks): Recurrent networks are a set of networks integrating the temporal dimension. Thus, from one prediction to another, information is shared. These networks are mainly used for the recognition of activities or actions via video or other sensors.

Semi supervised learning: Semi-supervised learning sits between supervised and unsupervised learning. It applies when the training data is only partially labeled. The interest is to learn a model with little labeled data.

Stratified sampling: It is a method of sampling such that the distribution of goal observations in each stratum of the sample is the same as the distribution of goal observations in the population. TADA uses this method to shuffle the data set from binary and multi classification projects.

Simple random sampling: It is a method of sampling in which each observation is equally likely to be chosen randomly. TADA uses this method to shuffle the data set from regression projects.

Supervised learning: Sub-domain of machine learning, supervised learning aims to generalize and extract rules from labeled data in order to make predictions (that is, to predict the label associated with an unlabeled sample).

Transfer learning: Brought up to date by deep learning, transfer learning consists of reusing pre-trained models so as not to reinvent the wheel for each new learning task.

Unsupervised learning: Sub-domain of machine learning, unsupervised learning aims to group data that are similar and divide/separate different data. We talk about minimizing intra-class variance and maximizing inter-class variance.


Metrics

Binary

ACC (Accuracy): Percentage of samples in the test set correctly classified by the model.

Actual Negative: Number of samples of negative case in the raw source data subset.

Actual Positive: Number of samples of positive case in the raw source data subset.

AUC: Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. It is in the interval [0;1]. A perfect predictive model gives an AUC score of 1. A predictive model which makes random guesses has an AUC score of 0.5.

F1 score: Single value metric that gives an indication of a Binary Classification model's efficiency at identifying the positive class, balancing precision and recall. It is computed as the harmonic mean of PPV and TPR.

False Negative: Number of positive class samples in the source data subset that were incorrectly predicted as negative.

False Positive: Number of negative class samples in the source data subset that were incorrectly predicted as positive.

MCC (Matthews Correlation Coefficient): Single value metric that gives an indication of a Binary Classification model's efficacy at predicting both classes. This value ranges from -1 to +1, with +1 being a perfect classifier.

PPV (Positive Predictive Value/Precision): Number of a model's True Positive predictions divided by the number of (True Positives + False Positives) in the test set.

Predicted Positive: Number of samples in the source data subset predicted as the positive case by the model.

Predicted Negative: Number of samples in the source data subset predicted as the negative case by the model.

True Positive: Number of positive class samples in the source data subset accurately predicted by the model.

True Negative: Number of negative class samples in the source data subset accurately predicted by the model.

TPR (True Positive Rate/Sensitivity/Recall): Ratio of True Positive predictions to actual positives with respect to the test set. It is calculated by dividing the true positive count by the actual positive count.

TNR (True Negative Rate/Specificity): Ratio of True Negative predictions to actual negatives with respect to the test set. It is calculated by dividing the True Negative count by the actual negative count.
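The binary definitions above can be tied together in a small Python sketch that derives each metric from the four confusion counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the case study; the sketch assumes every denominator is non-zero.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute the confusion-count metrics from TP, FP, TN and FN."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)              # precision
    tpr = tp / (tp + fn)              # sensitivity / recall
    tnr = tn / (tn + fp)              # specificity
    f1 = 2 * ppv * tpr / (ppv + tpr)  # harmonic mean of PPV and TPR
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "PPV": ppv, "TPR": tpr, "TNR": tnr, "F1": f1, "MCC": mcc}

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that F1 ignores True Negatives, whereas MCC uses all four counts, which is why the two can diverge on imbalanced data.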

 

Multi classification

ACC (Accuracy): Ratio of the correctly classified samples over all the samples.

Actual Total: Total number of samples in the source data subset that were of the given class.

Cohen’s Kappa (K): Coefficient that measures inter-rater agreement for categorical items. It tells how much better a classifier performs than one that simply guesses at random according to the frequency of each class. It is in the interval [-1;1]. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement.

False Negative: Number of positive class samples in the source data subset that were incorrectly predicted as negative.

False Positive: Number of negative class samples in the source data subset that were incorrectly predicted as positive.

Macro-PPV (Positive Predictive Value/Precision): The mean of the computed PPV within each class (independently of the other classes). Each PPV is the number of True Positive (TP) predictions divided by the total number of positive predictions (TP+FP, with FP for False Positive) within each class. PPV is in the interval [0;1]. The higher this value, the better the confidence that positive results are true.

Macro-TPR (True Positive Rate/Recall): The mean of the computed TPR within each class (independently of the other classes). Each TPR is the proportion of samples predicted Truly Positive (TP) out of all the samples that actually are positive (TP+FN, with FN for False Negative). TPR is in the interval [0;1]. The higher this value, the fewer actual samples of positive class are labeled as negative.

Macro F1 score: Harmonic mean of macro-average PPV and TPR. F1 Score is in the interval [0;1]. The F1 Score can be interpreted as a weighted average of the PPV and TPR values. It reaches its best value at 1 and worst value at 0.

MCC (Matthews Correlation Coefficient): Represents the multi class confusion matrix with a single value. Precision and recall for all the classes are computed and averaged into a single real number within the interval [-1;1]. However, in the multiclass case, its minimum value lies between -1 (total disagreement between prediction and truth) and 0 (no better than random) depending on the data distribution.

Predicted Total: Total number of samples in the source data subset that were predicted of the given class.

True Positive: Number of positive class samples in the source data subset accurately predicted by the model.

True Negative: Number of negative class samples in the source data subset accurately predicted by the model.
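For the multiclass case, the same quantities can be derived from a confusion matrix, as in the Python sketch below. The function name `multiclass_metrics` and the 3-class matrix are hypothetical. Macro F1 follows the definition given above (harmonic mean of the two macro averages), and the MCC line uses the usual confusion-matrix generalization of the binary formula, which is an assumption about the exact formula the application applies.

```python
import math

def multiclass_metrics(matrix):
    """Compute metrics from a confusion matrix.

    matrix[i][j] is the number of samples of actual class i predicted as class j.
    Assumes every class is both present and predicted at least once.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    actual = [sum(row) for row in matrix]                          # Actual Total per class
    predicted = [sum(row[j] for row in matrix) for j in range(k)]  # Predicted Total per class
    true_pos = [matrix[i][i] for i in range(k)]                    # True Positive per class
    correct = sum(true_pos)

    acc = correct / total
    macro_ppv = sum(tp / p for tp, p in zip(true_pos, predicted)) / k
    macro_tpr = sum(tp / a for tp, a in zip(true_pos, actual)) / k
    macro_f1 = 2 * macro_ppv * macro_tpr / (macro_ppv + macro_tpr)

    # Cohen's Kappa: observed agreement compared with chance agreement.
    chance = sum(a * p for a, p in zip(actual, predicted)) / total ** 2
    kappa = (acc - chance) / (1 - chance)

    # Multiclass MCC from the matrix totals (assumed formula, see lead-in).
    numerator = correct * total - sum(a * p for a, p in zip(actual, predicted))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in predicted))
        * (total ** 2 - sum(a * a for a in actual))
    )
    return {"ACC": acc, "Macro-PPV": macro_ppv, "Macro-TPR": macro_tpr,
            "Macro-F1": macro_f1, "Kappa": kappa, "MCC": numerator / denominator}

# Hypothetical confusion matrix: 3 classes, 10 actual samples each.
result = multiclass_metrics([[8, 1, 1], [2, 7, 1], [0, 1, 9]])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```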

 

Regression

MAE (Mean Absolute Error): Represents the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight. MAE is in the interval [0;+∞[. A value of 0 represents a perfect prediction; the higher the value, the larger the model’s errors.

MAPE (Mean Absolute Percentage Error): MAPE is computed as the average of the absolute deviations of the predicted from the actual values, each divided by the actual value and expressed as a percentage.

Max-Error: Maximum Error. The application considers the magnitude (absolute error) when identifying the maximum error. Thus -1.5 would be considered the maximum error over +1.3. The sign of the error, however, is still reported in this column in case it has domain significance for the user.

R2 (R Squared): also known as the Coefficient of Determination. The application computes the R2 statistic as 1 - (SSres / SStot) where SSres is the residual sum of squares and SStot is the total sum of squares.

RMSE: Root Mean Square Error against the Dataset partition selected. RMSE is computed as the square root of the mean of the squared deviations of the predicted from actual values.

SD-ERROR (Standard Deviation Error): Standard statistical measure used to quantify the amount of variation of a set of data values, here the prediction errors.
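The regression metrics above can be sketched in Python as follows. Several choices here are assumptions rather than documented behavior: the error is taken as predicted minus actual, MAPE is expressed as a percentage and requires non-zero actual values, and SD-ERROR is computed as the population standard deviation of the errors. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MAPE, Max-Error, R2, RMSE and SD-ERROR for paired values."""
    errors = [p - a for a, p in zip(actual, predicted)]  # sign convention is an assumption
    n = len(errors)
    mean_actual = sum(actual) / n
    mean_error = sum(errors) / n

    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    max_error = max(errors, key=abs)  # largest magnitude, sign preserved
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd_error = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / n)
    return {"MAE": mae, "MAPE": mape, "Max-Error": max_error,
            "R2": r2, "RMSE": rmse, "SD-ERROR": sd_error}

# Hypothetical values chosen so the errors are +1.0, -1.5, +1.3 and 0.0.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 18.5, 31.3, 40.0]
scores = regression_metrics(actual, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these values the reported Max-Error is -1.5 rather than +1.3, matching the magnitude rule described in the Max-Error entry.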