How can AI help pathology identification and improve patient treatment?
As the number of patients increases, health care professionals have less and less time to determine which pathology each patient suffers from, which increases the risk of medical error.
Using Artificial Intelligence to build predictive models from their existing data could help health institutions identify their patients’ pathologies more reliably and take better care of patients all along their care path.
Problems to solve
- How can the pathology a patient suffers from be identified as quickly as possible?
- How can health care professionals be helped to improve their patients’ treatment?
Benefits of TADA
Health care professionals could use predictive models to help them diagnose pathologies. However, they are not Data Scientists and lack the Machine Learning and coding skills needed to build predictive models.
Healthcare professionals handle a small quantity of data, essentially their patients’ health data, which qualifies as Small Data. Traditional Machine Learning tools don’t handle Small Data well.
In that context, MyDataModels offers TADA, a solution to help healthcare professionals such as doctors, researchers, nurses, etc. automatically create predictive models out of their Small Data.
No data science knowledge is required to use TADA. Healthcare professionals can use their own data without normalization or preprocessing and get convincing results in less than a minute.
MyDataModels brings a self-service solution for those who have Small Data and no data science knowledge.
Correctly guiding patients and optimizing care paths will be among the biggest challenges that healthcare will have to deal with in the coming years. Indeed, ER and call centers receive an increasing number of requests that they handle with increasing difficulty.
Using TADA and asking patients a few questions would help these services determine with great precision which pathology a patient suffers from and act more quickly, reducing waiting times and improving patient handling.
TADA will never replace healthcare professionals’ expertise, but it can help them to diagnose better, work better and treat their patients better.
Automated Machine Learning tools help users to predict the future thanks to historical data. To predict a future result, you must compile your descriptive data and the past results obtained.
TADA allows you to easily create a relevant predictive model from your data and apply it to future data.
In this use case, the descriptive data consists of 132 symptoms, and the result to predict is one of 41 pathologies. TADA tries to predict the right pathology from a combination of symptoms.
You can generate a model in just 4 steps:
- Step 1: Create your project and upload your data as a CSV file (with data in rows and variables in columns).
- Step 2: Select the variable you want to predict, called “Goal”. In this use case, the goal is the “prognosis” variable.
- Step 3: Select your data for the model generation. This step is called "Creating the Variable set" and lets you manually select the descriptive variables you want to use. By default, they are all selected.
TADA identifies the relevant descriptive variables by itself, which affects the calculation time required to create the model: the fewer variables selected, the faster the model creation.
- Step 4: Create your model. When creating your model, some default values are proposed for the name of the model, the size of the population and the number of iterations.
You can start your model generation by validating the default values or editing them according to your preferences. Best practices for choosing these parameters are available in the TADA UI.
Depending on the size of the file, this step can take between a few seconds and ten minutes. Once the model is created, you have access to metrics and graphs to evaluate its relevance.
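The CSV layout from Step 1 and the goal from Step 2 can optionally be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the three sample rows and the function name `summarize_csv` are hypothetical and are not part of TADA, which itself requires no code.

```python
import csv
import io

# Hypothetical sample: one row per patient, one column per variable,
# with the goal ("prognosis") as the last column.
SAMPLE_CSV = """itching,skin_rash,high_fever,prognosis
1,1,0,Fungal infection
0,0,1,Malaria
1,0,0,Fungal infection
"""

def summarize_csv(text, goal="prognosis"):
    """Return the descriptive variables, row count and goal classes of a CSV."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    if goal not in fieldnames:
        raise ValueError(f"goal column {goal!r} not found")
    rows = list(reader)
    # Every column except the goal is a descriptive variable.
    variables = [name for name in fieldnames if name != goal]
    classes = sorted({row[goal] for row in rows})
    return {"variables": variables, "rows": len(rows), "classes": classes}

summary = summarize_csv(SAMPLE_CSV)
print(summary)
```

A check like this only confirms that the file has the expected shape; the model itself is still generated in TADA.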
How can we go further?
You have various options to put your model into practice:
- Use the “Predict” feature of TADA: upload a CSV file with the data to predict. In return, TADA will generate a CSV file with the calculated predictions.
- Retrieve the associated mathematical formula and apply it (for instance in Excel).
- Retrieve the source code of the mathematical formula and use it in your own apps. The source code is available in R, Java, C++ and Python soon. (This option is only available in TADA Premium and Pro).
The screenshot below comes from public patient data. Each row is a patient and each column is a symptom, treated as a variable. All variables are binary: 1 indicates that the symptom is present and 0 that it is absent.
The dataset contains the following 132 symptom variables, followed by the goal variable, prognosis:
itching; skin_rash; nodal_skin_eruptions; continuous_sneezing; shivering; chills; joint_pain; stomach_pain; acidity; ulcers_on_tongue; muscle_wasting; vomiting; burning_micturition; spotting_urination; fatigue; weight_gain; anxiety; cold_hands_and_feets; mood_swings; weight_loss; restlessness; lethargy; patches_in_throat; irregular_sugar_level; cough; high_fever; sunken_eyes; breathlessness; sweating; dehydration; indigestion; headache; yellowish_skin; dark_urine; nausea; loss_of_appetite; pain_behind_the_eyes; back_pain; constipation; abdominal_pain; diarrhea; mild_fever; yellow_urine; yellowing_of_eyes; acute_liver_failure; fluid_overload; swelling_of_stomach; swelled_lymph_nodes; malaise; blurred_and_distorted_vision; phlegm; throat_irritation; redness_of_eyes; sinus_pressure; runny_nose; congestion; chest_pain; weakness_in_limbs; fast_heart_rate; pain_during_bowel_movements; pain_in_anal_region; bloody_stool; irritation_in_anus; neck_pain; dizziness; cramps; bruising; obesity; swollen_legs; swollen_blood_vessels; puffy_face_and_eyes; enlarged_thyroid; brittle_nails; swollen_extremeties; excessive_hunger; extra_marital_contacts; drying_and_tingling_lips; slurred_speech; knee_pain; hip_joint_pain; muscle_weakness; stiff_nec; swelling_joints; movement_stiffness; spinning_movements; loss_of_balance; unsteadiness; weakness_of_one_body_side; loss_of_smell; bladder_discomfort; foul_smell_of_urine; continuous_feel_of_urine; passage_of_gases; internal_itching; toxic_look_(typhos); depression; irritability; muscle_pain; altered_sensorium; red_spots_over_body; belly_pain; abnormal_menstruation; dischromic_patches; watering_from_eyes; increased_appetite; polyuria; family_history; mucoid_sputum; rusty_sputum; lack_of_concentration; visual_disturbances; receiving_blood_transfusion; receiving_unsterile_injections; coma; stomach_bleeding; distention_of_abdomen; history_of_alcohol_consumption; fluid_overload; blood_in_sputum; prominent_veins_on_calf; palpitations; painful_walking; pus_filled_pimples; blackheads; scurring; skin_peeling; silver_like_dusting; small_dents_in_nails; inflammatory_nails; blister; red_sore_around_nose; yellow_crust_ooze; prognosis.
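The binary encoding described above (1 when a symptom is present, 0 when it is absent) can be illustrated with a short sketch. It uses a six-column subset of the variables listed above to stay readable; the function name `encode_symptoms` and the example patient are hypothetical and only illustrate how one row of such a dataset could be built.

```python
# Illustrative subset of the symptom columns; a real row would use all 132.
SYMPTOM_COLUMNS = ["itching", "skin_rash", "chills", "vomiting", "high_fever", "headache"]

def encode_symptoms(reported, columns=SYMPTOM_COLUMNS):
    """Turn a collection of reported symptom names into a 0/1 row."""
    reported = set(reported)
    unknown = reported - set(columns)
    if unknown:
        # Reject names that do not match a column, to avoid silently losing data.
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if name in reported else 0 for name in columns]

# Hypothetical patient reporting three symptoms.
row = encode_symptoms({"chills", "high_fever", "headache"})
print(dict(zip(SYMPTOM_COLUMNS, row)))
```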
Model type: multiclass classification
Column number: 132
Row number: 3444
Goal: prognosis
The results show how the predictive model performs.
The predictive model type and its metrics are linked to the Goal and its values. The model type is shown on the model results display.
Here, the goal is “prognosis”, whose values are the 41 pathologies, so the model is a multiclass classification. Three types of prediction are possible, depending on the Goal data:
- Binary classification: a discrete value taking only two values, such as Yes/No.
- Multiclass classification: a discrete value with more than two values, such as a state status with values like “On”, “At Risk”, “Down”, etc.
- Regression: a continuous value that can take an infinite number of values, such as a temperature, a pressure, a turnover or the price of a house.
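How the three prediction types relate to the values of the goal can be illustrated with a simple heuristic. This is only a sketch: the function `infer_model_type` and its rules are assumptions made for illustration, not the logic TADA actually applies, and the heuristic would misclassify a goal with many numeric class labels as a regression.

```python
def infer_model_type(goal_values):
    """Guess the prediction type from the values of the goal column (heuristic)."""
    distinct = set(goal_values)
    if len(distinct) < 2:
        raise ValueError("the goal needs at least two distinct values")

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    if len(distinct) == 2:
        return "binary classification"
    if all(is_number(value) for value in distinct):
        # More than two values, all numeric: treated as a continuous goal.
        return "regression"
    return "multiclass classification"

print(infer_model_type(["Yes", "No", "Yes"]))
print(infer_model_type(["On", "At Risk", "Down"]))
print(infer_model_type(["36.6", "37.2", "39.1"]))
```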
When generating the model, and in line with standard Machine Learning practice, TADA divides your dataset into three parts:
- Part 1: a Training part, which represents 40% of the data and is used to train a certain number of models.
- Part 2: a Validation part, which represents 30% of the data and is used to validate and select the best models found in the previous step.
- Part 3: a Test part, which represents 30% of the data and is used to test the model approved during the validation step.
Performance measurement and model evaluation must be done on the Test part (according to Machine Learning standards), because this data was not used to build the model and serves only to measure its performance.
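The 40% / 30% / 30% division described above can be sketched as follows. The random shuffle, the fixed seed and the function name `split_dataset` are assumptions made for illustration; the text does not say how TADA assigns rows to each part.

```python
import random

def split_dataset(rows, seed=0):
    """Shuffle rows, then split them into Training (40%), Validation (30%) and Test (30%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: rows are assigned at random
    n_train = round(0.4 * len(shuffled))
    n_valid = round(0.3 * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder is kept for the final evaluation
    return train, valid, test

# Row indices standing in for the 3444 rows of this use case.
train, valid, test = split_dataset(range(3444))
print(len(train), len(valid), len(test))
```

Because the three parts do not overlap, the Test rows never influence training or model selection, which is what makes metrics measured on them meaningful.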
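For the model's metrics (ACC, MCC and Kappa), 1 is a perfect prediction. Accuracy ranges from 0 to 1, while MCC and Kappa can in principle be negative (down to -1) when predictions are worse than chance.
Here, every metric is around 0.95, which indicates a high quality of prediction.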
Accuracy (ACC) shows that the model is right for 95.16% of the predictions.
Matthews correlation coefficient (MCC) shows that no class is overvalued and that pathologies are correctly identified and allocated.
The good results shown in green in the confusion matrix are a consequence of the good quality of data and the good class allocation.
Accuracy (ACC) is the overall accuracy rate of the model: the percentage of predictions that match the actual class (here, 95.16% of predictions are correct).
Matthews correlation coefficient (MCC) is an indicator of the general quality of the model and shows how well the values are allocated among the different classes.
Kappa, also known as Cohen’s Kappa, is a statistical measure of the reliability of predictions across the different classes, corrected for the agreement expected by chance. Classes are considered well identified when Kappa is greater than 0.6.
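To make these three definitions concrete, the sketch below computes them from a confusion matrix using the standard multiclass formulas. The 3-class matrix is hypothetical and much smaller than the 41-class case; it is not TADA output, and `classification_metrics` is an illustrative name.

```python
import math

def classification_metrics(confusion):
    """Compute accuracy, multiclass MCC and Cohen's kappa.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Sum over classes of (true count x predicted count): drives chance agreement.
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))

    accuracy = correct / total
    mcc_den = math.sqrt((total ** 2 - sum(p * p for p in pred_counts))
                        * (total ** 2 - sum(t * t for t in true_counts)))
    mcc = (correct * total - cross) / mcc_den if mcc_den else 0.0
    kappa_den = total ** 2 - cross
    kappa = (correct * total - cross) / kappa_den if kappa_den else 0.0
    return {"ACC": accuracy, "MCC": mcc, "Kappa": kappa}

# Hypothetical confusion matrix: 28 of 30 predictions fall on the diagonal.
example = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
metrics = classification_metrics(example)
print({name: round(value, 4) for name, value in metrics.items()})
```

On this example, accuracy is 28/30 and Kappa is 0.9, which is above the 0.6 threshold mentioned above.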
Ready to use TADA?
You don't have immediate data?
No problem, data are available to make your trial as relevant as possible! Try it now!