Sensitivity analysis: why bother? Let’s assume that you have generated an outstanding model predicting severe cases of COVID-19 based on a data set including subjects’ age, gender, BMI, and pre-existing conditions. Your next assignment is to present to a team of stakeholders (i.e., general practitioners) how you obtained these outcomes. How would you share it with them? How will they respond to your use of sophisticated AI metrics? What will be their first questions?
They will most likely ask which criteria they should look for in a patient to anticipate severe forms of the disease. Is weight the main issue? Should diabetic patients be examined with specific attention? Or, in an even simpler use case, which criteria are key when diagnosing COVID-19? Sensitivity analysis is the answer.
When using black-box algorithms such as neural networks, it is challenging to determine which criteria most impact the model’s output. And when general practitioners ask what they should pay attention to in a patient, it is complicated to provide a simple, demonstrated answer.
Hence the importance of providing white-box models that are readable and accessible by general practitioners and data scientists alike. These models implement various features of sensitivity analysis. It is with this goal that Quaartz has added new explainability features to its model. There are three new features:
- Global influence,
- Interpret,
- Live predict.
They are accessible through a straightforward user interface; here is what it looks like:
Global influence builds on the fact that not all features are equally important; it provides an overview of the sensitivity analysis. An essential question is: how do we identify which features are more significant? They do not contribute identically to the output, and if we want to drop some of them, we need to know which ones are less relevant. Quaartz displays feature importance on the first screen. For instance, in our general practitioners and COVID-19 use case, the first screen is:
It shows that anxiety, chest pain intensity, shortness of breath, and stiffness intensity are the model’s main criteria. Our models are versatile, and so is our sensitivity analysis. When they are used to predict the fire resistance of a helicopter’s materials, Quaartz tells us that the main criteria are the resin density, the material’s thickness, and the prepreg resin used.
And when used to predict patients’ genetic predisposition to uterine cancer, Quaartz identifies a set of genes.
Relative Feature Importance
So now we know which features have the most impact on the output, but we do not yet know their relative importance: our sensitivity analysis so far is global. Feature importance is a measure of each attribute’s contribution to the model’s predictions. The more impact an attribute has on the output of the model, the greater its relative importance. This importance is calculated explicitly for each feature in the dataset, allowing us to rank and compare features. Sensitivity analysis is thus performed per feature. It can yield surprising results, as in uterine cancer prediction: in this case, the four genes identified as influencing uterine cancer development carry the same weight in the model’s output.
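To make the idea of a per-feature importance score concrete, here is a minimal sketch of permutation importance, one common way to compute such a ranking. This is an illustration only, not Quaartz’s actual method; the toy linear scorer below stands in for any trained predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: three features standing in for, e.g., stiffness,
# chest pain intensity, and anxiety (hypothetical names).
X = rng.normal(size=(200, 3))
true_weights = np.array([2.0, 1.5, 0.1])
y = X @ true_weights

def model_predict(X):
    # Stand-in for a trained model's predict() method.
    return X @ true_weights

def permutation_importance(X, y, predict, n_repeats=10):
    """Score each feature by how much shuffling it degrades the model."""
    base_error = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((predict(X_perm) - y) ** 2))
        # Importance = how much the error grows without this feature.
        scores.append(np.mean(errors) - base_error)
    return np.array(scores)

importance = permutation_importance(X, y, model_predict)
relative = importance / importance.sum()  # normalize for comparison
```

Normalizing the scores, as in the last line, is what turns raw error increases into the kind of percentage breakdown (e.g., one feature accounting for 66% of the outcome) shown in the screens below.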
This feature rating is what we call “global influence.”
In some cases, one feature overrides all the others, as in the helicopter’s case:
In this case, the material’s thickness accounts for 66% of the outcome, meaning it is the most prominent criterion and overrides the other two.
In other cases, two features might stand out, as in the COVID-19 use case:
The stiffness intensity and the chest pain intensity account for 76% of the model’s output, far ahead of shortness of breath and anxiety.
It is even possible to understand further how one specific feature impacts the result. The second readability and explainability feature listed above, which we call “Interpret,” provides this kind of insight. It goes one step further in sensitivity analysis. Here is what it looks like in the general practitioners’ COVID-19 case:
Thanks to the “Interpret” feature, we can see that the patient is always diagnosed as sick when stiffness is rated above 4.
The same happens with a chest pain intensity above 2: the patient is diagnosed as sick.
It can also provide critical insights into the case. For instance, in the helicopter’s material use case, “Interpret” tells us that with a thickness between 2.83 and 4.35 cm, the fire resistance is optimal.
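One common way to produce a per-feature view of this kind is a partial-dependence sweep: vary one feature over a grid while holding the others at their observed values, and average the model’s predictions. The sketch below illustrates the idea under that assumption; the classifier is a hypothetical stand-in that hard-codes the “sick above stiffness 4” rule the article reports, not the Quaartz model itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two features standing in for stiffness and chest pain.
X = rng.uniform(0, 10, size=(100, 2))

def predict_sick(X):
    # Hypothetical rule mimicking the article's finding:
    # diagnosed sick when stiffness (column 0) exceeds 4.
    return (X[:, 0] > 4).astype(float)

def partial_dependence(X, predict, feature, grid):
    """Average prediction as one feature is forced across a grid."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # set every row to the grid value
        curve.append(predict(X_mod).mean())
    return np.array(curve)

grid = np.linspace(0, 10, 21)
curve = partial_dependence(X, predict_sick, feature=0, grid=grid)
```

Plotting `curve` against `grid` would show the prediction jumping from “healthy” to “sick” past a stiffness of 4; the same sweep on the helicopter model would reveal the optimal thickness band between 2.83 and 4.35 cm.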
The outcome is more complex to interpret in uterine cancer prediction:
And last but not least, our “Live predict” feature is a “what-if” way of looking at one criterion: a live sensitivity analysis. What if stiffness increases by one in a likely COVID-19 patient? What if the material used in a helicopter becomes thicker by 0.2 cm? It is a “live” way to change one value in the data set and immediately see its impact on the outcome.
With our what-if analysis called “Live predict” in helicopter material design, we can combine the three most important criteria: thickness, prepreg resin, and resin density. Then we can see how it impacts the fire resistance of the material. Or in the general practitioners’ use case, we can simultaneously modify the stiffness intensity, the chest pain intensity, the shortness of breath, and the anxiety and see the impact on the COVID diagnosis.
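Mechanically, a what-if interaction boils down to copying one record, nudging one or more features, and re-running the prediction. The sketch below shows that loop; the weighted risk score is a hypothetical stand-in for a trained model, and the feature names and weights are illustrative assumptions, not Quaartz outputs.

```python
import numpy as np

def predict_risk(record):
    # Hypothetical risk score over
    # (stiffness, chest_pain, shortness_of_breath, anxiety).
    weights = np.array([0.35, 0.35, 0.2, 0.1])
    return float(np.clip(record @ weights / 10, 0, 1))

# One patient record: stiffness=3, chest pain=2, breath=1, anxiety=1.
patient = np.array([3.0, 2.0, 1.0, 1.0])
baseline = predict_risk(patient)

# What if stiffness increases by one?
what_if = patient.copy()
what_if[0] += 1.0
delta = predict_risk(what_if) - baseline  # change in predicted risk
```

Because only the modified record is re-scored, the feedback is instantaneous, which is what makes this kind of exploration practical in a user interface.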
Sensitivity Analysis: The Takeaway
In a nutshell, using an Interpretable Machine Learning tool such as Quaartz is a great asset. It allows the user to make accurate predictions and understand the criteria impacting these predictions.
Remember the question general practitioners are most likely to ask once presented with the predictive model for COVID-19 diagnosis: “What should we look for in our patients?” Quaartz answers: “You should monitor stiffness intensity and chest pain intensity.” Nice, isn’t it?