Diagnose Uterus Cancer based on genes

From a patient's pool of 53000+ known genes, TADA can identify genetic predispositions for uterus cancer using a small subset of 4 genes.


Healthcare – Medical Research 

Project Duration and Effort

Three days

Customer Benefits

  • Accurate results obtained in 3 days instead of 6 months
  • Results based on a subset of 4 genes rather than 53000+

Problem to solve

Twenty medical researchers based in Paris, France, have access to the genotypes of 90 patients, each containing 53000+ genes. Of these 90 patients, 21 have uterus cancer. The researchers want to predict future uterus cancer in a woman from her genotype. Not all 53000+ genes are relevant, so they need to know which genes are pertinent to use in a predictor. This does not mean that these genes by themselves indicate uterus cancer; they are the key variables the predictor uses to detect it.
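To make the idea of "pertinent genes" concrete, here is a minimal sketch of one common way to rank variables: score each gene by how far its average expression differs between sick and healthy patients, then keep the top four. This is not TADA's method, which is not described here. The function names, the scoring rule, and the synthetic cohort (200 random genes, four of them deliberately shifted in sick patients) are all assumptions made for illustration; no real patient data is involved.

```python
import random
import statistics


def make_synthetic_cohort(n_patients=90, n_sick=21, n_genes=200,
                          informative=(3, 50, 120, 177), seed=0):
    """Build a made-up cohort: random expression values, with a few
    'informative' genes shifted upward in sick patients (hypothetical data)."""
    rng = random.Random(seed)
    labels = [1] * n_sick + [0] * (n_patients - n_sick)
    expression = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if y == 1:
            for g in informative:
                row[g] += 4.0  # planted signal, for illustration only
        expression.append(row)
    return expression, labels


def rank_genes(expression, labels, k=4):
    """Return the indices of the k genes whose mean expression differs
    most between sick (1) and healthy (0) patients, scaled by the spread."""
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        sick = [row[g] for row, y in zip(expression, labels) if y == 1]
        healthy = [row[g] for row, y in zip(expression, labels) if y == 0]
        spread = statistics.pstdev(sick + healthy) or 1.0  # avoid dividing by 0
        score = abs(statistics.mean(sick) - statistics.mean(healthy)) / spread
        scores.append((score, g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


expression, labels = make_synthetic_cohort()
top_genes = rank_genes(expression, labels, k=4)
print("Selected gene indices:", sorted(top_genes))
```

A high score only says that a gene helps separate the two groups in this data set. As noted above, it does not say that the gene indicates the disease on its own.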

They also want to know the accuracy of these predictions. Cancer prediction can serve different purposes: decreasing the number of genes used, reducing the cost of the analysis, or increasing the speed of analysis to stay ahead in medical research. In this project, the goals are to:


  • Accurately identify the uterus cancer cases while reducing the number of genes in scope as rapidly as possible.
  • Make the best use of the massive amount of data available (90 patients × 53000+ genes) with no data scientist on board.


This raises the following medical research question:
Can we predict with precision which women are, or are going to be, sick with uterus cancer, based on the best predictor genes?

Performance of model prediction
Global influence overview of the model

The following diagrams express the likelihood of cancer, based on the gene expression value:

Training phases of TADA


We began with a clean data set that included 53000+ genes from each of 90 patients. It also indicated which of these 90 patients were sick with uterus cancer. Additional information, such as the heredity of uterus cancer or of cancers in general, age, weight, and height, can also be factored in.

TADA helped to identify the key factors, i.e., the four most critical genes in a genotype of 53000+. The analysis took three days, whereas it takes six months on average. The resulting model correctly identified 83% of the cancer cases (true positives) and 74% of the non-cancer cases (true negatives).
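As a rough worked example, the sketch below turns these two percentages into expected patient counts. It assumes that 83% and 74% are the true positive and true negative rates, and that they apply to a cohort with the same make-up as the study (21 sick, 69 healthy). Both points are assumptions, since the evaluation protocol is not described, and the function name is hypothetical.

```python
def expected_outcomes(n_sick, n_healthy, tpr, tnr):
    """Convert a true positive rate (tpr) and true negative rate (tnr)
    into expected counts for a cohort of the given composition."""
    tp = tpr * n_sick          # sick patients correctly flagged
    fn = n_sick - tp           # sick patients missed
    tn = tnr * n_healthy       # healthy patients correctly cleared
    fp = n_healthy - tn        # healthy patients flagged by mistake
    accuracy = (tp + tn) / (n_sick + n_healthy)
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp, "accuracy": accuracy}


# Cohort composition taken from the case study; rates read as TPR/TNR (assumption).
result = expected_outcomes(n_sick=21, n_healthy=69, tpr=0.83, tnr=0.74)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, roughly 17 of the 21 sick patients would be detected and about 51 of the 69 healthy patients would be correctly cleared, for an overall accuracy of about 76%.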

Customer Benefits

In three days, medical researchers gained a competitive edge by:

  • Identifying the essential genes to discriminate uterus cancer: 4 out of 53000+
  • Producing ad hoc models and equations that drastically reduce overall cost and time (3 days vs. 6 months) while maintaining accuracy (83% true positives)