Marketing departments spend millions on campaigns, yet the return on investment (ROI) is difficult to predict.
Digital Marketing now allows experts to collect large amounts of data which can be analyzed to make actionable decisions and optimize the ROI of Marketing campaigns.
This case study is about building a predictive model to identify prospects who will subscribe to a bank term deposit after a direct Marketing campaign.
Problems to solve
- How to assess the performance of a Marketing campaign?
- How to predict if a prospect engaged by a direct Marketing campaign is going to buy a product or subscribe to a service?
- How to improve lead qualification to help salespeople focus on the right targets?
- Can machine learning help with these matters, and how accurately can predictive models detect future customers and optimize Marketing ROI?
Benefits of TADA
Marketing and Communication experts are not data scientists. They may lack the machine learning and coding skills required to build predictive models. Moreover, most data handled by these professionals are Small Data: their historical data often cover a limited number of campaigns and thousands of customers, but rarely millions (as in Big Data). Traditional machine learning tools work well with Big Data but do not perform as well with Small Data.
MyDataModels allows domain experts to automatically build predictive models from Small Data. No training is required, and domain experts can use their raw data directly: no normalization, outlier handling, or feature engineering is required. Thanks to this limited data preparation, the results from this specific dataset were obtained with a few clicks in less than 5 minutes on a standard laptop.
MyDataModels brings a self-service solution to domain experts who have Small Data and no data scientists.
Conclusion
Marketing experts spend millions on customer acquisition campaigns. Targeting and ROI optimization are key to generating qualified leads and turning them into customers. Campaigns are increasingly digital. As a result, marketing experts collect large amounts of data, which unfortunately are not “mined” to discover hidden information for effective decision making.
In this specific use case, MyDataModels’ predictive model reached an accuracy rate of about 85%, which is a satisfactory score for most professionals.
Marketing & Communications departments could use more machine learning to assess the quality of campaigns in general and lead conversion in particular. Even better, TADA can help Marketers identify the key aspects of a campaign that are most likely to drive its performance.
By predicting conversion rates for their clients and providing more actionable reporting with meaningful KPIs, Marketing agencies that use this solution gain a competitive edge.
With this automated technology, Marketing experts spend less time on data and deliver more qualified leads to Sales.
Case study
Solution
Automated Machine Learning solutions predict future outcomes from historical data. To predict a future result, you must provide your descriptive data together with the past results obtained.
TADA lets you easily create a relevant predictive model from your data and apply it to future data.
In this case, the descriptive data are the clients’ information and their current situation with the bank.
The goal of the dataset is to predict if a client will or won’t buy: it’s a binary task (yes/no).
To generate a model, the steps are as follows:
- Create your project and load your data as a CSV table (with data in rows and variables in columns).
- Select the variable you want to predict, called the Goal. In this case, the Goal is the variable "Target" (a visualization of the variable is provided).
- Select your data for the model generation. This step is called "Creating the Variable set" and allows you to manually select the descriptive variables you want to use. By default, they are all selected. TADA identifies the relevant descriptive variables by itself, which affects the calculation time required to create the model. The fewer variables selected, the faster the model creation.
- Create your model. At creation, default values are proposed for the model name, Population, and Iteration. You only need to validate the default values to start model generation. ‘Best practices’ are at your disposal to guide you in the choice of these parameters. Depending on the size of the descriptive data file, this step can take between a few seconds and ten minutes.
Once the model is created, you can see the results of the model using metrics and charts so you can judge its relevance.
Note:
To apply a model that you think is relevant, you can:
- Retrieve the associated mathematical formula and apply it (for instance in Excel).
- Retrieve the source code of the formula and use it yourself (available only with paid TADA offers). The source code is available in R, Java, and C++, and soon in Python.
- Use the "Predict" feature in the product: upload the file containing the data to be predicted, and you will get back a downloadable file containing that data along with the calculated predictions.
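The batch workflow described above (a file of prospects goes in, the same file with a prediction column comes out) can be sketched in a few lines. This is only an illustration: `predict_subscription` is a hypothetical placeholder rule, not a formula produced by TADA, and the two sample rows are invented for the example.

```python
import csv
import io


def predict_subscription(row):
    """Hypothetical stand-in for a formula exported from the tool.

    The threshold rule below is a placeholder chosen for illustration only.
    """
    long_call = float(row["duration"]) > 300
    low_rate = float(row["euribor3m"]) < 2.0
    return 1 if long_call and low_rate else 0


def add_predictions(in_file, out_file, predict=predict_subscription):
    """Copy every input row to out_file and append a 'prediction' column."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + ["prediction"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["prediction"] = predict(row)
        writer.writerow(row)


# Invented sample rows; a real file would hold all descriptive variables.
new_prospects = io.StringIO("age,duration,euribor3m\n41,520,1.3\n35,90,4.9\n")
scored = io.StringIO()
add_predictions(new_prospects, scored)
print(scored.getvalue())
```

Replacing the placeholder function with the exported formula would leave the rest of the loop unchanged.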
Dataset information
The screenshot below shows an extract of the public dataset.
Each of the 2883 rows is a prospect, and each of the 28 columns is a variable that can be used in the model.
The dataset comes from the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. In most cases, more than one contact with the same client was required. The classification goal is to predict whether the client will subscribe ('yes') or not ('no') to a bank term deposit.
28 variables were used:
1 - Output variable (target): has the client subscribed to a term deposit? (binary: 1/yes, 0/no)
2 - age (numeric)
3 - duration: last contact duration, in seconds (numeric).
4 - campaign: number of contacts performed during this campaign (numeric)
5 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
6 - previous: number of contacts performed before this campaign and for this client (numeric)
7 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
8 - cons.price.idx: consumer price index - monthly indicator (numeric)
9 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
10 - euribor3m: euribor 3 month rate - daily indicator (numeric)
11 - nr.employed: number of employees - quarterly indicator (numeric)
12 - job: type of job (categorical)
13 - education (categorical)
14 - month: last contact month of year ('mar', ..., 'nov', 'dec') (one-hot encoded)
15 - day_of_week: last contact day of the week ('mon', 'tue', 'wed', 'thu', 'fri') (one-hot encoded)
- Task Type: Binary Classification
- Number of variables: 28
- Number of rows: 2883
- Goal: (target) will the client subscribe to a bank term deposit? (1/yes or 0/no).
- Weight: Positive class: 10.9%, Negative class: 89.1%
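The list above names 15 variables while the dataset has 28 columns. One reading that reconciles the two is that each one-hot encoded variable expands into one 0/1 column per category. The sketch below illustrates this under the assumption that month takes the ten values 'mar' to 'dec' (the list elides the middle values, so this is a guess); the column names it produces are illustrative, not the ones used in the actual file.

```python
# Assumed category values: the month list is a guess, the days come from the text.
MONTHS = ["mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri"]


def one_hot(prefix, value, categories):
    """Return one 0/1 column per category, e.g. month_mar, month_apr, ..."""
    return {f"{prefix}_{category}": int(category == value) for category in categories}


# A contact made on a Thursday in May becomes 15 columns, two of them set to 1.
encoded = {**one_hot("month", "may", MONTHS), **one_hot("day_of_week", "thu", DAYS)}

# Remaining columns: target + 10 numeric variables + job + education.
other_columns = 1 + 10 + 2
total_columns = other_columns + len(MONTHS) + len(DAYS)
print(total_columns)
```

Under these assumptions the count comes to 28, which matches the number of columns stated for the dataset.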
Results
The results become available once the model has been generated.
They present the performance of the predictive model.
The type of predictive model and its associated performance metrics depend on the Goal (the variable to be predicted) and the values it takes.
The type of model is shown on the model results display.
According to the type of the Goal (in our case, the Goal is "Target"), we can make three types of predictions:
- Binary classification: Discrete value taking only two values (yes / no for instance)
- Multiclass classification: Discrete value taking more than two values (for instance a status with values such as On, Risk of breakdown, Down, etc.)
- Regression: Continuous value that can take an infinite number of values (a temperature, a pressure, a turnover, the price of a house, etc.)
When the model is generated, TADA divides your dataset into three parts, in line with standard Machine Learning practice:
- A training part, which represents 40% of your dataset and is used to train a number of candidate formulas,
- A validation part, which represents 30% of your dataset and is used to validate and select the best formulas found in the previous step,
- A test part, which represents the remaining 30% of your dataset and is used to test the formulas selected in the previous step. The performance of your model should mainly be evaluated on this partition, as is standard in Machine Learning, because its data were not used in the training and validation phases and serve only to measure performance.
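The 40/30/30 partition described above can be sketched as follows. This is a minimal illustration using a seeded random shuffle; how TADA actually shuffles, rounds, or stratifies the partitions is not described here, so the function name, the seed, and the rounding are assumptions.

```python
import random


def split_dataset(rows, train=0.4, validation=0.3, seed=0):
    """Shuffle rows and split them into training, validation and test parts.

    The test part receives whatever remains after the first two parts.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_validation = round(len(rows) * validation)
    training_part = rows[:n_train]
    validation_part = rows[n_train:n_train + n_validation]
    test_part = rows[n_train + n_validation:]
    return training_part, validation_part, test_part


# Row indices stand in for the 2883 prospects of the dataset.
train_part, validation_part, test_part = split_dataset(range(2883))
print(len(train_part), len(validation_part), len(test_part))
```

With 2883 rows this yields parts of 1153, 865 and 865 rows, so the test partition holds roughly 865 prospects that the model never saw during training or validation.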
ACC (Accuracy) represents the overall accuracy rate of the model: the percentage of predictions that are correct (here, 85.8% of predictions are correct).
TPR (True Positive Rate) represents the accuracy rate for the positive class, i.e. the share of actual "yes/1" cases that were correctly predicted.
TNR (True Negative Rate) represents the accuracy rate for the negative class, i.e. the share of actual "no/0" cases that were correctly predicted.
MCC (Matthews Correlation Coefficient) summarizes the overall quality of the predictions across both classes. It ranges from -1 to 1, where 1 is a perfect prediction and 0 is no better than chance.
Confusion matrix
The confusion matrix is a visual way of interpreting the metrics.
In this case, TADA predicted 202 times that a client would buy and was wrong 109 times (109 people predicted to buy did not).
Likewise, TADA predicted 664 times that a client would not buy and was wrong 14 times (14 people predicted not to buy did).
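The counts quoted above are enough to recompute the four metrics by hand. The sketch below derives the four cells of the confusion matrix from those figures and applies the standard textbook definitions of each metric; the variable names are illustrative.

```python
import math

# Cells derived from the figures quoted in the text.
true_positives = 202 - 109   # predicted "buy" and did buy
false_positives = 109        # predicted "buy" but did not
true_negatives = 664 - 14    # predicted "no buy" and did not buy
false_negatives = 14         # predicted "no buy" but did buy

total = true_positives + false_positives + true_negatives + false_negatives

acc = (true_positives + true_negatives) / total
tpr = true_positives / (true_positives + false_negatives)
tnr = true_negatives / (true_negatives + false_positives)
mcc = (
    true_positives * true_negatives - false_positives * false_negatives
) / math.sqrt(
    (true_positives + false_positives)
    * (true_positives + false_negatives)
    * (true_negatives + false_positives)
    * (true_negatives + false_negatives)
)

print(f"ACC={acc:.3f} TPR={tpr:.3f} TNR={tnr:.3f} MCC={mcc:.3f}")
```

This prints ACC=0.858 TPR=0.869 TNR=0.856 MCC=0.565 over 866 predictions, and the accuracy agrees with the 85.8% reported for the model.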
Ready to use TADA?
You don't have immediate data?
No problem, data are available to make your trial as relevant as possible!
Try it now!