# Data modeling

In Artificial Intelligence and, more specifically, in Machine Learning, a model is an abstract representation of a decision process. The model’s primary goal is to automate that decision process, often in a business context. In some cases, the model also helps with understanding the modeled process itself. Machine Learning models are mathematical algorithms that are “trained” using data. Ideally, the model should also explain the reasoning behind its decisions, although this is often challenging.

## Model categories

Each model serves a specific purpose. We can classify models as follows:

### Predictive models

If archeologists can use predictive models to locate sites that have never been excavated, imagine what high-level predictive models can do for healthcare issues, trading algorithms, or customer relationship management. Predictive models can provide meaningful analytics and, thanks to their ability to anticipate, help gain a competitive edge.

### Descriptive models

Descriptive models are an abstract representation of the system they model. They enable a better understanding of the relationships within this system, which might be customer-driven, and of the interactions between internal and external events or behaviors. They are useful, for example, for optimizing a workflow to improve the ROI of active customers.

### Decision models

Decision models are built from qualitative and quantitative data to help us make decisions. They help us perceive, organize, and manage business rules. Planning, pricing, and logistics can all benefit from decision models.

## Predictive modeling

In predictive modeling, we distinguish different tasks depending on the nature of the variable to predict. If the variable is continuous, it is a regression task, and the model returns a numeric value. If the variable is discrete and divided into categories, it is a classification task, and the model returns a class. With two classes, we talk about binary classification; with more than two, multiclass classification.
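The distinction between these tasks can be sketched in a few lines of Python. This is only an illustrative heuristic: the function name `task_type` is hypothetical, and treating any float-valued target as continuous is an assumption that will not hold for every dataset (integer targets such as counts can also call for regression).

```python
def task_type(targets):
    """Guess the predictive task from the values of the target variable.

    Heuristic assumption: float targets are continuous, anything else
    is treated as a set of categories.
    """
    if any(isinstance(t, float) for t in targets):
        return "regression"
    n_classes = len(set(targets))
    if n_classes == 2:
        return "binary classification"
    return "multiclass classification"


print(task_type([12.5, 13.1, 9.8]))            # continuous target
print(task_type(["churn", "stay", "churn"]))   # two categories
print(task_type(["low", "medium", "high"]))    # more than two categories
```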

Several algorithms exist to build models able to perform regression and classification tasks: regression algorithms, Bayesian algorithms, kernel algorithms, decision trees, neural networks, and evolutionary algorithms such as ZGP (the core engine of TADA).

## Algorithms & engines

Numerous algorithms and engines exist nowadays, expanding the scope of available possibilities. Amongst them:

### Regression

Regression algorithms belong to supervised machine learning: they are trained on examples with known outcomes before being applied to new data to produce a prediction. They are useful for estimating how one or more variables relate to another; on their own, they do not prove a causal effect.
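As a minimal sketch of the train-then-predict cycle, here is a simple linear regression fitted by ordinary least squares in plain Python. The function names (`fit_line`, `predict`) and the sample figures are illustrative assumptions, not part of any particular tool.

```python
def fit_line(xs, ys):
    """Train: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the trained model to a new value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data that follows revenue = 2 * ad_spend + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

model = fit_line(ad_spend, revenue)
print(predict(model, 5.0))
```

On real data the points will not fall exactly on a line, so the fitted coefficients describe the best linear approximation rather than an exact rule.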

### Decision tree

Decision trees are also a supervised machine learning technique. They predict a goal or target by asking a series of questions about the input. They can perform classification (categories) or regression (numbers).
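To make the idea of a "question" concrete, the sketch below finds the single best yes/no question of the form "is the value at most t?" for a classification target, which is one node of a decision tree. Using Gini impurity as the split criterion is an assumption made for this example; the names `gini` and `best_question` and the sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels in the group are identical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_question(values, labels):
    """Find the threshold t for 'value <= t?' with the lowest weighted impurity."""
    best_t, best_score = None, float("inf")
    candidates = sorted(set(values))
    # Try the midpoint between each pair of consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up data: did a customer of a given age buy the product?
ages = [22, 25, 31, 40, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_question(ages, bought)
print(threshold, impurity)
```

A full tree repeats this search recursively on each resulting group until the groups are pure enough or a depth limit is reached.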

### Time series

Time series are used to understand the behavior of a given asset over time and, from there, to predict its future behavior. A time series is a sequence of data points indexed in time order, whether listed or represented graphically.
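The sketch below shows the two ingredients in their simplest form: ordering data points by time, then forecasting the next value. The moving average used here is only a naive baseline chosen for illustration, and the function name and sample figures are assumptions.

```python
from datetime import date


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations.

    `series` is a list of (date, value) pairs in any order.
    """
    ordered = sorted(series, key=lambda point: point[0])  # index in time order
    recent = [value for _, value in ordered[-window:]]
    return sum(recent) / len(recent)


# Made-up monthly sales, deliberately out of order.
sales = [
    (date(2024, 3, 1), 120.0),
    (date(2024, 1, 1), 100.0),
    (date(2024, 2, 1), 110.0),
    (date(2024, 4, 1), 130.0),
]

print(moving_average_forecast(sales, window=3))
```

A moving average ignores trend and seasonality, so it is best read as a reference point against which more capable forecasting models can be compared.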

### Small Data

Small Data is a new frontier in data. It represents up to 85% of all the data collected. The challenge is to create algorithms that can work on datasets with little or no history and still provide meaningful insights through efficient predictive modeling.

### ZGP Engine

ZGP is a unique mathematical expression engine inspired by evolutionary algorithms. It is able to create simple mathematical expressions that are particularly good at predicting or classifying based on small datasets.

Algorithms and engines are an ever-evolving topic in the AI world. Stay tuned for more insights.

## Data modeling features

Data modeling tools have to perform specific tasks to meet challenging business goals.

### Data analysis

Data analysis is the process that converts raw data into usable insights.

### Sensitivity analysis

Sensitivity analysis assesses how changes in one or more input variables affect a model’s output. It helps test the robustness of a model and optimize it by quantifying the uncertainty that a given variable contributes.
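One simple way to carry out such an analysis is to perturb one input at a time while holding the others fixed, and to record how much the model's output moves. The sketch below assumes this one-at-a-time approach; the toy revenue model, its parameters, and the function names are hypothetical.

```python
def one_at_a_time_sensitivity(model, baseline, delta=0.1):
    """Increase each input by a relative `delta` and report the output change."""
    base_output = model(**baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline, **{name: value * (1 + delta)})
        effects[name] = model(**perturbed) - base_output
    return effects


def toy_revenue_model(price, volume, discount):
    """Stand-in for a trained model: any callable with named inputs works."""
    return price * volume * (1 - discount)


baseline = {"price": 10.0, "volume": 100.0, "discount": 0.2}
effects = one_at_a_time_sensitivity(toy_revenue_model, baseline)

# List inputs from most to least influential.
for name, change in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(name, round(change, 2))
```

This approach does not capture interactions between inputs, so it is a first look rather than a complete analysis.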

### Data visualization

Data visualization consists of presenting raw data in visual form. It takes reporting to another level. It can be a means of spotting weak signals, thus generating a competitive edge.

### Live predict

By instantly applying machine learning results to live databases, it is possible to provide a dynamic sensitivity analysis. This creates an opportunity to generate more business value from data modeling and predictive analytics.

### Correlation insight

Statistical and Machine Learning correlation measures evaluate whether there is a relationship between variables or assets. Combined with an adequate understanding of probabilities, they can help forecast, target, or improve sales.
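As an illustration of a statistical correlation measure, the sketch below computes the Pearson correlation coefficient, which ranges from -1 (perfect inverse linear relationship) to +1 (perfect direct linear relationship). Choosing Pearson is an assumption made for this example, and the sample figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up data: website visits and sales over five weeks.
visits = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]

print(round(pearson(visits, sales), 3))
```

A coefficient close to +1 or -1 indicates a strong linear relationship, but it does not show that one variable causes the other.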

Data-driven decisions are crucial for success. Modeling techniques applied to business processes contribute to the growth of any business.

### Data mining

Data mining identifies significant patterns within datasets. It can also make use of supervised algorithms, although the approach differs from data modeling.

Data modeling emphasizes the importance of data in business decisions. Reducing the scope of possible decisions helps make quicker, better decisions.

### Pre-Processing

Pre-processing is the first step of data cleaning. It consists of transforming the data so that its structure becomes uniform and usable by algorithms.
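As a small illustration of making structure uniform, the sketch below normalizes one raw record: column names become lowercase with underscores, text is trimmed, and numeric strings become numbers. The function name, the sample record, and the assumption that commas are thousands separators are all hypothetical choices for this example.

```python
def normalize_record(raw):
    """Return a copy of `raw` with uniform keys and parsed numeric values."""
    clean = {}
    for key, value in raw.items():
        name = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip()
            try:
                # Assumption: commas are thousands separators ("1,200.50").
                value = float(value.replace(",", ""))
            except ValueError:
                pass  # not a number, keep the trimmed text
        clean[name] = value
    return clean


record = normalize_record({" Customer Name ": " Ada ", "Revenue": "1,200.50"})
print(record)
```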

### Data preparation

Preparing a dataset for processing can be difficult for non-data-scientists. Inconsistent formats, outliers, and missing values are common obstacles. Sometimes feature engineering is also required. TADA takes care of this for you.
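For readers curious about what handling missing values and outliers can look like by hand, here is a minimal sketch. It is not a description of how TADA works: filling gaps with the median and clipping to fixed bounds are assumptions chosen for simplicity, and the function name and sample data are hypothetical.

```python
from statistics import median


def prepare_column(values, lower, upper):
    """Fill missing values (None) with the median, then clip outliers."""
    known = [v for v in values if v is not None]
    fill = median(known)
    filled = [fill if v is None else v for v in values]
    return [min(max(v, lower), upper) for v in filled]


# Made-up ages with one missing value and one implausible outlier.
ages = [34, None, 29, 41, 250, 38]
clean = prepare_column(ages, lower=0, upper=120)
print(clean)
```

The median is used rather than the mean because it is less affected by the very outliers the second step is meant to handle.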

### Testing

When generating a predictive model, it is necessary to check the accuracy of its outputs. This is the testing phase.

### Validating

Part of the data is set aside before the model is generated. This held-out dataset is used to compare the model’s predictions with actual values. This is the model’s validation.
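A hold-out split of this kind can be sketched as follows. The 20% hold-out fraction, the fixed random seed, and the use of mean absolute error as the comparison measure are assumptions for illustration, as are the function names.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle the rows and set a fraction aside before any model is built."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mean_absolute_error(actual, predicted):
    """Average gap between the model's predictions and the actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


rows = list(range(10))
train, holdout = holdout_split(rows)
print(len(train), len(holdout))

# After training on `train`, predictions for `holdout` are compared
# with the actual values, for example:
print(mean_absolute_error([1, 2, 3], [1, 3, 5]))
```

The key point is that the held-out rows play no part in building the model, so the comparison reflects how it behaves on data it has not seen.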

### Evaluating

Of course, there is no such thing as a perfect model with 100% accuracy. An evaluation phase therefore serves to rank the model against its peers. Evaluation provides metrics such as the confusion matrix, the F1 score, or Cohen’s kappa score, amongst others, to assess the model’s performance.
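The three metrics named above can be computed by hand for a binary classifier, as in the sketch below. The function names and the sample labels are illustrative assumptions; in practice a library implementation would normally be used.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, fn, tn): the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predictions and actual values, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if expected == 1:
        return 0.0  # kappa is undefined here; this sketch returns 0.0
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive=1)
print(tp, fp, fn, tn)
print(f1_score(tp, fp, fn))
print(round(cohen_kappa(tp, fp, fn, tn), 3))
```

Comparing several candidate models on the same held-out data with the same metrics is what makes the ranking meaningful.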

### Deploy

In the data modeling world, deploying a model means applying it to new data in a real business environment, where it can support business decisions. It is data science used in real life.