How to prepare a dataset?

The big question: what do you want to predict?

In your everyday business life, you collect plenty of data. If you’re a digital marketer, you collect data in Google Analytics. If you’re a salesperson, you record data in your CRM. If you’re a scientist, you record data from experiments in a spreadsheet.

You collect a variety of information for each ‘record.’ For each sale, for example, a salesman records the customer’s name, whether the customer is new or returning, the amount of the deal, the type and quantity of products sold, and the date of the sale.

What does the salesman want help with? What does he want to predict? Does he want to predict the total amount of sales he is going to make over one year? Does he want to predict how much revenue he is going to make with a specific customer? Does he want to predict how much of a particular product he is going to sell?

What is a goal?

The ‘goal’ defined in TADA is the value you want to predict. If you want to predict monthly sales, the goal is the monthly sales figure.
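
To make the idea of a goal concrete, here is a minimal Python sketch that turns individual sale records into a monthly sales figure. The records, the field names (date, amount), and the monthly_sales helper are hypothetical and exist only for illustration; they are not part of TADA.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sale records; the field names are assumptions for illustration.
sales = [
    {"date": date(2023, 1, 5), "amount": 1200.0},
    {"date": date(2023, 1, 20), "amount": 800.0},
    {"date": date(2023, 2, 3), "amount": 500.0},
]


def monthly_sales(records):
    """Sum the sale amounts for each (year, month) pair."""
    totals = defaultdict(float)
    for record in records:
        key = (record["date"].year, record["date"].month)
        totals[key] += record["amount"]
    return dict(totals)


# Each value is a candidate 'goal': the monthly sales figure to predict.
monthly_totals = monthly_sales(sales)
print(monthly_totals)
```

Each entry of monthly_totals is one past value of the goal. A prediction would estimate the same figure for a month that has not happened yet.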

What is a Dataset?

A dataset, or data set, is simply a collection of data.

The most common and most accessible format for a dataset is a spreadsheet or CSV file: a single file organized as a table of rows and columns. Still, some datasets are stored in other forms, and they don’t have to be a single file. Occasionally, a dataset may be a zip file or a folder containing several related data tables.
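
As a rough illustration of the table shape described above, the following Python sketch reads a small CSV dataset with the standard library csv module. The CSV content and the column names are made up for this example; in practice the text would come from a file on disk.

```python
import csv
import io

# Hypothetical CSV content: the first line holds the column names,
# and every following line is one row (one record).
csv_text = """customer,product,revenue
Alice,Widget,120.50
Bob,Gadget,75.00
"""

reader = csv.DictReader(io.StringIO(csv_text))
columns = reader.fieldnames  # column names taken from the header line
rows = list(reader)          # one dict per row, keyed by column name

print(columns)
print(len(rows))
```

Note that csv.DictReader returns every value as a string, so numeric columns such as revenue need to be converted (for example with float) before any calculation.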

Rows, Columns, and Goal

Each column describes one characteristic of the records in your dataset:

  • One column for the customer’s name.
  • One column for the type of product purchased.
  • One column for the revenue generated from this sale.

A row describes an item, a record. In the salesman example, a row represents a sale. In the digital marketing world, a row describes a campaign.

The ‘goal’ is one of the columns: the one you want to predict. If you want to predict the total revenue generated with a customer, the goal column is ‘Revenue generated with customer X.’
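
The relationship between rows, columns, and the goal can be sketched in a few lines of Python. This is a conceptual illustration only, not TADA’s API: the rows, the column names, and the split_goal helper are all hypothetical.

```python
# Hypothetical dataset: each dict is one row (one sale),
# and each key is one column.
rows = [
    {"customer": "Alice", "product": "Widget", "revenue": 120.5},
    {"customer": "Bob", "product": "Gadget", "revenue": 75.0},
]

GOAL = "revenue"  # assumed name of the column to predict


def split_goal(rows, goal):
    """Separate the goal column from the remaining (input) columns."""
    inputs = [{k: v for k, v in row.items() if k != goal} for row in rows]
    targets = [row[goal] for row in rows]
    return inputs, targets


inputs, targets = split_goal(rows, GOAL)
print(inputs)
print(targets)
```

Here targets holds the known past values of the goal, while inputs holds the other columns that a model could use to predict it.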

Past, Present, Future?

A dataset is a collection of data about past events. Dataset preparation means collecting this past data, which may live in different locations, including CRMs, spreadsheets, property management systems, data studios, or other third-party sources.

The problem is that data that could support decision-making is often siloed across many departments. Or worse, it sits in external sources that are hard to access.
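
When the data lives in separate systems, preparing a dataset usually means combining records that share a common key. The Python sketch below shows one simple way to do this, assuming two hypothetical exports (a CRM list and a sales list) that both carry a customer_id field. The data, the field names, and the join_on_customer helper are illustrative assumptions.

```python
# Hypothetical exports from two separate systems.
crm_records = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
sales_records = [
    {"customer_id": 1, "revenue": 120.5},
    {"customer_id": 2, "revenue": 75.0},
    {"customer_id": 1, "revenue": 30.0},
]


def join_on_customer(crm, sales):
    """Attach the customer name from the CRM export to every sale."""
    names = {record["customer_id"]: record["name"] for record in crm}
    return [
        {
            "customer_id": sale["customer_id"],
            # None marks a sale whose customer is missing from the CRM export.
            "name": names.get(sale["customer_id"]),
            "revenue": sale["revenue"],
        }
        for sale in sales
    ]


combined = join_on_customer(crm_records, sales_records)
print(combined)
```

The result is a single table with one row per sale, which is the shape a dataset needs before it can be used for prediction.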

Whatever its source, dataset preparation is always about past data.

This past data trains TADA to learn how the different columns relate to one another to produce the ‘goal.’ For instance, TADA might find correlations between the season (i.e., the date) and one customer’s orders. By learning from past data, TADA predicts future values.

Simple Dataset Preparation: Excel plugin

To make data preparation easier, MyDataModels has created an Excel plugin. This plugin can be added to your Excel installation and automatically extracts your Excel worksheets to TADA.

In our next article, we explain how to create your first model.

Need support?

Questions? Problems? Need more info? Contact us, and we can help!


Start making sense of your data

Easily test TADA with our test data here