Data is everywhere now. From the smallest start-ups to the biggest corporations, data is used to build new applications and services, improve customer service, accelerate internal processes, and even improve medicine.
Though Big Data is now clearly identified as a major topic of interest by businesses, media, and governments, Small Data still sits in the shadows. This lack of attention does it a great disservice: Small Data has the potential to be transformative for most departments and domain experts within an organization.
We define Small Data as data that is useful, easily accessible, and beneficial to a department of an organization. Small Data is used regularly by domain experts and is rarely a centralized resource owned solely by the IT department.
Small Data is not an alternative to Big Data but a complement. The two work together within an organization because they address different levels and audiences.
Know What You Have – Perform a Small Data Audit!
The first step is to know what small datasets you possess, how good they are, and how many datasets you have access to.
Since Small Data lives in every department, how do you know where to find it, and how do you extract its maximum value? Let’s find out by performing a Small Data audit.
First, let’s make a list of all the different types of datasets you might have. Depending on your job and industry, the list will differ, but here are a few examples:
- CRM extracts;
- Purchase information about raw materials, equipment, marketing materials, etc.;
- Online shopping cart data;
- Sales by customer and by product/service;
- Behavioral data from your website;
- Data from a machine;
- Performance data, etc.
Once you’ve listed the types of data you have access to, just follow these steps:
- Find out where your data is;
- Interview the key users of this data;
- Prioritize and organize the data;
- Track how this data is being used.
Now that you have your small datasets locked and loaded, let’s see how we can get value out of them!
How Do You Work with Small Datasets?
Two problems occur when working with small datasets using traditional data science approaches:
- The first problem is overfitting. For many algorithms, small datasets lead to models that exploit details in your data rather than modeling the underlying mechanics. This essentially means that the model is good at predicting the dataset you already have, but not good at modeling anything else.
- The second problem is outliers. Outliers are a small number of data points whose values differ greatly from most of the data; in a small dataset, even a single one can pull the average far from its typical value. For a large class of modeling algorithms, outliers can be very damaging to the final model's predictive accuracy.
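To make these two problems concrete, here is a minimal Python sketch using only the standard library. All of the numbers are made up for illustration, and the function names (`memorizing_model`, `fit_line`, `mean_squared_error`) are hypothetical helpers, not part of any product or library. It first shows how one outlier shifts the mean of a tiny dataset, then compares a very flexible model (a polynomial forced through every training point) with a simple straight line on two held-out points.

```python
from statistics import mean, median

# Outliers: one extreme value drags the mean, while the median barely moves.
readings = [10, 11, 9, 10, 12, 10, 95]  # made-up values; 95 is the outlier
mean_all = mean(readings)
median_all = median(readings)
mean_without = mean(r for r in readings if r != 95)

# Overfitting: six made-up training points, roughly y = 2x plus noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.6, 4.9, 5.4, 8.8, 9.5]
# Two held-out points placed on the assumed underlying trend y = 2x.
test_x = [0.5, 4.5]
test_y = [1.0, 9.0]


def memorizing_model(x):
    """Polynomial passing through every training point (Lagrange form)."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(train_x, train_y)):
        weight = 1.0
        for j, xj in enumerate(train_x):
            if j != k:
                weight *= (x - xj) / (xk - xj)
        total += weight * yk
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared gap between predictions and observed values."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


line_model = fit_line(train_x, train_y)

flexible_train_error = mean_squared_error(memorizing_model, train_x, train_y)
flexible_test_error = mean_squared_error(memorizing_model, test_x, test_y)
line_train_error = mean_squared_error(line_model, train_x, train_y)
line_test_error = mean_squared_error(line_model, test_x, test_y)

print("mean with outlier:", mean_all, "| median:", median_all,
      "| mean without outlier:", mean_without)
print("flexible model, train vs held-out error:",
      flexible_train_error, flexible_test_error)
print("straight line,  train vs held-out error:",
      line_train_error, line_test_error)
```

With these numbers, the single outlier lifts the mean to roughly 22 while the median stays at 10. The flexible model reproduces the training points exactly, yet its error on the held-out points is much larger than the straight line's, which is the overfitting pattern described above.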
So, what can you do when traditional approaches don’t work with Small Data? The easiest and best way to work with small datasets is to use TADA by MyDataModels.
Built to help domain experts extract value from Small Data, TADA does not require any code or data science knowledge.
Fast and user-friendly, TADA helps users build predictive models in a few hours and provides them with small, explainable models. TADA can be used directly on a computer, in the Cloud, or on mobile devices.
If you have small datasets and want to give TADA a try, start your 14-day free trial now!