Blog: How we improved customer analytics with machine learning algorithms

Retail businesses with distributed point-of-sale (POS) systems often run into a problem where stores fail to synchronize their data with the centralized data warehouse. If you are not aware of this situation, you can end up making faulty business decisions when making seasonal adjustments to your supply chain. The top-selling product for the last week at a given store ends up not getting resupplied. Even worse is when last week’s hot seller is no longer in demand. Given today’s just-in-time inventory processes, the retail store could end up with new inventory that no longer matches the high demand seen in the prior reporting period.

In this blog post, we’ll discuss how machine learning anomaly detection for retail POS systems can identify sites that fail to consistently report data into a centralized data warehouse. This process makes use of time series-based algorithms, and we walk through the sequence of steps it involves.

[Figure: machine learning anomaly detection architecture diagram]

Faulty sales conversion statistics

The DB Best client, one of the largest international providers of beauty products, sought to track and count the visitors at every POS and the purchases they made. The tracking services send the overall number of visitors and sales to the centralized data warehouse every hour. However, there is no mechanism for detecting and reporting Internet connection issues, server issues, data inconsistency, or other problems that arise on the provider’s side. As a result, datasets that have not been validated against one another end up in the centralized data warehouse. In the long run, the client has faulty statistics on conversions, which are derived from the total number of visitors and the volume and type of products sold.

To give you a sense of the scale of the problem: with 2000 offline stores in more than 30 countries, our client aggregates a huge volume of data, and the probability of malfunctions is correspondingly high.

As a result, this client faces:

  • Unstructured and conflicting input from sales points;
  • A high percentage of incorrect data in the centralized data warehouse;
  • Production plans based on faulty sales data.

Predicting misleading data according to sales trends

To provide our client with proper statistics, we came up with a solution to identify the hours with a high probability of faulty results and send notifications. Thus, inaccurate data will not influence the statistics. Now the DB Best team is responsible for providing the client with reports that we generate with an ML-based service. This service automatically identifies anomalies in the centralized data warehouse and notifies users.

What is more, we created an intuitive user interface so that the reporting system managers can work with the data and obtain business intelligence statistics conveniently. To put our web application online, we use the open-source Shiny Server, so no extra hosting fees are required.

[Figure: total quantity of sales]

Implementing a seasonal-trend decomposition algorithm

People buy different products in the summer and winter periods as well as on holidays. To make reliable predictions and detect faulty data, we wanted to rely on the statistics of seasonal sales trends. For this project, we used an unsupervised learning technique: the Loess-based approach to seasonal-trend decomposition. Using this approach, we split the time-series signal into three parts: seasonal, trend, and remainder. This lets us forecast the signal. Afterwards, we test whether the value we’re tracking deviates from the forecasted value enough to consider it an anomaly.

Our general flow for anomaly detection involves the following workflows:

  1. First, we separate the time series into three components: the seasonal, the trend, and the remainder.
  2. When we have the remainder extracted, we apply anomaly detection methods to this component.
  3. Based on the historical data, we calculate limits that distinguish so-called “adequate” data from anomalies.
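
The three workflows can also be written down compactly. The notation below is our own shorthand and not part of the original service: y_t is the observed value at hour t, S_t, T_t, and R_t are the seasonal, trend, and remainder components, and ℓ and u are the lower and upper limits of the remainder. We assume the additive form, which matches finding the remainder by subtraction.

```latex
\begin{align*}
  y_t &= S_t + T_t + R_t
      && \text{(decomposition)} \\
  R_t &\notin [\ell,\, u] \;\Rightarrow\; \text{anomaly at hour } t
      && \text{(limits on the remainder)} \\
  S_t + T_t + \ell \;\le\; y_t &\le\; S_t + T_t + u
      && \text{(bounds for ``adequate'' observed values)}
\end{align*}
```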

The first workflow 

Within the first workflow, the machine learning algorithm performs a time series decomposition on the target value. Here’s the sequence of steps:

  1. We estimate seasonal or cyclical trends and long-term trends.
  2. We find the remainder by taking the actual values minus seasonal and long-term trends.
  3. Now we have the remainder that we want to analyze for detecting anomalies.
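
To make the decomposition steps above concrete, here is a minimal Python sketch. It is an illustration only, not the production service: it swaps the Loess smoothing for a simpler classical decomposition (a centered moving average for the trend and per-phase averages for the seasonal part). The function name `decompose`, the sample `sales` series, and the period of 4 are all assumptions made for the example.

```python
from statistics import mean


def decompose(values, period):
    """Split a series into additive seasonal, trend, and remainder parts.

    Simplified classical decomposition (NOT Loess-based): a centered
    moving average gives the trend, per-phase averages give the seasonal
    part. Positions where the moving average is undefined hold None.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the window has period + 1 points, so the two
            # end points each get half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
            trend[i] = total / period
        else:
            trend[i] = mean(window)

    # Average the detrended values for each position in the cycle.
    by_phase = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            by_phase[i % period].append(values[i] - t)
    phase_means = [mean(p) for p in by_phase]
    offset = mean(phase_means)  # center the seasonal part around zero
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    # Remainder = actual value minus trend minus seasonal part.
    remainder = [
        None if t is None else values[i] - t - seasonal[i]
        for i, t in enumerate(trend)
    ]
    return seasonal, trend, remainder


# Hypothetical hourly sales with a 4-step cycle and a slow upward trend.
cycle = [-3.0, 1.0, 4.0, -2.0]
sales = [20.0 + 0.5 * i + cycle[i % 4] for i in range(24)]
sales[13] += 15.0  # inject one suspicious reading

seasonal, trend, remainder = decompose(sales, period=4)
```

In this toy series, the largest remainder (in absolute value) lands on the injected reading at index 13, which is the signal the next workflow looks for.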

The second workflow

At this stage, we perform anomaly detection on the remainder:

  1. First, we estimate the lower limit of the remainder.
  2. Then, we estimate the upper limit of the remainder.
  3. Now we extract the remainders that do not fit the limits to determine the anomalies.
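
As a rough illustration of this stage, the sketch below estimates the two limits and extracts the remainders that fall outside them. The post does not say how the limits are computed, so the interquartile-range (IQR) rule, the multiplier `k`, the function names, and the sample remainder values are all assumptions.

```python
from statistics import quantiles


def remainder_limits(remainder, k=3.0):
    """Return (lower, upper) limits derived from the interquartile range.

    The IQR rule and the multiplier k are assumptions for illustration.
    """
    q1, _, q3 = quantiles(remainder, n=4)  # quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_anomalies(remainder, lower, upper):
    """Return the indices whose remainder falls outside [lower, upper]."""
    return [i for i, r in enumerate(remainder) if r < lower or r > upper]


# Hypothetical remainder values for 12 consecutive hours.
remainder = [0.4, -0.6, 0.1, 0.8, -0.3, 9.5, 0.2, -0.9, 0.5, -0.2, -7.8, 0.3]

lower, upper = remainder_limits(remainder)
anomalous_hours = find_anomalies(remainder, lower, upper)
```

A quantile-based rule is used here because a few extreme readings barely move the quartiles, whereas they would inflate limits based on the mean and standard deviation.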

The third workflow

Finally, we need to go back to the original values, which still contain the variation from the seasonal or cyclical trends. This workflow recomposes the seasonal component, the trend, and the estimated lower and upper limits of the remainder into new limits that bound the observed values. In this way, we have two new values created:

  1. The lower bound of anomalies around the observed value.
  2. The upper bound of anomalies around the observed value.
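
The recomposition can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the components are additive, and the function names, the sample components, and the remainder limits of -2.0 and 2.0 are hypothetical values chosen for the illustration.

```python
def recompose_bounds(seasonal, trend, lower_limit, upper_limit):
    """Shift the remainder limits back onto the scale of the observed values."""
    lower = [s + t + lower_limit for s, t in zip(seasonal, trend)]
    upper = [s + t + upper_limit for s, t in zip(seasonal, trend)]
    return lower, upper


def flag_observations(observed, lower, upper):
    """Return True for every observation that falls outside its bounds."""
    return [not (lo <= y <= hi) for y, lo, hi in zip(observed, lower, upper)]


# Hypothetical components for six consecutive hours.
seasonal = [-3.0, 1.0, 4.0, -2.0, -3.0, 1.0]
trend = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
observed = [17.2, 21.0, 25.4, 31.0, 18.6, 23.9]

lower, upper = recompose_bounds(seasonal, trend, lower_limit=-2.0, upper_limit=2.0)
flags = flag_observations(observed, lower, upper)
```

Because the bounds follow the seasonal and trend components hour by hour, a value that is normal at a busy hour can still be flagged at a quiet one.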

[Figure: sold quantity analysis]

Client’s gains and priceless perspectives

Now our client can rely on their reports thanks to the anomaly detection service that we apply hourly at every physical sales point. When the ML algorithm detects anomalies, it pushes them to the business intelligence report. The administrators of the reporting system then get an alert about the anomalies. After that, they can exclude faulty figures from the data transferred to the centralized data warehouse.
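
To show why excluding flagged hours matters for conversion statistics, here is a small Python sketch. The record layout `(hour, visitors, purchases)`, the function name `conversion_rate`, and all the numbers are hypothetical. The example assumes conversion is simply purchases divided by visitors.

```python
def conversion_rate(records, excluded_hours=frozenset()):
    """Purchases divided by visitors over the hours that are not excluded.

    Returns None when no visitors remain after the exclusion.
    """
    visitors = 0
    purchases = 0
    for hour, hour_visitors, hour_purchases in records:
        if hour in excluded_hours:
            continue  # skip hours flagged as anomalous
        visitors += hour_visitors
        purchases += hour_purchases
    return purchases / visitors if visitors else None


# Hypothetical hourly records: (hour, visitors, purchases).
# Hour 11 reports an implausible visitor count for its sales volume.
records = [
    (9, 100, 10),
    (10, 120, 12),
    (11, 5, 48),
    (12, 80, 8),
]

raw_rate = conversion_rate(records)
clean_rate = conversion_rate(records, excluded_hours={11})
```

With the faulty hour included, the conversion rate is more than double the rate computed from the remaining hours, so a single bad report is enough to distort the statistic.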

Applying this anomaly detection method, our client can now make smarter business decisions based on accurate sales data.

What is more, the company now has seasonal and other sales trends identified, which allows for predictions on any parameter. The service that we built is multipurpose: any kind of time series-based data for our client can be analyzed, and anomalies will be detected. By structuring the data for this project, our team and the client’s team laid a data-driven foundation for many more ML algorithms.

[Figure: sales quantity vs observation hour]

Consider machine learning for your business!

By applying time series analysis, we managed to resolve the problem of false data affecting sales statistics. Now the client takes into account only reliable datasets when generating conversion statistics and making business decisions.

If you feel you could benefit from aggregating or managing your data more efficiently, don’t hesitate to contact our team for BI consulting. We also invite you to dive deeper into the perspectives that machine learning opens for your business and to apply for data science consulting services from our experts.

Keep pace with the latest data science breakthroughs and get the most out of your data!
