This article aims to unpack the step-by-step process of cleaning and classifying data within the DotActiv software using the ‘Screening And Classification’ feature.
Ensure that your Import Utility has been set up and that you are working on a database before continuing with the following steps.
To log in to a database, open the DotActiv software. Select the applicable configuration, untick the ‘No Database’ box, enter the relevant details and click ‘OK’, as seen below.
In the following steps, we will show you how to screen and classify data within the primary and secondary display structures, as well as the product and market fields, of the database that you are logged in to.
How to set up the Import Utilities
To make full use of the ‘Screening And Classification’ feature, you will need an Import Utilities instance set up with the necessary model trainers.
Please note that if you are unsure whether your environment has an Import Utilities instance, you should contact your DotActiv Account Manager or IT.
A new model trainer will need to be created. First, name your Model Trainer, then enter all the necessary credentials under the connection tab. This tab is where you link your Import Utilities to your DotActiv database.
Then under the Models tab, you will select what Model type you want to run for your instance so that the software can learn about what already exists and then provide predictions for that field.
The schedule tab is where you can choose to set up a schedule that the model will run on or you can choose to not have a schedule and to run the model manually.
Please note that your DotActiv software will need to connect to the Import Utility in order to obtain the information from the training model. Whether it can do so depends on the server that the Import Utilities are installed on and on whether that server's network is private or public facing. Depending on how the server is set up, the Import Utilities will need to be configured to allow communication between the DotActiv software and the Import Utilities.
Machine Learning
To understand how Machine Learning impacts this feature, it is important to understand the core of Machine Learning. Machine Learning is defined as: “…the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences (conclusions reached on the basis of evidence and reasoning) from patterns in data.”
The Screening and Classification feature makes use of Machine Learning to suggest what data should populate your fields. When your training models are selected (be it the Primary Display Structure, Secondary Display Structure, Brand, Suppliers, Dimensions, or Size and UOM), the software will take the few fields that are already populated and pick up a pattern. From there, the software will make statistical predictions, based on these patterns, as to what the blank values could potentially be.
In order for these decisions to be made by the software, existing data needs to be present. The software cannot make use of empty fields to make statistical predictions. The software requires a minimum of 2 to 3 distinct classifications with at least 5 products in each of those distinct classifications in order to function correctly. Without this foundation, the software cannot train itself.
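As an illustration only, the minimum requirement above can be expressed as a quick check that you could run on an exported list of classifications before training. This is a minimal sketch and not part of the DotActiv software. The function name `has_enough_training_data` and its parameters are hypothetical, and it assumes the requirement means at least 2 distinct classifications that each contain at least 5 products.

```python
from collections import Counter


def has_enough_training_data(classifications, min_classes=2, min_products=5):
    """Hypothetical check: are there enough populated values to train on?"""
    # Blank values cannot be used for training, so ignore them.
    populated = [c.strip() for c in classifications if c and c.strip()]
    counts = Counter(populated)
    # Keep only the classifications that have enough products behind them.
    usable = [name for name, n in counts.items() if n >= min_products]
    return len(usable) >= min_classes


# Example data (made up): one classification value per product.
categories = ["Coffee"] * 6 + ["Tea"] * 5 + ["Juice"] * 2 + ["", None]
print(has_enough_training_data(categories))
```

In this example, ‘Coffee’ and ‘Tea’ each have at least 5 products, so the check passes even though ‘Juice’ has too few products and two values are blank.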
Every time the model runs on its schedule (weekly, daily or on selected days), it learns from the decisions that the user confirmed or changed manually during the screening process. This improves its suggestions whenever new classifications need to be populated, as it ‘learns’ from previous runs.
How To Screen Data
Step 1: Once logged in, navigate to the ‘Data’ tab, and click on ‘Screening And Classification’ which is found in the Processing section of the toolbar.
The ‘Screening And Classification’ feature uses a wizard setup approach which guides you through the process step-by-step.
Step 2: After you have clicked on ‘Screening And Classification’, the Data Screening Wizard will open. The first step is to select the data screening area that you would like to work with. You can choose from the following options in the drop-down menu:
- Primary Display Structure
- Secondary Display Structure
- Product Fields
- Market Fields
Once happy, click ‘Next’.
Step 3: You are now able to select the fields that you would like to screen for missing classifications by simply ticking the boxes of the relevant fields, as seen below. You can also tick the box next to ‘Field’ as a shortcut if you’d like to use all the fields listed in the window.
The ‘Master Display Structure Coverage’ checkbox is displayed at the bottom of the page if you have selected either the primary or secondary display structure as your screening area. Tick the ‘Master Display Structure Coverage’ box.
Click on ‘Next’ once you’re done.
Step 4: Lastly, you have the option to add any data screening filters.
If you would like to add filters, click on the ‘Edit’ button as seen in the screenshot below.
This will open the ‘Filters’ window where you can now add the filters. The fields available for filtering depend on the screening area that you have selected. For example, product fields are available for the primary display structure, secondary display structure and product fields screening areas, while market fields are available for the market fields screening area.
Once you have added your filters, click on ‘OK’.
Please note that this step is optional.
You can click on ‘Finish’ to start the screening process.
This will open up the ‘Data Screening Results View’ which gives you a summary of the ‘Total Missing’ and ‘% Missing Of Items Screened’ by each field that you selected in the setup process.
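To make the two summary figures concrete, here is a minimal sketch of how ‘Total Missing’ and ‘% Missing Of Items Screened’ could be calculated for each field. It is an illustration under assumptions, not the software's own implementation. The function name `screening_summary`, the sample rows, and the choice to treat empty or whitespace-only values as missing are all hypothetical.

```python
def screening_summary(rows, fields):
    """Hypothetical summary: missing count and percentage per field."""
    total = len(rows)
    summary = {}
    for field in fields:
        # Assumption: None, empty and whitespace-only values count as missing.
        missing = sum(1 for row in rows if not str(row.get(field) or "").strip())
        pct = round(100 * missing / total, 1) if total else 0.0
        summary[field] = {"total_missing": missing, "pct_missing": pct}
    return summary


# Example data (made up): four screened products.
products = [
    {"Barcode": "1001", "Brand": "Acme", "Supplier": ""},
    {"Barcode": "1002", "Brand": None, "Supplier": "North Foods"},
    {"Barcode": "1003", "Brand": "Acme", "Supplier": "South Foods"},
    {"Barcode": "1004", "Brand": "", "Supplier": "North Foods"},
]
print(screening_summary(products, ["Brand", "Supplier"]))
```

With this sample data, ‘Brand’ is missing on 2 of the 4 products (50%) and ‘Supplier’ on 1 of the 4 (25%).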
From the ‘View’ section in the toolbar, you are able to view the summary in either a ‘Grid’ or ‘Graph’ form by switching between them.
If you select ‘Grid’, a grid with all screened fields, their results and the filters used (if applicable) will be displayed.
If you select ‘Graph’, you can choose how you’d like the software to display the data. For example, it can be displayed as a Stacked Graph or a Pie Chart, or you can view all fields in a Multiple Pie.
The Graph View shows two graphs that have combined results from all fields screened and one graph per field screened.
Please note that you can export these graphs by clicking on the ‘Export’ button on the right-hand side of the graph. This will export the graph to a PNG file format.
How To Classify Data
Step 1: From the ‘Data Screening Results View’, navigate to the ‘Action’ section in the toolbar. Here, you can click on ‘Classify’, if you’d like to classify the data you selected during the setup process.
Step 2: Once you’ve clicked on ‘Classify’ the ‘Data Classification Wizard’ will appear.
The following steps in the wizard are applicable to the screening area that you have selected. We have therefore split the rest of the steps into two different sections. The first is applicable to those who have selected the primary or secondary display structure or product fields as their screening area. The second section will detail the steps to follow if you have selected market fields as your screening area.
Primary Display Structure/ Secondary Display Structure/ Product Fields
Firstly, you will be presented with the ‘Missing Required Product Values’ page. If you have selected the primary or secondary display structure or product fields as your screening area, and required key field values or description values are missing, you will need to complete this step.
In this step, you will need to provide values where key ‘Barcode’ and ‘Product Description’ values are missing. No duplicate key values will be allowed.
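If you want to check a list of barcodes for duplicates before entering them, a small script along the following lines could help. This is a sketch only and does not reflect how the software itself validates key values. The function name `find_duplicate_keys` is hypothetical, and it assumes that surrounding spaces should be ignored when comparing barcodes.

```python
from collections import Counter


def find_duplicate_keys(barcodes):
    """Hypothetical helper: return the barcodes that appear more than once."""
    # Assumption: blanks are skipped and surrounding spaces are ignored.
    counts = Counter(b.strip() for b in barcodes if b and b.strip())
    return sorted(code for code, n in counts.items() if n > 1)


# Example data (made up).
print(find_duplicate_keys(["1001", "1002", "1001", "", " 1002 ", "1003"]))
```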
Click ‘Next’ to continue.
You will now be presented with the ‘Display Structure Review’ page, which is applicable to the primary and secondary display structure screening areas. In this window, you will see the applicable master display structure.
This display structure can be filtered on the left-hand side of the page and zoomed in at the bottom. If you’d like to export the display structure, you can export it to a PNG file format from the bottom right of the window. However, if you would like to reset the display structure, you can do so from the bottom left of the window.
Click ‘Next’ once you are happy.
The ‘Classification’ page will appear. It first loads all predictions, product library values and customer values for the fields where values are missing.
The ‘Prediction Models’ button on the top left-hand side of the page is enabled if the Import Utility integration has been completed and a connection can be made. By clicking this button, a window will pop up detailing what models exist throughout the display structure.
Any new classifications in the grid will have a coloured badge in the cell indicating where the value is coming from. We break down the colour indicators below.
- Green, Orange and Red coloured badges with a percentage are predictions. Predictions require baseline classifications to first be completed so that predictive models can be created using the baseline classifications.
- Grey with ‘GPL’ text is from the Global Product Library.
- Grey with ‘CF’ text is from the product’s corresponding custom field, which is the field with the same name in the custom section.
Please note that you will not be able to reclassify Group, Department or Category fields if there is an existing value in the grid cell.
When classifying the data, a dropdown is shown with the product description for reference, followed by prediction values with their accuracy (if the Import Utility is integrated), Product Library values for the field (fetched by matching on the product’s barcode), Master Display Structure values for the field, and Customer field values as a reference point.
If you select a new value, the software will re-predict values down the display structure taking into account the newly selected value.
Please note that any violations of the Master Display Structure are shown in red, and you should not be able to continue to the next page if there are any violations.
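To illustrate what a violation might look like, the sketch below treats the Master Display Structure as a set of allowed classification paths and flags any product whose path is not in that set. This is an assumption made for illustration, as the software's actual validation rules are not described here. The names `MASTER_STRUCTURE` and `find_violations`, and the sample hierarchy, are hypothetical.

```python
# Hypothetical master structure: each tuple is one allowed path
# (Department, Category, Subcategory).
MASTER_STRUCTURE = {
    ("Beverages", "Hot Drinks", "Coffee"),
    ("Beverages", "Hot Drinks", "Tea"),
    ("Beverages", "Cold Drinks", "Juice"),
}


def find_violations(products, master=MASTER_STRUCTURE):
    """Return the barcodes whose classification path is not in the master."""
    violations = []
    for barcode, path in products.items():
        if tuple(path) not in master:
            violations.append(barcode)
    return violations


# Example data (made up): barcode -> classification path.
classified = {
    "1001": ("Beverages", "Hot Drinks", "Coffee"),
    "1002": ("Beverages", "Cold Drinks", "Tea"),  # Tea is not under Cold Drinks
}
print(find_violations(classified))
```

Here, product 1002 would be flagged because ‘Tea’ does not exist under ‘Cold Drinks’ in the sample structure.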
On the ‘Review Classifications’ page, modified product rows with changes are indicated in red. Here you can edit the modified values.
Market Fields
Firstly, you will be presented with the ‘Missing Required Market Values’ page. If you have selected market fields as your screening area and have values missing, you will need to complete this step. In this step, you will need to provide values where key ‘Store Name’ and ‘Chain/Retailer Name’ values are missing. No duplicate key values will be allowed.
Next, you will be presented with the ‘Geographic Data Classification’ page.
The geographic fields screened are shown in a grid and missing data is looked up from the Google Places API using a combination of the Store Name and Chain/Retailer Name information.
Results can be viewed in a Google Maps window by clicking on the ‘View’ button in the results column. In this view, new geographic data can be used by searching and selecting the marker and clicking ‘Use This Location’ in the pop-up shown next to the marker.
If you select the ‘Lookup Location’ button on the top left-hand side of the screen, it looks up new data using the current Store Name and Chain/Retailer Name values.
The ‘Field Mappings’ button on the top right of the page allows the mapping of fields to those returned from the Google Places API.
Data Classification Non-geographic fields:
Non-geographic fields screened are shown in a grid and data can be classified by user input.
Lastly, we have the ‘Review Classifications’ page, in which changes are also shown in red and can be modified.
Completing The Classification Process
Finishing the Data Classification Wizard for all the different screening areas will show an ‘Update Database’ window, which will prompt you to accept if you’d like to update the database.
After updating the database you should be prompted to generate a ‘Data Screening Post Classification Report’ which compares the screening results pre- and post-classification.
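As a rough illustration of what such a comparison involves, the sketch below takes the ‘Total Missing’ count per field before and after classification and works out how many missing values were resolved. It is not the report the software generates. The function name `compare_screening` and the sample counts are hypothetical.

```python
def compare_screening(pre, post):
    """Hypothetical comparison of missing counts before and after classifying."""
    report = {}
    for field, before in pre.items():
        # If a field was not screened again, assume its count is unchanged.
        after = post.get(field, before)
        report[field] = {
            "before": before,
            "after": after,
            "resolved": before - after,
        }
    return report


# Example data (made up): Total Missing per field.
pre_counts = {"Brand": 12, "Supplier": 7}
post_counts = {"Brand": 3, "Supplier": 0}
for field, row in compare_screening(pre_counts, post_counts).items():
    print(field, row)
```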