Hi! I am Susan Walsh - the Classification Guru - and I'm happy to join 365 Data Science in their great initiative: 365 Data Use Cases. I specialize in spend data classification. It also happens to be my favorite data use case and I am excited to share more about it with you in this article.
What Is Spend Data Classification?
Spend data classification is usually performed by an organization's procurement or finance department. It starts from a list of everything the organization has bought from each supplier over a certain period of time. Spend data usually contains the supplier name and, normally, an invoice description of the goods purchased.
Let me give you an example.
Let's say Staples is the supplier name and the goods your company has purchased are pens, pencils, paper, paper clips, etc. Or if you're in manufacturing, employees could have bought nuts, bolts, and screws, so that's what's listed within the company's spend data.
Why Do You Need to Normalize Data?
The normalization of data is crucial for data classification within the procurement process. As a first step, you need to standardize the data. So, what I do is take the file with all the suppliers and standardize each supplier's name to one term. The reason behind this is that, most of the time, procurement management executives don't actually know how much they're spending with one supplier: that supplier might have different divisions or operate in different countries, and all of that data is piled up together under different names. In other words, it's all in different formats.
Data Standardization and Data Taxonomy
A really great example is IBM. There are plenty of different variations - IBM, IBM Inc, IBM Ltd., I.B.M, etc. So, they should all be standardized to just IBM.
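As a rough illustration of the idea (a minimal sketch, not necessarily how it is done in practice), the IBM variants above could be collapsed with a few lines of Python. The function name `normalize_supplier` and the `LEGAL_SUFFIXES` list are assumptions made for this example; a real supplier file would need a longer suffix list and some manual review.

```python
import re

# Hypothetical list of legal suffixes to strip; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "plc", "gmbh"}


def normalize_supplier(raw_name: str) -> str:
    """Collapse spelling variants of a supplier name to one standard form."""
    # Drop dots first so "I.B.M" becomes "IBM" and "Ltd." becomes "Ltd".
    cleaned = raw_name.replace(".", "")
    # Turn any remaining punctuation into spaces, then split into tokens.
    tokens = re.sub(r"[^A-Za-z0-9 ]+", " ", cleaned).split()
    # Keep only the tokens that are not legal suffixes.
    kept = [t for t in tokens if t.lower() not in LEGAL_SUFFIXES]
    # Upper-case the result so casing differences also disappear.
    return " ".join(kept).upper()


variants = ["IBM", "IBM Inc", "IBM Ltd.", "I.B.M"]
standardized = {v: normalize_supplier(v) for v in variants}
print(standardized)
```

With these four variants, every value in `standardized` comes out as `IBM`. Note that the sketch upper-cases every name, so a supplier such as Staples would be stored as `STAPLES`.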
As the next step, I go through the list and classify the data using a taxonomy. A taxonomy is the set of classifications that I will use across the data set. Broadly speaking, it is a table with one to three levels, but there could be a fourth and a fifth level as well. To illustrate with the Staples example: facilities might be level one, office supplies might be level two, and paper might be level three.
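To make the three-level structure concrete, here is a small Python sketch of a taxonomy used as a lookup table. The "Paper" entry follows the Staples example above; the "Paper Clips" entry, the keyword-matching approach, and the names `Category`, `TAXONOMY`, and `classify` are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Category(NamedTuple):
    level_1: str
    level_2: str
    level_3: str


# Hypothetical taxonomy keyed by a keyword found in the invoice description.
TAXONOMY = {
    "paper": Category("Facilities", "Office Supplies", "Paper"),
    "paper clips": Category("Facilities", "Office Supplies", "Paper Clips"),
}

# Fallback for lines that match nothing and need manual review.
UNCLASSIFIED = Category("Unclassified", "Unclassified", "Unclassified")


def classify(description: str) -> Category:
    """Return the taxonomy entry whose keyword appears in the description."""
    text = description.lower()
    # Try longer keywords first so "paper clips" is not filed under "paper".
    for keyword in sorted(TAXONOMY, key=len, reverse=True):
        if keyword in text:
            return TAXONOMY[keyword]
    return UNCLASSIFIED


print(classify("A4 Paper, 500 sheets"))
print(classify("Paper clips, box of 100"))
```

Plain substring matching is crude and will misfile some lines, which is one reason the descriptions themselves also need standardizing.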
Then I go through the categories and perform data standardization and data classification once again. This is necessary because there are numerous ways to describe the same product on invoices and descriptions. As a matter of fact, screws are a great example of that. They could be listed as screw, screws, SCR, SCW, SCWs... As you can see, there are loads of different versions.
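The same idea can be sketched in Python for the screw example. The alias table below contains only the variants listed above; the names `DESCRIPTION_ALIASES` and `normalize_description` are hypothetical, and real invoice data would need a much longer mapping built up by review.

```python
import re

# Variants from the screw example; real data would need many more entries.
DESCRIPTION_ALIASES = {
    "screw": "screw",
    "screws": "screw",
    "scr": "screw",
    "scw": "screw",
    "scws": "screw",
}


def normalize_description(description: str) -> str:
    """Lower-case a description and replace known abbreviations word by word."""
    # Split into lower-case alphanumeric words, discarding punctuation.
    words = re.findall(r"[a-z0-9]+", description.lower())
    # Swap each word for its standard term if one is known.
    return " ".join(DESCRIPTION_ALIASES.get(word, word) for word in words)


for raw in ["screw", "Screws", "SCR", "SCW", "SCWs M4 x 20"]:
    print(raw, "->", normalize_description(raw))
```

Every variant maps to `screw`, and the last line becomes `screw m4 x 20`, so lines for the same product can be grouped together.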
What Is the Benefit of Spend Analysis and Spend Data Classification?
Data classification enables me to show the company's procurement analyst exactly how much they're spending on each product.
And what's the benefit of that?
Well, first of all, it translates into cost savings. Maybe the procurement manager doesn't realize how many screws or how much paper the company is buying. Spend data classification will help them discover that they could be negotiating better rates with their suppliers. Or maybe they will arrive at the following conclusion: “Oh, Supplier A is charging us this, but Supplier B's charging us that. So, we need to start using Supplier A more.” Another benefit is that spend data classification can show you how many suppliers you have per category. This is very important, especially if it turns out your company has 20 vendors for office supplies and you don't need that many. By concentrating that spend with two or three of your existing suppliers, you could certainly negotiate a better rate.
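Once the data is classified, both of these views (spend per category and suppliers per category) are simple aggregations. The sketch below uses made-up rows purely for illustration; the supplier names, categories, amounts, and the function name `summarize_spend` are all assumptions, not real figures.

```python
from collections import defaultdict

# Made-up rows for illustration only: (supplier, category, amount spent).
rows = [
    ("Staples", "Office Supplies", 120.0),
    ("Supplier A", "Office Supplies", 80.0),
    ("Supplier B", "Office Supplies", 50.0),
    ("Supplier A", "Fasteners", 200.0),
]


def summarize_spend(rows):
    """Return {category: (total_spend, number_of_distinct_suppliers)}."""
    totals = defaultdict(float)
    suppliers = defaultdict(set)
    for supplier, category, amount in rows:
        totals[category] += amount
        suppliers[category].add(supplier)
    return {cat: (totals[cat], len(suppliers[cat])) for cat in totals}


summary = summarize_spend(rows)
for category, (total, supplier_count) in sorted(summary.items()):
    print(f"{category}: total spend {total:.2f} across {supplier_count} supplier(s)")
```

A category with a high supplier count relative to its spend is a natural candidate for consolidation.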
So that was our whistle-stop tour of spend data classification. I hope you enjoyed it and it helped you learn the basic idea behind spend data classification.
If you have any questions, feel free to contact me on LinkedIn. I also have a website and a YouTube channel - The Classification Guru - where you can find more resources on classification, taxonomy, and normalization.
And if you’re new to data science and want to fully understand and distinguish between the various terms and processes in the field, check out the 365 Data Literacy course.