
Experimenting with Large Language Models for Brand Matching


We tested a Large Language Model (LLM) against existing Machine Learning (ML) models already in place. Interestingly, we found that the LLM identified a brand in almost all of the samples where the original ML method couldn’t.


Around the time that Azure OpenAI and ChatGPT hit the market, we began working with a client to demonstrate the viability of a Large Language Model (LLM). Our goals were twofold: First, we wanted to show that a GPT model could be applied successfully to a private dataset. Second, we aimed to compare the success rate of an LLM against the existing Machine Learning (ML) models already in place.

In this case, we were working with a large retailer that aimed to increase the throughput of brand matching: that is, how quickly they could add new products into their catalog/marketplace by matching each product against a list of known brands. The retailer receives thousands of new products to add to their catalog each month. These products must be processed, validated, and matched (to check for existing or duplicate brands) before they can be entered into the Product Information Management (PIM) system.

Previously, the retailer was relying heavily on manual effort to evaluate the vendor product descriptions, identify the product brand, and then match that brand against the list of known brands. This manual effort is costly and difficult to scale as the number of vendors and products continues to grow. The client was already seeing bottlenecks and backlogs in the workflow for publishing new products. Humans were spending many hours on tasks that could potentially be automated or augmented, including:

  • Looking at a product description from a vendor
  • Identifying the brand either from experience or by looking it up on the Internet
  • Matching the brand with a known brand or creating a new brand in the PIM

Identifying a brand from the product description can be more difficult than you might think. That’s because:

  • Some vendors list the product name before the brand name (e.g., Ultimate Lipstick Love – Petal by Becca for Women – 0.12 oz Lipstick)
  • Others list the brand name first (e.g., Krud Kutter 1014061 5 gal Heavy Duty Cleaner & Disinfectant)
  • Sometimes it’s unclear which part of the text is the brand name and which is part of the product descriptor
  • Still other times, the product name and company name may both be considered part of the desired brand name (e.g., Kellogg Pop-Tarts)

Results

We were successful in fine-tuning an LLM using the private dataset. We found that overall, the LLM was significantly faster than humans – with sub-second latency for the LLM solution, whereas the human latency was measured in minutes. The LLM also matched the accuracy of the manual process. (Note that the humans involved in the manual process were very accurate but not infallible; objective accuracy was established through an audit of both outputs.)

We also compared the LLM approach to a traditional ML approach. The traditional ML approach uses techniques such as Named Entity Recognition (NER), Long Short-Term Memory (LSTM) networks, and n-gram models, trained on labeled data (supervised learning). The LLM and the traditional ML model performed similarly in terms of latency and accuracy, both at about 80%. But the LLM had roughly 20% better coverage – that is, it was able to identify the brand in approximately 20% more samples than the traditional ML model. The interesting part was that the LLM found a brand in almost all of the samples where the original ML method couldn’t. (These are cases where the traditional ML model’s response was inconclusive or had low confidence.) Upon inspection, we found that the LLM successfully captured some of the more obscure edge cases for brand identification, which was a critical success factor for the client. Given the advancement of LLMs in just the last few months, we believe the coverage of the LLM approach would likely be even higher today.
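
To make the distinction between accuracy and coverage concrete, here is a minimal sketch of how the two numbers can be computed from a set of model outputs. The tuple format and the “I don’t know” sentinel below are illustrative assumptions, not our actual evaluation harness.

```python
# Illustrative only: accuracy vs. coverage for brand predictions.
# Each item is (predicted_brand, approved_brand); "I don't know" marks an
# inconclusive answer from the model.
UNKNOWN = "I don't know"

def evaluate(predictions):
    answered = [(pred, truth) for pred, truth in predictions if pred != UNKNOWN]
    coverage = len(answered) / len(predictions)   # share of samples where a brand was returned
    correct = sum(1 for pred, truth in answered if pred.lower() == truth.lower())
    accuracy = correct / len(answered) if answered else 0.0
    return coverage, accuracy

# Example: 3 of 4 samples answered (75% coverage), 2 of those 3 correct (~67% accuracy)
sample = [
    ("Krud Kutter", "Krud Kutter"),
    ("Becca", "Becca"),
    ("Petal", "Kellogg Pop-Tarts"),
    (UNKNOWN, "Sharpie"),
]
print(evaluate(sample))
```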

Our Process

In our trial, we first applied a general pre-trained language model. That was not very helpful, because the base model wasn’t designed to respond to a variety of natural language prompts. We then fine-tuned the Generative Pre-trained Transformer (GPT) model by applying prompt engineering to existing, approved training data (vendor descriptions paired with their approved brand matches). In this way, we were able to create 1,000 fine-tuned prompts.
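
As a rough illustration of what building those fine-tuned prompts can look like, here is a hedged sketch that converts (description, approved brand) pairs into the prompt/completion JSONL format used by OpenAI-style fine-tuning. The field framing, separators, and file name are illustrative assumptions, not the client’s actual schema.

```python
import json

# Approved training data: vendor description paired with its approved brand match.
# (Illustrative rows only; the real dataset contained roughly 1,000 such pairs.)
approved_pairs = [
    ("Ultimate Lipstick Love - Petal by Becca for Women - 0.12 oz Lipstick", "Becca"),
    ("Krud Kutter 1014061 5 gal Heavy Duty Cleaner & Disinfectant", "Krud Kutter"),
]

def to_finetune_record(description, brand):
    # prompt/completion record for OpenAI-style fine-tuning; the
    # "Description:/Brand:" framing and trailing newline are illustrative choices.
    return {
        "prompt": f"Description: {description}\nBrand:",
        "completion": f" {brand}\n",
    }

with open("brand_finetune.jsonl", "w") as f:
    for description, brand in approved_pairs:
        f.write(json.dumps(to_finetune_record(description, brand)) + "\n")
```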

For prompt engineering, we provided input in the form of natural language to show examples of how to solve specific tasks. We had to train the model to “infer” a brand, or else say “I don’t know.” (This task is similar to the feature engineering step involved in preparing data to train standard ML models.)
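
Here is a minimal sketch of the kind of prompt framing we mean: a few worked examples followed by the new case, with an explicit instruction to answer “I don’t know” when no brand can be inferred. The exact wording below is illustrative, not the prompt we ultimately used.

```python
def build_prompt(description):
    # Few-shot prompt: show the model worked examples, then pose the new case.
    # The "I don't know" instruction keeps inconclusive cases explicit instead of forcing a guess.
    return (
        "Identify the brand in each product description. "
        "If the brand cannot be determined, answer \"I don't know\".\n\n"
        "Description: Krud Kutter 1014061 5 gal Heavy Duty Cleaner & Disinfectant\n"
        "Brand: Krud Kutter\n\n"
        "Description: Ultimate Lipstick Love - Petal by Becca for Women - 0.12 oz Lipstick\n"
        "Brand: Becca\n\n"
        f"Description: {description}\n"
        "Brand:"
    )

print(build_prompt("Sharpie 30001 Fine Point Permanent Marker, Black"))
```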

Once those 1,000 fine-tuned prompts were created, we were able to train the LLM using these prompts. The test data set (a collection of new and previously unseen product descriptions) was then formatted using the same prompt format, and the LLM was tasked to process the test data. As discussed above, the LLM was able to find the brand name in 20% more cases than traditional ML, and with an accuracy that was equivalent to the manual process.
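
For completeness, here is a minimal sketch of that inference step over unseen descriptions, using the OpenAI Python client as a stand-in (the actual work ran on the Azure OpenAI service, which has an equivalent AzureOpenAI client). The model id and prompt wording are placeholders.

```python
from openai import OpenAI  # stand-in; the project used the Azure OpenAI service

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_brand(description):
    # Format the unseen description the same way as the training prompts,
    # then ask the (hypothetical) fine-tuned model for the brand.
    prompt = (
        "Identify the brand in the product description. "
        "If it cannot be determined, answer \"I don't know\".\n\n"
        f"Description: {description}\nBrand:"
    )
    response = client.chat.completions.create(
        model="ft:gpt-3.5-turbo:brand-matcher",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20,
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(predict_brand("Sharpie 30001 Fine Point Permanent Marker, Black"))
```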

ChatGPT is forcing many companies to explore how to determine AI’s applicability across their organization. CoStrategix can help you consider new use cases and the implications for your business model. Contact us to learn more.