Blog

Balancing Flexibility and Stability in Data Analytics

Stability and flexibility are seemingly conflicting requirements. So how do you balance the two in data analytics?

If you work in the data industry, you’re probably familiar with this all-too-common paradox: Almost daily, your business users request improvements to their data. These requests might include improving the accuracy of existing data or adding records to fill in data gaps. Yet the minute you deliver these improvements, those same business users complain that the changes they requested have altered the results of the historical queries they rely on. Can’t you please change them back?

How do you handle such finicky behavior – finicky, at least, from your point of view? We sympathize, and we have two pieces of advice for you:

  1. Avoid destructive changes to your analytics database
  2. Involve business teams in your planning

Avoid Destructive Changes to Your Analytics Database

One way to achieve stability in a dynamic database is to avoid destructive changes. A destructive change is any change that would alter the result of a query on historical data. Examples include changing the values (or meaning) of an existing data column, or adding or deleting records for a past time period.

Whenever possible, you should add new datasets instead of altering existing datasets, or add new columns instead of modifying existing columns. You can generally make these non-destructive changes without impacting your existing users. The users who do want the new and improved data can use the new dataset or new columns. Other users can simply ignore them. (Note that you should encourage your users to use explicitly named fields in their “SELECT” statements, and avoid using “SELECT *”, which could unintentionally pick up newly added columns.)
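To make the point about explicit field names concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name (sales), the columns, and the new region_v2 column are hypothetical, invented purely for illustration; your own schema and database engine will differ.

```python
import sqlite3

# Hypothetical demo table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0)],
)

# A historical query with explicitly named fields, and the SELECT * equivalent.
EXPLICIT_QUERY = "SELECT order_id, region, amount FROM sales ORDER BY order_id"
STAR_QUERY = "SELECT * FROM sales ORDER BY order_id"

explicit_before = conn.execute(EXPLICIT_QUERY).fetchall()
star_before = conn.execute(STAR_QUERY).fetchall()

# Non-destructive change: add a new column instead of overwriting `region`.
conn.execute("ALTER TABLE sales ADD COLUMN region_v2 TEXT")
conn.execute(
    "UPDATE sales SET region_v2 = "
    "CASE region WHEN 'East' THEN 'Northeast' ELSE region END"
)

explicit_after = conn.execute(EXPLICIT_QUERY).fetchall()
star_after = conn.execute(STAR_QUERY).fetchall()

# The explicit query returns the same rows as before the change;
# the SELECT * query now carries an extra column.
print("explicit query unchanged:", explicit_before == explicit_after)
print("SELECT * unchanged:", star_before == star_after)
```

In this sketch the explicit query is untouched by the new column, while the SELECT * query changes shape, which is the kind of surprise that breaks downstream reports.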

Adding datasets that are identical to a previous set but with an additional tranche of records, or adding a new column that is identical to a previous column but with one additional category of values, may seem anathema to a database administrator schooled in third normal form and data normalization. But this seeming inefficiency allows your business users to continue working in blissful ignorance of the day-to-day changes and improvements that occur behind the scenes. The extra storage space is more than paid for by the ongoing productivity of the business users.

Won’t these non-destructive changes eventually pile up or clutter your analytics database? Yes, you will eventually have multiple versions of many tables, and multiple columns that all have confusingly similar names or meanings. But you can manage this clutter via a “Versioning and Restatement” process.

Keep in mind that not all desired changes are significant enough to warrant a new column or new table. Place minor changes that affect only a small number of records, and are therefore not material to general trends or insights, on a backlog to be handled later. The Versioning and Restatement process also gives you an opportunity to implement those minor changes, clearing out your data backlog as well.
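The details of a Versioning and Restatement process are beyond the scope of this post, but one possible way to keep versioned tables manageable is to give each restatement its own table and point a "current" view at the latest one. The sketch below assumes that approach; the names customers_v1, customers_v2, customers_current, and point_current_at are hypothetical, and this is an illustration rather than a prescribed implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of a hypothetical dataset.
conn.execute(
    "CREATE TABLE customers_v1 (customer_id INTEGER PRIMARY KEY, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers_v1 VALUES (?, ?)",
    [(1, "Retail"), (2, "Retail")],
)

# Version 2 is a restatement: one corrected segment and one added record.
conn.execute(
    "CREATE TABLE customers_v2 (customer_id INTEGER PRIMARY KEY, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers_v2 VALUES (?, ?)",
    [(1, "Retail"), (2, "Wholesale"), (3, "Retail")],
)


def point_current_at(connection: sqlite3.Connection, versioned_table: str) -> None:
    """Repoint the hypothetical customers_current view at one versioned table."""
    # The table name is interpolated into SQL, so accept only simple identifiers.
    if not versioned_table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {versioned_table!r}")
    connection.execute("DROP VIEW IF EXISTS customers_current")
    connection.execute(
        "CREATE VIEW customers_current AS "
        f"SELECT customer_id, segment FROM {versioned_table}"
    )


point_current_at(conn, "customers_v1")
count_before = conn.execute("SELECT COUNT(*) FROM customers_current").fetchone()[0]

point_current_at(conn, "customers_v2")
count_after = conn.execute("SELECT COUNT(*) FROM customers_current").fetchone()[0]

# Users who pinned their queries to v1 still see the original records.
count_pinned = conn.execute("SELECT COUNT(*) FROM customers_v1").fetchone()[0]
```

Under this assumption, users who want the latest data query the view, while users who need stable history keep querying the version they started with until they choose to move.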

Involve Business Teams in Your Planning

The ultimate stakeholders of your data analytics system are the business users. In the end, data scientists and data analysts are really just trying to find insights that will allow the business to grow revenue, save costs, or reduce risk. The business team may have ideas about the results that they are trying to obtain, but they may not know the best way to achieve them. For instance, they may say they need a better drill, but what they really need is a hole.

Therefore, involving business users directly in the planning process for your analytics database is vital. Maintaining the data catalog and data dictionary in real time is a basic part of any data management plan, but it is one-way communication. To foster greater involvement, you need a feedback loop where the business users can communicate both requests and problems, and see updates and fixes as they arrive.

One possibility is an issue-tracking system accessible to both data engineers and business users. The issue-tracking system can communicate both the problems (from the business) and the resolutions to those problems (from the engineers). For extra bonus points, you can ingest the issues themselves into the data lake on an hourly or daily basis. You can even publish a weekly or monthly newsletter to highlight the major changes and present new use cases.

You can collaborate better toward the common goal when 1) business teams understand the concepts of destructive vs. non-destructive data changes, and 2) data teams understand the historical queries and reports on which the business teams depend. Together, you can achieve the impossible and balance stability and flexibility in data analytics.

CoStrategix is a strategic technology consulting and implementation company that bridges the gap between technology and business teams to build value with digital and data solutions. If you are looking for guidance on how to mature your data analytics capabilities, we can help you leverage best practices to enhance the value of your data.