Insights

Data to Value: is my process data good enough?

Written by Melina Weckman | 22/1/2024

According to a recent survey by Deloitte (2023), a striking 86% of surveyed US manufacturing executives believe that smart factory solutions will be the main driver of competitiveness over the next five years. At the core of most of these solutions lies data, or more specifically, high-quality data.

So, what makes data quality good?

In the process industry, "good data" refers to accurate, reliable, and relevant data collected from various sources within the industrial process for the purpose of producing high-quality information. The data is often stored in a data historian, a system that automatically acquires data from the diverse devices scattered throughout the plant and compresses it for long-term storage. If this sounds familiar, you are most likely set for data analysis.

Let's nevertheless delve deeper into the components that constitute "good data" in the process industry.

1. Accuracy

Accuracy is a key component of data quality. Inaccurate data leads to flawed analyses and misguided decisions, resulting in operational inefficiencies. To maximize the accuracy of process data, use robust data collection methods and error-minimizing technologies, such as smart sensors. For the machine learning models that form the foundation of advanced monitoring software such as Factory Harmonizer, the rule of thumb is simple: the more precise the data, the more accurate the resulting analysis.

2. Reliability

Reliable data is a pillar of informed decision-making. Consistency and stability in data streaming and collection are crucial: occasional disruptions in data collection cause turbulence for machine learning models that attempt to model the as-is process. Uphold reliability through regular maintenance, calibration, and monitoring. To detect sensor faults, consider adding soft sensors to your toolbox.
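As a rough illustration of how a soft sensor can flag a suspect instrument, the sketch below fits a simple regression that estimates one sensor's reading from a few correlated tags and flags time steps where the measurement drifts far from the estimate. The file name, tag names, and threshold are hypothetical placeholders, not part of any SimAnalytics tooling.

```python
# Minimal soft-sensor sketch: flag a suspect sensor by comparing its readings
# against a model driven by correlated tags. Column names and the data file
# are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("historian_export.csv", parse_dates=["timestamp"])

inputs = ["flow_rate", "inlet_temp", "pump_speed"]   # correlated tags (assumed)
target = "outlet_temp"                               # sensor being monitored

train = df.iloc[: len(df) // 2]                      # period known to be healthy
model = LinearRegression().fit(train[inputs], train[target])

# Residual = measured value minus soft-sensor estimate.
df["estimate"] = model.predict(df[inputs])
df["residual"] = df[target] - df["estimate"]

# Flag readings whose residual exceeds three standard deviations of the
# training-period residuals, a simple but common fault indicator.
threshold = 3 * (train[target] - model.predict(train[inputs])).std()
suspect = df[df["residual"].abs() > threshold]
print(f"{len(suspect)} potentially faulty readings for tag '{target}'")
```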

3. Relevance

Good data isn't just about the number of decimal points or continuous measurements - it's also about relevance. Data in itself does not provide any value until it can be turned into information. The most valuable information provides the user with actionable insights that they can use to improve the process. It is therefore important to align data collection with the specific goals of your operations by first defining clear objectives. What are you hoping to achieve with the data in the long run, and how does this align with your overall strategy? Once these objectives are defined, spend time planning your instrumentation architecture so that measurement instruments are installed wherever they provide crucial input for process analytics.

4. Quantity

While quantity does not always equal quality, when it comes to data, more is often more. Even so, many manufacturers are left with a data overload problem, finding themselves suffocating under a pile of unused data. Outsourcing data processing to a provider who is an expert in extracting value from data is an effective way to maximize the return on investments made in data collection. If the outsourced partner uses machine learning for data analysis, an estimated minimum of 300 counts for each “tag”, or metadata label, is preferred to build a reliable and accurate model. However, the required number of data points is highly dependent on the other contributing factors mentioned above as well as the nature of the conducted analysis.
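To make that rule of thumb concrete, a quick check such as the one below counts the valid (non-null) measurements per tag in a wide historian export and lists the tags that fall short of the 300-count guideline. The file name and column layout are assumptions for illustration only.

```python
# Quick quantity check: which tags meet the ~300-count rule of thumb?
# Assumes a wide historian export with one column per tag; file name is hypothetical.
import pandas as pd

MIN_COUNTS = 300  # guideline from the text; adjust to your analysis

df = pd.read_csv("historian_export.csv", index_col="timestamp")

counts = df.notna().sum()                 # valid measurements per tag
short = counts[counts < MIN_COUNTS]

print(f"{(counts >= MIN_COUNTS).sum()} of {len(counts)} tags meet the guideline")
print("Tags below the guideline:", list(short.index))
```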

5. Continuity

While continuity and reliability have their similarities, in the process industry it is important to separate the two. In batch production, for example, the data can be reliable, yet the very nature of batch production causes regular breaks in data collection, and the collected data differs from one batch to another. For machine learning models that depend on the continuity of data, such as the one behind the Harmony module of Factory Harmonizer, continuous production is often the most suitable setting for maximal value extraction. On the other hand, soft sensors that produce continuous estimates of noncontinuous variables, such as time-consuming lab measurements, can also be applied to batch production.

Data Health Check

The list above may leave you feeling hazy about your own data quality. No worries: most of our customers have been unsure about the quality of their data. Even when data is collected, standard connectivity protocols are not easy to abide by when old legacy systems are replaced one at a time. Similarly, while regular maintenance can improve reliability, sensor faults are bound to occur.

At SimAnalytics, we therefore conduct a thorough data health check before the start of any project. This process ensures that the quality of the data is sufficient to produce models that provide users with actionable insights. Before subjecting the data to the health check, a preparatory phase cleans up the dataset(s) to maximize their quality. This can include, among other things, removing textual elements and trimming redundant information, ensuring a streamlined dataset.
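A preparatory clean-up of that kind might look roughly like the sketch below: stray textual entries are coerced to numeric values, and flat-lined or duplicated columns are trimmed away. This is an illustrative simplification, not our actual pipeline; the file name and layout are assumed.

```python
# Illustrative pre-health-check cleanup: strip textual debris and trim
# redundant columns from a wide historian export (file name is hypothetical).
import pandas as pd

df = pd.read_csv("historian_export.csv", index_col="timestamp")

# Textual entries such as "Bad Input" or "Comm Fail" become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Trim redundant information: flat-lined tags and exact duplicate columns.
df = df.loc[:, df.nunique(dropna=True) > 1]
df = df.loc[:, ~df.T.duplicated()]

print(f"{df.shape[1]} tags remain after cleanup")
```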

Thereafter, the data is subjected to our health check funnel. Imagine a sophisticated system acting as a gatekeeper for data integrity. This system sifts through the data, excluding what can be deemed "bad data": null values and outliers are excluded, tags (sets of labelled data) with too many outliers are highlighted, tag activity is checked, and a minimum number of available measurements per tag is verified. At the end of the funnel, the customer is left with a list of all tags that can be subjected to machine learning as well as a list of insufficient tags that they can use to improve their data collection. The list of usable tags is often longer than expected.
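To give a feel for what such a funnel does, the sketch below applies the checks described above to a cleaned, wide-format dataset: null values are dropped, outliers are excluded, tags with too many outliers are highlighted, tag activity is verified, and a minimum number of measurements is required. The thresholds and file layout are illustrative assumptions rather than the actual Factory Harmonizer implementation.

```python
# Illustrative health-check funnel over a cleaned, wide-format dataset.
# Thresholds are assumptions, not SimAnalytics' production values.
import pandas as pd

MIN_COUNTS = 300          # minimum available measurements per tag
MAX_OUTLIER_SHARE = 0.05  # flag tags where >5% of points are outliers

df = pd.read_csv("cleaned_export.csv", index_col="timestamp")

usable, insufficient = [], []
for tag in df.columns:
    series = df[tag].dropna()                        # exclude null values
    # Outliers: points beyond 3 standard deviations from the tag's mean.
    outliers = (series - series.mean()).abs() > 3 * series.std()
    series = series[~outliers]                       # exclude outliers
    active = series.nunique() > 1                    # tag activity check
    enough = len(series) >= MIN_COUNTS               # minimum measurements
    too_noisy = outliers.mean() > MAX_OUTLIER_SHARE  # too many outliers

    if active and enough and not too_noisy:
        usable.append(tag)
    else:
        insufficient.append(tag)

print(f"Usable tags for machine learning: {len(usable)}")
print(f"Tags needing better data collection: {len(insufficient)}")
```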

Elevating Your Data Game

In conclusion, the journey from raw process data to valuable insights is a critical one for manufacturers seeking to enhance decision-making, optimize processes, and improve operational efficiency. As highlighted in this blog post, the quality of data plays a pivotal role in unlocking the potential of smart factory solutions. Yet while data quality is important, many underestimate the potential of the data they already have. It is therefore worth seeking an analytics provider who can help maximize its value. Read more about making this choice in our blog post ‘Key Factors to Consider When Selecting an Analytics Provider for Your Production’.