AI applications make decisions based on data, which in engineering applications is usually obtained from sensors. Being able to trust the measured variables is essential. It is therefore unavoidable to deal with the measurement errors that occur in practice and can lead to wrong decisions by the AI. According to the International Organization for Standardization (ISO), a measurement error is defined as the result of a measurement minus the true value of the measured quantity. In this context, measurement error cannot always be clearly separated from the processing of the data. At PANDA, we group measurement errors by the strategies used to identify and deal with them.
Bias is a systematic shift of the measured value relative to the normal behavior of a sensor. Avoiding this error usually requires an adjustment against a reference. If this adjustment is missing or only partially correct, the conversion is wrong and, as a consequence, the quantity is measured systematically incorrectly. Common causes include incorrect gain factors, incorrect offsets, unit conversion errors, and non-linearities that have not been taken into account. Such errors also arise from an incorrectly performed calibration of a sensor or the measurement chain. The reference itself, for example a reference voltage at the AD converter, can also be subject to change.
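As a minimal sketch of how gain and offset follow from a reference adjustment, the following shows a two-point calibration against two known reference values. The function names and the ADC counts are illustrative, not part of any PANDA API; an error in either reference measurement here propagates into every converted value.

```python
def two_point_calibration(raw_low, raw_high, ref_low, ref_high):
    """Derive gain and offset from two reference measurements.

    A wrong gain or offset at this step biases every later reading,
    which is why adjustment against a trusted reference matters.
    """
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return gain, offset


def apply_calibration(raw_value, gain, offset):
    """Convert a raw reading (e.g. ADC counts) into physical units."""
    return gain * raw_value + offset


# Illustrative ADC: 100 counts at 0.0 V reference, 900 counts at 4.0 V
gain, offset = two_point_calibration(100, 900, 0.0, 4.0)
assert abs(apply_calibration(500, gain, offset) - 2.0) < 1e-9
```

A unit conversion error of the kind described above corresponds to applying the wrong `gain`, and a shifted reference to a wrong `offset`.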
If the sensor is incorrectly dimensioned, the signal level can exceed the measurement range of the sensor or the electronics, so that the recorded values no longer show any change beyond a limit value. This behavior is called clipping.
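Clipping leaves a recognizable fingerprint: runs of samples pinned at the range boundary. A simple sketch of such a check, with the range limits and minimum run length as assumed parameters, could look like this:

```python
def detect_clipping(samples, lower, upper, min_run=3):
    """Return (start, end) index ranges where the signal sits at a limit.

    Several consecutive samples pinned at the measurement-range boundary
    suggest the true signal exceeded the sensor's range.
    """
    runs, start = [], None
    for i, v in enumerate(samples):
        pinned = v <= lower or v >= upper
        if pinned and start is None:
            start = i
        elif not pinned and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


# A signal that saturates at the assumed +/-1.0 range limit:
signal = [0.1, 0.8, 1.0, 1.0, 1.0, 0.7, 0.2]
assert detect_clipping(signal, lower=-1.0, upper=1.0) == [(2, 5)]
```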
Drift describes measured values that deviate from the true value over time due to changes in the sensor, for example through aging, an irreversible chemical reaction, or temperature dependencies under changing environmental conditions. The readings then no longer match the calibration, and the calibration must be adjusted.
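One way to quantify drift, sketched below under the assumption that the true quantity is known to be constant during the observed period, is to fit a least-squares slope to the readings; a slope clearly different from zero indicates drift. The function is illustrative, not a PANDA routine.

```python
def estimate_drift(times, values):
    """Least-squares slope of readings over time (units per time unit).

    Applied to a period where the measured quantity is known to be
    constant, a nonzero slope quantifies the sensor's drift rate.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


# A constant quantity read over 5 hours, drifting by +0.02 per hour:
hours = [0, 1, 2, 3, 4]
readings = [10.00, 10.02, 10.04, 10.06, 10.08]
assert abs(estimate_drift(hours, readings) - 0.02) < 1e-9
```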
A common error in sensor data is missing data in the recording, resulting in an incomplete data set. Missing data has a variety of causes, including unstable connections due to network problems, especially with wireless links; sensor device failures due to limited electronics or battery life; environmental disturbances such as people, walls, and weather conditions; and software bugs.
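Given a nominally equidistant sample stream, gaps can be found by comparing consecutive timestamps against the expected interval. A minimal sketch, with the tolerance threshold as an assumed parameter:

```python
def find_gaps(timestamps, expected_interval, tolerance=0.5):
    """Return (start, end) timestamp pairs where samples are missing.

    Assumes sorted, nominally equidistant timestamps; a spacing larger
    than expected_interval * (1 + tolerance) counts as a gap.
    """
    limit = expected_interval * (1 + tolerance)
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > limit]


# 1 Hz sampling with a dropout between t=3 and t=7:
ts = [0, 1, 2, 3, 7, 8, 9]
assert find_gaps(ts, expected_interval=1) == [(3, 7)]
```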
Hanging values remain constant over time when they should be changing. They are usually caused by a faulty sensor or by transmission problems, and may therefore actually turn out to be missing data. Often they are constant at zero ("stuck-at-zero") or at a typical offset from zero over a long period, from which a defect in the electronics can be inferred.
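Stuck-at behavior can be detected by scanning for long runs of (nearly) identical samples. The sketch below is illustrative; the run length and tolerance that count as "hanging" depend on the sensor and are assumed parameters here.

```python
def find_stuck_runs(samples, min_run=5, tolerance=0.0):
    """Return index ranges where the signal stays (nearly) constant.

    A long run at exactly zero or at a fixed offset, in a signal that
    should fluctuate, points to a defective sensor or frozen channel.
    """
    runs, start = [], 0
    for i in range(1, len(samples) + 1):
        if i == len(samples) or abs(samples[i] - samples[start]) > tolerance:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


# A "stuck-at-zero" stretch inside an otherwise changing signal:
signal = [1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 1.1]
assert find_stuck_runs(signal, min_run=5) == [(1, 6)]
```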
A defect in the electronics or the wiring shows up in high-frequency sensors as strong noise or an oscillation, such as the 50 Hz mains frequency, captured by the circuit. The cause is often an interrupted connection: a cable break or a loose contact in the connector exposes a potential, which then becomes susceptible to electromagnetic radiation of any kind. The same effects occur when unshielded cables are used.
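Mains pickup can be screened for by measuring the power in a single frequency bin, for example 50 Hz, of a channel that should be quiet. The following is a sketch using a direct DFT sum; the sampling rate and thresholds are assumptions for illustration.

```python
import math


def tone_power(samples, sample_rate, freq):
    """Power of a single frequency component via a direct DFT sum.

    A strong 50 Hz (or 60 Hz) component in a channel that should be
    quiet hints at a broken shield, cable break, or loose contact.
    """
    n = len(samples)
    re = sum(v * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, v in enumerate(samples))
    im = sum(v * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, v in enumerate(samples))
    return (re * re + im * im) / n


rate = 1000  # Hz, assumed sampling rate
hum = [math.sin(2 * math.pi * 50 * i / rate) for i in range(1000)]
quiet = [0.0] * 1000
assert tone_power(hum, rate, 50) > 100.0
assert tone_power(quiet, rate, 50) < 1e-9
```

In practice a Goertzel filter or FFT would be used for efficiency; the direct sum keeps the idea visible.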
The type of error most often discussed in scientific publications is outliers, also referred to as anomalies. These are values that deviate significantly from the behavior dictated by a model consideration, for example when the data is found not to follow an assumed Gaussian distribution or Markov process. Even in the absence of scientific modeling, a sensor measurement is usually considered an outlier if it differs strikingly from its preceding and following observations, or from observations of neighboring sensor nodes, such as single peaks or drops in a time series.
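A common statistical check of this kind, sketched here as one possible approach, flags points by their modified z-score based on the median absolute deviation (MAD), which, unlike mean and standard deviation, is itself robust against the outliers it is trying to find. The threshold of 3.5 is a conventional choice, not a PANDA-specific value.

```python
def mad_outliers(samples, threshold=3.5):
    """Indices whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD) so that the outliers do
    not distort the scale estimate they are measured against.
    """
    s = sorted(samples)
    n = len(s)
    median = (s[n // 2] + s[(n - 1) // 2]) / 2
    devs = sorted(abs(v - median) for v in samples)
    mad = (devs[n // 2] + devs[(n - 1) // 2]) / 2
    if mad == 0:
        return []  # degenerate case: signal is (almost) constant
    return [i for i, v in enumerate(samples)
            if 0.6745 * abs(v - median) / mad > threshold]


# A single peak in an otherwise stable time series:
signal = [10.1, 10.0, 9.9, 10.2, 42.0, 10.0, 9.8]
assert mad_outliers(signal) == [4]
```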
Noise is a type of error that is not a technical fault per se. There are always small fluctuations in the data, caused by effects of statistical physics, such as thermal noise, and by signal-processing effects, such as high-frequency oscillations and aliasing in AD conversion. The measured values change even though no change is expected, because the system is known to be at rest on a macroscopic scale.
Uncertainty refers to the statistical quantification of this effect as the deviation from the expected value. The assumption is that a measured value can never be acquired without uncertainty, but that larger deviations are less likely, so that confidence intervals can be assigned to the measured values. Error propagation can be used to describe the growth of the error along a measurement chain. Without active countermeasures, noise increases steadily along an analog transmission chain; only digitized values escape this effect.
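For independent input quantities, Gaussian error propagation combines the contributions of each stage in quadrature: sigma_f = sqrt(sum_i (df/dx_i * sigma_i)^2). A small sketch, with the power example chosen purely for illustration:

```python
import math


def propagate_uncertainty(partials, sigmas):
    """Gaussian error propagation for independent input quantities.

    sigma_f = sqrt(sum((df/dx_i * sigma_i)**2)): every stage of the
    measurement chain contributes its uncertainty in quadrature.
    """
    return math.sqrt(sum((p * s) ** 2 for p, s in zip(partials, sigmas)))


# Illustrative example: power P = U * I with
# U = 5.0 V +/- 0.1 V and I = 2.0 A +/- 0.05 A.
# Partial derivatives: dP/dU = I, dP/dI = U.
U, I = 5.0, 2.0
sigma_P = propagate_uncertainty(partials=[I, U], sigmas=[0.1, 0.05])
assert abs(sigma_P - math.sqrt(0.2 ** 2 + 0.25 ** 2)) < 1e-12
```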
While the previous error groups are classical influences, there are also errors caused by conscious and unconscious data manipulation. Conscious manipulation ranges from prettifying measured values through manual data corrections to cyber attacks, in which correctly recorded values are altered by external parties afterwards or in real time.
This is in contrast to operating and programming errors, which can also lead to sensor data errors under certain circumstances, often due to unintuitive software or a lack of qualification with expert systems.
The diversity of the errors listed shows that there are many reasons why data from a recording, and an AI application built on it, should not be trusted.
Besides the question of how AI applications handle regular but unknown data, sensor data errors can creep into regular data in forms ranging from subtle to extreme. The algorithm then mixes the erroneous data into the model output, which is why detecting sensor data errors after the AI application is severely hampered or even impossible.
Sensor data errors, however, show sensor-specific characteristics that can be used for self-checks directly at the data input or in the upstream measurement chain and electronics. They can therefore be analyzed in parallel with, or during preprocessing of, the data input. Both deterministic and AI methods can be used for this purpose. The basis of such a chain of trust is mastery of the entire signal chain of data processing, with checks along the whole chain.
When designing an AI application, in addition to creating the actual AI capability with all its particular challenges, the goal is always to catch as many sources of error arising from sensor data errors as possible. The methods for this should in principle be independent of the actual AI application or its training.
In the microservice approach of PANDA DRIFT, each service reports its status and health over a unified communication channel. This channel can be used to aggregate errors from various parts of an application, display them on dashboards, connect ticketing systems, and invoke automated routines. While the infrastructural handling of errors can be solved quite easily and universally with microservices, different techniques are used at different points in PANDA to monitor the sensors with respect to the five groups.
The first group covers misadjusted sensors. PANDA's Sensor Device can read EEPROM chips installed in sensors and connectors, where calibration data can be stored and documented. Sensor data is stored internally by DRIFT in mV and noise-reduced, so that a wrong calibration or a wrong unit conversion can be corrected afterwards without changing the already stored data. In addition, machines can often be placed specifically in a maintenance mode in which the behavior of the sensors is known and they can be adjusted automatically.
For the second group, disconnects, test routines built into the PANDA Sensor Device access hardware routines of the AD converter. Acquired data is cached locally in ring buffers on the SD card and, once the connection is restored, synchronized and post-processed on demand.
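The local caching idea can be sketched with a fixed-size ring buffer that keeps the newest samples during an outage and hands them over for synchronization afterwards. This is a minimal illustration, not the Sensor Device's actual implementation:

```python
from collections import deque


class RingBuffer:
    """Fixed-size buffer that drops the oldest sample when full.

    Sketches local caching during a network outage: keep the newest N
    samples and synchronize them once the connection is restored.
    """

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def append(self, sample):
        self._buf.append(sample)  # silently evicts the oldest when full

    def drain(self):
        """Hand over all cached samples for synchronization."""
        items = list(self._buf)
        self._buf.clear()
        return items


rb = RingBuffer(capacity=3)
for v in [1, 2, 3, 4]:
    rb.append(v)          # the oldest sample (1) is evicted
assert rb.drain() == [2, 3, 4]
assert rb.drain() == []   # buffer is empty after synchronization
```

The trade-off of a ring buffer is explicit: when the outage outlasts the capacity, the oldest samples are lost rather than the device running out of memory.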
For the third group, outliers, it makes sense to build robust machine learning services. There are two strategies for this: either the algorithms are trained directly with outliers, or the data is checked for statistical features before the actual processing. Deterministic methods are often more effective here than neural networks, for example when detecting that a camera is covered with dust or that a value is hanging. The causes of individual outliers can be investigated by evaluating the captured quantities against the exact filter characteristics or the Nyquist-Shannon criteria of the hardware, under which peaks and drops can only occur up to a certain intensity. These quantities are known exactly for the PANDA sensor devices.
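The filter-characteristics argument can be turned into a deterministic plausibility check: given the anti-aliasing filter's bandwidth and the sampling rate, the signal cannot physically change faster than some maximum step, so larger jumps must be artifacts. A sketch with an assumed `max_step` limit:

```python
def implausible_steps(samples, max_step):
    """Indices where the sample-to-sample jump exceeds a physical limit.

    With a known anti-aliasing filter and sampling rate, the signal
    cannot change faster than a certain rate; larger jumps are
    artifacts rather than physics.
    """
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > max_step]


# max_step assumed from the filter bandwidth and sampling rate:
signal = [0.0, 0.1, 0.2, 5.0, 0.3, 0.4]
assert implausible_steps(signal, max_step=1.0) == [3, 4]
```

Note that a single spurious peak is flagged twice, on the way up and on the way down, which itself distinguishes a one-sample artifact from a genuine step change.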
For the fourth group, noise and uncertainty, the use of high-quality electronics and low-noise sensor devices helps. Through analog and digital filtering as well as oversampling strategies, the effect can largely be reduced to a non-critical level. Algorithmically, frequency-based denoising and regularization methods can be used to remove noise afterwards, although this is always only second choice compared to optimizing the signal chain itself.
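As a minimal example of algorithmic denoising, a moving average is the simplest FIR low-pass filter: each output sample is the mean of the last few inputs, attenuating high-frequency noise at the cost of temporal resolution. The window size is an assumed parameter:

```python
def moving_average(samples, window):
    """Simple FIR low-pass: each output is the mean of `window` inputs.

    Illustrates after-the-fact algorithmic denoising; reducing noise in
    the analog signal chain itself remains the better option.
    """
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    out = []
    acc = sum(samples[:window])
    out.append(acc / window)
    for i in range(window, len(samples)):
        acc += samples[i] - samples[i - window]  # slide the window
        out.append(acc / window)
    return out


noisy = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
smooth = moving_average(noisy, window=3)
assert len(smooth) == 4          # len(samples) - window + 1 outputs
assert abs(smooth[0] - 1.0) < 1e-9
```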
Dealing with the fifth group does not fall within the scope of signal theory, but can be countered with state-of-the-art IT security principles and a focus on good API and UX design. In particular, a Linux system with reduced and managed dependencies can be maintained well. PANDA relies on Linux for instances of DRIFT on PANDA compute devices.