The entropy of a data space can be regarded as the information-theoretical basis of the Six Sigma approach to proving process capability, and it generalizes that approach to arbitrarily complex data. The mathematics of metric spaces, which we provide in our open-source framework METRIC, makes this possible. Mathematically, the calculation of metric entropy is based on a multidimensional Gaussian integration over a neighborhood graph of the respective metric space.
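The METRIC implementation itself is in C++ and is not shown here. As a hedged illustration of the neighborhood idea, the following Python sketch estimates differential entropy from nearest-neighbor distances (the classical Kozachenko–Leonenko estimator) — a simpler relative of the Gaussian-integration approach described above, not the METRIC algorithm:

```python
import numpy as np
from math import gamma, log, pi

def knn_entropy(X):
    """Kozachenko-Leonenko nearest-neighbor estimate of differential
    entropy (in nats) for N samples of dimension d, shape (N, d)."""
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances
    np.fill_diagonal(dist, np.inf)                # ignore self-distance
    eps = dist.min(axis=1)                        # nearest-neighbor radius
    unit_ball = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    return (d * np.log(eps).mean() + log(unit_ball)
            + log(N - 1) + np.euler_gamma)

# demo: samples from N(0, 1); the true differential entropy
# is 0.5 * log(2 * pi * e) ≈ 1.419 nats
rng = np.random.default_rng(0)
est = knn_entropy(rng.standard_normal((2000, 1)))
```

The estimator only needs pairwise distances, which is why the same idea carries over to arbitrary metric spaces.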
Let us consider metric entropy as a measure of a motion pattern: the larger the entropy, the more diverse the pattern.
A pattern puts individual observations into context. A person with arms, legs, torso, and head remains a single person, no matter what posture he or she is in. Each posture corresponds to one observation, e.g. in the form of a photograph. The pattern is the regularity that connects these different photos.
A pattern is inseparably connected with a collection of observations, but not with a temporal sequence. Each observation stands alone: we could shuffle all the frames of a video. When the entropy is calculated, the continuity relationship is established automatically by a graph; a temporal order is neither a condition nor a necessity for pattern recognition.
This becomes clear when you look at entropy in statistical physics, where the same principle applies.
The temperature of a substance is proportional to the average kinetic energy of its molecules, which in turn measures the scale on which the possible molecular states occur, but not which states occur structurally. This sum over states is described by entropy. Thermal energy can be added to an ice cube at 0 °C without changing its temperature: entropy is transferred until the state of aggregation of the ice cube finally flips and the molecules, now liquid, have more freedom of movement. Only after these new structural possibilities have been created does the temperature, i.e. the kinetic energy, continue to rise.
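The ice-cube example can be made quantitative with textbook values (latent heat of fusion of water ≈ 334 kJ/kg):

```python
# Entropy transferred when melting 1 kg of ice at 0 °C (273.15 K):
# all heat goes into the phase change, none into raising the temperature.
latent_heat = 334e3        # J/kg, latent heat of fusion of water
T = 273.15                 # K, melting point
delta_S = latent_heat / T  # entropy added per kg, in J/K
print(round(delta_S))      # about 1223 J/K, at constant temperature
```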
Instead of investigating whether and when a pattern is traversed by a trajectory (e.g. a complex vibration in space), one assigns to each pattern a probability of being observed. A sine wave is therefore not observed as a continuous function but as a loose collection of individual values; it does not matter in which phase the sine currently is.
Entropy is a measure of the number of possible observations and thus a measure of the pattern itself. Metric or differential entropy is related to the intrinsic dimension. More precisely, entropy corresponds to the logarithm of the length of a trajectory through data space. However, since we do not assume such a trajectory, this conversion is not possible in general, and entropy is the much more general measure.
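The link to intrinsic dimension can be seen in the one case with a simple closed form: for an isotropic d-dimensional Gaussian, the differential entropy grows linearly with the dimension d. A small sketch using the standard formula (not METRIC code):

```python
import numpy as np

def gaussian_entropy(d, sigma=1.0):
    """Closed-form differential entropy (nats) of an isotropic
    d-dimensional Gaussian with standard deviation sigma:
    h = (d / 2) * log(2 * pi * e * sigma^2)."""
    return 0.5 * d * np.log(2 * np.pi * np.e * sigma ** 2)

# entropy grows linearly with the intrinsic dimension d
for d in (1, 2, 3):
    print(d, gaussian_entropy(d))
```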
The following animation shows the calculated metric (or differential) entropy for different geometric example data sets.
The entropy increases with the degrees of freedom, the number of possible locations, and thus the variety of observations.
Entropy is an extrinsic quantity and therefore depends on the scaling of the data: changes in scale have a direct effect on the entropy. To compare two data spaces by their metric entropy, the data spaces should therefore be normalized first. Another special feature is that the smallest differential entropy does not tend towards zero but towards minus infinity, so the entropy can take on negative values.
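Both properties can be checked with the one-dimensional Gaussian closed form: rescaling the data by a factor a shifts the entropy by log(a), and a sufficiently narrow distribution has negative entropy.

```python
import numpy as np

def h(sigma):
    # differential entropy (nats) of a one-dimensional Gaussian N(0, sigma^2)
    return 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

# rescaling the data by a factor of 10 shifts the entropy by log(10)
shift = h(10.0) - h(1.0)      # equals log(10) ≈ 2.303

# a narrow distribution has negative differential entropy
narrow = h(0.1)               # ≈ -0.884
```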
Let us compare the entropies of the digit classes in the MNIST data set. The digit 1 has by far the smallest entropy, followed by the digit 7. The other digits have comparable entropies.
So how can this knowledge be applied in practice?
Simply put, you don’t want your production process to seemingly spontaneously change its state of aggregation.
Even if a process appears macroscopically unchanged (like the temperature of the ice cube), it is very likely that entropy is already increasing continuously and will cause the system to tip over spontaneously at some point. This continuous increase is not just a theoretical possibility; assume that it always occurs in practice. A system such as an automated plant that is left without external influence always develops towards the state of highest entropy. Its scatter always develops towards a normal distribution, since this is the distribution with the highest entropy (at least among the unbounded distributions). Such a process is finally in equilibrium; in other words, it is stable.
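The maximum-entropy property of the normal distribution can be verified against the closed-form entropies of other unit-variance distributions:

```python
import numpy as np

# closed-form differential entropies (nats) of unit-variance distributions
h_gauss   = 0.5 * np.log(2 * np.pi * np.e)  # normal               ≈ 1.419
h_uniform = 0.5 * np.log(12.0)              # uniform, width √12   ≈ 1.242
h_laplace = 1.0 + 0.5 * np.log(2.0)         # Laplace, scale 1/√2  ≈ 1.347

# the Gaussian has the largest entropy at fixed variance
print(h_gauss, h_uniform, h_laplace)
```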
However, a production machine is not closed off from its environment, so the entropy, and with it the equilibrium, is constantly influenced from outside, and a new equilibrium with higher entropy keeps establishing itself. This means greater scatter. If you readjust the process, you remove entropy from the system again.
This inevitable increase in entropy is the reason why machines need to be regularly maintained and readjusted. That your machine will become a better process by itself is possible in principle, but about as likely as a stone suddenly floating upwards due to local pressure differences in the air.
The entropy of the data space is ideally suited to the analysis, verification, and monitoring of process capability. An increase in entropy can be detected and tracked through the continuous evaluation of sensor observations, which takes condition monitoring and predictive maintenance applications to a completely new level. Furthermore, metric entropy is a crucial building block for creating an autonomous process, since the calculation works for any data type and process and provides an ideal target for automated optimization.
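As a minimal sketch of such monitoring — using a simple Gaussian plug-in estimate as a stand-in for metric entropy, with illustrative function names that are not METRIC API — one can track the entropy of a sensor stream window by window and raise an alarm when it drifts above a baseline:

```python
import numpy as np

def window_entropy(x):
    """Plug-in Gaussian entropy proxy (nats) for one sensor window.
    A simplified stand-in; METRIC estimates metric entropy for
    arbitrary data types via a neighborhood graph."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def monitor(stream, window=200, threshold=0.3):
    """Yield (window_index, entropy, alarm) per window; alarm when the
    entropy rises more than `threshold` nats above the first window."""
    baseline = None
    for i in range(0, len(stream) - window + 1, window):
        h = window_entropy(stream[i:i + window])
        if baseline is None:
            baseline = h
        yield i // window, h, h - baseline > threshold

# demo: a process whose scatter doubles halfway through (entropy drift)
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 1.0, 600),
                         rng.normal(0, 2.0, 600)])
alarms = [alarm for _, _, alarm in monitor(stream)]
```

Doubling the standard deviation raises the entropy by log(2) ≈ 0.69 nats, well above the illustrative 0.3 threshold, so the alarms fire only in the second half of the stream.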