Correlation and Causality

Correlation and causality: do they still play a role for AI?

Oct 2019
Michael Welsch

Correlation and causality are both words that express a connection. One comes from statistics, the other from physics, and the differences between them follow from these different foundations.

Correlation refers to a relationship between the values of two observations across a series of repetitions. If the observation of A changes, the observation of B tends to change as well. The correlation coefficient is one way of quantifying such a relationship. The decisive point, however, is that repetitions are observed, because no statistics can be made without repetitions. One recognizes the connection through repetition and through the variation within these repetitions.
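As a minimal sketch of the correlation coefficient mentioned above, the following computes the Pearson coefficient from repeated observations of A and B. The function name `pearson` and the sample values are illustrative assumptions, not part of the original article.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Without variation in the repetitions there is nothing to correlate.
        raise ValueError("no variation in the repetitions, correlation undefined")
    return cov / math.sqrt(var_x * var_y)

# Made-up repetitions: B tends to rise whenever A rises.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson(a, b)
print(f"r = {r:.3f}")
```

The coefficient is undefined when one of the observations never varies, which is the numerical counterpart of the point that the connection is recognized only through variation within the repetitions.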

Causality refers to a cause-and-effect relationship that exists independently of how A and B are observed and does not require repetitions. When A changes, B changes, or vice versa. Typically, a differential equation describes the general relationship between A and B, while the boundary conditions describe the concrete values of the observations and the direction of action between A and B. Furthermore, physics requires that the observations must not be the source of the model; the model must be derived from an independent principle. A distinction is often made between the observations of A and B and the underlying internal events or states, and observations and models may be used to infer these inner states.

One must, however, always state what is a boundary condition, that is, which quantity has the status of an adjustable parameter and therefore acts as the cause. If one models the effect of such a parameter and validates the result, a causal connection exists a priori. Potentially causal relationships arise whenever a human being operates a technical device and thereby raises an observation to the status of a parameter or boundary condition, changes it in a controlled way, and relates it to further observations. In this way a direct cause-effect relationship is deliberately constructed and systematically isolated from environmental influences. One recognizes the connection through systematic simplification and isolation of influences.

Even if an unknown third quantity acts in the background and is completely responsible for the values of both observations, A cannot take a value independently of B unless the third quantity permits that combination. Consequently, if B can be predicted perfectly from A, it is not possible to tell from the observations whether a third influence exists.
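A small sketch can make this concrete. It assumes a hypothetical hidden quantity C and two made-up linear laws for A and B; none of these numbers come from the article. Although C generates everything, B follows exactly from A, so the observations look the same as if A acted on B directly.

```python
import random

rng = random.Random(0)

# Hypothetical hidden quantity C; A and B are both fully determined by it.
c = [rng.uniform(0.0, 1.0) for _ in range(200)]
a = [2.0 * ci + 1.0 for ci in c]
b = [-3.0 * ci + 4.0 for ci in c]

# An observer who never sees C can still write B as a function of A alone:
# c = (a - 1) / 2, hence b = -1.5 * a + 5.5
b_from_a = [-1.5 * ai + 5.5 for ai in a]
max_error = max(abs(p - q) for p, q in zip(b_from_a, b))
print(f"largest prediction error: {max_error:.2e}")
```

The prediction error is zero up to floating-point rounding, so the data alone give no hint that C exists.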

Let me give you an example. Take a door that opens and closes. On the one hand we measure the opening angle with a protractor; on the other hand we take a photo of the scene. Both are independent observations. Imagine you can change the angle directly. The question is how this affects the image. To answer it, we model the relationship between angle and image in order to predict the image from the angle. In classical physics, at least, events are only observed, and in a good experimental setup the act of observing should not influence any parameters. Now a human being, the third influence in our example, pulls on a fishing line attached to the door at different times of the day, so the change in the image depends not only on the position of the door but also on the ambient light. This does not change the basic relationship between the image and the angle. If one extracts two features from the image using appropriate mathematical methods, one feature correlates with the angle and the other with the time of day, while angle and time of day do not correlate with each other.
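The door example can be played through with a toy simulation. Everything in it is an assumption made for illustration: the one-dimensional 32-pixel "image", the linear daylight model between 6 and 12 o'clock, and the two hand-written feature extractors stand in for the "appropriate mathematical methods" and are not the author's procedure.

```python
import math
import random

N_PIXELS = 32  # hypothetical 1-D "image"

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def take_photo(angle_deg, hour):
    """Toy camera: the door darkens a share of the pixels, daylight scales all of them."""
    light = 0.2 + 0.8 * (hour - 6.0) / 6.0            # assumed: brighter towards noon
    door_pixels = round(angle_deg / 90.0 * N_PIXELS)  # assumed: wider angle, more door visible
    return [light * (0.2 if i < door_pixels else 1.0) for i in range(N_PIXELS)]

def extract_features(image):
    brightness = max(image)  # brightest pixel reflects the ambient light
    door_share = sum(p < 0.5 * brightness for p in image) / len(image)
    return brightness, door_share

rng = random.Random(1)
angles = [rng.uniform(5.0, 85.0) for _ in range(500)]  # set by the person pulling the line
hours = [rng.uniform(6.0, 12.0) for _ in range(500)]   # chosen independently of the angle
features = [extract_features(take_photo(a, h)) for a, h in zip(angles, hours)]
brightness = [f[0] for f in features]
door_share = [f[1] for f in features]

r_door_angle = pearson(door_share, angles)
r_light_hour = pearson(brightness, hours)
r_angle_hour = pearson(angles, hours)
print(f"door feature vs angle: {r_door_angle:.3f}")
print(f"light feature vs hour: {r_light_hour:.3f}")
print(f"angle vs hour:         {r_angle_hour:.3f}")
```

In this toy setup each feature tracks exactly one of the two influences, while angle and time of day remain close to uncorrelated because they were drawn independently.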

The presence of additional features would mean that the variation in pixel values cannot be fully predicted from time of day and angle. In this case, there is an information gap.
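One possible way to put a number on such an information gap is the share of variance that a fit on the known influences leaves unexplained. The sketch below uses an ordinary least-squares fit with two predictors; the function name `unexplained_share`, the coefficients, and the unobserved `draft` influence are hypothetical choices for illustration.

```python
import random

def unexplained_share(y, x1, x2):
    """Share of the variance of y that a linear fit on x1 and x2 leaves unexplained."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yc = [v - my for v in y]
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(u * u for u in c1)
    s22 = sum(u * u for u in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, yc))
    s2y = sum(u * v for u, v in zip(c2, yc))
    syy = sum(v * v for v in yc)
    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    explained = b1 * s1y + b2 * s2y
    return 1.0 - explained / syy

rng = random.Random(2)
angle = [rng.uniform(5.0, 85.0) for _ in range(500)]
hour = [rng.uniform(6.0, 12.0) for _ in range(500)]
draft = [rng.gauss(0.0, 1.0) for _ in range(500)]  # hypothetical unobserved influence

# A pixel feature fully determined by angle and hour, and one with an extra influence.
stable = [0.01 * a + 0.1 * h for a, h in zip(angle, hour)]
uncertain = [0.01 * a + 0.1 * h + 0.2 * d for a, h, d in zip(angle, hour, draft)]

gap_stable = unexplained_share(stable, angle, hour)
gap_uncertain = unexplained_share(uncertain, angle, hour)
print(f"gap without extra influence: {gap_stable:.3f}")
print(f"gap with extra influence:    {gap_uncertain:.3f}")
```

A linear fit is only one choice; any model of the known influences could take its place, and the remaining unexplained share would play the same role.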

This additional degree of freedom of the image is interesting from an information-theoretical point of view and causes unpredictable surprises. From a process engineering point of view, however, it means nothing more than an uncertain process. Of course, it would be best if the image, provided it is a quality feature, could be predicted completely from independent sources of information. Then the picture is boring from an information-theoretical point of view and the process is stable. Each undesired characteristic can then be traced back to a specific value of an observation, which ideally is a parameter that can be influenced.

With regard to the question of whether correlation and causality still play a role for AI, I would say that it is precisely the non-correlating and unknown relationships that are of greatest interest for AI applications. They can be a source of danger, and they are always at least a source of new knowledge. An AI based on neural networks that only predicts what is already known, especially if it needs excessive training, and that does not classify the unknown as unknown but simply interprets it as the next best known thing, cannot serve either purpose. The principle of correlation and causality is therefore of particular interest for adapting the principles of human attention, curiosity and vigilance. An AI must recognize the limits of its own recognition and be able to report on them at any time: an "I know that I know nothing" ability, preferably in quantified rather than binary form.

This ability will be an essential criterion for making an AI auditable. The auditability of algorithms is expected to become the decisive criterion for using such algorithms for automation from about 2025 onwards. Correlation and causality are therefore a top topic for AI.
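The graded "I know that I know nothing" ability described above could be sketched as follows. This is a deliberately simple nearest-centroid classifier, not a neural network and not the author's method; the class name, the `familiarity` score, the scale factor 3.0, and the sample points are all assumptions chosen for illustration.

```python
import math

class CentroidClassifier:
    """Nearest-centroid classifier that also reports how familiar an input looks."""

    def fit(self, samples, labels):
        groups = {}
        for x, label in zip(samples, labels):
            groups.setdefault(label, []).append(x)
        self.centroids = {}
        self.radii = {}
        for label, xs in groups.items():
            dim = len(xs[0])
            centroid = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
            self.centroids[label] = centroid
            # Typical distance of the training points to their centroid.
            mean_dist = sum(math.dist(x, centroid) for x in xs) / len(xs)
            self.radii[label] = max(mean_dist, 1e-12)
        return self

    def predict(self, x):
        # The "next best known" class is always returned ...
        label = min(self.centroids, key=lambda k: math.dist(x, self.centroids[k]))
        # ... together with a graded score between 0 (unknown) and 1 (familiar).
        scaled = math.dist(x, self.centroids[label]) / self.radii[label]
        familiarity = math.exp(-((scaled / 3.0) ** 2))  # 3.0 is an arbitrary scale
        return label, familiarity

samples = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
           (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2)]
labels = ["closed"] * 4 + ["open"] * 4
model = CentroidClassifier().fit(samples, labels)

known_label, known_score = model.predict((0.1, 0.1))
far_label, far_score = model.predict((20.0, -7.0))
print(known_label, round(known_score, 3))
print(far_label, round(far_score, 3))
```

The distant point still receives the label of the nearest class, but its familiarity score is close to zero, so a downstream process or an auditor can see that the statement lies outside what the model has learned.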
