Generally, this is known from mathematics or physics as follows: In order to illustrate anything, the first step is to create a coordinate system (usually a two- or three-dimensional Cartesian coordinate system). Geometry and differential calculus now take place in this space. Space is actually always the three-dimensional Euclidean space, in which we have lived for thousands of years and for which we therefore have a good intuitive feeling.
Beside the Euclidean space there are even more abstract vector spaces, but they are not the subject of today’s blog article. In fact, we completely dispense with the preconditions of vectors and algebraic structures and instead look at what we can calculate without them. Unlike in mathematics or physics, we gain this new (axis) freedom through requirements that are common and even omnipresent in data science. Well-defined vector spaces are almost never found in the wild.
By the lack of algebraic structures, or rather by omitting the preconditions for algebraic structures, most of the known operations of mathematics are absent for the moment. This includes, among others, the summation or multiplication of elements, but also integrals and differentials, because we no longer have well-defined objects. So first of all there is a feeling of homelessness or helplessness; or to put it in a positive light: it feels a backpacking trip to an unknown country. In our backpack, however, at least the numbers (or more generally speaking: elements) as such are still contained. Using the principles of set theory, we build a new home for ourselves by setting up tents locally and spontaneously. Our metric space offers us shelter.
Since a metric space is a set, it can be represented with a frame, following the set theory (see Fig. 2).
This set now contains elements, namely data points, observations and entries (Fig. 3). The terms are all synonymous with each other in the present context.
The number of elements in this space is finite. Because of the non-existence of vectors there is no zero and no direction, so there is no absolute position in space. Everything is “relative”.
The elements are recorded in some kind of table (Tab. 1).
Each row in this table corresponds to a related group of observations. Each column corresponds to a non-coherent observation.However, this convention is only useful for the clarity of complex and nested data types but is not a basic requirement. The “x” can be any complex data type or digital representation of an observation, starting with a single number, a curve, an image up to a collection of images in the sense of a sample collection. Everything in a single row of this table is one (1!) data point. Besides numbers, character strings or sentences are also allowed as information.
In order to turn these elements into a metric space, a distance function is defined per column. This function can be used to calculate the distance between two elements within the column, so that we can compare whole rows over their distance to each other.
This function, which is usually referred to and named metric, is subject to a few requirements, which we will discuss in more detail in another blog. Since only the distance between two elements has to be calculated, there is no need for an absolute coordinate system, which is thought to span a grid behind the data points. In principle, each metric describes the minimal costs of an information technology recoding under constraints. Simple metrics such as the Euclidean metric can be given as an explicit formula. In general, however, an optimization problem is solved in each case to compare two data points. The distance is the minimum required effort of a recoding. In Euclidean space it corresponds to the flight lines (Fig. 4).
The mere presence of this function and a collection of data points is enough to turn our table into a metric space. It is already implicitly defined by this, without the need to calculate anything.We obtain an explicit representation of the metric space by calculating all distances in pairs with the metric and entering them in a matrix called the distance matrix.
The diagonal entries of this matrix are zero. It is symmetric and positive definite or has a determinant greater than or equal to 0. These properties are guaranteed by the metric..
In principle, a metric space can be visualized using a graph, whose undirected edges we assign the value of the respective distance of the element pair.
All in all, the graph is the adequate means of visualization. However, a complete distance matrix cannot be interpreted intuitively and is very complex in terms of computation and storage due to its quadratic complexity. For an explicit representation not all paired distances have to be used. This graph can be thinned out in different steps up to a tree. This results in structures or allows structures to be made visible.
So now we have a space where we can calculate the distances between the elements. Due to the absence of an underlying grid or vectors, the individual elements such as strings or images still cannot be added, subtracted, multiplied etc.
First, certain properties of metric space can be calculated, such as entropy, which is a measure of intrinsic dimensionality and information diversity. Furthermore, the distortion can be determined, which checks how different the metric space is compared to a typical vector representation.Additionally, metric spaces can be mapped into classical vector spaces and back again - the spontaneous tents mentioned at the beginning. In this encoded form of a vector space, the usual operations such as interpolations, averaging, statistics, etc. can then be performed in familiar surroundings.However, metric spaces can also be directly related. For example, the correlation of two Metric Spaces, the entropy of mixtures or a distance between two Metric Spaces can be calculated by itself, which thus represent an element in itself. Let us take a look at the latter in an example.
For this purpose we are going to consider the well-known MNIST data set in the form of a sample collection.Each row of our table contains a vector with about 5000 data points of black and white images representing handwritten numbers (0-9) - a disordered sample collection.
So we have 10 rows and 1 column. The complex data type in each row is a vector, which contains any number of 2D arrays (the images) instead of single numbers.We will now first transform the images (28x28 pixels = 784 degrees of freedom) into a dimension reduced vector space with 20 dimensions using a linear feature encoder. Using a Euclidean metric, we will now map each individual 28x28 character into a new, low-dimensional but information-technologically equivalent metric space. We now define a superior metric on these 10 dimensionally reduced spaces and calculate the paired distances between the 10 rows of the table. We obtain a matrix from which we can read how the characters differ, taking into account the variation of each individual character: The distance matrix.
The distance matrix shows that the “1” has the largest distance to all other characters. Thus it is the most dissimilar to the other digits and therefore the best distinguishable character.
So in this example we have nested several times and demonstrated how to work systematically with the concept of metric spaces to process and relate complex information. This makes it easy to live in our new home - even with a single column.
In a subsequent blog, we will continue to explore this concept by looking at the metrics once selected. We will also show that it is possible to transfer methods from classical machine learning, such as a metric decision tree, which works on arbitrarily complex and combined data structures instead of individual numerical numbers.