Camera systems and image recognition have been used for quality control in industrial production for about 30 years. This has always involved a considerable amount of manual feature extraction, which is why they have so far been used only for particularly sensitive production steps. The engineering effort is specific to each task and far exceeds the investment in hardware and software. Compared with other sensors, image recognition has the advantage that it covers a wide range of scenarios and works without contact. Computer vision methods based on neural networks offer new possibilities. Classifiers are already being used successfully on an industrial scale, but they rely on manually pre-classified data. Labeling the data requires considerable effort, and the results remain a black box for the user. Whenever the production environment changes, the process has to be repeated.
For some time, research has also addressed neural networks whose structure forms a so-called autoencoder. The basic principle is that the layers first taper down to a narrow bottleneck and then widen again. Instead of predicting the correct label for a new sample, the network is trained to reproduce the image presented at its input at its output, despite the bottleneck. If the training succeeds, the bottleneck holds, for each image, all the information about how the images vary in a few numerical values, that is, in coded or compressed form. In addition, the quality can easily be checked by comparing the reconstruction with the input, and varying the values in the bottleneck provides information about the behaviour of the network. This is referred to as unsupervised learning, whereas supervised learning relies on additional information about the data, such as example labels.
By applying the trained autoencoder, each image is now represented by a vector of numbers in Feature Space. With two numerical values, the space is two-dimensional, and each image corresponds to one point.
More precisely, the degrees of freedom of an image series are represented in Feature Space, while the exact image structure is statically represented in the weights of the neural network.
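The split between degrees of freedom and static structure can be illustrated with a deliberately tiny example. The following is a minimal sketch, not the network used here: a linear autoencoder with two inputs and a one-value bottleneck, trained by plain gradient descent on points that lie on a line. The data, the initial weights, the learning rate and all function names are assumptions chosen for illustration.

```python
# Toy linear autoencoder: 2 inputs -> 1 bottleneck value -> 2 outputs.
# Data, initial weights and hyperparameters are illustrative assumptions.


def make_data():
    """Points on a line: one degree of freedom t, fixed structure `direction`."""
    direction = (0.6, 0.8)
    ts = [-1.0, -0.5, 0.0, 0.5, 1.0]
    return [(t * direction[0], t * direction[1]) for t in ts]


def encode(w, x):
    """Compress a 2-D sample into a single bottleneck value."""
    return w[0] * x[0] + w[1] * x[1]


def decode(v, z):
    """Expand the bottleneck value back into a 2-D sample."""
    return (z * v[0], z * v[1])


def mean_squared_error(w, v, data):
    """Mean squared reconstruction error over all samples."""
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(data, steps=1000, lr=0.1):
    """Fit encoder weights w and decoder weights v by gradient descent."""
    w = [0.5, 0.5]
    v = [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            err = [z * v[k] - x[k] for k in range(2)]
            # error propagated back through the decoder to the bottleneck
            back = err[0] * v[0] + err[1] * v[1]
            for k in range(2):
                grad_v[k] += 2.0 * err[k] * z / n
                grad_w[k] += 2.0 * back * x[k] / n
        for k in range(2):
            w[k] -= lr * grad_w[k]
            v[k] -= lr * grad_v[k]
    return w, v


data = make_data()
w, v = train(data)
codes = [encode(w, x) for x in data]
```

After training, `codes` contains one number per sample that varies with the position along the line, while the direction of the line itself ends up in the weights `w` and `v`. This mirrors, on a very small scale, the division between Feature Space and network weights described above.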
The degrees of freedom can now be used in place of the manually programmed features of classical image recognition, e.g. for condition monitoring. The manual engineering process is eliminated, so cameras can be used for quality control without having to weigh the engineering cost against the benefit for each task.
To demonstrate this, an experimental setup was devised that is relevant to practice, flexible, and provides defined laboratory conditions. The setup is a miniaturized door that can be opened and closed continuously. At the same time, the illumination intensity can be varied. This yields two independent, externally controlled and randomly generated degrees of freedom, which the autoencoder is meant to recognize as continuous degrees of freedom.
Figure 3 shows an optimized construction made of Lego bricks with electronic control of door angle and lighting; it has compact dimensions of 25x25x25. A Raspberry Pi controls the motor and the lighting, while training and real-time inference run on an Nvidia Jetson Nano. A web app allows the training to be controlled remotely and interactively.
The image from the Basler industrial camera was scaled down to 200x200 pixels, and a suitable lens was selected to capture the scene without autofocus.
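How two independent, randomly generated degrees of freedom might be produced can be sketched in a few lines. This is only an illustration: the value ranges, the names `DOOR_ANGLE_RANGE`, `BRIGHTNESS_RANGE` and `random_setpoints`, and the use of a uniform distribution are assumptions, not details of the actual setup.

```python
import random

# Hypothetical ranges; the limits of the real setup are not stated in the text.
DOOR_ANGLE_RANGE = (0.0, 90.0)   # degrees (assumed)
BRIGHTNESS_RANGE = (0.0, 1.0)    # normalised intensity (assumed)


def random_setpoints(count, seed=None):
    """Draw door angle and brightness independently for each sample."""
    rng = random.Random(seed)
    return [
        (rng.uniform(*DOOR_ANGLE_RANGE), rng.uniform(*BRIGHTNESS_RANGE))
        for _ in range(count)
    ]


setpoints = random_setpoints(100, seed=1)
```

Because each value is drawn separately, knowing the door angle of a sample says nothing about its brightness, which is the property the autoencoder is expected to rediscover from the images alone.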
A classic autoencoder is characterized by good compression and good reconstruction quality. However, it creates discontinuities in the feature space. For example, the opening of the door is not mapped to one continuous curve; the curve has a gap that splits it into two segments. Feature vectors that do not lie exactly on the curves yield undefined and unusable reconstructions. To make any combination of feature values usable, and thus the whole feature space, the concept of the GAN (Generative Adversarial Network) was introduced. However, a GAN is not appropriate for condition monitoring. For this application, the focus is not on high-quality reconstruction and arbitrary combinations of features, but on a feature space that is as compact as possible and free of discontinuities, without having to map every combination. A classic autoencoder extended by a second network, a discriminator, has proven to be a good compromise: the discriminator imposes a defined distribution on the Feature Space and thus ensures continuity. One could say that this Adversarial Autoencoder combines the advantages of both approaches. The following code builds the network in Keras, a framework for modeling neural networks that supports different backend frameworks. Here both TensorFlow and PlaidML were used as backends, with no major differences found:
# Imports needed by the listing, assuming standalone Keras 2.x.
import random

import numpy as np
from keras.layers import (Activation, Conv2D, Conv2DTranspose, Dense, Flatten,
                          Input, LeakyReLU, MaxPooling2D, Reshape)
from keras.models import Model, Sequential
from keras.optimizers import Adam


class AAE():
    # -------------------------------------------------------------------------
    # Build the adversarial autoencoder and load the features from the last
    # training session.
    def __init__(self, epochs=5, batchSize=32):
        self.imageShape = (200, 200, 3)
        self.latentDim = 2
        self.channels = 3
        self.epochs = epochs
        self.batchSize = batchSize
        self.latentPoints = []
        optimizer = Adam(0.0002, 0.5)
        metric = 'accuracy'
        binaryLoss = 'binary_crossentropy'
        # build discriminator
        self.discriminator = self.buildDiscriminator()
        self.discriminator.compile(optimizer=optimizer, loss=binaryLoss,
                                   metrics=[metric])
        # build encoder and decoder
        self.encoder = self.buildEncoder()
        self.decoder = self.buildDecoder()
        # define picture input
        image = Input(shape=self.imageShape)
        # encoderResult are the features extracted
        # from the images by the encoder
        encoderResult = self.encoder(image)
        # decoderResult are the images reconstructed by the decoder
        # from the encoderResult
        decoderResult = self.decoder(encoderResult)
        # set discriminator to untrainable, so that training the combined
        # model below only updates the encoder and decoder
        for layer in self.discriminator.layers:
            layer.trainable = False
        # discriminatorResult is the evaluation of the encoderResult
        # by the discriminator
        discriminatorResult = self.discriminator(encoderResult)
        # create the autoencoder Model by assigning images as input and
        # reconstructed images (decoderResult) & discriminatorResult as outputs
        self.autoencoder = Model(image, [decoderResult, discriminatorResult])
        # compiling autoencoder, setting loss function for the difference
        # between input image and reconstructed image to MSE &
        # loss function of discriminator result to Binary Crossentropy
        self.autoencoder.compile(optimizer=optimizer,
                                 loss=['mse', 'binary_crossentropy'],
                                 loss_weights=[0.999, 0.001])
        self.loadLatentPoints()
    # -------------------------------------------------------------------------
    # Build the encoder and return the Model
    def buildEncoder(self):
        encoder = Sequential()
        # 200x200x3 -> 200x200x32
        encoder.add(Conv2D(32, kernel_size=5,
                           input_shape=self.imageShape, padding='same',
                           kernel_initializer='random_uniform',
                           bias_initializer='zeros'))
        encoder.add(LeakyReLU(alpha=0.2))
        # -> 100x100x32
        encoder.add(MaxPooling2D(pool_size=(2, 2)))
        encoder.add(Conv2D(64, kernel_size=5, padding='same',
                           kernel_initializer='random_uniform',
                           bias_initializer='zeros'))
        # encoder.add(BatchNormalization(axis=1))
        encoder.add(LeakyReLU(alpha=0.2))
        # -> 20x20x64
        encoder.add(MaxPooling2D(pool_size=(5, 5)))
        encoder.add(Conv2D(128, kernel_size=5, padding='same',
                           kernel_initializer='random_uniform',
                           bias_initializer='zeros'))
        # encoder.add(BatchNormalization(axis=1))
        encoder.add(LeakyReLU(alpha=0.2))
        # -> 4x4x128
        encoder.add(MaxPooling2D(pool_size=(5, 5)))
        encoder.add(Flatten())
        # bottleneck: latentDim feature values per image
        encoder.add(Dense(self.latentDim, kernel_initializer='random_uniform',
                          bias_initializer='zeros'))
        encoder.summary()
        image = Input(shape=self.imageShape)
        latentCode = encoder(image)
        return Model(image, latentCode)
    # -------------------------------------------------------------------------
    # Build the decoder and return the Model
    def buildDecoder(self):
        decoder = Sequential()
        decoder.add(Dense(128 * 4 * 4, activation="relu",
                          input_dim=self.latentDim,
                          kernel_initializer='random_uniform',
                          bias_initializer='zeros'))
        # -> 4x4x128
        decoder.add(Reshape((4, 4, 128)))
        # decoder.add(BatchNormalization())
        decoder.add(LeakyReLU(alpha=0.2))
        # -> 20x20x64
        decoder.add(Conv2DTranspose(64, kernel_size=5, strides=5,
                                    padding="same",
                                    kernel_initializer='random_uniform',
                                    bias_initializer='zeros'))
        # decoder.add(BatchNormalization())
        decoder.add(LeakyReLU(alpha=0.2))
        # -> 100x100x32
        decoder.add(Conv2DTranspose(32, kernel_size=5, strides=5,
                                    padding="same",
                                    kernel_initializer='random_uniform',
                                    bias_initializer='zeros'))
        # decoder.add(BatchNormalization())
        decoder.add(LeakyReLU(alpha=0.2))
        # -> 200x200x3
        decoder.add(Conv2DTranspose(self.channels, kernel_size=5, strides=2,
                                    padding="same",
                                    kernel_initializer='random_uniform',
                                    bias_initializer='zeros'))
        # decoder.add(BatchNormalization())
        decoder.add(LeakyReLU(alpha=0.2))
        # tanh limits the reconstructed pixel values to [-1, 1]
        decoder.add(Activation("tanh"))
        decoder.summary()
        z = Input(shape=(self.latentDim,))
        reconstructionResult = decoder(z)
        return Model(z, reconstructionResult)
    # -------------------------------------------------------------------------
    # Build the discriminator and return the Model
    def buildDiscriminator(self):
        discriminator = Sequential()
        discriminator.add(Dense(300, activation='relu',
                                input_dim=self.latentDim,
                                kernel_initializer='random_uniform',
                                bias_initializer='zeros'))
        discriminator.add(Dense(300, activation='relu',
                                kernel_initializer='random_uniform',
                                bias_initializer='zeros'))
        # single sigmoid output: probability that a feature vector was
        # drawn from the prior rather than produced by the encoder
        discriminator.add(Dense(1, activation='sigmoid',
                                kernel_initializer='random_uniform',
                                bias_initializer='zeros'))
        discriminator.summary()
        encoderResult = Input(shape=(self.latentDim,))
        discriminatorResult = discriminator(encoderResult)
        return Model(encoderResult, discriminatorResult)
    # -------------------------------------------------------------------------
    # Train the adversarial autoencoder model, save the features & model
    #
    # INPUT: - "filepaths" list of paths to the training images,
    #                      relative to the folder this python code is
    #                      located in.
    def train(self, filepaths):
        for epoch in range(self.epochs):
            random.shuffle(filepaths)
            batches = len(filepaths) // self.batchSize
            # discriminator targets: 1 for samples from the prior,
            # 0 for features produced by the encoder
            valid = np.ones((self.batchSize, 1))
            fake = np.zeros((self.batchSize, 1))
            for i in range(batches):
                print('epoch: {} batch: {}/{}'.format(epoch + 1, i + 1,
                                                      batches))
                images = self.loadImages(
                    filepaths[i * self.batchSize:(i + 1) * self.batchSize])
                # train the discriminator to tell encoder output apart from
                # samples of the uniform prior on [-2, 2]
                latent_fake = self.encoder.predict(images)
                latent_real = np.random.uniform(-2, 2, size=(self.batchSize,
                                                             self.latentDim))
                discriminator_loss_real = self.discriminator.train_on_batch(
                    latent_real, valid)
                discriminator_loss_fake = self.discriminator.train_on_batch(
                    latent_fake, fake)
                discriminator_loss = 0.5 * np.add(
                    discriminator_loss_real, discriminator_loss_fake)
                # train encoder and decoder: reconstruct the images and
                # make the discriminator rate the features as valid
                genLoss = self.autoencoder.train_on_batch(
                    images, [images, valid])
                print(discriminator_loss[0], genLoss[0])
        # after training, plot the latent points and save the model
        self.drawLatentPoints(filepaths)
        self.saveModel()
    # -------------------------------------------------------------------------
    # Additional helper functions as well as the whole code for the backend
    # can be found in the git repo
    # -------------------------------------------------------------------------
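The listing compiles the combined model with the losses `'mse'` and `'binary_crossentropy'` and the weights 0.999 and 0.001. What this weighting means for a single sample can be worked through by hand. The following is a plain-Python sketch, not Keras code: the function names are made up for illustration, and the clipping constant `eps` is an assumption that only guards against `log(0)`.

```python
import math


def mse(target, prediction):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def binary_crossentropy(label, probability, eps=1e-7):
    """Cross-entropy of one predicted probability against a 0/1 label."""
    p = min(max(probability, eps), 1.0 - eps)  # avoid log(0); eps is assumed
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def autoencoder_loss(pixels, reconstruction, discriminator_output,
                     weights=(0.999, 0.001)):
    """Weighted sum of reconstruction loss and adversarial loss."""
    reconstruction_loss = mse(pixels, reconstruction)
    # the target is 1 ("valid"): the encoder is rewarded when the
    # discriminator takes its features for samples from the prior
    adversarial_loss = binary_crossentropy(1, discriminator_output)
    return weights[0] * reconstruction_loss + weights[1] * adversarial_loss


# identical reconstruction, but the discriminator is more or less convinced
loss_convinced = autoencoder_loss([0.0, 1.0], [0.1, 0.9], 0.9)
loss_suspicious = autoencoder_loss([0.0, 1.0], [0.1, 0.9], 0.1)
```

With the same reconstruction, `loss_suspicious` is larger than `loss_convinced`, so the encoder is nudged towards features that match the prior. Because the adversarial term carries only a weight of 0.001, reconstruction quality still dominates the objective.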
The frontend was designed so that different image sets held in the backend can be displayed and used to train a preconfigured network at the push of a button. The features are plotted in a diagram in real time, and a rectangular window selection allows rudimentary condition monitoring on the features.
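A rectangular window in a two-dimensional feature space amounts to a simple range check per axis. The following is a minimal sketch of such a check; the class `Window`, the function `monitor`, the status strings and the example numbers are hypothetical and not taken from the actual frontend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Axis-aligned rectangle in the 2-D feature space."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, feature):
        """True if the feature vector (x, y) lies inside the rectangle."""
        x, y = feature
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max)


def monitor(features, window):
    """Return one status string per feature vector."""
    return ["ok" if window.contains(f) else "alarm" for f in features]


# example: the second feature vector lies outside the selected window
window = Window(x_min=-0.5, x_max=0.5, y_min=-1.0, y_max=1.0)
statuses = monitor([(0.0, 0.2), (1.3, 0.1)], window)
# statuses == ["ok", "alarm"]
```

In practice the feature vectors would come from the trained encoder, one per camera image, and the window would be drawn around the region that corresponds to the acceptable condition.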