Implementing PCA, Feedforward and Convolutional Autoencoders and Using Them for Image Reconstruction, Retrieval & Compression
Autoencoders are completely new to me, so you may well find some errors in this blog post. I am going to implement some variants of autoencoders in Keras and cover some of the theory along the way.
But first, let’s get to know the first topic mentioned here: PCA, or Principal Component Analysis.
Principal Component Analysis (PCA)
PCA is the most popular instance of the second main class of unsupervised projection methods, also known as dimensionality reduction methods.
If you find this writing about PCA dull, go to this interactive visualization website to get more intuition.
AIM: The aim of PCA is to find a small number of ‘directions’ in input space that explain the variation in the input data, and to represent the data by projecting it along those directions.
One of the important questions is, what are the useful applications of PCA?
- Modeling — Good prior for new data
- Data Compression
- Assume input data X with N samples, each of dimension D, i.e. an N × D matrix whose rows are the vectors x_n.
- Suppose the lower-dimensional space has dimension M < D; the objective is to represent X in M dimensions instead of the original D, writing each code vector as an M-dimensional vector z_n.
- PCA searches for the orthogonal directions in space with the highest variance and then projects the data onto this M-dimensional subspace.
- The structure of the data vectors is encoded in the sample covariance.
Finding Principal Components
- To find the principal component directions, we first centralize the data, meaning we subtract the sample mean x̄ from each variable.
- Then compute the empirical covariance matrix: C = (1/N) Σ_n (x_n − x̄)(x_n − x̄)^T
- Find the M eigenvectors of C with the largest eigenvalues: these are the principal components.
- Assemble these eigenvectors into a D × M matrix called U.
- We can now express each D-dimensional vector x as an M-dimensional code z = U^T (x − x̄).
What PCA basically does is maximize the variance of the projected data while keeping the reconstruction error minimized.
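The steps above can be sketched directly in numpy. This is a minimal illustration on toy data, not the code used later in the post; the data and the choice M = 2 are arbitrary:

```python
import numpy as np

# Toy data: N = 200 samples in D = 5 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# 1. Centre the data
X_centered = X - X.mean(axis=0)

# 2. Empirical covariance matrix C (D x D)
C = X_centered.T @ X_centered / X.shape[0]

# 3. Eigendecomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Keep the M eigenvectors with the largest eigenvalues -> U (D x M)
M = 2
U = eigvecs[:, ::-1][:, :M]

# 5. Project to the M-dimensional code z, then reconstruct
Z = X_centered @ U                  # codes, shape (N, M)
X_rec = Z @ U.T + X.mean(axis=0)    # best rank-M linear reconstruction

print("reconstruction MSE:", np.mean((X - X_rec) ** 2))
```

The reconstruction error equals the average of the discarded eigenvalues, which is exactly the variance PCA chooses not to keep.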
PCA: Minimizing Reconstruction Error
Since we can think of PCA as projecting data onto a lower-dimensional subspace, one derivation is to find the projection such that the best linear reconstruction of the data is as close as possible to the original data.
The loss function is simply the mean squared reconstruction error: J = (1/N) Σ_n ||x_n − x̃_n||², where x̃_n = U z_n + x̄ is the reconstruction. And we already know z_n = U^T (x_n − x̄).
PCA in a nutshell
Here is a summary of PCA for the impatient.
PCA : Autoencoders
I think this was enough to brush up on the theory of PCA. Let’s get to the point: what is the relation between PCA and autoencoders, how can we define one, and how can we implement one in our favorite programming language, Python, with our favorite deep learning framework, Keras?
The goal is to encode the image information in lower dimensional space then reconstruct it again from encoded lower dimensional representation to original form.
The encoded space is lower dimensional, so the autoencoder has to learn the most important features, from which it can decode the image again while keeping the mean squared error at a minimum.
Again, some mathematical details, then we will get down to coding. Thinking about this in terms of image representation will help you understand.
Suppose we apply an affine transformation to the input data and encode it into a latent space named z. The representation looks like this: z = f(Wx). Here f is an activation function; we are keeping it linear for the time being.
Great, we encoded all the information of x into the latent space z. Now we need to decode it back by some affine transformation again: x̂ = g(Vz). Since z is a function of x through f, we can write x̂ = g(V f(Wx)). Assuming g and f are linear activations, we can drop the functions entirely: x̂ = VWx, giving the loss L = ||x − VWx||².
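To make the linear case concrete, here is a tiny numpy sketch of x̂ = VWx and the squared-error loss. The weights are random and untrained; this only shows the shapes involved, not a fitted autoencoder:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, N = 6, 2, 100          # input dim, code dim, number of samples
X = rng.normal(size=(N, D))

W = rng.normal(size=(M, D))  # encoder weights (illustrative, untrained)
V = rng.normal(size=(D, M))  # decoder weights

Z = X @ W.T                  # z = f(Wx) with f linear -> z = Wx
X_hat = Z @ V.T              # x_hat = g(Vz) with g linear -> x_hat = VWx

# Mean squared reconstruction error over the dataset
loss = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print("loss:", loss)
```

Minimizing this loss over W and V recovers the same subspace as PCA, which is the connection the next sections exploit.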
The goal is to minimize this loss function with respect to W and V matrices.
What if g() is not linear? Then we are basically doing nonlinear PCA.
Implementation of PCA Autoencoder
Yeah, finally! But first we need to download a dataset to test the autoencoder on. No, enough of the MNIST dataset; let’s try something else to train on.
I am going to use this script to load the dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from lfw_dataset import load_lfw_dataset
import tensorflow as tf
import keras, keras.layers as L
s = keras.backend.get_session()
# Loading and normalizing [Might take some time]
X, attr = load_lfw_dataset(use_raw=True,dimx=38,dimy=38)
X = X.astype('float32') / 255.0
img_shape = X.shape[1:]
X_train, X_test = train_test_split(X, test_size=0.1, random_state=42)
# Checking out some images
import matplotlib.pyplot as plt
for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(X[i])
plt.show()
print("X shape:", X.shape)
print("attr shape:", attr.shape)
X shape: (13143, 38, 38, 3)
attr shape: (13143, 73)
Coding the PCA Autoencoder
We could actually implement the autoencoder in a couple of ways.
Here is one way,
code_size = 32
pca_autoencoder = keras.models.Sequential()
# Input layer
pca_autoencoder.add(L.InputLayer(img_shape))
# Flattening the layer
pca_autoencoder.add(L.Flatten())
# Encoded space
pca_autoencoder.add(L.Dense(code_size))
# Output units should be image_size * image_size * channels
pca_autoencoder.add(L.Dense(np.prod(img_shape)))
# Last layer, reshaping back to an image
pca_autoencoder.add(L.Reshape(img_shape))
pca_autoencoder.summary()
Layer (type) Output Shape Param #
input_10 (InputLayer) (None, 38, 38, 3) 0
flatten_6 (Flatten) (None, 4332) 0
dense_11 (Dense) (None, 32) 138656
dense_12 (Dense) (None, 4332) 142956
reshape_5 (Reshape) (None, 38, 38, 3) 0
Total params: 281,612
Trainable params: 281,612
Non-trainable params: 0
Training the model
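The post doesn’t show the compile/fit call, so here is a minimal self-contained sketch. The optimizer (adamax), epoch count, and the synthetic stand-in data are my assumptions; in the post the model is trained on the LFW X_train/X_test splits loaded above:

```python
import numpy as np
import keras, keras.layers as L

img_shape = (38, 38, 3)
code_size = 32

# Same architecture as the Sequential model above
model = keras.models.Sequential([
    L.InputLayer(img_shape),
    L.Flatten(),
    L.Dense(code_size),
    L.Dense(int(np.prod(img_shape))),
    L.Reshape(img_shape),
])

# Assumed training setup: input and target are the same images (MSE loss)
model.compile(optimizer='adamax', loss='mse')

# Synthetic stand-in for X_train / X_test so the snippet runs on its own
X_train = np.random.rand(64, *img_shape).astype('float32')
X_test = np.random.rand(16, *img_shape).astype('float32')

model.fit(X_train, X_train, epochs=1,
          validation_data=(X_test, X_test), verbose=0)
```

Note that the parameter count (281,612) matches the summary shown above, which confirms the layer sizes.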
Train on 11828 samples, validate on 1315 samples
loss: 0.0413 - val_loss: 0.0130
50us/step - loss: 0.0087 - val_loss: 0.0074
49us/step - loss: 0.0072 - val_loss: 0.0070
49us/step - loss: 0.0070 - val_loss: 0.0069
50us/step - loss: 0.0069 - val_loss: 0.0069
54us/step - loss: 0.0069 - val_loss: 0.0068
54us/step - loss: 0.0069 - val_loss: 0.0068
51us/step - loss: 0.0068 - val_loss: 0.0068
53us/step - loss: 0.0068 - val_loss: 0.0068
53us/step - loss: 0.0068 - val_loss: 0.0068
Coding the autoencoder in another way
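A common alternative is to build the encoder and decoder as two separate models, which makes it easy to reuse the encoder alone later (for example, for retrieval). This layout is an assumed reconstruction of that approach, mirroring the Sequential version above:

```python
import numpy as np
import keras, keras.layers as L

def build_pca_autoencoder(img_shape, code_size=32):
    """Encoder and decoder as two separate models."""
    encoder = keras.models.Sequential([
        L.InputLayer(img_shape),
        L.Flatten(),
        L.Dense(code_size),               # the bottleneck code
    ])
    decoder = keras.models.Sequential([
        L.InputLayer((code_size,)),
        L.Dense(int(np.prod(img_shape))), # back to pixel count
        L.Reshape(img_shape),             # and back to an image
    ])
    return encoder, decoder

encoder, decoder = build_pca_autoencoder((38, 38, 3), code_size=32)

# Chain them into one trainable model with the functional API
inp = L.Input((38, 38, 3))
autoencoder = keras.models.Model(inp, decoder(encoder(inp)))
```

Training proceeds exactly as before; only the wiring changes.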
Visualization of Reconstructed Output and the Code itself
We are going to need a helper function to visualize the codes along with the outputs.
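Here is one possible sketch of such a helper. The original post’s version is not shown; displaying the code vector as a small rectangle is my own choice:

```python
import numpy as np
import matplotlib.pyplot as plt

def visualize(img, encoder, decoder):
    """Show the original image, its code, and the reconstruction."""
    code = encoder.predict(img[None])[0]
    reco = decoder.predict(code[None])[0]

    plt.subplot(1, 3, 1)
    plt.title("Original")
    plt.imshow(img)

    plt.subplot(1, 3, 2)
    plt.title("Code")
    # Reshape the flat code into a rectangle just for display
    plt.imshow(code.reshape([code.shape[-1] // 2, -1]))

    plt.subplot(1, 3, 3)
    plt.title("Reconstructed")
    plt.imshow(np.clip(reco, 0, 1))
    plt.show()
```

Calling `visualize(X_test[i], encoder, decoder)` in a loop produces the comparison figures shown below.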
Final MSE: 0.00678860718958
Making a Deep Autoencoder using Feedforward Neural Network
Autoencoders may be thought of as a special case of feedforward networks and can be trained with all of the same techniques.
General structure of an autoencoder is given below.
where x is the input, h is the internal representation, and r is the reconstructed output from that representation.
Deep Feedforward Autoencoder
This image represents a rough idea, we are actually going to build an autoencoder deeper than the depicted image.
- There shouldn’t be any hidden layer smaller than the bottleneck (encoder output).
- Adding nonlinearities between intermediate dense layers yields good results.
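Following those two guidelines, a deep version might look like this. The intermediate widths (1024 and 256) and the ELU activations are assumed choices, not taken from the post; any widths no smaller than the bottleneck would do:

```python
import numpy as np
import keras, keras.layers as L

def build_deep_autoencoder(img_shape, code_size=32):
    """Deep feedforward autoencoder; decoder mirrors the encoder."""
    encoder = keras.models.Sequential([
        L.InputLayer(img_shape),
        L.Flatten(),
        L.Dense(1024, activation='elu'),
        L.Dense(256, activation='elu'),
        L.Dense(code_size),               # bottleneck, no activation
    ])
    decoder = keras.models.Sequential([
        L.InputLayer((code_size,)),
        L.Dense(256, activation='elu'),
        L.Dense(1024, activation='elu'),
        L.Dense(int(np.prod(img_shape))),
        L.Reshape(img_shape),
    ])
    return encoder, decoder
```

Every hidden layer is at least as wide as the code, satisfying the first rule above.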
These are all examples of Undercomplete Autoencoders since the code dimension is less than the input dimension. If the encoder and decoder are allowed too much capacity, the autoencoder can learn to perform the copying task without extracting useful information about the distribution of data.
Deep Convolutional Autoencoder
The author of Keras has already explained and implemented variations of AEs in his post. You can check it out.
Here’s another image from the internet to visualize autoencoders in a more intuitive way.
Building a convolutional autoencoder is as simple as building a ConvNet; the decoder is the mirror image of the encoder. That’s basically it!
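A minimal sketch of that mirror structure is below. The filter counts, strides, and the 32×32 input size are assumptions for the sake of clean downsampling and upsampling (38×38, as in the LFW crops above, does not halve evenly):

```python
import keras, keras.layers as L

def build_conv_autoencoder(img_shape=(32, 32, 3), code_size=128):
    """Convolutional autoencoder: strided convs down, transposed convs up."""
    H, W, C = img_shape
    encoder = keras.models.Sequential([
        L.InputLayer(img_shape),
        L.Conv2D(32, (3, 3), strides=2, padding='same', activation='elu'),
        L.Conv2D(64, (3, 3), strides=2, padding='same', activation='elu'),
        L.Flatten(),
        L.Dense(code_size),
    ])
    decoder = keras.models.Sequential([
        L.InputLayer((code_size,)),
        L.Dense((H // 4) * (W // 4) * 64, activation='elu'),
        L.Reshape((H // 4, W // 4, 64)),
        # Transposed convolutions mirror the encoder's strided convolutions
        L.Conv2DTranspose(32, (3, 3), strides=2, padding='same', activation='elu'),
        L.Conv2DTranspose(C, (3, 3), strides=2, padding='same'),
    ])
    return encoder, decoder
```

Each strided convolution in the encoder has a matching transposed convolution in the decoder, so the output shape equals the input shape.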
Regularized ‘X’ Autoencoder
Simply add a kernel_regularizer to the last layer of the encoder. You can make any autoencoder regularized this way. Here is the general step.
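Concretely, for the flatten-and-dense encoder used earlier, that looks like the following. The L1 penalty and its strength (1e-6) are assumed values; any of Keras’s built-in regularizers would work the same way:

```python
import keras, keras.layers as L
from keras import regularizers

# Same encoder as before, but with an L1 penalty on the bottleneck weights
encoder = keras.models.Sequential([
    L.InputLayer((38, 38, 3)),
    L.Flatten(),
    L.Dense(32, kernel_regularizer=regularizers.l1(1e-6)),
])
```

The regularization loss is added to the reconstruction loss automatically during training; nothing else in the training code changes.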
Here is the computational graph from Deep Learning Textbook.
At training time I will add random Gaussian noise to the training dataset.
def apply_gaussian_noise(X, sigma=0.1):
    noise = np.random.normal(loc=0.0, scale=sigma, size=X.shape)
    return X + noise
# Clipping the images for showing
plt.imshow(np.clip(apply_gaussian_noise(X[0], sigma=0.01), 0, 1))
plt.imshow(np.clip(apply_gaussian_noise(X[0], sigma=0.1), 0, 1))
Preparing the model again for new code size.
encoder, decoder = build_deep_conv_autoencoder(img_shape, code_size=512)
inp = L.Input(img_shape)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = keras.models.Model(inp,reconstruction)
# Training with noise: corrupt the inputs afresh each epoch,
# but always reconstruct the clean targets
for i in range(50):
    print("Epoch %i/50, Generating corrupted samples..." % i)
    X_train_noise = apply_gaussian_noise(X_train)
    X_test_noise = apply_gaussian_noise(X_test)
    autoencoder.fit(x=X_train_noise, y=X_train, epochs=1,
                    validation_data=(X_test_noise, X_test))
# Evaluation using noisy input
denoising_mse = autoencoder.evaluate(apply_gaussian_noise(X_test),X_test,verbose=0)
print("Final MSE:", denoising_mse)
for i in range(5):
    img = X_test[i]
    noisy = apply_gaussian_noise(img[None])[0]
    # Show the denoised reconstruction
    plt.imshow(np.clip(autoencoder.predict(noisy[None])[0], 0, 1))
    plt.show()
Great, we have implemented some of the autoencoders. So far, though, they have not been especially useful to us. Let’s try to do some fun things with them.
This result was made using KNN with an encoded size of 32, not 512! To get a similar result, you might have to train your autoencoder with these settings.
images = X_train
# Hashing the images with the encoder
codes = encoder.predict(images)

# Fitting the codes
from sklearn.neighbors import NearestNeighbors
nei_clf = NearestNeighbors(metric="euclidean")
nei_clf.fit(codes)

def get_similar(image, n_neighbors=5):
    assert image.ndim == 3, "image must be [height, width, 3]"
    code = encoder.predict(image[None])
    (distances,), (idx,) = nei_clf.kneighbors(code, n_neighbors=n_neighbors)
    return distances, images[idx]

image = X_test[0]  # query image
distances, neighbors = get_similar(image, n_neighbors=3)
for i in range(3):
    plt.imshow(np.clip(neighbors[i], 0, 1))
    plt.show()
# Cherry picked examples
That’s it for today! The code will be uploaded to GitHub soon enough! I hope you had as much fun as I had exploring autoencoders. Pretty neat, right? Thanks for reading!