Implementing PCA, Feedforward and Convolutional Autoencoders and Using Them for Image Reconstruction, Retrieval & Compression
I am completely new to autoencoders, so you might find some errors in this blog post. I am going to implement some variants of autoencoders in Keras and cover some of the theory along the way.
But first, let’s get to know the first topic mentioned here: PCA, or Principal Component Analysis.
Principal Component Analysis (PCA)
PCA is the most popular instance of the second main class of unsupervised projection methods, also known as dimensionality reduction methods.
If you find this writing about PCA dull, go to this interactive visualization website to get more intuition.
AIM: The aim of PCA is to find a small number of ‘directions’ in input space that explain the variation in the input data, and to represent the data by projecting it along those directions.
Application
One of the important questions is, what are the useful applications of PCA?
 Visualization
 Preprocessing
 Modeling — Good prior for new data
 Data Compression
Intuition
 Assuming input data X with N samples, each of dimension D, we can represent it as a matrix X ∈ R^(N×D).
 Suppose the lower-dimensional space has dimension M. The objective is to represent X in this M-dimensional space instead of the original D dimensions, i.e. to map each x ∈ R^D to a code z ∈ R^M with M < D.
 PCA searches for the orthogonal directions of highest variance in the input space and then projects the data onto this M-dimensional subspace.
 The structure of the data vectors is encoded in the sample covariance.
Finding Principal Components
 To find the principal component directions, we first centralize the data, meaning we subtract the sample mean from each variable.
 Then compute the empirical covariance matrix: C = (1/N) Σₙ (xₙ − x̄)(xₙ − x̄)ᵀ
 Find the M eigenvectors of C with the largest eigenvalues: these are the principal components.
 Assemble these eigenvectors into a D × M matrix called U.
 We can now express D-dimensional vectors x by projecting them to M-dimensional codes z = Uᵀx.
What PCA basically does is maximize the variance of the projections while keeping the reconstruction error minimal.
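The steps above can be sketched in a few lines of NumPy (this is my own minimal illustration on random data, not code from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # N=100 samples, D=5 dimensions

X_mean = X.mean(axis=0)
Xc = X - X_mean                          # centralize: subtract the sample mean
C = Xc.T @ Xc / len(X)                   # empirical covariance matrix (D x D)

M = 2
eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
U = eigvecs[:, ::-1][:, :M]              # top-M eigenvectors assembled into a D x M matrix

Z = Xc @ U                               # project: z = U^T x for every sample
X_rec = Z @ U.T + X_mean                 # linear reconstruction back in D dimensions
```

The reconstruction `X_rec` is the best rank-M linear approximation of the data, which is exactly the minimum-MSE property discussed next.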
PCA: Minimizing Reconstruction Error
Since we can think of PCA as projecting data onto a lower-dimensional subspace, one derivation is that we want to find the projection such that the best linear reconstruction of the data is as close as possible to the original data.
The loss function is simply the MSE: min Σₙ ‖xₙ − x̃ₙ‖², where x̃ₙ = U zₙ is the reconstruction, and we already know zₙ = Uᵀxₙ.
PCA in a nutshell
Here is a summary of PCA for the impatient.
PCA : Autoencoders
I think this was enough to brush up on the theory of PCA. Let’s get to the point: what is the relation between PCA and autoencoders, and how can we define one and implement it in our favorite programming language, Python, with our favorite deep learning framework, Keras?
The goal is to encode the image information in lower dimensional space then reconstruct it again from encoded lower dimensional representation to original form.
The encoded space is lower-dimensional, so the model, like PCA, has to learn the most important features, from which it can decode the input again while keeping the mean squared error minimal.
Again, some mathematical stuff, then we will get down to coding.
Thinking this with respect to image representation will help you to understand.
Encoding
Suppose we apply some affine transformation to the input data and then encode it into a latent space named z. The representation looks like this: z = f(Wx + b).
Here f is an activation function; we are keeping it linear for the time being.
Decoding
Great, we encoded all the information of x into the latent space z. Now we need to decode it back with another affine transformation: x̂ = g(Vz + c).
Since z is itself a function of the input through f, we can write x̂ = g(V f(Wx + b) + c).
Assuming g and f are linear activations, we can drop the functions (and, for simplicity, the biases): x̂ = VWx.
Objective
The goal is to minimize the loss ‖x − VWx‖² with respect to the W and V matrices.
What if g() is not linear? Then we are basically doing nonlinear PCA.
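As a quick sanity check of the algebra above (my own toy example with random weights, identity activations, and biases dropped):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 6, 2
x = rng.normal(size=D)          # one D-dimensional input vector
W = rng.normal(size=(M, D))     # encoder weight matrix
V = rng.normal(size=(D, M))     # decoder weight matrix

z = W @ x                       # linear encoding: z = f(Wx) with f = identity
x_hat = V @ z                   # linear decoding: x_hat = g(Vz) with g = identity

# With linear f and g the whole autoencoder collapses into one matrix, V @ W
assert np.allclose(x_hat, (V @ W) @ x)
```

This is why a purely linear autoencoder can do no better than a rank-M linear map, i.e. PCA.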
Implementation of PCA Autoencoder
Yeah, finally! But first we need to download a dataset to test the autoencoder on. No! Enough of the MNIST dataset; let’s try something else to train on.
Here is the link to the image data, and the link to the attributes. Download both and put them in one folder.
Project Structure
Autoencoders/
├── lfw_dataset.py
├── Autoencoder.ipynb
└── data/
    ├── lfw.tgz
    └── lfw_attributes.txt
Loading Dataset
I am going to use this script to load the dataset.
#Import Stuff
import numpy as np
from sklearn.model_selection import train_test_split
from lfw_dataset import load_lfw_dataset
import tensorflow as tf
import keras, keras.layers as L
s = keras.backend.get_session()
# Loading and normalizing [Might take some time]
X, attr = load_lfw_dataset(use_raw=True,dimx=38,dimy=38)
X = X.astype('float32') / 255.0
img_shape = X.shape[1:]
X_train, X_test = train_test_split(X, test_size=0.1, random_state=42)
# Checking out some images
%matplotlib inline
import matplotlib.pyplot as plt
plt.suptitle('sample images')
for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(X[i])
print("X shape:", X.shape)
print("attr shape:", attr.shape)
Output
X shape: (13143, 38, 38, 3)
attr shape: (13143, 73)
Coding the PCA Autoencoder
We could actually implement the autoencoder in a couple of ways.
Here is one way,
code_size = 32
pca_autoencoder = keras.models.Sequential()
# Input layer
pca_autoencoder.add(L.InputLayer(img_shape))
# Flattening the layer
pca_autoencoder.add(L.Flatten())
# Encoded space
pca_autoencoder.add(L.Dense(code_size))
# Output units should be image_size * image_size * channels
pca_autoencoder.add(L.Dense(np.prod(img_shape)))
# Last layer
pca_autoencoder.add(L.Reshape(img_shape))
Model summary
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) (None, 38, 38, 3) 0
_________________________________________________________________
flatten_6 (Flatten) (None, 4332) 0
_________________________________________________________________
dense_11 (Dense) (None, 32) 138656
_________________________________________________________________
dense_12 (Dense) (None, 4332) 142956
_________________________________________________________________
reshape_5 (Reshape) (None, 38, 38, 3) 0
=================================================================
Total params: 281,612
Trainable params: 281,612
Non-trainable params: 0
_________________________________________________________________
Training the model
pca_autoencoder.compile('adamax', 'mse')
pca_autoencoder.fit(x=X_train,y=X_train,epochs=10, batch_size=500,
validation_data=[X_test,X_test])
Output
Train on 11828 samples, validate on 1315 samples
Epoch 1/10
11828/11828 [==============================]
loss: 0.0413  val_loss: 0.0130
Epoch 2/10
11828/11828 [==============================]
50us/step  loss: 0.0087  val_loss: 0.0074
Epoch 3/10
11828/11828 [==============================]
49us/step  loss: 0.0072  val_loss: 0.0070
Epoch 4/10
11828/11828 [==============================]
49us/step  loss: 0.0070  val_loss: 0.0069
Epoch 5/10
11828/11828 [==============================]
50us/step  loss: 0.0069  val_loss: 0.0069
Epoch 6/10
11828/11828 [==============================]
54us/step  loss: 0.0069  val_loss: 0.0068
Epoch 7/10
11828/11828 [==============================]
54us/step  loss: 0.0069  val_loss: 0.0068
Epoch 8/10
11828/11828 [==============================]
51us/step  loss: 0.0068  val_loss: 0.0068
Epoch 9/10
11828/11828 [==============================]
53us/step  loss: 0.0068  val_loss: 0.0068
Epoch 10/10
11828/11828 [==============================]
53us/step  loss: 0.0068  val_loss: 0.0068
Coding the autoencoder in another way
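The snippet for this alternative isn’t reproduced here, but a sketch of the idea is to build separate encoder and decoder models and wire them together with the Keras functional API (the structure below is my own version of the same PCA autoencoder, not the original code):

```python
import numpy as np
import keras, keras.layers as L

img_shape = (38, 38, 3)   # assumed from the dataset loaded above
code_size = 32

# Encoder: flatten the image and project it down to the code
encoder = keras.models.Sequential([
    L.InputLayer(img_shape),
    L.Flatten(),
    L.Dense(code_size),
])

# Decoder: map the code back to a flat image and reshape it
decoder = keras.models.Sequential([
    L.InputLayer((code_size,)),
    L.Dense(np.prod(img_shape)),
    L.Reshape(img_shape),
])

# Wire the two halves together with the functional API
inp = L.Input(img_shape)
autoencoder = keras.models.Model(inp, decoder(encoder(inp)))
autoencoder.compile('adamax', 'mse')
```

Keeping `encoder` and `decoder` as standalone models is handy later, when we want to call each half separately for visualization and retrieval.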
Visualization of Reconstructed Output and the Code itself
We are going to need a helper function to visualize the codes along with the outputs.
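The original helper isn’t shown here; a minimal version could look like the following (the layout and the way the code vector is reshaped into a 2-D strip are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def visualize(img, encoder, decoder):
    """Draw the original image, its latent code, and the reconstruction."""
    code = encoder.predict(img[None])[0]
    reco = decoder.predict(code[None])[0]

    plt.subplot(1, 3, 1)
    plt.title("Original")
    plt.imshow(img)

    plt.subplot(1, 3, 2)
    plt.title("Code")
    # Reshape the flat code into a 2-D strip just so imshow can draw it
    plt.imshow(code.reshape([code.shape[-1] // 2, -1]))

    plt.subplot(1, 3, 3)
    plt.title("Reconstructed")
    plt.imshow(np.clip(reco, 0, 1))
    plt.show()
```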
Output
Final MSE: 0.00678860718958
Making a Deep Autoencoder using Feedforward Neural Network
Autoencoders may be thought of as a special case of feedforward networks and can be trained with all of the same techniques.
General structure of an autoencoder is given below.
Here x is the input, h is the internal representation, and r is the reconstructed output computed from that representation.
Deep Feedforward Autoencoder
This image represents a rough idea, we are actually going to build an autoencoder deeper than the depicted image.
Sanity Checks
 There shouldn’t be any hidden layer smaller than the bottleneck (the encoder output).
 Adding nonlinearities between intermediate dense layers yields good results.
These are all examples of Undercomplete Autoencoders since the code dimension is less than the input dimension. If the encoder and decoder are allowed too much capacity, the autoencoder can learn to perform the copying task without extracting useful information about the distribution of data.
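A deep feedforward autoencoder respecting both sanity checks might be built like this (the layer widths and the `elu` nonlinearity are my own choices for illustration):

```python
import numpy as np
import keras, keras.layers as L

def build_deep_autoencoder(img_shape, code_size):
    # Encoder: progressively narrower dense layers down to the bottleneck;
    # no hidden layer is smaller than the code itself
    encoder = keras.models.Sequential([
        L.InputLayer(img_shape),
        L.Flatten(),
        L.Dense(code_size * 8, activation='elu'),
        L.Dense(code_size * 4, activation='elu'),
        L.Dense(code_size),                       # bottleneck
    ])
    # Decoder: mirror of the encoder
    decoder = keras.models.Sequential([
        L.InputLayer((code_size,)),
        L.Dense(code_size * 4, activation='elu'),
        L.Dense(code_size * 8, activation='elu'),
        L.Dense(np.prod(img_shape)),
        L.Reshape(img_shape),
    ])
    return encoder, decoder
```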
Deep Convolutional Autoencoder
The author of Keras has already explained and implemented variations of AEs in his post. You can check it out.
Here’s another image from the internet to visualize autoencoders in a more intuitive way.
Building a convolutional autoencoder is as simple as building a ConvNet; the decoder is the mirror image of the encoder. That’s basically it!
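A `build_deep_conv_autoencoder` like the one used later could be sketched as below (the filter counts and the use of `UpSampling2D` to mirror the pooling are my own assumptions; the original implementation isn’t shown in this post):

```python
import numpy as np
import keras, keras.layers as L

def build_deep_conv_autoencoder(img_shape, code_size):
    H, W, C = img_shape
    # Encoder: conv + pooling stacks, then flatten down to the code
    encoder = keras.models.Sequential([
        L.InputLayer(img_shape),
        L.Conv2D(32, 3, padding='same', activation='elu'),
        L.MaxPooling2D(2),
        L.Conv2D(64, 3, padding='same', activation='elu'),
        L.MaxPooling2D(2),
        L.Flatten(),
        L.Dense(code_size),
    ])
    # Decoder: mirror image — dense, reshape, then upsample + conv back to the image
    decoder = keras.models.Sequential([
        L.InputLayer((code_size,)),
        L.Dense((H // 4) * (W // 4) * 64, activation='elu'),
        L.Reshape((H // 4, W // 4, 64)),
        L.UpSampling2D(2),
        L.Conv2D(64, 3, padding='same', activation='elu'),
        L.UpSampling2D(2),
        L.Conv2D(C, 3, padding='same'),
    ])
    return encoder, decoder
```

With a 44 × 44 × 3 input, two 2 × 2 poolings bring the spatial size to 11 × 11, and the two upsamplings restore it exactly.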
Regularized ‘X’ Autoencoder
Simply add a kernel_regularizer to the last layer of the encoder. You can make any autoencoder regularized this way. Here is the general step:
encoder.add(L.Dense(code_size, kernel_regularizer=keras.regularizers.l2(0.01)))
Denoising Autoencoder
Here is the computational graph from Deep Learning Textbook.
At training time I will add random Gaussian noise to the training dataset.
def apply_gaussian_noise(X, sigma=0.1):
    noise = np.random.normal(loc=0.0, scale=sigma, size=X.shape)
    return X + noise
# Clipping the images for showing
plt.subplot(1,4,1)
plt.imshow(X[0])
plt.subplot(1,4,2)
plt.imshow( np.clip(apply_gaussian_noise(X[:1],sigma=0.01)[0], 0, 1))
plt.subplot(1,4,3)
plt.imshow(np.clip(apply_gaussian_noise(X[:1],sigma=0.1)[0], 0, 1))
plt.subplot(1,4,4)
plt.imshow(np.clip(apply_gaussian_noise(X[:1],sigma=0.5)[0],0, 1))
Output
Preparing the model again for the new code size.
encoder,decoder = build_deep_conv_autoencoder((44, 44, 3),code_size=512)
inp = L.Input(img_shape)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = keras.models.Model(inp,reconstruction)
autoencoder.compile('adamax','mse')
# Training with noise
for i in range(50):
    print("Epoch %i/50, Generating corrupted samples..." % i)
    X_train_noise = apply_gaussian_noise(X_train)
    X_test_noise = apply_gaussian_noise(X_test)
    autoencoder.fit(x=X_train_noise, y=X_train, epochs=1,
                    validation_data=[X_test_noise, X_test])
# Evaluation using noisy input
denoising_mse = autoencoder.evaluate(apply_gaussian_noise(X_test),X_test,verbose=0)
print("Final MSE:", denoising_mse)
for i in range(5):
    img = X_test[i]
    visualize(img, encoder, decoder)
Output
Great, we have implemented several autoencoders. So far, though, they haven’t done anything particularly useful for us. Let’s try to do some fun things with them.
This result was made using KNN with an encoded size of 32, not 512! To get a similar result, you might have to train your autoencoder with those settings.
images = X_train
# Hashing the images with the encoder
codes = encoder.predict(images)

def show_image(x):
    plt.imshow(np.clip(x + 0.5, 0, 1))

# Fitting the codes
from sklearn.neighbors import NearestNeighbors
nei_clf = NearestNeighbors(metric="euclidean")
nei_clf.fit(codes)

def get_similar(image, n_neighbors=5):
    assert image.ndim == 3, "image must be [height, width, 3]"
    code = encoder.predict(image[None])
    (distances,), (idx,) = nei_clf.kneighbors(code, n_neighbors=n_neighbors)
    return distances, images[idx]

def show_similar(image):
    distances, neighbors = get_similar(image, n_neighbors=3)
    plt.figure(figsize=[8, 7])
    plt.subplot(1, 4, 1)
    show_image(image)
    plt.title("Original image")
    for i in range(3):
        plt.subplot(1, 4, i + 2)
        show_image(neighbors[i])
        plt.title("Dist=%.3f" % distances[i])
    plt.show()
# Cherry picked examples
# smiles
show_similar(X_test[247])
# ethnicity
show_similar(X_test[56])
# glasses
show_similar(X_test[63])
Output
That’s it for today! The code will be uploaded to GitHub soon enough! I hope you had as much fun as I had exploring autoencoders. Pretty neat, right? Thanks for reading!