Implementing PCA, Feedforward and Convolutional Autoencoders and using them for Image Reconstruction, Retrieval & Compression

I am completely new to autoencoders, so naturally you might find some errors in this blog post. I am going to implement some variants of autoencoders in Keras and write up some of the theory along the way.

But first, let's get to know the first topic mentioned here: PCA, or Principal Component Analysis.

Principal Component Analysis (PCA)

PCA is the most popular instance of the second main class of unsupervised projection methods, also known as dimensionality reduction methods.

If you find this writing about PCA dull, go to this interactive visualization website to get more intuition.

AIM: The aim of PCA is to find a small number of 'directions' in input space that explain the variation in the input data, and to represent the data by projecting it along those directions.


One of the important questions is, what are the useful applications of PCA?

  • Visualization
  • Preprocessing
  • Modeling — Good prior for new data
  • Data Compression


  • Assume input data X with N samples of dimension D, represented as $X \in \mathbb{R}^{N \times D}$ with rows $x_n \in \mathbb{R}^D$.
  • Suppose the lower-dimensional space has dimension M; the objective is to represent X in that M-dimensional space instead of dimension D. We can write the codes as $z_n \in \mathbb{R}^M$, with $M < D$.
  • PCA searches for the orthogonal directions in input space with the highest variance and then projects the data onto this M-dimensional subspace.
  • The structure of the data vectors is encoded in the sample covariance.

Finding Principal Components

  • To find the principal component directions, we have to centralize the data, meaning we subtract the sample mean from each variable.
  • Then compute the empirical covariance matrix: $C = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$
  • Find the M eigenvectors with the largest eigenvalues of C: these are the principal components.
  • Assemble these eigenvectors into a D × M matrix called U.
  • We can now express D-dimensional vectors x by projecting them to the M-dimensional codes $z = U^T (x - \bar{x})$.

What PCA basically does is maximize the variance of the projected data while keeping the reconstruction error minimal.
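To make these steps concrete, here is a minimal NumPy sketch of the recipe above. This is just for intuition; the variable names follow the notation in this section, and it is not the code we will use later.

import numpy as np

def pca(X, M):
    """Project an N x D data matrix X onto its top-M principal components."""
    X_mean = X.mean(axis=0)
    X_centered = X - X_mean                   # centralize the data
    C = np.cov(X_centered, rowvar=False)      # D x D empirical covariance
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh, since C is symmetric
    order = np.argsort(eigvals)[::-1][:M]     # indices of the top-M eigenvalues
    U = eigvecs[:, order]                     # D x M matrix of principal components
    Z = X_centered @ U                        # N x M projected data
    return Z, U, X_mean

# Reconstruction from the M-dimensional codes:
# X_reconstructed = Z @ U.T + X_mean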

PCA: Minimizing Reconstruction Error

Since we can think of PCA as projecting data onto a lower-dimensional subspace, one derivation is that we want to find the projection such that the best linear reconstruction of the data is as close as possible to the original data.

The loss function is simply the MSE:

$$J(U, z) = \frac{1}{N} \sum_{n=1}^{N} \| x_n - \tilde{x}_n \|^2, \quad \tilde{x}_n = U z_n + \bar{x}$$

And we already know $z_n = U^T (x_n - \bar{x})$, so minimizing this reconstruction error recovers exactly the maximum-variance directions from before.

PCA in a nutshell

Here is a summary of PCA for the impatient: center the data, compute the covariance matrix, find its top M eigenvectors, and project the data onto them.

PCA : Autoencoders

I think this was enough to brush up the theory of PCA. Let's get to the point: what is the relation between PCA and autoencoders, and how could we define and implement one in our favorite programming language, Python, with our most favorite deep learning framework, Keras?

The goal is to encode the image information in a lower-dimensional space and then reconstruct it from that encoded lower-dimensional representation back to its original form.

The encoded space is lower dimensional, so the model has to learn the most important features, the ones from which it can decode the image again while keeping the mean squared error minimal.

Again, some mathematical background, and then we will get down to coding.

Thinking of this in terms of image representation will help you understand.


Suppose we apply some affine transformation to the input data and encode it into a latent space named z. The representation would look something like this:

$$z = f(Wx + b)$$

Here f is an activation function; we are keeping it linear for the time being.


Great, we encoded all the information of X into the latent space z. Now we need to decode it back with another affine transformation:

$$\hat{x} = g(Vz + c)$$

z being a function of f, we can write

$$\hat{x} = g(V f(Wx + b) + c)$$

Assuming g and f are linear activations, we can get rid of the functions (absorbing the biases for brevity):

$$\hat{x} = VWx$$

The loss function is the mean squared error,

$$\mathcal{L}(W, V) = \frac{1}{N} \sum_{n=1}^{N} \| x_n - VWx_n \|^2$$

The goal is to minimize this loss function with respect to the W and V matrices.

What if g() is not linear? Then we are basically doing nonlinear PCA.
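In Keras terms, the only difference is passing a nonlinear activation to the Dense layer. A tiny sketch (the shapes and the choice of elu are mine, for illustration):

import keras, keras.layers as L

inp = L.Input((4332,))                           # a flattened 38*38*3 image
z_linear = L.Dense(32)(inp)                      # linear code: spans the same subspace as PCA
z_nonlinear = L.Dense(32, activation='elu')(inp) # nonlinear code: nonlinear PCA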

Implementation of PCA Autoencoder

Yeah, finally! But first, we need to download a dataset to test the autoencoder on. No, enough of the MNIST dataset; let's try something else to train on.

Here is the link to the image data, and the link to the attributes. Download both and put them in one folder.

Project Structure

|---- Autoencoder.ipynb
|---- data/
      |---- lfw.tgz
      |---- lfw_attributes.txt

Loading Dataset

I am going to use this script to load the dataset.

# Import Stuff
import numpy as np
from sklearn.model_selection import train_test_split
from lfw_dataset import load_lfw_dataset
import tensorflow as tf
import keras, keras.layers as L
s = keras.backend.get_session()

# Loading and normalizing [Might take some time]
X, attr = load_lfw_dataset(use_raw=True, dimx=38, dimy=38)
X = X.astype('float32') / 255.0
img_shape = X.shape[1:]
X_train, X_test = train_test_split(X, test_size=0.1, random_state=42)

# Checking out some images
%matplotlib inline
import matplotlib.pyplot as plt
plt.title('sample image')
for i in range(6):
    plt.subplot(2, 3, i + 1)  # grid layout is my guess; the original loop body was lost
    plt.imshow(X[i])
print("X shape:", X.shape)
print("attr shape:", attr.shape)


X shape: (13143, 38, 38, 3)
attr shape: (13143, 73)

Coding the PCA Autoencoder

We could actually implement the autoencoder in a couple of ways.

Here is one way,

code_size = 32
pca_autoencoder = keras.models.Sequential()
# Input layer
pca_autoencoder.add(L.InputLayer(img_shape))
# Flattening the layer
pca_autoencoder.add(L.Flatten())
# Encoded space
pca_autoencoder.add(L.Dense(code_size))
# Output units should be image_size * image_size * channels
pca_autoencoder.add(L.Dense(np.prod(img_shape)))
# Last layer
pca_autoencoder.add(L.Reshape(img_shape))

Model summary

Layer (type)                 Output Shape              Param #   
=================================================================
input_10 (InputLayer)        (None, 38, 38, 3)         0         
flatten_6 (Flatten)          (None, 4332)              0         
dense_11 (Dense)             (None, 32)                138656    
dense_12 (Dense)             (None, 4332)              142956    
reshape_5 (Reshape)          (None, 38, 38, 3)         0         
=================================================================
Total params: 281,612
Trainable params: 281,612
Non-trainable params: 0

Training the model

pca_autoencoder.compile('adamax', 'mse')
pca_autoencoder.fit(x=X_train, y=X_train, epochs=10, batch_size=500,
                    validation_data=[X_test, X_test])


Train on 11828 samples, validate on 1315 samples
Epoch 1/10
11828/11828 [==============================]
loss: 0.0413 - val_loss: 0.0130
Epoch 2/10
11828/11828 [==============================]
50us/step - loss: 0.0087 - val_loss: 0.0074
Epoch 3/10
11828/11828 [==============================]
49us/step - loss: 0.0072 - val_loss: 0.0070
Epoch 4/10
11828/11828 [==============================]
49us/step - loss: 0.0070 - val_loss: 0.0069
Epoch 5/10
11828/11828 [==============================]
50us/step - loss: 0.0069 - val_loss: 0.0069
Epoch 6/10
11828/11828 [==============================]
54us/step - loss: 0.0069 - val_loss: 0.0068
Epoch 7/10
11828/11828 [==============================]
54us/step - loss: 0.0069 - val_loss: 0.0068
Epoch 8/10
11828/11828 [==============================]
51us/step - loss: 0.0068 - val_loss: 0.0068
Epoch 9/10
11828/11828 [==============================]
53us/step - loss: 0.0068 - val_loss: 0.0068
Epoch 10/10
11828/11828 [==============================]
53us/step - loss: 0.0068 - val_loss: 0.0068

Coding the autoencoder in another way
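The code for this variant did not survive my draft, but here is a sketch of what I mean, and of what the later sections assume, since they use encoder and decoder separately: build the encoder and decoder as two Sequential models and tie them together with the functional API.

# A sketch, assuming the encoder/decoder split used in later sections
encoder = keras.models.Sequential()
encoder.add(L.InputLayer(img_shape))
encoder.add(L.Flatten())
encoder.add(L.Dense(code_size))

decoder = keras.models.Sequential()
decoder.add(L.InputLayer((code_size,)))
decoder.add(L.Dense(np.prod(img_shape)))
decoder.add(L.Reshape(img_shape))

inp = L.Input(img_shape)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = keras.models.Model(inp, reconstruction)
autoencoder.compile('adamax', 'mse')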

Visualization of Reconstructed Output and the Code itself

We are going to need a helper function to visualize the codes along with the outputs.
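The helper itself is missing from my draft, so here is a sketch of what I use. The visualize function and its layout are my own choices, and the evaluate call is what produces the "Final MSE" figure below.

def visualize(img, encoder, decoder):
    """Draw the original image, its code, and the reconstruction side by side."""
    code = encoder.predict(img[None])[0]
    reco = decoder.predict(code[None])[0]

    plt.subplot(1, 3, 1)
    plt.title("Original")
    plt.imshow(np.clip(img, 0, 1))

    plt.subplot(1, 3, 2)
    plt.title("Code")
    plt.imshow(code.reshape([code.shape[-1] // 2, -1]))  # draw the code as a 2D map

    plt.subplot(1, 3, 3)
    plt.title("Reconstructed")
    plt.imshow(np.clip(reco, 0, 1))
    plt.show()

# Evaluating reconstruction quality (produces the "Final MSE" below)
reconstruction_mse = autoencoder.evaluate(X_test, X_test, verbose=0)
print("Final MSE:", reconstruction_mse)

for i in range(5):
    img = X_test[i]
    visualize(img, encoder, decoder)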


Final MSE: 0.00678860718958

Making a Deep Autoencoder using Feedforward Neural Network

Autoencoders may be thought of as a special case of feedforward networks and can be trained with all of the same techniques.

The general structure of an autoencoder is given below.

Figure 14.1 : From Deep Learning Book by Ian Goodfellow

Here x is the input, h is the internal representation, and r is the output reconstructed from that representation.

Deep Feedforward Autoencoder

Borrowed Image

This image gives a rough idea; we are actually going to build an autoencoder deeper than the one depicted.

Sanity Checks

  • There shouldn't be any hidden layer smaller than the bottleneck (encoder output).
  • Adding nonlinearities between intermediate dense layers yields good results.

These are all examples of Undercomplete Autoencoders since the code dimension is less than the input dimension. If the encoder and decoder are allowed too much capacity, the autoencoder can learn to perform the copying task without extracting useful information about the distribution of data.
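With those checks in mind, here is a minimal sketch of a deep feedforward autoencoder. The layer sizes and elu activations are my choices for illustration; the bottleneck stays the smallest layer and the intermediate layers are nonlinear, per the checks above.

def build_deep_autoencoder(img_shape, code_size):
    """Symmetric dense autoencoder; a sketch with assumed layer sizes."""
    encoder = keras.models.Sequential()
    encoder.add(L.InputLayer(img_shape))
    encoder.add(L.Flatten())
    encoder.add(L.Dense(1024, activation='elu'))
    encoder.add(L.Dense(256, activation='elu'))
    encoder.add(L.Dense(code_size))               # bottleneck: the smallest layer

    decoder = keras.models.Sequential()
    decoder.add(L.InputLayer((code_size,)))
    decoder.add(L.Dense(256, activation='elu'))
    decoder.add(L.Dense(1024, activation='elu'))
    decoder.add(L.Dense(np.prod(img_shape)))      # back to image_size * image_size * channels
    decoder.add(L.Reshape(img_shape))
    return encoder, decoder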

Deep Convolutional Autoencoder

The author of Keras has already explained and implemented variations of autoencoders in his post. You can check that out.

Stolen from Hacker Noon :P

Here's another image from the internet to visualize autoencoders in a more intuitive way.

Building a convolutional autoencoder is as simple as building a ConvNet; the decoder is the mirror image of the encoder. That's basically it!
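The denoising section below calls a build_deep_conv_autoencoder function; here is a sketch of what mine looks like. The filter counts, strides, and the transpose-convolution decoder are my choices, mirroring the encoder as described.

def build_deep_conv_autoencoder(img_shape, code_size):
    """Convolutional encoder with a mirrored transpose-convolution decoder (a sketch).
    Assumes H and W are divisible by 4, e.g. the (44, 44, 3) shape used below."""
    H, W, C = img_shape
    encoder = keras.models.Sequential()
    encoder.add(L.InputLayer(img_shape))
    encoder.add(L.Conv2D(32, (3, 3), strides=2, activation='elu', padding='same'))
    encoder.add(L.Conv2D(64, (3, 3), strides=2, activation='elu', padding='same'))
    encoder.add(L.Flatten())
    encoder.add(L.Dense(code_size))

    decoder = keras.models.Sequential()
    decoder.add(L.InputLayer((code_size,)))
    decoder.add(L.Dense((H // 4) * (W // 4) * 64, activation='elu'))
    decoder.add(L.Reshape((H // 4, W // 4, 64)))
    decoder.add(L.Conv2DTranspose(32, (3, 3), strides=2, activation='elu', padding='same'))
    decoder.add(L.Conv2DTranspose(C, (3, 3), strides=2, padding='same'))
    return encoder, decoder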

Regularized ‘X’ Autoencoder

Simply add a kernel_regularizer to the last layer of the encoder. You can regularize any autoencoder this way. Here is the general step.

encoder.add(L.Dense(code_size, kernel_regularizer=keras.regularizers.l2(0.01)))

Denoising Autoencoder

Here is the computational graph from the Deep Learning textbook.

At training time, I will add random Gaussian noise to the training dataset.

def apply_gaussian_noise(X, sigma=0.1):
    noise = np.random.normal(loc=0.0, scale=sigma, size=X.shape)
    return X + noise

# Clipping the images for showing (subplots added so all three noise levels display)
plt.subplot(1, 3, 1)
plt.imshow(np.clip(apply_gaussian_noise(X[:1], sigma=0.01)[0], 0, 1))
plt.subplot(1, 3, 2)
plt.imshow(np.clip(apply_gaussian_noise(X[:1], sigma=0.1)[0], 0, 1))
plt.subplot(1, 3, 3)
plt.imshow(np.clip(apply_gaussian_noise(X[:1], sigma=0.5)[0], 0, 1))


Preparing the model again for the new code size.

encoder, decoder = build_deep_conv_autoencoder((44, 44, 3), code_size=512)
inp = L.Input(img_shape)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = keras.models.Model(inp, reconstruction)
autoencoder.compile('adamax', 'mse')

# Training with noise: fresh corrupted inputs each epoch, clean images as targets
for i in range(50):
    print("Epoch %i/50, Generating corrupted samples..." % i)
    X_train_noise = apply_gaussian_noise(X_train)
    X_test_noise = apply_gaussian_noise(X_test)
    autoencoder.fit(x=X_train_noise, y=X_train, epochs=1,
                    validation_data=[X_test_noise, X_test])

# Evaluation using noisy input
denoising_mse = autoencoder.evaluate(apply_gaussian_noise(X_test), X_test, verbose=0)
print("Final MSE:", denoising_mse)

for i in range(5):
    img = X_test[i]
    visualize(apply_gaussian_noise(img[None])[0], encoder, decoder)  # visualize call is my assumption


Great, we have implemented some of the autoencoders. But so far they have not been useful to us, so let's try to do some fun things with them.

This result was made using KNN with an encoded size of 32, not 512! To get similar results, you might have to train your autoencoder with these settings.

images = X_train

# Hashing the images with the encoder
codes = encoder.predict(images)

def show_image(x):
    plt.imshow(np.clip(x + 0.5, 0, 1))

# Fitting the codes
from sklearn.neighbors import NearestNeighbors
nei_clf = NearestNeighbors(metric="euclidean")
nei_clf.fit(codes)

def get_similar(image, n_neighbors=5):
    assert image.ndim == 3, "image must be [height, width, 3]"
    code = encoder.predict(image[None])

    (distances,), (idx,) = nei_clf.kneighbors(code, n_neighbors=n_neighbors)

    return distances, images[idx]

def show_similar(image):
    distances, neighbors = get_similar(image, n_neighbors=3)

    plt.subplot(1, 4, 1)  # plot layout is my guess; the original body was lost
    show_image(image)
    plt.title("Original image")

    for i in range(3):
        plt.subplot(1, 4, i + 2)
        show_image(neighbors[i])
        plt.title("Dist=%.3f" % distances[i])
    plt.show()

# Cherry picked examples
# smiles
# ethnicity
# glasses


That's it for today! The code will be uploaded to GitHub soon enough. I hope you had as much fun as I had exploring autoencoders. Pretty neat thing, right? Thanks for reading!