Keras Library: Understanding Optimizers

Introduction

Hello everyone: supporters across the globe, and the constant followers and viewers of The Datum blog. The Datum was started with a single motivation, to share the best practices in Data Science and Machine Learning, and staying true to that, we now reach over 70 countries! Thanks a ton. Through constant learning in Data Science and Machine Learning, I have come to realize that algorithms are best implemented in R when we know the R libraries and their documentation well. Once we understand a library's documentation we are well equipped: we know how its functions were built, and so we can put them to better use when implementing our algorithms.

R Libraries Series

The Datum is happy to announce an altogether new series of blogs based on R libraries, in which I will write posts drawn directly from the R library documentation. Through this, we will better understand the libraries and thus better implement our algorithms.

Keras

The Keras library was developed to take an idea to execution quickly; it is built with functions that run algorithms fast and seamlessly, and it offers the following features:

  1. Keras codes can be run on both CPU as well as GPU
  2. It has a very user-friendly API
  3. It has built-in support for various types of Neural Networks
  4. Keras works at the layer level of neural networks, which makes it easy to customize and build any deep learning network
  5. Keras can run on top of back-ends such as TensorFlow

Thus, it is a powerful and very large library to conquer. Today we will focus on how to set up a neural network model using the Keras-TensorFlow interface and evaluate the model's performance with different optimizers. We will discuss what an optimizer is and which optimizers are available in the Keras library, and evaluate the performance of each.

Building Model

Setting up Keras in R

We need to install the Keras R package first; it can be downloaded directly from GitHub. Below are the listings to set up Keras and all the other required libraries in R.

Installing Keras and other required libraries
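A minimal sketch of the installation steps, assuming the rstudio/keras package is installed from GitHub via devtools:

    # Install devtools if needed, then the keras R package from GitHub
    install.packages("devtools")
    devtools::install_github("rstudio/keras")

    library(keras)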

Keras uses TensorFlow as its back-end by default. To set up the default Keras-TensorFlow configuration on a CPU, use the following listing.

CPU set-up for Keras-TensorFlow interface
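As a sketch: the install_keras() helper installs the Keras and TensorFlow Python libraries, and its default installation is the CPU build of TensorFlow:

    library(keras)
    # Installs Keras with the default (CPU) TensorFlow back-end
    install_keras()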

The above sets up the default CPU configuration for the Keras-TensorFlow interface.

Preparing Data

Our next step is to get the data ready for modeling. For this exhibition I will use the MNIST handwritten digits dataset, which consists of 28 x 28 grayscale images of handwritten digits. The dataset also includes a label for each image, telling us which digit it is. The best part is that the MNIST dataset is available directly in Keras. We will now load the data into the R interface and prepare it for modeling.

Bringing data into the R environment
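A minimal sketch of loading MNIST through the dataset helper that ships with the keras package:

    library(keras)

    # Load the MNIST dataset bundled with Keras
    mnist <- dataset_mnist()
    x_train <- mnist$train$x
    y_train <- mnist$train$y
    x_test  <- mnist$test$x
    y_test  <- mnist$test$y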

The x data is a 3-dimensional array of (images, width, height) of grayscale values. We will reshape the width and height into a single dimension, i.e. flatten each 28 x 28 image into a vector of length 784. This is a required step in preparing the data for training. Then we rescale the grayscale values from integers between 0 and 255 to floating-point values between 0 and 1. Below is the listing for reshaping and rescaling the data.

Reshaping and rescaling the train data
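A sketch of those two steps, using array_reshape() from the keras package:

    # Flatten each 28 x 28 image into a vector of length 784
    x_train <- array_reshape(x_train, c(nrow(x_train), 784))
    x_test  <- array_reshape(x_test, c(nrow(x_test), 784))

    # Rescale grayscale values from 0-255 to the 0-1 range
    x_train <- x_train / 255
    x_test  <- x_test / 255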

The y data is an integer vector of labels ranging from 0 to 9. We will convert it into one-hot encoded binary class matrices, as listed below.

Converting y data
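A sketch of the conversion, using the standard to_categorical() helper:

    # One-hot encode the labels into binary class matrices
    y_train <- to_categorical(y_train, 10)
    y_test  <- to_categorical(y_test, 10)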

The Model

As mentioned before, we can customize our models using Keras. To build this neural network model, we give the input layer the argument input_shape for numeric vectors of length 784, each representing one flattened grayscale image. The final layer outputs numeric vectors of length 10, one per digit (0 to 9), with softmax activation. Below is the listing for the model described above:

Listing for Defining the Model
Model Summary – Console View
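As a minimal sketch of such a model, with hidden layer sizes and dropout rates that are illustrative assumptions rather than values from the original listing:

    model <- keras_model_sequential()
    model %>%
      # Hidden layers (sizes and dropout rates are assumptions)
      layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
      layer_dropout(rate = 0.4) %>%
      layer_dense(units = 128, activation = "relu") %>%
      layer_dropout(rate = 0.3) %>%
      # Output layer: 10 classes (digits 0-9) with softmax activation
      layer_dense(units = 10, activation = "softmax")

    # Print the layer-by-layer summary to the console
    summary(model)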

Defining and setting up the model is not enough to bring the neural network to life; we also need to compile it and train it for a number of epochs. In compilation, we define the model's loss function, its optimizer, and the metrics to measure. Since this is categorical data, we will use the loss function 'categorical_crossentropy'. The loss function measures how much error the model is committing in training: the lower the loss, the more accurate the model. For metrics, we will evaluate the accuracy of the model's performance. And lastly, in the spotlight today, the optimizer. The optimizer ties the loss function and the model parameters together by updating the model in response to the output of the loss function. In simpler terms, optimizers shape and mold your model into its most accurate possible form by adjusting the weights. The loss function is the guide to the terrain, telling the optimizer when it is moving in the right or wrong direction.

Nadam Optimizer

We will compile our model with the 'Nadam' optimizer and train it for 30 epochs; then we will evaluate the loss and accuracy of this configuration. Below are the listings.

Model compilation with Nadam Optimizer and setting up epochs
Nadam optimizer epoch history for Loss and Accuracy
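A sketch of the compilation and training steps; the batch size and validation split here are illustrative assumptions, not values from the original listing:

    # Compile the model with the Nadam optimizer
    model %>% compile(
      loss = "categorical_crossentropy",
      optimizer = optimizer_nadam(),
      metrics = c("accuracy")
    )

    # Train for 30 epochs (batch size and validation split are assumptions)
    history <- model %>% fit(
      x_train, y_train,
      epochs = 30, batch_size = 128,
      validation_split = 0.2
    )

    # Plot loss and accuracy across epochs
    plot(history)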

We can read the loss and accuracy for the Nadam optimizer from the graph, but we will also evaluate the model on the test dataset; below is the listing.

Listing to evaluate model on test data
Output of model evaluation on test data
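A sketch of the evaluation step:

    # Evaluate the trained model on the held-out test data
    model %>% evaluate(x_test, y_test)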

As we can see above, the Nadam optimizer achieves an accuracy of 90.09% and a loss of 0.074 on the test data. Let's move on to the other optimizers.

Adam Optimizer

For each of the remaining optimizers, we just need to replace the optimizer in the compile listing while everything else stays the same, then run the model all over again, as sketched below. The outputs for the Adam optimizer follow.
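A sketch of the one-line swap, using the optimizer constructors the keras package provides (optimizer_adam(), optimizer_adamax(), optimizer_adadelta(), optimizer_adagrad(), optimizer_rmsprop(), optimizer_sgd()):

    # Same compilation as before, swapping in the Adam optimizer;
    # replace optimizer_adam() with any of the other constructors
    model %>% compile(
      loss = "categorical_crossentropy",
      optimizer = optimizer_adam(),
      metrics = c("accuracy")
    )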

Adam optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

Adam has a loss of 0.0712 and an accuracy of 98.18% on the test data.

Adamax Optimizer

Adamax optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

Adamax has a loss of 0.0664 and an accuracy of 98.08% on the test data.

Adadelta Optimizer

Adadelta optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

Adadelta has a loss of 1.218 and an accuracy of 77.61% on the test data.

Adagrad Optimizer

Adagrad optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

Adagrad has a loss of 0.1864 and an accuracy of 94.38% on the test data.

RMSprop Optimizer

RMSprop optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

RMSprop has a loss of 0.1087 and an accuracy of 98.13% on the test data.

SGD Optimizer

SGD optimizer epoch history for Loss and Accuracy
Output of model evaluation on test data

SGD has a loss of 0.1483 and an accuracy of 95.55% on the test data.

And it's all done! Great job if you made it this far. This was our last optimizer, and these are all the optimizers available in the Keras library.

Conclusion

  1. Keras is a very useful library for customizing neural networks at the layer level
  2. Keras works with the TensorFlow back-end, and in this case produced phenomenal results
  3. The optimizer is a crucial element in building a neural network
  4. In terms of accuracy, the Adam optimizer performed best, with an accuracy of 98.18%
  5. In terms of loss, the Adamax optimizer performed best, with a loss as low as 0.0664

If you liked this content and are interested in Data Science and Machine Learning, then like, share, and subscribe to The Datum for more such practical algorithms in Data Science and Machine Learning every week.

Yours Data Scientifically,

The Datum
