Creating and Training Custom Layers in TensorFlow 2

Learning to create your own custom layers and training them in TensorFlow 2

Arjun Sarkar
Towards Data Science

--

  1. Previously we’ve seen how to create custom loss functions — Creating custom Loss functions using TensorFlow 2
  2. Next, I wrote about creating custom Activation Functions using Lambda layers — Creating Custom Activation Functions with Lambda Layers in TensorFlow 2

This is the third part of the series, where we create custom Dense Layers and train them in TensorFlow 2.

Introduction:

Lambda layers are simple layers in TensorFlow that can be used to create custom activation functions. But Lambda layers have many limitations, especially when it comes to training them. So the idea is to create custom layers that are trainable, by inheriting from the Keras Layer class in TensorFlow, with a special focus on Dense layers.

What is a Layer?

Figure 1. Layer — Dense Layer representation (Source: image created by Author)

A layer is a class that receives some inputs, passes them through its state and a computation, and produces an output, as required by the neural network. Every model architecture contains multiple layers, whether it is built with the Sequential or the Functional API.

State — Mostly trainable variables, which are updated during ‘model.fit’. In a Dense layer, the states constitute the weights and the bias, as shown in Figure 1. These values are updated to give better results as the model trains. In some layers, the state can also contain non-trainable variables.

Computation — The computation transforms a batch of input data into a batch of output data; this is the part of the layer where the actual calculation takes place. In a Dense layer, the computation is —

Y = (w*X + c), and returns Y.

Y is the output, X is the input, w is the weights, c is the bias.
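As a rough sketch of the shapes involved (the variable names below are just for illustration, not part of the layer we build later): for a batch of inputs X of shape (batch, input features) and a layer with a given number of units, the weights have shape (input features, units) and the bias has one value per unit.

import tensorflow as tf

X = tf.ones((4, 3))            # a batch of 4 samples with 3 features each
w = tf.random.normal((3, 2))   # weights: (input features, units)
c = tf.zeros((2,))             # bias: one value per unit
Y = tf.matmul(X, w) + c        # broadcasting adds the bias to every sample
print(Y.shape)                 # (4, 2): one output vector of 2 units per sample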

Creating a custom Dense Layer:

Now that we know what happens inside Dense layers, let’s see how we can create our own Dense layer and use it in a model.


import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleDense(Layer):

    def __init__(self, units=32):
        '''Initializes the instance attributes'''
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):
        '''Create the state of the layer (weights)'''
        # initialize the weights
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                   dtype='float32'),
                             trainable=True)

        # initialize the biases
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)

    def call(self, inputs):
        '''Defines the computation from inputs to outputs'''
        return tf.matmul(inputs, self.w) + self.b

Explanation of the code above — The class is named SimpleDense. When we create a custom layer, we have to inherit Keras’s layer class. This is done in the line ‘class SimpleDense(Layer)’.

‘__init__’ is the first method in the class and it initializes the class. ‘init’ accepts parameters and converts them to variables that can be used within the class. Since the class inherits from the ‘Layer’ class, the parent class also needs to be initialized; this is done with the ‘super’ call. ‘units’ is a local class variable, analogous to the number of units in the Dense layer. The default value is set to 32, but it can be changed when the class is instantiated.

‘build’ is the next method in the class. This is used to specify the states. In the Dense layer, the two states required are ‘w’ and ‘b’, for weights and biases. When the Dense layer is being created, we are not just creating one neuron of the network’s hidden layer, but multiple neurons at one go (in this case 32 neurons will be created). Every neuron in the layer needs to be initialized and given some random weight and bias values. TensorFlow contains many built-in functions to initialize these values.

For initializing the weights we use the ‘random_normal_initializer’ function from TensorFlow, which initializes the weights randomly from a normal distribution. ‘self.w’ holds the state of the weights as a tensor variable, initialized using ‘w_init’. The weight values are stored in the ‘float32’ format. The variable is set as ‘trainable=True’, which means that after every training step these weights are updated in accordance with the loss function and the optimizer. The name ‘kernel’ is added so that the variable can be easily traced later.
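To see what this initializer produces on its own, here is a small illustrative sketch (the shape chosen below is arbitrary):

w_init = tf.random_normal_initializer()          # defaults to mean 0.0, stddev 0.05 in TF 2
sample = w_init(shape=(2, 3), dtype='float32')   # an initializer is a callable: give it a shape, get a tensor back
print(sample)                                    # small random values centred around zero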

For initializing the biases, TensorFlow’s ‘zeros_initializer’ function is used. This sets all the initial bias values to zero. ‘self.b’ is a tensor with the same size as the number of units (here 32), and each of these bias terms is initially set to zero. It is also set as ‘trainable=True’, so the bias terms are updated as training starts. The name ‘bias’ is added to be able to trace it later.

‘call’ is the last method and it performs the computation. In this case, as it is a Dense layer, it multiplies the inputs with the weights, adds the bias, and returns the output. The ‘matmul’ operation is used because ‘self.w’ is a weight matrix rather than a single numerical value.
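As an aside, the same states can also be declared with the Layer class’s built-in ‘add_weight’ helper, which is the more common Keras idiom and registers the variables with the layer automatically. The sketch below is only an illustrative alternative (the class name ‘SimpleDenseAlt’ is made up here), equivalent to the ‘build’ method above:

class SimpleDenseAlt(Layer):
    def __init__(self, units=32):
        super(SimpleDenseAlt, self).__init__()
        self.units = units

    def build(self, input_shape):
        # add_weight creates and tracks the variables for us
        self.w = self.add_weight(name="kernel",
                                 shape=(input_shape[-1], self.units),
                                 initializer="random_normal",
                                 trainable=True)
        self.b = self.add_weight(name="bias",
                                 shape=(self.units,),
                                 initializer="zeros",
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b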

# declare an instance of the class
my_dense = SimpleDense(units=1)

# define an input and feed it into the layer
x = tf.ones((1, 1))
y = my_dense(x)

# parameters of the base Layer class like `variables` can be used
print(my_dense.variables)

Output:

[<tf.Variable 'simple_dense/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[0.00382898]], dtype=float32)>, <tf.Variable 'simple_dense/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

Explanation of the code above — The first line creates a Dense layer containing just one neuron (units=1). x (the input) is a tensor of shape (1, 1) with the value 1. y = my_dense(x) calls the layer on this input, which builds it and initializes its states. ‘.variables’ lets us look at the values initialized inside the layer (the weights and biases).

The output of ‘my_dense.variables’ is shown below the code block. It shows that there are two variables in ‘simple_dense’, called ‘kernel’ and ‘bias’. The kernel ‘w’ is initialized with the value 0.0038, drawn from a random normal distribution, and the bias ‘b’ is initialized with the value 0. This is just the initial state of the layer; once trained, these values change accordingly.

import numpy as np

# define the dataset
xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# use the Sequential API to build a model with our custom layer
my_layer = SimpleDense(units=1)
model = tf.keras.Sequential([my_layer])

# configure and train the model
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(xs, ys, epochs=500, verbose=0)

# perform inference
print(model.predict([10.0]))

# see the updated state of the variables
print(my_layer.variables)

Output:

[[18.981567]][<tf.Variable 'sequential/simple_dense_1/kernel:0' shape=(1, 1) dtype=float32, numpy=array([[1.9973286]], dtype=float32)>, <tf.Variable 'sequential/simple_dense_1/bias:0' shape=(1,) dtype=float32, numpy=array([-0.99171764], dtype=float32)>]

Explanation of the code above — The code used above is a very simple way to check whether the custom layer works. The inputs and outputs are set up (they follow the relation y = 2x - 1), the model is built with the custom layer, compiled, and trained for 500 epochs. What is important to see is that after training the model, the values of the weights and biases have changed. The weight, which was initially set as 0.0038, is now 1.9973, and the bias, which was initially set as zero, is now -0.9917.
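As a quick sanity check of these numbers, here is a small sketch plugging the trained values reported above into the layer’s formula:

# plug the trained weight and bias into Y = w*X + c
w = 1.9973286
c = -0.99171764
x = 10.0
print(w * x + c)   # ~18.98, matching model.predict([10.0]) above
# the data itself follows y = 2x - 1, which would give exactly 19.0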

Adding an Activation Function to the Custom Dense Layer:

Previously we created the custom Dense layer, but we did not add any activation to it. Of course, to add an activation we could simply write the activation as a separate layer in the model, or add it as a Lambda layer. But how do we implement the activation inside the same custom layer that we created above?

The answer is a simple tweak in the ‘__init__’ and the ‘call’ methods in the custom Dense layer.

class SimpleDense(Layer):

    # add an activation parameter
    def __init__(self, units=32, activation=None):
        super(SimpleDense, self).__init__()
        self.units = units

        # define the activation to get from the built-in activation layers in Keras
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                   dtype='float32'),
                             trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # pass the computation to the activation layer
        return self.activation(tf.matmul(inputs, self.w) + self.b)

Explanation of the code above — Most of the code is exactly the same as the code we used before.

To add the activation we need to specify in the ‘__init__’ that we accept an activation argument. Either a string (such as ‘relu’) or a callable activation object can be passed in. It defaults to None, so if no activation function is given, no error is thrown. Next, we resolve the activation function with ‘tf.keras.activations.get(activation)’.
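A small sketch of how ‘tf.keras.activations.get’ behaves (the variable names below are just for illustration):

relu_from_string = tf.keras.activations.get('relu')        # a string resolves to the built-in function
relu_from_callable = tf.keras.activations.get(tf.nn.relu)  # a callable is returned as-is
linear_default = tf.keras.activations.get(None)            # None resolves to the linear (identity) activation

print(relu_from_string(tf.constant([-1.0, 2.0])).numpy())   # [0. 2.]
print(linear_default(tf.constant([-1.0, 2.0])).numpy())     # [-1. 2.]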

The final edit is in the ‘call’ method, where the result of the weights-and-bias computation is passed through ‘self.activation’ before being returned. So the layer now returns the activated computation.

Complete code of the custom Dense layer with activation on the MNIST dataset:


import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleDense(Layer):

    def __init__(self, units=32, activation=None):
        super(SimpleDense, self).__init__()
        self.units = units

        # define the activation to get from the built-in activation layers in Keras
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(name="kernel",
                             initial_value=w_init(shape=(input_shape[-1], self.units),
                                                   dtype='float32'),
                             trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(name="bias",
                             initial_value=b_init(shape=(self.units,), dtype='float32'),
                             trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # pass the computation to the activation layer
        return self.activation(tf.matmul(inputs, self.w) + self.b)

# load and normalize the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # our custom Dense layer with activation
    SimpleDense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# fit and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Training the model with our custom Dense layer and activation gives a training accuracy of about 97.8% and a test accuracy of about 97.7%.
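As a quick usage check (a sketch that assumes the model and data from the block above are still in scope), we can run a prediction on a single test image:

import numpy as np

probs = model.predict(x_test[:1])                 # class probabilities, shape (1, 10)
print("predicted digit:", np.argmax(probs, axis=-1)[0])
print("true label:", y_test[0])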

Conclusion:

This is the way to create custom layers in TensorFlow. Even though we only saw the working of a Dense layer, it can easily be replaced by other layers, such as a Quadratic layer, which does the following computation —

It has 3 state variables: a, b and c.

Computation:

Y = (a*X^2 + b*X + c), and returns Y after passing it through the activation.

Replacing the Dense Layer with a Quadratic layer:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleQuadratic(Layer):

    def __init__(self, units=32, activation=None):
        '''Initializes the class and sets up the internal variables'''
        super(SimpleQuadratic, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        '''Create the state of the layer (weights)'''
        a_init = tf.random_normal_initializer()
        a_init_val = a_init(shape=(input_shape[-1], self.units), dtype='float32')
        self.a = tf.Variable(initial_value=a_init_val, trainable=True)

        b_init = tf.random_normal_initializer()
        b_init_val = b_init(shape=(input_shape[-1], self.units), dtype='float32')
        self.b = tf.Variable(initial_value=b_init_val, trainable=True)

        c_init = tf.zeros_initializer()
        c_init_val = c_init(shape=(self.units,), dtype='float32')
        self.c = tf.Variable(initial_value=c_init_val, trainable=True)

    def call(self, inputs):
        '''Defines the computation from inputs to outputs'''
        x_squared = tf.math.square(inputs)
        x_squared_times_a = tf.matmul(x_squared, self.a)
        x_times_b = tf.matmul(inputs, self.b)
        x2a_plus_xb_plus_c = x_squared_times_a + x_times_b + self.c

        return self.activation(x2a_plus_xb_plus_c)
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    SimpleQuadratic(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

This Quadratic layer gives a test accuracy of about 97.8% on the MNIST dataset.

Thus, we can implement our own layers, along with the desired activations, in TensorFlow models to modify an architecture and perhaps even improve its overall accuracy.
