Implementation of all Loss Functions (Deep Learning) in NumPy, TensorFlow, and PyTorch

In this article, the loss functions most commonly used in deep learning are discussed and implemented in NumPy, TensorFlow, and PyTorch.

Arjun Sarkar
18 min read · Mar 17, 2023

Contents

  1. Mean Squared Error (MSE) Loss
  2. Binary Cross-Entropy Loss
  3. Weighted Binary Cross-Entropy Loss
  4. Categorical Cross-Entropy Loss
  5. Sparse Categorical Cross-Entropy Loss
  6. Dice Loss
  7. KL Divergence Loss
  8. Mean Absolute Error (MAE) / L1 Loss
  9. Huber Loss

Mean Squared Error (MSE) Loss

Mean Squared Error (MSE) loss is a commonly used loss function in regression problems, where the goal is to predict a continuous variable. The loss is calculated as the average of the squared differences between the predicted and true values. The formula for MSE loss is:

MSE loss = (1/n) * sum((y_pred - y_true)²)

Where:

  • n is the number of samples in the dataset
  • y_pred is the predicted value of the target variable
  • y_true is the true value of the target variable

The MSE loss is sensitive to outliers and penalizes large errors heavily, which may not always be desirable. In those situations, other loss functions such as Mean Absolute Error (MAE) or Huber loss may be used instead.

Implementation in NumPy

import numpy as np

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.

    Args:
    - y_pred: predicted values
    - y_true: true values

    Returns:
    - mse_loss: mean squared error loss
    """
    n = len(y_true)
    mse_loss = np.sum((y_pred - y_true) ** 2) / n
    return mse_loss

In this implementation, y_pred and y_true are NumPy arrays containing the predicted and true values, respectively. The function first calculates the squared differences between y_pred and y_true, and then takes the mean of these values to obtain the MSE loss. The n variable represents the number of samples in the dataset, and is used to normalize the loss.
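For example, calling the function on a small pair of arrays (values chosen purely for illustration):

# example usage with illustrative values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

print(mse_loss(y_pred, y_true))  # 0.375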

Implementation in TensorFlow

import tensorflow as tf

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.

    Args:
    - y_pred: predicted values
    - y_true: true values

    Returns:
    - mse_loss: mean squared error loss
    """
    mse = tf.keras.losses.MeanSquaredError()
    mse_loss = mse(y_true, y_pred)
    return mse_loss

In this implementation, y_pred and y_true are TensorFlow tensors containing the predicted and true values, respectively. The tf.keras.losses.MeanSquaredError() function calculates the MSE loss between y_pred and y_true. The mse_loss variable contains the calculated loss.

Implementation in PyTorch

import torch

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.

    Args:
    - y_pred: predicted values
    - y_true: true values

    Returns:
    - mse_loss: mean squared error loss
    """
    mse = torch.nn.MSELoss()
    mse_loss = mse(y_pred, y_true)
    return mse_loss

In this implementation, y_pred and y_true are PyTorch tensors containing the predicted and true values, respectively. The torch.nn.MSELoss() function calculates the MSE loss between y_pred and y_true. The mse_loss variable contains the calculated loss.
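Because torch.nn.MSELoss is built from differentiable tensor operations, the returned loss supports autograd; a small illustrative sketch:

# example usage: the loss can be back-propagated directly
y_pred = torch.tensor([2.5, 0.0, 2.0], requires_grad=True)
y_true = torch.tensor([3.0, -0.5, 2.0])

loss = mse_loss(y_pred, y_true)
loss.backward()  # gradients are accumulated in y_pred.grad
print(loss.item(), y_pred.grad)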

Binary Cross-Entropy Loss

Binary Cross-Entropy loss, also known as log loss, is a common loss function used in binary classification problems. It measures the difference between the predicted probability distribution and the actual binary label distribution.

The formula for binary cross-entropy loss is as follows:

L(y, ŷ) = -[y * log(ŷ) + (1 - y) * log(1 - ŷ)]

where y is the true binary label (0 or 1), ŷ is the predicted probability (ranging from 0 to 1), and log is the natural logarithm.

The first term of the equation calculates the loss when the true label is 1, and the second term calculates the loss when the true label is 0. The overall loss is the sum of both terms.

When the predicted probability is close to the true label, the loss is low, and when the predicted probability is far from the true label, the loss is high. This loss function is commonly used in neural network models that use sigmoid activation functions in the output layer to predict binary labels.

Implementation in NumPy

In NumPy, the binary cross-entropy loss can be implemented using the formula described earlier. Here is an example of how to calculate it:

import numpy as np

# define true labels and predicted probabilities
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])

# calculate the binary cross-entropy loss
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()

# print the loss
print(loss)
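One practical detail: if any predicted probability is exactly 0 or 1, np.log returns -inf, so the probabilities are usually clipped to a narrow range first. A minimal sketch of that guard:

# clip predictions away from 0 and 1 to avoid log(0)
eps = 1e-7
y_pred_clipped = np.clip(y_pred, eps, 1 - eps)
loss = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped)).mean()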

Implementation in TensorFlow

In TensorFlow, the binary cross-entropy loss can be implemented using the tf.keras.losses.BinaryCrossentropy() function. Here is an example of how to use it:

import tensorflow as tf

# define true labels and predicted probabilities
y_true = tf.constant([0, 1, 1, 0])
y_pred = tf.constant([0.1, 0.9, 0.8, 0.3])

# define the loss function
bce_loss = tf.keras.losses.BinaryCrossentropy()

# calculate the loss
loss = bce_loss(y_true, y_pred)

# print the loss
print(loss)

Implementation in PyTorch

In PyTorch, the binary cross-entropy loss can be implemented using the torch.nn.BCELoss() function. Here is an example of how to use it:

import torch

# define true labels and predicted probabilities
y_true = torch.tensor([0, 1, 1, 0], dtype=torch.float32)
y_pred = torch.tensor([0.1, 0.9, 0.8, 0.3], dtype=torch.float32)

# define the loss function
bce_loss = torch.nn.BCELoss()

# calculate the loss
loss = bce_loss(y_pred, y_true)

# print the loss
print(loss)

Weighted Binary Cross-Entropy Loss

Weighted Binary Cross-Entropy loss is a variation of the binary cross-entropy loss that allows for assigning different weights to positive and negative examples. This can be useful when dealing with imbalanced datasets, where one class is significantly underrepresented compared to the other.

The formula for weighted binary cross-entropy loss is as follows:

L(y, ŷ) = -[w_pos * y * log(ŷ) + w_neg * (1 - y) * log(1 - ŷ)]

where y is the true binary label (0 or 1), ŷ is the predicted probability (ranging from 0 to 1), log is the natural logarithm, and w_pos and w_neg are the positive and negative weights, respectively.

The first term of the equation calculates the loss when the true label is 1, and the second term calculates the loss when the true label is 0. The overall loss is the sum of both terms, each weighted by the corresponding weight.

The positive and negative weights can be chosen based on the relative importance of each class. For example, if the positive class is more important, a higher weight can be assigned to it. Similarly, if the negative class is more important, a higher weight can be assigned to it.

When the predicted probability is close to the true label, the loss is low, and when the predicted probability is far from the true label, the loss is high. This loss function is commonly used in neural network models that use sigmoid activation functions in the output layer to predict binary labels.
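The article does not include code for this variant, so here is a minimal NumPy sketch of the weighted formula above; the function name and the weight values w_pos and w_neg are illustrative choices, not part of any library API:

import numpy as np

def weighted_bce_loss(y_pred, y_true, w_pos=2.0, w_neg=1.0):
    # clip predictions away from 0 and 1 to avoid log(0)
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(w_pos * y_true * np.log(y_pred) + w_neg * (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

# example with an imbalanced batch (values are illustrative)
y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0.1, 0.2, 0.3, 0.6])
print(weighted_bce_loss(y_pred, y_true))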

Categorical Cross-Entropy Loss

The categorical cross-entropy loss is a popular loss function used in multi-class classification problems. It measures the dissimilarity between the true labels and the predicted probabilities for each class.

The formula for categorical cross-entropy loss is:

L = -1/N * sum(sum(Y * log(Y_hat)))

where Y is a matrix of true labels in one-hot encoding format, Y_hat is a matrix of predicted probabilities for each class, N is the number of samples, and log represents the natural logarithm.

In this formula, Y has a shape of (N, C), where N is the number of samples and C is the number of classes. Each row of Y represents the true label distribution for a single sample, with a value of 1 in the column corresponding to the true label and 0 in all other columns.

Similarly, Y_hat has a shape of (N, C), where each row represents the predicted probability distribution for a single sample, with a probability value for each class.

The log function is applied element-wise to the predicted probability matrix Y_hat. The sum function is used twice to sum over both dimensions of the Y matrix.

The resulting value L represents the average cross-entropy loss over all N samples in the dataset. The goal of training a neural network is to minimize this loss function.

The loss function penalizes the model heavily when it assigns a low probability to the true class. The goal is to minimize the loss function, which means making the predicted probabilities as close to the true labels as possible.

Implementation in NumPy

In NumPy, the categorical cross-entropy loss can be implemented using the formula described earlier. Here is an example of how to calculate it:

import numpy as np

# define true labels and predicted probabilities as NumPy arrays
y_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# calculate the loss
loss = -1/len(y_true) * np.sum(np.sum(y_true * np.log(y_pred)))

# print the loss
print(loss)

In this example, y_true represents the true labels in one-hot encoding format, and y_pred represents the predicted probabilities for each class, both as NumPy arrays. The loss is calculated using the formula described above, and then printed to the console using the print function. Note that the np.sum function is used twice to sum over both dimensions of the Y matrix.

Implementation in TensorFlow

In TensorFlow, the categorical cross-entropy loss can be easily calculated using the tf.keras.losses.CategoricalCrossentropy class. Here's an example of how to use it:

import tensorflow as tf

# define true labels and predicted probabilities as TensorFlow Tensors
y_true = tf.constant([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
cce_loss = tf.keras.losses.CategoricalCrossentropy()

# calculate the loss
loss = cce_loss(y_true, y_pred)

# print the loss
print(loss.numpy())

In this example, y_true represents the true labels in one-hot encoding format, and y_pred represents the predicted probabilities for each class, both as TensorFlow Tensors. The CategoricalCrossentropy class is used to create an instance of the loss function, and then the loss is calculated by passing in the true labels and predicted probabilities as arguments. Finally, the calculated loss is printed to the console using the .numpy() method.

Note that the CategoricalCrossentropy class expects the true labels to be in one-hot encoding format, as in this example. If your true labels are provided as integers instead, use the SparseCategoricalCrossentropy loss described below, which accepts integer labels directly.

Implementation in PyTorch

In PyTorch, the categorical cross-entropy loss can be easily calculated using the torch.nn.CrossEntropyLoss class. Here's an example of how to use it:

import torch

# define true labels and predicted logits as PyTorch Tensors
y_true = torch.LongTensor([1, 2, 0])
y_logits = torch.Tensor([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
ce_loss = torch.nn.CrossEntropyLoss()

# calculate the loss
loss = ce_loss(y_logits, y_true)

# print the loss
print(loss.item())

In this example, y_true represents the true labels in integer format, and y_logits represents the predicted logits for each class, both as PyTorch Tensors. The CrossEntropyLoss class is used to create an instance of the loss function, and then the loss is calculated by passing in the predicted logits and true labels as arguments. Finally, the calculated loss is printed to the console using the .item() method.

Note that the CrossEntropyLoss class combines the softmax activation function and the categorical cross-entropy loss into a single operation, so you don't need to apply softmax separately. Also note that the true labels should be in integer format, not one-hot encoding format.
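To illustrate that note, the following sketch (using the same illustrative values) shows that CrossEntropyLoss applied to raw logits matches log_softmax followed by the negative log-likelihood loss:

import torch
import torch.nn.functional as F

y_true = torch.LongTensor([1, 2, 0])
y_logits = torch.Tensor([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

loss_a = torch.nn.CrossEntropyLoss()(y_logits, y_true)
loss_b = F.nll_loss(F.log_softmax(y_logits, dim=1), y_true)
print(loss_a.item(), loss_b.item())  # identical values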

Sparse Categorical Cross-Entropy Loss

The sparse categorical cross-entropy loss is similar to the categorical cross-entropy loss, but it is used when the true labels are provided as integers rather than one-hot encoding. It is commonly used as a loss function in multi-class classification problems.

The formula for sparse categorical cross-entropy loss is:

L = -1/N * sum(log(Y_hat_i))

where Y_hat_i is the predicted probability for the true class label i for each sample, and N is the number of samples.

In other words, the formula calculates the negative logarithm of the predicted probability for the true class label for each sample, and then averages these values over all samples.

Unlike the categorical cross-entropy loss, which uses a one-hot encoding for the true labels, the sparse categorical cross-entropy loss uses integer labels directly. The true label for each sample is represented as a single integer value i between 0 and C-1, where C is the number of classes.

Implementation in NumPy

import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred):
    # convert true labels to one-hot encoding
    y_true_onehot = np.zeros_like(y_pred)
    y_true_onehot[np.arange(len(y_true)), y_true] = 1

    # calculate loss
    loss = -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=-1))

    return loss

In this implementation, y_true is an array of integer labels and y_pred is an array of predicted probabilities for each sample. The function first converts the true labels to a one-hot encoding format using NumPy's advanced indexing feature to create an array of shape (N, C) where N is the number of samples and C is the number of classes, and each row corresponds to the true label distribution for a single sample.

The function then calculates the loss using the formula described above: -1/N * sum(log(Y_hat_i)). This is implemented using NumPy's broadcasting, where y_true_onehot * np.log(y_pred) creates an array of shape (N, C) where each element represents the product of the corresponding elements in y_true_onehot and np.log(y_pred). The sum function is then used to sum over the C dimension, and mean is used to average over the N dimension.

Here’s an example of how to use the function:

# define true labels as integers and predicted probabilities as an array
y_true = np.array([1, 2, 0])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss)

This will output the value of the sparse categorical cross-entropy loss for the given inputs.

Implementation in TensorFlow

import tensorflow as tf

def sparse_categorical_crossentropy(y_true, y_pred):
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
    return loss

# define true labels as integers and predicted probabilities as a tensor
y_true = tf.constant([1, 2, 0])
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.numpy())

In this implementation, y_true is an array of integer labels and y_pred is an array of predicted probabilities for each sample. The function uses the tf.keras.losses.sparse_categorical_crossentropy function provided by TensorFlow to calculate the loss. The from_logits parameter is set to False because y_pred contains probabilities rather than logits.

Implementation in PyTorch

import torch.nn.functional as F
import torch

def sparse_categorical_crossentropy(y_true, y_pred):
    loss = F.cross_entropy(y_pred, y_true)
    return loss

# define true labels as integers and predicted logits as a tensor
y_true = torch.tensor([1, 2, 0])
y_pred = torch.tensor([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.item())

In this implementation, y_true is an array of integer labels and y_pred is an array of predicted logits for each sample. The function uses PyTorch's F.cross_entropy function to calculate the loss. The y_pred tensor should have shape (N, C) where N is the number of samples and C is the number of classes.

Dice Loss

Dice loss is a loss function used in image segmentation tasks. It is derived from the Sørensen–Dice coefficient (closely related to the F1 score), which measures the overlap between the predicted segmentation and the ground truth. The Dice coefficient ranges from 0 to 1, where 0 indicates no overlap and 1 indicates perfect overlap; the Dice loss is one minus this coefficient, so lower values mean better overlap.

The Dice loss is defined as:

Dice Loss = 1 - (2 * intersection + smooth) / (sum of squares of prediction + sum of squares of ground truth + smooth)

where intersection is the sum of the element-wise product of the prediction and ground truth masks, smooth is a smoothing constant (usually a small value such as 1e-5) to prevent division by zero, and the sums of squares are taken over all elements of the masks.

The Dice loss can be implemented in various deep learning frameworks such as TensorFlow, PyTorch, and NumPy. The implementation involves computing the intersection and sums of squares using the element-wise product and summation operations available in the framework.

Implementation in NumPy

import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = np.sum(y_true * y_pred, axis=(1,2,3))
    sum_of_squares_pred = np.sum(np.square(y_pred), axis=(1,2,3))
    sum_of_squares_true = np.sum(np.square(y_true), axis=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are the ground truth and predicted masks, respectively. The smooth parameter is used to prevent division by zero. The sum and square functions are used to compute the intersection and sums of squares, respectively. Finally, the Dice loss is computed using the formula described above.

Note that this implementation assumes that y_true and y_pred are 4D arrays with dimensions (batch_size, height, width, num_classes). If your masks have a different shape, you may need to modify the implementation accordingly.
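As a quick check, the function can be called on small synthetic masks (shapes chosen purely for illustration); note that it returns one loss value per sample in the batch:

# example usage: batch of 2 masks, 8x8 pixels, 1 class
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(2, 8, 8, 1)).astype(np.float32)
y_pred = rng.random(size=(2, 8, 8, 1)).astype(np.float32)

print(dice_loss(y_true, y_pred))  # array of shape (2,), one Dice loss per sample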

Implementation in TensorFlow

import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = tf.reduce_sum(y_true * y_pred, axis=(1,2,3))
    sum_of_squares_pred = tf.reduce_sum(tf.square(y_pred), axis=(1,2,3))
    sum_of_squares_true = tf.reduce_sum(tf.square(y_true), axis=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are TensorFlow tensors representing the ground truth and predicted masks, respectively. The smooth parameter is used to prevent division by zero. The reduce_sum and square functions are used to compute the intersection and sums of squares, respectively. Finally, the Dice loss is computed using the formula described above.

Note that this implementation assumes that y_true and y_pred are 4D tensors with dimensions (batch_size, height, width, num_classes). If your masks have a different shape, you may need to modify the implementation accordingly.

Implementation in PyTorch

import torch

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = torch.sum(y_true * y_pred, dim=(1,2,3))
    sum_of_squares_pred = torch.sum(torch.square(y_pred), dim=(1,2,3))
    sum_of_squares_true = torch.sum(torch.square(y_true), dim=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

In this implementation, y_true and y_pred are PyTorch tensors representing the ground truth and predicted masks, respectively. The smooth parameter is used to prevent division by zero. The sum and square functions are used to compute the intersection and sums of squares, respectively. Finally, the Dice loss is computed using the formula described above.

Note that this implementation assumes that y_true and y_pred are 4D tensors with dimensions (batch_size, num_classes, height, width). If your masks have a different shape, you may need to modify the implementation accordingly.

KL Divergence Loss

KL (Kullback-Leibler) divergence loss is a measure of how different two probability distributions are from each other. In the context of machine learning, it is often used as a loss function to train models that generate new samples from a given distribution.

The KL divergence between two probability distributions p and q is defined as:

KL(p||q) = sum(p(x) * log(p(x) / q(x)))

In the context of machine learning, p represents the true distribution and q represents the predicted distribution. The KL divergence loss measures how well the predicted distribution matches the true distribution.

The KL divergence loss can be used in various tasks such as image generation, text generation, and reinforcement learning. However, it can be numerically unstable to optimize when the predicted distribution assigns near-zero probability to outcomes that have significant probability under the true distribution.

In practice, the KL divergence loss is often used in conjunction with other loss functions such as the cross-entropy loss. By adding the KL divergence loss to the cross-entropy loss, the model is encouraged to generate samples that not only match the target distribution but also have similar distributions to the training data.

Implementation in NumPy

import numpy as np

def kl_divergence_loss(p, q):
    return np.sum(p * np.log(p / q))

In this implementation, p and q are numpy arrays representing the true distribution and predicted distribution, respectively. The KL divergence loss is computed using the formula described above.

Note that this implementation assumes that p and q have the same shape. If they have different shapes, you may need to modify the implementation accordingly.
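A quick usage example with two illustrative distributions that each sum to 1:

# example usage with two small probability distributions
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.3, 0.3])

print(kl_divergence_loss(p, q))  # approximately 0.117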

Implementation in TensorFlow

tf.keras.losses.KLDivergence() is a built-in function in TensorFlow that computes the KL divergence loss between two probability distributions. It can be used as a loss function in various machine learning tasks such as image generation, text generation, and reinforcement learning.

Here’s an example usage of tf.keras.losses.KLDivergence():

import tensorflow as tf

# define true distribution and predicted distribution
p = tf.constant([0.2, 0.3, 0.5])
q = tf.constant([0.4, 0.3, 0.3])

# compute KL divergence loss
kl_loss = tf.keras.losses.KLDivergence()(p, q)

print(kl_loss.numpy())

In this example, p and q are TensorFlow tensors representing the true distribution and predicted distribution, respectively. The tf.keras.losses.KLDivergence() function is used to compute the KL divergence loss between p and q. The result is a scalar tensor that represents the loss value.

Note that tf.keras.losses.KLDivergence() expects p and q to have compatible shapes and to represent valid probability distributions along the last axis. Additionally, the reduction parameter of the loss object controls how the per-sample losses are aggregated, for example summed or averaged over the batch.

Implementation in PyTorch

In PyTorch, the KL divergence loss can be computed using the torch.nn.KLDivLoss module. Here's an example implementation:

import torch

def kl_divergence_loss(p, q):
    criterion = torch.nn.KLDivLoss(reduction='batchmean')
    # the module expects log-probabilities of the predicted distribution as input
    # and the true distribution as the target
    loss = criterion(torch.log(q), p)
    return loss

In this implementation, p and q are PyTorch tensors representing the true distribution and predicted distribution, respectively. The torch.nn.KLDivLoss module is used to compute the KL divergence loss between p and q. The reduction parameter is set to 'batchmean' to compute the mean loss over the batch.

Note that p and q should be probabilities that sum to 1 along the last dimension. The torch.log function is applied to q, the predicted distribution, before it is passed to the torch.nn.KLDivLoss module, because the module expects its input to be log-probabilities; the target (the true distribution p) is passed as plain probabilities.
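A short usage example (each row of the tensors is one distribution, so 'batchmean' averages over rows; values are illustrative):

# example usage: a batch with one true and one predicted distribution
p = torch.tensor([[0.2, 0.3, 0.5]])   # true distribution
q = torch.tensor([[0.4, 0.3, 0.3]])   # predicted distribution

print(kl_divergence_loss(p, q).item())  # approximately 0.117, matching the NumPy example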

Mean Absolute Error (MAE) Loss / L1 Loss

L1 loss, also known as mean absolute error (MAE) loss, is a common loss function used in deep learning for regression tasks. It measures the absolute differences between the predicted and true values of the target variable.

The formula for L1 loss is:

L1 loss = 1/n * Σ|y_pred - y_true|

where n is the number of samples, y_pred is the predicted value, and y_true is the true value.

In simpler terms, L1 loss is the average of the absolute differences between the predicted and true values. It is less sensitive to outliers than the mean squared error (MSE) loss, making it a good choice for models that can be affected by outliers.

Implementation in Numpy

import numpy as np

def l1_loss(y_pred, y_true):
    loss = np.mean(np.abs(y_pred - y_true))
    return loss

The NumPy implementation of L1 loss is very similar to the formula, where you subtract the predicted value from the true value and take the absolute value. Then, you take the mean of these absolute differences across all samples to obtain the average L1 loss.

Implementation in TensorFlow

import tensorflow as tf

def l1_loss(y_pred, y_true):
    loss = tf.reduce_mean(tf.abs(y_pred - y_true))
    return loss

In TensorFlow, you can use the tf.reduce_mean() function to compute the mean of the absolute differences between the predicted and true values across all samples.

Implementation in PyTorch

import torch

def l1_loss(y_pred, y_true):
    loss = torch.mean(torch.abs(y_pred - y_true))
    return loss

In PyTorch, you can use the torch.mean() function to compute the mean of the absolute differences between the predicted and true values across all samples.
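For reference, PyTorch also ships a built-in torch.nn.L1Loss that computes the same quantity; a quick check with illustrative values:

# the built-in torch.nn.L1Loss gives the same result as the manual version
y_pred = torch.tensor([2.5, 0.0, 2.0])
y_true = torch.tensor([3.0, -0.5, 2.0])

mae = torch.nn.L1Loss()
print(mae(y_pred, y_true).item(), l1_loss(y_pred, y_true).item())  # both ~0.333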

Huber Loss

Huber loss is a loss function used in regression tasks that is less sensitive to outliers than Mean Squared Error (MSE) loss. It combines the behavior of the MSE and Mean Absolute Error (MAE) losses: it is quadratic (like MSE) for small errors and linear (like MAE) for larger errors. This makes Huber loss more robust to outliers than MSE loss.

The Huber loss function is defined as follows:

L(y_pred, y_true) = 1/n * sum(0.5 * (y_pred - y_true)^2)                  if |y_pred - y_true| <= delta
L(y_pred, y_true) = 1/n * sum(delta * |y_pred - y_true| - 0.5 * delta^2)  otherwise

where n is the number of samples, y_pred is the predicted value, y_true is the true value, and delta is a hyperparameter that determines the threshold for switching between the MSE and MAE loss.

When |y_pred - y_true| <= delta, the loss function is the MSE loss. When |y_pred - y_true| > delta, the loss function is the MAE loss with a slope of delta.

In practice, delta is usually set to a value that balances the MSE and MAE loss, such as 1.0.

Implementation in Numpy

import numpy as np

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = np.abs(error)
    quadratic = np.minimum(abs_error, delta)
    linear = (abs_error - quadratic)
    return np.mean(0.5 * quadratic ** 2 + delta * linear)

This function takes the predicted values y_pred, true values y_true, and the delta hyperparameter as inputs, and returns the Huber loss.

The function first calculates the absolute error between the predicted and true values, and then splits the error into two components based on the delta hyperparameter. The quadratic component is the MSE loss when abs_error <= delta, and the linear component is the MAE loss when abs_error > delta. Finally, the function returns the mean Huber loss over all samples.

You can use this function in your numpy-based regression tasks by calling it with your predicted and true values, and the desired delta value.
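For example, with illustrative values and the default delta of 1.0:

# example usage with illustrative values
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 6.0])

print(huber_loss(y_pred, y_true, delta=1.0))  # 0.875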

Implementation in TensorFlow

import tensorflow as tf

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = tf.abs(error)
    quadratic = tf.minimum(abs_error, delta)
    linear = (abs_error - quadratic)
    return tf.reduce_mean(0.5 * quadratic ** 2 + delta * linear)

This function takes the predicted values y_pred, true values y_true, and the delta hyperparameter as inputs, and returns the Huber loss.

The function first calculates the absolute error between the predicted and true values using the tf.abs function, and then splits the error into two components based on the delta hyperparameter using the tf.minimum and - operators. The quadratic component is the MSE loss when abs_error <= delta, and the linear component is the MAE loss when abs_error > delta. Finally, the function returns the mean Huber loss over all samples using the tf.reduce_mean function.

You can use this function in your TensorFlow-based regression tasks by calling it with your predicted and true values, and the desired delta value.

Implementation in PyTorch

import torch

def huber_loss(y_pred, y_true, delta=1.0):
    error = y_pred - y_true
    abs_error = torch.abs(error)
    # torch.clamp gives the element-wise minimum of abs_error and delta
    quadratic = torch.clamp(abs_error, max=delta)
    linear = (abs_error - quadratic)
    return torch.mean(0.5 * quadratic ** 2 + delta * linear)

This function takes the predicted values y_pred, true values y_true, and the delta hyperparameter as inputs, and returns the Huber loss.

The function first calculates the absolute error between the predicted and true values using the torch.abs function, and then splits the error into two components based on the delta hyperparameter using torch.clamp and the - operator. The quadratic component is the MSE loss when abs_error <= delta, and the linear component is the MAE loss when abs_error > delta. Finally, the function returns the mean Huber loss over all samples.

You can use this function in your PyTorch-based regression tasks by calling it with your predicted and true values, and the desired delta value.
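Recent PyTorch versions also provide a built-in torch.nn.HuberLoss that can be used instead of the manual implementation; a quick sketch with illustrative values:

import torch

# built-in Huber loss (available in recent PyTorch versions)
y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 2.0, 6.0])

huber = torch.nn.HuberLoss(delta=1.0)
print(huber(y_pred, y_true).item())  # 0.875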

Arjun Sarkar

Ph.D. student — Deep Learning on Biomedical Images at the Leibniz Institute-HKI, Germany. LinkedIn-https://www.linkedin.com/in/arjun-sarkar-9a051777/