Understanding EfficientNet — The most powerful CNN architecture

Understanding one of the most accurate and efficient CNN architectures available today: EfficientNet

Arjun Sarkar
May 8, 2021 · 4 min read

Convolutional neural networks are typically developed at a fixed resource cost and are scaled up later to achieve better accuracy when more resources become available. For example, a ResNet-18 model can be scaled up to a ResNet-200 model by adding more layers to the original network. In most situations, this kind of scaling improves accuracy on benchmark datasets. However, conventional model scaling is largely arbitrary: some models are scaled in depth, some in width, and others simply take in larger input images to get better results. Such arbitrary scaling requires manual tuning and many person-hours, and often yields little or no improvement in performance. The authors of EfficientNet proposed a much more principled way of scaling up CNN models to obtain better accuracy and efficiency.

EfficientNet uses a technique called compound scaling to scale up models in a simple but effective manner. Instead of arbitrarily scaling up width, depth, or resolution, compound scaling uniformly scales all three dimensions using a fixed set of scaling coefficients. Using this scaling method together with AutoML, the authors of EfficientNet developed a family of models (EfficientNet-B0 to B7) of various sizes, which surpassed the state-of-the-art accuracy of most convolutional neural networks with much better efficiency.

Compound Model Scaling

To develop the compound scaling method, the authors systematically studied the impact that each individual scaling dimension has on a model's performance and efficiency. They found that although scaling a single dimension improves performance, the best results come from balancing the scaling of all three dimensions, width, depth, and image resolution, against the available resources. The compound scaling method is shown in figure 1.

Figure 1. Different scaling methods vs. Compound scaling (Source: image from the original paper)

The compound scaling method is based on the idea of balancing the width, depth, and resolution dimensions by scaling each of them with a constant ratio. The equations below show how this is expressed mathematically:
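```latex
% Compound scaling rule as given in the EfficientNet paper
\begin{aligned}
\text{depth:} \quad      & d = \alpha^{\phi} \\
\text{width:} \quad      & w = \beta^{\phi} \\
\text{resolution:} \quad & r = \gamma^{\phi} \\
\text{s.t.} \quad        & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
```

Here φ is a user-specified compound coefficient that controls how many extra resources are available, while α, β, and γ are constants, found by a small grid search on the baseline network, that decide how those resources are distributed across depth, width, and resolution. Because the FLOPS of a convolutional network scale roughly with d · w² · r², the constraint α · β² · γ² ≈ 2 means that total FLOPS grow by approximately 2^φ.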

The intuition is that if the input image is bigger, the network needs more layers to increase its receptive field and more channels to capture the finer-grained patterns in the larger image. Applied to existing CNN models such as MobileNet and ResNet, the compound scaling technique improved ImageNet accuracy by around 1.4% and 0.7%, respectively, compared to conventional scaling methods.
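As a rough illustration, the scaling arithmetic can be written out in a few lines of Python. This is only a sketch of the formula above, using the grid-search values α = 1.2, β = 1.1, γ = 1.15 that the paper reports for the B0 baseline; the real implementation also rounds the scaled depth and width to valid layer and channel counts.

```python
# Rough sketch of the compound scaling arithmetic (illustration only,
# not the official implementation). alpha, beta, gamma are the values
# the paper reports from a grid search on the EfficientNet-B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for a compound coefficient phi."""
    depth = ALPHA ** phi        # how many times deeper the network becomes
    width = BETA ** phi         # how many times wider (more channels per layer)
    resolution = GAMMA ** phi   # how much larger the input images become
    return depth, width, resolution


for phi in range(4):
    d, w, r = compound_scale(phi)
    # FLOPS grow roughly by (ALPHA * BETA**2 * GAMMA**2) ** phi, i.e. about 2 ** phi
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```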

Architecture

EfficientNet is built on a baseline network developed through neural architecture search using the AutoML MNAS framework. The search optimizes the network for maximum accuracy while penalizing architectures that are computationally heavy or have slow inference time. The resulting architecture uses mobile inverted bottleneck convolution (MBConv) blocks, similar to MobileNetV2, but is slightly larger due to an increased FLOPS budget. This baseline model, EfficientNet-B0, is then scaled up with compound scaling to obtain the family of EfficientNets.
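Pretrained versions of the EfficientNet family are available in common deep learning frameworks. As a minimal inference sketch (assuming TensorFlow 2.3 or later, where the Keras EfficientNet models were added; this is not code from the original paper), the B0 baseline can be loaded and used like this:

```python
# Minimal inference sketch with a pretrained EfficientNet-B0 (assumes TensorFlow >= 2.3).
import numpy as np
import tensorflow as tf

model = tf.keras.applications.EfficientNetB0(weights="imagenet")

# EfficientNet-B0 expects 224x224 RGB inputs; the Keras implementation includes
# pixel normalization inside the model, so raw values in [0, 255] are fine.
# A random dummy image is used here purely for illustration.
image = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype("float32")

preds = model.predict(image)
print(tf.keras.applications.efficientnet.decode_predictions(preds, top=5))
```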

Performance

Figure 2 shows the performance of EfficientNet compared to other network architectures. The largest model, EfficientNet-B7, obtained state-of-the-art performance on the ImageNet and CIFAR-100 datasets: around 84.4% top-1 and 97.1% top-5 accuracy on ImageNet, while being 8.4 times smaller and 6.1 times faster at inference than the previous best CNN model. It also obtained 91.7% accuracy on the CIFAR-100 dataset and 98.8% accuracy on the Flowers dataset.

Figure 2. EfficientNet size and performance on the ImageNet dataset (Source: image from the original paper)

As seen in figure 3, the compound-scaled model also produces better class activation maps (CAM), which focus on the relevant regions of the image with more object detail, paving the way towards better model explainability.

Figure 3. The compound scaling method allows the scaled model (last column) to focus on more relevant regions with more object details (Source: image from the original paper)
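Heatmaps like the ones in figure 3 can be approximated with a few lines of Grad-CAM-style code. The sketch below is not the exact CAM procedure used in the paper; it assumes the tf.keras EfficientNet-B0 implementation, whose final convolutional activation layer is (as an assumption) named "top_activation".

```python
# Rough Grad-CAM sketch for visualising where EfficientNet-B0 looks
# (not the exact CAM procedure from the paper; assumes TensorFlow >= 2.3).
import numpy as np
import tensorflow as tf

model = tf.keras.applications.EfficientNetB0(weights="imagenet")
last_conv = model.get_layer("top_activation")  # assumed name of the final conv activation layer
grad_model = tf.keras.Model(model.inputs, [last_conv.output, model.output])

image = np.random.randint(0, 256, size=(1, 224, 224, 3)).astype("float32")  # dummy input
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(image)
    top_class = tf.argmax(preds[0])
    score = preds[:, top_class]

grads = tape.gradient(score, conv_out)               # gradient of the class score w.r.t. the feature map
weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # average-pool gradients into per-channel weights
cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted sum over channels
cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)  # keep positive evidence, normalise to [0, 1]
print(cam.shape)  # a 7x7 heatmap; upsample it to overlay on the 224x224 input
```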

Conclusion

As seen in Table 1, EfficientNet outperforms previous CNN architectures on most benchmark datasets. The compound scaling method can also be used to scale up other CNN architectures efficiently. It allows EfficientNet models to reach state-of-the-art accuracy on ImageNet and commonly used transfer learning datasets with an order of magnitude fewer parameters and FLOPS.

Table 1. EfficientNet Performance Results on Transfer Learning Datasets. EfficientNet models achieve new state-of-the-art accuracy for 5 out of 8 datasets, with 9.6x fewer parameters on average (Source: Table from the original paper)
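In practice, a common way to reuse EfficientNet on such transfer learning datasets is to keep the pretrained backbone as a frozen feature extractor and train only a new classification head. The sketch below assumes TensorFlow 2.3+ and a placeholder class count; it is an illustration, not code from the paper.

```python
# Transfer-learning sketch: EfficientNet-B0 as a frozen feature extractor
# (assumes TensorFlow >= 2.3; dataset and class count are placeholders).
import tensorflow as tf

NUM_CLASSES = 100  # e.g. CIFAR-100

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)                     # run backbone in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)      # pool features to a single vector
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # supply your own tf.data pipelines
```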

References

EfficientNet paper — Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning (ICML 2019).

Link — https://arxiv.org/abs/1905.11946


Arjun Sarkar

Ph.D. student in Deep Learning on Biomedical Images at the Leibniz Institute-HKI, Germany. LinkedIn: https://www.linkedin.com/in/arjun-sarkar-9a051777/