Over the years, the Inception architecture has become a major success in computer vision. It was introduced in 2014 in the paper 'Going Deeper with Convolutions' [4], which was GoogLeNet's submission to the widely acknowledged ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The module outperformed the ILSVRC winner from two years earlier while using 12 times fewer parameters. The Inception architecture used in ILSVRC 2014 had the following structure, as described by Szegedy et al.:
• An average pooling layer with 5×5 filter size and stride 3.
• A 1×1 convolutional layer with 128 filters for dimension reduction and rectified linear activation.
• A fully connected layer with 1024 units and rectified linear activation.
• A dropout layer with a 70% ratio of dropped outputs [6].
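As a rough sketch (not the authors' code), the layer sizes listed above can be traced through a short calculation. The 14×14×512 input size is an assumption based on where GoogLeNet attaches such a classifier head; adjust it for other attachment points.

```python
# Trace the shapes and parameter counts of the classifier head
# described above, assuming a 14x14x512 input feature map.

def pool_out(size, kernel=5, stride=3):
    # Output spatial size of a pooling layer with no padding.
    return (size - kernel) // stride + 1

h = w = pool_out(14)             # 5x5 average pool, stride 3 -> 4x4
c = 128                          # 1x1 conv reduces 512 channels to 128
conv_params = 512 * 128 + 128    # weights + biases of the 1x1 conv
fc_in = h * w * c                # flattened features fed to the FC layer
fc_params = fc_in * 1024 + 1024  # fully connected layer, 1024 units
print(h, w, fc_in, conv_params, fc_params)
```

Under this assumed input size, the fully connected layer dominates the head's parameter count, which is why the 1×1 dimension-reduction layer matters.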
The Inception V3 architecture was later published by Szegedy et al. in 'Rethinking the Inception Architecture for Computer Vision' (2015) [5], which is primarily an update to their previous work [4] intended to further improve accuracy.
In this improved Inception architecture, the final layer of the network is retrained with a softmax classifier.
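For reference, the softmax function used by that final classification layer can be sketched in a few lines of plain Python. This is a generic illustration of the function itself, not the retraining procedure.

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Each output is a probability; the outputs sum to 1.
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest logit always receives the largest probability, which is what makes the output directly usable as class confidences.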
The Inception architecture of GoogLeNet was also designed to perform well even under strict constraints on memory and computational budget. For example, GoogLeNet employed around 7 million parameters, roughly a 9-fold reduction with respect to its predecessor AlexNet, which used 60 million parameters. Furthermore, VGGNet employed about 3 times more parameters than AlexNet. The computational cost of Inception is also much lower than that of VGGNet or its higher-performing successors. This has made it feasible to use Inception networks in big-data scenarios, where huge amounts of data need to be processed at reasonable cost [5, 6].
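The parameter-count comparison above can be checked with back-of-the-envelope arithmetic; the counts below are the approximate figures quoted in the text, not exact model statistics.

```python
# Approximate parameter counts quoted in the text.
googlenet = 7_000_000    # GoogLeNet (Inception), ~7 million parameters
alexnet = 60_000_000     # AlexNet, ~60 million parameters
vggnet = 3 * alexnet     # "about 3 times more parameters than AlexNet"

reduction = alexnet / googlenet   # roughly the 9x reduction cited above
print(round(reduction, 1), vggnet)
```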
All in all, Inception can be a better alternative to comparable networks when, as in our case, processing capability and storage are limited. Because of the advantages mentioned above, we integrated Inception V3 into our network training, which later proved to be a good decision.