The Convolutional Neural Network (CNN) is one of the most representative deep neural networks in deep learning. Inspired by the natural visual perception mechanism of living creatures, it has achieved remarkable results. The main benefits of CNNs are local connections, weight sharing, and dimensionality reduction through down-sampling, and CNN models are widely employed in both research and industry projects. As an Artificial Neural Network (ANN) design, the CNN combines artificial neural networks with up-to-date deep learning schemes for better feature extraction from large datasets. The core characteristic of a CNN is that it identifies the relevant features automatically, without human supervision. Because it comprises multiple building blocks, the CNN is regarded as more powerful than many other neural network architectures.
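The three benefits named above (local connections, weight sharing, and down-sampling) can be illustrated with a minimal pure-Python sketch. The function names, the 5x5 ramp image, and the 2x2 kernel are assumptions chosen for illustration and do not come from any particular library or model.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (no padding, stride 1).

    Local connections: each output value depends only on a small patch.
    Weight sharing: the same kernel weights are reused at every position.
    As in most deep learning frameworks, the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


# Hypothetical 5x5 input whose pixel value is 5 * row + column.
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
# Hypothetical 2x2 kernel: top-left pixel minus bottom-right pixel.
kernel = [[1.0, 0.0], [0.0, -1.0]]

features = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool2d(features)            # 2x2 map after down-sampling
shared_weights = len(kernel) * len(kernel[0])

print(len(features), len(features[0]), len(pooled), len(pooled[0]))
print(shared_weights)
```

The whole 4x4 feature map is produced by only four weights, whereas a fully connected mapping from 25 inputs to 16 outputs would need 400.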
CNN Architectures: Various CNN architectures have been developed to improve CNN performance for different kinds of applications. At present, designing new and more effective block architectures is the main trend in CNN architecture research. Moreover, the most innovative developments in CNN architectures have been built on exploiting network depth.
AlexNet: AlexNet achieved groundbreaking results in image recognition and classification. It improves the learning ability of the CNN by increasing its depth and applying various parameter optimization strategies. AlexNet remains important in recent research on CNN design and applications.
ZefNet: The key aspect of ZefNet (also written ZFNet) is monitoring what the network learns during the training stage, and it has been applied efficiently in image processing applications. ZefNet is designed to visualize network performance quantitatively. Recent improvements in CNN performance have been attained by analyzing design weaknesses in this way, then rearranging the CNN topology and adjusting parameters accordingly.
Visual Geometry Group (VGG): VGG was developed to regulate network complexity and is used for image classification and localization problems. However, this architecture has a high computational cost.
GoogLeNet: The core aim of GoogLeNet is to achieve high accuracy at reduced computational cost by merging multi-scale convolutional transformations. However, it has shortcomings such as a heterogeneous topology and a representation bottleneck in the network.
Highway network: Highway networks are based on the concept of cross-layer connectivity, which helps to increase network depth and improve performance on more complex tasks.
ResNet: ResNet targets ultra-deep networks that avoid the vanishing gradient problem. These architectures have lower computational complexity, even with greater depth.
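Why a shortcut connection helps against vanishing gradients can be shown with a toy sketch. It assumes scalar "layers" of the form F(x) = w * x with a small weight w, which is a deliberate simplification and not the actual ResNet block; the function names are hypothetical.

```python
def plain_forward(x, weights):
    """Stack of plain layers: x <- w * x."""
    for w in weights:
        x = w * x
    return x


def residual_forward(x, weights):
    """Stack of residual layers: x <- x + w * x (identity shortcut plus learned branch)."""
    for w in weights:
        x = x + w * x
    return x


def input_gradient(forward, weights, x=1.0, eps=1e-6):
    """Central finite-difference estimate of d(output) / d(input)."""
    return (forward(x + eps, weights) - forward(x - eps, weights)) / (2 * eps)


weights = [0.1] * 20  # 20 toy layers, each with a small weight

plain_grad = input_gradient(plain_forward, weights)        # product of w, i.e. 0.1 ** 20
residual_grad = input_gradient(residual_forward, weights)  # product of (1 + w), i.e. 1.1 ** 20

print(plain_grad, residual_grad)
```

In the plain stack the gradient is a product of small factors and collapses toward zero, while the identity shortcut adds 1 to each factor, so the signal reaching the early layers stays usable even at this depth.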
DenseNet: DenseNet demonstrates the effect of cross-layer connectivity, in which each layer receives the feature maps of all preceding layers. These networks can distinguish between the information added by a layer and the information preserved from earlier layers, and they mitigate the vanishing gradient issue.
Inception: The Inception network aims to decrease computational cost without compromising the generalization of deeper networks. Later versions use residual connections to accelerate the training of the Inception network. Another enhanced version of the Inception network is ResNeXt.
WideResNet: WideResNet aims to solve the feature reuse problem, in which some transformations contribute very little to learning. Adding dropout within each residual block regularizes the network effectively.
Pyramidal Net: Pyramidal Net was developed to address the learning interference problem of ResNet by gradually increasing the width of the residual units, so that the increase covers as many locations as possible rather than only the places where down-sampling occurs. Recently, Pyramidal Nets have been applied for network widening.
Xception: The core idea of Xception is the depth-wise separable convolution, which achieves greater learning efficiency and better performance but does not minimize the number of parameters.
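The factorization behind a depth-wise separable convolution can be made concrete by counting the weights of a single layer. The sketch below is a per-layer count only, with biases ignored and hypothetical layer sizes; the parameter count of a whole network also depends on how many layers and channels it uses, which this sketch does not address.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
k, c_in, c_out = 3, 128, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)

print(std, sep)
```

The depth-wise step handles spatial filtering and the point-wise step handles channel mixing, so the two concerns are learned by separate, smaller sets of weights.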
CapsuleNet: CapsuleNet is widely applied to object detection. It detects a face together with feature properties such as size, orientation, and perspective, which a general CNN does not capture. Currently, CapsuleNet is considered well suited to segmentation and detection.
High-resolution network (HRNet): HRNet is widely used for position-sensitive vision tasks, namely semantic segmentation, object detection, and human pose estimation, because it yields representations that are more precise in the spatial domain and richer in the semantic domain. HRNet serves as a stronger backbone in many computer vision tasks.
Deep CNN: The Deep Convolutional Neural Network (Deep CNN) is a special type of neural network that has shown superior performance in computer vision and image processing applications. A deep CNN uses multiple feature extraction stages for automatic representation learning. Its representational capacity has been improved through architectural innovations based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention.
1D CNN: 1D CNNs are recent variants of the conventional (2D) CNN and offer significant advantages, such as the ability to handle scarce training data and low-cost, real-time implementation. Emerging applications of 1D CNNs include personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection in power electronics, and electrical motor fault detection.
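A 1D convolution slides a short kernel along a sequence instead of over an image. The following minimal sketch assumes a hand-picked difference kernel and a synthetic step signal, both hypothetical; in a trained 1D CNN the kernel weights would be learned from data.

```python
def conv1d_valid(signal, kernel):
    """Slide one kernel along a 1D sequence (no padding, stride 1, kernel not flipped)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Synthetic sensor reading with a sudden step between index 2 and index 3.
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
# Hand-picked difference kernel: responds where neighbouring samples differ.
kernel = [-1.0, 1.0]

response = conv1d_valid(signal, kernel)
step_position = response.index(max(response))  # window whose second sample is the jump

print(response, step_position)
```

The kernel has only two weights regardless of how long the signal is, which reflects the small parameter count and low computational cost that make 1D CNNs attractive for real-time use.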
Impressive Applications of CNN: Owing to their prevalence, CNNs have many applications and are exploited on a large scale around the world. CNNs tend to achieve state-of-the-art performance in numerous applications, including image classification, object tracking, pose estimation, text detection, visual saliency detection, action recognition, scene labeling, speech processing, and natural language processing.
Image Classification: CNNs brought a breakthrough in large-scale image classification. Implementing a hierarchy in the CNN is a common strategy for image classification with many classes. Currently, introducing visual attention into CNNs is an emerging approach for fine-grained image classification.
Object Detection: Compared with other models, CNNs localize objects in images or video frames accurately and efficiently. Region-based CNN (R-CNN), R-CNN with Selective Search (SS), the spatial pyramid pooling network (SPP-net), a pyramid-based version of R-CNN, Fast R-CNN, and the region proposal network are different forms of CNN used for accurate object detection, each improved in terms of computational cost. Single-pipeline detection has been developed more recently, and it can be optimized end-to-end directly on detection performance.
Pose Estimation: Most recent work focuses on learning multiple levels of representation and abstraction for the human-body pose estimation task using CNNs. DeepPose, multi-resolution CNN, and dual-source CNN are current CNN variants used for pose estimation.
Text Detection and Recognition: With the growing emphasis on high-level visual understanding in computer vision research, text recognition in scene images has become a major focus. Applying CNNs to scene text recognition captures the many appearance variations found in unconstrained environments through stronger and richer feature representations. Recent work includes text detection and localization without recognition, text recognition on cropped text images, and end-to-end text spotting.
Action Recognition: Recognizing actions from their visual appearance and motion dynamics is one of the challenging problems in computer vision. Recently, CNNs have been used effectively for action analysis in still images and video sequences. A pre-trained CNN is used to represent full images of actions and achieves a high level of performance in action analysis. 3D convolutional layers are applied to video sequences for action recognition with improved performance and effectiveness.
Speech Processing: For Automatic Speech Recognition (ASR), CNNs and deep CNNs provide impressive performance by exploiting local connectivity and capturing frequency shifts in human speech signals. Additionally, CNNs are effective for Statistical Parametric Speech Synthesis (SPSS), where a generative model represents the conditional distribution of the acoustic features given the linguistic features.
Natural Language Processing: In recent years, CNNs have been rapidly adopted for designing and executing natural language processing (NLP) tasks such as text categorization, semantic parsing, search query retrieval, sentence modeling and classification, and prediction. Statistical language modeling is one application of CNNs in NLP, where they are combined with recurrent highway networks for better modeling. For text classification, CNNs and deep CNNs are applied to obtain strong performance. In both of these NLP tasks, CNNs still need further improvement to achieve state-of-the-art results.
Research Challenges and Future Impacts on CNN:
• Although CNNs have attained resounding success in many research studies, several difficulties still require further examination.
• As CNNs become deeper and deeper, they require ever larger datasets and more computing power. Exploring unsupervised learning in CNN architectures would help to overcome the problem of collecting labeled datasets.
• It is necessary to investigate developing CNN models with reduced complexity and fast execution without loss of accuracy.
• Another challenge is that selecting suitable hyperparameters demands considerable skill and experience. Current optimization techniques for training deep CNN architectures also need to be improved.
• Ensemble learning is a prospective research area for CNNs. By extracting different levels of semantic representation, it can improve generalization and robustness across diverse categories of image processing applications.
• In image processing applications, the 1D CNN is encouraged for sequential data owing to its better feature extraction ability and efficient computation with a limited number of parameters.
• Research on CNNs with attention mechanisms may be pursued in future to preserve the spatial relevance of objects and their distinguishing features at later stages of learning.
• Using cloud-based platforms to develop computationally intensive CNN applications is another direction for future research.