Research Area:  Machine Learning
Progress in computer vision depends heavily on large volumes of labelled data, and replicating these successes in real-world tasks with little labelled data remains a challenge. Fortunately, few-shot learning methods have shown considerable promise when labelled data are scarce. In this paper, we propose a light transformer-based few-shot classification network under the framework of prototypical nets (PN), which has two distinctive hallmarks. First, we combine local and global features to form the sample embedding, where the local features are produced by a CNN encoder and the global features are obtained simultaneously by a light transformer-based global-information module with a saliency detection structure (LT-GSE). Second, for each task, we use the class approximate degree as prior knowledge to exchange information among query samples at the category level, which yields a more reasonable distribution in the low-dimensional embedding space. The experimental results show that the proposed model achieves 82.28% and 86.56% accuracy on the 5-way 5-shot classification tasks of miniImageNet and tieredImageNet respectively, the best performance among all compared models. Moreover, few-shot experiments on the Stanford Dogs and CUB-200 datasets also verify the superiority and robustness of the proposed model.
Keywords:  
Author(s) Name:  Hegui Zhu, Rong Zhao, Zhan Gao, Qingsong Tang, Wuming Jiang
Journal name:  Applied Intelligence
Conference name:  
Publisher name:  Springer
DOI:  10.1007/s10489-022-03951-0
Volume Information:  Volume 53, pages 7970-7987 (2023)
Paper Link:   https://link.springer.com/article/10.1007/s10489-022-03951-0
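Note: The paper's full architecture (the LT-GSE module, saliency detection structure, and class-approximate-degree interaction) is not reproduced here. The sketch below only illustrates the prototypical-net backbone the abstract builds on: concatenating local and global feature vectors into a sample embedding, averaging support embeddings into class prototypes, and classifying queries by nearest prototype. All function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def classify_episode(support_local, support_global, support_labels,
                     query_local, query_global, n_way):
    """Prototypical-net style classification for one N-way episode.

    support_local / support_global: (n_support, d_l) / (n_support, d_g)
    query_local / query_global:     (n_query, d_l)   / (n_query, d_g)
    The local/global split mirrors the paper's CNN + LT-GSE embeddings,
    but here both are simply placeholder feature matrices.
    """
    # Sample embedding = concatenation of local and global features.
    support = np.concatenate([support_local, support_global], axis=1)
    query = np.concatenate([query_local, query_global], axis=1)

    # Class prototype = mean embedding of that class's support samples.
    prototypes = np.stack([support[support_labels == c].mean(axis=0)
                           for c in range(n_way)])

    # Squared Euclidean distance from each query to each prototype.
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)

    # Softmax over negative distances gives per-class probabilities.
    logits = -dists
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```

In a 5-way 5-shot episode such as those reported for miniImageNet, the support inputs would each hold 25 embeddings (5 classes × 5 shots); the learned encoders and the class-approximate-degree interaction described in the abstract would replace these placeholder inputs.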