Research Area:  Machine Learning
Domain adaptation assumes that samples from both the source and target domains are freely accessible during training. In practice, this assumption rarely holds and can raise data privacy issues, especially when source-domain labels are sensitive attributes that act as identifiers. To avoid accessing source data that could contain sensitive information, we introduce source-data-free domain adaptation (SFDA). Our key idea is to leverage a model pretrained on the source domain and to progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy, as measured by the pretrained source model, are more likely to be classified correctly. Based on this observation, we select reliable samples using a self-entropy criterion and define them as class prototypes. We then assign a pseudolabel to every target sample according to its similarity score with the class prototypes. We further propose point-to-set distance-based filtering, which requires no tunable hyperparameters, to reduce the uncertainty introduced by the pseudolabeling process. Finally, we train the target model on the filtered pseudolabels with regularization from the pretrained source model. Surprisingly, without any direct use of labeled source samples, our SFDA outperforms conventional domain adaptation methods on benchmark datasets.
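The prototype-based pseudolabeling step described in the abstract can be sketched roughly as follows. This is a minimal illustration of the general idea, not the authors' implementation: it assumes softmax probabilities and feature vectors are already extracted by the pretrained source model, and the function names (`self_entropy`, `select_prototypes`, `pseudolabel`) and the single-prototype-per-class choice are hypothetical simplifications.

```python
import numpy as np

def self_entropy(probs):
    # Shannon entropy of each softmax prediction; lower entropy
    # indicates a more confident (and, per the paper's observation,
    # more likely correct) prediction.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_prototypes(features, probs, per_class=1):
    # Pick the lowest-entropy sample(s) of each predicted class and
    # average their features to form a class prototype.
    ent = self_entropy(probs)
    preds = probs.argmax(axis=1)
    protos = {}
    for c in np.unique(preds):
        idx = np.where(preds == c)[0]
        best = idx[np.argsort(ent[idx])[:per_class]]
        protos[int(c)] = features[best].mean(axis=0)
    return protos

def pseudolabel(features, protos):
    # Assign each target sample the class of its most similar
    # prototype (cosine similarity).
    classes = sorted(protos)
    P = np.stack([protos[c] for c in classes])
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = F @ P.T
    return np.array(classes)[sims.argmax(axis=1)]
```

In the full method, these pseudolabels would additionally pass through the point-to-set distance filter before being used to train the target model; that filter is omitted here since the abstract does not specify its form.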
Keywords:  
Adaptation models
Training data
Reliability
Data models
Data privacy
Entropy
Prototypes
Author(s) Name:  Youngeun Kim; Donghyeon Cho; Kyeongtak Han
Journal name:  IEEE Transactions on Artificial Intelligence
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TAI.2021.3110179
Volume Information:  Volume 2
Paper Link:  https://ieeexplore.ieee.org/abstract/document/9528982