Marmara University
TUBITAK 1001 Project (2022 – 2025)
Principal Investigator: Prof. Çiğdem Eroğlu Erdem
Researcher: Assoc. Prof. Ömer Korçak
Abstract: Federated Learning (FL) has recently emerged to address the problems arising from the need to centrally store and process data in centralized machine learning. FL is a learning method in which multiple clients train a machine learning model collaboratively, in a distributed manner, under the coordination of a central server. In FL, a model is trained locally with the private data residing on the participating clients, and only the parameters of the trained model are sent to the server for aggregation while the private data stay on the clients. Thus, FL offers important advantages over traditional centralized learning in terms of privacy and communication cost. In many real-world applications, clients have little or no labeled data, because data labeling can be costly or require expertise. This degrades the performance of FL methods, since the model is trained with only a small amount of labeled data, and it gives rise to a new setting known as semi-supervised federated learning (SSFL), in which both labeled and unlabeled data are used for training. Several SSFL studies address this label scarcity problem and try to enhance learning performance; however, most of them achieve limited model performance. In addition, some issues regarding the use of labeled and unlabeled data that may affect model performance have not been considered. For example, in these studies labeled data are introduced to the training process in a random order, regardless of their difficulty level. Moreover, unlabeled data with low classification confidence are labeled either by local models at the clients, which are likely to produce incorrect predictions, or by active learning, which requires human annotations. In this project, we propose a new framework to address the lack of labels in SSFL.
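The server-side aggregation step described above is commonly realized as FedAvg-style weighted averaging of client parameters. The sketch below is a minimal illustration of that generic step, not the aggregation method proposed in this project; the flat parameter vectors and the data-size weighting are simplifying assumptions.

```python
def fedavg(client_weights, client_sizes):
    """Aggregate client model parameters by weighted averaging (FedAvg).

    client_weights: one flat parameter vector (list of floats) per client.
    client_sizes:   number of local training samples per client; clients
                    with more data contribute more to the global model.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum((n / total) * w[i] for w, n in zip(client_weights, client_sizes))
        for i in range(dim)
    ]

# Example: two clients, a two-parameter "model"
c1 = [1.0, 2.0]
c2 = [3.0, 4.0]
avg = fedavg([c1, c2], client_sizes=[1, 3])
# 0.25 * c1 + 0.75 * c2 -> [2.5, 3.5]
```

Only these averaged parameters cross the network; the clients' raw training data never leave the devices, which is the privacy advantage the abstract refers to.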
In our framework, unlike existing SSFL studies, we will combine the concepts of curriculum learning and self-paced learning in an SSFL setting. To the best of our knowledge, this will be the first study to use curriculum learning in SSFL. We will also present a new labeling method for low-confidence unlabeled data, as well as a new model aggregation method.
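To make the curriculum idea concrete, the sketch below shows one generic way confidence-based pseudo-labeling and easy-to-hard ordering can be combined: confident predictions become pseudo-labels, and samples are presented easiest (most confident) first. This is an illustrative assumption on our part, not the labeling or curriculum method proposed in the project; the 0.9 threshold and the max-probability confidence score are placeholders.

```python
def pseudo_label_curriculum(probs, threshold=0.9):
    """Select confident unlabeled samples and order them easy-first.

    probs: one per-class probability list per unlabeled sample
           (e.g. softmax outputs of the current global model).
    Returns (sample_index, pseudo_label) pairs, most confident first,
    keeping only samples whose confidence exceeds the threshold.
    """
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)            # model confidence for this sample
        label = p.index(conf)    # predicted class = pseudo-label
        if conf >= threshold:    # discard low-confidence predictions
            selected.append((conf, i, label))
    # Curriculum ordering: easiest (most confident) samples first.
    selected.sort(reverse=True)
    return [(i, label) for _, i, label in selected]

# Example: three unlabeled samples, two classes
probs = [[0.95, 0.05], [0.6, 0.4], [0.08, 0.92]]
pairs = pseudo_label_curriculum(probs)
# -> [(0, 0), (2, 1)]  (sample 1 falls below the 0.9 threshold)
```

A self-paced variant would relax the fixed threshold over training rounds, letting the model admit progressively harder samples as it improves.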