With growing concern for data privacy and security, it is often infeasible to gather data from all users in one place to conduct machine learning tasks.
Privacy-preserving AI, widely viewed as the next generation of Artificial Intelligence (AI), builds intelligent systems without exposing private data, aided by an emerging technology called Federated Learning (FL).
Federated learning has attracted considerable attention from researchers exploring its applicability to a wide range of real-world applications and use cases.
Federated learning is a machine learning approach in which statistical models are trained at the edge of dispersed networks.
FL is also known as decentralized learning: machine learning algorithms are trained on many edge devices in a distributed manner without transferring the training data.
In other words, federated learning cooperatively trains a model under the coordination of a central server while keeping the training data decentralized.
Machine learning algorithms, such as deep neural networks, are trained on several local datasets held at local edge nodes. Federated learning alleviates many systemic privacy risks and costs by training a shared model on the server without access to raw data, aggregating only locally computed updates.
The core drivers behind federated learning are big data, the trend toward edge computing, and the demand for privacy-preserving deep learning.
Federated learning helps address several crucial hurdles, including data privacy, data security, data access rights, and access to diverse data.
Significant advantages of federated learning over traditional, centralized machine learning include data diversity, continual real-time learning, and hardware efficiency.
Current development of FL concentrates on overcoming statistical constraints, enhancing security, and creating more personalized federated learning models.
On-device federated learning involves distributed mobile user interactions and must optimize for factors such as communication cost at massive scale, unbalanced data distribution, and device reliability.
The main challenges in federated networks are efficient communication, management of heterogeneous systems, statistical heterogeneity of data, and effective privacy-preserving techniques.
Federated learning facilitates the cooperative training of machine learning and deep learning models to optimize mobile edge networks. While maintaining privacy, it enables more robust models with lower latency and lower energy usage.
Federated learning supports privacy-sensitive applications that have reshaped AI models in areas such as autonomous vehicles, traffic prediction and monitoring, telecommunication, IoT, cyber-security, pharmaceutics, industrial management, industrial IoT, and healthcare and medical AI. Federated learning remains a comparatively new field with many open research possibilities for achieving privacy-preserving AI.
Federated learning can be categorized from five perspectives: data distribution, privacy mechanisms, applicable machine learning models, techniques for solving heterogeneity, and the scale of the federation (cross-silo versus cross-device).
Data distribution:
Categories based on data distribution patterns are horizontal federated learning, vertical federated learning, and federated transfer learning.
Horizontal Federated Learning (HFL):
HFL is also referred to as sample-based federated learning, as it is applied to datasets that share the same feature space but differ in their samples. In other words, the dataset on each HFL device has the same feature space but different sample instances. Because all participants use the same features, HFL is also called homogeneous federated learning. The benefit of HFL is an increased user sample size. The first practical case of horizontal federated learning was the Android model update for the Google keyboard, developed by Google, in which HFL employs machine learning models such as logistic regression. Homomorphic encryption, differential privacy, and secure aggregation are HFL techniques used to prevent privacy leakage during processing and communication and to secure the gradients exchanged in horizontal federated learning.
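To make the data layout concrete, the toy Python sketch below splits a single table by rows across a few simulated clients, so every client holds the same feature columns but different samples. The client count and feature names are illustrative assumptions, not part of any specific HFL system.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy dataset: 12 samples, 3 shared feature columns (illustrative only).
features = ["typing_speed", "word_length", "error_rate"]
X = rng.normal(size=(12, len(features)))

# Horizontal partitioning: every client keeps ALL feature columns
# but only its own (disjoint) subset of rows/samples.
num_clients = 3
client_rows = np.array_split(np.arange(len(X)), num_clients)
client_data = {f"client_{i}": X[rows] for i, rows in enumerate(client_rows)}

for name, data in client_data.items():
    # Same feature space on every device; sample counts may differ.
    print(name, "features:", features, "num_samples:", data.shape[0])
```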
Vertical Federated Learning (VFL):
VFL is also referred to as feature-based federated learning, as its datasets share the same sample space but differ in feature space. Vertical federated learning is also considered heterogeneous federated learning, owing to its varying feature sets. The advantage of VFL is an increased feature dimension. Several machine learning techniques have been adapted to vertically distributed data, including classification, statistical analysis, gradient descent, secure linear regression, and data mining. VFL has recently been applied with SecureBoost, various logistic regression models, decision trees, and neural network models.
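As a counterpart to the horizontal sketch above, the hypothetical example below splits data by columns: two simulated parties hold different features for an overlapping set of user IDs, and training would proceed only on the aligned intersection of samples. The party names, features, and ID values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two parties observe the SAME users but DIFFERENT feature columns.
bank_ids = np.array([101, 102, 103, 104, 105])
bank_features = {"income": rng.normal(size=5), "credit_score": rng.normal(size=5)}

retailer_ids = np.array([103, 104, 105, 106])
retailer_features = {"monthly_spend": rng.normal(size=4), "basket_size": rng.normal(size=4)}

# Vertical FL trains on the aligned intersection of sample IDs
# (in practice via private set intersection rather than plain set ops).
shared_ids = np.intersect1d(bank_ids, retailer_ids)
print("aligned sample IDs:", shared_ids)
print("bank contributes features:", list(bank_features))
print("retailer contributes features:", list(retailer_features))
```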
Federated Transfer Learning (FTL):
Federated transfer learning is analogous to traditional transfer learning: new features or tasks are learned on top of a pre-trained model. FTL can increase both the user sample size and the feature dimension. FTL improves statistical models across data federations by allowing knowledge to be shared without compromising user privacy and by letting complementary knowledge be transferred across the network. A notable application of federated transfer learning is FedHealth, which provides personalized medical assistance.
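The sketch below illustrates only the transfer-learning ingredient, under simplified assumptions: a stand-in "pre-trained" feature extractor (a fixed random projection here) is frozen, and a small task-specific logistic head is trained locally on a party's private data. In real FTL the extractor would come from a model pre-trained on a related party's data, and any exchange would be protected by the privacy mechanisms described later.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a pre-trained feature extractor (frozen during local training).
W_pretrained = rng.normal(size=(10, 4))

def extract_features(x):
    # Frozen representation transferred from the "source" party.
    return np.tanh(x @ W_pretrained)

# A party's small private dataset.
X_local = rng.normal(size=(40, 10))
y_local = (X_local[:, 0] + X_local[:, 1] > 0).astype(float)

# Only the new task-specific head is trained locally.
w_head = np.zeros(4)
b_head = 0.0
lr = 0.5

for _ in range(200):
    H = extract_features(X_local)
    p = 1.0 / (1.0 + np.exp(-(H @ w_head + b_head)))   # sigmoid
    grad_w = H.T @ (p - y_local) / len(y_local)
    grad_b = np.mean(p - y_local)
    w_head -= lr * grad_w
    b_head -= lr * grad_b

accuracy = np.mean((extract_features(X_local) @ w_head + b_head > 0) == y_local)
print(f"local training accuracy of transferred model: {accuracy:.2f}")
```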
Privacy mechanisms:
Privacy mechanisms in federated learning protect participants' privacy through techniques such as model aggregation, homomorphic encryption, and differential privacy.
Model aggregation:
In federated learning, model aggregation is a well-known privacy mechanism that trains the global model by combining the model parameters from all parties, which avoids transferring the real data during training. Federated learning of deep networks builds on this idea through iterative model averaging. Combining federated learning with multi-task learning is a classic aggregation approach that allows multiple users to train models for different tasks locally. Another aggregation method combines federated learning with blockchain protocols.
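The following sketch shows iterative model averaging in its simplest form, under toy assumptions: each simulated client refines a linear model on its own data for a few local gradient steps, and the server forms a weighted average of the client parameters by local sample count. It illustrates the averaging idea rather than reproducing any particular FL system.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy horizontally partitioned regression data.
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for n_samples in (30, 50, 20):                      # unbalanced client sizes
    X = rng.normal(size=(n_samples, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n_samples)
    clients.append((X, y))

def local_update(w_global, X, y, lr=0.05, local_steps=10):
    """A client refines the global weights on its private data."""
    w = w_global.copy()
    for _ in range(local_steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(3)
for round_idx in range(20):
    local_weights, local_sizes = [], []
    for X, y in clients:
        local_weights.append(local_update(w_global, X, y))
        local_sizes.append(len(y))
    # Server: average of parameters, weighted by local sample counts.
    sizes = np.array(local_sizes, dtype=float)
    w_global = np.average(np.stack(local_weights), axis=0, weights=sizes)

print("estimated weights:", np.round(w_global, 2), "true weights:", true_w)
```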
Homomorphic encryption:
Conventional encryption methods concentrate on data storage security, whereas homomorphic encryption was developed to address the problem of computing on encrypted data, that is, data processing security. Its essential feature is the ability to evaluate and process encrypted data without disclosing the underlying plaintext. Ridge regression has been combined with homomorphic encryption to meet privacy requirements, and additively homomorphic schemes are applied to prevent leakage of private information.
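As a rough illustration of additive homomorphism in an aggregation setting, the sketch below assumes the third-party python-paillier package (imported as phe): clients encrypt their scalar updates, the server adds the ciphertexts without being able to decrypt them, and only the key holder recovers the aggregate. The key size and the scalar updates are arbitrary toy choices.

```python
# Requires the third-party python-paillier package: pip install phe
from phe import paillier

# In practice the private key would be held by a trusted party or shared
# among clients, never by the aggregation server.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each client encrypts its locally computed (toy, scalar) model update.
client_updates = [0.25, -0.10, 0.40]
encrypted_updates = [public_key.encrypt(u) for u in client_updates]

# The server sums ciphertexts directly; it never sees individual updates.
encrypted_sum = encrypted_updates[0]
for ciphertext in encrypted_updates[1:]:
    encrypted_sum = encrypted_sum + ciphertext

# Only the key holder can decrypt the aggregated update.
aggregate = private_key.decrypt(encrypted_sum)
print("decrypted aggregate update:", aggregate)
print("average update:", aggregate / len(client_updates))
```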
Differential privacy:
Differential privacy is a privacy mechanism originally developed to tackle privacy disclosure in statistical databases. Differential privacy techniques protect user privacy by adding calibrated noise and can be combined with traditional machine learning and deep learning algorithms. Differential privacy protection is classified into global and local differential privacy.
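A common way to apply differential privacy in federated settings is to clip each client's update to a norm bound and add Gaussian noise before it leaves the device. The sketch below shows that recipe with arbitrary toy values for the clipping bound and noise multiplier; it omits the privacy accounting needed to state a concrete (epsilon, delta) guarantee.

```python
import numpy as np

rng = np.random.default_rng(4)

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1):
    """Clip an update to an L2 bound, then add Gaussian noise (toy values)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# A client's raw local model update (illustrative).
raw_update = np.array([0.8, -2.3, 1.1, 0.05])
noisy_update = privatize_update(raw_update)
print("raw update:  ", np.round(raw_update, 3))
print("noisy update:", np.round(noisy_update, 3))
```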
Applicable machine learning models:
• Federated learning employs various machine learning models to ensure both the privacy and the efficacy of the trained model.
• Linear models are applied to solve security problems via federated learning; examples include linear regression, ridge regression, and lasso regression. Linear models are simple, interpretable, and easy to train.
• Federated learning utilizes tree models for training single or multiple decision trees, including gradient boosting decision trees, random forests, and SecureBoost. Tree models are accurate, stable, and able to capture non-linear correlations.
• At present, neural network models are a popular choice for federated learning due to their capability to handle complex tasks. Deep neural network-based federated learning is employed for pattern recognition and intelligent control. Advantages of deep neural networks in federated learning applications include strong learning ability, high expressive power, and fault tolerance.
Techniques for solving heterogeneity:
• To resolve the problem of system heterogeneity, four types of approaches are used: asynchronous communication, device sampling, fault-tolerant mechanisms, and handling of model heterogeneity.
• Owing to the need for real-time communication, asynchronous communication is a natural first choice for overcoming system heterogeneity. In multi-device federated learning scenarios, asynchronous communication effectively tackles the issue of scattered devices.
• Device sampling selects, in a scalable way, the particular devices that participate in each round of federated training (a minimal sketch appears after these classification lists).
• In the distributed and often unstable network environment of federated learning, where many devices participate together, fault-tolerant mechanisms prevent the system from collapsing when individual devices fail.
• Model heterogeneity in federated learning networks is handled by three schemes: each device learns its own model, a single global model is trained to suit all devices, or related learning models are trained for related tasks.
Federation scale:
• Cross-silo federated learning is used where the participating clients are few in number, and the training data are partitioned in either a horizontal or a vertical FL pattern.
• Cross-device federated learning is used with a very large number of participating clients. Client selection and incentive design are two classes of techniques facilitated by cross-device FL.
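The sketch below, referenced in the device-sampling bullet above, combines per-round client sampling with a simple fault-tolerance rule: a random fraction of clients is sampled each round, any sampled client that "drops out" is skipped, and the server averages whatever updates arrive. The sampling fraction, dropout probability, and update contents are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

NUM_CLIENTS = 100
SAMPLE_FRACTION = 0.1     # fraction of devices invited per round (assumed)
DROPOUT_PROB = 0.2        # chance a sampled device fails to respond (assumed)
MODEL_DIM = 4

def client_update(client_id, w_global):
    """Stand-in for local training: a small client-specific perturbation."""
    client_rng = np.random.default_rng(client_id)
    return w_global + 0.01 * client_rng.normal(size=MODEL_DIM)

w_global = np.zeros(MODEL_DIM)
for round_idx in range(5):
    # Device sampling: only a scalable subset participates each round.
    sampled = rng.choice(NUM_CLIENTS, size=int(SAMPLE_FRACTION * NUM_CLIENTS),
                         replace=False)
    received = []
    for client_id in sampled:
        if rng.random() < DROPOUT_PROB:
            continue                  # fault tolerance: skip unresponsive device
        received.append(client_update(int(client_id), w_global))
    if received:                      # only aggregate if anyone responded
        w_global = np.mean(received, axis=0)
    print(f"round {round_idx}: sampled {len(sampled)}, received {len(received)}")
```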
Today, federated learning is increasingly regarded as one of the leading-edge technologies applicable across multiple scenarios.
Federated learning is expected to soon provide secure, distributed support for a wide range of applications, reinforcing the steady growth of Artificial Intelligence (AI). Some popular real-world research areas for federated learning are briefly listed below:
Healthcare Data Analytics:
In the medical industry, data privacy and security play a crucial role because healthcare data contains highly confidential, personally identifiable information. Federated learning shows great promise for healthcare data analytics, with a high potential impact on precision medicine, ultimately improving medical care and diagnosis. Federated learning handles fragmented, sensitive healthcare data by training a shared global model with a central server while keeping the data within local organizations. The most impactful scenarios in the healthcare industry are wearable healthcare devices, electronic health records, the Internet of Medical Things (IoMT), medical imaging, disease diagnosis, and collaborative drug discovery.
Computer Vision:
Federated learning has reached various interesting computer vision applications, driven by privacy concerns, high data-collection cost, and data sensitivity. Federated learning provides a personalized framework over decentralized datasets on edge devices, which benefits many computer vision tasks, including image classification, image segmentation, and object detection. FedCV and FedVision are recently developed platforms that support federated learning-powered computer vision applications.
Cyber Security:
Diverse real-time technologies are affected by cyber-attacks and threats, so cyber security is essential today. Federated learning supports cyber security by providing data security and privacy that help mitigate cyber threats. Federated learning offers varied opportunities in cyber security, such as intrusion detection, edge computing, anomaly detection, privacy preservation of users in social media applications, the industrial Internet of Things (IoT), and blockchain.
Edge Computing:
Edge computing is becoming increasingly ubiquitous, generating data from highly distributed edge devices. As a distributed computing platform, edge computing demands stronger security and privacy, lower communication cost, and higher reliability. Federated learning is well suited to edge computing and provides a cooperative framework for edge network optimization. Malware detection, anomaly detection, computation offloading, content caching, task scheduling, and resource allocation are common edge computing services that use federated learning. Smart healthcare and Unmanned Aerial Vehicles are recent applications of federated learning-based edge computing.
Internet of Things (IoT):
IoT powers a huge range of smart applications; however, a centralized machine learning framework handles its distributed, private data poorly. Federated learning offers a promising solution for IoT by enabling a decentralized framework that does not send end-users' private data to a central server. Areas of IoT that incorporate federated learning include fifth- and sixth-generation wireless networks, various smart applications, and several Internet of Everything (IoE) applications.
Smart Intrusion Detection Systems:
Intrusion detection with federated learning is effectively applied to healthcare and transport systems. Federated learning provides on-device learning and privacy preservation for smart intrusion detection systems via a decentralized learning paradigm that keeps user data local. Federated learning-enabled intrusion detection systems cover several practical scenarios, such as heterogeneous anomaly detection, detection of varied attacks, data privacy preservation, low-power IoT devices, and computer networks.
Internet of Vehicles:
The Internet of Vehicles employs federated learning to protect data privacy and reduce transmission overhead in wireless communications. Services involving Vehicular Service Providers (VSPs) and Smart Vehicles (SVs) can be implemented accurately and safely using federated learning. Driving safety, traffic efficiency, and in-vehicle entertainment are recent Internet of Vehicles scenarios that have effectively adopted federated learning.
Natural Language Processing:
User data privacy is essential in many Natural Language Processing (NLP) tasks, so federated learning supports NLP by providing a decentralized learning platform that does not expose users' private data. Sentence-level text intent classification, Google keyboard suggestions, and medical named entity recognition are current NLP tasks implemented using federated learning.
Robotics and Automation:
Robotics and automation are rapidly evolving technologies, driven by their speed in computing and communication. Because distributed devices must collaborate in real time, decentralized technology is needed to raise the intelligence level of robotic and autonomous systems. Federated learning paves the way for robotics and automation by enabling privacy-preserving deep learning at the distributed edge. Robot picking operations and automated industrial systems have recently adopted federated learning in this domain.
Smart City:
Federated learning plays a significant role in smart city technologies by preserving the privacy of data from multiple devices. With its cooperative framework, federated learning serves various smart city domains such as IoT, transportation systems, aviation systems, finance, medicine, and communications. In smart city sensing applications that deliver smart city services, federated learning is used to address user privacy protection, user incentives, data trustworthiness, and data quality maintenance.
Despite the remarkable outcomes of federated learning across applications worldwide, some critical challenges remain unsolved.
Privacy protection:
Federated learning methods are evolving to protect the privacy of the original data, but preventing private data leakage during model transmission remains unresolved.
Communication cost:
Communication efficiency is a major contribution of federated learning, yet it still needs improvement for large numbers of devices.
Systems heterogeneity:
Heterogeneous data, hardware, and networks must be addressed as the number of participating devices increases.
Unreliable model upload:
Model uploads over unreliable and unstable networks must be handled properly in federated learning systems to avoid unexpected errors.
Beyond the challenges discussed above, several future research directions in federated learning remain to be investigated; some of them are briefly listed below:
New models of asynchrony:
Novel device-centric communication schemes beyond synchronous and asynchronous training need to be explored for better distributed network optimization.
Heterogeneity diagnostics:
Most recent works develop federated learning methods for resolving statistical heterogeneity; techniques to diagnose both system and statistical heterogeneity still need to be developed.
Granular privacy constraints:
Although privacy is a main contribution of federated learning, more granular privacy constraints need to be devised for sample-specific or device-specific privacy protection.
Federated learning in production:
New methods are needed to tackle the practical issues of running federated learning in production, such as concept drift, diurnal variations, and cold-start problems.
Multi-center federated learning:
Multi-center federated learning, which tackles heterogeneity challenges in federated learning, is a promising direction for future work.
Dependable client selection:
Implementing reliable client selection to assure the dependability of federated learning is regarded as an important future research direction.