Federated Learning Introduction

Q: How does the FedAvg aggregation algorithm work?

FedAvg (Federated Averaging) works in rounds. Each round, the server sends the current global model to selected clients. Each client trains the model on its local data for one or more epochs, producing updated weights. The server then computes a weighted average of the client models, where the weight for each client is proportional to the size of its local dataset. This averaged model becomes the new global model for the next round. FedAvg is communication-efficient because each client performs many local training steps per round, so it needs far fewer communication rounds than sending a gradient update after every batch.

Q: What is a privacy budget in federated learning?

A privacy budget, measured in epsilon (ε), quantifies the maximum privacy loss allowed during training. It comes from differential privacy theory. A smaller epsilon means stronger privacy guarantees but typically reduces model accuracy. Each training round consumes some privacy budget because model updates can leak information about the underlying data. Once the budget is exhausted, no more training can occur without exceeding the privacy guarantee. Common epsilon values range from 1 (strong privacy) to 10 (moderate privacy).

Q: What are the main security risks in federated learning?

The main security risks include model poisoning (malicious clients sending manipulated updates to corrupt the global model), gradient inversion attacks (reconstructing client training data from shared gradient updates), free-rider attacks (clients sending random updates without training to benefit from the global model), and inference attacks on the aggregated model. Defenses include secure aggregation (encrypting individual updates so the server only sees the aggregate), robust aggregation methods that limit the influence of outlier updates, differential privacy (clipping each update and adding calibrated noise), and Byzantine-fault-tolerant protocols.

Understanding Federated Learning

Federated learning is a paradigm shift in how machine learning models are trained. In traditional centralized training, all data is collected in a single location, and the model trains on the entire dataset at once. This approach works well when data can be freely moved and centralized, but it breaks down when data is sensitive, regulated, or simply too large to transfer. Federated learning solves this by bringing the model to the data instead of bringing the data to the model. Each participating client keeps its data locally and only shares model updates, never raw data.

The concept was pioneered by Google in 2016 for training the next-word prediction model on Android keyboards. Millions of phones each trained a local model on the user's typing patterns, sent encrypted gradient updates to Google's servers, and received an improved global model in return. No individual typing data ever left any phone. Since then, federated learning has expanded to healthcare (training diagnostic models across hospitals without sharing patient records), finance (fraud detection across banks without sharing transaction data), and autonomous driving (sharing driving experience across vehicles without uploading sensor footage).

The FedAvg Algorithm

The most widely used federated learning algorithm is Federated Averaging (FedAvg), proposed by McMahan et al. in 2017. It works in communication rounds. At the start of each round, the aggregation server sends the current global model weights to a selected subset of clients. Each client initializes its local model with these global weights, then trains on its local data for one or more epochs using standard stochastic gradient descent. After local training, each client sends its updated model weights back to the server. The server computes a weighted average of all received updates, where each client's contribution is weighted by the number of local training samples. This averaged model becomes the new global model.
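The aggregation step described above can be written as w_new = sum over clients k of (n_k / n) * w_k, where n_k is client k's sample count and n is the total across participating clients. The following is a minimal sketch of that step only, assuming each client's model is flattened into a plain list of floats; the function name and the example numbers are illustrative and not taken from the simulator.

```python
from typing import List, Sequence


def fedavg_aggregate(client_weights: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Return the sample-size-weighted average of client weight vectors."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one weight vector per client size")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n_k in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        share = n_k / total  # this client's fraction of all training samples
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Three hypothetical clients holding 100, 300, and 600 local samples.
updates = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
sizes = [100, 300, 600]
new_global = fedavg_aggregate(updates, sizes)
print(new_global)  # approximately [2.5, 1.5]
```

The client with 600 samples pulls the average toward its own weights, which is the intended behavior: larger local datasets count for more.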

FedAvg is communication-efficient because clients perform multiple local training steps before communicating. This reduces the number of communication rounds needed for convergence compared to sending individual gradient updates after every batch. The trade-off is that more local steps can cause client models to drift apart, especially when data is non-IID (not independent and identically distributed across clients). The simulator above lets you observe this effect directly by comparing IID and non-IID data distributions.

Non-IID Data: The Central Challenge

In real-world federated learning, each client typically has a different data distribution. A hospital in a rural area sees different patient demographics than a hospital in an urban center. A smartphone user in one country has different typing patterns than a user in another country. This non-IID (not independent and identically distributed) data is the central challenge of federated learning. When clients train on very different data, their local updates push the model in different directions. Averaging these conflicting updates slows convergence and reduces final accuracy.

The simulator demonstrates this with three distribution modes. In IID mode, each client receives a random, representative sample of the full dataset — this is the ideal case where federated learning performs closest to centralized training. In non-IID mode, each client's data is skewed toward certain classes, simulating realistic data heterogeneity. In extreme non-IID mode, each client has data from only one or two classes, representing the worst case. Watch how the accuracy curves diverge as you increase the non-IID severity.
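One common way to produce partitions like these is label-skew sharding. The sketch below is an assumption about how such modes could be generated, not a description of the simulator's actual code: a hypothetical skew parameter sets the fraction of the data that is dealt out in label-sorted shards, and the rest is dealt out at random. A skew of 0.0 gives the IID case, 1.0 gives the extreme case, and values in between give moderate heterogeneity.

```python
import random
from typing import List


def partition(labels: List[int], num_clients: int, skew: float,
              seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients with adjustable label skew.

    skew is the fraction of samples assigned as contiguous label-sorted
    shards; the remainder is spread uniformly at random. Leftover skewed
    samples that do not divide evenly among clients are dropped.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    cut = int(skew * len(indices))
    skewed = sorted(indices[:cut], key=lambda i: labels[i])
    uniform = indices[cut:]
    shard = len(skewed) // num_clients
    clients = []
    for c in range(num_clients):
        own = skewed[c * shard:(c + 1) * shard]  # contiguous label shard
        own += uniform[c::num_clients]           # representative remainder
        clients.append(own)
    return clients


# Toy dataset: 1000 samples, 10 balanced classes, 5 clients.
labels = [i % 10 for i in range(1000)]
iid = partition(labels, 5, skew=0.0)
mixed = partition(labels, 5, skew=0.5)
extreme = partition(labels, 5, skew=1.0)
classes_per_client = [len({labels[i] for i in c}) for c in extreme]
print(classes_per_client)  # each client sees exactly two classes here
```

With balanced classes and a skew of 1.0, every client ends up with two classes, matching the "one or two classes" worst case described above.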

Privacy Budget and Differential Privacy

Federated learning protects data by keeping it local, but the model updates themselves can leak information. Gradient inversion attacks can reconstruct training images from the shared gradients with surprising fidelity. Differential privacy addresses this by clipping each update to a bounded norm and adding calibrated noise before it leaves the client. The amount of noise is controlled by the privacy budget epsilon (ε). A smaller epsilon means more noise and stronger privacy, but also lower model accuracy because the aggregation server receives noisier updates.
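The client-side step can be sketched as the Gaussian mechanism applied to one update: bound the update's L2 norm, then add noise whose standard deviation is proportional to that bound. The names clip_norm and noise_multiplier are illustrative assumptions, and the sketch deliberately does not convert the noise level into an epsilon value, which requires a separate privacy accountant.

```python
import math
import random
from typing import List


def clip_and_noise(update: List[float], clip_norm: float,
                   noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the update exceeds the allowed norm.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    # Noise standard deviation grows with the clipping bound (sensitivity).
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]


rng = random.Random(42)
raw_update = [3.0, 4.0]  # L2 norm is 5.0
clipped_only = clip_and_noise(raw_update, clip_norm=1.0,
                              noise_multiplier=0.0, rng=rng)
private_update = clip_and_noise(raw_update, clip_norm=1.0,
                                noise_multiplier=1.1, rng=rng)
print(clipped_only)    # approximately [0.6, 0.8]
print(private_update)  # the clipped values plus random noise
```

Clipping matters as much as the noise: without a bound on each update's norm, no finite amount of noise gives a meaningful guarantee.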

Each training round consumes a portion of the privacy budget. The privacy meter in the simulator tracks this consumption. When the budget is exhausted, no further training is possible without exceeding the privacy guarantee. This creates a fundamental trade-off: more training rounds improve accuracy but consume more privacy budget. In practice, teams must carefully balance the number of rounds, local epochs, noise level, and total budget to achieve acceptable accuracy within the privacy constraints. Rényi differential privacy (RDP) accounting, used in many modern implementations, gives tighter budget tracking than basic composition, which simply sums the per-round epsilons.
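The budget bookkeeping can be illustrated with the simplest accounting rule, basic composition, under which the epsilons of successive rounds add up. This is a sketch under that assumption only; it is not RDP accounting and is not how the simulator's privacy meter is known to work. The class name and the per-round cost of 0.5 are hypothetical.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic (additive) composition."""

    def __init__(self, total_epsilon: float) -> None:
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def can_spend(self, round_epsilon: float) -> bool:
        # Small tolerance guards against floating-point round-off.
        return self.spent + round_epsilon <= self.total_epsilon + 1e-12

    def spend(self, round_epsilon: float) -> None:
        if not self.can_spend(round_epsilon):
            raise RuntimeError("privacy budget exhausted")
        self.spent += round_epsilon


budget = PrivacyBudget(total_epsilon=8.0)
rounds_run = 0
while budget.can_spend(0.5):
    budget.spend(0.5)  # a real system would run one training round here
    rounds_run += 1
print(rounds_run, budget.spent)  # 16 8.0
```

Because basic composition is pessimistic, a tighter accountant would typically allow more rounds for the same total epsilon and noise level.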

Security Considerations

Federated learning introduces security challenges that do not arise in centralized training. Model poisoning attacks occur when malicious clients send carefully crafted updates designed to corrupt the global model. Unlike data poisoning, where bad examples are injected into the training data, model poisoning manipulates the update sent to the server directly. A Byzantine attacker can plant a backdoor that causes the global model to misclassify specific inputs while maintaining normal performance on others, making the attack difficult to detect.

Defenses include Byzantine-fault-tolerant aggregation methods (Krum, Trimmed Mean, coordinate-wise Median) that exclude or limit the influence of outlier updates, secure aggregation protocols that encrypt individual client updates so the server can only compute the aggregate without seeing individual contributions, and anomaly detection on the pattern of updates over time. The LockML Threat Model Generator includes federated learning-specific threats in its assessment.
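Two of the robust aggregation rules named above are short enough to sketch directly. The example below assumes flattened updates as lists of floats and applies the coordinate-wise median and a trimmed mean to a toy round where one of five clients sends a poisoned update; Krum is omitted, and the data are made up for illustration.

```python
from statistics import median
from typing import List, Sequence


def coordinate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Take the median of each coordinate across all client updates."""
    return [median(coord) for coord in zip(*updates)]


def trimmed_mean(updates: Sequence[Sequence[float]], trim: int) -> List[float]:
    """Drop the `trim` smallest and largest values per coordinate, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim removes every update")
    result = []
    for coord in zip(*updates):
        kept = sorted(coord)[trim:len(coord) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four honest clients near [1, 1] and one poisoned update.
round_updates = [
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0],
    [100.0, -100.0],  # malicious client
]
plain_mean = [sum(c) / len(c) for c in zip(*round_updates)]
robust_median = coordinate_median(round_updates)
robust_trimmed = trimmed_mean(round_updates, trim=1)
print(plain_mean)      # pulled far from [1, 1] by the attacker
print(robust_median)   # [1.0, 1.0]
print(robust_trimmed)  # close to [1, 1]
```

Both rules tolerate only a bounded fraction of malicious clients; if attackers form a majority in a coordinate, the median follows them.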

Beyond FedAvg: Advanced Algorithms

Several algorithms improve on FedAvg for specific scenarios. FedProx adds a proximal term that penalizes client models for diverging too far from the global model, improving stability with non-IID data. SCAFFOLD uses control variates to correct for client drift, achieving faster convergence. FedBN keeps batch normalization parameters local to each client, accommodating distribution differences without compromising the shared model. Personalized federated learning techniques train a shared base model while allowing each client to fine-tune top layers on local data, combining global knowledge with local specialization.
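FedProx's proximal term changes the client's local objective from F_k(w) to F_k(w) + (mu / 2) * ||w - w_global||^2, so each local gradient step gains an extra pull of mu * (w - w_global) back toward the global model. The sketch below shows that one step on a toy one-dimensional quadratic loss; the loss, learning rate, and mu values are assumptions chosen so the effect is easy to see.

```python
from typing import List


def fedprox_step(w: List[float], w_global: List[float], grad: List[float],
                 lr: float, mu: float) -> List[float]:
    """One local SGD step on the FedProx objective.

    The effective gradient is the local loss gradient plus
    mu * (w - w_global), the gradient of the proximal penalty.
    """
    return [wi - lr * (gi + mu * (wi - wg))
            for wi, wg, gi in zip(w, w_global, grad)]


def train_locally(w_global: List[float], target: float, mu: float,
                  steps: int = 200, lr: float = 0.1) -> List[float]:
    """Minimize the toy local loss 0.5 * (w - target)^2 with FedProx steps."""
    w = list(w_global)
    for _ in range(steps):
        grad = [wi - target for wi in w]  # gradient of the toy local loss
        w = fedprox_step(w, w_global, grad, lr, mu)
    return w


global_model = [0.0]
no_prox = train_locally(global_model, target=4.0, mu=0.0)    # plain local SGD
with_prox = train_locally(global_model, target=4.0, mu=1.0)  # FedProx
print(no_prox, with_prox)  # about [4.0] versus about [2.0]
```

With mu set to 0 the client converges to its own optimum at 4.0, while mu set to 1 holds it halfway between the global model and its local optimum, which is the drift-limiting effect described above.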

For production deployments, cross-silo federated learning (a small number of reliable organizations) differs significantly from cross-device federated learning (millions of unreliable mobile devices). Cross-silo settings can afford synchronous communication and full participation, while cross-device settings must handle partial participation, stragglers, and device dropouts. The simulator models the cross-silo scenario where all clients participate in every round. Frameworks like Flower, PySyft, and NVIDIA FLARE provide implementations that can serve as a starting point; check each framework's documentation for how well it supports your scenario.

Practical Implementation Guide

To implement federated learning in production, start by evaluating whether your use case truly requires it. If you can centralize data without regulatory or competitive barriers, centralized training will be simpler and will usually achieve higher accuracy. If data must stay distributed, begin with FedAvg on IID or near-IID partitions as a baseline. Measure the accuracy gap against centralized training on a held-out test subset. Then introduce non-IID handling with FedProx or SCAFFOLD if the gap exceeds your tolerance. Add differential privacy with a generous initial budget (ε = 8-10) and tighten it as you gain experience with the accuracy trade-off. Finally, implement secure aggregation if the threat model requires protection against an honest-but-curious server.

Frequently Asked Questions

What is federated learning and why does it matter?

Federated learning is a machine learning approach where multiple clients collaboratively train a shared model without sharing raw data. Each client trains locally and sends only model updates (gradients or weights) to an aggregation server. This preserves privacy, reduces data transfer, and enables training on sensitive datasets that cannot be centralized due to regulations like GDPR, HIPAA, or CCPA. It was pioneered by Google for keyboard prediction and has since expanded to healthcare, finance, and autonomous driving.

How does the FedAvg aggregation algorithm work?

FedAvg works in rounds. The server sends the global model to clients, each trains locally for one or more epochs, then sends updated weights back. The server computes a weighted average of all client updates (weighted by local dataset size), producing the new global model. It is communication-efficient because multiple local steps reduce the total rounds needed, though more local steps can cause client divergence on non-IID data.

What is a privacy budget in federated learning?

A privacy budget, measured in epsilon (ε), is a concept from differential privacy theory that quantifies the maximum allowable privacy loss during training. Smaller epsilon means stronger privacy but typically lower accuracy. Each training round consumes some budget because model updates can leak information about the underlying data. Common values range from 1 (strong privacy) to 10 (moderate). Once the budget is exhausted, no further training is possible without exceeding the privacy guarantee.

How does federated learning compare to centralized training in accuracy?

Federated learning often ends up a few percentage points below centralized training in accuracy (1-5% is a typical figure), primarily due to non-IID data across clients; the actual gap depends on how heterogeneous the clients are. However, it enables training on data that would be inaccessible otherwise, potentially resulting in a larger effective dataset and higher real-world accuracy. Techniques like FedProx, SCAFFOLD, and personalization layers can close the gap significantly.

What are the main security risks in federated learning?

Main risks include model poisoning (malicious clients sending manipulated updates), gradient inversion (reconstructing training data from shared updates), free-rider attacks (fake updates to benefit from the global model), and inference attacks on aggregated models. Defenses include secure aggregation, Byzantine-fault-tolerant protocols, differential privacy (update clipping plus calibrated noise), and anomaly detection on update patterns.

Michael Lip
