Track cumulative epsilon consumption across queries with sequential, parallel, and advanced (Renyi) composition. Visualize budget exhaustion in real time, set maximum budgets, and export audit logs for compliance documentation.
Set your maximum privacy budget and delta parameter. Add queries below to track epsilon consumption over time.
Differential privacy provides a mathematical framework for quantifying the privacy loss incurred when releasing statistics or machine learning models trained on sensitive data. The privacy budget, parameterized by epsilon, represents the maximum amount of information about any individual record that can leak through query responses. Every query against the data consumes a portion of this budget, and once the budget is exhausted, no further queries can be answered without breaking the privacy guarantee. This makes budget management the central operational challenge in any differential privacy deployment.
The privacy guarantee states that for any two datasets differing in exactly one record, the probability of any set of outputs changes by at most a factor of e^epsilon. An epsilon of 0 means the output distribution is identical regardless of whether any individual is in the dataset, providing perfect privacy but zero utility. As epsilon increases, less noise is needed and query results become more accurate, but more information about individuals can leak. The practical challenge is finding the smallest epsilon that still provides actionable analytics results.
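The bound can be checked numerically for one concrete mechanism. The sketch below assumes the Laplace mechanism applied to a count with sensitivity 1; the paragraph itself does not name a mechanism. Noise with scale sensitivity/epsilon is added, and the worst-case ratio of output densities for two neighbouring datasets is scanned over a grid of outputs. The helper names are hypothetical, and the grid scan illustrates the bound rather than proving it.

```python
import math


def laplace_pdf(x: float, mu: float, scale: float) -> float:
    """Density of the Laplace distribution centred at mu with the given scale."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)


def max_density_ratio(true_a: float, true_b: float, epsilon: float,
                      sensitivity: float = 1.0) -> float:
    """Worst-case ratio of output densities for two true answers, on a grid.

    The Laplace mechanism adds noise with scale sensitivity / epsilon, so when
    |true_a - true_b| <= sensitivity the ratio never exceeds e^epsilon.
    """
    scale = sensitivity / epsilon
    worst = 0.0
    for i in range(-2000, 2001):          # candidate outputs from -20.0 to 20.0
        x = i * 0.01
        ratio = laplace_pdf(x, true_a, scale) / laplace_pdf(x, true_b, scale)
        worst = max(worst, ratio)
    return worst


epsilon = 0.5
# Counts on two neighbouring datasets differ by at most 1 (sensitivity 1).
ratio = max_density_ratio(true_a=10.0, true_b=11.0, epsilon=epsilon)
print(f"worst-case ratio {ratio:.4f}, bound e^epsilon = {math.exp(epsilon):.4f}")
```

The worst case is attained for outputs on the far side of both true answers, where the ratio equals e^epsilon exactly. A smaller epsilon widens the noise and pulls the ratio closer to 1.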
Sequential composition is the simplest and most conservative privacy accounting method. When multiple queries access the same data, the total privacy loss is the sum of individual epsilons. If you run three queries with epsilons 0.5, 1.0, and 0.3, the total privacy cost is 1.8. This is a worst-case bound that assumes each query leaks information about the same individual in the worst possible direction. Sequential composition is the correct accounting method when queries access overlapping data and you want the strongest possible guarantee. However, it is often overly conservative because real queries rarely achieve the worst-case information leakage simultaneously. For a production system running hundreds or thousands of queries, sequential composition quickly exhausts even generous budgets.
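Sequential accounting is a plain sum. The following is a minimal sketch. `sequential_total` is a hypothetical helper, and `math.fsum` is used so that rounding error does not accumulate over thousands of small per-query epsilons.

```python
import math


def sequential_total(epsilons: list[float]) -> float:
    """Basic sequential composition: privacy losses on the same data add up."""
    # math.fsum returns a correctly rounded sum, avoiding drift over many queries.
    return math.fsum(epsilons)


queries = [0.5, 1.0, 0.3]
print(sequential_total(queries))  # 1.8
```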
Parallel composition provides dramatically better privacy accounting when queries operate on disjoint subsets of the data. If you partition your data into non-overlapping groups and each query only accesses one partition, the total privacy cost is the maximum individual epsilon, not the sum. Running 100 queries on 100 different partitions, each with epsilon 1.0, costs a total of only epsilon 1.0, compared to 100.0 under sequential composition. This hundred-fold improvement is the foundation of privacy-efficient analytics architectures. Systems like Google's RAPPOR and Apple's local differential privacy leverage parallel composition by having each user's data contribute to a separate computation. The key requirement is genuine disjointness: if any record appears in multiple partitions, you must fall back to sequential composition for those overlapping queries.
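The sketch below combines the two rules. It assumes each query is tagged with the identifier of the single disjoint partition it reads; the tagging scheme and `parallel_total` are hypothetical. Queries that share a partition add up sequentially, and the overall total is the largest per-partition sum. The function trusts the caller's partition labels, so it cannot detect a record that actually appears in two partitions.

```python
from collections import defaultdict


def parallel_total(queries: list[tuple[str, float]]) -> float:
    """Compose (partition_id, epsilon) pairs over disjoint partitions.

    Queries inside one partition compose sequentially (their epsilons add);
    disjoint partitions compose in parallel (take the maximum).
    """
    per_partition: dict[str, float] = defaultdict(float)
    for partition_id, epsilon in queries:
        per_partition[partition_id] += epsilon
    return max(per_partition.values(), default=0.0)


# 100 queries, each on its own partition, each with epsilon 1.0.
disjoint = [(f"region-{i}", 1.0) for i in range(100)]
print(parallel_total(disjoint))  # 1.0, versus 100.0 under sequential composition

# Two queries on the same partition add up inside that partition.
mixed = [("region-a", 0.5), ("region-a", 0.3), ("region-b", 0.6)]
print(parallel_total(mixed))
```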
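Advanced composition provides a middle ground between sequential and parallel composition, and accountants based on Renyi Differential Privacy (RDP) and zero-concentrated differential privacy (zCDP) tighten it further. The key insight is that the privacy loss of each query is a random quantity whose expected value is far smaller than its worst case. Across many queries the total loss therefore concentrates and grows roughly with the square root of the number of queries rather than linearly. RDP and zCDP exploit the same effect by measuring privacy loss with Renyi divergence instead of max divergence. For k adaptive queries each satisfying epsilon-DP, the advanced composition theorem bounds the total privacy cost by epsilon * sqrt(2k * ln(1/delta)) + k * epsilon * (exp(epsilon) - 1), and the combined release satisfies (total, delta)-DP. For small epsilon the second term is roughly k * epsilon^2, so the bound simplifies to about epsilon * sqrt(2k * ln(1/delta)) only when k * epsilon^2 is also small. The improvement over sequential composition grows with the number of queries. For 100 queries with epsilon 0.1 and delta = 1e-5, sequential composition gives 10.0, while advanced composition gives approximately 4.80 + 1.05 = 5.85, about a 1.7x improvement. For 10,000 queries with epsilon 0.01, the bound is about 5.8 versus 100.0, roughly 17x. For a handful of queries the advanced bound can exceed the plain sum, so an accountant should report whichever bound is smaller. Advanced composition introduces a delta parameter, loosely interpreted as the probability that the epsilon bound fails to hold. Delta should be cryptographically small. A typical choice is 1/n^2, where n is the dataset size; delta should always be well below 1/n, and at most 1e-5 for practical deployments.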
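The advanced composition bound can be computed directly. The sketch below assumes each query is pure epsilon-DP and uses `math.expm1` for exp(epsilon) - 1, which stays accurate for small epsilon. Sequential composition remains valid alongside it, so the sketch reports the smaller of the two bounds. `advanced_composition` is a hypothetical helper name.

```python
import math


def advanced_composition(epsilon: float, k: int, delta: float) -> float:
    """Total epsilon for k adaptive epsilon-DP queries (advanced composition).

    The composed release is (total, delta)-DP with
    total = epsilon * sqrt(2k ln(1/delta)) + k * epsilon * (e^epsilon - 1).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    first = epsilon * math.sqrt(2.0 * k * math.log(1.0 / delta))
    second = k * epsilon * math.expm1(epsilon)  # expm1(x) = e^x - 1
    return first + second


eps, k, delta = 0.1, 100, 1e-5
sequential = k * eps
advanced = advanced_composition(eps, k, delta)
print(f"sequential: {sequential:.2f}")  # 10.00
print(f"advanced:   {advanced:.2f}")    # 5.85
# Both bounds are valid, so report the tighter one.
print(f"reported:   {min(sequential, advanced):.2f}")
```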
Effective budget allocation requires understanding which queries provide the most analytical value per epsilon spent. High-sensitivity queries (those that need more noise at a given epsilon, or more epsilon for a given accuracy) should be reserved for the most critical business questions. Count queries have sensitivity 1 and can often use small epsilon. Sum and average queries have sensitivity proportional to the range of the values. Unbounded values must therefore be clipped to a fixed range, which introduces bias, and a wider range requires either more noise or more budget. Iterative machine learning training via DP-SGD consumes budget at each training step, making the number of training epochs a direct budget trade-off. The privacy budget calculator above tracks consumption in real time, so you can see exactly how different allocation strategies affect your remaining budget and make informed decisions about which queries to prioritize.
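To make the sensitivity discussion concrete, here is a sketch of a noisy count and a clipped noisy sum. It assumes the Laplace mechanism and neighbouring datasets that differ by replacing one record, so a clipped sum has sensitivity upper - lower. All function names are hypothetical. Python's standard library has no Laplace sampler, so noise is drawn as the difference of two exponential samples. That approach is fine for illustration but not for production: naive floating-point noise generation is known to leak information, and deployments should use a vetted DP library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials (illustration only)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    # Replacing one record changes a count by at most 1, so sensitivity = 1.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def dp_sum(values, lower: float, upper: float, epsilon: float,
           rng: random.Random) -> float:
    # Clip every value into [lower, upper]; this bounds sensitivity but biases
    # the result whenever real values fall outside the range.
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped sum by at most (upper - lower).
    sensitivity = upper - lower
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
ages = [23, 35, 41, 67, 29, 52, 88, 19]
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
print(dp_sum(ages, lower=0, upper=100, epsilon=0.5, rng=rng))
```

With epsilon 0.5, the count receives noise of scale 2, while the sum over [0, 100] receives noise of scale 200. This is the sensitivity cost described above: matching the count's accuracy on the sum would take a hundred times more budget.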
The choice of epsilon in production systems varies enormously. Apple uses epsilon values between 2 and 8 for iOS telemetry, collecting aggregate statistics about emoji usage, Safari crashes, and health data while providing meaningful privacy for individual users. Google's RAPPOR system for Chrome browser metrics uses epsilon 2 to 3 for individual reports, with overall privacy guarantees depending on the number of reports per user. The US Census Bureau used epsilon 19.6 for the 2020 Decennial Census, which drew criticism from privacy researchers who argued the value was too large. LinkedIn's Audience Engagements API uses epsilon 5.0 for advertising analytics. Academic research typically considers epsilon below 1.0 as strong privacy, 1.0 to 10.0 as moderate, and above 10.0 as weak. However, these thresholds are context-dependent. Epsilon 10.0 for a one-time query on a static dataset may be acceptable. By contrast, spending epsilon 1.0 per query in a continuously running analytics pipeline with thousands of queries exhausts any reasonable total budget rapidly under sequential composition.
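The difference between a per-query epsilon of 0.1 and 1.0 becomes clear when you ask how many queries fit into a fixed total budget. The following sketch assumes pure epsilon-DP queries, a total budget of 10.0, and delta = 1e-5. It applies the advanced composition formula and takes the smaller of the advanced and sequential bounds. The helper names are hypothetical.

```python
import math


def advanced_total(epsilon: float, k: int, delta: float) -> float:
    # Advanced composition bound for k pure epsilon-DP queries.
    return (epsilon * math.sqrt(2.0 * k * math.log(1.0 / delta))
            + k * epsilon * math.expm1(epsilon))


def max_queries(total_budget: float, per_query_eps: float, delta: float,
                limit: int = 1_000_000) -> tuple[int, int]:
    """Largest k that fits the budget under sequential and advanced accounting."""
    def fits(cost: float) -> bool:
        return cost <= total_budget + 1e-12  # absorb floating-point rounding

    sequential = 0
    while sequential < limit and fits((sequential + 1) * per_query_eps):
        sequential += 1

    advanced = 0
    while advanced < limit:
        k = advanced + 1
        # Both bounds are valid, so the tighter one decides whether k fits.
        cost = min(k * per_query_eps, advanced_total(per_query_eps, k, delta))
        if not fits(cost):
            break
        advanced = k
    return sequential, advanced


for eps in (0.1, 1.0):
    print(eps, max_queries(10.0, eps, 1e-5))
```

Under these assumptions, queries at epsilon 0.1 gain from advanced composition (241 queries instead of 100). At epsilon 1.0 the k * epsilon * (exp(epsilon) - 1) term dominates, so advanced composition offers no benefit (10 queries either way).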
Differential privacy budget tracking is increasingly relevant to regulatory compliance, not just a best practice. The EU AI Act imposes data governance requirements on high-risk AI systems, and GDPR requires data protection by design and by default. Neither mandates differential privacy specifically, but it is widely regarded as the gold-standard technique for quantifiable privacy protection. GDPR data protection impact assessments (DPIAs) can include epsilon budgets as a quantitative privacy measure. Financial regulators such as the OCC (Office of the Comptroller of the Currency) expect thorough model risk management documentation, and for models trained on customer data, privacy accounting is a natural part of that record. The export feature in this calculator produces audit-ready logs documenting every query, its epsilon allocation, composition method, and cumulative budget consumption, which can be included directly in compliance documentation. Maintaining a complete epsilon audit trail is essential for demonstrating that the privacy guarantee was not violated during the lifetime of the data release.
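One way to structure such a log is one JSON object per query (JSON Lines), so each line can be verified or archived independently. The schema below is a hypothetical illustration of the fields the paragraph lists. It is not the calculator's actual export format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    # Hypothetical schema; field names are illustrative, not a standard format.
    query_id: int
    description: str
    epsilon: float
    composition: str            # "sequential", "parallel", or "advanced"
    cumulative_epsilon: float
    remaining_budget: float
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def export_audit_log(entries: list[AuditEntry]) -> str:
    """Serialise entries as JSON Lines: one self-contained record per query."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


max_budget = 3.0
log: list[AuditEntry] = []
spent = 0.0
for i, (desc, eps) in enumerate([("count of active users", 0.5),
                                 ("mean session length", 1.0),
                                 ("histogram of regions", 0.3)], start=1):
    spent += eps  # sequential composition: cumulative epsilon is a running sum
    log.append(AuditEntry(i, desc, eps, "sequential", spent, max_budget - spent))

print(export_audit_log(log))
```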
The privacy budget epsilon measures how much information about any individual record can leak through query responses. An epsilon of 0 means perfect privacy (no information leaks), while larger values allow more leakage. In practice, epsilon values between 0.1 and 1.0 provide strong privacy guarantees, 1.0 to 10.0 provide moderate guarantees, and above 10.0 is considered weak privacy.
Sequential composition applies when queries access the same data. Total privacy loss is the sum of individual epsilons. Parallel composition applies when queries access disjoint subsets. Total loss is the maximum individual epsilon, not the sum. Parallel composition can provide orders-of-magnitude improvements in budget efficiency.
Advanced composition provides tighter accounting by recognizing that privacy loss grows sub-linearly across queries. For k queries each with epsilon e, the total is approximately e * sqrt(2k * ln(1/delta)) + k * e * (exp(e) - 1) instead of k * e. For 100 queries with epsilon 0.1 and delta 1e-5, this reduces the cost from 10.0 to approximately 5.85, and the savings grow as the number of queries increases.
When the budget is exhausted, no more queries can be answered without violating the privacy guarantee. There are three options: collect fresh data, raise the budget (which knowingly weakens the guarantee and should be documented), or stop answering queries. Budget already spent on a dataset can never be recovered.
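A budget guard can enforce this by checking every query before it runs. The following is a minimal sketch under sequential composition. `SequentialAccountant` and `BudgetExhaustedError` are hypothetical names, and a real deployment would also persist the ledger so that a restart cannot reset it.

```python
class BudgetExhaustedError(RuntimeError):
    """Raised when a query would push spending past the maximum budget."""


class SequentialAccountant:
    """Sequential-composition ledger that refuses queries once the budget is spent.

    Spent epsilon is never refunded, so exhaustion is permanent for the dataset.
    """

    def __init__(self, max_epsilon: float):
        self.max_epsilon = max_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return max(self.max_epsilon - self.spent, 0.0)

    def charge(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Check before answering: a query that would overshoot is refused outright.
        if self.spent + epsilon > self.max_epsilon + 1e-12:
            raise BudgetExhaustedError(
                f"need {epsilon}, only {self.remaining:.3f} left")
        self.spent += epsilon


acct = SequentialAccountant(max_epsilon=1.0)
acct.charge(0.6)
acct.charge(0.4)  # exactly exhausts the budget
try:
    acct.charge(0.1)
except BudgetExhaustedError as err:
    print("refused:", err)
```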
For healthcare and financial PII, use epsilon 0.1 to 1.0. For general analytics with pseudonymized data, 1.0 to 5.0 is common. Apple uses epsilon 2-8 for telemetry, Google RAPPOR uses 2-3 for Chrome metrics. Start with the lowest epsilon that provides acceptable utility.