BayesianDisaggregation is an R package that solves a fundamental problem in data analysis: coherently projecting structure from disaggregated spaces onto aggregated data through uncertain intermediaries.
At its core, the package addresses situations where you need to relate information at incompatible levels of granularity, but the only way to do so is through a proxy that isn't completely reliable. The solution incorporates this uncertainty honestly and propagates it through to the final disaggregation.
You frequently encounter data at different resolutions that need to be related:
- Variable X: Aggregated data (e.g., Consumer Price Index at city/national level)
- Variable Y: Disaggregated data (e.g., economic gains by sector)
- Variable Z: An imperfect proxy for disaggregation (e.g., Gross Value Added by sector)
The challenge: How do you decompose X using the structure from Y when your mapping (Z) is uncertain?
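To make the end goal concrete, here is a toy illustration (assumed usage for exposition, not the package API) of what disaggregation with simplex-valued weights produces:

```r
# Toy example: disaggregating one period's aggregate with weights on the simplex.
X_t <- 100                                                        # aggregate value (e.g., a CPI level)
w   <- c(manufacturing = 0.5, services = 0.3, agriculture = 0.2)  # posterior weights, sum to 1
X_t * w                                                           # sectoral decomposition: 50, 30, 20
```

The hard part, and what the package provides, is obtaining weights `w` that honestly reflect the uncertainty in the proxy Z.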
BayesianDisaggregation implements a principled Bayesian framework that:
- Extracts likelihood signal from your prior disaggregation weights via PCA/SVD
- Performs Bayesian updating to transform prior weights into posterior weights that incorporate uncertainty
- Provides multiple update rules (weighted, multiplicative, Dirichlet, adaptive) for different use cases
- Validates results through coherence, stability, and interpretability metrics
- Delivers analytical solutions avoiding MCMC computational overhead
This package represents a methodological contribution to econometrics and data science, providing, to our knowledge, the first analytical solution to the structure transfer problem with uncertain intermediaries.
The package's breakthrough lies in recognizing that PCA on temporally-centered disaggregation weights yields exactly the likelihood signal needed for Bayesian updating (a minimal sketch follows the list below). This enables:
- Formal uncertainty quantification rather than ad-hoc treatment
- Analytical solutions for tractable sample sizes
- Transparent propagation of proxy uncertainty to final results
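A minimal sketch of the likelihood-extraction idea, assuming `P` is a T × K matrix of prior weights (rows = periods, columns = sectors). This illustrates the PCA-via-SVD step; it is not the package's internal implementation, and the simplex projection shown is one plausible choice:

```r
# Sketch: likelihood signal from SVD of the temporally-centered weight matrix.
compute_L_sketch <- function(P) {
  Pc <- scale(P, center = TRUE, scale = FALSE)  # subtract each sector's mean over time
  sv <- svd(Pc)                                 # SVD of the centered matrix = PCA
  v1 <- sv$v[, 1]                               # leading principal direction across sectors
  abs(v1) / sum(abs(v1))                        # map onto the simplex as a weight vector
}
```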
Problem. You often want to compare an aggregate series $X_t$ (e.g., CPI) with disaggregated data $Y$ (e.g., economic gains by sector), but the only available mapping between them is an uncertain structural proxy $Z$ (e.g., gross value added by sector).
What this package does. The package treats $Z$ as a prior over time-varying, simplex-valued disaggregation weights, builds a likelihood from PCA via SVD on the time-centered $Z$, and computes an analytical Bayesian posterior whose weights disaggregate $X_t$.
What we did not find in the literature. We did not find a prior, named methodology that (i) takes an uncertain structural intermediary $Z$ as a prior on the simplex, (ii) constructs its likelihood from PCA/SVD of time-centered $Z$, and (iii) delivers an analytical posterior used to disaggregate a different aggregate series $X_t$. The closest families of methods differ in essential ways:
- Biproportional balancing / RAS / IPF. Classic RAS adjusts a matrix to match new margins by multiplying rows and columns iteratively (Deming & Stephan, 1940, pp. 428–430; overviews referencing Bacharach, 1970); a minimal IPF sketch appears after this list. It doesn't construct a likelihood from PCA/SVD nor deliver a probabilistic posterior on the simplex for use as disaggregation weights. It's a deterministic reconciliation method for totals, not a Bayesian structure-transfer with an uncertain intermediary.
- Temporal disaggregation (Denton, Chow–Lin, Fernández). These methods distribute low-frequency aggregates into higher frequency using a related indicator series, via smoothness/BLU estimators (Eurostat, 2013, pp. 79–98; see Denton-type formulations; Chow–Lin and Fernández are discussed there). They do not address cross-sectional disaggregation with simplex-valued weights nor use PCA-derived likelihoods.
- Forecast reconciliation for hierarchies (e.g., MinT). This reconciles inconsistent forecasts across aggregation trees by projecting onto a coherent subspace (Wickramasuriya et al., 2019, pp. 1–3). It is forecast-centric and linear-algebraic—not a Bayesian update for compositional weights with PCA-likelihoods.
- Compositional/Bayesian state-space models. These exist (e.g., Dirichlet evolutions), but we did not find an analytical one-step update that (a) builds its likelihood from PCA/SVD of $Z$ and (b) outputs posterior weights to disaggregate a different aggregate series $X_t$.
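For contrast with the RAS/IPF family above, a minimal, self-contained IPF sketch (the function name and stopping rule are ours, for illustration only). Note it reconciles a matrix to target margins deterministically; no likelihood, no posterior:

```r
# Minimal iterative proportional fitting (RAS) sketch, for contrast only:
# alternately rescale rows and columns until the row margins match.
ras_sketch <- function(M, row_targets, col_targets, iters = 100, tol = 1e-10) {
  for (i in seq_len(iters)) {
    M <- sweep(M, 1, row_targets / rowSums(M), `*`)   # match row totals
    M <- sweep(M, 2, col_targets / colSums(M), `*`)   # match column totals
    if (max(abs(rowSums(M) - row_targets)) < tol) break
  }
  M
}
```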
Takeaway. To our knowledge, the package provides a new, analytical approach to the "structure transfer with uncertain intermediary" problem in the specific setting of time-varying, simplex-valued weights: it uses PCA via SVD on time-centered $Z$ to build a likelihood and delivers an analytical posterior on the simplex for disaggregating $X_t$.
Deming, W. E., & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics, 11(4), 427–444. https://doi.org/10.1214/aoms/1177731829 (see pp. 428–430 for the iterative proportional fitting idea).
Eurostat. (2013). Handbook on quarterly national accounts (2013 edition). Publications Office of the European Union. (See Chapter “Benchmarking and temporal disaggregation,” esp. the Denton formulation and BLU approaches, pp. 79–98.)
International Labour Organization (ILO), International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), Eurostat, United Nations Economic Commission for Europe (UNECE), & The World Bank. (2020). Consumer Price Index Manual: Concepts and Methods. IMF. (Aggregation structure and weighting practices are explained throughout; see e.g. Ch. 3 on index number theory and aggregation.)
Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819. (Preprint available as arXiv:1805.07245; see Sections 1–2 for the reconciliation setup.)
- On novelty. Because adjacent areas are vast, we phrase novelty as "to our knowledge, we did not find …" rather than as an absolute first. The distinctive combination here is: (i) an uncertain intermediary $Z$ treated as a prior on the simplex, (ii) a likelihood built from PCA via SVD on time-centered $Z$, and (iii) an analytical (non-MCMC) posterior used to disaggregate an unrelated aggregate $X_t$.
- On CPI examples. Public CPI databases (e.g., headline CPI and coarse categories) typically lack rich sectoral disaggregation tied to national accounts, hence the need to transfer structure from a proxy like $Z$ (e.g., value added) rather than rely on directly observed $X_{t,k}$. The CPI Manual (2020) documents aggregation frameworks and weights at a conceptual level but does not provide a ready-made cross-sectional mapping suited to this use case.
This package enables analyses that were previously impossible:
- Which sectors are truly driving inflation? Decompose CPI by economic activity to identify inflation sources
- How do price shocks differentially affect industries? Understand sector-specific impacts of monetary policy
- What are the real sectoral price dynamics? Track inflation patterns at the industry level
These are questions policymakers need answered but could not address with existing tools: no traditional method disaggregates CPI by economic sector, because the mapping between consumer prices and productive sectors is inherently uncertain.
The framework generalizes to any domain with the structure transfer problem:
- Neuroscience: Relate global brain activity (aggregated) to specific cognitive functions (disaggregated) using imperfect anatomical mappings. Understand which brain regions contribute to observed EEG/MEG signals while accounting for spatial uncertainty.
- Climate science: Distribute global climate projections to regional levels using uncertain downscaling models. Project temperature/precipitation changes from coarse climate models to local watersheds while quantifying projection uncertainty.
- Epidemiology: Allocate national mortality rates to specific subpopulations using imperfect demographic proxies. Decompose country-level disease burden to demographic groups when direct measurements are unavailable.
- Signal processing: Reconstruct high-frequency components from compressed signals using approximate dictionaries. Recover detailed structure from aggregated measurements in compressed sensing applications.
- Machine learning: Transfer knowledge from source domains to granular target domains through noisy intermediate representations. Apply domain adaptation when the mapping between domains is uncertain.
The core pipeline:

```r
# P: T x K matrix of prior disaggregation weights (rows = periods, columns = sectors)
# T_periods: number of periods over which to spread the likelihood

# 1. Extract likelihood from prior weights
L <- compute_L_from_P(P)            # PCA/SVD → likelihood signal

# 2. Spread likelihood temporally
LT <- spread_likelihood(L, T_periods, pattern = "recent")

# 3. Bayesian update to get posterior
W <- posterior_adaptive(P, LT)      # or weighted/multiplicative/dirichlet

# 4. Validate results
metrics <- list(
  coherence        = coherence_score(P, W, L),
  stability        = stability_composite(W),
  interpretability = interpretability_score(P, W)
)
```

Four update rules are available (minimal sketches follow this list):

- Weighted Average: Linear combination with mixing parameter λ
- Multiplicative: Element-wise product with renormalization
- Dirichlet Mean: Analytical conjugacy with concentration parameter γ
- Adaptive: Sector-specific mixing based on prior volatility
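The sketches below show how three of these rules might look on a single period's weight vector. The function names and default parameters are illustrative assumptions, not the package's internal code:

```r
# Hedged sketches of update rules (p = prior, l = likelihood signal,
# both vectors on the simplex). Illustrative only.
update_weighted <- function(p, l, lambda = 0.5) {
  lambda * p + (1 - lambda) * l   # convex combination stays on the simplex
}

update_multiplicative <- function(p, l) {
  w <- p * l                      # element-wise product
  w / sum(w)                      # renormalize to the simplex
}

update_dirichlet_mean <- function(p, l, alpha0 = 10, gamma = 10) {
  a <- alpha0 * p + gamma * l     # prior and likelihood as Dirichlet pseudo-counts
  a / sum(a)                      # analytical posterior mean via conjugacy
}
```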
Results are validated through three families of metrics (a toy sketch follows this list):

- Coherence: Alignment between posterior and likelihood signal
- Stability: Numerical (row-sum unity) and temporal (low variation)
- Interpretability: Structure preservation and plausible changes
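As a rough illustration of what these checks mean in practice: a posterior matrix should have rows that sum to one and vary smoothly over time. The toy helpers below encode those two properties under our own simplified definitions, not the package's metric formulas:

```r
# Toy validation checks for a T x K posterior weight matrix W (illustrative only).
check_row_sums <- function(W, tol = 1e-8) {
  all(abs(rowSums(W) - 1) < tol)   # numerical stability: every row on the simplex
}
temporal_smoothness <- function(W) {
  1 / (1 + mean(abs(diff(W))))     # near 1 when consecutive periods change little
}
```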
```r
# Install from GitHub
devtools::install_github("IsadoreNabi/BayesianDisaggregation")
library(BayesianDisaggregation)

# Run complete pipeline
results <- bayesian_disaggregate(
  path_cpi           = "path/to/cpi.xlsx",
  path_weights       = "path/to/weights.xlsx",
  method             = "adaptive",
  likelihood_pattern = "recent"
)

# Access disaggregated CPI
disaggregated_cpi <- results$posterior
```

By providing what is, to our knowledge, the first principled method for CPI disaggregation, this package opens new analytical frontiers:
- Central banks can identify sector-specific inflation drivers for targeted policy
- Researchers can study differential price transmission across industries
- Analysts can decompose aggregate shocks into sectoral components
- Policymakers can design interventions based on granular inflation dynamics
The framework's generality means similar breakthroughs are possible wherever the structure transfer problem appears—from neuroscience to climate modeling.
- Comprehensive Manual - Full theoretical development and API reference
- Vignettes - Worked examples with real data
- Technical Paper - Mathematical foundations and proofs
Contributions are welcome! Contact me at isadore.nabi@pm.me.
If you use BayesianDisaggregation in your research, please cite:
```bibtex
@software{gomez2025bayesian,
  author = {Gómez Julián, José Mauricio},
  title  = {BayesianDisaggregation: Coherent Structure Transfer Through Uncertain Proxies},
  year   = {2025},
  url    = {https://github.com/IsadoreNabi/BayesianDisaggregation}
}
```

MIT License - see LICENSE file for details.
José Mauricio Gómez Julián
2025