auto population scaling #1779

mnjowe · 2026-01-15T06:44:58Z

This PR addresses issue #1778

Scope

- Establish a single helper function for scaling population DataFrames to census totals
- Support both flat DataFrames (single-level columns) and MultiIndex column DataFrames
- Handle common input shapes where:
  - date is already the index
  - date is provided as a column and must be promoted to the index
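
To make the scope above concrete, here is a minimal sketch of what such a helper could look like. It is not the code from this PR: the name `scale_population_frame`, the `multiplier` argument and the `date_col` default are assumptions for illustration, and it assumes that MultiIndex-column frames already carry the date as their index.

```python
import pandas as pd


def scale_population_frame(df, multiplier, date_col="date"):
    """Hypothetical helper: return a copy of `df` with all values multiplied by `multiplier`."""
    out = df.copy()
    # Promote a flat 'date' column to the index when it is not already the index.
    # (Assumption: frames with MultiIndex columns already have the date as index.)
    if not isinstance(out.columns, pd.MultiIndex) and date_col in out.columns:
        out = out.set_index(date_col)
    out.index = pd.to_datetime(out.index)
    # Element-wise multiplication works for flat and MultiIndex columns alike.
    return out * multiplier


# Flat columns, date supplied as a column (illustrative numbers only).
flat = pd.DataFrame({"date": ["2010-01-01", "2011-01-01"], "total": [100, 110]})
scaled_flat = scale_population_frame(flat, multiplier=150.0)

# MultiIndex columns, date already the index.
multi = pd.DataFrame(
    [[60, 40], [65, 45]],
    index=pd.to_datetime(["2010-01-01", "2011-01-01"]),
    columns=pd.MultiIndex.from_tuples([("run0", "F"), ("run0", "M")]),
)
scaled_multi = scale_population_frame(multi, multiplier=150.0)
```

Taking the multiplier as an explicit argument keeps the sketch neutral about where the scaling factor comes from, which is the question the review discussion turns to.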

marghe-molaro · 2026-01-21T07:50:53Z

tests/test_utils.py

+    # =========================================================
+    # Positive: Flat columns (date already index)
+    # =========================================================
+    census_pop = 1_500_000


Hi @mnjowe, I've just started reviewing this but I have a question!
I thought I remembered from one of the quarterly meeting discussions that we wanted to make the scaling census-year dependent. For example, if the simulation starts in 2010 with a sim_pop_size_2010 but the nearest available census is in 2014, the scaling factor would be sim_pop_size_2014 (which would just be the number of alive individuals in the sim in 2014) over the 2014 census pop. Is that the idea here, with the user having to manually compute the simulated pop size at the time of the census?
Or is this PR not related to your generalisation of the demography module to Tanzania?

That's absolutely right @marghe-molaro.

This PR is related to my generalisation of the demography module to Tanzania.

Scaling should indeed be census-year dependent, as you have explained above.

sim_pop_size_census_year should come from the population dataframe, indexed at the census year.

The scaling factor should then be sim_pop_size_census_year/census_pop.
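
As a rough illustration of the ratio described above, the sketch below looks up the simulated population in the census year and divides it by the census total. The function name `census_year_ratio`, the shape of the input (a Series of population counts indexed by date) and all the numbers are assumptions, not part of the existing code.

```python
import pandas as pd


def census_year_ratio(pop_by_date, census_year, census_pop):
    """Hypothetical: ratio of simulated to census population in the census year."""
    # Keep only the entries logged during the census year.
    in_year = pop_by_date[pop_by_date.index.year == census_year]
    if in_year.empty:
        raise ValueError(f"no simulated population logged for {census_year}")
    # Assumption: the first logged value in that year represents the year.
    sim_pop_size_census_year = in_year.iloc[0]
    return sim_pop_size_census_year / census_pop


# Illustrative simulated population sizes, indexed by logging date.
sim_pop = pd.Series(
    [20_000, 22_000, 24_000],
    index=pd.to_datetime(["2010-01-01", "2014-01-01", "2018-01-01"]),
)
ratio = census_year_ratio(sim_pop, census_year=2014, census_pop=1_500_000)
```

Raising an error when the census year is missing from the data reflects the concern that not every run may log the population at that point.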

Suggestion

I can modify the auto scaling function to receive the population dataframe, the census pop and the census year, thereby automating even the process of generating the scale factor. The only challenge is that this assumes every simulation run logs demography data, which I'm not sure is always the case.

Thank you very much for clarifying @mnjowe!
I think that, given this function would only ever be used in post-processing, it would be good for the scaling factor to be computed 'behind the scenes', so that the user never has to worry about what input to pass to the function in order to calculate it. This is what the util function extract_results currently does: if the user opts for do_scaling when extracting results, the scaling_factor is just looked up in the log.
So I think ideally we would retain this logic, but update how the scaling factor is computed - i.e. demography should schedule an event to log the scaling factor in the year of the census, based on the simulated pop size in that year.

Many thanks @marghe-molaro. I like your approach.
I will look more into the extract_results function to see how I can go about updating the auto scaling function.

I think then you can pause your review for now until I push the updated version

tbhallett · 2026-01-21T15:44:04Z

Hi @mnjowe and @marghe-molaro
Yes, I agree with all of that (getting scale_factor from "behind the scenes"). Tip: as well as the demography log, it's also logged in the 'population' log, which is ALWAYS on! That's where we get it from in extract_results.

@mnjowe --- please could you also just describe again the "use case" for this function? (What it does that extract_results() doesn't do).

mnjowe · 2026-01-22T08:16:21Z

Hi @tbhallett . I have gone through the extract_results() function.
I think it's doing most of what I wanted in this PR. The only thing we had suggested that I feel is not included is the ability to make a scale factor that's census-year based.

With @marghe-molaro we were discussing a situation where the census was conducted in a year other than 2010. In that case, we discussed that the scale factor should be obtained by taking the model population of those alive in that year over the total census population of that year.

Another thing, which is minor but important I think, is that extract_results() assumes everyone is running the analysis via Azure. I don't think this will always be the case with the new researchers?

Suggestion

I guess it would be better if the existing scaling were made an independent function and called within extract_results() or any other function that scales data not run via Azure? That would also provide an opportunity for further updates to the scaling.

In any case, I think I should update the existing function rather than developing a whole new one. I didn't realise that we already have a function in place that does almost all I wanted.

marghe-molaro · 2026-01-22T09:34:40Z

Hi @mnjowe,
Why do you say that extract_results is only available for Azure results? I don't think there's anything preventing it from being used on local runs?
The function to retrieve the scaling factor (get_multiplier) is defined within the extract_results function, but I don't think there's any need for it to be modified, as it is the logging of the scaling factor that should be updated.

tbhallett · 2026-01-22T11:04:12Z

Yes, I think the main thing we need to focus on is how the value that is logged as the scaling factor is computed. As you both rightly pointed out, the existing code hard-wires that the 2010 population size (starting population) is compared to a reference population size (in this case, the WPP?).

TLOmodel/src/tlo/methods/demography.py

Lines 636 to 651 in 85ec43b

    
    def compute_initial_model_to_data_popsize_ratio(self, initial_population_size):
        """Compute ratio of initial model population size to estimated population size in 2010.

        Uses the total of the per-region estimated populations in 2010 used to
        initialise the simulation population as the baseline figure, with this value
        corresponding to the 2010 projected population from [wpp2019]_.

        .. [wpp2019] World Population Prospects 2019. United Nations Department of
        Economic and Social Affairs. URL:
        https://population.un.org/wpp/Download/Standard/Population/

        :param initial_population_size: Initial population size to calculate ratio for.

        :returns: Ratio of ``initial_population`` to 2010 baseline population.
        """
        return initial_population_size / self.parameters['pop_2010']['Count'].sum()

I think that still is OK, as long as we have a WPP value for 2010 for everywhere else (which I think we should).

But, I know that you're keen that we actually do the calibration to a census, the year of which could be ANYTHING.

So, I think the changes needed for that are:
1 - add a parameter that designates the YEAR of the census.
2 - schedule an event for that year.
3 - let that event run code analogous to the above, and log it in all the same places, letting the key denote that this is the 'census derived scale factor'.
4 - include an option in extract_results to use one or other of these two possible scaling factors.

(Steps 2 & 3 may require some light refactoring to avoid code duplication.)
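
Step 4 could be sketched roughly as below. This is only an illustration: a plain dict stands in for the values parsed from a run's log, and the function name `select_scaling_factor`, the `use_census` flag and the key `scaling_factor_census` are hypothetical names, not existing ones in extract_results.

```python
def select_scaling_factor(logged, use_census=False):
    """Hypothetical: pick one of two logged scaling factors by key."""
    # 'scaling_factor_census' is an assumed key for the census-derived value.
    key = "scaling_factor_census" if use_census else "scaling_factor"
    if key not in logged:
        raise KeyError(f"'{key}' was not logged for this run")
    return logged[key]


# Stand-in for values parsed from a run's log (numbers are illustrative only).
logged = {"scaling_factor": 725.0, "scaling_factor_census": 68.0}
default_factor = select_scaling_factor(logged)
census_factor = select_scaling_factor(logged, use_census=True)
```

Failing loudly when the census-derived key is absent would make it obvious when a run finished before the census year or was produced by an older version of the code.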

auto population scaling

dba83cf

mnjowe requested review from marghe-molaro and tbhallett January 15, 2026 06:44

mnjowe self-assigned this Jan 15, 2026

mnjowe added the framework label Jan 15, 2026

mnjowe linked an issue Jan 15, 2026 that may be closed by this pull request

Helper function for population scaling #1778

Open

mnjowe and others added 5 commits January 15, 2026 09:47

Merge branch 'master' into mnjowe/auto-population-scalling

98880b5

removing unused import

bb211fc

re-ordering imports

0cdd854

re-ordering imports

186ee25

Merge branch 'master' into mnjowe/auto-population-scalling

50e15ff

marghe-molaro reviewed Jan 21, 2026
