-
Notifications
You must be signed in to change notification settings - Fork 16
auto population scaling #1779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
auto population scaling #1779
Conversation
| # ========================================================= | ||
| # Positive: Flat columns (date already index) | ||
| # ========================================================= | ||
| census_pop = 1_500_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mnjowe, I've just started reviewing this but I have a question!
I thought I remembered from one of the quarterly meeting discussions that we wanted to make the scaling census-year dependent (so that if the simulation starts in 2010 with a sim_pop_size_2010, but the nearest most available census is in e.g. 2014, scaling factor would then be sim_pop_size_2014 (which would just be the number of alive individuals in the sim in 2014) over the 2014 census pop. Is that the idea here, but the user would have to manually compute the simulated pop size at the time of the census?
Or is this PR not related to your generalisation of the demography module to Tanzania?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's absolutely right @marghe-molaro.
- This PR is related to my generalisation of the demography module to Tanzania
- scaling should indeed be census year dependent as you have explained above
- sim_pop_size_census_year should come from the population dataframe indexing the census year
- scaling factor should then be
sim_pop_size_census_year/census_pop
Suggestion
I can modify the auto scaling function to receive population dataframe, census pop and census year thereby automating even the process of generating the scale factor. The only challenge is that this assumes every simulation run logs demography data which I'm not sure is always the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for clarifying @mnjowe!
I think given that this function would only ever be used in post-processing, it would be good for this scaling factor to be computed 'behind the scenes', s.t. the user never has to worry about what input to pass to the function in order to calculate it, as is currently done in the util function extract_results: if the user ops to do_scaling when extracting results, the scaling_factor is just looked up in the log.
So I think ideally we would retain this logic, but update how the scaling factor is computed - i.e. demography should schedule an event to log the scaling factor in the year of the census, based on the simulated pop size in that year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks @marghe-molaro. I like your approach.
I will look more into the extract_result function to see how I can go updating the auto scaling function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think then you can pause your review for now until I push the updated version
|
Hi @mnjowe and @marghe-molaro @mnjowe --- please could you also just describe again the "use case" for this function? (What it does that |
|
Hi @tbhallett . I have gone through the With @marghe-molaro we were discussing a situation where census was conducted in a different year than 2010. In that case we discussed that a scale factor should be obtained by considering a model population of those alive in that year over the total census population of that year. Another thing which is minor but important i think is that SuggestionI guess it could have been better if the existing scaling could have been made as an independent function and called within In any case I think I should update the existing function rather than developing a whole new one. I din't realise that we have a function already in place that's almost doing all I wanted. |
|
Hi @mnjowe, |
|
Yes, I think the main thing we need to focus on is how that value that is logged as the scaling factor is computed. As you both rightly pointed out, the existing code hard-wires that the 2010 population size (starting population) is compared to a reference population size (in this case, the WPP?). TLOmodel/src/tlo/methods/demography.py Lines 636 to 651 in 85ec43b
I think that still is OK, as long as we have a WPP value for 2010 for everywhere else (which I think we should). But, I know that you're keen that we actually do the calibration to a census, the year of which could be ANYTHING. So, I think the changes needed for that are: (Steps 2 & 3 may require some light refactoring to avoid code duplication), |
This PR addresses issue #1778
Scope
- date is already the index
- date is provided as a column and must be promoted to the index