Weather Data Cleaning #144

@s2t2

NOTE: this issue references a future state of the code that has not yet been synced out to GitHub.

The weather data uses extreme sentinel values like -9999.0 to signal a null or bad reading, for example in the "TempC" column.

To avoid downstream issues when using this data in simulation, we should represent those values as null or NaN instead.
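A minimal sketch of that cleaning step, assuming the weather data is a pandas DataFrame with a "TempC" column as described here. The helper name `mask_sentinels` and the toy values are hypothetical and only for illustration; this is not the project's actual cleaning code.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999.0  # the extreme placeholder value described in this issue


def mask_sentinels(weather_df, columns=("TempC",), sentinel=SENTINEL):
    """Return a copy where sentinel readings in `columns` are replaced by NaN.

    Hypothetical helper: the column list and sentinel are assumptions
    taken from the issue text, not from the repository.
    """
    cleaned = weather_df.copy()
    for col in columns:
        cleaned[col] = cleaned[col].replace(sentinel, np.nan)
    return cleaned


# Toy data for illustration only (not real readings).
toy = pd.DataFrame({"TempC": [10.0, -9999.0, 12.0]})
cleaned = mask_sentinels(toy)
```

Returning a copy leaves the raw snapshot untouched, so the original sentinel rows can still be inspected later if needed.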

We should then also make sure the Replay Weather Controller appropriately handles / drops those null values (instead of using a very extreme negative number). With the interpolation strategy, the absence of a single hour shouldn't negatively impact the results, because the weather controller will use the average between the closest two hours it does have.
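To illustrate why a single missing hour is harmless under interpolation, here is a rough sketch using pandas time-based interpolation. It assumes the DataFrame is indexed by timestamps and that bad readings have already been set to NaN. The function `temp_at` is hypothetical and is not how the Replay Weather Controller is actually implemented.

```python
import pandas as pd


def temp_at(weather_df, timestamp, column="TempC"):
    """Linearly interpolate `column` at `timestamp` using only valid rows.

    Hypothetical helper; assumes `weather_df` has a DatetimeIndex and
    that `timestamp` lies between two valid readings.
    """
    # Drop null readings so they never take part in the interpolation.
    valid = weather_df[column].dropna().sort_index()
    # Add the requested time to the index, then interpolate by elapsed time.
    with_target = valid.reindex(valid.index.union(pd.DatetimeIndex([timestamp])))
    return float(with_target.interpolate(method="time").loc[timestamp])


# Toy data for illustration only: the 01:00 reading is missing.
toy = pd.DataFrame(
    {"TempC": [10.0, float("nan"), 14.0]},
    index=pd.to_datetime(
        ["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 02:00"]
    ),
)
midpoint = temp_at(toy, pd.Timestamp("2000-01-01 01:00"))
```

In this toy case the missing 01:00 value is filled from its two neighbors, which at the midpoint is the same as averaging them.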

For the "TempC" column, the following dates have a single instance of these bad / extreme values:

2025:

  • 04-02
  • 06-06
  • 07-07
  • and possibly others not included since the current data snapshot ends before the end of the year

2024:

  • 06-04
  • 07-04
  • 08-03
  • 11-16
  • 11-24
  • 12-24

2023:

  • 04-20
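A per-date listing like this could be produced with something along these lines. This is a sketch assuming the weather DataFrame has a DatetimeIndex; `sentinel_counts_by_date` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd


def sentinel_counts_by_date(weather_df, column="TempC", sentinel=-9999.0):
    """Count sentinel readings in `column` per calendar date.

    Hypothetical helper; assumes `weather_df` has a DatetimeIndex.
    """
    bad = weather_df[weather_df[column] == sentinel]
    # Group the bad rows by the calendar date of their timestamp.
    return bad.groupby(bad.index.date).size()


# Toy data for illustration only (not real readings).
toy = pd.DataFrame(
    {"TempC": [10.0, 11.0, -9999.0, 12.0]},
    index=pd.to_datetime(
        [
            "2000-01-01 00:00",
            "2000-01-01 01:00",
            "2000-01-02 00:00",
            "2000-01-02 01:00",
        ]
    ),
)
counts = sentinel_counts_by_date(toy)
```

Re-running a check like this on newer snapshots would show whether additional dates are affected.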

As a temporary workaround we are performing the filtering downstream:

wc = ReplayWeatherController()
# or access from the environment:
# wc = env.building.simulator.weather_controller
wc.weather_df = wc.weather_df[wc.weather_df["TempC"] != -9999.0]
# now you can use the weather controller

Labels: internal (to be resolved by a Google staff member)
