Investigate whether short circuit if statements speed up task sample processing #843

@EricSchrock

The mortality_prediction, drug_recommendation, length_of_stay_prediction, and readmission_prediction tasks all exclude visits that do not have "conditions" AND "procedures" AND "drugs". The first three exclude them with the following pattern.

if len(conditions) * len(procedures) * len(drugs) == 0:
    continue

However, the readmission_prediction task short-circuits: it skips generating the procedures and drugs lists if there are no conditions, and skips generating the drugs list if there are no procedures.

conditions = ...
if len(conditions) == 0:
    continue

procedures = ...
if len(procedures) == 0:
    continue

drugs = ...
if len(drugs) == 0:
    continue

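The two approaches are logically equivalent: both keep exactly the same visits. The question is whether the latter is more performant, since it avoids building the procedures and drugs lists for visits that are rejected early. It's hard to say how much that saves in practice.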
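As a rough illustration of the difference, here is a minimal sketch that runs both patterns over toy data and counts how many list lookups each one performs. The `get_codes` helper and the dict-based visits are hypothetical stand-ins for whatever the real tasks do to build each list; they are not the library's API.

```python
from typing import Dict, List

# Hypothetical stand-in for a visit: maps an event type to its list of codes.
Visit = Dict[str, List[str]]


def get_codes(visit: Visit, key: str, calls: Dict[str, int]) -> List[str]:
    """Hypothetical list lookup that also records how often it is called."""
    calls[key] = calls.get(key, 0) + 1
    return visit.get(key, [])


def product_filter(visits: List[Visit], calls: Dict[str, int]) -> List[Visit]:
    """Build all three lists first, then check them with one product test."""
    kept = []
    for visit in visits:
        conditions = get_codes(visit, "conditions", calls)
        procedures = get_codes(visit, "procedures", calls)
        drugs = get_codes(visit, "drugs", calls)
        if len(conditions) * len(procedures) * len(drugs) == 0:
            continue
        kept.append(visit)
    return kept


def short_circuit_filter(visits: List[Visit], calls: Dict[str, int]) -> List[Visit]:
    """Check each list as soon as it is built and skip the rest on failure."""
    kept = []
    for visit in visits:
        conditions = get_codes(visit, "conditions", calls)
        if len(conditions) == 0:
            continue
        procedures = get_codes(visit, "procedures", calls)
        if len(procedures) == 0:
            continue
        drugs = get_codes(visit, "drugs", calls)
        if len(drugs) == 0:
            continue
        kept.append(visit)
    return kept


# Toy data: one complete visit and one visit missing each list in turn.
toy_visits: List[Visit] = [
    {"conditions": ["c1"], "procedures": ["p1"], "drugs": ["d1"]},
    {"conditions": [], "procedures": ["p1"], "drugs": ["d1"]},
    {"conditions": ["c1"], "procedures": [], "drugs": ["d1"]},
    {"conditions": ["c1"], "procedures": ["p1"], "drugs": []},
]

product_calls: Dict[str, int] = {}
short_calls: Dict[str, int] = {}
kept_product = product_filter(toy_visits, product_calls)
kept_short = short_circuit_filter(toy_visits, short_calls)
print(kept_product == kept_short, product_calls, short_calls)
```

Both filters keep the same visits, but the short-circuit version makes fewer lookups. Whether that shows up in wall-clock time depends on how costly each lookup is and on what fraction of visits gets rejected early.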

I did a quick test on one of the small demo/synthetic datasets (I don't remember which one) and there was not a noticeable difference in sample generation time. However, it's very possible we would notice a difference between the two approaches if we ran sample generation on the full MIMIC3 and/or MIMIC4 datasets.

Run the full-dataset test(s) and, if the performance is noticeably different, update all the tasks to use the short-circuit ifs.
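One way to make the comparison repeatable is a small timing helper like the sketch below. It only assumes you can wrap each variant of sample generation in a zero-argument callable; `best_runtime` and `relative_speedup` are hypothetical names, and how the dataset is loaded and the task applied is left out because it depends on the local setup.

```python
import time
from typing import Callable, List


def best_runtime(generate_samples: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        generate_samples()  # one full pass of sample generation
        timings.append(time.perf_counter() - start)
    return min(timings)


def relative_speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate_s must be positive")
    return baseline_s / candidate_s


# Placeholder workload; replace the lambda with a callable that runs
# sample generation using one variant of the task.
elapsed = best_runtime(lambda: sum(range(100_000)), repeats=3)
print(f"best of 3: {elapsed:.6f} s")
```

Taking the minimum over a few runs reduces noise from disk caching and other processes, which matters when the expected difference between the two variants is small.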
