@Khushmagrawal
Contributor

@Khushmagrawal Khushmagrawal commented Nov 24, 2025

Reference Issues/PRs

Partially towards #554
Blocked by #667

What does this implement/fix? Explain your changes.

Adds support for zero-inflated distributions.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  • Correctness of the implementation
  • Design decision of making the lower bound inclusive in truncated.py

Did you add any tests for the change?

No

Any other comments?

After the ZeroInflated implementation is reviewed, I plan to add ZeroInflatedPoisson and ZeroInflatedNegativeBinomial as well, using _DelegatedDistribution.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@fkiraly fkiraly changed the title [ENH] add zero-inflated distribution [ENH] zero-inflated distribution Nov 27, 2025
@fkiraly fkiraly added enhancement module:probability&simulation probability distributions and simulators labels Nov 27, 2025
@Khushmagrawal
Contributor Author

Fixed the off-by-one errors causing test_ppf_and_cdf to fail. The issue was due to floating-point error when q_rescaled is divided by p. Resolved by subtracting a small epsilon from q_rescaled (similar to #568).
The workflow passes on my fork with this change.
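
For readers following the epsilon fix described above, here is a minimal standalone sketch of the idea. It is not the PR's code: the names `p_zero`, `zi_cdf`, `zi_ppf` and the hand-rolled Poisson helpers are hypothetical, and it assumes `p_zero` is the extra probability mass at zero, so the rescaling divides by `1 - p_zero` (the PR's own parameterization of `p` may differ). Rescaling a quantile can land a few ulps above a CDF step, which would push a discrete ppf to the next integer; nudging the rescaled quantile down by a small epsilon avoids that.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # pmf at 0
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # pmf(i) = pmf(i - 1) * mu / i
        total += term
    return min(total, 1.0)


def poisson_ppf(q: float, mu: float) -> int:
    """Smallest k with cdf(k) >= q; the cap is only a safety guard."""
    k = 0
    while poisson_cdf(k, mu) < q and k < 10_000:
        k += 1
    return k


def zi_cdf(k: int, p_zero: float, mu: float) -> float:
    """CDF of the zero-inflated Poisson: extra mass p_zero sits at 0."""
    if k < 0:
        return 0.0
    return p_zero + (1.0 - p_zero) * poisson_cdf(k, mu)


def zi_ppf(q: float, p_zero: float, mu: float, eps: float = 1e-12) -> int:
    """Quantile function of the zero-inflated Poisson."""
    if q <= p_zero:
        return 0  # the mass at zero already covers q
    q_rescaled = (q - p_zero) / (1.0 - p_zero)
    # Guard against rounding: step slightly below the rescaled quantile
    # so that a value sitting exactly on a CDF step maps back to it.
    q_rescaled = max(q_rescaled - eps, 0.0)
    return poisson_ppf(q_rescaled, mu)


# Round trip: ppf(cdf(k)) should give back k.
round_trip = [zi_ppf(zi_cdf(k, 0.3, 2.5), 0.3, 2.5) for k in range(10)]
```

The epsilon has to stay well below the smallest pmf value that matters; far in the tail, where the pmf drops under `eps`, the round trip can no longer be guaranteed.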

Collaborator

@fkiraly fkiraly left a comment

very nice!

Could we please move the modifications to TruncatedDistribution to another PR? I think there is some discussion to be had about how to parameterize the options, e.g., why only lower inclusive and not upper inclusive.

The zero inflated distribution looks good.

@Khushmagrawal
Contributor Author

This can now be re-reviewed, as #667 is merged.

@Khushmagrawal
Contributor Author

@fkiraly, just bringing this to your attention: this needs a review. Thanks!

def _truncated_distribution(self) -> TruncatedDistribution:
    # TODO: return self.distribution if it is naturally non-negative.
    # requires support tag (check #244)
    if (
Collaborator

This is unintuitive behaviour in a case where distribution already is a TruncatedDistribution with a lower of 1, say: it gets replaced by a double-wrapped truncation.

I would suggest not truncating manually, but merely advising the user that distribution should be non-negative.
Then we do not need to take care of too many modification cases that are hard to explain.

        index=None,
        columns=None,
    ):
        if isinstance(p, np.ndarray) and p.ndim == 1:
Collaborator

I would advise making p broadcastable by using the broadcast tags. Then you do not need to handle it manually.

Collaborator

@fkiraly fkiraly left a comment

Very nice!

Two main comments above:

  • make p use the broadcast logic built into the base class, instead of replacing that with a different logic. That way it is consistent.
  • I would also avoid truncating, actually. I would treat the distribution as a special mixture instead; this also reduces hard-to-track case distinctions.
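
To illustrate the mixture view suggested above, here is one possible reading as a standalone sketch, not the proposed skpro implementation. The class `ZeroInflatedSketch` and its fields are hypothetical, and `p_zero` is assumed to be the weight of the point mass at zero. The base distribution is left untouched (no truncation); its own mass at zero simply adds to the inflated zero.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class ZeroInflatedSketch:
    """Mixture of a point mass at 0 (weight p_zero) and a base distribution."""

    p_zero: float                    # assumed: weight of the point mass at 0
    base_pmf: Callable[[int], float]
    base_mean: float
    base_var: float

    def pmf(self, k: int) -> float:
        point_mass = 1.0 if k == 0 else 0.0
        return self.p_zero * point_mass + (1.0 - self.p_zero) * self.base_pmf(k)

    def mean(self) -> float:
        # The point mass at 0 contributes nothing to the mean.
        return (1.0 - self.p_zero) * self.base_mean

    def var(self) -> float:
        # E[X^2] of the mixture minus the squared mixture mean.
        second_moment = (1.0 - self.p_zero) * (self.base_var + self.base_mean**2)
        return second_moment - self.mean() ** 2


def poisson_pmf(k: int, mu: float) -> float:
    """pmf of Poisson(mu); zero outside the support."""
    if k < 0:
        return 0.0
    return math.exp(-mu) * mu**k / math.factorial(k)


# Zero-inflated Poisson with 30% extra zeros and base rate 2.5.
zip_model = ZeroInflatedSketch(
    p_zero=0.3,
    base_pmf=lambda k: poisson_pmf(k, 2.5),
    base_mean=2.5,
    base_var=2.5,
)
```

Because nothing is truncated, a base distribution that is itself a TruncatedDistribution would be used as given, which sidesteps the double-wrapping case raised in the inline comment.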

Labels

enhancement module:probability&simulation probability distributions and simulators

2 participants