Skip to content

Conversation

@SimonBlanke
Copy link
Collaborator

Reference Issues/PRs

resolves #559

What does this implement/fix? Explain your changes.

The source code changes are just the isinstance check for distribution objects and using iat/iloc for scalars or arrays. This PR also adds multiple tests and the _CompositeDistributionTester (which is used in the new tests).

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant!

Could you extend this to one of the existing composite distributions where this is currently done manually, and ensure this works for them? I.e., remove the custom _iloc etc and check whether it works.

@SimonBlanke
Copy link
Collaborator Author

Could you extend this to one of the existing composite distributions where this is currently done manually, and ensure this works for them? I.e., remove the custom _iloc etc and check whether it works.

I successfully used the base-methods for TruncatedDistribution, MeanScale and LeftTruncated. I tried these two, but ran into some issues: Hurdle, TransformedDistribution.

The five distributions I tested show, that this needs some additional work for all distributions.

How should we proceed? Should I:

  • Fix this for all (or most?) distributions in this PR
  • or leave it as is and fix the rest in subsequent PRs?

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 29, 2025

Can you explain why it is not as easy as just delete the custom _iat and _iloc? What happens if you do that?

@SimonBlanke
Copy link
Collaborator Author

Can you explain why it is not as easy as just delete the custom _iat and _iloc? What happens if you do that?

For the Hurdle-dist we need some different numpy handling. It uses numpy broadcasting, where a parameter with fewer columns (e.g., shape (2, 1)) broadcasts across a distribution with more columns (e.g., shape (2, 3)). When subsetting columns, the base implementation assumes all parameters have the same shape and tries to subset columns from parameters that don't have enough columns, causing an IndexError.

I haven't looked that much into the TransformedDistribution-dist, but it looks like a very special case...

@fkiraly
Copy link
Collaborator

fkiraly commented Nov 29, 2025

ok - are there any cases that we can remove without adaptation? If so, we could just remove those that we can and leave the others as future todos?

@SimonBlanke
Copy link
Collaborator Author

ok - are there any cases that we can remove without adaptation? If so, we could just remove those that we can and leave the others as future todos?

I do not see any at the moment. Here are some of composite dist. and some of the problems or complexities I see:

Hurdle - Numpy broadcasting with undersized params
Mixture - Composite list-of-distributions structure
Empirical - Different data model (MultiIndex)
IID - Broadcasting mode flag
QPD_Empirical - Inherits Empirical

@fkiraly
Copy link
Collaborator

fkiraly commented Dec 5, 2025

so the addition is basically dead code then?

@SimonBlanke
Copy link
Collaborator Author

so the addition is basically dead code then?

This is difficult for me to interpret. Is it:

  • "Based on the conversation the PR is unnecessary or unused"
  • or "is there any more value to extract?"

This PR successfully uses base methods for 3 composite distribution, removes ~85 lines of source-code and provides multiple tests. Even though the issue is not fully solved, I see this PR as a success. It provides a starting point for more refactoring to align the different composite distributions. I could go further and continue working on more distributions (if that is required), but I see this PR as mergeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] Default _iloc and _iat to include handling of composite distribution objects

2 participants