Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hi @rasbt,
This PR addresses issue #1092 regarding incorrect indexing when using feature_groups in feature_importance_permutation.
Changes made:
Shuffling Logic: Instead of rng.shuffle, which failed on multi-column slices, I implemented rng.permutation(X.shape[0]) to ensure consistent row-wise shuffling across all columns within a group.
Module Export: Updated mlxtend/evaluate/init.py to correctly export feature_importance_permutation from the feature_importance.py module.
Verification (Jupyter Notebook Test):
To verify the fix, I tested the function with a group of non-informative features vs. one highly informative feature. The results confirmed that the shuffling is now working correctly for grouped indices.
Test Case:
Feature 1 is highly correlated with target. Features 2 & 3 are noise.
X = np.column_stack([feature_1, feature_2, feature_3])
feature_groups_idx = [0, [1, 2]]
mean_importance_vals, _ = feature_importance_permutation(
predict_method=model.predict, X=X, y=y, metric='r2',
num_rounds=10, feature_groups=feature_groups_idx, seed=42
)
Results:
Importance of Feature_1 (Group 0): 1.9084
Importance of Features 2&3 (Group 1): 0.0247
Status: SUCCESS (The model correctly identified the first group as the primary driver).
Note on Local Testing Issues:
During development, I encountered an issue where pytest could not locate the test files or the module in some Windows environments.
Error: file or directory not found: mlxtend/evaluate/tests/test_feature_importance.py
Reason: This appears to be a path resolution/PYTHONPATH issue on Windows and a naming inconsistency between the module (feature_importance.py) and the expected test imports.
Suggestions for Future Contributors:
Standardize Module Naming: Consider renaming feature_importance.py to feature_importance_permutation.py to match the User Guide and prevent ImportError.
Environment Setup: Always run tests using $env:PYTHONPATH = "." on Windows.
Expand Test Suite: Add a specific test case in test_feature_importance.py that handles 2D array slices to ensure no regressions in group shuffling.