Implement JSON serialization for estimators #32 #1153
Open
mariam851 wants to merge 2 commits into rasbt:master
Owner
Thanks for the PR. Could you also add brief documentation for this? For example, you could use this other utils notebook as a template: https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/utils/Counter.ipynb and create a docs/sources/user_guide/utils/serialization.ipynb file (these Jupyter notebooks are automatically converted to the web documentation with each release).
296d57e to 635063b
Contributor
Author
Hi @rasbt,
Hi @rasbt,
I hope you're doing well.
I have implemented the JSON serialization utilities for mlxtend as discussed in issue #32. My goal was to provide a reliable, human-readable alternative to pickle that avoids versioning conflicts and platform dependencies.
Why this implementation is robust:
Dynamic Reconstruction: Instead of requiring the user to manually instantiate a model before loading, I store the estimator's module and class name in the file and use importlib to re-import that class on load. This lets load_model_from_json rebuild the right estimator without the caller naming its type.
Custom Type Handling: I implemented a specialized MlxtendEncoder to bridge the gap between NumPy and JSON. It handles ndarrays, numpy scalars, and provides a safe fallback mechanism to prevent serialization crashes.
State Integrity: The implementation ensures that "fitted" attributes (identified by the trailing underscore _) are correctly cast back to NumPy arrays upon loading, preserving the exact state of the estimator for immediate inference.
Decoupled Design: By placing these utilities in mlxtend.utils.serialization, I ensured the logic is centralized and easily maintainable without bloating the individual estimator classes.
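For readers who want to see the general shape of the approach described in the points above, here is a minimal sketch. It is not the code in this PR. The names NumpyJSONEncoder, dump_estimator and load_estimator are hypothetical, types.SimpleNamespace stands in for an estimator, and the loader assumes the class has a no-argument constructor. Unlike the fallback mentioned above, this sketch simply raises TypeError for unknown types.

```python
import importlib
import json
from types import SimpleNamespace

import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy values to plain JSON types."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()          # arrays become nested lists
        if isinstance(obj, np.generic):  # NumPy scalars such as np.float32
            return obj.item()            # convert to the matching Python scalar
        return super().default(obj)      # unknown types raise TypeError


def dump_estimator(est):
    """Serialize the object's attributes plus its module and class name."""
    payload = {
        "module": type(est).__module__,
        "class": type(est).__name__,
        "attributes": vars(est),
    }
    return json.dumps(payload, cls=NumpyJSONEncoder)


def load_estimator(text):
    """Re-import the recorded class and restore its attributes."""
    payload = json.loads(text)
    module = importlib.import_module(payload["module"])
    cls = getattr(module, payload["class"])
    est = cls()  # assumption: the class can be built without arguments
    for name, value in payload["attributes"].items():
        # Fitted attributes (trailing underscore) stored as lists
        # are cast back to NumPy arrays.
        if name.endswith("_") and isinstance(value, list):
            value = np.array(value)
        setattr(est, name, value)
    return est


# SimpleNamespace is only a stand-in for a fitted estimator.
model = SimpleNamespace(
    eta=0.1, epochs=5, w_=np.array([0.5, -1.25]), b_=np.float32(0.75)
)
restored = load_estimator(dump_estimator(model))
```

Two caveats of a loader like this: the array dtype is not stored, so np.array infers it from the list contents, and because the file decides which module gets imported, it should only be used on files from a trusted source.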
Validation:
Unit Tests: Added tests in mlxtend/utils/tests/test_serialization.py. Verified with Perceptron that the model's weights and predict output remain identical after a round-trip save/load.
Code Quality: The code has been linted and formatted using black, isort, and flake8 to match the project's standards.
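The round-trip check described under Unit Tests can be illustrated without mlxtend. The sketch below is not the test file from this PR. It uses a hypothetical predict helper for a simple linear threshold classifier and random data, and it shows why weights survive a JSON round trip unchanged: Python writes floats using their shortest round-tripping representation.

```python
import json

import numpy as np


def predict(w, b, X):
    """Hypothetical linear threshold classifier used only for this check."""
    return np.where(X @ w + b >= 0.0, 1, 0)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w = rng.normal(size=2)
b = float(rng.normal())

# Save the "fitted" state as JSON text, then load it back.
text = json.dumps({"w_": w.tolist(), "b_": b})
state = json.loads(text)
w_loaded = np.array(state["w_"])
b_loaded = state["b_"]

# Weights, bias and predictions should be identical after the round trip.
weights_match = np.array_equal(w, w_loaded)
bias_match = b == b_loaded
predictions_match = np.array_equal(
    predict(w, b, X), predict(w_loaded, b_loaded, X)
)
```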
I've put a lot of thought into making this extensible for other estimators in the library. Looking forward to your feedback!
Fixes #32