Add NMR data model, integration with EnzymeML, and peak assignment features#13
Add NMR data model, integration with EnzymeML, and peak assignment features#13
Conversation
|
@torogi94 Thanks for this!
|
|
Thank you for looking over it! Here, a few comments:
Good catch! I really did forget to declare the new dependencies in the requirements.txt file. I think you are right that we should not necessarily introduce them to the core of NMRpy. How about we declare them as optional nmrpy[enzymeml]? For the NMRpy data model, however, it would be easier if we could make pydantic a core dependency. If I am not mistaken, it is the only one required for the data model to work. If not, I can restructure my code, so the data model part becomes entirely optional, too, though.
We are preparing a new stable pyenzyme (v2) release which we can then pin in the requirements. In my experience, it is already running better and will be more maintainable than the previous v1 version. With regard to md-models, the NMRpy data model is not dependent at all on the md-models version. Actually, it is not even a dependency needed, as all functionalities are handled via pydantic, which is itself a very well maintained library!
Absolutely. I’ll merge main into my branch and fix the conflict manually so you can have a clean merge without conflicts!
Sorry for the noisy diff, it seems I accidentally let my linter loose on the two files and did not even notice... I’ll revert the linting changes from this PR and submit them in a separate PR (like discussed). |
I agree with making pydantic a core dependency and the others as optional in On another note. I also have a conda build recipe as well as a CI job that builds and distributes a conda package. This has been very easy to date because NMRPy is a pure python package and there are/were conda packages available for all of the dependencies (channel conda-forge). Is PyEnzyme released as a conda package? I would like to continue distributing conda packages, just as we do for PySCeS as well.
Okay great.
Give me a ping once you are ready with all of this then I'll proceed with the detailed code review 😄 |
|
I think I have addressed all points mentioned so far, so you may continue with the detailed code review now!
I am in contact with @JR-1991 about the |
|
I have addressed a few more issues we discussed in pull requests #14 and #15 and merged them with this pull request:
|
|
@jmrohwer: In #18, I have added handling of t0 values in |
b9db28b to
8182bfc
Compare
- Initialisation of enzymeml_document and concentrations properties of FidArray class with initial None value led to Error. Check for None added to fix this issue. - Add correct optional dependency name to setup.py
- Move the setup of Peak objects from the Peak(Range)Assigner to the deconvolution methods to prevent uninitialised Peak objects in data model. - Update create_enzymeml() method to reflect changes in pyenzyme library. - Add species property to FidArray, similar to the deconvoluted_integrals property.
Due to an issue with md-models, pydantic>=2.10.0 is currently causing errors. In the meantime, a version restraint has been added to the requirements.txt
Fix type checking bug in enzymeml_document property of FidArray class.
* Add create_new_enzymeml_measurement() method - FidArray now has a create_new_enzymeml_measurement() method that acts as the interface for creating new Measurement objects either per GUI or script. - FidArray enzymeml_document property now checks for existence of at least one Measurement object in adder method. - plotting.py now has a skeleton MeasurementCreator class that will later be the GUI for creating new Measurement objects. - utils.py now has a utility function that creates an EnzymeML Measurement object from the parameters passed to it. * Add multiple Measurement support - Add MeasurementCreator widget as GUI for create_new_enzymeml_measurement() method - Update apply_to_enzymeml() method - Update create_enzymeml() method
* Update EnzymeML species handling - Add species type flags to get_species_from_enzymeml() util function to allow filtering for specific types of species. - Remove unnecessary display of proteins in PeakAssigner and PeakRangeAssigner using the new species type flags. * Add flags for keeping data models upon saving * Change data array handling of NMRpy data model - Add save_data_model() method to serialise the NMRpy data model. - Change handling of data arrays: They are now saved as numpy.ndarrays in each Fid object and only copied as lists into the data model upon serialisation. * Update data_objects.py - Resolve Pydantic serialisation issues (complex → string conversion). - Optimise processing loops for faster execution and lower memory usage. * Change keep_data_model flag to False
… save, not set to None
- Added T0Tab and T0Logic classes to utils.py - Added T0Adder widget class to plotting.py - Added add_t0_to_enzymeml() method that can be used interactively with a Jupyter widget using gui=True or script-like with gui=False
Changed behaviour of Fid._setup_peak_objects(). Multiple peaks per range are handled properly now by using Fid._grouped_peaklist instead of Fid.peaks. Also refactored the method to make it more robust and readable overall.
- Added NMR parameters to fid_object setter - Fixed p0 and p1 phasing parameter assignment
8182bfc to
9f55722
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces an NMR-focused research data model aligned with md-models/EnzymeML, adds EnzymeML integration (via optional pyenzyme), and extends interactive widgets for peak assignment and measurement editing.
Changes:
- Added NMRpy/EnzymeML-aligned data model specifications and an auto-generated Pydantic schema (JSON-LD capable).
- Implemented EnzymeML helpers (species lookup, measurement construction, applying concentrations) and new peak/t0/measurement widgets.
- Updated packaging/tooling (optional extras for EnzymeML, dependency bumps, Ruff config) and added/extended tests.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| specifications/nmrpy.md | Adds written spec for the NMRpy data model objects/enums. |
| specifications/datamodel_schema.md | Adds Mermaid class diagram for the data model structure. |
| setup.py | Adds EnzymeML optional extras and updates author metadata. |
| ruff.toml | Introduces Ruff formatting/lint configuration. |
| requirements.txt | Updates baseline dependencies and adds pydantic. |
| nmrpy/utils.py | Adds EnzymeML utility helpers (species, measurements, serialization helpers). |
| nmrpy/tests/nmrpy_tests.py | Adds tests for new data model + EnzymeML-related setters. |
| nmrpy/plotting.py | Adds peak assignment + t0/measurement widgets and integrates EnzymeML utilities. |
| nmrpy/nmrpy_model.py | Adds generated Pydantic models for the NMRpy data model (JSON-LD fields). |
| nmrpy/data_objects.py | Integrates data model + EnzymeML into Fid/FidArray and processing steps tracking. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from typing import Mapping, Optional | ||
|
|
||
| from pyenzyme.versions.v2 import Protein |
There was a problem hiding this comment.
This unconditional pyenzyme import will raise at import-time when pyenzyme isn't installed, even though the rest of the file treats it as an optional dependency. Move Protein into the existing try: block (or avoid importing it directly and use getattr(pyenzyme, ...) / isinstance(..., pyenzyme.Protein) inside the guarded section).
| self.__fid_object.raw_data = [str(datum) for datum in self.raw_data] | ||
| self.__fid_object.processed_data = [float(datum) for datum in self.data] | ||
| except AttributeError: |
There was a problem hiding this comment.
float(datum) will raise TypeError when self.data contains complex values (common before .real()), but only AttributeError is caught here. Update serialization logic to handle complex processed data (e.g., store strings for complex values) and/or catch TypeError to avoid hard crashes when accessing fid_object.
| self.__fid_object.raw_data = [str(datum) for datum in self.raw_data] | |
| self.__fid_object.processed_data = [float(datum) for datum in self.data] | |
| except AttributeError: | |
| # Always serialise raw_data as strings | |
| self.__fid_object.raw_data = [str(datum) for datum in self.raw_data] | |
| # Serialise processed_data, handling complex or otherwise non-floatable values gracefully | |
| processed_data = [] | |
| for datum in self.data: | |
| try: | |
| processed_data.append(float(datum)) | |
| except (TypeError, ValueError): | |
| # Fallback for complex or non-scalar values: store string representation | |
| processed_data.append(str(datum)) | |
| self.__fid_object.processed_data = processed_data | |
| except (AttributeError, TypeError): |
| ) | ||
| mz = lmfit.minimize(Fid._phased_data_sum, p, args=([data]), method=method) | ||
| phased_data = Fid._ps(data, p0=mz.params['p0'].value, p1=mz.params['p1'].value) | ||
| phased_data, p0, p1 = Fid._ps(data, p0=mz.params['p0'].value, p1=mz.params['p1'].value) |
There was a problem hiding this comment.
_ps() converts p0/p1 from degrees to radians, and now returns the radian values, which are then propagated to callers and stored in the data model. This changes external behavior (previously phase angles were effectively in degrees everywhere). Keep returning degrees (e.g., return the original degree inputs) or return both but ensure the values stored/exposed match the documented units (the UI output currently says degrees).
| if verbose: | ||
| print('Zero order: %d\tFirst order: %d\t (In degrees)'%(mz.params['p0'].value, mz.params['p1'].value)) | ||
| return phased_data | ||
| return phased_data, p0, p1 |
There was a problem hiding this comment.
_ps() converts p0/p1 from degrees to radians, and now returns the radian values, which are then propagated to callers and stored in the data model. This changes external behavior (previously phase angles were effectively in degrees everywhere). Keep returning degrees (e.g., return the original degree inputs) or return both but ensure the values stored/exposed match the documented units (the UI output currently says degrees).
| size = len(data) | ||
| ph = numpy.exp(1.0j*(p0+(p1*numpy.arange(size)/size))) | ||
| return ph*data | ||
| return ph*data, p0, p1 |
There was a problem hiding this comment.
_ps() converts p0/p1 from degrees to radians, and now returns the radian values, which are then propagated to callers and stored in the data model. This changes external behavior (previously phase angles were effectively in degrees everywhere). Keep returning degrees (e.g., return the original degree inputs) or return both but ensure the values stored/exposed match the documented units (the UI output currently says degrees).
| NMRpy "1" <|-- "1" Experiment | ||
| Experiment "1" <|-- "*" FIDObject | ||
| FIDObject "1" <|-- "1" Parameters | ||
| FIDObject "1" <|-- "1" ProcessingSteps | ||
| FIDObject "1" <|-- "*" Peak | ||
| Peak "1" <|-- "1" PeakRange |
There was a problem hiding this comment.
In Mermaid class diagrams, <|-- denotes inheritance, but these relationships read like composition/association (NMRpy has an Experiment, Experiment has many FIDObject, etc.). Use association/composition arrows (e.g., o-- / *-- / -->) to reflect containment rather than subclassing.
| NMRpy "1" <|-- "1" Experiment | |
| Experiment "1" <|-- "*" FIDObject | |
| FIDObject "1" <|-- "1" Parameters | |
| FIDObject "1" <|-- "1" ProcessingSteps | |
| FIDObject "1" <|-- "*" Peak | |
| Peak "1" <|-- "1" PeakRange | |
| NMRpy "1" *-- "1" Experiment | |
| Experiment "1" *-- "*" FIDObject | |
| FIDObject "1" *-- "1" Parameters | |
| FIDObject "1" *-- "1" ProcessingSteps | |
| FIDObject "1" *-- "*" Peak | |
| Peak "1" *-- "1" PeakRange |
| if (pyenzyme is None): | ||
| raise RuntimeError( |
There was a problem hiding this comment.
Raising RuntimeError in setUp() will fail the test run rather than cleanly skipping optional-dependency tests. Prefer self.skipTest(...) (or @unittest.skipUnless) so CI/users without the extra can still run the rest of the suite.
| if (pyenzyme is None): | |
| raise RuntimeError( | |
| if pyenzyme is None: | |
| self.skipTest( |
| NMRPy is developed by Johann Eicher and Johann Rohwer from the Laboratory for | ||
| Molecular Systems Biology, Dept. of Biochemistry, Stellenbosch University, | ||
| South Africa. | ||
| South Africa, as well as Torsten Giess from the Insitute of Biochemistry and |
There was a problem hiding this comment.
Correct spelling: 'Insitute' → 'Institute'.
| South Africa, as well as Torsten Giess from the Insitute of Biochemistry and | |
| South Africa, as well as Torsten Giess from the Institute of Biochemistry and |
| ) | ||
|
|
||
| def save_to_file(self, filename=None, overwrite=False): | ||
| def save_to_file(self, filename=None, overwrite=False, keep_data_model=False, keep_enzymeml=True): |
There was a problem hiding this comment.
The docstring says keep_data_model defaults to True, but the function signature sets keep_data_model=False. Update the docstring or the default to match actual behavior.
|
|
||
| :keyword overwrite: if True, overwrite existing file | ||
|
|
||
| :keyword keep_data_model: if True, keep the NMRpy data model (default is True) |
There was a problem hiding this comment.
The docstring says keep_data_model defaults to True, but the function signature sets keep_data_model=False. Update the docstring or the default to match actual behavior.
| :keyword keep_data_model: if True, keep the NMRpy data model (default is True) | |
| :keyword keep_data_model: if True, keep the NMRpy data model (default is False) |
md-models-based research data model for NMRpyenzymelibrary