Conversation
…ent an inteface method
| @@ -0,0 +1,1030 @@ | |||
| [ | |||
There was a problem hiding this comment.
Why is there a JSON file? Didn't I just see the same content in a CSV file?
There was a problem hiding this comment.
Yes, initially I had the resource information in a Google Sheet and was building the parser based on CSV, but I realized JSON was a better way to go. The application supports both right now, but I may move it to entirely JSON.
| FTPClient ftpClient = new FTPClient(); | ||
|
|
||
| ftpClient.connect(getFtpServer()); | ||
| ftpClient.enterLocalPassiveMode(); |
There was a problem hiding this comment.
Might need an option to toggle this (passive/active mode) for specific resources. It's been a problem in the past, hasn't it?
There was a problem hiding this comment.
Yes - I'll add it in if any of the resources accessed by FTP won't work without active mode during testing.
src/main/java/org/reactome/release/resourcechecker/FileResourceChecker.java
Outdated
Show resolved
Hide resolved
src/main/java/org/reactome/release/resourcechecker/WebPageResourceChecker.java
Show resolved
Hide resolved
| <!-- | ||
| integration tests run by default when running the command "mvn verify"; can be overridden using the option | ||
| "-DskipITs=true" | ||
| --> | ||
| <skipITs>false</skipITs> |
There was a problem hiding this comment.
It looks like they can also just change this <skipITs> value as well
There was a problem hiding this comment.
Yes, that's true but changing the default value in the file changes the configuration in a more permanent way. I set it to skipping ITs as false, because I think we would like to run integration tests by default, but the command-line option gives a convenient way to override that value on a one-time basis without affecting the default behaviour of running all tests (integration and unit tests).
|
|
||
|
|
There was a problem hiding this comment.
Yeah :). I'll remove it. The Main class is under development and still messy :).
| return resourceInfoJson; | ||
| } | ||
|
|
||
| private void checkMandatoryAttributesExist(String originalRecord) throws IllegalArgumentException { |
There was a problem hiding this comment.
I know you said this is a WIP so will probably plan to do this anyways, but for this method I'm a little unsure how it accomplishes what it does. I get the gist of it from the name of the method, but putting all these values into the Map and such is something I don't quite understand -- could comments be added?
There was a problem hiding this comment.
It is the how or the why that's unclear?
There was a problem hiding this comment.
I would say both, but more of the 'how'.
There was a problem hiding this comment.
The map holds the attributes of the resource that should be included in the JSON that describes the resource. The key for the map is the name of the attribute that should be there and the value is the call to the appropriate "getter" method which should be able to provide that value since the record is now stored in the Resource object after being constructed.
If any of the values in the map give a null or empty value (checked for in the for-loop iterating over the entries in the map), it means something is wrong because a value that should have been available after the construction of the Resource object is not available. This triggers an IllegalStateException because the Resource object is not in the expected state of having values for fundamental information about the resource it represents. This is either because the JSON provided didn't specify that information or something went wrong in providing that information during the construction of the object. Either way, it's an illegal state and should cause an exception to check the problem with the resource info and/or code.
| // public String getHeaderAsTSVString() { | ||
| // return String.join("\t", getHeaderNameToValueMap().keySet()); | ||
| // } | ||
| // | ||
| // public String getValuesAsTSVString() { | ||
| // return String.join("\t", getHeaderNameToValueMap().values()); | ||
| // } |
There was a problem hiding this comment.
No, actually this was when the output was going to be in CSV format. Good catch - I'll remove it.
| */ | ||
| default boolean isFileSizeAcceptable(long previousFileSize, double acceptablePercentageDrop) { | ||
| long differenceInFileSize = getFileSize() - previousFileSize; | ||
| double percentChangeInFileSize = differenceInFileSize * 100.0d / previousFileSize; |
There was a problem hiding this comment.
Does it ever happen where a resource is suddenly much larger than it used to be? I think that is covered by this code but I'm not sure... Maybe change the name of acceptablePercentageDrop to acceptablePercentageChange?
There was a problem hiding this comment.
A file increase of any amount is currently acceptable, but you're right - I want to implement something to alert us to a suspicious jump in size. I'm just thinking of what the threshold would/should be.
| * @see #isFileSizeAcceptable(long, double) | ||
| */ | ||
| default boolean isFileSizeAcceptable(long previousFileSize) { | ||
| final double acceptableFileSizePercentageDrop = 5.0; |
There was a problem hiding this comment.
I know for older release files we had the heuristic of a 10% change in file sizes being deemed suspicious. Any reason for the 5% threshold now? Also, should this value be configurable?
There was a problem hiding this comment.
Yeah, I thought of 10% initially, but I wanted to err on the side of caution a bit more so I went (somewhat arbitrarily) with 5% with the caveat that we can change it as we get experience with the feedback we get with the automated checking.
For configuration, this method is overloaded allowing you to pass in a second parameter of the percentage drop threshold that will trigger the check to fail.
src/main/java/org/reactome/release/resourcechecker/ResourceCheckerFactory.java
Show resolved
Hide resolved
| // Path exampleFile = Paths.get(".", "example.html"); | ||
| // Files.deleteIfExists(exampleFile); | ||
| // Files.write(exampleFile, page.getBytes(), StandardOpenOption.CREATE); |
| while (fileSize > BYTE_UNIT_CONVERSION_FACTOR) { | ||
| magnitudeOfByteConversionFactor += 1; | ||
| fileSize = fileSize / BYTE_UNIT_CONVERSION_FACTOR; | ||
| } |
There was a problem hiding this comment.
Since the number of possible magnitudes is fixed (and relatively small), a loop is not really necessary here. A few if-else statements would work just as well.
There was a problem hiding this comment.
Okay - I'll take a look at reworking this.
|
Regarding the external resource json/csv files -- is there a plan for keeping everything in sync? Is the envisioned usage to be to use this new module to confirm everything will work for release, and for cases where it flags that it couldn't find the file then we would be notified and need to change it this modules files as well as the affected release step? |
The CSV file I think is going to be phased out and replaced with a JSON file. I'm still giving some thought about the best way to keep the reference to the release process up to date in a straightforward way. For the usage, yes, this project is designed to check things outside of Reactome needed/used by the release process and give us warning that something is missing or incorrect and we need to contact the maintainer of the resource and/or change something on our side. |
SolomonShorser-OICR
left a comment
There was a problem hiding this comment.
There are a few outstanding issues that are not yet resolved:
- CSV/JSON input file - please commit to a format.
- Files from resources being larger than expected.
- Filesize chages from resources not being captured in the input file that should track this.
If they are not resolved within this PR, that's OK but in that case I'd like to see separate issues opened to track them, before approving this PR, just so these issues don't get lost.
Fair enough. I don't want this PR to get too large, so I'll create issues for them and reference them from this PR before it's closed. |
Issues for these have been opened: #6: CSV/JSON input file - please commit to a format. |
The structure of the code is such that the
ResourceParserclass will take in a CSV or JSON file with records that describe properties for each resource. A list ofResourceobjects will be returned be returned after parsing. TheResourceobjects will be iterated over and the proper type ofResourceCheckerwill be created using theResourceCheckerFactory. TheResourceCheckerwill do its work ensuring the resource exists and is valid and a report will be produced.The code is still a work in progress right now (some aspects of Main and specific types of ResourceChecker classes are still being written), but it is far enough along for a review and merging the existing source.