EFSA One Health WGS System helper tools for Programmatic Submission
Prepares CLI and API Programmatic Submission JSON files for the EFSA One Health WGS System
Computes a compatibility report between two allelic profile result files obtained from the same sample sequences: one (base) from the EFSA Analytical Pipeline and one (test) from the pipeline under test.
Definitions:
- NumMissingLociTest: number of loci missing in test that were detected in base. Loci missing in base are not counted.
- AllelesDiff: number of loci with different called alleles, counting only loci detected in both base and test.
- MAX_DIFF, MIN_ALLELES_DIFF, MAX_ALLELES_DIFF: species-dependent thresholds.
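As a rough illustration of these definitions, the sketch below counts both quantities from two profiles held as plain dictionaries (locus name to allele call). The function names and the set of missing-locus labels in `MISSING_CALLS` are assumptions made for this example (they follow the usual chewBBACA codes); this is not the tool's actual implementation.

```python
# Hypothetical sketch: count NumMissingLociTest and AllelesDiff for one sample.
# MISSING_CALLS is an assumed set of "locus not detected" labels.
MISSING_CALLS = {"", "LNF", "ASM", "ALM", "NIPH", "NIPHEM", "PLOT3", "PLOT5", "LOTSC", "PAMA"}


def is_detected(call):
    """Return True when the call names an allele rather than a missing locus."""
    return call is not None and call.strip() not in MISSING_CALLS


def compare_profiles(base, test):
    """Return (num_missing_loci_test, alleles_diff) for two locus -> call dicts."""
    num_missing_loci_test = 0
    alleles_diff = 0
    for locus, base_call in base.items():
        if not is_detected(base_call):
            # Loci missing in base are not counted at all.
            continue
        test_call = test.get(locus)
        if not is_detected(test_call):
            num_missing_loci_test += 1
        elif test_call.strip() != base_call.strip():
            alleles_diff += 1
    return num_missing_loci_test, alleles_diff


base = {"locus1": "1", "locus2": "7", "locus3": "LNF", "locus4": "12"}
test = {"locus1": "1", "locus2": "9", "locus3": "4", "locus4": "LNF"}
print(compare_profiles(base, test))  # prints (1, 1)
```

Here `locus2` contributes to AllelesDiff, `locus4` to NumMissingLociTest, and `locus3` is ignored because it is missing in base.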
Rules:
| Outcome | Condition |
|---|---|
| Perfect | NumMissingLociTest + AllelesDiff == 0 |
| Acceptable | NumMissingLociTest + AllelesDiff <= MAX_DIFF |
| Warning | MIN_ALLELES_DIFF < AllelesDiff < MAX_ALLELES_DIFF |
| Fail | AllelesDiff >= MAX_ALLELES_DIFF |
| Fail | NumMissingLociTest + AllelesDiff > MAX_DIFF |
Thresholds:
| Species | MIN_ALLELES_DIFF | MAX_ALLELES_DIFF | MAX_DIFF |
|---|---|---|---|
| L. monocytogenes | 4 | 7 | 10 |
| Salmonella | 5 | 10 | 15 |
| STEC | 5 | 10 | 15 |
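
Read together, the rules and thresholds can be sketched as a small classifier. The source does not state which rule wins when several match, so the order used here (Fail, then Perfect, then Warning, then Acceptable) is an assumption, as are the names `THRESHOLDS` and `classify`.

```python
# Thresholds copied from the table above: (MIN_ALLELES_DIFF, MAX_ALLELES_DIFF, MAX_DIFF).
THRESHOLDS = {
    "L.monocytogenes": (4, 7, 10),
    "Salmonella": (5, 10, 15),
    "STEC": (5, 10, 15),
}


def classify(species, num_missing_loci_test, alleles_diff):
    """Return Perfect, Acceptable, Warning or Fail (rule precedence is assumed)."""
    min_alleles_diff, max_alleles_diff, max_diff = THRESHOLDS[species]
    total = num_missing_loci_test + alleles_diff
    if alleles_diff >= max_alleles_diff or total > max_diff:
        return "Fail"
    if total == 0:
        return "Perfect"
    if min_alleles_diff < alleles_diff < max_alleles_diff:
        return "Warning"
    return "Acceptable"  # at this point total <= MAX_DIFF always holds


print(classify("L.monocytogenes", 0, 0))  # Perfect
print(classify("L.monocytogenes", 3, 2))  # Acceptable
print(classify("L.monocytogenes", 1, 5))  # Warning
print(classify("STEC", 12, 4))            # Fail
```
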
This example has been tested on Linux with OpenJDK 11. It should work on any Java-enabled Linux machine.
- Download the latest release (https://github.com/adipi71/EOHsender/releases/latest)
- Extract EOHsender.tar.gz
- Launch on the test data

      # launch eoh-sender on the test data
      sh eoh-sender.sh etc/EfsaOhWgs/comparealleles/exampleOfCompareInputEXTENDED.tsv
      # launch eoh-compatibility on the test data
      sh eoh-sender.sh etc/EfsaOhWgs/exampleOfManualInput.tsv

MANUAL INPUT:

| TYPE | FieldName | Mandatory | Description | Example |
|---|---|---|---|---|
| CODES | localRawReadId | Yes | See EFSA Guidance | rawr1234 |
| | sampleLocalId | Yes | " | smpl1234 |
| | isolationLocalId | Yes | " | isolate1234 |
| EPIDATA | isolateSpecieCode | Conditional | See EFSA Guidance. Mandatory if cfg:epidataSubmission='Y' | RF-00003072-PAR |
| | samplingCountryId | Conditional | " | IT |
| | samplingYear | Conditional | " | 2023 |
| | samplingMatrixCode | Conditional | " | A02QE |
| | samplingMatrixFreeText | Conditional | " | CHEESE |
| ANALYSIS | libraryLayoutCode | Yes | See EFSA Guidance | 2 |
| | AnalyticalPipelineInfoTag | Yes | " | XXX pipeline |
| | MLSTSequenceType.Software | Yes | " | mlst 2.23.0 |
| FILES | file:R1 | Yes | Illumina R1 or Iontorrent fastq file. | rawr1234/fastq/rawr1234_R1_001.fastq.gz |
| | file:R2 | Conditional | R2 fastq file. Mandatory if libraryLayoutCode=2 | rawr1234/fastq/rawr1234_R2_001.fastq.gz |
| | file:fqc | Yes | Fastq QC result file (FastQC) | rawr1234/fastqc/rawr1234_SRC_raw.csv |
| | file:fastp | Yes | | rawr1234/fastqc/rawr1234_SRC_raw.csv |
| | file:quast | Yes | Assembly QC result file (Quast) | rawr1234/spades/rawr1234_quast.csv |
| | file:checkm | Yes | Assembly QC result file (Checkm) | rawr1234/spades/bin_stats.analyze.tsv |
| | file:bowtie | Yes | Assembly Coverage calculated through mapping with one of the best reference | rawr1234/bowtie/rawr1234_bowtie_import_coverage_full.csv |
| | file:mlst | Yes | MLST result file (https://github.com/tseemann/mlst) | rawr1234/mlst/rawr1234_mlst.tsv |
| SALMONELLA FILES | file:sistr | Yes for Salmonella | Serotype. | rawr1234/sistr/sistr.tsv |
| | file:seqsero | Yes for Salmonella | " | rawr1234/seqsero2/SeqSero_result.tsv |
| | file:seq_typing | Yes for Salmonella | Serotype. | rawr1234/seq_typing/seq_typing.report.txt |
| STEC FILES | file:stx_typing | Yes for STEC | Stx subtyping. | rawr1234/seq_typing/seq_typing.report.txt |
| | file:ecoh | Yes for STEC | | |
| | file:ectyper | Yes for STEC | " | |
| | file:innuendo | Yes for STEC | " | |
| | dir:patho_typing | Yes for STEC | Pathotype directory of patho_typing (https://github.com/B-UMMI/patho_typing/) It is expected to find 3 files in the directory: patho_typing.report.txt, patho_typing.extended_report.txt, rematchModule_report.txt | rawr1234/patho_typing/ |
| ALLELES FILE | file:alleles | Conditional | Allelic profile result file (chewBBACA gt 2.8.5) Mandatory if cfg:crc32transform='Y' | rawr1234/chewbbaca/rawr1234_chewbbaca_results_alleles.tsv |
| CONFIG/SETTING | cfg:cgmlstschema | Conditional | Path of chewie-ns schema. Mandatory if cfg:crc32transform='Y' | /path/to/chewie-ns/schema/ |
| | cfg:crc32transform | No | Put Y in case you need to transform allelic profiles in crc32 format | Y |
| | cfg:epidataSubmission | No | Put Y in case you intend to submit the epidemiological data | Y |
| | cfg:outputDir | No | results will be put here | /dir/output |
| | cfg:inputDir | No | base dir for the files | /dir/input |
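
The table does not describe the crc32 transformation itself. One plausible reading, shown here only as an assumption, is that each allele is identified by the CRC32 checksum of its DNA sequence rather than by a schema-specific allele number. The helper `allele_crc32` is hypothetical and is not taken from the tool; `zlib.crc32` is the Python standard library function.

```python
import zlib


def allele_crc32(sequence):
    """Return the unsigned CRC32 checksum of an allele DNA sequence (assumed scheme)."""
    # Upper-casing is an assumption that makes the hash case-insensitive.
    return zlib.crc32(sequence.strip().upper().encode("ascii"))


print(allele_crc32("ATGACCGAATAA"))
print(allele_crc32("atgaccgaataa") == allele_crc32("ATGACCGAATAA"))  # True
```
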

MANUAL INPUT:

| TYPE | FieldName | Mandatory | Description | Example |
|---|---|---|---|---|
| FILES | species | Yes | One of L.monocytogenes, Salmonella, STEC | L.monocytogenes |
| | baseAllelicProfileFile | Yes | Base file | smpl1234.tsv |
| | testAllelicProfileFile | Yes | Test file | smpl1234-test.tsv |
| THRESHOLDS | MAX_DIFF | No | Species-dependent thresholds (see Rules) | 10 |
| | MAX_ALLELES_DIFF | No | " | 7 |
| | MIN_ALLELES_DIFF | No | " | 4 |
| CONFIG/SETTING | cfg:outputDir | No | Results will be written here | /dir/output |

Example submission input:

| localRawReadId | sampleLocalId | isolationLocalId | isolateSpecieCode | samplingCountryId | samplingYear | samplingMatrixCode | samplingMatrixFreeText | libraryLayoutCode | AnalyticalPipelineInfoTag | MLSTSequenceType.Software | file:R1 | file:R2 | file:fqc | file:quast | file:bowtie | file:mlst | file:alleles | cfg:cgmlstschema | cfg:crc32transform | cfg:epidataSubmission | cfg:outputDir | cfg:inputDir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rawr1234 | smpl1234 | isolate1234 | RF-00000251-MCG | IT | 2019 | A02QE | FORMAGGIO | 2 | NGSManager | mlst 2.23.0 | rawr1234/fastq/rawr1234_R1_001.fastq.gz | rawr1234/fastq/rawr1234_R2_001.fastq.gz | rawr1234/fastq/rawr1234_SRC_raw.csv | rawr1234/spades/rawr1234_quast.csv | rawr1234/bowtie/rawr1234_bowtie_import_coverage_full.csv | rawr1234/mlst/rawr1234_mlst.tsv | rawr1234/chewbbaca/rawr1234_chewbbaca_results_alleles.tsv | chewie_lm | Y | Y | /base/dir/output | /base/dir/input |
| rawr5678 | smpl5678 | isolate5678 | RF-00003072-PAR | IT | 2019 | A02QE | FORMAGGIO | 2 | XXX | mlst 2.23.0 | rawr5678/fastq/rawr5678_R1_001.fastq.gz | rawr5678/fastq/rawr5678_R2_001.fastq.gz | rawr5678/fastq/rawr5678_SRC_raw.csv | rawr5678/spades/rawr5678_quast.csv | rawr5678/bowtie/rawr5678_bowtie_import_coverage_full.csv | rawr5678/mlst/rawr5678_mlst.tsv | rawr5678/chewbbaca/rawr5678_chewbbaca_results_alleles.tsv | chewie_lm | Y | /base/dir/output | /base/dir/input | |
| rawrXYZ | smplXYZ | isolateXYZ | RF-00003072-PAR | IT | 2019 | A02QE | FORMAGGIO | 2 | YYYY | mlst 2.23.0 | rawrXYZ/fastq/rawrXYZ_R1_001.fastq.gz | rawrXYZ/fastq/rawrXYZ_R2_001.fastq.gz | rawrXYZ/fastq/rawrXYZ_SRC_raw.csv | rawrXYZ/spades/rawrXYZ_quast.csv | rawrXYZ/bowtie/rawrXYZ_bowtie_import_coverage_full.csv | rawrXYZ/mlst/rawrXYZ_mlst.tsv | rawrXYZ/chewbbaca/rawrXYZ_chewbbaca_results_alleles.tsv | chewie_lm | Y | Y | /base/dir/output | /base/dir/input |
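
The conditional rules from the MANUAL INPUT table can be checked before launching the tool. The sketch below is a hypothetical pre-flight check, not part of EOHsender: it assumes the input file is tab-separated with one header row, it covers only a subset of the always-mandatory fields, and the name `missing_fields` is invented for illustration.

```python
import csv
import io

# Subset of fields marked "Yes" in the table; the file:* QC fields are left out here.
ALWAYS = ["localRawReadId", "sampleLocalId", "isolationLocalId", "libraryLayoutCode",
          "AnalyticalPipelineInfoTag", "MLSTSequenceType.Software", "file:R1"]
EPIDATA = ["isolateSpecieCode", "samplingCountryId", "samplingYear",
           "samplingMatrixCode", "samplingMatrixFreeText"]
CRC32 = ["file:alleles", "cfg:cgmlstschema"]


def missing_fields(row):
    """Return the mandatory field names that are empty or absent in one input row."""
    required = list(ALWAYS)
    if row.get("libraryLayoutCode") == "2":
        required.append("file:R2")
    if row.get("cfg:epidataSubmission") == "Y":
        required += EPIDATA
    if row.get("cfg:crc32transform") == "Y":
        required += CRC32
    return [name for name in required if not (row.get(name) or "").strip()]


# Tiny in-memory example; a real file would be opened with open(path, newline="").
tsv = (
    "localRawReadId\tsampleLocalId\tisolationLocalId\tlibraryLayoutCode\t"
    "AnalyticalPipelineInfoTag\tMLSTSequenceType.Software\tfile:R1\tfile:R2\t"
    "cfg:epidataSubmission\n"
    "rawr1234\tsmpl1234\tisolate1234\t2\tXXX pipeline\tmlst 2.23.0\t"
    "rawr1234/fastq/rawr1234_R1_001.fastq.gz\t\tY\n"
)
for row in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    print(row["localRawReadId"], missing_fields(row))
```

In this example the row is paired-end without an R2 file and requests epidata submission without the EPIDATA columns, so those fields are reported.
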

Example compatibility input:

| species | baseAllelicProfileFile | testAllelicProfileFile | cfg:outputDir | MAX_DIFF | MAX_ALLELES_DIFF | MIN_ALLELES_DIFF |
|---|---|---|---|---|---|---|
| STEC | /stec/base.tsv | /stec/test.tsv | /tmp/ | 15 | 10 | 5 |
| L.monocytogenes | base.tsv | test.tsv | /tmp/ | | | |
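
An input file like the one above can be produced with any tool that writes tab-separated text. The sketch below uses Python's `csv` module and assumes, based on the `.tsv` extension of the bundled examples, that the file is tab-separated with a header row; the output name `compare_input.tsv` is only a placeholder.

```python
import csv

COLUMNS = ["species", "baseAllelicProfileFile", "testAllelicProfileFile", "cfg:outputDir",
           "MAX_DIFF", "MAX_ALLELES_DIFF", "MIN_ALLELES_DIFF"]

rows = [
    {"species": "STEC", "baseAllelicProfileFile": "/stec/base.tsv",
     "testAllelicProfileFile": "/stec/test.tsv", "cfg:outputDir": "/tmp/",
     "MAX_DIFF": 15, "MAX_ALLELES_DIFF": 10, "MIN_ALLELES_DIFF": 5},
    # Thresholds are optional, so they are left empty for this row.
    {"species": "L.monocytogenes", "baseAllelicProfileFile": "base.tsv",
     "testAllelicProfileFile": "test.tsv", "cfg:outputDir": "/tmp/"},
]

with open("compare_input.tsv", "w", newline="") as handle:
    # restval="" fills the columns that a row does not define.
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```
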