adipi71/EOHsender

EOHsender

Helper tools for programmatic submission to the EFSA One Health WGS System

Sender

Prepares the JSON files required for CLI and API programmatic submission to the EFSA One Health WGS System.

Compatibility report

Computes a compatibility report between two allelic profile result files obtained from the same sample sequences: one (base) produced by the EFSA Analytical Pipeline and one (test) produced by the pipeline under test.

Definitions:

  • NumMissingLociTest: number of loci that are missing in test but detected in base. Loci missing in base are not counted.
  • AllelesDiff: number of loci with different called alleles, counted only among loci detected in both base and test.
  • MAX_DIFF, MIN_ALLELES_DIFF, MAX_ALLELES_DIFF: species-dependent thresholds.
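
The two counts defined above can be sketched as follows. This is only an illustration, not the tool's implementation. It assumes each profile is already loaded as a dictionary mapping locus name to call, that a locus is "detected" only when its call is a numeric allele identifier, and that an `INF-` prefix (as used by chewBBACA for inferred alleles) can be ignored when comparing. The function names are hypothetical.

```python
def allele_id(call: str) -> str:
    """Strip whitespace and an optional 'INF-' prefix (assumed to mark inferred alleles)."""
    value = call.strip()
    if value.upper().startswith("INF-"):
        value = value[4:]
    return value


def is_detected(call: str) -> bool:
    """Assumption: only purely numeric allele identifiers count as detected."""
    return allele_id(call).isdigit()


def compare_profiles(base: dict, test: dict) -> tuple:
    """Return (NumMissingLociTest, AllelesDiff) for two {locus: call} profiles."""
    num_missing_loci_test = 0
    alleles_diff = 0
    for locus, base_call in base.items():
        if not is_detected(base_call):
            continue  # loci missing in base are not counted
        test_call = test.get(locus, "")
        if not is_detected(test_call):
            num_missing_loci_test += 1  # detected in base, missing in test
        elif allele_id(test_call) != allele_id(base_call):
            alleles_diff += 1  # detected in both, different allele
    return num_missing_loci_test, alleles_diff


# Illustrative data only; LNF and NIPH stand in for non-numeric "missing" codes.
base_profile = {"locus1": "1", "locus2": "2", "locus3": "LNF", "locus4": "4"}
test_profile = {"locus1": "INF-1", "locus2": "3", "locus3": "5", "locus4": "NIPH"}
counts = compare_profiles(base_profile, test_profile)
print(counts)  # (1, 1): locus4 is missing in test, locus2 differs
```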

Rules:

| Result | Condition |
|---|---|
| Perfect | NumMissingLociTest + AllelesDiff == 0 |
| Acceptable | NumMissingLociTest + AllelesDiff <= MAX_DIFF |
| Warning | MIN_ALLELES_DIFF < AllelesDiff < MAX_ALLELES_DIFF |
| Fail | AllelesDiff >= MAX_ALLELES_DIFF |
| Fail | NumMissingLociTest + AllelesDiff > MAX_DIFF |

Thresholds:

| Species | MIN_ALLELES_DIFF | MAX_ALLELES_DIFF | MAX_DIFF |
|---|---|---|---|
| L. monocytogenes | 4 | 7 | 10 |
| Salmonella | 5 | 10 | 15 |
| STEC | 5 | 10 | 15 |
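
As a rough sketch, the rules and thresholds above could be combined as below. The rules overlap: a comparison can satisfy both Acceptable and Warning. The README does not state which result wins, so the precedence used here (Fail, then Perfect, then Warning, then Acceptable) is an assumption, as are the names `Thresholds`, `DEFAULT_THRESHOLDS` and `classify`. The first Fail rule is read as AllelesDiff >= MAX_ALLELES_DIFF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_alleles_diff: int
    max_alleles_diff: int
    max_diff: int


# Values copied from the thresholds table; the keys are illustrative labels.
DEFAULT_THRESHOLDS = {
    "L.monocytogenes": Thresholds(min_alleles_diff=4, max_alleles_diff=7, max_diff=10),
    "Salmonella": Thresholds(min_alleles_diff=5, max_alleles_diff=10, max_diff=15),
    "STEC": Thresholds(min_alleles_diff=5, max_alleles_diff=10, max_diff=15),
}


def classify(num_missing_loci_test: int, alleles_diff: int, t: Thresholds) -> str:
    """Apply the rules with an ASSUMED precedence: Fail > Perfect > Warning > Acceptable."""
    total = num_missing_loci_test + alleles_diff
    if alleles_diff >= t.max_alleles_diff or total > t.max_diff:
        return "Fail"
    if total == 0:
        return "Perfect"
    if t.min_alleles_diff < alleles_diff < t.max_alleles_diff:
        return "Warning"
    return "Acceptable"


lm = DEFAULT_THRESHOLDS["L.monocytogenes"]
print(classify(0, 0, lm))   # Perfect
print(classify(2, 1, lm))   # Acceptable
print(classify(0, 5, lm))   # Warning
print(classify(0, 7, lm))   # Fail
print(classify(11, 0, lm))  # Fail
```

A different precedence, for example reporting Acceptable before Warning, would change the result only for comparisons that satisfy both conditions.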

Usage example

This example has been tested on Linux with OpenJDK 11. It should work on any Java-enabled Linux machine.

  1. Download the latest release (https://github.com/adipi71/EOHsender/releases/latest).
  2. Extract EOHsender.tar.gz.
  3. Launch the tools on the test data:
# launch eoh-sender on the test data
sh eoh-sender.sh etc/EfsaOhWgs/exampleOfManualInput.tsv

# launch eoh-compatibility on the test data
sh eoh-sender.sh etc/EfsaOhWgs/comparealleles/exampleOfCompareInputEXTENDED.tsv

Input files format

eoh-sender: tsv input file specifications

MANUAL INPUT



| TYPE | FieldName | Mandatory | Description | Example |
|---|---|---|---|---|
| CODES | localRawReadId | Yes | See EFSA Guidance | rawr1234 |
| | sampleLocalId | Yes | | smpl1234 |
| | isolationLocalId | Yes | | isolate1234 |
| EPIDATA | isolateSpecieCode | Conditional | See EFSA Guidance. Mandatory if cfg:epidataSubmission='Y' | RF-00003072-PAR |
| | samplingCountryId | Conditional | | IT |
| | samplingYear | Conditional | | 2023 |
| | samplingMatrixCode | Conditional | | A02QE |
| | samplingMatrixFreeText | Conditional | | CHEESE |
| ANALYSIS | libraryLayoutCode | Yes | See EFSA Guidance | 2 |
| | AnalyticalPipelineInfoTag | Yes | | XXX pipeline |
| | MLSTSequenceType.Software | Yes | | mlst 2.23.0 |
| FILES | file:R1 | Yes | Illumina R1 or Ion Torrent FASTQ file. | rawr1234/fastq/rawr1234_R1_001.fastq.gz |
| | file:R2 | Conditional | R2 FASTQ file. Mandatory if libraryLayoutCode=2 | rawr1234/fastq/rawr1234_R2_001.fastq.gz |
| | file:fqc | Yes | FASTQ QC result file (FastQC) | rawr1234/fastqc/rawr1234_SRC_raw.csv |
| | file:fastp | Yes | | rawr1234/fastqc/rawr1234_SRC_raw.csv |
| | file:quast | Yes | Assembly QC result file (QUAST) | rawr1234/spades/rawr1234_quast.csv |
| | file:checkm | Yes | Assembly QC result file (CheckM) | rawr1234/spades/bin_stats.analyze.tsv |
| | file:bowtie | Yes | Assembly coverage, calculated by mapping against one of the best references | rawr1234/bowtie/rawr1234_bowtie_import_coverage_full.csv |
| | file:mlst | Yes | MLST result file (https://github.com/tseemann/mlst) | rawr1234/mlst/rawr1234_mlst.tsv |
| SALMONELLA FILES | file:sistr | Yes for Salmonella | Serotype. | rawr1234/sistr/sistr.tsv |
| | file:seqsero | Yes for Salmonella | | rawr1234/seqsero2/SeqSero_result.tsv |
| | file:seq_typing | Yes for Salmonella | Serotype. | rawr1234/seq_typing/seq_typing.report.txt |
| STEC FILES | file:stx_typing | Yes for STEC | Stx subtyping. | rawr1234/seq_typing/seq_typing.report.txt |
| | file:ecoh | Yes for STEC | | |
| | file:ectyper | Yes for STEC | | |
| | file:innuendo | Yes for STEC | | |
| | dir:patho_typing | Yes for STEC | Pathotype. Directory of patho_typing (https://github.com/B-UMMI/patho_typing/). Three files are expected in the directory: patho_typing.report.txt, patho_typing.extended_report.txt, rematchModule_report.txt | rawr1234/patho_typing/ |
| ALLELES FILE | file:alleles | Conditional | Allelic profile result file (chewBBACA version greater than 2.8.5). Mandatory if cfg:crc32transform='Y' | rawr1234/chewbbaca/rawr1234_chewbbaca_results_alleles.tsv |
| CONFIG/SETTING | cfg:cgmlstschema | Conditional | Path of the chewie-ns schema. Mandatory if cfg:crc32transform='Y' | /path/to/chewie-ns/schema/ |
| | cfg:crc32transform | No | Set to Y if the allelic profiles must be transformed into CRC32 format | Y |
| | cfg:epidataSubmission | No | Set to Y if you intend to submit the epidemiological data | Y |
| | cfg:outputDir | No | Results are written here | /dir/output |
| | cfg:inputDir | No | Base directory for the input files | /dir/input |
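
The conditional rules in this specification can be checked before launching the tool. The sketch below is only an illustration and covers just the conditional fields, not the unconditionally mandatory ones. It assumes each input row is available as a dictionary keyed by field name, and that all EPIDATA fields marked Conditional follow the same cfg:epidataSubmission condition that the table states explicitly only for isolateSpecieCode. The function name is hypothetical.

```python
EPIDATA_FIELDS = [
    "isolateSpecieCode",
    "samplingCountryId",
    "samplingYear",
    "samplingMatrixCode",
    "samplingMatrixFreeText",
]


def missing_conditional_fields(row: dict) -> list:
    """Return the conditional fields that are required by this row but left blank."""

    def value(name: str) -> str:
        return (row.get(name) or "").strip()

    required = []
    if value("libraryLayoutCode") == "2":
        required.append("file:R2")  # stated in the table: mandatory if libraryLayoutCode=2
    if value("cfg:epidataSubmission").upper() == "Y":
        required.extend(EPIDATA_FIELDS)  # assumption: applies to every EPIDATA field
    if value("cfg:crc32transform").upper() == "Y":
        required.extend(["file:alleles", "cfg:cgmlstschema"])
    return [name for name in required if not value(name)]


# Illustrative row: paired-end layout without R2, CRC32 transform without a schema.
example_row = {
    "localRawReadId": "rawr1234",
    "libraryLayoutCode": "2",
    "file:R1": "rawr1234/fastq/rawr1234_R1_001.fastq.gz",
    "file:alleles": "rawr1234/chewbbaca/rawr1234_chewbbaca_results_alleles.tsv",
    "cfg:crc32transform": "Y",
}
problems = missing_conditional_fields(example_row)
print(problems)  # ['file:R2', 'cfg:cgmlstschema']
```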


eoh-compatibility: tsv input file specifications

MANUAL INPUT



| TYPE | FieldName | Mandatory | Description | Example |
|---|---|---|---|---|
| FILES | species | Yes | L.monocytogenes, Salmonella, STEC | L.monocytogenes |
| | baseAllelicProfileFile | Yes | Base file | smpl1234.tsv |
| | testAllelicProfileFile | Yes | Test file | smpl1234-test.tsv |
| THRESHOLDS | MAX_DIFF | No | Species-dependent thresholds (see Rules) | 10 |
| | MAX_ALLELES_DIFF | No | | 7 |
| | MIN_ALLELES_DIFF | No | | 4 |
| CONFIG/SETTING | cfg:outputDir | No | Results are written here | /dir/output |






eoh-sender: input example

| localRawReadId | sampleLocalId | isolationLocalId | isolateSpecieCode | samplingCountryId | samplingYear | samplingMatrixCode | samplingMatrixFreeText | libraryLayoutCode | AnalyticalPipelineInfoTag | MLSTSequenceType.Software | file:R1 | file:R2 | file:fqc | file:quast | file:bowtie | file:mlst | file:alleles | cfg:cgmlstschema | cfg:crc32transform | cfg:epidataSubmission | cfg:outputDir | cfg:inputDir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rawr1234 | smpl1234 | isolate1234 | RF-00000251-MCG | IT | 2019 | A02QE | FORMAGGIO | 2 | NGSManager | mlst 2.23.0 | rawr1234/fastq/rawr1234_R1_001.fastq.gz | rawr1234/fastq/rawr1234_R2_001.fastq.gz | rawr1234/fastq/rawr1234_SRC_raw.csv | rawr1234/spades/rawr1234_quast.csv | rawr1234/bowtie/rawr1234_bowtie_import_coverage_full.csv | rawr1234/mlst/rawr1234_mlst.tsv | rawr1234/chewbbaca/rawr1234_chewbbaca_results_alleles.tsv | chewie_lm | Y | Y | /base/dir/output | /base/dir/input |
| rawr5678 | smpl5678 | isolate5678 | RF-00003072-PAR | IT | 2019 | A02QE | FORMAGGIO | 2 | XXX | mlst 2.23.0 | rawr5678/fastq/rawr5678_R1_001.fastq.gz | rawr5678/fastq/rawr5678_R2_001.fastq.gz | rawr5678/fastq/rawr5678_SRC_raw.csv | rawr5678/spades/rawr5678_quast.csv | rawr5678/bowtie/rawr5678_bowtie_import_coverage_full.csv | rawr5678/mlst/rawr5678_mlst.tsv | rawr5678/chewbbaca/rawr5678_chewbbaca_results_alleles.tsv | chewie_lm | Y | | /base/dir/output | /base/dir/input |
| rawrXYZ | smplXYZ | isolateXYZ | RF-00003072-PAR | IT | 2019 | A02QE | FORMAGGIO | 2 | YYYY | mlst 2.23.0 | rawrXYZ/fastq/rawrXYZ_R1_001.fastq.gz | rawrXYZ/fastq/rawrXYZ_R2_001.fastq.gz | rawrXYZ/fastq/rawrXYZ_SRC_raw.csv | rawrXYZ/spades/rawrXYZ_quast.csv | rawrXYZ/bowtie/rawrXYZ_bowtie_import_coverage_full.csv | rawrXYZ/mlst/rawrXYZ_mlst.tsv | rawrXYZ/chewbbaca/rawrXYZ_chewbbaca_results_alleles.tsv | chewie_lm | Y | Y | /base/dir/output | /base/dir/input |
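
An input file with this layout can be generated rather than typed by hand. The sketch below writes a tab-separated file with Python's standard `csv` module, using the column order of the example above. The helper names (`file_columns`, `write_sender_input`) are hypothetical, and the relative paths simply mirror the directory layout of the example rows, which may differ from your own pipeline output.

```python
import csv
import io

# Column order copied from the header of the example above.
COLUMNS = [
    "localRawReadId", "sampleLocalId", "isolationLocalId",
    "isolateSpecieCode", "samplingCountryId", "samplingYear",
    "samplingMatrixCode", "samplingMatrixFreeText",
    "libraryLayoutCode", "AnalyticalPipelineInfoTag", "MLSTSequenceType.Software",
    "file:R1", "file:R2", "file:fqc", "file:quast", "file:bowtie",
    "file:mlst", "file:alleles",
    "cfg:cgmlstschema", "cfg:crc32transform", "cfg:epidataSubmission",
    "cfg:outputDir", "cfg:inputDir",
]


def file_columns(rid: str) -> dict:
    """Relative result paths, following the layout of the example rows (an assumption)."""
    return {
        "file:R1": f"{rid}/fastq/{rid}_R1_001.fastq.gz",
        "file:R2": f"{rid}/fastq/{rid}_R2_001.fastq.gz",
        "file:fqc": f"{rid}/fastq/{rid}_SRC_raw.csv",
        "file:quast": f"{rid}/spades/{rid}_quast.csv",
        "file:bowtie": f"{rid}/bowtie/{rid}_bowtie_import_coverage_full.csv",
        "file:mlst": f"{rid}/mlst/{rid}_mlst.tsv",
        "file:alleles": f"{rid}/chewbbaca/{rid}_chewbbaca_results_alleles.tsv",
    }


def write_sender_input(rows, handle) -> None:
    """Write rows as TSV; columns absent from a row are left empty."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


sample = {
    "localRawReadId": "rawr1234",
    "sampleLocalId": "smpl1234",
    "isolationLocalId": "isolate1234",
    "libraryLayoutCode": "2",
    "AnalyticalPipelineInfoTag": "XXX",
    "MLSTSequenceType.Software": "mlst 2.23.0",
    **file_columns("rawr1234"),
}
buffer = io.StringIO()  # replace with open("input.tsv", "w", newline="") to write a file
write_sender_input([sample], buffer)
tsv_text = buffer.getvalue()
```

Because the delimiter is a tab, values that contain spaces, such as `mlst 2.23.0`, stay in a single column.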

eoh-compatibility: input example

| species | baseAllelicProfileFile | testAllelicProfileFile | cfg:outputDir | MAX_DIFF | MAX_ALLELES_DIFF | MIN_ALLELES_DIFF |
|---|---|---|---|---|---|---|
| STEC | /stec/base.tsv | /stec/test.tsv | /tmp/ | 15 | 10 | 5 |
| L.monocytogenes | base.tsv | test.tsv | /tmp/ | | | |
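
The second row leaves the three threshold columns empty. A plausible reading, assumed here and not confirmed by the README, is that empty thresholds fall back to the species-dependent values from the Thresholds table. The sketch below parses the example as TSV and resolves the thresholds under that assumption; `DEFAULT_THRESHOLDS` and `resolve_thresholds` are hypothetical names.

```python
import csv
import io

# Species-dependent values copied from the Thresholds table.
DEFAULT_THRESHOLDS = {
    "L.monocytogenes": {"MIN_ALLELES_DIFF": 4, "MAX_ALLELES_DIFF": 7, "MAX_DIFF": 10},
    "Salmonella": {"MIN_ALLELES_DIFF": 5, "MAX_ALLELES_DIFF": 10, "MAX_DIFF": 15},
    "STEC": {"MIN_ALLELES_DIFF": 5, "MAX_ALLELES_DIFF": 10, "MAX_DIFF": 15},
}


def resolve_thresholds(row: dict) -> dict:
    """Use the value given in the row, else the species default (ASSUMED behaviour)."""
    defaults = DEFAULT_THRESHOLDS[row["species"]]
    resolved = {}
    for name, default in defaults.items():
        raw = (row.get(name) or "").strip()  # DictReader yields None for missing trailing cells
        resolved[name] = int(raw) if raw else default
    return resolved


EXAMPLE_TSV = (
    "species\tbaseAllelicProfileFile\ttestAllelicProfileFile\tcfg:outputDir"
    "\tMAX_DIFF\tMAX_ALLELES_DIFF\tMIN_ALLELES_DIFF\n"
    "STEC\t/stec/base.tsv\t/stec/test.tsv\t/tmp/\t15\t10\t5\n"
    "L.monocytogenes\tbase.tsv\ttest.tsv\t/tmp/\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
resolved = [resolve_thresholds(row) for row in rows]
for row, thresholds in zip(rows, resolved):
    print(row["species"], thresholds)
```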

