mSCAPE Changelog¶
All notable changes to CLIMB-TRE mSCAPE APIs, data or interchange formats that impact users or other pipelines should be documented in this file. The changes described here may be only a subset of all changes to a project, as this log covers only changes that affect how data is provided to or consumed by users or other pipelines.
The following DIPI projects are routinely using this changelog:
- Scylla -- ingest analysis pipeline
- Roz -- ingest management
- Onyx -- metadata database
- Onyx-client -- API for interacting with metadata database
The format is based on Keep a Changelog.
Issues can be reported to the mSCAPE DIPI group.
2025-10-17¶
Onyx¶
Added¶
Alignment Results¶
- Added `alignment_results` table.
- Added `alignment_results.taxon_id` field.
- Added `alignment_results.human_readable` field.
- Added `alignment_results.unique_accession` field.
- Added `alignment_results.accession_description` field.
- Added `alignment_results.sequence_length` field.
- Added `alignment_results.evenness_value` field.
- Added `alignment_results.mean_depth` field.
- Added `alignment_results.coverage_1x` field.
- Added `alignment_results.coverage10x` field.
- Added `alignment_results.mapped_reads` field.
- Added `alignment_results.uniquely_mapped_reads` field.
- Added `alignment_results.mapped_bases` field.
- Added `alignment_results.mean_read_identity` field.
- Added `alignment_results.read_duplication_rate` field.
- Added `alignment_results.forward_proportion` field.
- Added `alignment_results.mean_alignment_length` field.
Sylph Results¶
- Added `sylph_results` table.
- Added `sylph_results.taxon_id` field.
- Added `sylph_results.human_readable` field.
- Added `sylph_results.gtdb_taxon_string` field.
- Added `sylph_results.gtdb_assembly_id` field.
- Added `sylph_results.gtdb_contig_header` field.
- Added `sylph_results.taxonomic_abundance` field.
- Added `sylph_results.sequence_abundance` field.
- Added `sylph_results.adjusted_ani` field.
- Added `sylph_results.ani_confidence_interval` field.
- Added `sylph_results.effective_coverage` field.
- Added `sylph_results.effective_coverage_confidence_interval` field.
- Added `sylph_results.median_kmer_cov` field.
- Added `sylph_results.mean_kmer_cov` field.
- Added `sylph_results.containment_index` field.
- Added `sylph_results.naive_ani` field.
- Added `sylph_results.kmers_reassigned` field.
2025-09-15¶
Onyx¶
Added¶
- Added `ucl` (University College London) `site` option.
- Added `ukhsamanc` (UKHSA Manchester Lab) `site` option.
- Added `ukhsabris` (UKHSA Bristol Lab) `site` option.
2025-08-13¶
Onyx¶
Added¶
- Added `control_type_details` choice `bacillus_ms2phage`, constrained by an `input_type` of `positive_control`.
- Added optional choice field `protocol_arm`, with choices `bacterial` and `viral`.
Scylla¶
Release 2.1.0
Changed¶
- Large speedup of all read extract scripts
- Per-read quality scores are now based on the mean rather than the median
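To illustrate how the switch from median to mean can change a read's score, here is a minimal sketch. It assumes Phred+33 encoded quality strings and a plain arithmetic mean of the per-base scores; the function names are hypothetical, and Scylla's actual calculation may differ.

```python
from statistics import mean, median


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def read_quality(quality: str, method: str = "mean") -> float:
    """Summarise a read's quality by the mean (default) or the median of its scores."""
    scores = phred_scores(quality)
    if method == "mean":
        return float(mean(scores))
    return float(median(scores))


# Seven high-quality bases (Q40) followed by three low-quality bases (Q2).
example = "IIIIIII###"
mean_quality = read_quality(example, "mean")
median_quality = read_quality(example, "median")
```

In this example the median is unaffected by the low-quality tail, while the mean is pulled down by it, so reads near a quality cut-off may be treated differently after this release.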
2025-08-05¶
Scylla¶
Release 2.0.3
Changed¶
- Resolve missing `total_length.json` when no taxa files are output
2025-08-05¶
Onyx¶
Added¶
- Added `spike_in` option `bacillus_ms2phage`.
Scylla¶
Release 2.0.2
Added¶
- Added `spike_in` option `bacillus_ms2phage`.
Changed¶
- Changed references to the `--local` flag in the README and tests to `-profile local` for running locally (can be combined with docker using `-profile local,docker`)
2025-07-02¶
Onyx¶
Added¶
- Added `total_bases` field, for recording the number of bases in the input FASTQ file(s), before any filtering.
- Added `taxa_files.total_bases` field, for recording the number of bases extracted for a taxon (assignable for each taxon within the `taxa_files` of a record).
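As a rough illustration of what a `total_bases` value represents, the sketch below sums sequence lengths across FASTQ records. It assumes simple four-line FASTQ records; the function names are hypothetical, and this is not the code Scylla or Onyx uses to populate the field.

```python
import gzip
from typing import Iterable


def count_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths, assuming four-line FASTQ records."""
    total = 0
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # the second line of each record is the sequence
            total += len(line.strip())
    return total


def total_bases(fastq_paths: list[str]) -> int:
    """Total bases across one or more FASTQ files (plain text or gzipped)."""
    total = 0
    for path in fastq_paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            total += count_bases(handle)
    return total
```

For paired-end input, passing both files to `total_bases` gives the combined count before any filtering.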
Scylla¶
Release 2.0.1
Changed¶
- Change the exit code of the script that checks paired FASTQ files so that the pipeline doesn't fail loudly on mismatched headers
2025-05-08¶
Scylla¶
Released version 2.0.0. Given the number of changes, they are grouped by category rather than Added/Changed etc.
HCID changes¶
- Add `min_coverage` parameter to HCID JSON
- Update references in HCID JSON and reference file
- Update thresholds for HCID detection
- Drop requirement for classified reads at taxon/parent level for HCID to be detected (mapping sufficient)
- Output reads corresponding to HCIDs which have flagged a warning (NEW OUTPUT in `qc/<taxid>.reads.fq`)
- Output read stats for HCID reads to the warning JSON (`mapped_mean_quality` and `mapped_mean_length`)
- Add coverage information for HCID found showing how many bases have coverage at each level - in HCID JSON
Extract taxa changes¶
- Reworked the code that interacts with kraken reports and assignment files during extract steps. Found a bug where some of the counts in the summary had previously been double counted (where both an S and an S1 or S2 level taxon were extracted)
- Extract reads at different levels for different domains as specified by config (`F` for Viruses, `G` for everything else)
- Only extract reads at the specific level, not sublevels (e.g. S not S1 or S2)
- Add `total_len` calculated both for input and extracted output files in the summary JSON (NEW OUTPUT `qc/total_length.json`)
- Make extraction percentages domain-specific (e.g. 1% of bacterial reads rather than 1% of classified reads) to fix zepto example
- To be extracted, a taxon now needs to pass the count threshold OR the percentage threshold (previously both). The count threshold for bacteria has been increased to 500
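The new extraction rule can be sketched as follows. This is an illustration only: the `Thresholds` class, the `should_extract` function and the inclusive comparisons are assumptions, the 500-read bacterial count threshold comes from the entry above, and the 1% value is taken from its example rather than from Scylla's configuration.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    min_count: int      # minimum number of reads assigned to the taxon
    min_percent: float  # minimum percentage of reads in the taxon's domain


def should_extract(taxon_reads: int, domain_reads: int, thresholds: Thresholds) -> bool:
    """Extract if the count threshold OR the domain-specific percentage is met."""
    percent = 100.0 * taxon_reads / domain_reads if domain_reads else 0.0
    return taxon_reads >= thresholds.min_count or percent >= thresholds.min_percent


# Illustrative values: 500 reads (from the changelog) and 1% (from its example).
bacteria = Thresholds(min_count=500, min_percent=1.0)

abundant = should_extract(600, 1_000_000, bacteria)  # passes on count alone
small_run = should_extract(50, 2_000, bacteria)      # passes on percentage alone
rare = should_extract(50, 1_000_000, bacteria)       # passes neither
```

Because the percentage is taken over the taxon's own domain, a taxon in a sparsely populated domain can qualify even when it is a tiny fraction of all classified reads.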
Workflow changes¶
- Add workflow to reclassify the viral+unclassified fraction with a second database
- In the process, the parameters associated with kraken databases have been restructured. Replace `--k2_host` with `--kraken_database.default.host`, `--k2_port` with `--kraken_database.default.port`, `--database` with `--kraken_database.default.path` and `database_set` is now `kraken_database.default.name`. This allows a second dictionary of kraken parameters for `kraken_database.virus` to be defined if/when necessary.
- Add code to merge kraken assignment files, giving preference to second assignment file
- Add code to update kraken report, giving list of changes made to assignments
- Add a QC script to check the input file where a single fastq file is provided, so that it can warn if there are duplicate headers. This was seen in some example data and would cause big problems for the viral reclassification step when run, as read names need to be unique. If it finds duplicate/unexpectedly interleaved files, it tries to correct them but then exits. The user can try rerunning with the fixed files. I considered silently handling this, but that approach seemed dangerous.
- Add messaging if paired reads are provided and `--paired` is not.
- Add a workflow to run modules (use `--module <name>`) and remove workflow definitions from within these modules
- Add a warning for incorrect Phred parsing as this is thought to be a resolved issue
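For anyone updating existing launch scripts, the kraken parameter renames listed above can be applied mechanically. The sketch below is a hypothetical helper, not part of Scylla; it assumes the old `database_set` parameter was passed on the command line as `--database_set`, and the host and port values in the example are placeholders.

```python
# Old flag -> new flag, as listed in the changelog entry above.
# Assumption: `database_set` was supplied on the command line as `--database_set`.
RENAMED_FLAGS = {
    "--k2_host": "--kraken_database.default.host",
    "--k2_port": "--kraken_database.default.port",
    "--database": "--kraken_database.default.path",
    "--database_set": "--kraken_database.default.name",
}


def migrate_args(args: list[str]) -> list[str]:
    """Rewrite pre-2.0.0 kraken flags, leaving all other arguments untouched."""
    migrated = []
    for arg in args:
        # Handle both "--flag value" and "--flag=value" forms.
        flag, sep, value = arg.partition("=")
        migrated.append(RENAMED_FLAGS.get(flag, flag) + sep + value)
    return migrated


old_args = ["--k2_host", "localhost", "--k2_port=8080", "--paired"]
new_args = migrate_args(old_args)
```

Parameters for a second database would sit under `kraken_database.virus` in the same way, which the flat pre-2.0.0 flags could not express.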
Nextflow changes¶
- Set `docker.userEmulation = true`
Other changes¶
- Add to the README to make it more helpful
- Review all local test commands and make sure they run as expected.
2025-03-31¶
Onyx¶
Added¶
- Added `nuth` (Newcastle upon Tyne Hospitals NHS Foundation Trust) as an option in the mSCAPE `site` field.
2025-03-06¶
All¶
- Start of changelog