Skip to content

mSCAPE Changelog

All notable changes to CLIMB-TRE mSCAPE APIs, data or interchange formats that have impact to users or other pipelines should be documented in this file. Changes described here may only be a subset of all changes to a project as this log concerns itself only with changes that impact how data is provided or consumed by users or other pipelines.

The following DIPI projects are routinely using this changelog:

  • Scylla -- ingest analysis pipeline
  • Roz -- ingest management
  • Onyx -- metadata database
  • Onyx-client -- API for interacting with metadata database

The format is based on Keep a Changelog.

Issues can be reported to the mSCAPE DIPI group.


2025-10-17

Onyx

Added

Alignment Results
  • Added alignment_results table.
  • Added alignment_results.taxon_id field.
  • Added alignment_results.human_readable field.
  • Added alignment_results.unique_accession field.
  • Added alignment_results.accession_description field.
  • Added alignment_results.sequence_length field.
  • Added alignment_results.evenness_value field.
  • Added alignment_results.mean_depth field.
  • Added alignment_results.coverage_1x field.
  • Added alignment_results.coverage10x field.
  • Added alignment_results.mapped_reads field.
  • Added alignment_results.uniquely_mapped_reads field.
  • Added alignment_results.mapped_bases field.
  • Added alignment_results.mean_read_identity field.
  • Added alignment_results.read_duplication_rate field.
  • Added alignment_results.forward_proportion field.
  • Added alignment_results.mean_alignment_length field.
Sylph Results
  • Added sylph_results table.
  • Added sylph_results.taxon_id field.
  • Added sylph_results.human_readable field.
  • Added sylph_results.gtdb_taxon_string field.
  • Added sylph_results.gtdb_assembly_id field.
  • Added sylph_results.gtdb_contig_header field.
  • Added sylph_results.taxonomic_abundance field.
  • Added sylph_results.sequence_abundance field.
  • Added sylph_results.adjusted_ani field.
  • Added sylph_results.ani_confidence_interval field.
  • Added sylph_results.effective_coverage field.
  • Added sylph_results.effective_coverage_confidence_interval field.
  • Added sylph_results.median_kmer_cov field.
  • Added sylph_results.mean_kmer_cov field.
  • Added sylph_results.containment_index field.
  • Added sylph_results.naive_ani field.
  • Added sylph_results.kmers_reassigned field.

2025-09-15

Onyx

Added

  • Added ucl (University College London) site option.
  • Added ukhsamanc (UKHSA Manchester Lab) site option.
  • Added ukhsabris (UKHSA Bristol Lab) site option.

2025-08-13

Onyx

Added

  • Added control_type_details choice bacillus_ms2phage, constrained by an input_type of positive_control.
  • Added optional choice field protocol_arm, with choices bacterial and viral.

Scylla

Release 2.1.0

Changed

  • Large speedup of all read extract scripts
  • Per read quality scores are now based on mean rather than median

2025-08-05

Scylla

Release 2.0.3

Changed

  • Resolve missing total_length.json when no taxa files output

2025-08-05

Onyx

Added

  • Added spike_in option bacillus_ms2phage.

Scylla

Release 2.0.2

Added

  • Added spike_in option bacillus_ms2phage.

Changed

  • Changed reference to --local flag in README/tests for local running to -profile local (can be combined with docker using -profile local,docker)

2025-07-02

Onyx

Added

  • Added total_bases field, for recording the number of bases in the input FASTQ file(s), before any filtering.
  • Added taxa_files.total_bases field, for recording the number of bases extracted for a taxa (assignable for each taxa within the taxa_files of a record).

Scylla

Release 2.0.1

Changed

  • Change the exitcode for script which checks paired fastq files so that the pipeline doesn't fail loudly with mismatched headers

2025-05-08

Scylla

Released version 2.0.0. Given the number of changes, they are grouped by category rather than Added/Changed etc.

HCID changes

  • Add min_coverage parameter to HCID JSON
  • Update references in HCID JSON and reference file
  • Update thresholds for HCID detection
  • Drop requirement for classified reads at taxon/parent level for HCID to be detected (mapping sufficient)
  • Output reads corresponding to HCIDs which have flagged a warning (NEW OUTPUT in qc/<taxid>.reads.fq)
  • Output read stats for HCID reads to the warning JSON (mapped_mean_quality and mapped_mean_length)
  • Add coverage information for HCID found showing how many bases have coverage at each level - in HCID JSON

Extract taxa changes

  • Reworked code to interact with kraken reports and assignment files during extract steps. Found a bug where some of the counts in the summary had previously been double counted (where both a S and S1 or S2 level taxa were extracted)
  • Extract reads at different levels for different domains as specified by config (F for Viruses, G for everything else)
  • Only extract reads at the specific level, not sublevels (e.g. S not S1 or S2)
  • Add total_len calculated both for input and extracted output files in the summary JSON (NEW OUTPUT qc/total_length.json)
  • Make extraction percentages domain-specific (e.g. 1% of bacterial reads rather than 1% of classified reads) to fix zepto example
  • To extract a taxon, needs to pass count threshold OR the percentage threshold (previously both) and increase the count threshold for bacteria to 500

Workflow changes

  • Add workflow to reclassify the viral+unclassified fraction with a second database
  • In the process, the parameters associated with kraken databases have been restructured. Replace --k2_host with --kraken_database.default.host, --k2_port with --kraken_database.default.port, --database with --kraken_database.default.path and database_set is now kraken_database.default.name. This allows a second dictionary of kraken parameters for kraken_database.virus to be defined if/when necessary.
  • Add code to merge kraken assignment files, giving preference to second assignment file
  • Add code to update kraken report, giving list of changes made to assignments
  • Add a QC script to check the input file where a single fastq file is provided, so that it can warn if there are duplicate headers. This was seen in some example data and would cause big problems for the viral reclassification step when run, as read names need to be unique. If it finds duplicate/unexpectedly interleaved files, tries to correct them but then exists. The user can try rerunning with the fixed files. I considered silently handling but this approach seemed dangerous.
  • Add messaging if paired reads provided and --paired not.
  • Add a workflow to run modules (use --module <name>) and remove workflow definitions from within these modules
  • Add a warning for incorrect Phred parsing as this is thought to be a resolved issue

Nextflow changes

  • set docker.userEmulation = true

Other changes

  • Add to README more helpful
  • Review all local test commands and make sure they run as expected.

2025-03-31

Onyx

Added

  • Added nuth (Newcastle upon Tyne Hospitals NHS Foundation Trust) as an option in the mSCAPE site field.

2025-03-06

All

  • Start of changelog