PATH-SAFE Uploader Specification

Files to be provided

  • A FASTQ 1 file containing the forward sequencing reads.
  • A FASTQ 2 file containing the reverse sequencing reads.
  • A CSV file containing the metadata associated with sequencing the sample.

File naming convention

The base filenames should be of the form

pathsafe.[run_index].[run_id].[extension]

where:

  • [run_index] is an identifier that is unique within a sequencing run, e.g. a sequencing barcode identifier, or a 96-well plate co-ordinate.
  • [run_id] is the name of the sequencing run as given by the supplier's sequencing instrument (not an internal identifier assigned by the supplier).
  • [extension] is the file extension indicating the file type.

File name extensions

The extensions ([extension]) should be:

  • 1.fastq.gz for the forward FASTQ file.
  • 2.fastq.gz for the reverse FASTQ file.
  • csv for the CSV metadata file.
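As a minimal sketch of how the naming convention and the extensions combine, the following Python builds the three expected base filenames for one sample. The function name and the example run_index and run_id values are hypothetical and chosen only for illustration.

```python
# Extensions for the three files, as listed under "File name extensions".
EXTENSIONS = {
    "forward_fastq": "1.fastq.gz",
    "reverse_fastq": "2.fastq.gz",
    "metadata_csv": "csv",
}


def expected_filenames(run_index: str, run_id: str) -> dict:
    """Return the three base filenames for one sample, keyed by file role."""
    return {
        role: f"pathsafe.{run_index}.{run_id}.{extension}"
        for role, extension in EXTENSIONS.items()
    }


# Hypothetical values: a plate co-ordinate and an instrument-style run name.
names = expected_filenames("A01", "230101_M01234_0001_000000000-ABCDE")
for role, name in names.items():
    print(role, name)
```

All three files for a sample share the same [run_index] and [run_id] and differ only in the extension.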

Valid characters

The [run_index], [run_id] and [extension] must contain only:

  • Letters (A-Z, a-z).
  • Numbers (0-9).
  • Hyphens (-).
  • Underscores (_).
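The naming rules above could be checked with a regular expression along the following lines. This is a sketch, not the uploader's own validation: the function name is hypothetical, and it assumes that the periods inside the FASTQ extensions (1.fastq.gz, 2.fastq.gz) are permitted as part of those fixed extension strings, while [run_index] and [run_id] are limited to the listed characters.

```python
import re

# [run_index] and [run_id]: letters, numbers, hyphens and underscores only.
# [extension]: one of the three fixed strings from "File name extensions".
FILENAME_RE = re.compile(
    r"pathsafe"
    r"\.(?P<run_index>[A-Za-z0-9_-]+)"
    r"\.(?P<run_id>[A-Za-z0-9_-]+)"
    r"\.(?P<extension>1\.fastq\.gz|2\.fastq\.gz|csv)"
)


def parse_filename(name: str):
    """Return the filename parts as a dict, or None if the name is invalid."""
    match = FILENAME_RE.fullmatch(name)
    return match.groupdict() if match else None


# Hypothetical filenames for illustration.
print(parse_filename("pathsafe.A01.run-1.1.fastq.gz"))
print(parse_filename("pathsafe.A 01.run-1.csv"))  # space is not a valid character
```

Because neither [run_index] nor [run_id] may contain a period, the periods in the filename separate the parts unambiguously.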

Metadata specification

Required fields

Field | Data type | Description | Restrictions
biosample_id | text | The sequencing provider's identifier for a sample. | • Max length: 50
run_index | text | The sequencing provider's identifier for the position of a sample on a run. | • Max length: 50
run_id | text | The unique identifier assigned to the run by the sequencing instrument. | • Max length: 100
submitted_species | choice | The NCBI taxonomy ID provided for the sample. | • Choices: 1639, 28901, 562
year | integer | Year of sample collection if available, otherwise year of sample receipt. |
data_steward | choice | Laboratory, organisation or agency that holds the data for the sample. | • Choices: APHA, FSA, FSS, OTHER, PHS, PHW, SSSCDRL, UKHSA
source_type | choice | Source of the sample. | • Choices: animal, animal_associated_environment, environment, food, food_associated_environment, human, human_associated_environment, missing, not_applicable, not_collected, not_provided, other, other_environment, restricted_access
country | choice | The country that the sample was collected in, using ISO-3166-1 alpha-2 codes (https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes), unless within the United Kingdom. If so, use ISO-3166-2:GB (https://en.wikipedia.org/wiki/ISO_3166-2:GB). | • Choices: GB, GB-ENG, GB-NIR, GB-SCT, GB-WLS
sample_purpose | choice | The purpose of the sample collection. | • Choices: active_surveillance, not_applicable, not_collected, not_provided, other, outbreak_initiated_surveillance, outbreak_investigation, population_based_surveillance, research, restricted_access, routine_diagnostics, routine_surveillance
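The required-field restrictions can be sketched as a simple row check in Python. This is an illustration under stated assumptions, not the uploader's implementation: the function name and the example row are hypothetical, values are treated as strings as they would be when read from a CSV file, and country is only checked for presence because its description allows ISO-3166-1 alpha-2 codes beyond the five choices listed.

```python
import re

# Maximum lengths for the required text fields.
TEXT_MAX_LENGTH = {"biosample_id": 50, "run_index": 50, "run_id": 100}

# Allowed values for the required choice fields (country is handled separately).
CHOICES = {
    "submitted_species": {"1639", "28901", "562"},
    "data_steward": {"APHA", "FSA", "FSS", "OTHER", "PHS", "PHW", "SSSCDRL", "UKHSA"},
    "source_type": {
        "animal", "animal_associated_environment", "environment", "food",
        "food_associated_environment", "human", "human_associated_environment",
        "missing", "not_applicable", "not_collected", "not_provided", "other",
        "other_environment", "restricted_access",
    },
    "sample_purpose": {
        "active_surveillance", "not_applicable", "not_collected", "not_provided",
        "other", "outbreak_initiated_surveillance", "outbreak_investigation",
        "population_based_surveillance", "research", "restricted_access",
        "routine_diagnostics", "routine_surveillance",
    },
}


def check_required(row: dict) -> list:
    """Return a list of error messages for the required fields of one CSV row."""
    errors = []
    for field, max_length in TEXT_MAX_LENGTH.items():
        value = row.get(field, "")
        if not value:
            errors.append(f"{field}: missing")
        elif len(value) > max_length:
            errors.append(f"{field}: longer than {max_length} characters")
    for field, allowed in CHOICES.items():
        value = row.get(field, "")
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed choice")
    if not re.fullmatch(r"[0-9]+", row.get("year", "")):
        errors.append("year: not an integer")
    # Assumption: presence check only; see the note on country above.
    if not row.get("country"):
        errors.append("country: missing")
    return errors


# Hypothetical row for illustration only.
example_row = {
    "biosample_id": "S0001", "run_index": "A01", "run_id": "run-1",
    "submitted_species": "562", "year": "2023", "data_steward": "FSA",
    "source_type": "food", "country": "GB-ENG",
    "sample_purpose": "routine_surveillance",
}
print(check_required(example_row))
```

Returning a list of messages rather than raising on the first problem lets every issue in a row be reported at once.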

Optional fields

Field | Data type | Description | Restrictions
biosample_source_id | text | Unique identifier for an individual to permit multiple samples from the same individual to be linked. | • Max length: 50
sample_accession | text | Sample accession number if sequence is publicly available in SRA. |
enterobase_barcode | text | Sample barcode if sequence is publicly available in EnteroBase. |
collection_date | date | Date of sample collection. | • Input formats: YYYY-MM • Output format: YYYY-MM
receipt_date | date | Date of receipt of the sample. | • Input formats: YYYY-MM • Output format: YYYY-MM
month | integer | Month of sample collection if available, otherwise month of receipt. |
sequence_org | choice | Laboratory, organisation or agency the sample has been sequenced by. | • Choices: APHA, FSA, FSS, PHS, SSSCDRL, UKHSA
sequence_org_other | text | Additional laboratory, organisation or agency the sample has been sequenced by. | • Requires: sequence_org
data_steward_other | text | Additional laboratory, organisation or agency that holds the data for the sample. | • Required when data_steward is: OTHER
county | choice | County that the sample was collected in, using the second level subdivision codes of ISO-3166-2:GB (https://www.iso.org/obp/ui/#iso:code:3166:GB). | • Requires: country • Choices: GB-ABC, GB-ABD, GB-ABE, GB-AGB, GB-AGY, GB-AND, GB-ANN, GB-ANS, GB-BAS, GB-BBD, GB-BCP, GB-BDF, GB-BDG, GB-BEN, GB-BEX, GB-BFS, GB-BGE, GB-BGW, GB-BIR, GB-BKM, ...
sample_purpose_other | text | Additional purpose of the sample collection. | • Required when sample_purpose is: other
sequencing_kit | text | The sequencing kit used. |
library_kit | text | The library kit used to prep the sample. |
is_multiplexed | bool | Whether the sample was multiplexed. |
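The date format and the conditional restrictions on the optional fields might be checked as in the Python sketch below. The function name is hypothetical, and two interpretations are assumptions rather than statements from this specification: "Requires: X" is read as "this field may only be filled in when X is also filled in", and month is assumed to be an integer from 1 to 12.

```python
import re

# YYYY-MM, with the month part limited to 01-12.
YEAR_MONTH_RE = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")

# (field, field it requires), from the "Requires" restrictions.
REQUIRES = [("sequence_org_other", "sequence_org"), ("county", "country")]

# (field, trigger field, trigger value), from the "Required when" restrictions.
REQUIRED_WHEN = [
    ("data_steward_other", "data_steward", "OTHER"),
    ("sample_purpose_other", "sample_purpose", "other"),
]


def check_optional(row: dict) -> list:
    """Return a list of error messages for the optional fields of one CSV row."""
    errors = []
    for field in ("collection_date", "receipt_date"):
        value = row.get(field, "")
        if value and not YEAR_MONTH_RE.fullmatch(value):
            errors.append(f"{field}: expected YYYY-MM")
    month = row.get("month", "")
    # Assumption: month is a calendar month number, 1 to 12.
    if month and not (re.fullmatch(r"[0-9]+", month) and 1 <= int(month) <= 12):
        errors.append("month: expected an integer from 1 to 12")
    for field, needed in REQUIRES:
        if row.get(field) and not row.get(needed):
            errors.append(f"{field}: requires {needed}")
    for field, trigger, trigger_value in REQUIRED_WHEN:
        if row.get(trigger) == trigger_value and not row.get(field):
            errors.append(f"{field}: required when {trigger} is {trigger_value}")
    return errors


# Hypothetical rows for illustration only.
print(check_optional({"county": "GB-ABC", "country": "GB-NIR", "collection_date": "2023-04"}))
print(check_optional({"data_steward": "OTHER"}))
```

Empty optional fields are skipped, so a row that leaves every optional field blank produces no errors here.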