PATH-SAFE Uploader Specification
Files to be provided
- A FASTQ 1 file containing the forward sequencing reads.
- A FASTQ 2 file containing the reverse sequencing reads.
- A CSV file containing the metadata associated with sequencing the sample.
File naming convention

The base filenames should be of the form

where:

- `[run_index]` is an identifier that is unique within a sequencing run, e.g. a sequencing barcode identifier or a 96-well plate co-ordinate.
- `[run_id]` is the name of the sequencing run as given by the supplier's sequencing instrument (not an internal identifier assigned by the supplier).
- `[extension]` is the file extension indicating the file type.
File name extensions

The extensions (`[extension]`) should be:

- `1.fastq.gz` for the forward FASTQ file.
- `2.fastq.gz` for the reverse FASTQ file.
- `csv` for the CSV metadata file.
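Each sample needs all three files. The following is a minimal sketch of how an uploader might group a directory listing by base filename and report incomplete samples. The exact filename template is not given in this section, so the sketch does not assume a separator. It treats everything before one of the extensions above as the base filename. The helpers `file_role` and `incomplete_samples` are hypothetical and are not part of any published uploader.

```python
from collections import defaultdict

# Extension -> file role, taken from the list of permitted extensions.
EXTENSIONS = {
    "1.fastq.gz": "forward",
    "2.fastq.gz": "reverse",
    "csv": "metadata",
}


def file_role(filename):
    """Return (base, role) if `filename` ends in a known extension, else None.

    `base` is everything before the extension, including any separator,
    so no particular separator character is assumed.
    """
    for ext, role in EXTENSIONS.items():
        if filename.endswith(ext) and len(filename) > len(ext):
            return filename[: -len(ext)], role
    return None


def incomplete_samples(filenames):
    """Map each base filename to the set of roles it is still missing."""
    found = defaultdict(set)
    for name in filenames:
        parsed = file_role(name)
        if parsed is not None:
            base, role = parsed
            found[base].add(role)
    required = set(EXTENSIONS.values())
    return {
        base: required - roles
        for base, roles in found.items()
        if roles != required
    }
```

For example, with made-up filenames that use a `.` separator purely for illustration, `incomplete_samples(["A01.RUN1.1.fastq.gz", "A01.RUN1.2.fastq.gz", "A01.RUN1.csv", "B02.RUN1.1.fastq.gz"])` reports that the `B02.RUN1.` sample is missing its reverse FASTQ and metadata files.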
Valid characters

The `[run_index]`, `[run_id]` and `[extension]` must contain only:

- Letters (`A-Z`, `a-z`).
- Numbers (`0-9`).
- Hyphens (`-`).
- Underscores (`_`).
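The character rule above translates directly into a regular expression. The sketch below applies it to a single filename component such as `[run_index]` or `[run_id]`. The permitted extensions are a fixed list, so they are easier to match as literal strings. Two points are assumptions rather than statements from the specification: the function name `is_valid_component` is hypothetical, and the requirement that a component be non-empty is assumed.

```python
import re

# ASCII letters, digits, hyphens and underscores; at least one character.
VALID_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_component(value):
    """Return True if `value` uses only the permitted characters."""
    # fullmatch anchors both ends, unlike `$`, which also matches
    # before a trailing newline.
    return VALID_COMPONENT.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` with `^...$` avoids accidentally accepting a value such as `"A01\n"`.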
Metadata specification
Required fields

Field | Data type | Description | Restrictions |
---|---|---|---|
biosample_id | text | The sequencing provider's identifier for a sample. | • Max length: 50 |
run_index | text | The sequencing provider's identifier for the position of a sample on a run. | • Max length: 50 |
run_id | text | The unique identifier assigned to the run by the sequencing instrument. | • Max length: 100 |
submitted_species | choice | The NCBI taxonomy ID provided for the sample. | • Choices: 1639, 28901, 562 |
year | integer | Year of sample collection if available, otherwise year of sample receipt. | |
data_steward | choice | Laboratory, organisation or agency that holds the data for the sample. | • Choices: APHA, FSA, FSS, OTHER, PHS, PHW, SSSCDRL, UKHSA |
source_type | choice | Source of the sample. | • Choices: animal, animal_associated_environment, environment, food, food_associated_environment, human, human_associated_environment, missing, not_applicable, not_collected, not_provided, other, other_environment, restricted_access |
country | choice | The country in which the sample was collected, using ISO 3166-1 alpha-2 codes (https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes), or ISO 3166-2:GB codes (https://en.wikipedia.org/wiki/ISO_3166-2:GB) if the sample was collected within the United Kingdom. | • Choices: GB, GB-ENG, GB-NIR, GB-SCT, GB-WLS |
sample_purpose | choice | The purpose of the sample collection. | • Choices: active_surveillance, not_applicable, not_collected, not_provided, other, outbreak_initiated_surveillance, outbreak_investigation, population_based_surveillance, research, restricted_access, routine_diagnostics, routine_surveillance |
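The required-field rules can be checked row by row after reading the metadata CSV with `csv.DictReader`. The following is a minimal sketch, not the uploader's actual validator. It checks that every required field is present and non-blank, enforces the maximum lengths, and confirms that `year` is a whole number. To keep it short, only three of the choice fields are checked. The function name `check_required` and the example values are hypothetical.

```python
import csv
import io
import re

REQUIRED_FIELDS = [
    "biosample_id", "run_index", "run_id", "submitted_species", "year",
    "data_steward", "source_type", "country", "sample_purpose",
]

# Max lengths for the text fields, copied from the table.
MAX_LENGTHS = {"biosample_id": 50, "run_index": 50, "run_id": 100}

# Only three of the choice fields, to keep the sketch short.
CHOICES = {
    "submitted_species": {"1639", "28901", "562"},
    "data_steward": {"APHA", "FSA", "FSS", "OTHER", "PHS", "PHW", "SSSCDRL", "UKHSA"},
    "country": {"GB", "GB-ENG", "GB-NIR", "GB-SCT", "GB-WLS"},
}


def check_required(row):
    """Return a list of problems with the required fields of one CSV row."""
    problems = []
    for field in REQUIRED_FIELDS:
        # DictReader yields None for columns missing from a short row.
        if not (row.get(field) or "").strip():
            problems.append(f"{field}: missing")
    for field, limit in MAX_LENGTHS.items():
        if len(row.get(field) or "") > limit:
            problems.append(f"{field}: longer than {limit} characters")
    for field, allowed in CHOICES.items():
        value = row.get(field)
        if value and value not in allowed:
            problems.append(f"{field}: {value!r} is not a permitted choice")
    year = (row.get("year") or "").strip()
    if year and not re.fullmatch(r"[0-9]+", year):
        problems.append("year: not an integer")
    return problems


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    sample_csv = (
        "biosample_id,run_index,run_id,submitted_species,year,"
        "data_steward,source_type,country,sample_purpose\n"
        "S1,A01,RUN1,562,2023,UKHSA,food,GB-ENG,research\n"
    )
    for row in csv.DictReader(io.StringIO(sample_csv)):
        print(check_required(row))  # prints []
```

Returning a list of messages, rather than raising on the first error, lets a submitter see every problem in a row at once.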
Optional fields

Field | Data type | Description | Restrictions |
---|---|---|---|
biosample_source_id | text | Unique identifier for an individual, allowing multiple samples from the same individual to be linked. | • Max length: 50 |
sample_accession | text | Sample accession number if the sequence is publicly available in SRA. | |
enterobase_barcode | text | Sample barcode if the sequence is publicly available in EnteroBase. | |
collection_date | date | Date of sample collection. | • Input formats: YYYY-MM • Output format: YYYY-MM |
receipt_date | date | Date of receipt of the sample. | • Input formats: YYYY-MM • Output format: YYYY-MM |
month | integer | Month of sample collection if available, otherwise month of sample receipt. | |
sequence_org | choice | Laboratory, organisation or agency that sequenced the sample. | • Choices: APHA, FSA, FSS, PHS, SSSCDRL, UKHSA |
sequence_org_other | text | Additional laboratory, organisation or agency that sequenced the sample. | • Requires: sequence_org |
data_steward_other | text | Additional laboratory, organisation or agency that holds the data for the sample. | • Required when data_steward is: OTHER |
county | choice | County in which the sample was collected, using the second-level subdivision codes of ISO 3166-2:GB (https://www.iso.org/obp/ui/#iso:code:3166:GB). | • Requires: country • Choices: GB-ABC, GB-ABD, GB-ABE, GB-AGB, GB-AGY, GB-AND, GB-ANN, GB-ANS, GB-BAS, GB-BBD, GB-BCP, GB-BDF, GB-BDG, GB-BEN, GB-BEX, GB-BFS, GB-BGE, GB-BGW, GB-BIR, GB-BKM, ... |
sample_purpose_other | text | Additional purpose of the sample collection. | • Required when sample_purpose is: other |
sequencing_kit | text | The sequencing kit used. | |
library_kit | text | The library kit used to prepare the sample. | |
is_multiplexed | bool | Whether the sample was multiplexed. | |
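Several optional fields depend on other fields, and the two date fields take a `YYYY-MM` input format. The following is a minimal sketch of these cross-field checks on one metadata row. It reads "Requires: X" as "if this field is filled in, X must be filled in too". That reading is an interpretation of the table, not a documented rule. The function name `check_optional` and the rule tables are hypothetical.

```python
import re

# (dependent field, controlling field, value that makes it required)
REQUIRED_WHEN = [
    ("data_steward_other", "data_steward", "OTHER"),
    ("sample_purpose_other", "sample_purpose", "other"),
]

# Dependent field -> field that must also be present ("Requires").
REQUIRES = {
    "sequence_org_other": "sequence_org",
    "county": "country",
}

DATE_FIELDS = ("collection_date", "receipt_date")
# Four-digit year, then a two-digit month from 01 to 12.
YEAR_MONTH = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])")


def _present(row, field):
    """True if `field` has a non-blank value in `row`."""
    return bool((row.get(field) or "").strip())


def check_optional(row):
    """Return a list of problems with the optional fields of one row."""
    problems = []
    for field, controller, trigger in REQUIRED_WHEN:
        if row.get(controller) == trigger and not _present(row, field):
            problems.append(f"{field}: required when {controller} is {trigger}")
    for field, needed in REQUIRES.items():
        if _present(row, field) and not _present(row, needed):
            problems.append(f"{field}: requires {needed}")
    for field in DATE_FIELDS:
        value = (row.get(field) or "").strip()
        if value and not YEAR_MONTH.fullmatch(value):
            problems.append(f"{field}: expected YYYY-MM")
    return problems
```

A strict regular expression is used instead of `datetime.strptime(value, "%Y-%m")` because `strptime` also accepts a single-digit month such as `2023-5`, which does not match the stated `YYYY-MM` format.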