climb_id |
text |
Unique identifier for a project record in Onyx. |
|
published_date |
date |
The date the project record was published in Onyx. |
• Output format: iso-8601 |
site |
choice |
The site or sequencing centre providing the data. |
• Choices: bham , gosh , gstt , public , ripl , uhs , ukhsa |
biosample_id |
text |
The sequencing provider's identifier for a sample. |
|
biosample_source_id |
text |
Unique identifier for an individual to permit multiple samples from the same individual to be linked. |
|
run_id |
text |
Unique identifier assigned to the run by the sequencing instrument. |
|
platform |
choice |
The platform used to sequence the data. |
• Choices: illumina , illumina.se , ont |
input_type |
choice |
The type of input sequenced. |
• Choices: community_standard , negative_control , positive_control , specimen , validation_material |
specimen_type_details |
choice |
Named control or standard for specimens. |
• Choices: asymptomatic , respiratory_infection |
control_type_details |
choice |
Named control or standard for positive and negative controls. |
• Choices: NIBSC_11/242 , NIBSC_20/170 , water_extraction_control , zepto_rp2.1 , zymo-mc_D6300 |
sample_source |
choice |
The source from which the sample was collected. |
• Choices: blood , environment , faecal , lower_respiratory , nose_and_throat , other , plasma , pleural_fluid , stool , tissue , upper_respiratory , urine |
sample_type |
choice |
The type of sampling method used. |
• Choices: aspirate , bal , biopsy , other , sputum , swab |
spike_in |
choice |
The type of spike-in used in the run. |
• Choices: ERCC-RNA_4456740 , ms2-phage , none , phix , tobacco_mosaic_virus , zymo_D6320 , zymo_D6321 |
spike_in_result |
choice |
Result assigned by scylla for the provided spike-in. |
• Choices: fail , partial , pass |
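The choice fields above accept only the listed values. A minimal sketch of checking a record against three of those lists follows. The `record` dict and the `invalid_choices` helper are hypothetical, and the case-sensitive comparison is an assumption rather than documented Onyx behaviour.

```python
# Allowed values copied from the field table; only three fields are shown here.
CHOICES = {
    "platform": {"illumina", "illumina.se", "ont"},
    "input_type": {
        "community_standard", "negative_control", "positive_control",
        "specimen", "validation_material",
    },
    "spike_in_result": {"fail", "partial", "pass"},
}


def invalid_choices(record: dict) -> dict:
    """Return {field: value} for each choice field holding a value not in its list.

    `record` is a hypothetical flat dict of field name -> value.
    Fields that are absent or None are skipped rather than reported.
    Comparison is case-sensitive (an assumption of this sketch).
    """
    return {
        field: record[field]
        for field, allowed in CHOICES.items()
        if record.get(field) is not None and record[field] not in allowed
    }
```

For example, `invalid_choices({"platform": "ONT"})` reports the `platform` value because only the lower-case `ont` appears in the list.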
collection_date |
date |
The date the sample was collected. |
• Output format: YYYY-MM-DD |
received_date |
date |
The date the sample was received by the sequencing centre (used if collection_date is unavailable). |
• Output format: YYYY-MM-DD |
is_approximate_date |
bool |
The date is approximate, e.g. the sample is from a public repository and it is unclear whether the date corresponds to collection or publishing. |
|
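The relationship between collection_date and received_date can be sketched as a simple fallback. The `record` dict and the `sample_date` helper below are hypothetical, and the sketch assumes both fields arrive as YYYY-MM-DD strings or are missing.

```python
from datetime import date


def sample_date(record: dict):
    """Return (date, source_field), preferring collection_date over received_date.

    `record` is a hypothetical dict holding YYYY-MM-DD strings (or None).
    Returns (None, None) when neither date is present.
    """
    for field in ("collection_date", "received_date"):
        value = record.get(field)
        if value:
            # Raises ValueError if the string is not a valid ISO date.
            return date.fromisoformat(value), field
    return None, None
```

Returning the name of the field alongside the date keeps it clear downstream whether the value is a true collection date or the received-date substitute.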
batch_id |
text |
Used to identify samples prepared in the same laboratory batch (e.g. extraction, library and/or sequencing). |
|
study_id |
text |
Used to identify the study, or to indicate that the sample is an NHS residual sample. |
|
study_centre_id |
text |
Used to identify sequencing centre. |
|
sequence_purpose |
choice |
Used to differentiate between clinical or research studies. |
• Choices: clinical , research |
governance_status |
choice |
Whether the patient consented to their sample being used for research purposes. |
• Choices: consented_for_research , no_consent_for_research , open |
iso_country |
choice |
Country that the sample was collected in, using ISO-3166-1 alpha-2 codes (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2). For samples collected within the United Kingdom, use ISO-3166-2:GB codes instead (https://en.wikipedia.org/wiki/ISO_3166-2:GB). |
• Choices: AD , AE , AF , AG , AI , AL , AM , AO , AQ , AR , AS , AT , AU , AW , AX , AZ , BA , BB , BD , BE , ... |
iso_region |
choice |
Region that the sample was collected in, using the second level subdivision codes of ISO-3166-2:GB (https://www.iso.org/obp/ui/#iso:code:3166:GB). |
• Choices: GB-ABC , GB-ABD , GB-ABE , GB-AGB , GB-AGY , GB-AND , GB-ANN , GB-ANS , GB-BAS , GB-BBD , GB-BCP , GB-BDF , GB-BDG , GB-BEN , GB-BEX , GB-BFS , GB-BGE , GB-BGW , GB-BIR , GB-BKM , ... |
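The two code styles used by iso_country and iso_region can be told apart by shape. The sketch below checks shape only and does not test membership in the choice lists, which are truncated above. The helper name is hypothetical, and `GB-ENG` is used as a real ISO 3166-2:GB code whose presence in the iso_country list is an assumption.

```python
import re

# Shape checks only: these patterns do not test membership in the full choice lists.
ALPHA2 = re.compile(r"[A-Z]{2}")             # ISO 3166-1 alpha-2, e.g. "AD"
GB_SUBDIVISION = re.compile(r"GB-[A-Z]{3}")  # ISO 3166-2:GB, e.g. "GB-ABC"


def country_code_shape(code: str) -> str:
    """Classify a code as a GB subdivision, an alpha-2 country code, or neither."""
    if GB_SUBDIVISION.fullmatch(code):
        return "iso-3166-2:gb"
    if ALPHA2.fullmatch(code):
        return "iso-3166-1-alpha-2"
    return "unrecognised"
```

A real validator would compare against the complete choice lists; this only catches values such as lower-case or three-letter country codes early.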
extraction_enrichment_protocol |
text |
Details of nucleic acid extraction and optional enrichment steps. |
|
library_protocol |
text |
Details of sequencing library construction. |
|
sequencing_protocol |
text |
Details of sequencing. |
|
bioinformatics_protocol |
text |
Details of the initial bioinformatics protocol, for example versions of basecalling software and models used, and any read quality filtering/trimming employed. |
|
dehumanisation_protocol |
text |
Details of bioinformatics method used for human read removal. |
|
is_public_dataset |
bool |
The sample is from a public dataset. Please only set this after it has been made public. |
|
public_database_name |
choice |
The public repository where the data is held. |
• Choices: ENA , SRA |
public_database_accession |
text |
The accession for the data in the public database. |
|
ingest_report |
text |
HTML report summarising the read profile and taxa identified. |
|
taxon_reports |
text |
Folder of all classification output files. |
|
human_filtered_reads_1 |
text |
Compressed FASTQ of input reads that have been filtered for human reads. |
|
human_filtered_reads_2 |
text |
Compressed FASTQ of input reads that have been filtered for human reads. |
|
unclassified_reads_1 |
text |
Compressed FASTQ of input reads which could not be classified. |
|
unclassified_reads_2 |
text |
Compressed FASTQ of input reads which could not be classified. |
|
viral_reads_1 |
text |
Compressed FASTQ of input reads which were classified as viral. |
|
viral_reads_2 |
text |
Compressed FASTQ of input reads which were classified as viral. |
|
viral_and_unclassified_reads_1 |
text |
Compressed FASTQ of input reads which were classified as viral or were unclassified. |
|
viral_and_unclassified_reads_2 |
text |
Compressed FASTQ of input reads which were classified as viral or were unclassified. |
|
classifier |
choice |
The classifier used. |
• Choices: Kraken2 |
classifier_version |
text |
Version of the classifier used. |
|
classifier_db |
choice |
Database used for read classification. |
• Choices: PlusPF |
classifier_db_date |
date |
Date classifier database was produced. |
• Output format: YYYY-MM-DD |
ncbi_taxonomy_date |
date |
Date that the NCBI taxonomy dump was produced. |
• Output format: YYYY-MM-DD |
scylla_version |
text |
Version of the scylla pipeline used. |
|
taxa_files |
relation |
Table of all species-level taxa extracted. |
|
taxa_files.taxon_id |
integer |
The NCBI taxonomy id associated with the taxon. |
|
taxa_files.human_readable |
text |
A human-readable name for the taxon. |
|
taxa_files.n_reads |
integer |
The number of reads extracted for the taxon. |
|
taxa_files.avg_quality |
decimal |
The mean quality of reads extracted for the taxon. |
|
taxa_files.mean_len |
decimal |
The mean length of reads extracted for the taxon. |
|
taxa_files.rank |
choice |
The rank of the taxon. |
• Choices: C , D , F , G , K , O , P , R , S , U |
taxa_files.fastq_1 |
text |
Compressed FASTQ of extracted reads for the taxon. |
|
taxa_files.fastq_2 |
text |
Compressed FASTQ of extracted reads for the taxon. |
|
classifier_calls |
relation |
Table summarising the NCBI taxonomy ids, counts and ranks of all taxa found by the classifier. |
|
classifier_calls.taxon_id |
integer |
The NCBI taxonomy id associated with the taxon. |
|
classifier_calls.human_readable |
text |
A human-readable name for the taxon. |
|
classifier_calls.percentage |
decimal |
The percentage of the (dehumanised) sample that the taxon represents. |
|
classifier_calls.count_descendants |
integer |
The number of reads mapping to this taxon and all descendant taxa. |
|
classifier_calls.count_direct |
integer |
The number of reads mapping directly to the taxon. |
|
classifier_calls.rank |
choice |
The rank of the taxon. |
• Choices: C , D , F , G , K , O , P , R , S , U |
classifier_calls.raw_rank |
text |
The rank of the taxon including an intermediate grading. |
|
classifier_calls.is_spike_in |
bool |
The taxon is a spike-in. |
|
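The rank letters and read counts in classifier_calls can be interpreted as in the sketch below. It assumes that the rank letters follow the Kraken2 report convention and that raw_rank is a rank letter optionally followed by a number (for example `S1`), as in Kraken2 reports. The helper names and the row dict are hypothetical.

```python
import re

# Rank letters as used in Kraken2 reports.
RANK_NAMES = {
    "U": "unclassified", "R": "root", "D": "domain", "K": "kingdom",
    "P": "phylum", "C": "class", "O": "order", "F": "family",
    "G": "genus", "S": "species",
}


def split_raw_rank(raw_rank: str):
    """Split e.g. "S1" into ("S", 1); a bare letter such as "G" gives ("G", 0)."""
    match = re.fullmatch(r"([URDKPCOFGS])(\d*)", raw_rank)
    if match is None:
        raise ValueError(f"unexpected raw_rank: {raw_rank!r}")
    letter, depth = match.groups()
    return letter, int(depth) if depth else 0


def descendant_only_reads(row: dict) -> int:
    """Reads assigned below this taxon: count_descendants minus count_direct."""
    return row["count_descendants"] - row["count_direct"]
```

Because count_descendants includes the reads counted in count_direct, the difference gives the reads assigned only to taxa below the one in the row.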
spike_in_info |
relation |
Table containing taxonomic results found for the provided spike-in. |
|
spike_in_info.taxon_id |
integer |
The NCBI taxonomy id associated with the taxon. |
|
spike_in_info.human_readable |
text |
A human-readable name for the taxon. |
|
spike_in_info.reference_header |
text |
Reference header for the individual sequence within the provided spike-in. |
|
spike_in_info.mapped_count |
integer |
Number of reads which aligned to a reference sequence for the provided spike-in. |
|
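As an illustration of how spike_in_info rows might be summarised, the sketch below totals mapped_count per taxon_id. It assumes each row describes one reference sequence and that several rows can share a taxon_id. The row dicts, the taxon ids in the usage example, and the helper name are hypothetical.

```python
from collections import defaultdict


def mapped_reads_per_taxon(rows):
    """Sum mapped_count over all spike_in_info rows sharing a taxon_id."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["taxon_id"]] += row["mapped_count"]
    return dict(totals)


# Usage with made-up rows (taxon ids and headers are placeholders, not real data).
rows = [
    {"taxon_id": 111, "reference_header": "seq_a", "mapped_count": 40},
    {"taxon_id": 111, "reference_header": "seq_b", "mapped_count": 10},
    {"taxon_id": 222, "reference_header": "seq_c", "mapped_count": 5},
]
totals = mapped_reads_per_taxon(rows)
```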