Manual

COMBATdb is a database that contains the multiomics datasets and analyses generated by the COVID-19 Multi-omic Blood Atlas (COMBAT) project (https://www.combat.ox.ac.uk/), which defines the molecular signatures associated with the pathogenesis and severity of COVID-19. It comprises metadata and multiomics molecular data, and supports the calculation and visualisation of primary and integrative data.

1 Home

The Home page summarises the major components and functionalities of COMBATdb and shows a diagram of the layout and structure of the database.

Homepage

2 Browse

The Browse module is the entry point for accessing the metadata and modality data and their interrelationships, including modalities, participants, participant timepoints, samples, sources, genes and cell types.

Browse module

2.1 Modalities

This section enables the users to browse all modalities in order to show the participants, participant timepoints, samples and sources associated with each modality.

Browse all modalities

The first column of the table gives the modality name. Clicking on it opens the details of that modality, including modality ID, modality name, category and level.

Browse the details for modality Bulk RNA-Seq

The numbers under the "Participants", "Participant Timepoints", "Samples" and "Sources" columns indicate the number of participants, participant timepoints, samples and sources for each modality. Clicking on the number under the "Participants" column shows a list of participants that have data for this particular modality.

Browse participants for modality Bulk RNA-Seq

Clicking on the number under the "Participant Timepoints" column shows a list of participant timepoints that have data for this particular modality.

Browse participant timepoints for modality Bulk RNA-Seq

Clicking on the number under the "Samples" column shows a list of samples that have data for this particular modality.

Browse samples for modality Bulk RNA-Seq

Clicking on the number under the "Sources" column opens a new page with a list of sources that have data for this particular modality.

Browse sources for modality Bulk RNA-Seq
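As an illustration of how the four counts in the modality table relate to the underlying records, the sketch below counts distinct participants, participant timepoints, samples and sources per modality. The record layout, field names and toy IDs are assumptions made for this example and are not the COMBATdb schema.

```python
from collections import defaultdict

# Toy records, one per (modality, sample). Field names and IDs are hypothetical.
RECORDS = [
    {"modality": "Bulk RNA-Seq", "participant": "P1", "timepoint": "P1-T1",
     "sample": "P1-T1-a", "source": "Healthy"},
    {"modality": "Bulk RNA-Seq", "participant": "P2", "timepoint": "P2-T1",
     "sample": "P2-T1-a", "source": "COVID_IP_critical"},
    {"modality": "Bulk RNA-Seq", "participant": "P2", "timepoint": "P2-T2",
     "sample": "P2-T2-a", "source": "COVID_IP_critical"},
    {"modality": "Proteomics: timsTOF", "participant": "P1", "timepoint": "P1-T1",
     "sample": "P1-T1-b", "source": "Healthy"},
]

LEVELS = ("participant", "timepoint", "sample", "source")


def summarise_by_modality(records):
    """Return, per modality, the number of distinct IDs at each level."""
    seen = defaultdict(lambda: {level: set() for level in LEVELS})
    for record in records:
        for level in LEVELS:
            seen[record["modality"]][level].add(record[level])
    return {
        modality: {level: len(ids) for level, ids in levels.items()}
        for modality, levels in seen.items()
    }


if __name__ == "__main__":
    for modality, counts in summarise_by_modality(RECORDS).items():
        print(modality, counts)
```

Each count is a number of distinct IDs, so a participant sampled at two timepoints contributes one participant but two participant timepoints.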

2.2 Participants

This section enables the users to browse all participants in order to show the modalities, participant timepoints, samples, sources as well as individual modality data and multiomics data associated with each participant.

Browse all participants

Clicking on a participant ID opens a new page with detailed information for that participant, including participant ID, age range and sex.

Browse the details for participant G04225

Clicking on the number under the "Modalities Applied" column for participant G04225 shows the one modality, "Proteomics: timsTOF", for which data are available for this participant.

Browse modalities for participant G04225

The table shows the participant timepoints that are derived from participant "G04225".

Browse participant timepoints for participant G04225

Similarly, after clicking on the number under the "Samples" column, a list of samples associated with participant G04225 will be shown on a new page.

Browse samples for participant G04225

The clickable number under the "Sources" column for participant G04225 will lead to a new page showing that samples categorized as "COVID_community" are available from this participant.

Browse sources for participant G04225

All the available individual modality data for each participant are shown on each row across 13 modality columns. In addition, clicking on the link under the "Multiomics Data" column for participant G04225 will display data from across modalities for this participant of interest.

Browse data for participant G04225

2.3 Participant Timepoints

This section enables the users to browse all participant timepoints in order to show the modalities, samples as well as individual modality data and multiomics data associated with each participant timepoint.

Browse all participant timepoints

The participant timepoint ID is clickable and opens a new page with the details of the participant timepoint, including participant timepoint ID, participant ID, sample priority, sampled at max severity, timepoint and source.

Browse the details for participant timepoint G04225-Ja028

The "Modalities Applied" column will provide the modalities that have data for a specific participant timepoint.

Browse modalities for participant timepoint G04225-Ja028

The "Samples" column shows a list of samples that are derived from a specific participant timepoint.

Browse samples for participant timepoint G04225-Ja028

All the available individual modality data for each participant timepoint are shown on each row across 13 modality columns. In addition, clicking on the link under the "Multiomics Data" column allows one to examine all data from the different modalities for a participant timepoint of interest.

Browse data for participant timepoint G04225-Ja028

2.4 Samples

This section enables the users to browse all samples in order to show the modalities as well as individual modality data and multiomics data associated with each sample.

Browse all samples

Clicking on any entry under the "ID" column presents the details for that sample, including participant timepoint ID, participant ID, cohort, patient ID, site, sub sample ID, sample type, derived sample type, processing site and batch letter.

Browse the details for sample G04225-Ja028E-PMCda

Furthermore, the items on the right column are clickable. For example, clicking on "Whole Blood (EDTA)" will provide all of the samples that have this sample type.

Browse all samples for sample type Whole Blood (EDTA)

The "Modalities Applied" column will help provide modality information for the samples.

Browse modalities for sample G04225-Ja028E-PMCda

The "Modality 1" - "Modality 13" columns provide single modality data for the samples. For example, clicking on "Flow Cytometry: FACS" under the "Modality 5" column for sample "G05061-Ja005E-PBCa" leads to the page below.

Browse modalities for sample G04225-Ja028E-PMCda

Lastly, the "Multiomics Data" column facilitates the examination of the primary analysis of multiomics data for the samples.

Browse data for sample G04225-Ja028E-PMCda

2.5 Sources

This section enables the users to browse all sources in order to show the modalities, participants, participant timepoints and samples associated with each source.

Browse all sources

After clicking on the items under the "ID" column, more details will be shown for the source category including source ID, name and short name.

Browse the details for source Healthy

The "Modalities Applied" column shows the modalities that have data from this source category.

Browse modalities for source Healthy

Similarly, users will be able to find out the participants, participant timepoints and samples that are assigned to this specific source category.

Browse participants for source Healthy

Browse participant timepoints for source Healthy

Browse samples for source Healthy

2.6 Genes

This section enables the users to browse all genes as well as single modality data and multiomics data associated with each gene.

Browse all genes

Clicking on "ENSG00000000003" under the "Gene ID" column opens a table with the details of that gene.

Browse the details for gene ENSG00000000003

This table shows the single modality data for a gene.

Browse samples for gene ENSG00000000003

This is an example of the multiomics data for gene "ENSG00000000971".

Browse cell types for gene ENSG00000000003 and sample G05061-Ja005E-PBCa from modality CITE-Seq: GEX Pseudobulk

2.7 Cell Types

This section enables the users to browse all cell types in order to show the participant timepoints and samples associated with each cell type as well as cell type related multiomics data.

Browse all cell types

The table below provides a list of samples for resolution "Cell Type" and Cell Type "B".

Browse samples for resolution Cell Type and cell type B

Here is the table showing the multiomics data for resolution "Cell Type", Cell Type "B" and sample "G05061-Ja005E-PBCa".

Browse data for resolution Cell Type, cell type B and sample G05061-Ja005E-PBCa

3 Search

The Search module offers the capability to search for specific terms such as modalities, participants, participant timepoints, samples, sources, genes and cell types. There are four search modes: "single mode - metadata", "multiple mode - metadata", "customisable mode - metadata" and "single mode - modality data".

Search module

3.1 Single Mode - Metadata

This section allows one to search for the details of a single modality, participant, participant timepoint, sample, source, gene or cell type. In addition, "filter by all" returns the results for all possible types of data.

Options for single mode - metadata

Here is an example of term = "G05061-Ja005E-PBCa" and Filter by = "Sample".

An input example for single mode - metadata

The search results return the detailed information for the sample "G05061-Ja005E-PBCa".

Search results for single mode - metadata

3.2 Multiple Mode - Metadata

This section allows one to search for the details of multiple modalities, participants, participant timepoints, samples, sources, genes and cell types. In addition, "filter by all" returns the results for all possible types of data.

Options for multiple mode - metadata

In this example, the input is five gene IDs or gene names and Filter by is "Gene".

An input example for multiple mode - metadata

The search results present the detailed annotations for each of the five query genes.

Search results for multiple mode - metadata

3.3 Customisable Mode - Metadata

This section provides the capability to generate a list of participants, participant timepoints or samples based on the user-defined modality and source.

Options for customisable mode

In this example, modality = "Bulk RNA-Seq", source = "COVID_IP_critical" and Filter by = "Sample".

An input example for customisable mode

The results return a list of samples for modality "Bulk RNA-Seq" and source "COVID_IP_critical".

Search results for customisable mode
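The customisable mode can be pictured as a filter on modality and source followed by a choice of which ID level to list. The sketch below shows that logic on toy records. The function name, field names and IDs are hypothetical and are not taken from COMBATdb.

```python
# Toy records. Field names and IDs are hypothetical, not the COMBATdb schema.
ROWS = [
    {"modality": "Bulk RNA-Seq", "source": "COVID_IP_critical",
     "participant": "P2", "participant_timepoint": "P2-T1", "sample": "P2-T1-a"},
    {"modality": "Bulk RNA-Seq", "source": "COVID_IP_critical",
     "participant": "P2", "participant_timepoint": "P2-T2", "sample": "P2-T2-a"},
    {"modality": "Bulk RNA-Seq", "source": "Healthy",
     "participant": "P1", "participant_timepoint": "P1-T1", "sample": "P1-T1-a"},
    {"modality": "Proteomics: timsTOF", "source": "COVID_IP_critical",
     "participant": "P3", "participant_timepoint": "P3-T1", "sample": "P3-T1-a"},
]

ALLOWED_LEVELS = ("participant", "participant_timepoint", "sample")


def customisable_search(rows, modality, source, filter_by):
    """List the distinct IDs at level `filter_by` for one modality and source."""
    if filter_by not in ALLOWED_LEVELS:
        raise ValueError(f"filter_by must be one of {ALLOWED_LEVELS}")
    matches = {
        row[filter_by]
        for row in rows
        if row["modality"] == modality and row["source"] == source
    }
    return sorted(matches)


if __name__ == "__main__":
    print(customisable_search(ROWS, "Bulk RNA-Seq", "COVID_IP_critical", "sample"))
```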

3.4 Single Mode - Multiomics Data

This section presents the multiomics data for a single user-provided participant, participant timepoint, sample, gene or cell type.

Options for single mode - modality data

In this example, type "ERYTH" in the text box, choose "Cell type" for Filter by and press the "Search" button.

An input example for single mode - modality data

The first level page provides a number of modalities for users to choose from.

The first level page

For "CITE-Seq: Composition" modality, the table presents the modality data such as "Count", "Total" and "Percentage (%)".

The second level page for CITE-Seq: Composition modality

For "CITE-Seq: GEX Pseudobulk" modality, the second level page lists all genes.

The second level page for CITE-Seq: GEX Pseudobulk modality

The third level page shows the gene expression values such as "Count", "Residual" and "RPM" for gene "ENSG00000000003", resolution "Cell Type" and cell type "ERYTH".

The third level page

As a second example, type "CXCL10" in the text box, choose "Gene" for Filter by and press the "Search" button.

Search for gene input

The new page will present the multiomics data for CXCL10 gene.

Search for gene output - bulk RNA-Seq

Search for gene output - proteomics: Luminex (plasma)

Search for gene output - proteomics: Luminex (serum)

4 Primary

Primary module houses the processed and normalised matrix data for each modality in the database including "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "Mass Cytometry: CyTOF (granulocyte panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition", "Bulk Repertoire: BCR", "Bulk Repertoire: TCR", "CITE-Seq: Single-cell BCR" and "CITE-Seq: Single-cell TCR".

Primary module

4.1 Bulk RNA-Seq

This section provides the gene expression data for each gene and each sample from "Bulk RNA-Seq" modality.

Browse all genes from modality Bulk RNA-Seq

The table displays the gene expression values such as "Count" and "Logcpm" for gene "ENSG00000000003" for all samples.

Browse samples for gene ENSG00000000003 and modality Bulk RNA-Seq
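For readers unfamiliar with the "Logcpm" column, the sketch below shows one common way to turn raw counts into log2 counts per million. It is a simplified illustration only. The normalisation actually applied to the COMBAT data is not described in this manual, and the pseudocount of 1 and the toy counts are assumptions.

```python
import math


def counts_per_million(counts):
    """Scale raw counts of one sample so that they sum to one million."""
    library_size = sum(counts.values())
    return {gene: count / library_size * 1e6 for gene, count in counts.items()}


def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount avoids log2(0) for unexpressed genes."""
    cpm = counts_per_million(counts)
    return {gene: math.log2(value + pseudocount) for gene, value in cpm.items()}


if __name__ == "__main__":
    # Toy counts for a single sample (hypothetical gene labels and values).
    toy_counts = {"geneA": 150, "geneB": 0, "geneC": 850}
    for gene, value in log_cpm(toy_counts).items():
        print(gene, round(value, 3))
```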

4.2 Proteomics: timsTOF

This section provides the protein abundance data for each protein and each sample from "Proteomics: timsTOF" modality.

Browse all proteins for modality Proteomics: timsTOF

The protein annotation information is available through the hyperlinks on the "Protein ID".

Browse the details for protein P04217 for modality Proteomics: timsTOF

This table consists of protein ID, gene name, entry name, sample ID and intensity.

Browse samples for protein P04217 from modality Proteomics: timsTOF

4.3 Proteomics: Luminex (plasma)

This section provides protein abundance data for each protein and each sample from "Proteomics: Luminex (plasma)" modality.

Browse all proteins from modality Proteomics: Luminex (plasma)

This table provides additional information for the proteins, including "Protein ID", "Gene Name" and "Full Name".

Browse the details for protein P55774 for modality Proteomics: Luminex (plasma)

Two types of protein abundance measurements such as "Fluorescence Intensity" and "Concentration" can be found in this table.

Browse samples for protein P55774 from modality Proteomics: Luminex (plasma)

4.4 Proteomics: Luminex (serum)

This section provides protein abundance data for each protein and each sample from "Proteomics: Luminex (serum)" modality.

Browse all proteins from modality Proteomics: Luminex (serum)

Browse samples for protein P55774 from modality Proteomics: Luminex (serum)

4.5 Flow Cytometry: FACS

This section provides cell frequency data for each cell type and each sample from "Flow Cytometry: FACS" modality.

Browse all data for modality Flow Cytometry: FACS

The table below presents the cell frequency data for cell population "Myeloid and B cell populations" and cell type "CD19".

Browse samples for population Myeloid and B cell populations and cell type CD19 from modality Flow Cytometry: FACS

4.6 Mass Cytometry: CyTOF (global panel)

This section provides cell frequency data for each cell type and each sample from "Mass Cytometry: CyTOF (global panel)" modality.

Browse all data for modality Mass Cytometry: CyTOF (global panel)

This table presents the cell frequency data for population "Depleted B Cells" and cell type "Naive B cells".

Browse samples for population Depleted B Cells and cell type Naive B cells from modality Mass Cytometry: CyTOF (global panel)

4.7 Mass Cytometry: CyTOF (granulocyte panel)

This section provides cell marker expression data for each cell type and each sample from "Mass Cytometry: CyTOF (granulocyte panel)" modality.

Browse all data for modality Mass Cytometry: CyTOF (granulocyte panel)

This table provides cell marker expression data for population "Monocyte Markers" and cell type "CD45".

Browse samples for population Monocyte Markers and cell type CD45 from modality Mass Cytometry: CyTOF (granulocyte panel)

4.8 CITE-Seq: GEX Pseudobulk

This section provides gene expression data for each gene, each cell type and each sample in three different levels of resolution of cell types from "CITE-Seq: GEX Pseudobulk" modality.

Browse all genes from modality CITE-Seq: GEX Pseudobulk

This page provides all the cell types for the users to choose from.

Browse cell types for gene ENSG00000000003 and modality CITE-Seq: GEX Pseudobulk

This table shows three different types of gene expression values such as "Count", "Residual" and "RPM".

Browse samples for gene ENSG00000000003, resolution Cell Type, cell type HSC and modality CITE-Seq: GEX Pseudobulk

4.9 CITE-Seq: Composition

This section provides cell count and cell frequency data for each cell type and each sample in three different levels of resolution of cell types from "CITE-Seq: Composition" modality.

Browse all cell types from modality CITE-Seq: Composition

In this page, users will be able to view the results from this modality such as "Count", "Total" and "Percentage (%)".

Browse samples for resolution Minor Cell Subset, cell type B.INT and modality CITE-Seq: Composition
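The three columns can be read together as a simple proportion. The sketch below assumes that "Total" is the number of cells of the sample at the chosen resolution and that "Percentage (%)" equals 100 × Count / Total. Both are assumptions about the table, and the cell counts are toy values.

```python
def composition_table(cell_counts):
    """Build Count / Total / Percentage rows for one sample.

    cell_counts maps a cell type label to the number of cells of that type.
    """
    total = sum(cell_counts.values())
    return [
        {
            "Cell Type": cell_type,
            "Count": count,
            "Total": total,
            "Percentage (%)": 100.0 * count / total,
        }
        for cell_type, count in cell_counts.items()
    ]


if __name__ == "__main__":
    # Toy counts for one sample (hypothetical values).
    for row in composition_table({"B": 25, "T": 60, "NK": 15}):
        print(row)
```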

4.10 Bulk Repertoire: BCR

This section provides BCR metrics data for each sample from "Bulk Repertoire: BCR" modality such as "Percentage Unmutated Expanded" and "Percentage Unmutated Unexpanded".

Browse all samples from modality Bulk Repertoire: BCR

4.11 Bulk Repertoire: TCR

This section provides TCR metrics data for each sample from "Bulk Repertoire: TCR" modality such as "Vertex Gini Index All", "V Gene Replacement Frequency All" and "D5 All".

Browse all samples from modality Bulk Repertoire: TCR

4.12 CITE-Seq: Single-cell BCR

This section provides chain annotation data for each locus category, each sample and each contig from "CITE-Seq: Single-cell BCR" modality.

Browse all locus categories from modality CITE-Seq: Single-cell BCR

The next page will provide a list of samples.

Browse samples for Locus HC Heavy, Contig QC HC singleton, Locus LC Light and Contig QC LC singleton from CITE-Seq: Single-cell BCR

The final page shows the contig information.

Browse contigs for Locus HC Heavy, Contig QC HC singleton, Locus LC Light and Contig QC LC singleton for sample G05061-Ja005E-PBCa from CITE-Seq: Single-cell BCR

4.13 CITE-Seq: Single-cell TCR

This section provides the chain annotation data for each locus, each sample and each clone from "CITE-Seq: Single-cell TCR" modality.

Browse all locus categories from modality CITE-Seq: Single-cell TCR

A list of samples is shown on this page.

Browse samples for Chain Composition single_alpha from modality CITE-Seq: Single-cell TCR

This page provides information about each clone.

Browse clones for Chain Composition single_alpha for sample G05061-Ja005E-PBCa from modality CITE-Seq: Single-cell TCR

5 Integrate

The Integrate module displays the integrative analysis across modality data including "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex", "Mass Cytometry: CyTOF", "CITE-Seq: GEX Pseudobulk" and "CITE-Seq: Composition".

Integrate module

5.1 Tensor Decomposition

This section displays the results generated from tensor decomposition analysis of the multiomics datasets, namely loading scores and posterior inclusion probabilities. The data types are "Cell types for expression", "CITE-Seq cell types", "CyTOF depleted cell types", "CyTOF nondepleted cell types", "Genes", "Luminex proteins", "Samples" and "timsTOF proteins". "All" covers all of the above data types.

Browse all data types for modality Tensor Decomposition

This table lists all components.

Browse components for data type CITE-Seq cell types from modality Tensor Decomposition

This table provides loading scores and posterior inclusion probabilities for data type "CITE-Seq cell types", component "1" and a number of features.

Browse features for data type CITE-Seq cell types and component 1 from modality Tensor Decomposition
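One way to work with a downloaded feature table is to keep the features whose posterior inclusion probability exceeds a chosen threshold and rank them by the magnitude of their loading score. The sketch below illustrates this. The 0.5 threshold, the field names and the toy values are assumptions, not recommendations from the COMBAT analysis.

```python
def top_features(rows, min_pip=0.5):
    """Keep features with PIP >= min_pip, ordered by decreasing |loading|."""
    kept = [row for row in rows if row["pip"] >= min_pip]
    return sorted(kept, key=lambda row: abs(row["loading"]), reverse=True)


if __name__ == "__main__":
    # Toy rows for one component (hypothetical features and values).
    component_1 = [
        {"feature": "feature_1", "loading": 0.80, "pip": 0.99},
        {"feature": "feature_2", "loading": -1.20, "pip": 0.95},
        {"feature": "feature_3", "loading": 0.05, "pip": 0.10},
    ]
    for row in top_features(component_1):
        print(row["feature"], row["loading"], row["pip"])
```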

6 Compare

The Compare module displays the results of differential abundance analysis between the selected comparator groups and provides statistical measurements such as fold change and FDR for the following modalities: "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition" and "scATAC-Seq".

Compare module

6.1 Bulk RNA-Seq

This page asks the users to choose comparator group 1, comparator group 2 and data type.

Options for modality Bulk RNA-Seq

For instance, the users have chosen "COVID_IP_critical" for comparator group 1, "COVID_IP_mild" for comparator group 2 and "Gene" for data type on the following page.

An input example for modality Bulk RNA-Seq

After the search button is pressed, a new page returns the differential gene expression table with gene ID, gene name, AveExpr, logFC, P-value and FDR, using cut-offs of FDR < 0.05 and fold change > 1.5.

A result example for modality Bulk RNA-Seq
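The cut-offs quoted for the Bulk RNA-Seq comparison (FDR < 0.05 and fold change > 1.5) can be applied to any table with logFC and FDR columns. The sketch below assumes that logFC is a log2 fold change and that the fold change cut-off applies in both directions. Neither assumption is stated in this manual, and the rows are toy values.

```python
import math


def filter_hits(rows, fdr_cutoff=0.05, fold_change_cutoff=1.5):
    """Keep rows with FDR below the cut-off and |log2 fold change| above it."""
    min_abs_logfc = math.log2(fold_change_cutoff)  # 1.5-fold is about 0.585 on the log2 scale
    return [
        row for row in rows
        if row["FDR"] < fdr_cutoff and abs(row["logFC"]) > min_abs_logfc
    ]


if __name__ == "__main__":
    # Toy differential expression rows (hypothetical genes and statistics).
    table = [
        {"gene": "geneA", "logFC": 1.2, "FDR": 0.01},
        {"gene": "geneB", "logFC": 0.3, "FDR": 0.001},
        {"gene": "geneC", "logFC": -2.0, "FDR": 0.20},
        {"gene": "geneD", "logFC": -0.9, "FDR": 0.04},
    ]
    print([row["gene"] for row in filter_hits(table)])
```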

6.2 Proteomics: timsTOF

This section allows the users to choose three options (comparator group 1, comparator group 2 and data type) and presents the table of differentially abundant proteins with protein ID, gene name, logFC, AveExpr, t, P-value and FDR.

Options for modality Proteomics: timsTOF

An input example for modality Proteomics: timsTOF

A result example for modality Proteomics: timsTOF

6.3 Proteomics: Luminex (plasma)

Options for modality Proteomics: Luminex (plasma)

An input example for modality Proteomics: Luminex (plasma)

A result example for modality Proteomics: Luminex (plasma)

6.4 Proteomics: Luminex (serum)

Options for modality Proteomics: Luminex (serum)

An input example for modality Proteomics: Luminex (serum)

A result example for modality Proteomics: Luminex (serum)

6.5 Flow Cytometry: FACS

This section allows the users to choose three options (comparator group 1, comparator group 2 and data type) and presents the table of differentially abundant clusters with cluster name, logFC, P-value and FDR.

Options for modality Flow Cytometry: FACS

An input example for modality Flow Cytometry: FACS

A result example for modality Flow Cytometry: FACS

6.6 Mass Cytometry: CyTOF (global panel)

This section allows the users to choose three options (comparator group 1, comparator group 2 and cell population) and presents the table of differentially abundant clusters with cluster name, logFC, P-value and FDR.

Options for modality Mass Cytometry: CyTOF (global panel)

An input example for modality Mass Cytometry: CyTOF (global panel)

A result example for modality Mass Cytometry: CyTOF (global panel)

6.7 CITE-Seq: GEX Pseudobulk

This section allows the users to choose four options (comparator group 1, comparator group 2, cell group resolution and data type) and presents the table of differentially expressed genes with gene ID, gene name, logFC, logCPM, LR, P-value and FDR.

Options for modality CITE-Seq: GEX Pseudobulk

An input example for modality CITE-Seq: GEX Pseudobulk

A result example for modality CITE-Seq: GEX Pseudobulk

6.8 CITE-Seq: Composition

This section allows the users to choose four options (comparator group 1, comparator group 2, cell group resolution and data type) and presents the table of differentially abundant cell clusters with cell population name, logFC, logCPM, F, P-value and FDR.

Options for modality CITE-Seq: Composition

An input example for modality CITE-Seq: Composition

A result example for modality CITE-Seq: Composition

6.9 scATAC-Seq

This section allows the users to choose three options (comparator group 1, comparator group 2 and cell type) and presents the table of differentially accessible peaks with chromosome name, start, end, MeanDiff, log2FC and FDR.

Options for modality scATAC-Seq

An input example for modality scATAC-Seq

A result example for modality scATAC-Seq

7 RShiny

The RShiny module performs differential abundance analysis and produces tables and plots based on user-provided options for the following modalities: "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition", "Bulk Repertoire: BCR", "Bulk Repertoire: TCR" and "CITE-Seq: Single-cell TCR".

RShiny module

7.1 Bulk RNA-Seq

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold and fold change threshold. Differential gene expression analysis is performed with the limma package for a comparison between two of the source groups such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and the results are presented as volcano plots, tables and boxplots. The main results can be downloaded using the download buttons next to the tables and graphs. When any option is changed, the affected results are refreshed automatically to reflect the updated parameters. Two additional options, a gene name and a covariate, control the generation of the boxplots.

Options from the differential gene expression analysis for the bulk RNA-Seq modality

Volcano plot and differential expression table from the differential gene expression analysis for the bulk RNA-Seq modality

Boxplot from the differential gene expression analysis for the bulk RNA-Seq modality
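To make the FDR threshold option concrete, the sketch below adjusts a list of P-values with the Benjamini-Hochberg procedure, which is the default multiple-testing adjustment in limma. Whether the app uses this default is an assumption, and the P-values are toy numbers.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    toy_p_values = [0.01, 0.04, 0.03, 0.005]
    fdr_threshold = 0.05
    for p, q in zip(toy_p_values, benjamini_hochberg(toy_p_values)):
        print(p, round(q, 4), "significant" if q < fdr_threshold else "not significant")
```

A feature passes the FDR threshold when its adjusted value, not its raw P-value, falls below the chosen cut-off.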

7.2 Proteomics: timsTOF

Options from the differential abundance analysis for the proteomics: timsTOF modality

Volcano plot and differential abundance table from differential abundance analysis for the proteomics: timsTOF modality

Boxplot from differential abundance analysis for the proteomics: timsTOF modality

7.3 Proteomics: Luminex (plasma)

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, P-value threshold and fold change threshold. Differential protein abundance analysis is performed with a t-test for a comparison between two of the source groups such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and the results are presented as volcano plots, tables and boxplots.

Options from the differential abundance analysis for the proteomics: Luminex (plasma) modality

Volcano plot and differential expression table from differential abundance analysis for the proteomics: Luminex (plasma) modality

Boxplot from differential abundance analysis for the proteomics: Luminex (plasma) modality

7.4 Proteomics: Luminex (serum)

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, P-value threshold and fold change threshold. Differential protein abundance analysis is performed with a t-test for a comparison between two of the source groups such as "Healthy", "COVID-19 (critical)" and "Flu", and the results are presented as volcano plots, tables and boxplots.

Options from the differential abundance analysis for the proteomics: Luminex (serum)

Volcano plot and differential expression table from differential abundance analysis for the proteomics: Luminex (serum)

Boxplot from differential abundance analysis for the proteomics: Luminex (serum)

7.5 Flow Cytometry: FACS

Options from the differential abundance analysis for the Flow Cytometry: FACS modality

Volcano plot and differential expression table from differential abundance analysis for the Flow Cytometry: FACS modality

Boxplot from differential abundance analysis for the Flow Cytometry: FACS modality

7.6 Mass Cytometry: CyTOF (global panel)

Options from the differential abundance analysis for the Mass Cytometry: CyTOF (global panel) modality

Volcano plot and differential expression table from differential abundance analysis for the Mass Cytometry: CyTOF (global panel) modality

Boxplot from differential abundance analysis for the Mass Cytometry: CyTOF (global panel) modality

7.7 CITE-Seq: GEX Pseudobulk

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold, fold change threshold, cell group resolution and a cell cluster name. Differential gene expression analysis is performed with the edgeR package for a comparison between two of the source groups such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and the results are presented as volcano plots, tables and boxplots.

Options from the differential abundance analysis for the CITE-Seq: GEX Pseudobulk modality

Volcano plot and differential expression table from differential abundance analysis for the CITE-Seq: GEX Pseudobulk modality

Boxplot from differential abundance analysis for the CITE-Seq: GEX Pseudobulk modality

7.8 CITE-Seq: Composition

Options from the differential abundance analysis for the CITE-Seq: Composition modality

Volcano plot and differential expression table from differential abundance analysis for the CITE-Seq: Composition modality

Boxplot from differential abundance analysis for the CITE-Seq: Composition modality

7.9 Bulk Repertoire: BCR

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, metric, P-value threshold and fold change threshold. Differential abundance analysis is performed with a Wilcoxon test for a comparison between two of the source groups such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and the results are presented as volcano plots, tables and boxplots.

Options from the differential abundance analysis for the Bulk Repertoire: BCR modality

Volcano plot and differential expression table from differential abundance analysis for the Bulk Repertoire: BCR modality

Boxplot from differential abundance analysis for the Bulk Repertoire: BCR modality
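As background on the test named in this section, the sketch below computes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test for one metric between a case group and a control group. It uses the normal approximation without tie or continuity correction, which is a simplification chosen to keep the example dependency-free. It is not necessarily the variant run by the app, and the metric values are toy numbers.

```python
import math


def rank_sum_test(case, control):
    """Return (U statistic of the case group, two-sided P-value).

    The P-value uses the normal approximation, with no tie or continuity correction.
    """
    # Pool both groups, tagging each value with its group (0 = case, 1 = control).
    pooled = sorted(
        (value, group)
        for group, values in enumerate((case, control))
        for value in values
    )
    rank_sum_case = 0.0
    i = 0
    while i < len(pooled):
        # Find the block of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        n_case_in_block = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_case += average_rank * n_case_in_block
        i = j
    n1, n2 = len(case), len(control)
    u = rank_sum_case - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


if __name__ == "__main__":
    # Toy metric values for two source groups (hypothetical numbers).
    u, p = rank_sum_test([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
    print(u, round(p, 4))
```

For small groups a statistics package would normally report an exact P-value instead of this approximation.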

7.10 Bulk Repertoire: TCR

Options from the differential abundance analysis for the Bulk Repertoire: TCR modality

Volcano plot and differential expression table from differential abundance analysis for the Bulk Repertoire: TCR modality

Boxplot from differential abundance analysis for the Bulk Repertoire: TCR modality

7.11 CITE-Seq: Single-cell TCR

This app accepts the following options: first comparator group (case group), second comparator group (control group), sample inclusion strategy, metric, P-value threshold and fold change threshold. Differential abundance analysis is performed with a Wilcoxon test for a comparison between two of the source groups such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and the results are presented as volcano plots, tables and boxplots.

Options from the differential abundance analysis for the CITE-Seq: Single-cell TCR modality

Volcano plot and differential expression table from differential abundance analysis for the CITE-Seq: Single-cell TCR modality

Boxplot from differential abundance analysis for the CITE-Seq: Single-cell TCR modality

8 Visualisation

The Visualisation module enables data mining and visualisation for modalities such as "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition" and "Tensor Decomposition".

Visualisation module

8.1 Bulk RNA-Seq

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the bulk RNA-Seq modality

PCA plot and correlation plot from the data visualisation for the bulk RNA-Seq modality

PCA loadings plot and heatmap from the data visualisation for the bulk RNA-Seq modality

8.2 Proteomics: timsTOF

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the proteomics: timsTOF modality

PCA plot and correlation plot from the data visualisation for the proteomics: timsTOF modality

PCA loadings plot and heatmap from the data visualisation for the proteomics: timsTOF modality

8.3 Proteomics: Luminex (plasma)

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the Proteomics: Luminex (plasma) modality

PCA plot and correlation plot from the data visualisation for the Proteomics: Luminex (plasma) modality

PCA loadings plot and heatmap from the data visualisation for the Proteomics: Luminex (plasma) modality

8.4 Proteomics: Luminex (serum)

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the Proteomics: Luminex (serum) modality

PCA plot and correlation plot from the data visualisation for the Proteomics: Luminex (serum) modality

PCA loadings plot and heatmap from the data visualisation for the Proteomics: Luminex (serum) modality

8.5 Flow Cytometry: FACS

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the Flow Cytometry: FACS modality

PCA plot and correlation plot from the data visualisation for the Flow Cytometry: FACS modality

PCA loadings plot and heatmap from the data visualisation for the Flow Cytometry: FACS modality

8.6 Mass Cytometry: CyTOF (global panel)

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the Mass Cytometry: CyTOF (global panel) modality

PCA plot and correlation plot from the data visualisation for the Mass Cytometry: CyTOF (global panel) modality

PCA loadings plot and heatmap from the data visualisation for the Mass Cytometry: CyTOF (global panel) modality

8.7 CITE-Seq: GEX Pseudobulk

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the CITE-Seq: GEX Pseudobulk modality

PCA plot and correlation plot from the data visualisation for the CITE-Seq: GEX Pseudobulk modality

PCA loadings plot and heatmap from the data visualisation for the CITE-Seq: GEX Pseudobulk modality

8.8 CITE-Seq: Composition

This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.

Options from the data visualisation for the CITE-Seq: Composition modality

PCA plot and correlation plot from the data visualisation for the CITE-Seq: Composition modality

PCA loadings plot and heatmap from the data visualisation for the CITE-Seq: Composition modality

8.9 Tensor Decomposition

This app accepts options such as a component and the posterior inclusion probability, and produces graphs that present the loading scores for samples, cell types for expression, CITE-Seq cell types, CyTOF depleted cell types, CyTOF nondepleted cell types, Luminex proteins, timsTOF proteins and genes.

Options from the data visualisation for the tensor decomposition modality

Boxplot of samples and barplot of cell types for expression from the data visualisation for the tensor decomposition modality

Barplot of CITE-Seq cell types from the data visualisation for the tensor decomposition modality
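One plausible reading of the posterior inclusion probability option is that it acts as a cut-off on which features are shown for a component. The sketch below illustrates that reading only. The manual does not describe how the app applies the value, so the function name, the input dictionaries and the ranking by absolute loading score are all assumptions.

```python
def top_contributors(loadings, pip, pip_threshold=0.5, top_n=5):
    """Rank the features of one component by absolute loading score.

    loadings: dict mapping feature name -> loading score on the component.
    pip: dict mapping feature name -> posterior inclusion probability.
    Both layouts are hypothetical and chosen for this sketch.
    """
    # Keep only features whose inclusion probability reaches the threshold;
    # features without a recorded probability are treated as excluded.
    kept = [
        (name, score)
        for name, score in loadings.items()
        if pip.get(name, 0.0) >= pip_threshold
    ]
    # Largest contributions first, regardless of sign.
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept[:top_n]
```

Sorting by absolute value keeps strongly negative loadings near the top of the list, so that large contributions in either direction are visible.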

9 Download

The Download module provides links for downloading the raw datasets and the key processed datasets.

Download module

9.1 Raw Datasets

This section provides the dataset ID, description, accession number and the link to access the data for raw datasets.

Download the raw datasets

For example, clicking on the link under the "Dataset" column for CBD-RAW-CLINVAR directs the users to the webpage on the European Genome-phenome Archive (EGA) for further instructions.

An example dataset on EGA

9.2 Key Processed Datasets

This section shows the dataset ID, description, accession number and the link to the data for key processed datasets.

Download the key processed datasets

10 A case study

It is anticipated that users will be able to use COMBATdb to address biological questions relating to COVID-19 in various ways. In this case study, we demonstrate the usefulness of the database by comparing the patient groups across a wide spectrum of modalities, using a variety of presentations and visualisations of the datasets.

One of the key questions in COVID-19 studies is the discovery of biomarkers and the understanding of the molecular mechanisms of disease severity. To address this question, we performed analyses using the shiny apps for the proteomics: timsTOF modality, bulk RNA-Seq modality, CITE-Seq: GEX pseudobulk modality, CITE-Seq: composition modality, mass cytometry: CyTOF modality (global panel) and tensor decomposition modality.

Firstly, we identified the differentially expressed proteins from the proteomics: timsTOF modality for two comparisons.

The expression of LRG1 (leucine rich alpha-2-glycoprotein 1) is significantly different between COVID-19 (critical) (case group) and COVID-19 (mild) (control group), with FDR=0.00869 and log2FC=0.88.

A case study - proteomics: timsTOF modality

The LRG1 protein is also differentially expressed between COVID-19 (critical) (case group) and COVID-19 (severe) (control group), with FDR=0.021 and log2FC=0.6.

A case study - proteomics: timsTOF modality
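The log2FC values reported above can be converted to linear fold changes for easier interpretation. The short sketch below assumes that log2FC is the base-2 logarithm of the case-to-control ratio, which is the usual convention but is not stated explicitly in this manual. The helper name is hypothetical.

```python
def log2fc_to_fold_change(log2fc):
    """Convert a log2 fold change to a linear ratio.

    Assumes log2fc = log2(case / control); this convention is an assumption.
    """
    return 2.0 ** log2fc


# log2FC values for LRG1 taken from the two timsTOF comparisons above.
for label, log2fc in [("critical vs mild", 0.88), ("critical vs severe", 0.6)]:
    ratio = log2fc_to_fold_change(log2fc)
    print(f"{label}: log2FC={log2fc} corresponds to about {ratio:.2f}-fold")
```

Under this convention a log2FC of 1 corresponds to a doubling in the case group, and a negative log2FC to a lower level in the case group than in the control group.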

The boxplot confirms that protein expression increases gradually along the severity scale, while the healthy control group has a relatively lower level of expression.

A case study - proteomics: timsTOF modality

Furthermore, the PCA plot suggests a potential separation of the participant groups along the PC1 axis.

A case study - proteomics: timsTOF modality

LRG1 has the fourth largest loading score on PC1.

A case study - proteomics: timsTOF modality

Next, we examine whether this gene exhibits a similar pattern at the level of gene expression using bulk and single-cell RNA-Seq data. The differential expression analysis for the bulk RNA-Seq modality shows that the LRG1 gene is indeed significantly altered in both comparisons. For COVID-19 (critical) (case group) vs COVID-19 (mild) (control group), it has FDR=0.0000292 and log2FC=1.51.

A case study - bulk RNA-Seq modality

The LRG1 gene is also differentially expressed between COVID-19 (critical) (case group) and COVID-19 (severe) (control group), with FDR=0.00236 and log2FC=0.97.

A case study - bulk RNA-Seq modality

Looking further into gene expression in specific cell types, we performed the differential expression analysis via the CITE-Seq: GEX pseudobulk modality and observed that the test for the LRG1 gene is significant in the MNP cell type across the three comparisons, with varying fold changes. The differential expression analysis for COVID-19 (critical) (case group) vs COVID-19 (mild) (control group) shows FDR=0.00775 and log2FC=1.38.

A case study - CITE-Seq: GEX pseudobulk modality

The differential expression analysis for COVID-19 (critical) (case group) vs COVID-19 (severe) (control group) shows the LRG1 gene with FDR=0.00552 and log2FC=0.93.

A case study - CITE-Seq: GEX pseudobulk modality

The differential expression analysis for COVID-19 (severe) (case group) vs COVID-19 (mild) (control group) shows the LRG1 gene with FDR=0.0392 and log2FC=0.75.

A case study - CITE-Seq: GEX pseudobulk modality

On the other hand, we can determine whether a specific cell population has changed in proportion relative to the total population by looking at the single-cell data and mass cytometry data. For instance, PLT stands out as a differentially abundant cell type identified from the CITE-Seq: composition modality for two comparisons. The comparison between COVID-19 (critical) (case group) and COVID-19 (mild) (control group) shows this cell type with FDR=0.00277 and log2FC=2.57.

A case study - CITE-Seq: composition modality

The comparison between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) shows this cell type with FDR=0.0252 and log2FC=1.65.

A case study - CITE-Seq: composition modality

For the mass cytometry: CyTOF modality (global panel), the abundance of NK cells in non-depleted samples is significantly altered between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) with FDR=0.0313 and log2FC=-0.83.

A case study - mass cytometry: CyTOF modality (global panel)

The comparison between COVID-19 (critical) (case group) and COVID-19 (mild) (control group) reports the NK cells with FDR=0.0282 and log2FC=-0.81.

A case study - mass cytometry: CyTOF modality (global panel)

Finally, we can further explore the integrated analysis data in the tensor decomposition modality. For example, component 171 with posterior inclusion probability >= 0.5 has loading scores for samples belonging to different participant groups that are progressively lower with more severe disease, indicating that this component is related to the severity of disease.

A case study - tensor decomposition modality

In the barplot of cell types for expression, MNP makes the largest contribution to this component.

A case study - tensor decomposition modality

In the barplot of timsTOF proteins, SAA1 is the largest contributor and LRG1 is the fifth largest contributor in terms of negative loading scores.

A case study - tensor decomposition modality

The barplot of genes shows the loading scores of genes contributing to this component.

A case study - tensor decomposition modality