Manual
COMBATdb is a database of the multiomics datasets and analyses generated by the COVID-19 Multi-omic Blood Atlas (COMBAT) project (https://www.combat.ox.ac.uk/), which defines the molecular signatures associated with the pathogenesis and severity of COVID-19. It comprises metadata and multiomics molecular data, and supports the calculation and visualisation of primary and integrative data.
1 Home
The Home page summarises the major components and functionalities of COMBATdb and shows a diagram of the layout and structure of the database.
2 Browse
The Browse module is the entry point for accessing the metadata and modality data and the relationships among them, including modalities, participants, participant timepoints, samples, sources, genes and cell types.
2.1 Modalities
This section enables the users to browse all modalities in order to show the participants, participant timepoints, samples and sources associated with each modality.
The first column in the table gives the modality name; clicking on it leads to the details of that modality, including modality ID, modality name, category and level.
The numbers under the "Participants", "Participant Timepoints", "Samples" and "Sources" columns indicate the number of participants, participant timepoints, samples and sources for each modality. Clicking on the number under the "Participants" column shows a list of participants that have data for this particular modality.
Clicking on the number under the "Participant Timepoints" column, the users will be able to check a list of participant timepoints that have data for this particular modality.
Clicking on the number under the "Samples" column, the users will be able to examine a list of samples that have data for this particular modality.
Clicking on the number under the "Sources" column, the new page gives a list of sources that have data for this particular modality.
2.2 Participants
This section enables the users to browse all participants in order to show the modalities, participant timepoints, samples, sources as well as individual modality data and multiomics data associated with each participant.
Clicking on a participant ID opens a new page with detailed information for that participant, including participant ID, age range and sex.
Clicking on the number under the "Modalities Applied" column for participant G04225 shows one modality, "Proteomics: timsTOF", for which data are available for this participant.
The table shows the participant timepoints that are derived from participant "G04225".
Similarly, after clicking on the number under the "Samples" column, a list of samples associated with participant G04225 will be shown on a new page.
The clickable number under the "Sources" column for participant G04225 will lead to a new page showing that samples categorized as "COVID_community" are available from this participant.
All the available individual modality data for each participant are shown on each row across 13 modality columns. In addition, clicking on the link under the "Multiomics Data" column for participant G04225 will display data from across modalities for this participant of interest.
2.3 Participant Timepoints
This section enables the users to browse all participant timepoints in order to show the modalities, samples as well as individual modality data and multiomics data associated with each participant timepoint.
The participant timepoint ID is clickable and opens a new page with the details of the participant timepoint, including participant timepoint ID, participant ID, sample priority, sampled at max severity, timepoint and source.
The "Modalities Applied" column lists the modalities that have data for a specific participant timepoint.
The "Samples" column shows a list of samples that are derived from a specific participant timepoint.
All the available individual modality data for each participant timepoint are shown on each row across 13 modality columns. In addition, clicking on the link under the "Multiomics Data" column shows all data from the different modalities for a participant timepoint of interest.
2.4 Samples
This section enables the users to browse all samples in order to show the modalities as well as individual modality data and multiomics data associated with each sample.
Clicking on any entry under "ID" column will present the details for each sample, including participant timepoint ID, participant ID, cohort, patient ID, site, sub sample ID, sample type, derived sample type, processing site and batch letter.
Furthermore, the items in the right-hand column are clickable. For example, clicking on "Whole Blood (EDTA)" lists all of the samples that have this sample type.
The "Modalities Applied" column will help provide modality information for the samples.
The "Modality 1" - "Modality 13" columns provide single modality data for the samples. For example, clicking on the "Flow Cytometry: FACS" under "Modality 5" column for sample "G05061-Ja005E-PBCa" leads to the page below.
Lastly, "Multiomics Data" column will facilitate the examination of the primary analysis of multiomics data for the samples.
2.5 Sources
This section enables the users to browse all sources in order to show the modalities, participants, participant timepoints and samples associated with each source.
Clicking on an item under the "ID" column shows more details for the source category, including source ID, name and short name.
The "Modalities Applied" column shows the modalities that have data from this source category.
Similarly, users will be able to find out the participants, participant timepoints and samples that are assigned to this specific source category.
2.6 Genes
This section enables the users to browse all genes as well as single modality data and multiomics data associated with each gene.
Clicking on "ENSG00000000003" under the "Gene ID" column displays a table with the details of that gene.
This table shows the single modality data for a gene.
This is an example of the multiomics data for gene "ENSG00000000971".
2.7 Cell Types
This section enables the users to browse all cell types in order to show the participant timepoints and samples associated with each cell type as well as cell type related multiomics data.
The table below provides a list of samples for resolution "Cell Type" and Cell Type "B".
Here is the table showing the multiomics data for resolution "Cell Type", Cell Type "B" and sample "G05061-Ja005E-PBCa".
3 Search
The Search module allows users to search for specific terms such as modalities, participants, participant timepoints, samples, sources, genes and cell types. There are four search functions: "single mode - metadata", "multiple mode - metadata", "customisable mode - metadata" and "single mode - multiomics data".
3.1 Single Mode - Metadata
This section allows one to search for the details of a single modality, participant, participant timepoint, sample, source, gene or cell type. In addition, "filter by all" returns the results for all possible types of data.
Here is an example of term = "G05061-Ja005E-PBCa" and Filter by = "Sample".
The search results return the detailed information for the sample "G05061-Ja005E-PBCa".
3.2 Multiple Mode - Metadata
This section allows one to search for the details of multiple modalities, participants, participant timepoints, samples, sources, genes and cell types. In addition, "filter by all" returns the results for all possible types of data.
In this example, the input is five gene IDs or gene names and Filter by is "Gene".
The search results present the detailed annotations for each of the five query genes.
3.3 Customisable Mode - Metadata
This section provides the capability to generate a list of participants, participant timepoints or samples based on the user-defined modality and source.
In this example, modality = "Bulk RNA-Seq", source = "COVID_IP_critical" and Filter by = "Sample".
The results return a list of samples for modality "Bulk RNA-Seq" and source "COVID_IP_critical".
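The customisable search can be pictured as a filter over sample records by modality and source. The following is only a minimal sketch of that idea, not the database's actual query logic; the record layout, the field names and the sample IDs "S1" to "S4" are hypothetical.

```python
# Hypothetical sample records; field names and sample IDs are illustrative only.
samples = [
    {"sample_id": "S1", "modality": "Bulk RNA-Seq", "source": "COVID_IP_critical"},
    {"sample_id": "S2", "modality": "Bulk RNA-Seq", "source": "COVID_IP_mild"},
    {"sample_id": "S3", "modality": "Proteomics: timsTOF", "source": "COVID_IP_critical"},
    {"sample_id": "S4", "modality": "Bulk RNA-Seq", "source": "COVID_IP_critical"},
]


def find_samples(records, modality, source):
    """Return the sorted, de-duplicated sample IDs matching both criteria."""
    return sorted(
        {r["sample_id"] for r in records
         if r["modality"] == modality and r["source"] == source}
    )


matches = find_samples(samples, "Bulk RNA-Seq", "COVID_IP_critical")
print(matches)
```

Filtering by participant or participant timepoint would follow the same pattern with a different ID field.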
3.4 Single Mode - Multiomics Data
This section presents the multiomics data for a single user-provided participant, participant timepoint, sample, gene or cell type.
In this example, type "ERYTH" in the text box, choose "Cell type" for Filter by and press the "Search" button.
The first level page provides a number of modalities for users to choose from.
For "CITE-Seq: Composition" modality, the table presents the modality data such as "Count", "Total" and "Percentage (%)".
For "CITE-Seq: GEX Pseudobulk" modality, the second level page lists all genes.
The third level page shows the gene expression values such as "Count", "Residual" and "RPM" for gene "ENSG00000000003", resolution "Cell Type" and cell type "ERYTH".
As a second example, type "CXCL10" in the text box, choose "Gene" for Filter by and press the "Search" button.
The new page will present the multiomics data for CXCL10 gene.
4 Primary
Primary module houses the processed and normalised matrix data for each modality in the database including "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "Mass Cytometry: CyTOF (granulocyte panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition", "Bulk Repertoire: BCR", "Bulk Repertoire: TCR", "CITE-Seq: Single-cell BCR" and "CITE-Seq: Single-cell TCR".
4.1 Bulk RNA-Seq
This section provides the gene expression data for each gene and each sample from "Bulk RNA-Seq" modality.
The table displays the gene expression values such as "Count" and "Logcpm" for gene "ENSG00000000003" for all samples.
4.2 Proteomics: timsTOF
This section provides the protein abundance data for each protein and each sample from "Proteomics: timsTOF" modality.
The protein annotation information is available through the hyperlinks on the "Protein ID".
This table consists of protein ID, gene name, entry name, sample ID and intensity.
4.3 Proteomics: Luminex (plasma)
This section provides protein abundance data for each protein and each sample from "Proteomics: Luminex (plasma)" modality.
This table provides additional information for the proteins, including "Protein ID", "Gene Name" and "Full Name".
Two types of protein abundance measurement, "Fluorescence Intensity" and "Concentration", can be found in this table.
4.4 Proteomics: Luminex (serum)
This section provides protein abundance data for each protein and each sample from "Proteomics: Luminex (serum)" modality.
4.5 Flow Cytometry: FACS
This section provides cell frequency data for each cell type and each sample from "Flow Cytometry: FACS" modality.
The table below presents the cell frequency data for cell population "Myeloid and B cell populations" and cell type "CD19".
4.6 Mass Cytometry: CyTOF (global panel)
This section provides cell frequency data for each cell type and each sample from "Mass Cytometry: CyTOF (global panel)" modality.
This table presents the cell frequency data for population "Depleted B Cells" and cell type "Naive B cells".
4.7 Mass Cytometry: CyTOF (granulocyte panel)
This section provides cell marker expression data for each cell type and each sample from "Mass Cytometry: CyTOF (granulocyte panel)" modality.
This table provides cell marker expression data for population "Monocyte Markers" and cell type "CD45".
4.8 CITE-Seq: GEX Pseudobulk
This section provides gene expression data for each gene, each cell type and each sample at three levels of cell type resolution from the "CITE-Seq: GEX Pseudobulk" modality.
This page provides all the cell types for the users to choose from.
This table shows three types of gene expression value: "Count", "Residual" and "RPM".
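To illustrate how an "RPM" value relates to a "Count", here is a minimal sketch that assumes RPM means reads per million of the total count in one pseudobulk profile. The manual does not state the exact normalisation used, and the counts and the placeholder name "OTHER_GENES" are made up for illustration.

```python
def reads_per_million(gene_counts):
    """Scale raw counts so that they sum to one million within a profile."""
    library_size = sum(gene_counts.values())
    if library_size == 0:
        raise ValueError("library size is zero; RPM is undefined")
    return {gene: count * 1_000_000 / library_size
            for gene, count in gene_counts.items()}


# Made-up counts for one hypothetical sample and cell type.
counts = {"ENSG00000000003": 5, "ENSG00000000971": 20, "OTHER_GENES": 975}
rpm = reads_per_million(counts)
print(rpm["ENSG00000000003"])
```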
4.9 CITE-Seq: Composition
This section provides cell count and cell frequency data for each cell type and each sample at three levels of cell type resolution from the "CITE-Seq: Composition" modality.
On this page, users can view the results from this modality, such as "Count", "Total" and "Percentage (%)".
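The three columns are presumably related in the obvious way. The sketch below assumes that "Total" is the total number of cells in a sample at the chosen resolution and that "Percentage (%)" is the share of one cell type in that total; the cell counts are invented for illustration.

```python
def composition_table(cell_counts):
    """Build Count / Total / Percentage (%) rows for one sample."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("sample has no cells")
    return {
        cell_type: {
            "Count": count,
            "Total": total,
            "Percentage (%)": 100.0 * count / total,
        }
        for cell_type, count in cell_counts.items()
    }


# Invented counts for one hypothetical sample.
table = composition_table({"B": 50, "MNP": 150, "PLT": 50})
print(table["B"])
```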
4.10 Bulk Repertoire: BCR
This section provides BCR metrics data for each sample from "Bulk Repertoire: BCR" modality such as "Percentage Unmutated Expanded" and "Percentage Unmutated Unexpanded".
4.11 Bulk Repertoire: TCR
This section provides TCR metrics data for each sample from "Bulk Repertoire: TCR" modality such as "Vertex Gini Index All", "V Gene Replacement Frequency All" and "D5 All".
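The manual does not define these metrics. As a rough intuition for a Gini-type metric such as "Vertex Gini Index All", the sketch below computes a standard Gini coefficient over a list of sizes (for example, reads per unique sequence). That this is how the database computes the metric is an assumption, and the input values are invented.

```python
def gini(values):
    """Standard Gini coefficient: 0 = perfectly even, near 1 = dominated by one item."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


even_repertoire = gini([1, 1, 1, 1])       # all sequences equally frequent
expanded_repertoire = gini([0, 0, 0, 10])  # one sequence dominates
print(even_repertoire, expanded_repertoire)
```

Under this reading, a higher value would indicate a repertoire dominated by a few expanded sequences.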
4.12 CITE-Seq: Single-cell BCR
This section provides chain annotation data for each locus category, each sample and each contig from "CITE-Seq: Single-cell BCR" modality.
The next page will provide a list of samples.
The final page shows the contig information.
4.13 CITE-Seq: Single-cell TCR
This section provides the chain annotation data for each locus, each sample and each clone from "CITE-Seq: Single-cell TCR" modality.
A list of samples is shown on this page.
This page provides information about each clone.
5 Integrate
The Integrate module displays the integrative analysis across modalities, including "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex", "Mass Cytometry: CyTOF", "CITE-Seq: GEX Pseudobulk" and "CITE-Seq: Composition".
5.1 Tensor Decomposition
This section displays the results generated from tensor decomposition analysis of multiomics datasets such as loading scores and posterior inclusion probabilities. Data type includes "Cell types for expression", "CITE-Seq cell types", "CyTOF depleted cell types", "CyTOF nondepleted cell types", "Genes", "Luminex proteins", "Samples" and "timsTOF proteins". "All" covers all the above-mentioned data types.
This table lists all components.
This table provides loading scores and posterior inclusion probabilities for data type "CITE-Seq cell types", component "1" and a number of features.
6 Compare
Compare module displays the results of differential abundance analysis between the selected comparator groups and provides the statistical measurements such as fold change and FDR for the modalities such as "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition" and "scATAC-Seq".
6.1 Bulk RNA-Seq
This page asks the users to choose comparator group 1, comparator group 2 and data type.
For instance, the users have chosen "COVID_IP_critical" for comparator group 1, "COVID_IP_mild" for comparator group 2 and "Gene" for data type on the following page.
After the search button is pressed, a new page returns the differential gene expression table with gene ID, gene name, AveExpr, logFC, P-value and FDR, filtered with the cut-offs FDR < 0.05 and fold change > 1.5.
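The two cut-offs can be read as a row filter on the results table. The sketch below assumes that logFC is on the log2 scale and that the fold change cut-off applies in both directions (absolute value); neither detail is stated explicitly in this manual. The gene names and values are hypothetical.

```python
import math


def passes_cutoffs(log2_fc, fdr, fdr_max=0.05, fold_change_min=1.5):
    """True if a row meets both cut-offs (fold change tested on the absolute log2 scale)."""
    return fdr < fdr_max and abs(log2_fc) > math.log2(fold_change_min)


# Hypothetical rows, not taken from the database.
rows = [
    {"gene": "GENE_A", "logFC": 1.40, "FDR": 0.0001},   # passes both cut-offs
    {"gene": "GENE_B", "logFC": 0.30, "FDR": 0.001},    # fold change too small
    {"gene": "GENE_C", "logFC": -0.90, "FDR": 0.20},    # FDR too high
    {"gene": "GENE_D", "logFC": -1.20, "FDR": 0.01},    # passes (down-regulated)
]
significant = [r["gene"] for r in rows if passes_cutoffs(r["logFC"], r["FDR"])]
print(significant)
```

A fold change of 1.5 corresponds to a log2 fold change of about 0.585, which is the threshold the filter applies to the logFC column.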
6.2 Proteomics: timsTOF
This section lets users choose three options (comparator group 1, comparator group 2 and data type) and presents a table of differentially abundant proteins with protein ID, gene name, logFC, AveExpr, t, P-value and FDR.
6.3 Proteomics: Luminex (plasma)
This section lets users choose three options (comparator group 1, comparator group 2 and data type) and presents a table of differentially abundant proteins with protein ID, gene name, full name, logFC and P-value.
6.4 Proteomics: Luminex (serum)
This section lets users choose three options (comparator group 1, comparator group 2 and data type) and presents a table of differentially abundant proteins with protein ID, gene name, full name, logFC and P-value.
6.5 Flow Cytometry: FACS
This section lets users choose three options (comparator group 1, comparator group 2 and data type) and presents a table of differentially abundant clusters with cluster name, logFC, P-value and FDR.
6.6 Mass Cytometry: CyTOF (global panel)
This section lets users choose three options (comparator group 1, comparator group 2 and cell population) and presents a table of differentially abundant clusters with cluster name, logFC, P-value and FDR.
6.7 CITE-Seq: GEX Pseudobulk
This section lets users choose four options (comparator group 1, comparator group 2, cell group resolution and data type) and presents a table of differentially expressed genes with gene ID, gene name, logFC, logCPM, LR, P-value and FDR.
6.8 CITE-Seq: Composition
This section lets users choose four options (comparator group 1, comparator group 2, cell group resolution and data type) and presents a table of differentially abundant cell clusters with cell population name, logFC, logCPM, F, P-value and FDR.
6.9 scATAC-Seq
This section lets users choose three options (comparator group 1, comparator group 2 and cell type) and presents a table of differentially accessible peaks with chromosome name, start, end, MeanDiff, log2FC and FDR.
7 RShiny
RShiny module offers the functionality of performing the differential abundance analysis and producing the tables and plots based on the user-provided options for modalities including "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition", "Bulk Repertoire: BCR", "Bulk Repertoire: TCR" and "CITE-Seq: Single-cell TCR".
7.1 Bulk RNA-Seq
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold and fold change threshold. It performs differential gene expression analysis with the limma package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and presents the results as volcano plots, tables and boxplots. The main results can be downloaded using the download buttons next to the tables and graphs. Whenever an option is changed, the affected results are refreshed automatically to reflect the updated parameters. Two additional options, a gene name and a covariate, control the generation of boxplots.
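The FDR threshold acts on P-values adjusted for multiple testing. The manual does not say which adjustment the app uses; the sketch below assumes the Benjamini-Hochberg procedure, which is a common choice for this kind of analysis, and uses invented P-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]  # invented P-values
fdr = benjamini_hochberg(p)
print(fdr)
```

With an FDR threshold of 0.05, only the first of these four hypothetical tests would be retained.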
7.2 Proteomics: timsTOF
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold and fold change threshold. It performs differential protein abundance analysis with the limma package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.3 Proteomics: Luminex (plasma)
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, P-value threshold and fold change threshold. It performs differential protein abundance analysis with a t-test for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.4 Proteomics: Luminex (serum)
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, P-value threshold and fold change threshold. It performs differential protein abundance analysis with a t-test for a comparison between two of the source groups, such as "Healthy", "COVID-19 (critical)" and "Flu", and presents the results as volcano plots, tables and boxplots.
7.5 Flow Cytometry: FACS
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold and fold change threshold. It performs differential abundance analysis with the edgeR package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.6 Mass Cytometry: CyTOF (global panel)
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold, fold change threshold and cell population. It performs differential abundance analysis with the edgeR package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "COVID-19 (convalescent)", "Sepsis" and "Sepsis (convalescent)", and presents the results as volcano plots, tables and boxplots.
7.7 CITE-Seq: GEX Pseudobulk
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold, fold change threshold, cell group resolution and a cell cluster name. It performs differential gene expression analysis with the edgeR package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.8 CITE-Seq: Composition
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, covariates to adjust for, FDR threshold, fold change threshold and cell group resolution. It performs differential abundance analysis with the edgeR package for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.9 Bulk Repertoire: BCR
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, metric, P-value threshold and fold change threshold. It performs differential abundance analysis with a Wilcoxon test for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
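For readers unfamiliar with the Wilcoxon test, the sketch below shows the idea for two independent groups (the rank-sum, or Mann-Whitney, form), assuming that is the variant used here. It uses a plain normal approximation without tie or continuity correction, so it is only an illustration and not a substitute for a statistics library; the metric values are invented.

```python
import math


def rank_sum_test(case, control):
    """Return (U statistic, two-sided P-value from a normal approximation)."""
    # U counts the (case, control) pairs in which the case value is larger;
    # tied pairs contribute one half.
    u = 0.0
    for x in case:
        for y in control:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(case), len(control)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p_value


# Invented metric values for two hypothetical groups.
u_stat, p_val = rank_sum_test([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(u_stat, round(p_val, 4))
```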
7.10 Bulk Repertoire: TCR
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, metric, P-value threshold and fold change threshold. It performs differential abundance analysis with a Wilcoxon test for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
7.11 CITE-Seq: Single-cell TCR
This app accepts options including first comparator group (case group), second comparator group (control group), sample inclusion strategy, metric, P-value threshold and fold change threshold. It performs differential abundance analysis with a Wilcoxon test for a comparison between two of the source groups, such as "Healthy", "COVID-19 (mild)", "COVID-19 (severe)", "COVID-19 (critical)", "COVID-19 (community)", "Flu" and "Sepsis", and presents the results as volcano plots, tables and boxplots.
8 Visualisation
Visualisation module enables data mining and visualisation for modalities such as "Bulk RNA-Seq", "Proteomics: timsTOF", "Proteomics: Luminex (plasma)", "Proteomics: Luminex (serum)", "Flow Cytometry: FACS", "Mass Cytometry: CyTOF (global panel)", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition" and "Tensor Decomposition".
8.1 Bulk RNA-Seq
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.2 Proteomics: timsTOF
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.3 Proteomics: Luminex (plasma)
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.4 Proteomics: Luminex (serum)
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.5 Flow Cytometry: FACS
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, PCA loadings plots and heatmaps.
8.6 Mass Cytometry: CyTOF (global panel)
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, PCA loadings plots and heatmaps.
8.7 CITE-Seq: GEX Pseudobulk
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.8 CITE-Seq: Composition
This app accepts options including X axis, Y axis, a covariate and sample inclusion strategy. The results are in the format of PCA plots, correlation plots, PCA loadings plots and heatmaps.
8.9 Tensor Decomposition
This app accepts options such as a component and the posterior inclusion probability and produces the graphs that present the loading scores for samples, cell types for expression, CITE-Seq cell types, CyTOF depleted cell types, CyTOF nondepleted cell types, Luminex proteins, timsTOF proteins and genes.
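One way to picture the effect of the posterior inclusion probability option is as a filter applied before the loading scores are plotted. The sketch below assumes features are kept when their posterior inclusion probability is at least the chosen value and are then ordered by absolute loading score; the feature names, scores and field names are hypothetical.

```python
# Hypothetical features for one component; names and values are invented.
features = [
    {"name": "FEATURE_A", "loading": -0.9, "pip": 0.95},
    {"name": "FEATURE_B", "loading": 0.4, "pip": 0.30},
    {"name": "FEATURE_C", "loading": -0.2, "pip": 0.60},
    {"name": "FEATURE_D", "loading": 0.7, "pip": 0.50},
]


def top_contributors(rows, pip_min=0.5):
    """Keep features with PIP >= pip_min, largest absolute loading first."""
    kept = [r for r in rows if r["pip"] >= pip_min]
    return sorted(kept, key=lambda r: abs(r["loading"]), reverse=True)


ranked = [r["name"] for r in top_contributors(features)]
print(ranked)
```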
9 Download
Download module offers ways of downloading the raw datasets and key processed datasets by providing the links to the datasets.
9.1 Raw Datasets
This section provides the dataset ID, description, accession number and the link to access the data for raw datasets.
For example, clicking on the link under the "Dataset" column for CBD-RAW-CLINVAR directs users to the webpage on the European Genome-phenome Archive (EGA) for further instructions.
9.2 Key Processed Datasets
This section shows the dataset ID, description, accession number and the link to the data for key processed datasets.
10 A Case Study
We anticipate that users will be able to use COMBATdb to address biological questions about COVID-19 in various ways. This case study demonstrates the usefulness of the database for comparing patient groups across a wide spectrum of modalities, using a variety of presentations and visualisations of the datasets.
One of the key questions in COVID-19 studies is the discovery of biomarkers and the understanding of the molecular mechanisms of disease severity. To address this question, we performed analyses using the Shiny apps for the "Proteomics: timsTOF", "Bulk RNA-Seq", "CITE-Seq: GEX Pseudobulk", "CITE-Seq: Composition", "Mass Cytometry: CyTOF (global panel)" and "Tensor Decomposition" modalities.
Firstly, we have identified the differentially expressed proteins from proteomics: timsTOF modality for two comparisons.
It is found that the expression of LRG1 (leucine rich alpha-2-glycoprotein 1) is significantly different between COVID-19 (critical) (case group) and COVID-19 (mild) (control group) with FDR=0.00869 and log2FC=0.88.
LRG1 protein is also differentially expressed between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) with FDR=0.021 and log2FC=0.6.
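The log2FC values quoted in this case study can be converted into ordinary fold changes to make them easier to interpret. The helper below is a small illustration that assumes the reported logFC values are on the log2 scale; the function name is ours.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into a linear fold change (case / control)."""
    return 2.0 ** log2_fc


# Values quoted in this case study for LRG1 (timsTOF).
critical_vs_mild = fold_change(0.88)    # about 1.84-fold higher in the case group
critical_vs_severe = fold_change(0.6)   # about 1.52-fold higher in the case group
print(round(critical_vs_mild, 2), round(critical_vs_severe, 2))
```

A negative log2FC gives a value below 1; for example, a log2FC of -0.83 corresponds to roughly 0.56 times the control level.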
The boxplot confirms that protein expression increases gradually along the severity scale, while the healthy control group has a relatively lower level of expression.
Furthermore, the PCA plot suggests a potential separation of participant groups along the PC1 axis.
LRG1 has the fourth largest loading score on PC1.
Next, we examine whether this gene shows a similar pattern at the level of gene expression, using bulk and single-cell RNA-Seq data. Differential expression analysis in the bulk RNA-Seq modality shows that the LRG1 gene is indeed significantly altered in two comparisons. For COVID-19 (critical) (case group) vs COVID-19 (mild) (control group), FDR=0.0000292 and log2FC=1.51.
LRG1 gene is differentially expressed between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) with FDR=0.00236 and log2FC=0.97.
To look at gene expression in specific cell types, we performed differential gene expression analysis via the CITE-Seq: GEX pseudobulk modality and observed that the LRG1 gene is significant in the MNP cell type in all three comparisons, with varying fold changes. The differential expression analysis for COVID-19 (critical) (case group) vs COVID-19 (mild) (control group) shows FDR=0.00775 and log2FC=1.38.
The differential expression analysis for COVID-19 (critical) (case group) vs COVID-19 (severe) (control group) indicates the LRG1 gene with FDR=0.00552 and log2FC=0.93.
The differential expression analysis for COVID-19 (severe) (case group) vs COVID-19 (mild) (control group) shows the LRG1 gene with FDR=0.0392 and log2FC=0.75.
We can also determine whether a specific cell population has changed in proportion relative to the total population by looking at the single-cell data and mass cytometry data. For instance, PLT stands out as a differentially abundant cell type identified from the CITE-Seq: composition modality in two comparisons. The comparison between COVID-19 (critical) (case group) and COVID-19 (mild) (control group) shows the cell type with FDR=0.00277 and log2FC=2.57.
The comparison between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) shows the cell type with FDR=0.0252 and log2FC=1.65.
For the mass cytometry: CyTOF modality (global panel), the abundance of NK cells in non-depleted samples is significantly altered between COVID-19 (critical) (case group) and COVID-19 (severe) (control group) with FDR=0.0313 and log2FC=-0.83.
The comparison between COVID-19 (critical) (case group) and COVID-19 (mild) (control group) reports NK cells with FDR=0.0282 and log2FC=-0.81.
Finally, we can further explore the integrated analysis data on the "tensor decomposition modality". For example, component 171 with posterior inclusion probability >= 0.5 has loading scores for samples belonging to different participant groups that are progressively lower with more severe disease, indicating that this component is related to the severity of disease.
For the barplot of cell types for expression, we can find that MNP makes the largest contributions to this component.
For barplot of timsTOF proteins, SAA1 is the first major contributor and LRG1 is the fifth major contributor in terms of negative loading scores.
The barplot of genes shows the loading scores of genes contributing to this component.