Contents

1 Citation

The SPEEDI manuscript is currently under review.

2 Installation and Loading

To begin, we install the SPEEDI package and its dependencies.

# Install necessary packages
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
devtools::install_github('FunctionLab/SPEEDI', repos = BiocManager::repositories())

We then load the SPEEDI package.

# Load SPEEDI
library(SPEEDI)

3 Quick Start

After loading SPEEDI, the fastest way to begin using the pipeline is through the run_SPEEDI() function. In the provided code snippet, we show a brief illustration with some example inputs, but you should read the documentation for the function (i.e., ?run_SPEEDI) and look at our complete example analysis below to learn more.

# Example parameters for run_SPEEDI - note that some optional parameters were not used
# To see other options for reference_tissue, data_type, and species, look at ?run_SPEEDI
reference_tissue <- "PBMC"
data_type <- "RNA"
species <- "human"
# input_dir points to where input data is stored
# Note that input data should be FILTERED H5 or MEX (matrix.mtx.gz / barcodes.tsv.gz / features.tsv.gz) files
# generated by Cell Ranger
input_dir <- "~/test_input"
# output_dir points to where output data will be stored
output_dir <- "~/test_output"
# Run SPEEDI pipeline (not run)
run_SPEEDI(reference_tissue = reference_tissue, data_type = data_type, species = species, input_dir = input_dir,  
           output_dir = output_dir)

4 Example Analysis for scRNA-seq (PBMC, COVID)

4.1 Introduction

Our example scRNA-seq data come from the publication Differential chromatin accessibility in peripheral blood mononuclear cells underlies COVID-19 disease severity prior to seroconversion. We integrate data from 5 healthy controls and 4 severe COVID patients, and we also perform some preliminary downstream analyses (differential expression analysis and functional module discovery).

Importantly, the input data for our analysis are available on GEO. You can download the files using wget as in the code snippet below:

wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249272/suppl/GSM6249272_scRNA_2209_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249273/suppl/GSM6249273_scRNA_2211_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249274/suppl/GSM6249274_scRNA_2213_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249275/suppl/GSM6249275_scRNA_2224_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249276/suppl/GSM6249276_scRNA_2227_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249252/suppl/GSM6249252_scRNA_34-7_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249254/suppl/GSM6249254_scRNA_39-7_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249257/suppl/GSM6249257_scRNA_41-7_filtered_feature_bc_matrix.tar.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6249nnn/GSM6249270/suppl/GSM6249270_scRNA_145-7_filtered_feature_bc_matrix.tar.gz

We can then unarchive these files and rename the resulting folders so that the names are cleaner:

for i in *.tar.gz; do tar -xvf $i; done
for folder in *_filtered_feature_bc_matrix; do
    new_folder="${folder%_filtered_feature_bc_matrix}"
    mv "$folder" "$new_folder"
done

Within each folder, we have filtered data generated by Cell Ranger in the MEX format. SPEEDI also works with H5 format files. If working with MEX files, sample data should always follow the standard naming convention (three files with names barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz). If working with H5 files, sample data file names should always end with filtered_feature_bc_matrix.h5. For example, filtered_feature_bc_matrix.h5 and Sample_1_filtered_feature_bc_matrix.h5 would both be fine, but filtered_matrix.h5 would result in an error.

IMPORTANT: SPEEDI expects data that have been filtered using Cell Ranger or some other similar process.

4.2 Step 0: Set Up SPEEDI

After downloading our input data, we are ready to begin using the SPEEDI pipeline. We could use the run_SPEEDI() function to run the entire pipeline, but we will go through the pipeline step-by-step for clarity.

First, we’ll go over the different parameters for SPEEDI. We highly recommend that you read the run_SPEEDI() function documentation to see all possible values for each parameter.

We’ll begin with the parameters that we provide values for in this analysis:

  • reference_tissue: Reference tissue type (used to map cell types via reference). Here, we choose PBMC_full (full explanation below).
  • data_type: Type of data being processed. Here, we choose RNA.
  • species: Species being analyzed. Here, we choose human.
  • input_dir: Path to directory where input data are located. Here, we choose input/.
  • reference_dir: Path to directory where reference will be downloaded by LoadReferenceSPEEDI() if necessary. Here, we choose references/.
  • output_dir: Path to directory where output will be saved. Here, we choose output/.
  • metadata_df: Data frame that contains metadata about our samples. Its value will declared in the code snippet below.

There are two different PBMC reference tissue types in SPEEDI: PBMC and PBMC_full. The former uses a SeuratData reference and Azimuth to map cell types, while the latter downloads a more detailed, quite large (~8 GB) PBMC reference for cell type mapping. To get the best possible mapping, we choose the latter. Importantly, SPEEDI loads this reference into memory during processing, so you should use PBMC unless you have sufficient memory to load the PBMC_full reference.

Our other parameters will use the default values:

  • reference_file_name: Base name of custom reference file. Because we’re not using a custom reference file, we leave this at its default value (NULL).
  • reference_cell_type_attribute: If using a Seurat reference object, this parameter captures where the cell type information is stored. We leave this at its default value ("celltype.l2").
  • analysis_name: Name used to create subdirectory in output_dir for current analysis run. We’ll let SPEEDI set this for us, so we’ll leave it at its default value (NULL).
  • sample_id_list: Vector of sample names (optional - if not provided, will select all samples found recursively in input_dir). Because we’re analyzing all of the samples that we downloaded (and not a subset), we leave this at its default value (NULL).
  • sample_file_paths: Vector of sample file paths (optional - if not provided, will select all samples found recursively in input_dir). If this argument is used, sample_id_list is required and should be written in the same order as the sample file paths. Because we’re analyzing all of the samples that we downloaded (and not a subset), we leave this at its default value (NULL).
  • record_doublets: Boolean flag to indicate whether we will record doublets in the data (using the scDblFinder package). We leave this at its default value (FALSE).
reference_tissue <- "PBMC_full"
data_type <- "RNA"
species <- "human"
input_dir <- "input/"
reference_dir <- "references/"
output_dir <- "output/"
# Here, we set up our (very simple) metadata data frame
metadata_df <- data.frame("disease" = c("COVID", "COVID", "COVID", "COVID", 
                                        "healthy", "healthy", "healthy", "healthy", "healthy"))
rownames(metadata_df) <- c("scRNA_34-7", "scRNA_39-7", "scRNA_41-7", "scRNA_145-7", 
                           "scRNA_2209", "scRNA_2211", "scRNA_2213", "scRNA_2224", "scRNA_2227")

To complete our setup, we set a random seed using set.seed() to ensure reproducibility and then use the initialize_SPEEDI() function. While this function isn’t strictly necessary for using SPEEDI, it will help make sure that all variables are set properly and will create a log file for SPEEDI.

# Note that we use get_speedi_seed() to get the global seed used by SPEEDI for tasks like SCTransform normalization, UMAP generation etc.
# If you want to set a different global seed for SPEEDI, you can use the set_speedi_seed() function
set.seed(get_speedi_seed())
SPEEDI_variables <- initialize_SPEEDI(reference_tissue, data_type, species, 
                                      input_dir, reference_dir, output_dir, metadata_df)

After running initialize_SPEEDI(), all of the corrected variable values will be stored in the SPEEDI_variables list. These variables include everything highlighted above as well as a few new ones:

  • RNA_output_dir: Output directory for RNA analyses (will contain all analysis output for this example)
  • ATAC_output_dir: Output directory for ATAC analyses (not relevant for this example)
  • log_file_path: File name associated with log file
  • old_wd: ATAC analyses require our working directory be updated to ATAC_output_dir, so we save the previous working directory in old_wd (not relevant for this example)

To refer to a given variable, use list notation (e.g., the corrected input_dir is stored in SPEEDI_variables$input_dir).

Note the directory structures below:

# output_dir now includes a subdirectory for our specific analysis
# This directory contains our log file
SPEEDI_variables$output_dir
## [1] "/home/wat2/SPEEDI_vignette/output/2023-11-09_19-02-28_SPEEDI/"
# RNA_output_dir is a subdirectory of our new output_dir
SPEEDI_variables$RNA_output_dir
## [1] "/home/wat2/SPEEDI_vignette/output/2023-11-09_19-02-28_SPEEDI/RNA/"

4.3 Step 1: Load Reference

Before we begin processing the input data, we first load our PBMC reference. Because we’re using PBMC_full as our reference_tissue value, SPEEDI will automatically download the reference object (pbmc_multimodal.h5seurat) and save it in reference_dir. Note that we set log_flag to TRUE so that any messages are automatically recorded in our log file.

reference <- LoadReferenceSPEEDI(reference_tissue = SPEEDI_variables$reference_tissue, 
                                 reference_dir = SPEEDI_variables$reference_dir,
                                 species = SPEEDI_variables$species, log_flag = TRUE)

We can see that reference now stores a Seurat object:

reference
## An object of class Seurat 
## 20957 features across 161764 samples within 2 assays 
## Active assay: SCT (20729 features, 5000 variable features)
##  2 layers present: counts, data
##  1 other assay present: ADT
##  6 dimensional reductions calculated: apca, aumap, pca, spca, umap, wnn.umap

4.4 Step 2: Read Input Data

Next, we will read in the input data (scRNA-seq) for processing using the Read_RNA() function. Sample data are processed in parallel and compiled into a single comprehensive expression matrix.

all_sc_exp_matrices <- Read_RNA(input_dir = SPEEDI_variables$input_dir, log_flag = TRUE)
nrow(all_sc_exp_matrices)
## [1] 32738
ncol(all_sc_exp_matrices)
## [1] 124692

Our matrix contains 124692 barcodes and 32738 transcripts in total.

4.5 Step 3: Filter Input Data

While our input data have been previously filtered by Cell Ranger, we still need to perform QC filtering (based on number of transcripts, number of genes, percentage of mitochondrial content, percentage of hemoglobin content, and percentage of ribosomal content). For each category, the FilterRawData_RNA() function automatically selects the QC threshold for each sample using the kneedle algorithm and the quantile function. In addition, the FilterRawData_RNA() function performs the following tasks:

  • Creates pre-filtered QC plots for each category above (see below)
  • Transforms the expression matrix into a Seurat object for easier downstream processing
  • If record_doublets is set to TRUE, records information about doublets in the Seurat object
sc_obj <- FilterRawData_RNA(all_sc_exp_matrices = all_sc_exp_matrices, species = SPEEDI_variables$species,
                           output_dir = SPEEDI_variables$RNA_output_dir, log_file_path = SPEEDI_variables$log_file_path, 
                           log_flag = TRUE)
# We can delete all_sc_exp_matrices because we're done with it
rm(all_sc_exp_matrices)
sc_obj
## An object of class Seurat 
## 21693 features across 88870 samples within 1 assay 
## Active assay: RNA (21693 features, 0 variable features)
##  9 layers present: counts.1.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject, counts.2.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject, counts.unbias.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject, counts.unbias.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject, counts.unbias.SeuratProject.SeuratProject.SeuratProject.SeuratProject, counts.unbias.SeuratProject.SeuratProject.SeuratProject, counts.unbias.SeuratProject.SeuratProject, counts.unbias.SeuratProject, counts.unbias

After QC filtering is complete, we see that 88870 barcodes and 21693 transcripts remain. You can always see the automatically chosen QC thresholds by looking at the log file (they may also be displayed in your standard R output depending on your environment). In addition, if you’re interested in the quality of your samples before filtering, you can look at the automatically generated pre-filtering QC plots (found in RNA_output_dir). One example (for number of genes) is displayed below:

Pre-filtered QC plot for number of genes.

Figure 1: Pre-filtered QC plot for number of genes

4.6 Step 4: Initial Processing

Now that the data have been filtered, we will normalize the data using SCTransform() (from Seurat). We regress against mitochondrial percentage (percent.mt), ribosomal percentage (percent.rps and percent.rpl), and hemoglobin percentage (percent.hb). We also calculate cell cycle scores and regress against the differences between S score and G2M score (CC.Difference). Finally, we add metadata from metadata_df into our Seurat object.

sc_obj <- InitialProcessing_RNA(sc_obj = sc_obj, species = SPEEDI_variables$species, 
                                output_dir = SPEEDI_variables$RNA_output_dir, metadata_df = SPEEDI_variables$metadata_df, 
                                log_flag = TRUE)

Looking at the UMAP generated from InitialProcessing_RNA(), we can see that the samples are not yet integrated:

UMAP of sample data before batch correction and integration.

Figure 2: UMAP of sample data before batch correction and integration

4.7 Step 5: Inferring Data-Derived Batches

Next, we will use SPEEDI’s data-derived batch inference method to determine sample batches in our data:

sc_obj <- InferBatches(sc_obj = sc_obj, log_flag = TRUE)

In total, SPEEDI found 4 batches for our 9 samples:

length(unique(sc_obj$batch))
## [1] 4

4.8 Step 6: Integrate Batches

After we have grouped our samples according to batch, we are ready to integrate by batch. We first normalize counts within each batch using SCTransform() and then use a selection of methods from Seurat (SelectIntegrationFeatures(), PrepSCTIntegration(), FindIntegrationAnchors(), and IntegrateData()) to integrate our data.

sc_obj <- IntegrateByBatch_RNA(sc_obj = sc_obj, log_flag = TRUE)

Note that we will now use the integrated assay for visualization (as opposed to the SCT assay) unless only one batch was found above:

sc_obj
## An object of class Seurat 
## 43931 features across 88870 samples within 3 assays 
## Active assay: integrated (3000 features, 3000 variable features)
##  2 layers present: data, scale.data
##  2 other assays present: RNA, SCT

4.9 Step 7: Final Processing

We perform some final processing of our integrated data (scaling and preparing for differential expression) using the VisualizeIntegration() function.

sc_obj <- VisualizeIntegration(sc_obj = sc_obj, output_dir = SPEEDI_variables$RNA_output_dir, log_flag = TRUE)

4.10 Step 8: Map Cell Types

We are now ready to use our PBMC reference to map cell types onto our data. In this case, because we have a full Seurat reference object, our MapCellTypes_RNA() function uses the Seurat functions FindMappingAnchors() and MapQuery() to map cell types. For other built-in references, SPEEDI will use the Azimuth function RunAzimuth() to map cell types. Finally, SPEEDI runs a majority vote algorithm (MajorityVote_RNA()) to finalize cell types within each cluster.

sc_obj <- MapCellTypes_RNA(sc_obj = sc_obj, reference = reference, output_dir = SPEEDI_variables$RNA_output_dir,
                           log_flag = TRUE)
# Table of final cell type predictions
knitr::kable(table(sc_obj$predicted_celltype_majority_vote), col.names = c("Cell Type", "Cell Count"))
Cell Type Cell Count
B memory 2789
B naive 3941
CD14 Mono 15975
CD16 Mono 2424
CD4 Naive 2163
CD4 TCM 28319
CD8 Naive 3262
CD8 TEM 10655
cDC2 678
HSPC 81
MAIT 6531
NK 7426
NK_CD56bright 599
pDC 518
Platelet 1044
Treg 2465

The MapCellTypes_RNA() function prints UMAPs of our integrated sample data broken down by majority vote cell type, raw predicted cell type, cluster, and sample. Below, we print the sample UMAP and the majority vote cell type UMAP.

UMAP of sample data after batch correction and integration.

Figure 3: UMAP of sample data after batch correction and integration

Note that the samples are much better integrated versus our previous UMAP.

UMAP of cell types determined by majority vote.

Figure 4: UMAP of cell types determined by majority vote

Importantly, if the cells in a given cluster have an even mix of assigned cell types, the majority vote algorithm may not be able to determine an overall cell type for the cluster. We mark these clusters as “Undetermined”. To better understand these Undetermined clusters, we recommend that users visualize the expression of canonical markers for different cell types (e.g., by using the Seurat function FeaturePlot()).

In addition to the UMAPs described above, SPEEDI also prints a heatmap showing cell type proportions for each sample:

Heatmap of cell type proportions.

Figure 5: Heatmap of cell type proportions

4.11 Step 9: Downstream Analyses

Now that we have a uniformly cell-type annotated and integrated dataset, we will use SPEEDI to perform two downstream analyses: differential expression analysis and functional module discovery. We use the run_downstream_analyses_RNA() function to run both analyses.

de_results <- run_downstream_analyses_RNA(sc_obj = sc_obj, reference_tissue = SPEEDI_variables$reference_tissue, 
                                      species = SPEEDI_variables$species, metadata_df = SPEEDI_variables$metadata_df, 
                                      output_dir = SPEEDI_variables$RNA_output_dir, log_flag = TRUE)

In our metadata data frame, we provided one metadata category (“disease”) with each sample being assigned a value of “COVID” or “healthy”. The run_downstream_analyses_RNA() wrapper function finds differentially expressed genes (DEGs) for each cell type (COVID vs. healthy) using SPEEDI’s RunDE_RNA() function. SPEEDI first uses the Seurat function FindMarkers() to find DEGs on a single cell sample level (adjusted p-value threshold of 0.05, log fold change threshold of 0.1, and min.pct threshold of 0.1) and then uses DESeq2 to further curate this list using pseudobulk level filtering (p-value threshold of 0.05 and log fold change threshold of 0.3).

The final output is a table containing DEGs for each cell type:

DEGs_table <- utils::read.table(paste0(SPEEDI_variables$RNA_output_dir, "disease.DE.tsv"), header = TRUE, sep = "\t")
print(nrow(DEGs_table))
## [1] 2516
subset_DEGs_table <- rbind(DEGs_table[100:110,], DEGs_table[400:410,])
subset_DEGs_table$sc_pval_adj <- format(subset_DEGs_table$sc_pval_adj, digits = 5)
knitr::kable(subset_DEGs_table, "simple")
Cell_Type Gene_Name sc_pval_adj sc_log2FC pseudo_bulk_pval pseudo_bulk_log2FC
100 CD4 TCM NR4A2 5.6160e-210 -1.5251357 0.0007697 -1.8935872
101 CD4 TCM SUN2 6.9428e-185 0.7928248 0.0128745 0.8659437
102 CD4 TCM MT-ND1 2.5417e-167 -0.2726354 0.0060409 -0.3768555
103 CD4 TCM DNAJB1 1.4335e-160 -0.5047394 0.0035380 -0.6876837
104 CD4 TCM KLF6 5.8023e-155 0.7111543 0.0002115 0.6499916
105 CD4 TCM LINC00861 6.7041e-152 0.6511951 0.0000535 0.5334065
106 CD4 TCM C1orf63 4.0507e-127 0.5987083 0.0134283 0.4393595
107 CD4 TCM ZFP36 1.6867e-125 -0.6325388 0.0053459 -0.8987507
108 CD4 TCM ZRANB2 2.4364e-118 0.6791998 0.0000096 0.6048188
109 CD4 TCM FKBP8 1.4353e-110 0.5802708 0.0203568 0.6735848
110 CD4 TCM RN7SL1 2.0350e-110 1.7416472 0.0181545 1.6971147
400 CD8 Naive RASSF3 2.4735e-04 1.0512495 0.0406761 0.7675221
401 CD8 Naive C11orf48 2.7214e-04 0.8435597 0.0124949 0.7051809
402 CD8 Naive CABIN1 5.2228e-04 0.6316841 0.0432814 0.3746976
403 CD8 Naive NHLRC3 5.7896e-04 0.8454088 0.0336770 0.5575887
404 CD8 Naive LINC00936 6.1817e-04 0.8217519 0.0414509 0.5629821
405 CD8 Naive RBM3 6.3853e-04 -0.2762142 0.0121591 -0.3742365
406 CD8 Naive CCR7 9.2105e-04 -0.2937378 0.0053840 -0.3944238
407 CD8 Naive METTL12 9.2811e-04 -1.0073312 0.0024619 -1.1553749
408 CD8 Naive ZNF331 1.2551e-03 -0.8437378 0.0003378 -1.1235462
409 CD8 Naive TRIM73 1.3633e-03 0.9726430 0.0390137 0.6770856
410 CD8 Naive WDR37 1.8856e-03 0.8626598 0.0456461 0.5463623

In total, there were 2516 DEGs found. We show a subset of the DEGs found for CD4 TCM and CD8 Naive cells.

Next, because our samples are human, the run_downstream_analyses_RNA() wrapper function will also run functional module discovery (FMD) using SPEEDI’s run_fmd_wrapper() function. As input, SPEEDI uses the cell-type specific DEGs found above. First, each list of cell-type DEGs is divided into separate lists of positive and negative fold change. For each separate list, if at least 20 genes are found in HumanBase, an FMD job is launched. Results are provided in a TSV file and include a HumanBase URL linking to the user’s full results as well as a complete list of associated gene ontology (GO) enrichment terms.

Below, we show example output for CD16 monocytes that were upregulated in our COVID patients:

FMD_file <- paste0(SPEEDI_variables$RNA_output_dir, "FMD_high_CD16_Mono_blood_disease.tsv")
FMD_URL <- readLines(FMD_file, n = 1)
FMD_table <- utils::read.table(FMD_file, header = TRUE, sep = "\t")
print(FMD_URL)
## [1] "# https://hb.flatironinstitute.org/module/overview/?body_tag=c89a1690becc46f63dfeb2185b97feb26eb6ba73"
knitr::kable(head(FMD_table, 20), "simple")
cluster term name p_value q_value
0 GO:0017144 drug metabolic process 0.0387789 0.0634331
0 GO:0042110 T cell activation 0.0562688 0.0794194
0 GO:0001775 cell activation 0.0451936 0.0689062
0 GO:0044282 small molecule catabolic process 0.0610010 0.0836018
0 GO:0045321 leukocyte activation 0.0318460 0.0576678
0 GO:1901136 carbohydrate derivative catabolic process 0.0182024 0.0424151
0 GO:0009164 nucleoside catabolic process 0.0013437 0.0195233
0 GO:0009116 nucleoside metabolic process 0.0076972 0.0322241
0 GO:1901657 glycosyl compound metabolic process 0.0129072 0.0403553
0 GO:0009119 ribonucleoside metabolic process 0.0052480 0.0275802
0 GO:0034656 nucleobase-containing small molecule catabolic process 0.0013437 0.0195233
0 GO:1901658 glycosyl compound catabolic process 0.0021896 0.0208010
0 GO:0022407 regulation of cell-cell adhesion 0.0763038 0.0947087
0 GO:0042454 ribonucleoside catabolic process 0.0010601 0.0195233
0 GO:0031334 positive regulation of protein complex assembly 0.0582871 0.0818006
0 GO:0032091 negative regulation of protein binding 0.0186749 0.0427103
0 GO:0002253 activation of immune response 0.0021053 0.0208010
0 GO:0002429 immune response-activating cell surface receptor signaling pathway 0.0177345 0.0424151
0 GO:0050778 positive regulation of immune response 0.0008411 0.0195233
0 GO:0002757 immune response-activating signal transduction 0.0009157 0.0195233

If we go to the URL listed at the top of the table, we will see a network visualization of our results:

HumanBase network visualization for CD16 monocytes upregulated in COVID patients (background network of blood).

Figure 6: HumanBase network visualization for CD16 monocytes upregulated in COVID patients (background network of blood)

Note that HumanBase has multiple background networks for FMD. In our case, because our reference tissue was PBMC_full, SPEEDI automatically analyzed each list of DEGs using the global network as well as the blood network.

4.12 Reminder: Use the SPEEDI Wrapper!

Above, we went through the SPEEDI pipeline step-by-step. However, the easiest way to run SPEEDI is with the run_SPEEDI() wrapper function. This function will complete all of the above steps automatically and will even save your Seurat object in RNA_output_dir. For our example, we would simply write:

sc_obj <- run_SPEEDI(reference_tissue = reference_tissue, data_type = data_type, species = species, 
                     input_dir = input_dir, reference_dir = reference_dir, 
                     output_dir = output_dir, metadata_df = metadata_df)

5 Other Types of Data Analyses

In addition to scRNA-seq analyses, SPEEDI can also run scATAC-seq, paired scRNA-seq / scATAC-seq, and same-cell multiome snRNA-seq / snRNA-seq analyses. To learn more, read the documentation for run_SPEEDI()!

Session info

## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Springdale Open Enterprise Linux 8.8 (Modena)
## 
## Matrix products: default
## BLAS:   /usr/local/R/4.2.3/lib64/R/lib/libRblas.so
## LAPACK: /usr/local/R/4.2.3/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] bluster_1.8.0               Seurat_5.0.0               
##  [3] SeuratObject_5.0.0          sp_2.1-1                   
##  [5] SPEEDI_1.2.0.0000           rhdf5_2.42.1               
##  [7] SummarizedExperiment_1.28.0 Biobase_2.58.0             
##  [9] MatrixGenerics_1.10.0       Rcpp_1.0.11                
## [11] Matrix_1.6-1.1              GenomicRanges_1.50.2       
## [13] GenomeInfoDb_1.34.9         IRanges_2.32.0             
## [15] S4Vectors_0.36.2            BiocGenerics_0.44.0        
## [17] matrixStats_1.1.0           data.table_1.14.8          
## [19] stringr_1.5.0               plyr_1.8.9                 
## [21] magrittr_2.0.3              ggplot2_3.4.4              
## [23] gtable_0.3.4                gtools_3.9.4               
## [25] gridExtra_2.3               ArchR_1.0.2                
## [27] BiocStyle_2.26.0            knitr_1.45                 
## [29] rmarkdown_2.25             
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.4                spatstat.explore_3.2-5   
##   [3] reticulate_1.34.0         tidyselect_1.2.0         
##   [5] RSQLite_2.3.3             AnnotationDbi_1.60.2     
##   [7] htmlwidgets_1.6.2         BiocParallel_1.32.6      
##   [9] Rtsne_0.16                munsell_0.5.0            
##  [11] codetools_0.2-19          ragg_1.2.6               
##  [13] ica_1.0-3                 future_1.33.0            
##  [15] miniUI_0.1.1.1            withr_2.5.2              
##  [17] spatstat.random_3.2-1     colorspace_2.1-0         
##  [19] progressr_0.14.0          highr_0.10               
##  [21] ROCR_1.0-11               tensor_1.5               
##  [23] listenv_0.9.0             labeling_0.4.3           
##  [25] GenomeInfoDbData_1.2.9    polyclip_1.10-6          
##  [27] pheatmap_1.0.12           bit64_4.0.5              
##  [29] farver_2.1.1              rprojroot_2.0.4          
##  [31] parallelly_1.36.0         vctrs_0.6.4              
##  [33] generics_0.1.3            xfun_0.41                
##  [35] R6_2.5.1                  doParallel_1.0.17        
##  [37] ggbeeswarm_0.7.2          locfit_1.5-9.8           
##  [39] hdf5r_1.3.8               bitops_1.0-7             
##  [41] rhdf5filters_1.10.1       spatstat.utils_3.0-4     
##  [43] cachem_1.0.8              DelayedArray_0.24.0      
##  [45] promises_1.2.1            scales_1.2.1             
##  [47] beeswarm_0.4.0            Cairo_1.6-1              
##  [49] globals_0.16.2            goftest_1.2-3            
##  [51] spam_2.10-0               rlang_1.1.2              
##  [53] systemfonts_1.0.5         splines_4.2.3            
##  [55] lazyeval_0.2.2            spatstat.geom_3.2-7      
##  [57] BiocManager_1.30.22       yaml_2.3.7               
##  [59] reshape2_1.4.4            abind_1.4-5              
##  [61] httpuv_1.6.12             SeuratDisk_0.0.0.9021    
##  [63] tools_4.2.3               bookdown_0.36            
##  [65] ellipsis_0.3.2            jquerylib_0.1.4          
##  [67] RColorBrewer_1.1-3        ggridges_0.5.4           
##  [69] sparseMatrixStats_1.10.0  zlibbioc_1.44.0          
##  [71] purrr_1.0.2               RCurl_1.98-1.13          
##  [73] deldir_1.0-9              pbapply_1.7-2            
##  [75] cowplot_1.1.1             lisi_1.0                 
##  [77] zoo_1.8-12                ggrepel_0.9.4            
##  [79] cluster_2.1.4             here_1.0.1               
##  [81] RSpectra_0.16-1           glmGamPoi_1.10.2         
##  [83] scattermore_1.2           lmtest_0.9-40            
##  [85] RANN_2.6.1                fitdistrplus_1.1-11      
##  [87] patchwork_1.1.3           mime_0.12                
##  [89] evaluate_0.23             xtable_1.8-4             
##  [91] XML_3.99-0.15             fastDummies_1.7.3        
##  [93] compiler_4.2.3            tibble_3.2.1             
##  [95] KernSmooth_2.23-22        crayon_1.5.2             
##  [97] htmltools_0.5.7           later_1.3.1              
##  [99] geneplotter_1.76.0        tidyr_1.3.0              
## [101] logr_1.3.5                DBI_1.1.3                
## [103] rappdirs_0.3.3            MASS_7.3-60              
## [105] cli_3.6.1                 parallel_4.2.3           
## [107] dotCall64_1.1-0           igraph_1.5.1             
## [109] pkgconfig_2.0.3           plotly_4.10.3            
## [111] spatstat.sparse_3.0-3     foreach_1.5.2            
## [113] annotate_1.76.0           vipor_0.4.5              
## [115] bslib_0.5.1               XVector_0.38.0           
## [117] digest_0.6.33             sctransform_0.4.1        
## [119] RcppAnnoy_0.0.21          common_1.1.0             
## [121] Biostrings_2.66.0         spatstat.data_3.0-3      
## [123] leiden_0.4.3              uwot_0.1.16              
## [125] DelayedMatrixStats_1.20.0 curl_5.1.0               
## [127] shiny_1.7.5.1             outliers_0.15            
## [129] lifecycle_1.0.4           nlme_3.1-163             
## [131] jsonlite_1.8.7            Rhdf5lib_1.20.0          
## [133] BiocNeighbors_1.16.0      limma_3.54.2             
## [135] viridisLite_0.4.2         fansi_1.0.5              
## [137] pillar_1.9.0              lattice_0.22-5           
## [139] KEGGREST_1.38.0           ggrastr_1.0.2            
## [141] fastmap_1.1.1             httr_1.4.7               
## [143] survival_3.5-7            glue_1.6.2               
## [145] png_0.1-8                 iterators_1.0.14         
## [147] bit_4.0.5                 presto_1.0.0             
## [149] stringi_1.7.12            sass_0.4.7               
## [151] blob_1.2.4                textshaping_0.3.7        
## [153] RcppHNSW_0.5.0            DESeq2_1.38.3            
## [155] memoise_2.0.1             dplyr_1.1.3              
## [157] irlba_2.3.5.1             future.apply_1.11.0