Annotating cells in the pbmc3K dataset (without clustering)
In this blog post, we use CellKb to assign cell types to individual cells in the Peripheral Blood Mononuclear Cell (PBMC) dataset freely available from 10X Genomics. The dataset contains 2,700 single cells sequenced on the Illumina NextSeq 500. The raw data can be found here. This is the same dataset analyzed in the Scanpy and Seurat clustering tutorials.
Annotating cells individually, rather than assigning one label to each cluster, is more likely to identify rare cell types, because a small population is not absorbed into the label of a larger cluster. It also gives better results when the clusters identified in your dataset are not cleanly separated.
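To make the point about rare cell types concrete, here is a minimal base R sketch. The ten cells, their labels, and the helper `majority_label` are made up for illustration; they are not CellKb output, and majority voting is only one common way to label a cluster.

```r
# Toy example: 10 cells that all fall in one cluster, with hypothetical per-cell labels.
cells <- data.frame(
  cluster = rep("0", 10),
  label = c(rep("classical monocyte", 9), "plasmacytoid dendritic cell"),
  stringsAsFactors = FALSE
)

# Cluster-level annotation (assumed here to be a majority vote):
# every cell inherits the most common label in its cluster.
majority_label <- function(labels) names(which.max(table(labels)))
majority <- vapply(split(cells$label, cells$cluster), majority_label, character(1))
cells$cluster_label <- unname(majority[cells$cluster])

# Cell types reported by each strategy.
per_cell_types <- sort(unique(cells$label))
per_cluster_types <- sort(unique(cells$cluster_label))
print(per_cell_types)
print(per_cluster_types)
```

In this toy case the single plasmacytoid dendritic cell appears only in the per-cell result, since the cluster-level label is decided by the nine monocytes around it.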
Upload the dataset to CellKb
We upload the dataset to CellKb. CellKb uses Scanpy for reading/pre-processing the dataset. Support for annotation of cells in Seurat objects will be added in the future.
Analyze the dataset
We can now submit the dataset for single-cell annotation, choosing "Blood" in the filter criteria.
Download results
The analysis runs in the background, and you can check the status on the "Results" screen.
After the analysis is complete, you can download the results. The results contain a TSV file and PDF files: one plot for each cell type identified, and a summary plot with the broad and detailed cell types identified. We show the summary plot for the pbmc3K dataset here.
[Figure: CellKb summary plot of the broad and detailed cell types identified in the pbmc3K dataset]
Analyzing the pbmc3k dataset using SingleR
We analyze the same pbmc3k dataset using SingleR with the HumanPrimaryCellAtlasData provided by the celldex package as the reference dataset. Our script is as follows.

  library(Seurat)  # Seurat version 5.0.1
  library(celldex) # celldex version 1.12.0
  library(SingleR) # SingleR version 2.4.1
  
  # Load the 10X matrix. gene.column=1 uses the first column of the genes file (Ensembl IDs) as row names,
  # which matches the ensembl=TRUE reference loaded below.
  pbmc.data <- Read10X(data.dir=file.path("pbmc3k_filtered_gene_bc_matrices", "filtered_gene_bc_matrices", "hg19"), gene.column=1)
  pbmc <- CreateSeuratObject(counts=pbmc.data, project="pbmc3k", min.cells=3, min.features=200)
  # Note: with Ensembl IDs as row names, the "^MT-" pattern is not expected to match any gene.
  # percent.mt is not used in the rest of this script.
  pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern="^MT-")
  
  # Standard Seurat workflow: normalize, select variable genes, scale, reduce dimensions
  pbmc <- NormalizeData(pbmc)
  pbmc <- FindVariableFeatures(pbmc, selection.method="vst", nfeatures=2000)
  all.genes <- rownames(pbmc)
  pbmc <- ScaleData(pbmc, features=all.genes)
  pbmc <- RunPCA(pbmc, features=VariableFeatures(object=pbmc))
  # Clusters and UMAP are computed for plotting only; SingleR annotates each cell individually
  pbmc <- FindNeighbors(pbmc, dims=1:10)
  pbmc <- FindClusters(pbmc, resolution=0.5)
  pbmc <- RunUMAP(pbmc, dims=1:10)
  print(DimPlot(pbmc, reduction="umap"))
  
  # Annotate every cell against the Human Primary Cell Atlas reference (broad and detailed labels)
  ref.data <- celldex::HumanPrimaryCellAtlasData(ensembl=TRUE)
  pbmc.sce <- as.SingleCellExperiment(pbmc)
  results_main <- SingleR(test=pbmc.sce, ref=ref.data, labels=ref.data$label.main)
  results_fine <- SingleR(test=pbmc.sce, ref=ref.data, labels=ref.data$label.fine)
  
  # Store the per-cell labels in the Seurat object and plot them on the UMAP
  pbmc$celltype_broad <- results_main$labels
  pbmc$celltype_detailed <- results_fine$labels
  print(DimPlot(pbmc, reduction="umap", group.by='celltype_broad', repel=TRUE, label=TRUE))
  print(DimPlot(pbmc, reduction="umap", group.by='celltype_detailed', repel=TRUE, label=TRUE))
  
  # Write the per-cell results and a count of cells per detailed label
  results_fine$labels_main <- results_main$labels
  write.table(results_fine, file="singler_results_without_clustering.txt", sep="\t", row.names=TRUE)
  write.table(table(results_fine$labels), file="singler_results_without_clustering_summary.txt", sep="\t", row.names=TRUE)
We show the summary plot with the broad and detailed cell types identified by SingleR for the pbmc3K dataset here.
[Figure: SingleR summary plot of the broad and detailed cell types identified in the pbmc3K dataset]
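When annotating cell by cell, it can help to check how many assignments SingleR itself flags as low quality. Besides `labels`, the SingleR result includes a `pruned.labels` column in which such assignments are set to NA. The sketch below uses a small made-up data frame as a stand-in for the real result, and `summarise_pruning` is a hypothetical helper, not part of SingleR.

```r
# Toy stand-in for a SingleR result (values are invented for illustration):
# `labels` is the best-scoring label per cell,
# `pruned.labels` is the same label, or NA where the assignment was pruned.
toy_results <- data.frame(
  labels = c("B_cell", "B_cell", "Monocyte", "NK_cell", "Monocyte"),
  pruned.labels = c("B_cell", NA, "Monocyte", "NK_cell", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each label, count the cells and how many were pruned.
summarise_pruning <- function(res) {
  pruned <- is.na(res$pruned.labels)
  data.frame(
    label = sort(unique(res$labels)),
    n_cells = as.vector(table(res$labels)),
    n_pruned = as.vector(tapply(pruned, res$labels, sum)),
    stringsAsFactors = FALSE
  )
}

pruning_summary <- summarise_pruning(toy_results)
print(pruning_summary)
```

Under the assumption that the script above has been run, the same helper could be applied to the real output with `summarise_pruning(as.data.frame(results_fine[, c("labels", "pruned.labels")]))`.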
Comparing the results (CellKb vs SingleR)
You will notice that while the broad cell types are similar, CellKb has identified more detailed cell types than SingleR.
| Category | Name of the cell type in SingleR | Cell count | Name of the cell type in CellKb | Cell count |
|---|---|---|---|---|
| T cells (CD4+) | T_cell:CD4+_central_memory | 610 | central memory CD4 positive, alpha-beta T cell | 23 |
| T cells (CD4+) | T_cell:CD4+ | 28 | CD4-positive, alpha-beta T cell | 408 |
| T cells (CD4+) | T_cell:CD4+_effector_memory | 347 | effector memory CD4-positive, alpha-beta T cell | 156 |
| T cells (CD4+) | T_cell:CD4+_naive | 164 | naive thymus-derived CD4-positive, alpha-beta T cell | 23 |
| T cells (CD4+) | - | - | CD4-positive, alpha-beta memory T cell | 119 |
| T cells (CD4+) | - | - | CD4-positive helper T cell | 23 |
| T cells (CD4+) | - | - | CD4-positive, alpha-beta cytotoxic T cell | 11 |
| T cells (CD4+) | - | - | CD4-positive, CD25-positive, alpha-beta regulatory T cell | 4 |
| T cells (CD4+) | - | - | effector CD4-positive, alpha-beta T cell | 2 |
| T cells (CD4+) | - | - | T follicular helper cell | 2 |
| T cells (CD4+) | - | - | T-helper 1 cell | 2 |
| T cells (CD4+) | - | - | T-helper 17 cell | 2 |
| T cells (CD8+) | T_cell:CD8+ | 240 | CD8-positive, alpha-beta T cell | 94 |
| T cells (CD8+) | T_cell:CD8+_effector_memory | 48 | effector memory CD8-positive, alpha-beta T cell | 62 |
| T cells (CD8+) | T_cell:CD8+_naive | 9 | naive thymus-derived CD8-positive, alpha-beta T cell | 90 |
| T cells (CD8+) | T_cell:CD8+_effector_memory_RA | 5 | CD8-positive, alpha-beta memory T cell | 55 |
| T cells (CD8+) | T_cell:CD8+_central_memory | 4 | central memory CD8-positive, alpha-beta T cell | 30 |
| T cells (CD8+) | - | - | CD8-positive, alpha-beta cytotoxic T cell | 4 |
| T cells (CD8+) | - | - | effector CD8-positive, alpha-beta T cell | 4 |
| T cells (CD8+) | - | - | activated CD8-positive, alpha-beta T cell | 12 |
| T cells (CD8+) | - | - | effector memory CD8-positive, alpha-beta T cell, terminally differentiated | 2 |
| T cells (other) | T_cell:gamma-delta | 5 | gamma-delta T cell | 14 |
| T cells (other) | - | - | T cell | 102 |
| T cells (other) | - | - | mucosal invariant T cell | 82 |
| T cells (other) | - | - | regulatory T cell | 51 |
| T cells (other) | - | - | naive T cell | 50 |
| T cells (other) | - | - | exhausted T cell | 2 |
| T cells (other) | - | - | mature NK T cell | 20 |
| T cells (other) | - | - | immature NK T cell | 1 |
| T cells (other) | - | - | memory T cell | 1 |
| NK cells | NK_cell | 141 | natural killer cell | 162 |
| NK cells | NK_cell:CD56hiCD62L+ | 31 | CD16-negative, CD56-bright natural killer cell, human | 10 |
| NK cells | NK_cell:IL2 | 8 | mature NK T cell | 20 |
| NK cells | - | - | immature NK T cell | 1 |
| Monocytes | Monocyte:CD16- | 411 | classical monocyte | 117 |
| Monocytes | Monocyte:CD16+ | 259 | non-classical monocyte | 149 |
| Monocytes | Monocyte | 4 | monocyte | 320 |
| Monocytes | Monocyte:leukotriene_D4 | 1 | CD14-positive monocyte | 28 |
| Monocytes | - | - | CD14-low, CD16-positive monocyte | 25 |
| Monocytes | - | - | CD14-positive, CD16-positive monocyte | 4 |
| Monocytes | - | - | intermediate monocyte | 3 |
| B cells | B_cell:immature | 216 | - | - |
| B cells | B_cell | 32 | B cell | 268 |
| B cells | B_cell:Naive | 82 | naive B cell | 39 |
| B cells | B_cell:Memory | 19 | memory B cell | 36 |
| B cells | B_cell:Plasma_cell | 4 | plasma cell | 3 |
| B cells | - | - | class switched memory B cell | 1 |
| Dendritic cells | - | - | conventional dendritic cell | 21 |
| Dendritic cells | - | - | dendritic cell | 19 |
| Dendritic cells | - | - | plasmacytoid dendritic cell | 3 |
| Platelets | Platelets | 9 | platelet | 21 |
| Platelets | - | - | megakaryocyte | 1 |
| Other | Pre-B_cell_CD34- | 17 | double negative thymocyte | 3 |
| Other | CMP | 4 | erythrocyte | 1 |
| Other | MEP | 1 | group 1 innate lymphoid cell | 1 |
| Other | Neutrophil:commensal_E._coli_MG1655 | 1 | lymphocyte | 1 |
| Other | - | - | progenitor cell | 1 |
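As a rough way to quantify the difference in resolution, the sketch below tallies how many distinct labels each tool reports per category. The numbers are counted by hand from the rows of the table above, not read from the tools' output files. The CellKb "NK cells" figure includes the two NK T cell labels that are also listed under "T cells (other)", so those two labels are counted twice in the CellKb total.

```r
# Number of labels listed per category in the comparison table (hand-counted).
label_counts <- data.frame(
  category = c("T cells (CD4+)", "T cells (CD8+)", "T cells (other)", "NK cells",
               "Monocytes", "B cells", "Dendritic cells", "Platelets", "Other"),
  singler = c(4, 5, 1, 3, 4, 5, 0, 1, 4),
  cellkb = c(12, 9, 9, 4, 7, 5, 3, 2, 5),
  stringsAsFactors = FALSE
)

# How many more labels CellKb lists than SingleR in each category.
label_counts$difference <- label_counts$cellkb - label_counts$singler
print(label_counts)

# Totals across all categories.
total_singler <- sum(label_counts$singler)
total_cellkb <- sum(label_counts$cellkb)
print(c(SingleR = total_singler, CellKb = total_cellkb))
```

The per-category differences make it easy to see where the extra detail is concentrated, for example in the T cell and dendritic cell rows.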
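We also notice that dendritic cells were incorrectly annotated by SingleR/celldex: none of the SingleR labels in the table is a dendritic cell type, whereas CellKb assigns 43 cells to conventional, plasmacytoid, or unspecified dendritic cells.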
Conclusion
CellKb uses a large set of publications as a reference and is able to identify cell types at a more detailed resolution. CellKb uses standardized ontology terms in the results, making them easier to integrate into your pipelines. In this blog post, we have shown how you can annotate individual cells using CellKb.