About CellKb
Single-cell RNA-seq has become a popular method for studying the transcriptional patterns of genes in individual cells. Despite recent advances in technology and computational methods, assigning cell types in single-cell datasets remains a bottleneck. Because no comprehensive reference database exists, researchers often spend considerable time and effort searching the scientific literature for marker genes associated with cell types.
CellKb is a knowledgebase of author-defined cell type marker gene sets that can be rapidly searched for matching cell types. Marker gene sets are collected directly from publications describing mainly single-cell experiments, along with selected bulk RNA-seq and microarray experiments. CellKb contains extensive cell type, tissue and disease annotations for each cell type, as described in the source publication.

Data Collection
The cell type markers in CellKb are taken primarily from single-cell RNA-seq experiments. For each experiment, CellKb extracts the cell type specific marker genes that the authors defined based on a significant change in expression within a group of similar cells.
The marker gene sets are extracted from tables, figures or supplementary materials of publications describing these experiments. Marker genes are also extracted from selected bulk RNA-seq and microarray experiments in public databases. These include gene signatures from the Human Protein Atlas, SingleR and MSigDB.
All publications related to single-cell experiments are taken from PubMed and manually screened to select those identifying cell type specific gene expression patterns.
22,568 Marker gene sets
403 Publications
Selection
Marker gene sets are selected from published experimental studies using the following criteria:
  • Availability of data for download
  • Type of experimental method
  • Number of cells studied
  • Computational methods used to normalize, filter and cluster cell types, along with identification of cluster-specific genes
  • Availability of associated values (e.g. average expression, fold change, statistical significance)
For version 2.0 of CellKb, 403 high-quality publications from 11 species were selected after screening over 7,000 studies published between 2013 and 2020.
Curation
Each selected publication is read, and information about the name, tissue, condition and any special characteristics of each cell type is manually extracted and associated with the corresponding marker gene set.
Genes from the selected marker sets are mapped to valid entries in the latest version of the Ensembl database. Only genes with valid identifiers and associated values are retained. Orthologous genes are identified across all selected species using the Ensembl database.
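The retention rule described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory lookup from gene symbol to Ensembl identifier; the function name, the record layout and the placeholder identifiers are illustrative only and are not CellKb's actual pipeline.

```python
def retain_valid_genes(records, symbol_to_ensembl):
    """Keep only genes that have both an Ensembl identifier and an associated value.

    records: list of (gene_symbol, value) pairs taken from a publication.
    symbol_to_ensembl: hypothetical lookup table built from an Ensembl release.
    """
    retained = []
    for symbol, value in records:
        ensembl_id = symbol_to_ensembl.get(symbol)
        if ensembl_id is None or value is None:
            continue  # no valid identifier, or no associated value: drop the gene
        retained.append((ensembl_id, symbol, value))
    return retained


# Placeholder identifiers, not real Ensembl accessions.
lookup = {"CD3E": "ENSG_PLACEHOLDER_1", "CD2": "ENSG_PLACEHOLDER_2"}
records = [("CD3E", 2.4), ("UNKNOWN1", 1.9), ("CD2", None)]
valid = retain_valid_genes(records, lookup)
```

Here only CD3E survives: UNKNOWN1 has no identifier in the lookup, and CD2 has no associated value.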
233 Organs/Tissues
1,802 Cell Types
Annotation
Cell types, tissue names and disease conditions are assigned standardized ontology terms. Associated values given by authors with each signature are also stored in CellKb. These include, but are not limited to, rank, score, average expression, log fold change and corrected/uncorrected p-values.
The cell type specific marker genes are either directly taken as defined by the authors, or they are calculated based on the associated values provided. Finally, all marker genes and their annotations are stored in a standardized format that enables rapid searching of the data.
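As an illustration of calculating marker genes from associated values, the sketch below keeps genes that pass a log fold change cutoff and an adjusted p-value cutoff, then ranks them. The thresholds (1.0 and 0.05), the `GeneStat` record and the function name are assumptions chosen for the example; the text above does not state which criteria CellKb applies.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str
    log_fold_change: float
    adjusted_p_value: float


def select_markers(stats, min_log_fc=1.0, max_adj_p=0.05):
    """Return genes passing both (assumed) thresholds, ranked by decreasing log fold change."""
    passing = [
        s for s in stats
        if s.log_fold_change >= min_log_fc and s.adjusted_p_value <= max_adj_p
    ]
    passing.sort(key=lambda s: s.log_fold_change, reverse=True)
    return [s.gene for s in passing]


example_stats = [
    GeneStat("CD2", 1.6, 1e-12),
    GeneStat("ACTB", 0.1, 0.8),    # fails the fold change cutoff
    GeneStat("CD3E", 2.4, 1e-30),
    GeneStat("IL7R", 1.2, 0.2),    # fails the significance cutoff
]
markers = select_markers(example_stats)
```

Ranking by fold change is one reasonable choice; a rank, score or p-value supplied by the authors could be used as the sort key instead.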
Reliability scores are assigned to each marker set based on its similarity to other gene sets of the same cell type. Marker gene specificity is calculated to identify genes that are common among marker sets of different cell types, and thus likely to be less specific as markers.
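The formulas behind these two scores are not given here, so the following is only a sketch of one plausible reading. It assumes reliability is the mean Jaccard similarity between a marker set and the other sets annotated with the same cell type, and that a gene's specificity is the inverse of the number of distinct cell types whose marker sets contain it. Both definitions and all names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reliability_scores(marker_sets):
    """marker_sets: list of (set_id, cell_type, genes).

    Returns {set_id: mean Jaccard similarity to other sets of the same cell type},
    or None for a set with no other set of its cell type to compare against.
    """
    by_type = defaultdict(list)
    for set_id, cell_type, genes in marker_sets:
        by_type[cell_type].append((set_id, genes))
    scores = {}
    for members in by_type.values():
        for set_id, genes in members:
            others = [g for other_id, g in members if other_id != set_id]
            scores[set_id] = (
                sum(jaccard(genes, g) for g in others) / len(others) if others else None
            )
    return scores


def gene_specificity(marker_sets):
    """Return {gene: 1 / number of distinct cell types listing the gene} (assumed definition)."""
    types_per_gene = defaultdict(set)
    for _, cell_type, genes in marker_sets:
        for gene in genes:
            types_per_gene[gene].add(cell_type)
    return {gene: 1.0 / len(types) for gene, types in types_per_gene.items()}


example_sets = [
    ("s1", "T cell", ["CD3E", "CD2", "PTPRC"]),
    ("s2", "T cell", ["CD3E", "CD2", "IL7R"]),
    ("s3", "B cell", ["CD79A", "MS4A1", "PTPRC"]),
]
reliability = reliability_scores(example_sets)
specificity = gene_specificity(example_sets)
```

In this toy data, PTPRC appears in marker sets of two cell types and so receives a lower specificity than CD3E, which appears only in T cell sets.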

Functionality
Search cell types using genes
Use one or more lists of marker genes to find matching cell type signatures published in the literature. When multiple gene lists are submitted, each must be assigned a cluster or cell type identifier. CellKb uses a variation of the Rank-biased Overlap method to identify the cell type marker gene sets matching the user's gene list. Other statistics, such as Fisher's exact test, Pearson's correlation coefficient and the Jaccard index, are also calculated for each cell type match. The ranks of genes common to the user's gene list and the matching cell types are also provided.
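To make these statistics concrete, here is a minimal sketch of three of them in their textbook forms. It is not CellKb's implementation: the variation of Rank-biased Overlap that CellKb uses is not described here, so the code shows the plain RBO series truncated at a fixed depth. The persistence parameter `p`, the background size and all function names are assumptions, and ranked lists are assumed to contain no duplicate genes.

```python
from math import comb


def rank_biased_overlap(ranked_a, ranked_b, p=0.9, depth=None):
    """Truncated RBO: (1 - p) * sum over d of p**(d - 1) * |A[:d] & B[:d]| / d."""
    limit = min(len(ranked_a), len(ranked_b))
    depth = limit if depth is None else min(depth, limit)
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranked_a[d - 1])
        seen_b.add(ranked_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        total += p ** (d - 1) * agreement
    return (1 - p) * total


def jaccard_index(genes_a, genes_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fisher_enrichment_p(query_genes, marker_genes, background_size):
    """One-sided Fisher's exact test (hypergeometric tail) for overlap enrichment.

    Assumes both gene sets are drawn from a common background of background_size genes.
    """
    query, markers = set(query_genes), set(marker_genes)
    overlap = len(query & markers)
    n, k = len(query), len(markers)
    denominator = comb(background_size, n)
    tail = sum(
        comb(k, i) * comb(background_size - k, n - i)
        for i in range(overlap, min(k, n) + 1)
    )
    return tail / denominator
```

Because the series is truncated, two identical lists of length k score 1 - p**k rather than exactly 1. Lower values of `p` place more weight on agreement at the top of the ranking, which suits marker lists where the first few genes matter most.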
Search cell types with keywords
This functionality helps users find cell types in CellKb based on their annotations, e.g. the tissue in which they are found, the experimental conditions, the publication information or the disease information. It also allows users to find all cell types that have a specific gene in their marker list.
Search cell types by ontology
Users can browse the entire contents of CellKb by species, publication, disease or organ/tissue, or through the cell type, tissue and disease ontologies. From there, they can view experimental details and retrieve cell type marker genes.
Search cell types across species
Use a ranked list of marker genes from one species to find matching cell type signatures in another species. This is done by identifying orthologous gene pairs between the two species, where available, in Ensembl. This is particularly useful because cell type identification studies tend to be concentrated in one species (e.g. mouse) rather than another (e.g. human).
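The ortholog step can be sketched as a simple translation of the ranked list before searching. The lookup table below is a hypothetical stand-in for ortholog pairs obtained from Ensembl, and the function name is illustrative; how CellKb handles genes with several orthologs or none is not stated, so dropping unmapped genes and keeping the first occurrence of each target are assumptions.

```python
def map_to_orthologs(ranked_genes, ortholog_table):
    """Translate a ranked gene list into another species, preserving rank order.

    ortholog_table: hypothetical {source_gene: [target_gene, ...]} mapping.
    Genes without an ortholog are dropped; repeated targets are kept once.
    """
    mapped, seen = [], set()
    for gene in ranked_genes:
        for target in ortholog_table.get(gene, []):
            if target not in seen:
                seen.add(target)
                mapped.append(target)
    return mapped


# Toy mouse-to-human table; "MouseOnlyGene" is a made-up name with no ortholog entry.
mouse_to_human = {"Cd3e": ["CD3E"], "Ptprc": ["PTPRC"], "Cd2": ["CD2"]}
mouse_query = ["Cd3e", "MouseOnlyGene", "Ptprc"]
human_query = map_to_orthologs(mouse_query, mouse_to_human)
```

The translated list can then be compared against marker sets from the target species in the same way as any other ranked query.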

Licensing Options
We would like to make CellKb available to as many researchers as possible. However, CellKb does not receive any financial support from government or private funding agencies. We are therefore trying to strike a balance between open access and the long-term sustainability of CellKb through various licensing options. A large portion of the data in CellKb is free for academic users to search and browse through a trial license. Commercial users can also sign up for the trial license to evaluate CellKb for a limited period.

Commercial users and organizations will need to buy a paid subscription to CellKb to access and download all data. Academic users and organizations will also need to pay a license fee if they wish to access all search results and download data from CellKb. We also provide paid service options to customize CellKb or integrate it with other databases. These licensing and service fees help support the development and maintenance of CellKb without external funding.
If you would like to partner with us or learn more about how you can use CellKb, please feel free to contact us.

People
CellKb is developed by Ashwini Patil, PhD, with technical support from Ajay Patil. CellKb is licensed through Combinatics Inc., a Tokyo-based bioinformatics company. Prior to founding Combinatics, Ashwini was a Lecturer at the University of Tokyo.