Automated cell type annotation for scRNA-seq datasets

How are the models trained?

All models are built on the logistic regression framework. Traditional logistic regression is used in most cases, and stochastic gradient descent (SGD) learning can optionally be applied depending on the size of the training dataset. For example, when the training dataset contains a very large number of cells, the model can be trained with SGD logistic regression using mini-batches. Briefly, in each epoch the cells are shuffled and binned into equal-sized mini-batches (1,000 cells per batch), and the model is then trained sequentially on 100 of these batches, randomly sampled from all batches. This process is repeated for 10 to 30 epochs. In addition to the models listed below, users can train their own custom models.
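The mini-batch scheme described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not CellTypist's own training code: the helper names `minibatch_schedule` and `sgd_logistic_regression` are hypothetical, the classifier is binary for brevity (real cell-type models are multi-class), cells left over after binning are simply skipped, and the parameter defaults only mirror the numbers quoted in the text.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def minibatch_schedule(n_cells: int, batch_size: int = 1000, batch_number: int = 100,
                       epochs: int = 10, seed: int = 0) -> Iterator[List[int]]:
    """Yield the cell indices for each mini-batch step (hypothetical helper).

    Each epoch: shuffle all cells, bin them into equal-sized batches, then
    yield `batch_number` of those batches, sampled at random without replacement.
    """
    rng = random.Random(seed)
    indices = list(range(n_cells))
    n_full = n_cells // batch_size  # cells that do not fill a whole batch are skipped in this sketch
    for _ in range(epochs):
        rng.shuffle(indices)
        batches = [indices[k * batch_size:(k + 1) * batch_size] for k in range(n_full)]
        for batch in rng.sample(batches, min(batch_number, n_full)):
            yield batch


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1,
                            **schedule_kwargs) -> Tuple[List[float], float]:
    """Binary logistic regression trained by mini-batch SGD (illustrative sketch)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for batch in minibatch_schedule(len(X), **schedule_kwargs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for i in batch:
            # Gradient of the log loss for one cell is (p - y) * x.
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            for j, xj in enumerate(X[i]):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one step.
        n = len(batch)
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

With 250,000 cells and these defaults, each epoch visits 100 of the 250 possible batches (100,000 cells) rather than the full dataset, which is what keeps each epoch affordable on very large training sets.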

Which model should I select?

Models are usually selected based on the context of the query data. For example, for immune cell types it is recommended to start with the "Immune_All_Low" or "Immune_All_High" model, as these contain immune cell types collected from different tissues. "Low" indicates low-hierarchy (high-resolution) cell types and subtypes, and "High" indicates high-hierarchy (low-resolution) cell types.
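Because "Low" means the finer model, the naming is easy to invert. Here is a small hypothetical helper that maps a desired annotation resolution to a model file name. The `.pkl` suffix is assumed from the pickle download format mentioned in the model list, and the helper itself is not part of any published package. With the CellTypist Python package, the resulting name would typically be passed as the `model` argument of `celltypist.annotate`.

```python
from typing import Dict

# Hypothetical mapping; file names follow the "Immune_All_Low/High" naming,
# with the ".pkl" suffix assumed from the pickle download format.
_IMMUNE_MODELS: Dict[str, str] = {
    # "Low" hierarchy = high resolution: fine-grained cell types and subtypes.
    "fine": "Immune_All_Low.pkl",
    # "High" hierarchy = low resolution: broad cell types.
    "coarse": "Immune_All_High.pkl",
}


def immune_model_name(resolution: str) -> str:
    """Return the immune model file for 'fine' or 'coarse' annotation."""
    try:
        return _IMMUNE_MODELS[resolution.lower()]
    except KeyError:
        raise ValueError(
            f"resolution must be one of {sorted(_IMMUNE_MODELS)}, got {resolution!r}"
        ) from None
```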

Model list

Models can be downloaded in pickle format and then opened and examined in Python.
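A downloaded model can be examined with Python's standard `pickle` module. The sketch below is a hypothetical helper (`summarize_model_file`) that unpickles a file, or an already-open binary file object, and reports its top-level structure without assuming any particular key names. The demo uses an in-memory stand-in rather than a real model file.

```python
import io
import os
import pickle
from typing import Any, BinaryIO, Dict, Union


def summarize_model_file(source: Union[str, os.PathLike, BinaryIO]) -> Dict[str, Any]:
    """Unpickle a downloaded model and report its top-level structure.

    Only unpickle files from sources you trust: loading a pickle can run
    arbitrary code.
    """
    if hasattr(source, "read"):
        obj = pickle.load(source)  # already-open binary file object
    else:
        with open(source, "rb") as fh:
            obj = pickle.load(fh)
    summary: Dict[str, Any] = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Key names depend on how the file was written; none are assumed here.
        summary["keys"] = sorted(str(k) for k in obj)
    return summary


if __name__ == "__main__":
    # In-memory stand-in with made-up keys, only to demonstrate the call.
    demo = io.BytesIO(pickle.dumps({"description": "stand-in", "classifier": None}))
    print(summarize_model_file(demo))
```

For a real download, pass the file path instead (for example `summarize_model_file("Immune_All_Low.pkl")`). If the stored classifier objects come from a third-party library such as scikit-learn (an assumption, not stated on this page), that library must be installed for unpickling to succeed.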

Name	Details	Number of cell types	Publication date	Version	Source	Download

About Teichmann Lab in the Cambridge Stem Cell Institute
