How are the models trained?

All models are built on the logistic regression framework. Traditional logistic regression is used in most cases. Stochastic gradient descent (SGD) learning can optionally be used instead, depending on the size of the training dataset. For example, when the training dataset contains a very large number of cells, the data can be modelled with SGD logistic regression using mini-batch training. Briefly, in each epoch the cells are shuffled and binned into equal-sized mini-batches (1,000 cells per batch), and the model is then trained sequentially on 100 such batches randomly sampled from all batches. This process is repeated for 10 to 30 epochs. In addition to the models listed below, users can train their own custom models.
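The mini-batch schedule described above can be sketched in plain Python. This is a toy illustration only, not the actual training code: it is binary rather than multi-class, it drops leftover cells so that batches stay equal-sized, and the function names, learning rate and toy data are all assumptions made for the example.

```python
import math
import random


def make_minibatches(n_cells, batch_size, rng):
    """Shuffle cell indices and bin them into equal-sized mini-batches."""
    order = list(range(n_cells))
    rng.shuffle(order)
    n_full = n_cells // batch_size  # assumption: leftover cells are dropped
    return [order[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_logreg(X, y, batch_size=1000, batches_per_epoch=100,
                     epochs=10, lr=0.1, seed=0):
    """Binary logistic regression trained by mini-batch gradient descent."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # New shuffle and binning in every epoch.
        batches = make_minibatches(len(X), batch_size, rng)
        # Train on a random subset of the batches, one after another.
        chosen = rng.sample(batches, min(batches_per_epoch, len(batches)))
        for batch in chosen:
            grad_w, grad_b = [0.0] * n_features, 0.0
            for i in batch:
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
                err = p - y[i]  # gradient of the log loss w.r.t. the logit
                for j, xj in enumerate(X[i]):
                    grad_w[j] += err * xj
                grad_b += err
            w = [wj - lr * g / len(batch) for wj, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b


def predict(X, w, b):
    """Return 0/1 labels using a 0.5 probability threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
            for x in X]


# Toy data (hypothetical): 200 "cells" with 2 features and 2 classes.
data_rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[data_rng.gauss(2.0 if label else -2.0, 1.0) for _ in range(2)]
     for label in y]

# Scaled-down settings for the toy data; the text above uses 1,000 cells per
# batch, 100 batches per epoch and 10 to 30 epochs.
w, b = train_sgd_logreg(X, y, batch_size=20, batches_per_epoch=5, epochs=10)
accuracy = sum(p == t for p, t in zip(predict(X, w, b), y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Sampling a fixed number of batches per epoch keeps the cost of an epoch roughly constant no matter how many cells the dataset contains, which is presumably why this scheme suits very large datasets.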

Which model should I select?

Models are usually selected based on the context of the query data. For example, for immune cell types it is recommended to start from the "Immune_All_Low/High" models, as they contain immune cell types collected from different tissues. "Low" indicates low-hierarchy (high-resolution) cell types and subtypes, and "High" indicates high-hierarchy (low-resolution) ones. You can also try the expanded reference model of immune cell types ("Immune_All_AddPIP").

Model list

Models can be downloaded in pickle format and then opened and examined in Python.
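As a minimal sketch of examining a downloaded file, the snippet below loads a pickle with the standard library and reports the type of the top-level object and, if it is a dictionary, its keys. The helper name `summarize_pickle`, the file name `demo_model.pkl` and the stand-in contents are hypothetical; the internal layout of the real model files is not described here and is not assumed.

```python
import pickle
import tempfile
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and summarize its top-level object."""
    with open(path, "rb") as handle:
        # Unpickling can execute arbitrary code: only open trusted files.
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = sorted(str(key) for key in obj)
    return summary


# Stand-in file so the example runs without downloading anything.
# Replace demo_path with the path of a downloaded model file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_model.pkl"
    with open(demo_path, "wb") as handle:
        pickle.dump({"description": {"note": "stand-in"},
                     "weights": [0.1, -0.2]}, handle)
    summary = summarize_pickle(demo_path)

print(summary)
```

Loading a real model file may also require that the libraries used to create it are installed, since pickle stores references to classes rather than the class definitions themselves.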

Name | Details | No. cell types | Publish date | Version | Source | Download