Upload query data

The online analysis accepts only a .csv or .h5ad file containing an expression matrix with cells as rows and gene symbols as columns (or the reverse). For .csv files, a raw count matrix is expected, which keeps the file size and upload burden down. For .h5ad files, a log-normalised expression matrix (normalised to 10,000 counts per cell) is expected, i.e. a raw-count AnnData object processed by scanpy.pp.normalize_total(target_sum=1e4) and then scanpy.pp.log1p.
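As a rough illustration of the values expected in an .h5ad file, the NumPy sketch below mirrors the arithmetic of normalising each cell to 10,000 counts and then taking log(1 + x) on a toy dense matrix. The function name log_normalise, the zero-total guard, and the toy counts are assumptions made for this example; real data would normally be processed with the scanpy functions named above.

```python
import numpy as np

def log_normalise(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption for this sketch: cells with zero total counts are left as all zeros.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

# Toy raw counts: 3 cells (rows) x 4 genes (columns).
raw = np.array([[1, 0, 3, 6],
                [0, 5, 5, 0],
                [2, 2, 2, 2]])
lognorm = log_normalise(raw)

# Sanity check: undoing the log should give back 10,000 counts per cell.
per_cell_total = np.expm1(lognorm).sum(axis=1)
```

The same check (np.expm1 of the matrix, summed per cell) is a quick way to tell whether a matrix has already been log-normalised to 10,000 counts before upload.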

Choose a model

Detailed model information can be found here. For immune cell types, we recommend starting with the default model (Immune_All_Low.pkl).

Enable/disable majority voting refinement

Majority voting refines the predictions within each local cell cluster by assigning the dominant cell type label, but it may increase the runtime, especially for large datasets, because of the over-clustering step. This approach usually improves the annotation because voting is conducted in small subclusters derived from over-clustering, so cells belonging to a given cell type are assigned the same label regardless of potential batch effects separating them.
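The voting idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the tool's actual implementation: the function majority_vote, the toy labels, and the cluster assignments are all hypothetical, and the over-clustering step is assumed to have been run already.

```python
from collections import Counter, defaultdict

def majority_vote(predicted_labels, cluster_ids):
    """Replace each cell's label with the dominant label of its subcluster."""
    # Group the per-cell predictions by subcluster.
    labels_by_cluster = defaultdict(list)
    for label, cluster in zip(predicted_labels, cluster_ids):
        labels_by_cluster[cluster].append(label)
    # Pick the most frequent label in each subcluster.
    dominant = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    # Assign every cell the dominant label of its subcluster.
    return [dominant[cluster] for cluster in cluster_ids]

# Toy example: six cells in two subclusters, one outlier prediction in each.
labels = ["T cell", "T cell", "B cell", "B cell", "B cell", "T cell"]
clusters = [0, 0, 0, 1, 1, 1]
refined = majority_vote(labels, clusters)
```

In this toy input, the single "B cell" call in subcluster 0 and the single "T cell" call in subcluster 1 are overruled by their neighbours.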

Interpret the result

The prediction result, consisting of three tables, will be sent to the email address provided by the user.

  • predicted_labels.csv contains the main result of predicted labels, cell over-clustering, and (if majority voting is enabled) predicted labels after the majority voting approach.
  • decision_matrix.csv contains the matrix of decision scores for each cell across cell types, which is used to determine the final predicted cell type of each cell.
  • probability_matrix.csv contains the matrix of probabilities that each cell belongs to a given cell type (transformed from the decision matrix by the sigmoid function).
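The relationship between the last two tables can be illustrated with a small NumPy sketch. The cell type names and scores below are made up, and the sketch assumes that the predicted label is the cell type with the highest decision score; the sigmoid transform itself is the one described above.

```python
import numpy as np

# Hypothetical decision scores: 3 cells (rows) x 3 cell types (columns).
cell_types = ["B cells", "T cells", "Monocytes"]
decision = np.array([[ 2.0, -1.0, -3.0],
                     [-2.5,  0.5, -0.5],
                     [-4.0, -1.5,  3.0]])

# The sigmoid maps each decision score independently to a value in (0, 1).
probability = 1.0 / (1.0 + np.exp(-decision))

# Assumption: the predicted label is the cell type with the highest score per cell.
predicted = [cell_types[i] for i in decision.argmax(axis=1)]
```

Because the sigmoid is applied to each score separately, the probabilities in a row need not sum to 1, and a cell can have a low probability for every cell type.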