Exploring pre-calculated summary cell counts

This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe, census_summary_cell_counts, that summarizes counts of various cell labels. You can read it into a Pandas DataFrame.

Contents

  1. Fetching the census_summary_cell_counts dataframe.

  2. Creating summary counts beyond pre-calculated values.

Fetching the census_summary_cell_counts dataframe

[1]:
import cellxgene_census

census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])

census_summary_cell_counts
[1]:
organism category ontology_term_id unique_cell_count total_cell_count label
0 Homo sapiens all na 29817108 46050829 na
1 Homo sapiens assay EFO:0008722 206279 260396 Drop-seq
2 Homo sapiens assay EFO:0008780 25652 51304 inDrop
3 Homo sapiens assay EFO:0008913 133511 133511 single-cell RNA sequencing
4 Homo sapiens assay EFO:0008919 44721 161998 Seq-Well
... ... ... ... ... ... ...
1263 Mus musculus tissue_general UBERON:0002113 164881 188361 kidney
1264 Mus musculus tissue_general UBERON:0002365 15577 31154 exocrine gland
1265 Mus musculus tissue_general UBERON:0002367 37715 130135 prostate gland
1266 Mus musculus tissue_general UBERON:0002368 13322 26644 endocrine gland
1267 Mus musculus tissue_general UBERON:0002371 90225 144962 bone marrow

1268 rows × 6 columns
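Once the summary is a Pandas DataFrame, ordinary filtering and sorting apply. The sketch below is illustrative only: it rebuilds the first five rows shown above by hand (so it runs without opening the Census) and keeps just the assay rows, ordered by total_cell_count. The variable names summary_sample and assay_counts are made up for this example.

```python
import pandas as pd

# Hand-copied from the first five rows of the output above; not a live Census read.
summary_sample = pd.DataFrame(
    [
        ("Homo sapiens", "all", "na", 29817108, 46050829, "na"),
        ("Homo sapiens", "assay", "EFO:0008722", 206279, 260396, "Drop-seq"),
        ("Homo sapiens", "assay", "EFO:0008780", 25652, 51304, "inDrop"),
        ("Homo sapiens", "assay", "EFO:0008913", 133511, 133511, "single-cell RNA sequencing"),
        ("Homo sapiens", "assay", "EFO:0008919", 44721, 161998, "Seq-Well"),
    ],
    columns=[
        "organism",
        "category",
        "ontology_term_id",
        "unique_cell_count",
        "total_cell_count",
        "label",
    ],
)

# Keep only the per-assay rows and order them from largest to smallest.
assay_counts = summary_sample[summary_sample["category"] == "assay"].sort_values(
    "total_cell_count", ascending=False
)

print(assay_counts[["label", "unique_cell_count", "total_cell_count"]])
```

The same two lines of filtering and sorting should work unchanged on the full census_summary_cell_counts dataframe, for example with category set to "tissue_general".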

Creating summary counts beyond pre-calculated values

The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.

You can do similar group statistics using Pandas groupby functions.
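As a small illustration of such a groupby, here is a sketch on a three-row toy dataframe rather than Census data. It assumes the label columns arrive as Pandas categoricals, which is why observed=True matters: without it, category combinations that never occur would also be reported. The names toy_obs and toy_counts are hypothetical.

```python
import pandas as pd

# Toy stand-in for an obs dataframe; each categorical lists one unused category.
toy_obs = pd.DataFrame(
    {
        "cell_type_ontology_term_id": pd.Categorical(
            ["CL:0000019", "CL:0000003", "CL:0000003"],
            categories=["CL:0000001", "CL:0000003", "CL:0000019"],
        ),
        "cell_type": pd.Categorical(
            ["sperm", "native cell", "native cell"],
            categories=["primary cultured cell", "native cell", "sperm"],
        ),
    }
)

# One row per (term id, label) pair that actually occurs, with its cell count.
toy_counts = toy_obs.groupby(
    by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True
).size()

print(toy_counts)
```

Only the two pairs present in the data appear in the result; the unused "primary cultured cell" category is dropped.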

The code below reproduces counts like those above, here per cell type, using the full obs dataframe in the homo_sapiens experiment.

Keep in mind that the Census is very large, and an unrestricted query will return a significant amount of data. You can manage that by narrowing the request with the column_names and value_filter parameters of your query.

[2]:
human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()
[2]:
cell_type_ontology_term_id cell_type size
0 CL:0000001 primary cultured cell 80
1 CL:0000003 native cell 778811
2 CL:0000006 neuronal receptor cell 2502
3 CL:0000019 sperm 11
4 CL:0000031 neuroblast (sensu Vertebrata) 2355
... ... ... ...
585 CL:4023070 caudal ganglionic eminence derived GABAergic c... 8463
586 CL:4028002 alveolar capillary type 1 endothelial cell 16048
587 CL:4028003 alveolar capillary type 2 endothelial cell 7157
588 CL:4028006 alveolar type 2 fibroblast cell 4670
589 CL:4030006 fallopian tube secretory epithelial cell 463461

590 rows × 3 columns
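The query above restricts columns with column_names only. To also restrict rows, a value_filter string can be passed to the same read call. The sketch below only builds such a string; build_value_filter is a hypothetical helper, not part of cellxgene_census, and the obs column names tissue_general and is_primary_data are assumptions to check against the Census schema.

```python
def build_value_filter(**equals):
    """Hypothetical helper: join `column == value` clauses with 'and'.

    Values are rendered with repr(), so strings are quoted and booleans
    appear as True/False. It does not escape quotes inside string values.
    """
    return " and ".join(f"{column} == {value!r}" for column, value in equals.items())


# Assumed column names; verify them against the obs schema before use.
lung_primary_filter = build_value_filter(tissue_general="lung", is_primary_data=True)
print(lung_primary_filter)

# Possible use with the open Census from the cells above (not executed here):
# obs_df = human.obs.read(
#     column_names=["cell_type_ontology_term_id", "cell_type"],
#     value_filter=lung_primary_filter,
# ).concat().to_pandas()
```

Filtering on the server side like this means fewer rows are transferred and held in memory than when reading the full obs dataframe and filtering afterwards in Pandas.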

Close the census when complete.

[3]:
census.close()