Census summary cell counts example

Goal: demonstrate basic use of the census_summary_cell_counts dataframe.

Each Census contains a top-level dataframe summarizing counts of various cell labels. You can read this into a Pandas DataFrame:

[1]:
import cellxgene_census

census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])

census_summary_cell_counts
[1]:
      organism      category        ontology_term_id  unique_cell_count  total_cell_count  label
0     Homo sapiens  all             na                29461044           45501425          na
1     Homo sapiens  assay           EFO:0008722       206279             260396            Drop-seq
2     Homo sapiens  assay           EFO:0008780       25652              51304             inDrop
3     Homo sapiens  assay           EFO:0008913       133511             133511            single-cell RNA sequencing
4     Homo sapiens  assay           EFO:0008919       44721              161998            Seq-Well
...   ...           ...             ...               ...                ...               ...
1259  Mus musculus  tissue_general  UBERON:0002113    164881             188361            kidney
1260  Mus musculus  tissue_general  UBERON:0002365    15577              31154             exocrine gland
1261  Mus musculus  tissue_general  UBERON:0002367    37715              130135            prostate gland
1262  Mus musculus  tissue_general  UBERON:0002368    13322              26644             endocrine gland
1263  Mus musculus  tissue_general  UBERON:0002371    90225              144962            bone marrow

1264 rows × 6 columns

This dataframe is precomputed from the experiments in the Census and is intended to give a quick overview of the Census contents.
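As a quick illustration of working with this summary, the sketch below filters it down to one organism and one category. To stay runnable offline it rebuilds the first five rows of the output above as a toy dataframe; `counts_for` is a hypothetical helper name, not part of `cellxgene_census`.

```python
import pandas as pd

# Toy copy of the first five rows of the summary output shown above.
summary = pd.DataFrame(
    {
        "organism": ["Homo sapiens"] * 5,
        "category": ["all", "assay", "assay", "assay", "assay"],
        "ontology_term_id": ["na", "EFO:0008722", "EFO:0008780", "EFO:0008913", "EFO:0008919"],
        "unique_cell_count": [29461044, 206279, 25652, 133511, 44721],
        "total_cell_count": [45501425, 260396, 51304, 133511, 161998],
        "label": ["na", "Drop-seq", "inDrop", "single-cell RNA sequencing", "Seq-Well"],
    }
)


def counts_for(df, organism, category):
    """Return the rows for one organism and category, largest total first."""
    mask = (df["organism"] == organism) & (df["category"] == category)
    return df.loc[mask].sort_values("total_cell_count", ascending=False).reset_index(drop=True)


human_assays = counts_for(summary, "Homo sapiens", "assay")
print(human_assays[["label", "unique_cell_count", "total_cell_count"]])
```

The same helper should work unchanged on the full `census_summary_cell_counts` dataframe, since it only relies on the column names shown in the output.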

You can compute similar grouped statistics yourself using pandas groupby functions.

The code below reproduces the above counts, grouped by cell type, using the full obs dataframe in the homo_sapiens experiment.

Keep in mind that the Census is very large, and an unrestricted query will return a significant amount of data. You can manage that by narrowing the request with the column_names and value_filter arguments of your query.
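As an aside before the full read in the next cell, here is a minimal sketch of a narrowed query that passes both arguments. `read_obs_subset` is a hypothetical wrapper, and the filter expression in the usage comment is only an illustration; it assumes the obs dataframe has `tissue_general` and `is_primary_data` columns.

```python
def read_obs_subset(census, value_filter, column_names):
    """Read only the requested obs columns for cells matching value_filter.

    `census` is an already-open handle, e.g. from cellxgene_census.open_soma().
    """
    human = census["census_data"]["homo_sapiens"]
    return (
        human.obs.read(column_names=column_names, value_filter=value_filter)
        .concat()
        .to_pandas()
    )


# Usage with an open census handle (this downloads data, so it is left commented out):
# lung_df = read_obs_subset(
#     census,
#     value_filter="tissue_general == 'lung' and is_primary_data == True",
#     column_names=["cell_type_ontology_term_id", "cell_type"],
# )
```

Restricting rows with value_filter and columns with column_names both reduce how much data is transferred, so they are usually worth combining.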

[2]:
human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()
[2]:
     cell_type_ontology_term_id  cell_type                                          size
0    CL:0000001                  primary cultured cell                              80
1    CL:0000003                  native cell                                        778526
2    CL:0000006                  neuronal receptor cell                             2502
3    CL:0000019                  sperm                                              11
4    CL:0000031                  neuroblast (sensu Vertebrata)                      2355
...  ...                         ...                                                ...
583  CL:4023070                  caudal ganglionic eminence derived GABAergic c...  8463
584  CL:4028002                  alveolar capillary type 1 endothelial cell         16048
585  CL:4028003                  alveolar capillary type 2 endothelial cell         7157
586  CL:4028006                  alveolar type 2 fibroblast cell                    4670
587  CL:4030006                  fallopian tube secretory epithelial cell           463461

588 rows × 3 columns
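The groupby call above passes observed=True. The toy example below sketches why that matters, assuming the label columns come back as pandas categoricals: with observed=False, pandas also emits a group for every declared category that has no rows. The data here is made up for illustration and is not taken from the Census.

```python
import pandas as pd

# A categorical column that declares three categories but only uses two of them.
obs = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["sperm", "native cell", "native cell"],
            categories=["sperm", "native cell", "neuroblast"],
        )
    }
)

# observed=False keeps the unused "neuroblast" category as a zero-sized group.
with_unused = obs.groupby("cell_type", observed=False).size()

# observed=True reports only the categories that actually occur in the data.
only_seen = obs.groupby("cell_type", observed=True).size()

print(with_unused)
print(only_seen)
```

When grouping by several categorical columns at once, the unobserved combinations can greatly outnumber the real ones, so observed=True tends to keep the result both smaller and easier to read.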

Close the census when you are done with it.

[3]:
census.close()
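As an alternative to calling close() explicitly, the census handle can be used as a context manager so that it is closed even if a read raises an exception. The sketch below wraps the first cell of this example that way. `load_summary_cell_counts` and its `open_census` parameter are hypothetical; the parameter is only there so the function can be exercised without network access.

```python
def load_summary_cell_counts(open_census=None):
    """Read the summary cell counts and close the census afterwards."""
    if open_census is None:
        # Imported lazily so this sketch can be loaded without the package installed.
        import cellxgene_census

        open_census = cellxgene_census.open_soma

    # The with-block closes the census on exit, including when an error occurs.
    with open_census() as census:
        df = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

    # Same clean-up as in the first cell: soma_joinid is not useful here.
    return df.drop(columns=["soma_joinid"])


# Usage (opens the real Census, so it is left commented out):
# census_summary_cell_counts = load_summary_cell_counts()
```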