Census summary cell counts example

Goal: demonstrate basic use of the census_summary_cell_counts dataframe.

Each Census contains a top-level dataframe summarizing counts of various cell labels. You can read this into a Pandas DataFrame:

[1]:

import cellxgene_census

census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])

census_summary_cell_counts

[1]:

	organism	category	ontology_term_id	unique_cell_count	total_cell_count	label
0	Homo sapiens	all	na	29461044	45501425	na
1	Homo sapiens	assay	EFO:0008722	206279	260396	Drop-seq
2	Homo sapiens	assay	EFO:0008780	25652	51304	inDrop
3	Homo sapiens	assay	EFO:0008913	133511	133511	single-cell RNA sequencing
4	Homo sapiens	assay	EFO:0008919	44721	161998	Seq-Well
...	...	...	...	...	...	...
1259	Mus musculus	tissue_general	UBERON:0002113	164881	188361	kidney
1260	Mus musculus	tissue_general	UBERON:0002365	15577	31154	exocrine gland
1261	Mus musculus	tissue_general	UBERON:0002367	37715	130135	prostate gland
1262	Mus musculus	tissue_general	UBERON:0002368	13322	26644	endocrine gland
1263	Mus musculus	tissue_general	UBERON:0002371	90225	144962	bone marrow

1264 rows × 6 columns

This dataframe is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.
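As a rough illustration of such a quick look, the sketch below filters the summary to one organism and one category and sorts by total count. It runs on a small stand-in dataframe built from the first five rows of the output above, so it works offline. The helper name `counts_for` is hypothetical and not part of `cellxgene_census`.

```python
import pandas as pd

# A few rows copied from the summary output shown above;
# the real dataframe has 1264 rows.
sample = pd.DataFrame(
    {
        "organism": ["Homo sapiens"] * 5,
        "category": ["all", "assay", "assay", "assay", "assay"],
        "ontology_term_id": ["na", "EFO:0008722", "EFO:0008780", "EFO:0008913", "EFO:0008919"],
        "unique_cell_count": [29461044, 206279, 25652, 133511, 44721],
        "total_cell_count": [45501425, 260396, 51304, 133511, 161998],
        "label": ["na", "Drop-seq", "inDrop", "single-cell RNA sequencing", "Seq-Well"],
    }
)


def counts_for(df, organism, category):
    """Return the rows for one organism and category, largest total first.

    Hypothetical helper for illustration only.
    """
    mask = (df["organism"] == organism) & (df["category"] == category)
    return df.loc[mask].sort_values("total_cell_count", ascending=False).reset_index(drop=True)


human_assays = counts_for(sample, "Homo sapiens", "assay")
print(human_assays[["label", "unique_cell_count", "total_cell_count"]])
```

The same call should work on the full `census_summary_cell_counts` dataframe, since it has the same columns.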

You can compute similar grouped statistics yourself with Pandas groupby functions.

The code below reproduces such counts, grouped by cell type, using the full obs dataframe in the homo_sapiens experiment.

Keep in mind that the Census is very large, and queries can return a significant amount of data. You can manage that by narrowing the request with column_names and value_filter in your query.
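A minimal sketch of narrowing a query with both parameters might look like the following. The wrapper name `read_obs_subset` is hypothetical; it only forwards `column_names` and `value_filter` to the same `obs.read(...)` call used in this notebook. The filter in the commented example assumes the obs dataframe has a `tissue_general` column with the value `'lung'`, so treat it as illustrative.

```python
def read_obs_subset(experiment, column_names, value_filter):
    """Read only the selected obs columns and rows into a Pandas DataFrame.

    `experiment` is expected to be a Census experiment such as
    census["census_data"]["homo_sapiens"]. Hypothetical helper.
    """
    return (
        experiment.obs.read(column_names=column_names, value_filter=value_filter)
        .concat()
        .to_pandas()
    )


# Example call (needs an open Census; the filter value is illustrative):
# lung_df = read_obs_subset(
#     census["census_data"]["homo_sapiens"],
#     column_names=["cell_type", "tissue_general"],
#     value_filter="tissue_general == 'lung'",
# )
```

Restricting columns reduces the width of the result, and the filter reduces the number of rows, so only the requested subset is loaded into memory.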

[2]:

human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()

[2]:

	cell_type_ontology_term_id	cell_type	size
0	CL:0000001	primary cultured cell	80
1	CL:0000003	native cell	778526
2	CL:0000006	neuronal receptor cell	2502
3	CL:0000019	sperm	11
4	CL:0000031	neuroblast (sensu Vertebrata)	2355
...	...	...	...
583	CL:4023070	caudal ganglionic eminence derived GABAergic c...	8463
584	CL:4028002	alveolar capillary type 1 endothelial cell	16048
585	CL:4028003	alveolar capillary type 2 endothelial cell	7157
586	CL:4028006	alveolar type 2 fibroblast cell	4670
587	CL:4030006	fallopian tube secretory epithelial cell	463461

588 rows × 3 columns
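The summary dataframe reports both unique_cell_count and total_cell_count. The sketch below shows how one groupby could produce two such columns. It assumes that unique cells are the rows where `is_primary_data` is True, which this notebook does not state. The rows are made-up toy data, not Census values.

```python
import pandas as pd

# Made-up toy rows standing in for obs; not real Census data.
toy_obs = pd.DataFrame(
    {
        "cell_type": ["sperm", "sperm", "native cell", "native cell", "native cell"],
        "is_primary_data": [True, False, True, True, False],
    }
)

# Named aggregation: "size" counts every row in the group, while "sum"
# over a boolean column counts only the True rows (assumed to mark unique cells).
toy_counts = toy_obs.groupby("cell_type", as_index=False, observed=True).agg(
    total_cell_count=("is_primary_data", "size"),
    unique_cell_count=("is_primary_data", "sum"),
)
print(toy_counts)
```

To try this on real data, `is_primary_data` would need to be included in `column_names` when reading obs.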

Close the Census when you are done.

[3]:

census.close()