Exploring pre-calculated summary cell counts

This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe, census_summary_cell_counts, that summarizes cell counts for various cell labels. You can read it into a Pandas DataFrame.

Contents

Fetching the census_summary_cell_counts dataframe.
Creating summary counts beyond pre-calculated values.

Fetching the `census_summary_cell_counts` dataframe

[1]:

import cellxgene_census

census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()

# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])

census_summary_cell_counts

[1]:

	organism	category	ontology_term_id	unique_cell_count	total_cell_count	label
0	Homo sapiens	all	na	29817108	46050829	na
1	Homo sapiens	assay	EFO:0008722	206279	260396	Drop-seq
2	Homo sapiens	assay	EFO:0008780	25652	51304	inDrop
3	Homo sapiens	assay	EFO:0008913	133511	133511	single-cell RNA sequencing
4	Homo sapiens	assay	EFO:0008919	44721	161998	Seq-Well
...	...	...	...	...	...	...
1263	Mus musculus	tissue_general	UBERON:0002113	164881	188361	kidney
1264	Mus musculus	tissue_general	UBERON:0002365	15577	31154	exocrine gland
1265	Mus musculus	tissue_general	UBERON:0002367	37715	130135	prostate gland
1266	Mus musculus	tissue_general	UBERON:0002368	13322	26644	endocrine gland
1267	Mus musculus	tissue_general	UBERON:0002371	90225	144962	bone marrow

1268 rows × 6 columns
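Once loaded, the summary is an ordinary Pandas DataFrame, so it can be sliced by organism and category. The sketch below is illustrative only: it rebuilds a handful of rows copied from the output above so that it runs without opening the Census, and the variable names `summary` and `human_assays` are assumptions of this example.

```python
import pandas as pd

# A few rows copied from the output above (not the full table).
summary = pd.DataFrame(
    {
        "organism": ["Homo sapiens", "Homo sapiens", "Homo sapiens", "Mus musculus", "Mus musculus"],
        "category": ["all", "assay", "assay", "tissue_general", "tissue_general"],
        "ontology_term_id": ["na", "EFO:0008722", "EFO:0008780", "UBERON:0002113", "UBERON:0002371"],
        "unique_cell_count": [29817108, 206279, 25652, 164881, 90225],
        "total_cell_count": [46050829, 260396, 51304, 188361, 144962],
        "label": ["na", "Drop-seq", "inDrop", "kidney", "bone marrow"],
    }
)

# Keep only the per-assay rows for human, largest unique count first.
is_human_assay = (summary["organism"] == "Homo sapiens") & (summary["category"] == "assay")
human_assays = (
    summary[is_human_assay]
    .sort_values("unique_cell_count", ascending=False)
    .reset_index(drop=True)
)

print(human_assays[["label", "unique_cell_count", "total_cell_count"]])
```

The same boolean mask works on the full census_summary_cell_counts dataframe from the cell above.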

Creating summary counts beyond pre-calculated values.

The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.

You can compute similar grouped statistics yourself with the Pandas groupby function.

The code below reproduces the cell type counts from the full obs dataframe of the homo_sapiens experiment.

Keep in mind that the Census is very large, and queries can return a significant amount of data. You can manage this by narrowing the request with column_names and value_filter in your query.
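As a minimal sketch of narrowing a query, the functions below combine value_filter and column_names in a single obs read. `build_value_filter` and `read_cell_types` are hypothetical helpers written for this example, not part of cellxgene_census, and the tissue value "lung" is only an illustrative choice. The read function expects an already open Census handle, so nothing is fetched until you call it.

```python
def build_value_filter(tissue_general: str, primary_only: bool = True) -> str:
    """Hypothetical helper: assemble a value_filter expression for obs.read()."""
    clauses = [f"tissue_general == '{tissue_general}'"]
    if primary_only:
        # Restrict to cells flagged as primary data to avoid duplicated cells.
        clauses.append("is_primary_data == True")
    return " and ".join(clauses)


def read_cell_types(census, tissue_general: str):
    """Hypothetical helper: read only the cell type columns for one tissue.

    `census` is assumed to be an open handle from cellxgene_census.open_soma().
    """
    human = census["census_data"]["homo_sapiens"]
    return (
        human.obs.read(
            value_filter=build_value_filter(tissue_general),
            column_names=["cell_type_ontology_term_id", "cell_type"],
        )
        .concat()
        .to_pandas()
    )


# Building the filter string needs no Census access.
print(build_value_filter("lung"))
```

With an open Census, `read_cell_types(census, "lung")` would return a DataFrame that can be passed to the same groupby call used in the next cell.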

[2]:

human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()

[2]:

	cell_type_ontology_term_id	cell_type	size
0	CL:0000001	primary cultured cell	80
1	CL:0000003	native cell	778811
2	CL:0000006	neuronal receptor cell	2502
3	CL:0000019	sperm	11
4	CL:0000031	neuroblast (sensu Vertebrata)	2355
...	...	...	...
585	CL:4023070	caudal ganglionic eminence derived GABAergic c...	8463
586	CL:4028002	alveolar capillary type 1 endothelial cell	16048
587	CL:4028003	alveolar capillary type 2 endothelial cell	7157
588	CL:4028006	alveolar type 2 fibroblast cell	4670
589	CL:4030006	fallopian tube secretory epithelial cell	463461

590 rows × 3 columns
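The pre-calculated table reports both a unique and a total cell count. Assuming, as this sketch does, that the unique count corresponds to cells whose is_primary_data flag is true, both figures can be derived in one groupby. The toy dataframe `obs_toy` below is hypothetical data made up for illustration, not Census content.

```python
import pandas as pd

# Hypothetical toy obs table: each row is one cell.
obs_toy = pd.DataFrame(
    {
        "cell_type": ["sperm", "sperm", "native cell", "native cell", "native cell"],
        "is_primary_data": [True, False, True, True, False],
    }
)

# Summing a boolean column counts the True values; "size" counts all rows in the group.
counts = obs_toy.groupby("cell_type", as_index=False, observed=True).agg(
    unique_cell_count=("is_primary_data", "sum"),
    total_cell_count=("is_primary_data", "size"),
)

print(counts)
```

To apply this to real data, add "is_primary_data" to column_names when reading obs.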

Close the Census when you are done.

[3]:

census.close()
