Census summary cell counts example
Goal: demonstrate basic use of the census_summary_cell_counts dataframe.
Each Census contains a top-level dataframe summarizing counts of various cell labels. You can read this into a Pandas DataFrame:
[1]:
import cellxgene_census
census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()
# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])
census_summary_cell_counts
[1]:
| | organism | category | ontology_term_id | unique_cell_count | total_cell_count | label |
|---|---|---|---|---|---|---|
| 0 | Homo sapiens | all | na | 29461044 | 45501425 | na |
| 1 | Homo sapiens | assay | EFO:0008722 | 206279 | 260396 | Drop-seq |
| 2 | Homo sapiens | assay | EFO:0008780 | 25652 | 51304 | inDrop |
| 3 | Homo sapiens | assay | EFO:0008913 | 133511 | 133511 | single-cell RNA sequencing |
| 4 | Homo sapiens | assay | EFO:0008919 | 44721 | 161998 | Seq-Well |
| ... | ... | ... | ... | ... | ... | ... |
| 1259 | Mus musculus | tissue_general | UBERON:0002113 | 164881 | 188361 | kidney |
| 1260 | Mus musculus | tissue_general | UBERON:0002365 | 15577 | 31154 | exocrine gland |
| 1261 | Mus musculus | tissue_general | UBERON:0002367 | 37715 | 130135 | prostate gland |
| 1262 | Mus musculus | tissue_general | UBERON:0002368 | 13322 | 26644 | endocrine gland |
| 1263 | Mus musculus | tissue_general | UBERON:0002371 | 90225 | 144962 | bone marrow |
1264 rows × 6 columns
This dataframe is precomputed from the experiments in the Census, and is intended to simplify quick looks at the Census contents.
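As an illustration of such a quick look, the sketch below filters the summary to the human assay rows and sorts them by total cell count. To stay runnable without opening the Census, it rebuilds only the first five rows of the table shown above by hand; the variable names `summary` and `human_assays` are assumptions of this sketch, and against the real `census_summary_cell_counts` dataframe the same filter would return many more rows.

```python
import pandas as pd

# First five rows of the summary table shown above, retyped by hand so this
# sketch runs without opening the Census (not the full dataframe).
summary = pd.DataFrame(
    {
        "organism": ["Homo sapiens"] * 5,
        "category": ["all", "assay", "assay", "assay", "assay"],
        "ontology_term_id": ["na", "EFO:0008722", "EFO:0008780", "EFO:0008913", "EFO:0008919"],
        "unique_cell_count": [29461044, 206279, 25652, 133511, 44721],
        "total_cell_count": [45501425, 260396, 51304, 133511, 161998],
        "label": ["na", "Drop-seq", "inDrop", "single-cell RNA sequencing", "Seq-Well"],
    }
)

# Keep only the human assay rows and put the largest total first.
is_human_assay = (summary["organism"] == "Homo sapiens") & (summary["category"] == "assay")
human_assays = (
    summary[is_human_assay]
    .sort_values("total_cell_count", ascending=False)
    .reset_index(drop=True)
)

print(human_assays[["label", "unique_cell_count", "total_cell_count"]])
```

The same boolean-mask pattern should work for the other values of the `category` column, such as `tissue_general`.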
You can compute similar group statistics yourself using Pandas groupby functions.
The code below recomputes cell counts per cell type from the full obs dataframe of the homo_sapiens experiment.
Keep in mind that the Census is very large, so an unfiltered query can return a large amount of data. You can manage that by narrowing the query with the column_names and value_filter arguments.
[2]:
human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()
[2]:
| | cell_type_ontology_term_id | cell_type | size |
|---|---|---|---|
| 0 | CL:0000001 | primary cultured cell | 80 |
| 1 | CL:0000003 | native cell | 778526 |
| 2 | CL:0000006 | neuronal receptor cell | 2502 |
| 3 | CL:0000019 | sperm | 11 |
| 4 | CL:0000031 | neuroblast (sensu Vertebrata) | 2355 |
| ... | ... | ... | ... |
| 583 | CL:4023070 | caudal ganglionic eminence derived GABAergic c... | 8463 |
| 584 | CL:4028002 | alveolar capillary type 1 endothelial cell | 16048 |
| 585 | CL:4028003 | alveolar capillary type 2 endothelial cell | 7157 |
| 586 | CL:4028006 | alveolar type 2 fibroblast cell | 4670 |
| 587 | CL:4030006 | fallopian tube secretory epithelial cell | 463461 |
588 rows × 3 columns
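The query above restricts columns with column_names but still reads every row. A minimal sketch of also narrowing the rows with value_filter is shown below. The helper name `read_obs_subset` is hypothetical and not part of cellxgene_census; it only wraps the same `obs.read(...)` call used above. The filter value in the commented usage (`tissue_general == 'kidney'`) is an illustrative assumption, so adjust it to a column and value that exist in your Census version.

```python
def read_obs_subset(census, organism, column_names, value_filter):
    """Read selected obs columns for rows matching value_filter into pandas.

    `census` is an already-open Census handle, e.g. the object returned by
    cellxgene_census.open_soma(). This helper is a hypothetical convenience
    wrapper, not a library function.
    """
    experiment = census["census_data"][organism]
    query = experiment.obs.read(
        column_names=column_names,  # fetch only these obs columns
        value_filter=value_filter,  # keep only rows matching this expression
    )
    return query.concat().to_pandas()


# Example usage (requires network access and an open Census handle):
#
#   import cellxgene_census
#   census = cellxgene_census.open_soma()
#   kidney_obs = read_obs_subset(
#       census,
#       organism="homo_sapiens",
#       column_names=["cell_type_ontology_term_id", "cell_type"],
#       value_filter="tissue_general == 'kidney'",  # illustrative filter
#   )
#   kidney_obs.groupby(
#       by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True
#   ).size()
#   census.close()
```

Filtering on the server side this way means only the matching rows are transferred, which is usually much cheaper than reading the full obs dataframe and filtering in Pandas afterwards.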
Close the census when complete.
[3]:
census.close()