Exploring pre-calculated summary cell counts
This tutorial describes how to access pre-calculated summary cell counts. Each Census contains a top-level dataframe, census_summary_cell_counts, that summarizes cell counts for various cell labels. You can read it into a Pandas DataFrame.
Contents
Fetching the census_summary_cell_counts dataframe.
Creating summary counts beyond pre-calculated values.
Fetching the census_summary_cell_counts dataframe
[1]:
import cellxgene_census
census = cellxgene_census.open_soma()
census_summary_cell_counts = census["census_info"]["summary_cell_counts"].read().concat().to_pandas()
# Dropping the soma_joinid column as it isn't useful in this demo
census_summary_cell_counts = census_summary_cell_counts.drop(columns=["soma_joinid"])
census_summary_cell_counts
[1]:
| | organism | category | ontology_term_id | unique_cell_count | total_cell_count | label |
|---|---|---|---|---|---|---|
| 0 | Homo sapiens | all | na | 29817108 | 46050829 | na |
| 1 | Homo sapiens | assay | EFO:0008722 | 206279 | 260396 | Drop-seq |
| 2 | Homo sapiens | assay | EFO:0008780 | 25652 | 51304 | inDrop |
| 3 | Homo sapiens | assay | EFO:0008913 | 133511 | 133511 | single-cell RNA sequencing |
| 4 | Homo sapiens | assay | EFO:0008919 | 44721 | 161998 | Seq-Well |
| ... | ... | ... | ... | ... | ... | ... |
| 1263 | Mus musculus | tissue_general | UBERON:0002113 | 164881 | 188361 | kidney |
| 1264 | Mus musculus | tissue_general | UBERON:0002365 | 15577 | 31154 | exocrine gland |
| 1265 | Mus musculus | tissue_general | UBERON:0002367 | 37715 | 130135 | prostate gland |
| 1266 | Mus musculus | tissue_general | UBERON:0002368 | 13322 | 26644 | endocrine gland |
| 1267 | Mus musculus | tissue_general | UBERON:0002371 | 90225 | 144962 | bone marrow |
1268 rows × 6 columns
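Once loaded, the summary can be filtered like any other Pandas DataFrame. The sketch below is an illustration only: it rebuilds four of the assay rows shown above by hand, so it runs without opening the Census, and selects the Homo sapiens assay rows sorted by total_cell_count. The variable names summary and human_assays are made up for this example; with the real data the same selection would be applied to census_summary_cell_counts.

```python
import pandas as pd

# Hand-copied subset of the table above, used so the example runs offline.
summary = pd.DataFrame(
    {
        "organism": ["Homo sapiens"] * 4,
        "category": ["assay"] * 4,
        "ontology_term_id": ["EFO:0008722", "EFO:0008780", "EFO:0008913", "EFO:0008919"],
        "unique_cell_count": [206279, 25652, 133511, 44721],
        "total_cell_count": [260396, 51304, 133511, 161998],
        "label": ["Drop-seq", "inDrop", "single-cell RNA sequencing", "Seq-Well"],
    }
)

# Keep only human assay rows, largest total first.
human_assays = (
    summary[(summary["organism"] == "Homo sapiens") & (summary["category"] == "assay")]
    .sort_values("total_cell_count", ascending=False)
    .reset_index(drop=True)
)
print(human_assays[["label", "unique_cell_count", "total_cell_count"]])
```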
Creating summary counts beyond pre-calculated values.
The dataframe above is precomputed from the experiments in the Census, providing a quick overview of the Census contents.
You can compute similar group statistics yourself using Pandas groupby functions.
The code below computes comparable counts per cell type from the full obs dataframe in the homo_sapiens experiment.
Keep in mind that the Census is very large, and queries can return a significant amount of data. You can manage this by narrowing the request with the column_names and value_filter arguments in your query.
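As a rough sketch of narrowing a query, the hypothetical helpers below read only two obs columns for cells that match a value_filter expression. The names build_equality_filter and read_filtered_obs, and the example filter on tissue_general, are assumptions chosen for illustration; the filtered column must exist in obs for the query to work. The reader function takes an already opened census object, so nothing is fetched until you call it.

```python
def build_equality_filter(column: str, value: str) -> str:
    """Build a simple `column == 'value'` expression for value_filter."""
    return f"{column} == '{value}'"


def read_filtered_obs(census, column: str, value: str):
    """Hypothetical helper: read a narrow, filtered slice of the human obs dataframe."""
    human = census["census_data"]["homo_sapiens"]
    return (
        human.obs.read(
            # Fetch only the columns needed for the groupby.
            column_names=["cell_type_ontology_term_id", "cell_type"],
            # Restrict the rows on the server side before they are downloaded.
            value_filter=build_equality_filter(column, value),
        )
        .concat()
        .to_pandas()
    )


# Building the filter string needs no Census connection.
example_filter = build_equality_filter("tissue_general", "lung")
print(example_filter)
```

With an open census, a call such as read_filtered_obs(census, "tissue_general", "lung") would return a much smaller dataframe than the unfiltered read, which can then be grouped in the same way.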
[2]:
human = census["census_data"]["homo_sapiens"]
obs_df = human.obs.read(column_names=["cell_type_ontology_term_id", "cell_type"]).concat().to_pandas()
obs_df.groupby(by=["cell_type_ontology_term_id", "cell_type"], as_index=False, observed=True).size()
[2]:
| | cell_type_ontology_term_id | cell_type | size |
|---|---|---|---|
| 0 | CL:0000001 | primary cultured cell | 80 |
| 1 | CL:0000003 | native cell | 778811 |
| 2 | CL:0000006 | neuronal receptor cell | 2502 |
| 3 | CL:0000019 | sperm | 11 |
| 4 | CL:0000031 | neuroblast (sensu Vertebrata) | 2355 |
| ... | ... | ... | ... |
| 585 | CL:4023070 | caudal ganglionic eminence derived GABAergic c... | 8463 |
| 586 | CL:4028002 | alveolar capillary type 1 endothelial cell | 16048 |
| 587 | CL:4028003 | alveolar capillary type 2 endothelial cell | 7157 |
| 588 | CL:4028006 | alveolar type 2 fibroblast cell | 4670 |
| 589 | CL:4030006 | fallopian tube secretory epithelial cell | 463461 |
590 rows × 3 columns
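The summary table reports both unique_cell_count and total_cell_count, while the groupby above yields a single size per group. Assuming that unique cells are the ones flagged is_primary_data == True in obs (an assumption not stated in this tutorial), both numbers could be derived in one aggregation. The sketch below uses a small invented obs-like frame, not real Census data, to show the idea.

```python
import pandas as pd

# Invented rows for illustration only; real obs data comes from the Census.
obs_toy = pd.DataFrame(
    {
        "cell_type": ["T cell", "T cell", "T cell", "B cell", "B cell"],
        "is_primary_data": [True, False, True, True, False],
    }
)

counts = obs_toy.groupby("cell_type", as_index=False).agg(
    # Every row in the group counts toward the total.
    total_cell_count=("is_primary_data", "size"),
    # Summing booleans counts only the rows flagged as primary.
    unique_cell_count=("is_primary_data", "sum"),
)
print(counts)
```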
Close the census when complete.
[3]:
census.close()