hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot
- hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot(adata, sample_id, output_dir, graph_id, coarse_gt_col=None, fine_gt_col=None)
Formulate HRCHY-CytoCommunity input files from an AnnData object of spatial transcriptomics data with cell type deconvolution results.
This function converts a spot-level spatial transcriptomics dataset into a set of text files that serve as standardized input for HRCHY-CytoCommunity. Unlike the single-cell version, this function assumes that each spot contains mixed cell-type proportions (deconvolution results stored in adata.obsm['deconv_ret']).
Parameters:
adata (anndata.AnnData) – Spatial transcriptomics dataset. Must contain:
- adata.obsm['spatial']: array-like of shape (n_spots, 2), spatial coordinates.
- adata.obsm['deconv_ret']: pandas.DataFrame of shape (n_spots, n_celltypes), containing cell type proportions per spot.
sample_id (str) – Unique sample identifier, used as prefix for all output files.
output_dir (str) – Directory path where the HRCHY-CytoCommunity input files will be saved. Created automatically if it does not exist.
graph_id (int) – Integer identifier for the current sample (graph index). Used for multi-sample integration or batch processing.
coarse_gt_col (str, optional) – Column name in adata.obs specifying coarse-grained ground truth labels. If None, the coarse ground truth file is not generated.
fine_gt_col (str, optional) – Column name in adata.obs specifying fine-grained ground truth labels. If None, the fine ground truth file is not generated.
Outputs
The following tab-separated files are generated in output_dir:
- <sample_id>_Coordinates.txt: spatial coordinates (x, y)
- <sample_id>_CellTypeLabel.txt: list of cell type names (columns from the deconvolution result)
- <sample_id>_NodeAttr.txt: node attribute matrix (cell type proportions per spot)
- <sample_id>_NodeName.txt: names of cell type attributes (same as above)
- <sample_id>_GraphIndex.txt: integer index of this sample/graph
- <sample_id>_fineGT.txt: optional fine ground truth labels
- optional coarse ground truth labels (written only when coarse_gt_col is given)
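As a quick sanity check after a run, the file names listed above can be assembled from sample_id and tested for existence. The sketch below uses hypothetical helpers (expected_output_files and missing_output_files are illustrative names, not part of hrchy_cytocommunity). It covers only the file names stated in this section and treats the fine ground truth file as optional.

```python
from pathlib import Path

# Suffixes copied from the Outputs list; helper names are illustrative only.
REQUIRED_SUFFIXES = ["Coordinates", "CellTypeLabel", "NodeAttr", "NodeName", "GraphIndex"]
OPTIONAL_SUFFIXES = ["fineGT"]  # documented as written only when fine_gt_col is given


def expected_output_files(output_dir, sample_id, include_optional=False):
    """Return the paths the function is documented to write for one sample."""
    suffixes = REQUIRED_SUFFIXES + (OPTIONAL_SUFFIXES if include_optional else [])
    return [Path(output_dir) / f"{sample_id}_{suffix}.txt" for suffix in suffixes]


def missing_output_files(output_dir, sample_id, include_optional=False):
    """Return the expected paths that do not exist on disk."""
    return [
        path
        for path in expected_output_files(output_dir, sample_id, include_optional)
        if not path.is_file()
    ]
```

Calling missing_output_files("data/HRCHY_input/", "VisiumBC_P2", include_optional=True) after the export returns an empty list when every listed file is present.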
Notes
The deconvolution result adata.obsm['deconv_ret'] must be a DataFrame with cell type names as columns and spots as rows.
Missing values are not explicitly handled; make sure the table is fully numeric and contains no missing entries before calling this function.
The output format is consistent with the single-cell version (formulate_HRCHYCytoCommunity_input_from_anndata_singlecell), enabling joint downstream analysis in HRCHY-CytoCommunity.
All files are written in tab-delimited text format.
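Because missing values are not handled inside the function, it may help to validate the deconvolution table first. The following is a minimal sketch of such a check. check_deconv_ret is a hypothetical helper, not part of hrchy_cytocommunity, and the toy values are made up.

```python
import pandas as pd


def check_deconv_ret(deconv_ret, n_spots=None):
    """Raise ValueError unless deconv_ret is a complete, numeric DataFrame."""
    if not isinstance(deconv_ret, pd.DataFrame):
        raise ValueError("deconv_ret must be a pandas.DataFrame (spots x cell types)")
    if n_spots is not None and deconv_ret.shape[0] != n_spots:
        raise ValueError(f"expected {n_spots} rows (spots), got {deconv_ret.shape[0]}")
    # Every column should hold numbers (cell type proportions).
    non_numeric = [
        col
        for col, dtype in deconv_ret.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    # Count NaN entries across the whole table.
    n_missing = int(deconv_ret.isna().sum().sum())
    if n_missing:
        raise ValueError(f"{n_missing} missing values in deconv_ret")
    return deconv_ret


# Toy table: 3 spots x 2 cell types (illustrative values only).
toy = pd.DataFrame(
    [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]],
    index=["spot_1", "spot_2", "spot_3"],
    columns=["Tcell", "Tumor"],
)
check_deconv_ret(toy, n_spots=3)
```

With a real dataset, the call would be check_deconv_ret(adata.obsm['deconv_ret'], n_spots=adata.n_obs).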
Examples
>>> import scanpy as sc
>>> from hrchy_cytocommunity.tools.data_preprocessing import (
...     formulate_HRCHYCytoCommunity_input_from_anndata_spot,
... )
>>> adata = sc.read_h5ad("Visium_BC_sample.h5ad")
>>> formulate_HRCHYCytoCommunity_input_from_anndata_spot(
...     adata=adata,
...     sample_id="VisiumBC_P2",
...     output_dir="data/HRCHY_input/",
...     graph_id=1,
...     coarse_gt_col="compartment",
...     fine_gt_col="subregion",
... )
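The example above assumes the .h5ad file already stores deconvolution results in adata.obsm['deconv_ret']. If the proportions are only available as a plain array, they can be wrapped into the expected DataFrame layout first. This is a sketch under assumptions: make_deconv_frame is a hypothetical helper, and the spot names, cell type names, and values are invented for illustration.

```python
import numpy as np
import pandas as pd


def make_deconv_frame(proportions, spot_names, celltype_names):
    """Wrap an (n_spots, n_celltypes) array as a DataFrame with spots as rows."""
    values = np.asarray(proportions, dtype=float)
    expected_shape = (len(spot_names), len(celltype_names))
    if values.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {values.shape}")
    return pd.DataFrame(values, index=list(spot_names), columns=list(celltype_names))


# Toy input: 2 spots x 2 cell types (illustrative values only).
frame = make_deconv_frame(
    [[0.6, 0.4], [0.1, 0.9]],
    spot_names=["spot_1", "spot_2"],
    celltype_names=["Bcell", "Tumor"],
)

# With a real AnnData object, the frame would typically be indexed by
# adata.obs_names and then stored before calling the export function:
# adata.obsm["deconv_ret"] = make_deconv_frame(props, adata.obs_names, celltype_names)
```

Using adata.obs_names as the index keeps the rows aligned with the spots in adata.obsm['spatial'].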