hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot

hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot(adata, sample_id, output_dir, graph_id, coarse_gt_col=None, fine_gt_col=None)

Formulate HRCHY-CytoCommunity input files from an AnnData object of spatial transcriptomics data with cell type deconvolution results.

This function converts a spot-level spatial transcriptomics dataset into a set of text files that serve as standardized input for HRCHY-CytoCommunity. Unlike the single-cell version, this function assumes that each spot contains mixed cell-type proportions (deconvolution results stored in adata.obsm['deconv_ret']).

Parameters:

adata (anndata.AnnData) –
Spatial transcriptomics dataset. Must contain:
- adata.obsm['spatial'] : array-like of shape (n_spots, 2), spatial coordinates.
- adata.obsm['deconv_ret'] : pandas.DataFrame of shape (n_spots, n_celltypes), containing cell type proportions per spot.
sample_id (str) – Unique sample identifier, used as prefix for all output files.
output_dir (str) – Directory path where the HRCHY-CytoCommunity input files will be saved. Created automatically if it does not exist.
graph_id (int) – Integer identifier for the current sample (graph index). Used for multi-sample integration or batch processing.
coarse_gt_col (str, optional) – Column name in adata.obs specifying coarse-grained ground truth labels. If None, the coarse ground truth file is not generated.
fine_gt_col (str, optional) – Column name in adata.obs specifying fine-grained ground truth labels. If None, the fine ground truth file is not generated.

Outputs

The following tab-separated files are generated in output_dir:
- <sample_id>_Coordinates.txt : spatial coordinates (x, y)
- <sample_id>_CellTypeLabel.txt : list of cell type names (columns from deconvolution result)
- <sample_id>_NodeAttr.txt : node attribute matrix (cell type proportions per spot)
- <sample_id>_NodeName.txt : names of cell type attributes (same as above)
- <sample_id>_GraphIndex.txt : integer index of this sample/graph
- <sample_id>_fineGT.txt : optional fine ground truth labels (generated only when fine_gt_col is provided)
- an optional coarse ground truth labels file (generated only when coarse_gt_col is provided)
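
After a run, it can be useful to see which of the documented files are present for a sample. The following is a minimal sketch using only the standard library. The helper name `list_hrchy_outputs` is hypothetical and not part of hrchy_cytocommunity, and it only covers the file names stated in this section (the coarse ground truth file is left out because its name is not given here).

```python
from pathlib import Path

# Suffixes taken from the file names documented in the Outputs section.
# This helper is illustrative only; it is not part of hrchy_cytocommunity.
REQUIRED_SUFFIXES = ["Coordinates", "CellTypeLabel", "NodeAttr", "NodeName", "GraphIndex"]
OPTIONAL_SUFFIXES = ["fineGT"]


def list_hrchy_outputs(output_dir, sample_id):
    """Return a dict mapping each documented file name to whether it exists."""
    out = Path(output_dir)
    names = [f"{sample_id}_{suffix}.txt" for suffix in REQUIRED_SUFFIXES + OPTIONAL_SUFFIXES]
    return {name: (out / name).is_file() for name in names}
```

For example, `list_hrchy_outputs("data/HRCHY_input/", "VisiumBC_P2")` returns one entry per documented file, with `False` for any file that was not written.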

Notes

The deconvolution result adata.obsm['deconv_ret'] must be a DataFrame with cell type names as columns and spots as rows.
Missing values are not explicitly handled; users should ensure numeric completeness before calling this function.
The output format is consistent with the single-cell version (formulate_HRCHYCytoCommunity_input_from_anndata_singlecell), enabling joint downstream analysis in HRCHY-CytoCommunity.
All files are written in tab-delimited text format.
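
Because missing values are not handled inside the function, a quick check before calling it may help. The sketch below is one possible pre-flight check under the requirements stated in this section. The function `check_spot_input` is hypothetical, not part of hrchy_cytocommunity. It only reads `adata.obsm`, so it should work on an AnnData object or on any object exposing a dict-like `obsm`.

```python
import numpy as np
import pandas as pd


def check_spot_input(adata):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []
    coords = np.asarray(adata.obsm["spatial"])
    deconv = adata.obsm["deconv_ret"]

    # Spatial coordinates are documented as shape (n_spots, 2).
    if coords.ndim != 2 or coords.shape[1] != 2:
        problems.append("obsm['spatial'] should have shape (n_spots, 2)")

    # The deconvolution result is documented as a DataFrame (spots x cell types).
    if not isinstance(deconv, pd.DataFrame):
        problems.append("obsm['deconv_ret'] should be a pandas.DataFrame")
        return problems

    if deconv.shape[0] != coords.shape[0]:
        problems.append("deconv_ret and spatial differ in number of spots")

    # Proportions should be numeric and complete.
    if deconv.select_dtypes(exclude="number").shape[1] > 0:
        problems.append("deconv_ret contains non-numeric columns")
    if deconv.isna().to_numpy().any():
        problems.append("deconv_ret contains missing values")

    return problems
```

An empty list means none of these checks failed. How to repair a reported problem (for example, dropping or imputing spots with missing proportions) is left to the user.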

Examples

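>>> import scanpy as sc
>>> from hrchy_cytocommunity.tools.data_preprocessing import formulate_HRCHYCytoCommunity_input_from_anndata_spot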
>>> adata = sc.read_h5ad("Visium_BC_sample.h5ad")
>>> formulate_HRCHYCytoCommunity_input_from_anndata_spot(
...     adata=adata,
...     sample_id="VisiumBC_P2",
...     output_dir="data/HRCHY_input/",
...     graph_id=1,
...     coarse_gt_col="compartment",
...     fine_gt_col="subregion"
... )