hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot

hrchy_cytocommunity.tools.data_preprocessing.formulate_HRCHYCytoCommunity_input_from_anndata_spot(adata, sample_id, output_dir, graph_id, coarse_gt_col=None, fine_gt_col=None)

Formulate HRCHY-CytoCommunity input files from an AnnData object of spatial transcriptomics data with cell type deconvolution results.

This function converts a spot-level spatial transcriptomics dataset into a set of text files that serve as standardized input for HRCHY-CytoCommunity. Unlike the single-cell version, this function assumes that each spot contains mixed cell-type proportions (deconvolution results stored in adata.obsm['deconv_ret']).

Parameters:
  • adata (anndata.AnnData) –

    Spatial transcriptomics dataset. Must contain:

      • adata.obsm['spatial']: array-like of shape (n_spots, 2), spatial coordinates.

      • adata.obsm['deconv_ret']: pandas.DataFrame of shape (n_spots, n_celltypes), containing cell type proportions per spot.

  • sample_id (str) – Unique sample identifier, used as prefix for all output files.

  • output_dir (str) – Directory path where the HRCHY-CytoCommunity input files will be saved. Created automatically if it does not exist.

  • graph_id (int) – Integer identifier for the current sample (graph index). Used for multi-sample integration or batch processing.

  • coarse_gt_col (str, optional) – Column name in adata.obs specifying coarse-grained ground truth labels. If None, the coarse ground truth file is not generated.

  • fine_gt_col (str, optional) – Column name in adata.obs specifying fine-grained ground truth labels. If None, the fine ground truth file is not generated.

Outputs

The following tab-separated files are generated in output_dir:

  • <sample_id>_Coordinates.txt: spatial coordinates (x, y)

  • <sample_id>_CellTypeLabel.txt: list of cell type names (columns from the deconvolution result)

  • <sample_id>_NodeAttr.txt: node attribute matrix (cell type proportions per spot)

  • <sample_id>_NodeName.txt: names of cell type attributes (same as above)

  • <sample_id>_GraphIndex.txt: integer index of this sample/graph

  • <sample_id>_fineGT.txt: optional fine ground truth labels

  • optional coarse ground truth labels (file written only when coarse_gt_col is given)
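The file naming scheme above can be sketched as a small lookup, which may be handy for checking that a run produced everything it should. The helpers expected_output_files and missing_output_files are hypothetical and not part of hrchy_cytocommunity; they only assume that each file is named <sample_id>_<suffix>.txt directly inside output_dir, as listed. The coarse ground truth file is left out because its name is not given in the list.

```python
from pathlib import Path

# Suffixes copied from the output file list; the coarse ground truth file is
# omitted because its file name is not stated there.
OUTPUT_SUFFIXES = (
    "Coordinates",
    "CellTypeLabel",
    "NodeAttr",
    "NodeName",
    "GraphIndex",
    "fineGT",
)


def expected_output_files(output_dir, sample_id, with_fine_gt=False):
    """Hypothetical helper: map each output kind to its expected path."""
    out = Path(output_dir)
    paths = {}
    for suffix in OUTPUT_SUFFIXES:
        # The fine ground truth file is optional (only with fine_gt_col).
        if suffix == "fineGT" and not with_fine_gt:
            continue
        paths[suffix] = out / f"{sample_id}_{suffix}.txt"
    return paths


def missing_output_files(output_dir, sample_id, with_fine_gt=False):
    """Hypothetical helper: list expected files that do not exist on disk."""
    files = expected_output_files(output_dir, sample_id, with_fine_gt)
    return [path for path in files.values() if not path.exists()]
```

After a call with sample_id="VisiumBC_P2" and output_dir="data/HRCHY_input/", an empty result from missing_output_files("data/HRCHY_input/", "VisiumBC_P2", with_fine_gt=True) would suggest that all listed files were written.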

Notes

  • The deconvolution result adata.obsm['deconv_ret'] must be a DataFrame with cell type names as columns and spots as rows.

  • Missing values are not explicitly handled; make sure the inputs contain no missing (NaN) values before calling this function.

  • The output format is consistent with the single-cell version (formulate_HRCHYCytoCommunity_input_from_anndata_singlecell), enabling joint downstream analysis in HRCHY-CytoCommunity.

  • All files are written in tab-delimited text format.
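Because missing values are not handled by the function, a pre-flight check along the following lines may help. This is a minimal sketch, not part of the package: check_spot_inputs is a hypothetical helper, and the specific checks (shape, DataFrame type, row count, numeric columns, no NaN) are assumptions derived from the requirements stated in the Parameters and Notes sections.

```python
import numpy as np
import pandas as pd


def check_spot_inputs(coords, deconv):
    """Hypothetical pre-flight check; raises ValueError on the first problem."""
    coords = np.asarray(coords, dtype=float)
    # Spatial coordinates are expected to have shape (n_spots, 2).
    if coords.ndim != 2 or coords.shape[1] != 2:
        raise ValueError(f"coordinates must have shape (n_spots, 2), got {coords.shape}")
    if np.isnan(coords).any():
        raise ValueError("coordinates contain NaN values")

    # Deconvolution results: DataFrame with spots as rows, cell types as columns.
    if not isinstance(deconv, pd.DataFrame):
        raise ValueError("deconvolution result must be a pandas.DataFrame")
    if deconv.shape[0] != coords.shape[0]:
        raise ValueError(
            f"{deconv.shape[0]} deconvolution rows but {coords.shape[0]} spots"
        )
    non_numeric = [
        name for name, dtype in deconv.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if non_numeric:
        raise ValueError(f"non-numeric cell type columns: {non_numeric}")
    if deconv.isna().any().any():
        raise ValueError("deconvolution result contains NaN values")
```

With an AnnData object this could be called as check_spot_inputs(adata.obsm['spatial'], adata.obsm['deconv_ret']) before formulating the input files.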

Examples

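>>> import scanpy as sc
>>> from hrchy_cytocommunity.tools.data_preprocessing import (
...     formulate_HRCHYCytoCommunity_input_from_anndata_spot,
... )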
>>> adata = sc.read_h5ad("Visium_BC_sample.h5ad")
>>> formulate_HRCHYCytoCommunity_input_from_anndata_spot(
...     adata=adata,
...     sample_id="VisiumBC_P2",
...     output_dir="data/HRCHY_input/",
...     graph_id=1,
...     coarse_gt_col="compartment",
...     fine_gt_col="subregion"
... )