hrchy_cytocommunity.models.dataset.SpatialOmicsImageDataset

class hrchy_cytocommunity.models.dataset.SpatialOmicsImageDataset(root, transform=None, pre_transform=None)

Spatial omics dataset loader for HRCHY-CytoCommunity.

This class inherits from torch_geometric.data.InMemoryDataset. It reads preprocessed spatial omics graph data files (coordinates, edges, node attributes, and graph indices) from a specified directory and constructs torch_geometric.data.Data objects for downstream graph neural network training.

Each sample (region/tissue section) corresponds to one graph, and all graphs are collated into a single dataset stored in processed/SpatialOmicsImageDataset.pt.

Parameters:

root (str or Path) – Root directory containing the following subfolders:
    - raw/: raw graph files (text format).
    - processed/: where the processed dataset will be saved.
transform (callable, optional) – Data transformation function applied before returning a graph sample. See torch_geometric.transforms.
pre_transform (callable, optional) – Data preprocessing transformation function applied before saving processed data.

data

Tensor representation of the concatenated graph dataset.

Type: torch_geometric.data.Data

slices

Indexing dictionary used by PyTorch Geometric to retrieve individual graphs efficiently.

Type: dict

processed_paths

List of processed output file paths (by default a single entry pointing to SpatialOmicsImageDataset.pt inside processed_dir).

Type: list[str]

raw_file_names(): Returns the list of expected raw input files (empty list in this case).

processed_file_names(): Returns the list of expected processed dataset files.

download(): Placeholder for downloading data (not implemented).

process()

Constructs torch_geometric.data.Data objects from input text files under raw_dir. The following files are required for each region name listed in ImageNameList.txt:

<region>_EdgeIndex.txt — edge list (tab-delimited, int64)

<region>_NodeAttr.txt — node attributes (tab-delimited, float32)

<region>_GraphIndex.txt — graph index (int)

The resulting dataset is saved to processed/SpatialOmicsImageDataset.pt.
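Because process() expects a complete set of files per region, it can help to check the raw directory before constructing the dataset. The following is a minimal sketch using only the standard library; it is not part of HRCHY-CytoCommunity. The helper names find_missing_raw_files and write_toy_region are hypothetical, and the toy file contents assume one edge per line in <region>_EdgeIndex.txt and one node per line in <region>_NodeAttr.txt, which the description above does not state explicitly.

```python
from pathlib import Path

# Suffixes copied from the file list above; everything else here is illustrative.
REQUIRED_SUFFIXES = ("_EdgeIndex.txt", "_NodeAttr.txt", "_GraphIndex.txt")


def find_missing_raw_files(raw_dir):
    """Return the names of required raw files that are absent from raw_dir."""
    raw_dir = Path(raw_dir)
    name_list = raw_dir / "ImageNameList.txt"
    if not name_list.is_file():
        return [name_list.name]
    # One region name per line; blank lines are skipped.
    regions = [ln.strip() for ln in name_list.read_text().splitlines() if ln.strip()]
    missing = []
    for region in regions:
        for suffix in REQUIRED_SUFFIXES:
            if not (raw_dir / f"{region}{suffix}").is_file():
                missing.append(f"{region}{suffix}")
    return missing


def write_toy_region(raw_dir, region, graph_index):
    """Write a tiny 3-node, 2-edge graph as tab-delimited text (assumed layout)."""
    raw_dir = Path(raw_dir)
    # Assumption: one edge per line, source and target separated by a tab.
    (raw_dir / f"{region}_EdgeIndex.txt").write_text("0\t1\n1\t2\n")
    # Assumption: one node per line, one tab-separated column per attribute.
    (raw_dir / f"{region}_NodeAttr.txt").write_text("1.0\t0.0\n0.0\t1.0\n1.0\t0.0\n")
    (raw_dir / f"{region}_GraphIndex.txt").write_text(f"{graph_index}\n")
```

Calling find_missing_raw_files("data/HRCHY_input/raw") before instantiating the dataset returns an empty list when every region named in ImageNameList.txt has its three files, and otherwise lists the missing file names.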

Notes

The input file ImageNameList.txt must be located in raw_dir, containing one region name per line.
The class automatically symmetrizes edge indices when necessary and converts NumPy arrays to PyTorch tensors.
This class is intended for use with HRCHY-CytoCommunity and is compatible with PyTorch Geometric’s standard data pipeline.
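To illustrate what symmetrizing an edge list means, here is a plain-Python sketch that operates on a list of (source, target) pairs. It is not the class's own code: the class presumably works on NumPy arrays or tensors, and its exact rule for deciding when symmetrization is "necessary" is not documented here. The function names symmetrize_edges and is_symmetric are hypothetical.

```python
def symmetrize_edges(edges):
    """Return a sorted edge list in which every (u, v) also appears as (v, u)."""
    undirected = set()
    for u, v in edges:
        undirected.add((u, v))
        undirected.add((v, u))  # add the reverse direction; duplicates collapse in the set
    return sorted(undirected)


def is_symmetric(edges):
    """Check whether every edge already has its reverse in the list."""
    edge_set = {(u, v) for u, v in edges}
    return all((v, u) in edge_set for u, v in edge_set)
```

For a directed input such as [(0, 1), (1, 2)], is_symmetric returns False, and symmetrize_edges yields four edges covering both directions. Self-loops such as (3, 3) are kept once.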

Examples

>>> from hrchy_cytocommunity.models.dataset import SpatialOmicsImageDataset
>>> dataset = SpatialOmicsImageDataset(root="data/HRCHY_input/")
>>> print(len(dataset))
5
>>> print(dataset[0])
Data(x=[1024, 30], edge_index=[2, 4096], graph_idx=[1])

__init__(root, transform=None, pre_transform=None)

Methods

`__init__`(root[, transform, pre_transform])
`collate`(data_list)	Collates a list of `Data` or `HeteroData` objects to the internal storage format of `InMemoryDataset`.
`copy`([idx])	Performs a deep-copy of the dataset.
`cpu`(*args)	Moves the dataset to CPU memory.
`cuda`([device])	Moves the dataset to CUDA memory.
`download`()	Downloads the dataset to the `self.raw_dir` folder.
`get`(idx)	Gets the data object at index `idx`.
`get_summary`()	Collects summary statistics for the dataset.
`index_select`(idx)	Creates a subset of the dataset from specified indices `idx`.
`indices`()
`len`()	Returns the number of data objects stored in the dataset.
`load`(path[, data_cls])	Loads the dataset from the file path `path`.
`print_summary`([fmt])	Prints summary statistics of the dataset to the console.
`process`()	Processes the dataset to the `self.processed_dir` folder.
`save`(data_list, path)	Saves a list of data objects to the file path `path`.
`shuffle`([return_perm])	Randomly shuffles the examples in the dataset.
`to`(device)	Performs device conversion of the whole dataset.
`to_datapipe`()	Converts the dataset into a `torch.utils.data.DataPipe`.
`to_on_disk_dataset`([root, backend, log])	Converts the `InMemoryDataset` to an `OnDiskDataset` variant.

Attributes

`data`
`has_download`	Checks whether the dataset defines a `download()` method.
`has_process`	Checks whether the dataset defines a `process()` method.
`num_classes`	Returns the number of classes in the dataset.
`num_edge_features`	Returns the number of features per edge in the dataset.
`num_features`	Returns the number of features per node in the dataset.
`num_node_features`	Returns the number of features per node in the dataset.
`processed_dir`
`processed_file_names`	The name of the files in the `self.processed_dir` folder that must be present in order to skip processing.
`processed_paths`	The absolute filepaths that must be present in order to skip processing.
`raw_dir`
`raw_file_names`	The name of the files in the `self.raw_dir` folder that must be present in order to skip downloading.
`raw_paths`	The absolute filepaths that must be present in order to skip downloading.