hrchy_cytocommunity.tools.data_preprocessing.compute_knn

hrchy_cytocommunity.tools.data_preprocessing.compute_knn(coords, K, sample_id, save_folder: str | None = None)

Construct a K-Nearest Neighbor (KNN) graph and optionally save it to file.

This function builds an undirected KNN graph from spatial coordinates by symmetrizing the KNN adjacency. It returns the edge list as a NumPy array and, if save_folder is given, also writes it to a tab-separated text file compatible with HRCHY-CytoCommunity.
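The steps described above can be sketched with scikit-learn and NumPy as follows. This is a minimal illustration, not the library's code. The helper name `knn_edge_list_sketch` is hypothetical. The sketch assumes that the edge list keeps both directions (i, j) and (j, i) of every undirected edge and that the file is written with `np.savetxt`. The actual `compute_knn` may differ in either respect.

```python
import os

import numpy as np
from sklearn.neighbors import kneighbors_graph


def knn_edge_list_sketch(coords, K, sample_id, save_folder=None):
    """Hypothetical sketch of the KNN -> symmetrize -> edge list steps (not the library code)."""
    # Directed KNN adjacency: row i has a 1 in the columns of its K nearest neighbors.
    A = kneighbors_graph(coords, n_neighbors=K, mode="connectivity", include_self=False)
    # Symmetrize (A OR A^T): keep i-j if either point lists the other as a neighbor.
    A_sym = A.maximum(A.T).tocoo()
    # Assumption: both directions of each undirected edge are kept as separate rows.
    edge_index = np.column_stack([A_sym.row, A_sym.col]).astype(np.int64)
    if save_folder is not None:
        os.makedirs(save_folder, exist_ok=True)
        # Assumed file layout: integer columns separated by a tab, no header.
        path = os.path.join(save_folder, f"{sample_id}_EdgeIndex.txt")
        np.savetxt(path, edge_index, fmt="%d", delimiter="\t")
    return edge_index
```

Using the elementwise maximum of the 0/1 matrices gives the union A ∪ Aᵀ directly, so a mutual neighbor pair is not counted twice.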

Parameters:
  • coords (numpy.ndarray of shape (n_cells, 2)) – Spatial coordinates of all cells or spots, where each row represents a point (x, y).

  • K (int) – Number of nearest neighbors to connect for each node.

  • sample_id (str) – Identifier for the current sample, used as the prefix of the saved edge-list filename.

  • save_folder (str or Path, optional) – Directory to save the resulting edge list file. If None, the graph is constructed but not written to disk.

Returns:

  • edge_index (numpy.ndarray of shape (n_edges, 2)) – Array of integer pairs representing undirected edges in the KNN graph. Each row corresponds to one edge [source, target].

Outputs

  • If save_folder is provided, the following file is generated:

    • <sample_id>_EdgeIndex.txt: tab-separated list of undirected edges.
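To reuse a saved edge list, it can be read back with NumPy. The documentation only says the file is tab-separated, so the sketch below assumes one integer "source<TAB>target" pair per line with no header. The helpers `save_edge_index` and `load_edge_index` are hypothetical and are not part of HRCHY-CytoCommunity.

```python
import numpy as np


def save_edge_index(edge_index, path):
    """Hypothetical writer: one 'source<TAB>target' integer pair per line."""
    np.savetxt(path, edge_index, fmt="%d", delimiter="\t")


def load_edge_index(path):
    """Hypothetical reader returning an (n_edges, 2) integer array.

    ndmin=2 keeps the 2-D shape even when the file holds a single edge.
    """
    return np.loadtxt(path, dtype=np.int64, delimiter="\t", ndmin=2)
```

A round trip through these two helpers should return the original array unchanged. If the real file uses a different layout, such as a header line, the `np.loadtxt` arguments would need adjusting.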

Notes

  • The KNN graph is constructed using scikit-learn’s sklearn.neighbors.kneighbors_graph() with mode='connectivity' and include_self=False.

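  • The resulting adjacency matrix is symmetrized (A ← A ∪ Aᵀ) to ensure undirected connectivity: cells i and j are connected if j is among the K nearest neighbors of i, or i is among the K nearest neighbors of j.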

  • The sparse adjacency is converted into an explicit edge list for downstream graph-based modeling.

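  • The number of edges depends on K and on the spatial arrangement of the points. Each cell contributes exactly K directed neighbor relations. Symmetrization then merges the relations that are mutual, so the symmetrized adjacency has between n_cells × K and 2 × n_cells × K nonzero entries.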
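The edge-count bounds above can be checked numerically. The sketch below calls scikit-learn's `kneighbors_graph` directly rather than `compute_knn`. The helper `knn_edge_counts` is hypothetical. It uses the identity |A ∪ Aᵀ| = 2|A| − |A ∩ Aᵀ|, where |A| = n_cells × K.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def knn_edge_counts(coords, K):
    """Hypothetical helper: directed, mutual, and symmetrized nonzero counts."""
    A = kneighbors_graph(coords, n_neighbors=K, mode="connectivity", include_self=False)
    n_directed = A.nnz                # always n_cells * K
    n_mutual = A.multiply(A.T).nnz    # ordered pairs (i, j) where i and j list each other
    n_sym = A.maximum(A.T).nnz        # nonzeros of A ∪ A^T, both directions counted
    return n_directed, n_mutual, n_sym
```

For 100 points and K = 10, `n_directed` is 1000. `n_sym` equals 2000 minus the number of mutual entries, so it always falls between 1000 and 2000. In practice it lies strictly below 2000, because the two closest points are always each other's nearest neighbor.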

Examples

>>> import numpy as np
>>> from hrchy_cytocommunity.tools.data_preprocessing import compute_knn
>>> coords = np.random.rand(100, 2) * 100  # 100 spatial points
>>> edge_index = compute_knn(coords, K=10, sample_id="sample1",
...                          save_folder="data/HRCHY_input/")
>>> edge_index.shape[1]  # the number of rows depends on the random coordinates
2