hrchy_cytocommunity.tools.data_preprocessing.compute_knn
- hrchy_cytocommunity.tools.data_preprocessing.compute_knn(coords, K, sample_id, save_folder: str | None = None)
Construct a K-Nearest Neighbor (KNN) graph and optionally save it to file.
This function builds a KNN graph from spatial coordinates, symmetrizes the adjacency so that the graph is undirected, and returns the edge list as a NumPy array. If a save folder is given, the edge list is also written to a tab-separated text file compatible with HRCHY-CytoCommunity.
- Parameters:
coords (numpy.ndarray of shape (n_cells, 2)) – Spatial coordinates of all cells or spots, where each row represents a point (x, y).
K (int) – Number of nearest neighbors to connect for each node.
sample_id (str) – Identifier for the current sample, used as the prefix of the saved edge list file name.
save_folder (str or Path, optional) – Directory to save the resulting edge list file. If None, the graph is constructed but not written to disk.
- Returns:
edge_index (numpy.ndarray of shape (n_edges, 2)) – Array of integer pairs representing undirected edges in the KNN graph. Each row corresponds to one edge [source, target].
Outputs
If save_folder is provided, the following file will be generated:
<sample_id>_EdgeIndex.txt – tab-separated list of undirected edges.
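A saved edge list can be read back with NumPy. The snippet below is a minimal sketch that assumes the file holds one integer pair per line, separated by a tab, with no header row; the layout beyond "tab-separated" is an assumption. The file written here with np.savetxt is only a stand-in for real compute_knn output, so the example runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

# Toy edge list standing in for the array returned by compute_knn.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]], dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    # File name follows the documented <sample_id>_EdgeIndex.txt pattern.
    path = Path(tmp) / "sample1_EdgeIndex.txt"

    # Stand-in writer (assumed layout: tab-separated integers, no header).
    np.savetxt(path, edges, fmt="%d", delimiter="\t")

    # ndmin=2 keeps the result two-dimensional even for a single edge.
    loaded = np.loadtxt(path, dtype=np.int64, delimiter="\t", ndmin=2)

print(loaded.shape)
```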
Notes
The KNN graph is constructed using scikit-learn’s sklearn.neighbors.kneighbors_graph() with mode='connectivity' and include_self=False.
The resulting adjacency matrix is symmetrized (A = A ∪ Aᵀ) to ensure undirected connectivity.
The sparse adjacency is converted into an explicit edge list for downstream graph-based modeling.
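The graph size (number of edges) depends on both K and the spatial arrangement of points: each node selects K neighbors, and symmetrization adds a reverse edge wherever the neighbor relation is not mutual.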
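The steps in these notes can be sketched as follows. This is an illustration, not the package's actual implementation: knn_edge_list is a hypothetical helper, and listing each undirected edge in both directions is an assumption about the output layout.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def knn_edge_list(coords, k):
    """Hypothetical helper: symmetrized KNN edge list, both directions listed."""
    # Directed KNN adjacency: row i marks the k nearest neighbors of point i.
    adj = kneighbors_graph(
        coords, n_neighbors=k, mode="connectivity", include_self=False
    )
    # Symmetrize: keep an entry if it appears in A or in its transpose.
    adj_sym = adj.maximum(adj.T).tocoo()
    # One row per nonzero entry, so each undirected edge appears as (i, j) and (j, i).
    return np.column_stack((adj_sym.row, adj_sym.col)).astype(np.int64)


rng = np.random.default_rng(0)
coords = rng.random((100, 2)) * 100  # 100 random points in a 100 x 100 square
k = 10
edges = knn_edge_list(coords, k)

# Each point selects k neighbors; symmetrization adds the missing reverse entries.
n_directed = coords.shape[0] * k
print(edges.shape[0], "rows, between", n_directed, "and", 2 * n_directed)
```

Under this layout the row count lies between n_cells × K (every neighbor relation is mutual) and 2 × n_cells × K (none is mutual), which is why the exact number of edges varies with the point arrangement.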
Examples
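>>> import numpy as np
>>> from hrchy_cytocommunity.tools.data_preprocessing import compute_knn
>>> coords = np.random.rand(100, 2) * 100  # 100 spatial points
>>> edge_index = compute_knn(coords, K=10, sample_id="sample1",
...                          save_folder="data/HRCHY_input/")
>>> edge_index.shape[1]  # the number of rows varies with the random coordinates
2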