opengt.loader
- opengt.loader.master_loader.load_dataset_master(format, name, dataset_dir)
Master loader that controls loading of all datasets, overshadowing execution of any default GraphGym dataset loader. The default GraphGym dataset loaders are instead called from this function; the format keywords PyG and OGB are reserved for them.
Custom transforms and dataset splitting are applied to each loaded dataset.
- Parameters:
format – dataset format name that identifies the Dataset class
name – dataset name to select from the class identified by format
dataset_dir – directory in which to store the processed dataset
- Returns:
PyG dataset object with applied perturbation transforms and data splits
- opengt.transform.task_preprocessing.task_specific_preprocessing(data, cfg)
Task-specific preprocessing before the dataset is logged and finalized.
- Parameters:
data – PyG graph
cfg – Main configuration node
- Returns:
Extended PyG Data object.
- opengt.transform.posenc_stats.compute_posenc_stats(data, pe_types, is_undirected, cfg)
Precompute positional encodings for the given graph.
Supported PE statistics to precompute, selected by pe_types:
‘LapPE’: Laplacian eigen-decomposition.
‘RWSE’: Random walk landing probabilities (diagonals of RW matrices).
‘HKfullPE’: Full heat kernels and their diagonals. (NOT IMPLEMENTED)
‘HKdiagSE’: Diagonals of heat kernel diffusion.
‘ElstaticSE’: Kernel based on the electrostatic interaction between nodes.
‘Graphormer’: Computes spatial types and optionally edges along shortest paths.
‘LapRaw’: Laplacian eigen-decomposition without further processing.
‘RRWP’: Relative Random Walk Probabilities PE (for GRIT).
‘WLSE’: Weisfeiler-Lehman encoding.
- Parameters:
data – PyG graph
pe_types – Positional encoding types to precompute statistics for. This can also be a combination, e.g. ‘eigen+rw_landing’
is_undirected – True if the graph is expected to be undirected
cfg – Main configuration node
- Returns:
Extended PyG Data object.
- opengt.transform.posenc_stats.custom_eigh(L)
Compute eigenvalues and eigenvectors of a Laplacian matrix. Due to a bug in PyTorch, scipy’s eigh is used instead of torch.linalg.eigh when the matrix is large.
- Parameters:
L – Laplacian matrix (Tensor)
- Returns:
EigVals – Eigenvalues
EigVecs – Eigenvectors
- opengt.transform.posenc_stats.eigvec_normalizer(EigVecs, EigVals, normalization='L2', eps=1e-12)
Implement different eigenvector normalizations.
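As a minimal sketch of what the default normalization='L2' option could look like, the snippet below rescales each eigenvector (column) to unit Euclidean norm and clamps the denominator at eps to avoid division by zero. It uses NumPy rather than PyTorch, the helper name l2_normalize_eigvecs is hypothetical, and the library's other normalization modes are not shown.

```python
import numpy as np

def l2_normalize_eigvecs(eigvecs, eps=1e-12):
    """Scale every column of `eigvecs` to unit L2 norm (hypothetical helper)."""
    eigvecs = np.asarray(eigvecs, dtype=float)
    # One norm per eigenvector, i.e. per column.
    norms = np.linalg.norm(eigvecs, axis=0, keepdims=True)
    # Clamp tiny norms so an all-zero column does not divide by zero.
    norms = np.maximum(norms, eps)
    return eigvecs / norms

vecs = np.array([[3.0, 0.0],
                 [4.0, 0.0]])
normalized = l2_normalize_eigvecs(vecs)
```

The first column has norm 5 and is scaled to unit length; the all-zero second column is left at zero instead of producing NaN.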
- opengt.transform.posenc_stats.get_electrostatic_function_encoding(edge_index, num_nodes)
Kernel based on the electrostatic interaction between nodes.
- opengt.transform.posenc_stats.get_heat_kernels(evects, evals, kernel_times=[])
Compute full Heat diffusion kernels.
- Parameters:
evects – Eigenvectors of the Laplacian matrix
evals – Eigenvalues of the Laplacian matrix
kernel_times – Time for the diffusion. Analogous to the k-steps in random walk. The time is equivalent to the variance of the kernel.
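The relationship between the eigen-decomposition and the diffusion time can be illustrated with the textbook spectral form of the heat kernel, H_t = sum_i exp(-t * lambda_i) * phi_i phi_i^T. The sketch below is written in NumPy under that assumption; the function name heat_kernels is hypothetical, and the library's implementation may differ in details such as normalization or its handling of the zero eigenvalue.

```python
import numpy as np

def heat_kernels(evects, evals, kernel_times):
    """Return one dense (num_nodes, num_nodes) heat kernel per diffusion time."""
    kernels = []
    for t in kernel_times:
        # Scale each eigenvector column by exp(-t * lambda_i), then recombine.
        kernels.append((evects * np.exp(-t * evals)) @ evects.T)
    return kernels

# Combinatorial Laplacian L = D - A of the path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
evals, evects = np.linalg.eigh(L)
H0, H1 = heat_kernels(evects, evals, kernel_times=[0.0, 1.0])
```

At t = 0 no diffusion has happened, so the kernel is the identity; for t > 0 each row still sums to one because the constant vector lies in the null space of L.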
- opengt.transform.posenc_stats.get_heat_kernels_diag(evects, evals, kernel_times=[], space_dim=0)
Compute Heat kernel diagonal.
This is a continuous function that represents a Gaussian in the Euclidean space, and is the solution to the diffusion equation. The random-walk diagonal should converge to this.
- Parameters:
evects – Eigenvectors of the Laplacian matrix
evals – Eigenvalues of the Laplacian matrix
kernel_times – Time for the diffusion. Analogous to the k-steps in random walk. The time is equivalent to the variance of the kernel.
space_dim – (optional) Estimated dimensionality of the space. Used to correct the diffusion diagonal by a factor t^(space_dim/2). In Euclidean space, this correction means that the height of the Gaussian stays constant across time, if space_dim is the dimension of the Euclidean space.
- Returns:
2D Tensor with shape (num_nodes, len(kernel_times)) with heat kernel diagonals
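To make the role of space_dim concrete, here is a minimal NumPy sketch that computes only the diagonal, [H_t]_vv = sum_i exp(-t * lambda_i) * phi_i(v)^2, and multiplies it by t^(space_dim/2). The function name heat_kernel_diag is hypothetical, and the sketch assumes the unnormalized spectral form rather than reproducing the library's code.

```python
import numpy as np

def heat_kernel_diag(evects, evals, kernel_times, space_dim=0):
    """Diagonal of exp(-t L) for each t, optionally scaled by t^(space_dim/2)."""
    columns = []
    for t in kernel_times:
        # Only the squared eigenvector entries are needed for the diagonal.
        diag = (evects ** 2 * np.exp(-t * evals)).sum(axis=1)
        columns.append(diag * t ** (space_dim / 2))
    return np.stack(columns, axis=1)  # shape (num_nodes, len(kernel_times))

# Combinatorial Laplacian of the 4-cycle 0 - 1 - 2 - 3 - 0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
evals, evects = np.linalg.eigh(L)

plain = heat_kernel_diag(evects, evals, kernel_times=[1.0, 2.0])
scaled = heat_kernel_diag(evects, evals, kernel_times=[1.0, 2.0], space_dim=2)
```

With space_dim=2 the column for t = 2 is multiplied by 2, while the column for t = 1 is unchanged.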
- opengt.transform.posenc_stats.get_lap_decomp_stats(evals, evects, max_freqs, eigvec_norm='L2')
Compute Laplacian eigen-decomposition-based PE stats of the given graph.
- Parameters:
evals – Precomputed eigenvalues of the Laplacian
evects – Precomputed eigenvectors of the Laplacian
max_freqs – Maximum number of lowest frequencies (smallest eigenvalues and their eigenvectors) to use
eigvec_norm – Normalization for the eigenvectors of the Laplacian
- Returns:
Tensor (num_nodes, max_freqs, 1) of eigenvalues repeated for each node
Tensor (num_nodes, max_freqs) of eigenvector values per node
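The two output shapes can be illustrated with a small NumPy sketch: keep the max_freqs smallest eigenvalues, pad up to max_freqs columns, and repeat the eigenvalues for every node. The function name lap_decomp_stats is hypothetical, padding with NaN is an assumption of this sketch, and the eigenvector normalization step is omitted.

```python
import numpy as np

def lap_decomp_stats(evals, evects, max_freqs):
    """Keep the `max_freqs` lowest frequencies and pad up to `max_freqs` columns."""
    num_nodes = evects.shape[0]
    # Sort ascending and keep the smallest eigenvalues with their eigenvectors.
    order = np.argsort(evals)[:max_freqs]
    evals, evects = evals[order], evects[:, order]
    # Pad with NaN when fewer than `max_freqs` frequencies exist (assumption).
    pad = max_freqs - len(order)
    evects = np.pad(evects, ((0, 0), (0, pad)), constant_values=np.nan)
    evals = np.pad(evals, (0, pad), constant_values=np.nan)
    # Repeat the eigenvalues for every node and add a trailing axis.
    eigvals = np.tile(evals, (num_nodes, 1))[:, :, None]
    return eigvals, evects

# Path graph 0 - 1 - 2 has only three frequencies, so one column is padding.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
evals, evects = np.linalg.eigh(L)
eigvals, eigvecs = lap_decomp_stats(evals, evects, max_freqs=4)
```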
- opengt.transform.posenc_stats.get_rw_landing_probs(ksteps, edge_index, edge_weight=None, num_nodes=None, space_dim=0)
Compute Random Walk landing probabilities for given list of K steps.
- Parameters:
ksteps – List of k-steps for which to compute the RW landings
edge_index – PyG sparse representation of the graph
edge_weight – (optional) Edge weights
num_nodes – (optional) Number of nodes in the graph
space_dim – (optional) Estimated dimensionality of the space. Used to correct the random-walk diagonal by a factor k^(space_dim/2). In Euclidean space, this correction means that the height of the Gaussian distribution stays almost constant across the number of steps, if space_dim is the dimension of the Euclidean space.
- Returns:
2D Tensor with shape (num_nodes, len(ksteps)) with RW landing probs
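A landing probability here is the chance that a k-step random walk started at a node ends at that same node, i.e. the diagonal of P^k with P = D^-1 A. The sketch below computes it with dense NumPy matrices for clarity; the function name rw_landing_probs is hypothetical, edge weights are ignored, and a real implementation would likely use sparse operations.

```python
import numpy as np

def rw_landing_probs(ksteps, edge_index, num_nodes, space_dim=0):
    """Return probability of a k-step random walk, per node and per k."""
    src, dst = edge_index
    A = np.zeros((num_nodes, num_nodes))
    A[src, dst] = 1.0
    deg = A.sum(axis=1)
    # Row-normalized transition matrix P = D^-1 A (isolated nodes keep a zero row).
    P = A / np.maximum(deg, 1.0)[:, None]
    columns = []
    for k in ksteps:
        Pk = np.linalg.matrix_power(P, k)
        # Optional correction by k^(space_dim/2), as described for space_dim.
        columns.append(np.diag(Pk) * k ** (space_dim / 2))
    return np.stack(columns, axis=1)  # shape (num_nodes, len(ksteps))

# Triangle graph with both directions of every edge listed (PyG convention).
edge_index = np.array([[0, 1, 1, 2, 2, 0],
                       [1, 0, 2, 1, 0, 2]])
probs = rw_landing_probs([1, 2, 3], edge_index, num_nodes=3)
```

On the triangle, a walk cannot return in one step, returns with probability 0.5 after two steps, and with probability 0.25 after three, so every row of probs is [0, 0.5, 0.25].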
- opengt.encoder.graphormer_encoder.graphormer_pre_processing(data, distance)
Implementation of Graphormer pre-processing. Computes in- and out-degrees for node encodings, as well as spatial types (via shortest-path lengths) and prepares edge encodings along shortest paths. The function adds the following properties to the data object:
- spatial_types
- graph_index: An edge_index type tensor that contains all possible directed edges (see more below)
- shortest_path_types: Populates edge attributes along all shortest paths between two nodes
Similar to the adjacency matrix, any matrix can be batched in PyG by decomposing it into a 1D tensor of values and a 2D tensor of indices. Once batched, the graph-specific matrix can be recovered (while appropriately padded) via to_dense_adj. We use this concept to decompose the spatial type matrix and the shortest path edge type tensor via the graph_index tensor.
- Parameters:
data (torch_geometric.data.Data) – A PyG data object holding a single graph
distance (int) – The distance up to which types are calculated
- Returns:
The augmented data object.
- Return type:
torch_geometric.data.Data
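The index/value decomposition described above can be sketched for the spatial types alone. The snippet below uses breadth-first search and NumPy to build a graph_index over all ordered node pairs together with a flat vector of hop distances capped at distance. It is an illustration only: the function name spatial_types_sketch is hypothetical, assigning the cap value to unreachable pairs is an assumption, and degrees and shortest_path_types are not computed.

```python
from collections import deque
import numpy as np

def spatial_types_sketch(edge_index, num_nodes, distance):
    """All-pairs hop distances, capped at `distance`, in (index, value) form."""
    neighbors = [[] for _ in range(num_nodes)]
    for s, t in zip(*edge_index):
        neighbors[s].append(t)
    # graph_index lists every ordered node pair (i, j), including i == j.
    rows, cols = np.meshgrid(np.arange(num_nodes), np.arange(num_nodes),
                             indexing="ij")
    graph_index = np.stack([rows.ravel(), cols.ravel()])
    # Unreachable pairs keep the cap value `distance` (assumption of this sketch).
    spatial = np.full((num_nodes, num_nodes), distance)
    for start in range(num_nodes):
        spatial[start, start] = 0
        queue = deque([(start, 0)])
        seen = {start}
        while queue:
            node, hops = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    spatial[start, nxt] = min(hops + 1, distance)
                    queue.append((nxt, hops + 1))
    # Flatten row by row so that value k belongs to the pair graph_index[:, k].
    return graph_index, spatial.ravel()

# Path graph 0 - 1 - 2 - 3 with both edge directions listed.
edge_index = [[0, 1, 1, 2, 2, 3],
              [1, 0, 2, 1, 3, 2]]
graph_index, spatial_types = spatial_types_sketch(edge_index, num_nodes=4, distance=2)
```

Nodes 0 and 3 are three hops apart, but with distance=2 their spatial type is capped at 2. Reshaping spatial_types to (4, 4) recovers the dense matrix for this single graph, which is the role to_dense_adj plays after batching.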