opengt.loader

opengt.loader.master_loader.load_dataset_master(format, name, dataset_dir)

Master loader that controls the loading of all datasets and takes precedence over any default GraphGym dataset loader. The default GraphGym dataset loaders are instead called from this function; the format keywords PyG and OGB are reserved for them.

Custom transforms and dataset splitting are applied to each loaded dataset.

Parameters:
  • format – dataset format name that identifies Dataset class

  • name – dataset name to select from the class identified by format

  • dataset_dir – path where the processed dataset is stored

Returns:

PyG dataset object with applied perturbation transforms and data splits

opengt.transform.task_preprocessing.task_specific_preprocessing(data, cfg)

Task-specific preprocessing before the dataset is logged and finalized.

Parameters:
  • data – PyG graph

  • cfg – Main configuration node

Returns:

Extended PyG Data object.

opengt.transform.posenc_stats.compute_posenc_stats(data, pe_types, is_undirected, cfg)

Precompute positional encodings for the given graph.

Supported PE statistics to precompute, selected by pe_types:

  • ‘LapPE’: Laplacian eigen-decomposition.

  • ‘RWSE’: Random walk landing probabilities (diagonals of RW matrices).

  • ‘HKfullPE’: Full heat kernels and their diagonals. (NOT IMPLEMENTED)

  • ‘HKdiagSE’: Diagonals of heat kernel diffusion.

  • ‘ElstaticSE’: Kernel based on the electrostatic interaction between nodes.

  • ‘Graphormer’: Computes spatial types and optionally edges along shortest paths.

  • ‘LapRaw’: Laplacian eigen-decomposition without further processing.

  • ‘RRWP’: Relative Random Walk Probabilities PE (for GRIT).

  • ‘WLSE’: Weisfeiler-Lehman encoding.

Parameters:
  • data – PyG graph

  • pe_types – Positional encoding types to precompute statistics for. This can also be a combination, e.g. ‘eigen+rw_landing’

  • is_undirected – True if the graph is expected to be undirected

  • cfg – Main configuration node

Returns:

Extended PyG Data object.

opengt.transform.posenc_stats.custom_eigh(L)

Compute eigenvalues and eigenvectors of a Laplacian matrix. Due to a bug in PyTorch, scipy’s eigh is used instead of torch.linalg.eigh when the matrix is large.

Parameters:

L – Laplacian matrix (Tensor)

Returns:

EigVals (eigenvalues) and EigVecs (eigenvectors)

opengt.transform.posenc_stats.eigvec_normalizer(EigVecs, EigVals, normalization='L2', eps=1e-12)

Implement different eigenvector normalizations.
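As a rough illustration of the default 'L2' option, the sketch below rescales each eigenvector to unit Euclidean length and uses eps as a lower bound on the denominator. It assumes eigenvectors are stored as columns; the helper name l2_normalize_columns is hypothetical, plain Python lists stand in for tensors, and the other normalization modes and the role of EigVals are not covered.

```python
import math

def l2_normalize_columns(eig_vecs, eps=1e-12):
    """Scale every column of a (num_nodes x k) nested list to unit L2 norm."""
    num_rows = len(eig_vecs)
    num_cols = len(eig_vecs[0])
    # Column norms, clamped from below by eps to avoid division by zero.
    norms = [
        max(math.sqrt(sum(eig_vecs[r][c] ** 2 for r in range(num_rows))), eps)
        for c in range(num_cols)
    ]
    return [[eig_vecs[r][c] / norms[c] for c in range(num_cols)]
            for r in range(num_rows)]

# First column has norm 5; the all-zero second column is left at zero.
vecs = [[3.0, 0.0],
        [4.0, 0.0]]
normalized = l2_normalize_columns(vecs)
```

The eps clamp matters for degenerate columns: an all-zero eigenvector stays zero instead of producing a division error.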

opengt.transform.posenc_stats.get_electrostatic_function_encoding(edge_index, num_nodes)

Kernel based on the electrostatic interaction between nodes.

opengt.transform.posenc_stats.get_heat_kernels(evects, evals, kernel_times=[])

Compute full Heat diffusion kernels.

Parameters:
  • evects – Eigenvectors of the Laplacian matrix

  • evals – Eigenvalues of the Laplacian matrix

  • kernel_times – Times for the diffusion, analogous to the k-steps in a random walk. The time is equivalent to the variance of the kernel.
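For orientation, the textbook spectral form of the heat kernel is exp(-tL) = Σ_i exp(-t·λ_i) φ_i φ_iᵀ. The sketch below evaluates it from a given eigen-decomposition using plain Python lists. The function name heat_kernel is hypothetical, and the library may treat details such as the zero eigenvalue or eigenvector normalization differently.

```python
import math

def heat_kernel(evects, evals, t):
    """Return the dense heat kernel exp(-t L) from an eigen-decomposition of L.

    evects[u][i] is the entry of the i-th eigenvector at node u.
    """
    n = len(evects)
    return [
        [sum(math.exp(-t * lam) * evects[u][i] * evects[v][i]
             for i, lam in enumerate(evals))
         for v in range(n)]
        for u in range(n)
    ]

# Two nodes joined by one edge: L = [[1, -1], [-1, 1]], eigenvalues 0 and 2.
s = 1 / math.sqrt(2)
evals = [0.0, 2.0]
evects = [[s, s],
          [s, -s]]
kernels = [heat_kernel(evects, evals, t) for t in (0.0, 1.0)]
```

At t = 0 the kernel is the identity (no diffusion has happened yet); for larger t each row spreads its mass over the graph while still summing to one.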

opengt.transform.posenc_stats.get_heat_kernels_diag(evects, evals, kernel_times=[], space_dim=0)

Compute Heat kernel diagonal.

This is a continuous function that represents a Gaussian in the Euclidean space, and is the solution to the diffusion equation. The random-walk diagonal should converge to this.

Parameters:
  • evects – Eigenvectors of the Laplacian matrix

  • evals – Eigenvalues of the Laplacian matrix

  • kernel_times – Times for the diffusion, analogous to the k-steps in a random walk. The time is equivalent to the variance of the kernel.

  • space_dim – (optional) Estimated dimensionality of the space. Used to correct the diffusion diagonal by a factor t^(space_dim/2). In Euclidean space, this correction means that the height of the Gaussian stays constant across time, if space_dim is the dimension of the Euclidean space.

Returns:

2D Tensor with shape (num_nodes, len(kernel_times)) with heat kernel diagonals
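The effect of space_dim can be seen in a small sketch that computes only the diagonal entries and multiplies them by t^(space_dim/2). It uses plain Python lists and a hypothetical function name, heat_kernel_diag; it follows the description above rather than the library's exact code path.

```python
import math

def heat_kernel_diag(evects, evals, kernel_times, space_dim=0):
    """Per-node heat kernel diagonal for each diffusion time.

    Returns a nested list of shape (num_nodes, len(kernel_times)).
    """
    out = []
    for u in range(len(evects)):
        row = []
        for t in kernel_times:
            diag = sum(math.exp(-t * lam) * evects[u][i] ** 2
                       for i, lam in enumerate(evals))
            # Optional correction by t^(space_dim / 2).
            row.append(diag * t ** (space_dim / 2))
        out.append(row)
    return out

# Two nodes joined by one edge: Laplacian eigenvalues 0 and 2.
s = 1 / math.sqrt(2)
evals = [0.0, 2.0]
evects = [[s, s],
          [s, -s]]
plain = heat_kernel_diag(evects, evals, [1.0, 4.0])
corrected = heat_kernel_diag(evects, evals, [1.0, 4.0], space_dim=2)
```

With space_dim=0 the factor is 1 and the raw diagonal is returned; with space_dim=2 the entry at t = 4 is scaled by 4.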

opengt.transform.posenc_stats.get_lap_decomp_stats(evals, evects, max_freqs, eigvec_norm='L2')

Compute Laplacian eigen-decomposition-based PE stats of the given graph.

Parameters:
  • evals – Eigenvalues from the precomputed eigen-decomposition

  • evects – Eigenvectors from the precomputed eigen-decomposition

  • max_freqs – Maximum number of lowest-frequency (smallest-eigenvalue) eigenvectors to use

  • eigvec_norm – Normalization for the eigenvectors of the Laplacian

Returns:

Two tensors: the eigenvalues repeated for each node, with shape (num_nodes, max_freqs, 1), and the eigenvector values per node, with shape (num_nodes, max_freqs)
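The two output shapes are easier to picture with a small sketch. The version below selects the max_freqs smallest eigenvalues and lays the results out per node using plain Python lists. The function name lap_decomp_stats is hypothetical; the sketch assumes num_nodes >= max_freqs and leaves out the eigvec_norm step.

```python
def lap_decomp_stats(evals, evects, max_freqs):
    """Keep the max_freqs lowest-frequency eigenpairs and lay them out per node.

    Assumes num_nodes >= max_freqs, so no padding is needed.
    """
    # Indices of the smallest eigenvalues, in ascending order.
    order = sorted(range(len(evals)), key=lambda i: evals[i])[:max_freqs]
    num_nodes = len(evects)
    # (num_nodes, max_freqs): eigenvector entries of each node.
    eig_vecs = [[evects[u][i] for i in order] for u in range(num_nodes)]
    # (num_nodes, max_freqs, 1): the same eigenvalues repeated for every node.
    eig_vals = [[[evals[i]] for i in order] for _ in range(num_nodes)]
    return eig_vals, eig_vecs

# Unsorted toy decomposition of a 3-node graph (values are illustrative only).
evals = [2.0, 0.0, 1.0]
evects = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6],
          [0.7, 0.8, 0.9]]
eig_vals, eig_vecs = lap_decomp_stats(evals, evects, max_freqs=2)
```

Repeating the eigenvalues for every node lets a downstream encoder pair each eigenvector entry with its frequency.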

opengt.transform.posenc_stats.get_rw_landing_probs(ksteps, edge_index, edge_weight=None, num_nodes=None, space_dim=0)

Compute Random Walk landing probabilities for given list of K steps.

Parameters:
  • ksteps – List of k-steps for which to compute the RW landings

  • edge_index – PyG sparse representation of the graph

  • edge_weight – (optional) Edge weights

  • num_nodes – (optional) Number of nodes in the graph

  • space_dim – (optional) Estimated dimensionality of the space. Used to correct the random-walk diagonal by a factor k^(space_dim/2). In Euclidean space, this correction means that the height of the Gaussian distribution stays almost constant across the number of steps, if space_dim is the dimension of the Euclidean space.

Returns:

2D Tensor with shape (num_nodes, len(ksteps)) with RW landing probs
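A minimal dense sketch of the idea: build the transition matrix P = D⁻¹A, raise it to each power k, and read off the diagonal, optionally scaled by k^(space_dim/2). It uses plain Python lists and takes a list of (source, target) pairs instead of a PyG edge_index. The function name rw_landing_probs is hypothetical, edge weights are omitted, and ksteps is assumed to be a non-empty list of positive integers.

```python
def rw_landing_probs(ksteps, edges, num_nodes, space_dim=0):
    """Return-probabilities diag(P^k) of a random walk, for each k in ksteps.

    edges is a list of directed (source, target) pairs; P = D^-1 A.
    """
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] += 1.0
    # Row-normalize the adjacency matrix into a transition matrix.
    P = []
    for row in adj:
        deg = sum(row)
        P.append([a / deg if deg > 0 else 0.0 for a in row])

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(num_nodes))
                 for j in range(num_nodes)] for i in range(num_nodes)]

    # Start from the identity (P^0) and multiply up to the largest k.
    power = [[float(i == j) for j in range(num_nodes)]
             for i in range(num_nodes)]
    diagonals = {}
    for k in range(1, max(ksteps) + 1):
        power = matmul(power, P)
        if k in ksteps:
            diagonals[k] = [power[i][i] * k ** (space_dim / 2)
                            for i in range(num_nodes)]
    # Shape (num_nodes, len(ksteps)), columns in the order given by ksteps.
    return [[diagonals[k][i] for k in ksteps] for i in range(num_nodes)]

# Triangle graph, each undirected edge listed in both directions.
triangle = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]
probs = rw_landing_probs([1, 2, 3], triangle, num_nodes=3)
```

On the triangle a walker cannot return in one step, returns with probability 1/2 after two steps, and with probability 1/4 after three, so every row of probs is [0.0, 0.5, 0.25].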

opengt.encoder.graphormer_encoder.graphormer_pre_processing(data, distance)

Implementation of Graphormer pre-processing. Computes in- and out-degrees for node encodings, as well as spatial types (via shortest-path lengths) and prepares edge encodings along shortest paths. The function adds the following properties to the data object:

  • spatial_types

  • graph_index: An edge_index type tensor that contains all possible directed edges (see more below)

  • shortest_path_types: Populates edge attributes along all shortest paths between two nodes

Similar to the adjacency matrix, any matrix can be batched in PyG by decomposing it into a 1D tensor of values and a 2D tensor of indices. Once batched, the graph-specific matrix can be recovered (while appropriately padded) via to_dense_adj. We use this concept to decompose the spatial type matrix and the shortest path edge type tensor via the graph_index tensor.

Parameters:
  • data (torch_geometric.data.Data) – A PyG data object holding a single graph

  • distance (int) – The distance up to which types are calculated

Returns:

The augmented data object.

Return type:

torch_geometric.data.Data
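The index-plus-values decomposition described above can be sketched without PyG. The example below computes hop distances by breadth-first search, caps them at distance, stores them as a flat list aligned with an index of all directed node pairs, and then rebuilds the dense matrix. All names (spatial_types, to_dense) are hypothetical; assigning the cap value to unreachable pairs is an assumption, and the degree and shortest-path edge encodings are not covered.

```python
from collections import deque

def spatial_types(edges, num_nodes, distance):
    """Hop distances between all ordered node pairs, capped at `distance`.

    Returns (graph_index, values): graph_index holds every directed pair as
    two parallel lists, values holds the matching spatial type.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
    sources, targets, values = [], [], []
    for src in range(num_nodes):
        # Breadth-first search gives shortest-path lengths from src.
        hops = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        for dst in range(num_nodes):
            sources.append(src)
            targets.append(dst)
            # Assumption: unreachable or distant pairs share the cap value.
            values.append(min(hops.get(dst, distance), distance))
    return [sources, targets], values

def to_dense(graph_index, values, num_nodes):
    """Rebuild the dense (num_nodes x num_nodes) matrix from index and values."""
    dense = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v, val in zip(graph_index[0], graph_index[1], values):
        dense[u][v] = val
    return dense

# Path graph 0 - 1 - 2 - 3 with both edge directions listed.
path = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
graph_index, values = spatial_types(path, num_nodes=4, distance=2)
dense = to_dense(graph_index, values, num_nodes=4)
```

Because the values travel as a 1D list alongside a 2D index, they can be concatenated across graphs in a batch just like edge attributes, which is the role to_dense_adj plays in the actual pipeline.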