Add GRIT model #777

Merged · 23 commits into graphnet-team:main · Jan 21, 2025
Conversation

@pweigel (Collaborator) commented Dec 17, 2024

GRIT: "Graph Inductive Biases in Transformers without Message Passing"

This PR includes a new model based on the GRIT transformer. It introduces novel methods for encoding graph information for use in sparse multi-head attention blocks, together with a learned position encoding based on random walk probabilities that enhances the model's expressivity.

PMLR: https://proceedings.mlr.press/v202/ma23c.html
Paper pre-print: https://arxiv.org/abs/2305.17589


Many layers/functions are adapted from the original repository: https://github.com/LiamMa/GRIT/tree/main. The original code uses graphgym to set up most of its modules, so I refactored some things to fit into graphnet. Many of the arguments have been relabeled to be more self-explanatory. In principle, other graph attention mechanisms could be used by replacing the GRIT MHA block.

Since there are a lot of changes, I will quickly summarize the significant new additions and modifications to existing files:

This model has many hyperparameters, but the defaults should provide a good starting point. It should be noted that the GPU memory required to train this model is quite high due to the use of global attention.
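
For orientation, here is a minimal dense sketch of how RRWP-style encodings can be built from the random-walk matrix M = D^-1 A (the implementation in this PR works with sparse tensors and differs in detail; the function below is illustrative only, not the code being merged):

import torch
from torch_geometric.utils import to_dense_adj

def rrwp_sketch(edge_index: torch.Tensor, num_nodes: int, walk_length: int = 8):
    """Stack powers of the random-walk matrix: [I, M, M^2, ..., M^(K-1)]."""
    adj = to_dense_adj(edge_index, max_num_nodes=num_nodes)[0]  # [N, N]
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    rw = adj / deg  # M = D^-1 A, row-normalised
    powers = [torch.eye(num_nodes)]
    for _ in range(walk_length - 1):
        powers.append(powers[-1] @ rw)  # M^k
    pe = torch.stack(powers, dim=-1)  # [N, N, walk_length]
    abs_pe = pe.diagonal().transpose(0, 1)  # node-wise (absolute) encoding, [N, walk_length]
    return abs_pe, pe  # pe holds the pair-wise (relative) encodings

The dense [N, N, walk_length] tensor also makes it clear why the relative encodings dominate the memory budget for large graphs.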

@pweigel (Collaborator, Author) commented Dec 24, 2024

After some experimenting, I've found that no position encoding works quite well (skipping the RRWP encodings) and drastically reduces the GPU memory requirement. I'll add some options that allow users to do this.
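
As a rough illustration of the choice this would enable (purely a sketch: the import location of the new graph definitions and their constructor arguments are assumed to follow the existing KNNGraph pattern and may differ from the final code):

from graphnet.models.detector.icecube import IceCube86
# Assumed export location for the graph definitions added in this PR:
from graphnet.models.graphs import KNNGraphNoPE, KNNGraphRRWP

detector = IceCube86()
graph_definition = KNNGraphNoPE(detector=detector)    # no positional encoding: lowest GPU memory
# graph_definition = KNNGraphRRWP(detector=detector)  # full RRWP fields: highest GPU memory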


rel_pe = SparseTensor.from_dense(pe, has_value=True)
rel_pe_row, rel_pe_col, rel_pe_val = rel_pe.coo()
# rel_pe_idx = torch.stack([rel_pe_row, rel_pe_col], dim=0)
Review comment (Collaborator):

forgot a comment here

data: Data,
walk_length: int = 8,
attr_name_abs: str = "rrwp", # name: 'rrwp'
attr_name_rel: str = "rrwp", # name: ('rrwp_idx', 'rrwp_val')
Review comment (Collaborator):

forgot a comment here

edge_weight = torch.ones(edge_index.size(1), device=edge_index.device)
num_nodes = maybe_num_nodes(edge_index, num_nodes)
source = edge_index[0]
# dest = edge_index[1]
Review comment (Collaborator):

forgot a comment here


adj = adj.view(size)
_edge_index = adj.nonzero(as_tuple=False).t().contiguous()
# _edge_index, _ = remove_self_loops(_edge_index)
Review comment (Collaborator):

forgot a comment here

add_identity: Add identity matrix to position encoding.
spd: Use shortest path distances.
"""
# device = data.edge_index.device
Review comment (Collaborator):

forgot a comment here


Args:
in_dim: Dimension of the input tensor.
out_dim: Dimension of theo output tensor.
Review comment (Collaborator):

theo -> the

if E is not None:
wE = score.flatten(1)

# Complete attention ccaclculation
Review comment (Collaborator):

ccaclculation -> calculation

out_dim: Dimension of theo output tensor.
num_heads: Number of attention heads.
dropout: Dropout layer probability.
norm: Normalization layer.
Review comment (Collaborator):

Would be informative to mention here that the normalization layer is assumed to be un-instantiated. E.g:

norm: Uninstantiated normalization layer. Must be either BatchNorm1d or BatchNorm1d


class KNNGraphRWSE(GraphDefinition):
"""KNN Graph with random walk structural encoding."""

Review comment (Collaborator):

I would suggest expanding the doc string so it's easier to see how this differs from existing representations and, specifically, how the RWSE is accessible. Here's an example:

"""
A KNN graph representation with Random Walk Structural Encoding (RWSE).

Identical to KNNGraph but with an additional field containing RWSE. The encoding can be accessed via 

`rwse = graph['rwse']`
"""



class KNNGraphRRWP(GraphDefinition):
"""KNN Graph with relative random walk probabilities."""
Review comment (Collaborator):

Similarly to the other representation, I would suggest expanding the doc string so it's easier to see how this differs from existing representations and how the new fields are accessible. I think there are five new fields in the RRWP case. So here's an example:

"""
A KNN graph representation with Relative Random Walk Probabilities (RRWP).

Identical to KNNGraph but with five additional fields:

abs_pe = graph['abs_pe'] # Absolute positional encoding
rrwp_index = graph['rrwp_index'] # rrwp index (which is used for slicing the vals?)
rrwp_val = graph['rrwp_val'] # rrwp values
degree= graph['deg'] # Degree of each node (num. of incoming edges)
log_degree = graph['log_deg'] # Equal to torch.log10(graph['deg'] + 1)
"""

return graph


class KNNGraphNoPE(GraphDefinition):
Review comment (Collaborator):

Considering the very high level of similarity between this and KNNGraph, perhaps it would be beneficial to introduce a new argument to KNNGraph that would toggle between the vanilla KNNEdges and your new KNNDistanceEdges. For example, one could invent the argument distance_as_edge_features: bool = False (defaults to KNNEdges) and use KNNDistanceEdges if True.
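
Something along these lines, purely as a sketch (the `distance_as_edge_features` name is the invented one from this comment, the constructor is simplified, and `KNNDistanceEdges` is assumed to take the same arguments as `KNNEdges`):

from graphnet.models.graphs import GraphDefinition
from graphnet.models.graphs.edges import KNNEdges, KNNDistanceEdges  # KNNDistanceEdges is new in this PR

class KNNGraph(GraphDefinition):
    """Sketch: KNN graph with an optional distance-as-edge-feature toggle."""

    def __init__(
        self,
        detector,
        nb_nearest_neighbours: int = 8,
        distance_as_edge_features: bool = False,  # hypothetical new argument
        **kwargs,
    ) -> None:
        # Pick the edge definition based on the toggle; defaults to vanilla KNNEdges.
        edge_cls = KNNDistanceEdges if distance_as_edge_features else KNNEdges
        super().__init__(
            detector=detector,
            edge_definition=edge_cls(nb_nearest_neighbours=nb_nearest_neighbours),
            **kwargs,
        )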

use_bias: Apply bias the key and value linear layers.
clamp: Clamp the absolute value of the attention scores to a value.
dropout: Dropout layer probability.
activation: Activation function.
Review comment (Collaborator):

Could mention here that the activation function is assumed to be un-instantiated. E.g.

"""
activation:  Reference to uninstantiated activation function. E.g. `torch.nn.LeakyReLU`
"""

rezero: bool = False,
enable_edge_transform: bool = True,
attn_bias: bool = False,
attn_dropout: float = 0.0, # CHECK
Review comment (Collaborator):

Forgot a comment here

if norm_edges
else nn.Identity()
)
else: # TODO: Maybe just set this to nn.Identity. -PW
Review comment (Collaborator):

I think it would be preferable to raise an error if the user passes a non-compatible layer, instead of issuing a warning and defaulting to identity.

"""Forward pass."""
x = data.x
num_nodes = data.num_nodes
log_deg = get_log_deg(data)
@RasmusOrsoe (Collaborator) commented Jan 2, 2025:

Was there some particular reason why you needed the utility function to grab or calculate this quantity?

Naively I would've thought the degree could've been calculated directly in the forward pass like so:

from torch_geometric.utils import degree
log_deg = torch.log10(degree(data.edge_index[0]) + 1)

Doing it there would save you from needing the utility function and storing the log of the degree in the graph objects during the data loading

norm: nn.Module = nn.BatchNorm1d,
residual: bool = True,
deg_scaler: bool = True,
activation: nn.Module = nn.ReLU,
Review comment (Collaborator):

Could mention here that the activation function is assumed to be un-instantiated. E.g.

"""
activation:  Reference to uninstantiated activation function. E.g. `torch.nn.LeakyReLU`
"""

norm: Normalization layer.
residual: Apply residual connections.
deg_scaler: Apply degree scaling after MHA.
activation: Activation function.
Review comment (Collaborator):

Could mention here that the activation function is assumed to be un-instantiated. E.g.

"""
activation:  Reference to uninstantiated activation function. E.g. `torch.nn.LeakyReLU`
"""

edge_enhance: bool = True,
update_edges: bool = True,
attn_clamp: float = 5.0,
activation: nn.Module = nn.ReLU,
Review comment (Collaborator):

Could mention here that the activation function is assumed to be un-instantiated. E.g.

"""
activation:  Reference to uninstantiated activation function. E.g. `torch.nn.LeakyReLU`
"""

add_node_attr_as_self_loop: bool = False,
dropout: float = 0.0,
fill_value: float = 0.0,
norm: nn.Module = nn.BatchNorm1d,
Review comment (Collaborator):

Would be informative to mention here that the normalization layer is assumed to be un-instantiated. E.g:

norm: Uninstantiated normalization layer. Must be either BatchNorm1d or BatchNorm1d

dim_in: Input dimension.
dim_out: Output dimension.
L: Number of hidden layers.
activation: Activation function.
Review comment (Collaborator):

Could mention here that the activation function is assumed to be un-instantiated. E.g.

"""
activation:  Reference to uninstantiated activation function. E.g. `torch.nn.LeakyReLU`
"""

dim_out: Output dimension.
L: Number of hidden layers.
activation: Activation function.
pooling: Pooling method.
@RasmusOrsoe (Collaborator) commented Jan 2, 2025:

We could point out the supported methods in the doc string. Perhaps like this:

"""
pooling: Node-wise pooling operation. Either "mean" or "add".
"""

@RasmusOrsoe (Collaborator) left a review:

Hi @pweigel, thank you for this very clean contribution! 🎸

I have added superficial comments, mostly about commented-out code and doc strings.

Have you, by chance, tested these different initial representations and their impact on the performance of the GRIT model in a neutrino telescope setting? The original authors appear to favor RRWP over RWSE (I presume they conclude RRWP > RWSE > NoPE), but that may be very problem-dependent.

@pweigel (Collaborator, Author) commented Jan 10, 2025

Hey @RasmusOrsoe, thanks for taking a look. I think I've made all of the requested changes (and fixed a few other minor things). I haven't had the chance to benchmark the different encodings yet, but I plan to. The RRWP encodings are a bit memory-hungry, so I haven't had the chance to fully train a model beyond some tests to show that it works. Without the encodings, I've trained some models that look very good.

At some point in the near future, we should consider a better method of adding the different attributes (graph.encoding) in a modular fashion that doesn't require a new graph object. It could even be added as a part of GraphDefinition, where you pass some pos_encoding=MyGraphPosEncoding() and it uses the values/indices from the edge and node definitions to add the new attributes. It might be beyond the scope of this PR, but it would definitely be an enhancement.
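
To make the idea a bit more concrete, a purely hypothetical sketch (none of these classes exist in graphnet; the RWSE variant below just wraps PyG's AddRandomWalkPE transform for illustration):

import torch
from torch_geometric.data import Data
from torch_geometric.transforms import AddRandomWalkPE

class GraphPosEncoding(torch.nn.Module):
    """Hypothetical base class: attaches positional-encoding attributes to a graph."""

    def forward(self, graph: Data) -> Data:
        raise NotImplementedError

class RWSEEncoding(GraphPosEncoding):
    """Hypothetical module wrapping PyG's random-walk structural encoding."""

    def __init__(self, walk_length: int = 8) -> None:
        super().__init__()
        self._transform = AddRandomWalkPE(walk_length, attr_name="rwse")

    def forward(self, graph: Data) -> Data:
        return self._transform(graph)  # adds graph["rwse"] with shape [N, walk_length]

GraphDefinition could then accept something like pos_encoding=RWSEEncoding(walk_length=8) and apply it after the node and edge definitions have built the graph object.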

@RasmusOrsoe (Collaborator) commented:


Thanks! I think your idea of adding a positional encoding as a separate module, and storing its values in a dedicated field in the graph structures, is good! I also think this is the intended usage in PyG (see here). We've not utilized this in the past because the distinction between having the position as a node feature or as a separately accessible graph feature didn't matter much for the GNN applications we've had so far. I think your use-case is a good example of how it can be beneficial!

It looks like GitHub's rollout of the new Ubuntu runner version has affected your PR checks. @Aske-Rosted has solved this in the main branch (see #779).

Could you merge the main branch into yours? The checks should pass after that 💪

@RasmusOrsoe self-requested a review on January 21, 2025, 11:29
@RasmusOrsoe (Collaborator) left a review:

🚀

@pweigel pweigel merged commit 79d7baf into graphnet-team:main Jan 21, 2025
13 of 14 checks passed