In Appendix A.1 of your paper it is stated that the skip connections concatenate the input
with the output from the previous layer. However, in the visualization in Figure 8, as well as in the
provided code, the connection instead goes from the original input to all of the layers
(not from each previous layer).
Intuitively, I would think that the type of residual connection described in the text
would make more sense for the Latent DDIM: it would function similarly to a Transformer,
just without the Multi-Head Attention. The skips in the code and the figure seem
better suited for cases in which information is compressed by the layers, which is not the case
in the Latent DDIM.
I would be very thankful for your insight on this topic.
Kind regards, Anthony Mendil.
I'm sorry if the sentence "Each layer of the MLP has a skip connection from the input" is ambiguous, but it means exactly what you understood: the original input is concatenated to each of the layers!
I did try residual connections, and I found that concatenation works better. Another point: architecture details seem to matter less than the number of parameters in the latent DDIM. This is mostly a hunch, though; I have no solid evidence for it.
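To make the distinction concrete, here is a minimal PyTorch sketch of an MLP in which every layer after the first receives the original input concatenated with the previous layer's output, i.e. the concatenation-style skip described above. The class and argument names (`SkipConcatMLP`, `in_dim`, `hidden_dim`, `num_layers`) are hypothetical and not taken from the repo; this is an illustration of the idea, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class SkipConcatMLP(nn.Module):
    """Hypothetical sketch: every layer after the first sees
    [previous hidden state, original input] concatenated."""

    def __init__(self, in_dim: int, hidden_dim: int, num_layers: int):
        super().__init__()
        layers = []
        for i in range(num_layers):
            # The first layer sees only the input; later layers take the
            # previous hidden state concatenated with the original input.
            layer_in = in_dim if i == 0 else hidden_dim + in_dim
            layers.append(nn.Linear(layer_in, hidden_dim))
        self.layers = nn.ModuleList(layers)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for i, layer in enumerate(self.layers):
            if i > 0:
                # Skip connection from the original input x,
                # not from each previous layer (cf. Figure 8).
                h = torch.cat([h, x], dim=-1)
            h = self.act(layer(h))
        return h


# By contrast, a residual-style variant would compute something like
#   h = h + layer(h)
# adding the previous layer's output instead of concatenating x.
```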
Thanks for the clarification and your insight!
The contradiction was rather in the next part of the sentence:
"which simply concatenates the input with the output from the previous layer".