In the paper "Demystifying Neural Style Transfer", there may be a mistake that makes Equation (8) incorrect.
For a layer l (the paper uses a lowercase l) in the loss network, N_l is the number of feature maps at layer l. All the feature maps at layer l have the same size for a given input image.
Given input images of different sizes, the feature maps at the same layer will also have different sizes. For example, if the style image is 512x512 and the content image is 256x256, a feature map of the style image at layer 4_2 (using VGG-19 as an example) will be 4 times the size of the corresponding feature map of the content image at layer 4_2.
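This is easy to verify by pushing dummy inputs through the loss network and comparing spatial sizes. A minimal sketch, assuming a recent torchvision whose VGG-19 `features` module reaches conv4_2 at index 21 (the weights are irrelevant for a shape check):

```python
# Sketch: compare feature-map sizes at layer 4_2 of VGG-19 for two input sizes.
import torch
from torchvision.models import vgg19

# Layers 0..21 of torchvision's VGG-19 end at conv4_2; three stride-2 pools
# come before it, so each spatial dimension is divided by 8.
net = vgg19(weights=None).features[:22].eval()

with torch.no_grad():
    f_style = net(torch.randn(1, 3, 512, 512))    # stand-in 512x512 style image
    f_content = net(torch.randn(1, 3, 256, 256))  # stand-in 256x256 content image

M_style = f_style.shape[2] * f_style.shape[3]        # 64 * 64 = 4096
M_content = f_content.shape[2] * f_content.shape[3]  # 32 * 32 = 1024
print(M_style / M_content)                           # 4.0
```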
On the right column of page 2 of the paper, M_l is the size of a feature map at layer l for the content image and the generated image. For the style image, the size of a feature map at layer l is typically different, so the matrix that stores the style image's activations at layer l cannot have shape N_l x M_l.

If my understanding is correct, then the derivation of Equation (8) is incorrect.
Hi, note that we assume the style image and the content image have the same shape at the beginning of Section 3 of the paper, so the size mismatch you describe does not arise. In the implementation, we also resize the style image to the same shape as the content image.
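That resize step could look like the sketch below (the file names and the PIL/torchvision calls are placeholders; the actual implementation may differ):

```python
# Hypothetical pre-processing: resize the style image to the content image's
# spatial size so both yield N_l x M_l feature matrices at every layer.
from PIL import Image
import torchvision.transforms.functional as TF

content = Image.open("content.jpg")                        # placeholder path
style = Image.open("style.jpg")                            # placeholder path
style = TF.resize(style, [content.height, content.width])  # match H x W
```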
I think resizing the style image does not affect the performance. And even with different sizes, the equations and conclusions in the paper still hold: we only need to split M_l into M_l^1 and M_l^2 and rewrite the equations, since in the original MMD (Equation (1)) X and Y can have different numbers of samples.
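To make that concrete, here is a sketch of the (biased) squared MMD from Equation (1) with the second-order polynomial kernel k(x, y) = (x^T y)^2 that the paper relates to the style loss, written so the two sample sets can have different sizes; the shapes below are illustrative assumptions:

```python
# Sketch: squared MMD with kernel k(x, y) = (x^T y)^2, allowing M_l^1 != M_l^2.
# Each column of X / Y is one N_l-dimensional activation vector (one sample
# per spatial position of the feature maps).
import torch

def mmd2_poly(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """X: (N_l, M1), Y: (N_l, M2). Returns the biased squared-MMD estimate."""
    Kxx = (X.t() @ X) ** 2  # (M1, M1) kernel matrix; mean = sum / M1^2
    Kyy = (Y.t() @ Y) ** 2  # (M2, M2)
    Kxy = (X.t() @ Y) ** 2  # (M1, M2); mean = sum / (M1 * M2)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

# Illustrative shapes: generated image at 256x256 vs style image at 512x512,
# both with N_l = 512 channels at layer 4_2.
F_gen = torch.randn(512, 32 * 32)    # M_l^1 = 1024 samples
F_style = torch.randn(512, 64 * 64)  # M_l^2 = 4096 samples
print(mmd2_poly(F_gen, F_style))
```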