Thank you for your wonderful work! #56

Open
lyf1212 opened this issue Mar 5, 2024 · 0 comments
lyf1212 commented Mar 5, 2024

It's amazing to see such wonderful editing results from just a finger drag! Simple but effective work, built on a sharp observation about the UNet! As a junior student majoring in CS, I have some naive questions about this work:

  1. Why did you choose LoRA to preserve identity against distortion during editing? Have you tried other common training tricks, for example fine-tuning the VAE encoder/decoder? (I put a minimal LoRA sketch after this list to check that I understand the idea.)
  2. From my perspective, modifying the diffusion latents during sampling is hard to interpret, especially at large t, because the SNR is so low (see the short relation after this list). So I am curious what the ablation in Fig. 6 of your arXiv paper would look like without LoRA fine-tuning and with t = 50: would the result be damaged and fall outside the natural image distribution?
  3. Your work seems to suggest that spatial information and semantic features are correlated in the latent space of diffusion models. Would it then be practical to merge two images smoothly through the latent space (a rough sketch of what I mean is after this list)? Do you have any proposals in this direction, now or in the future?

Some of my questions may be too naive, but if you can spend some time replying to one or two of them, I would really appreciate it~
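
To make question 1 concrete, here is a minimal sketch of how I understand LoRA fine-tuning: the pretrained weights stay frozen and only a low-rank residual is trained. This is just my own PyTorch illustration of the general idea (the class name and the hyperparameters `r`, `alpha` are made up by me), not your implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```

My (possibly wrong) intuition is that because only the small A/B matrices are updated, the edited result cannot drift far from the identity encoded in the frozen UNet weights, whereas fine-tuning the VAE encoder/decoder would change how every image is encoded and decoded. Is that roughly the right way to think about it?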
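For question 2, this is the standard DDPM relation I have in mind when I say the SNR is low at large t (standard notation, not taken from your paper):

```latex
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
\qquad\Longrightarrow\qquad
\mathrm{SNR}(t) = \frac{\bar\alpha_t}{1-\bar\alpha_t}
```

Since \bar\alpha_t decreases toward 0 as t grows, SNR(t) shrinks, which is why I would expect edits at large t (e.g. t = 50) without LoRA to be hard to keep on the natural image manifold.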
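And for question 3, a rough sketch of what I mean by "merging two images through the latent space": spherically interpolating two inverted latents z_a, z_b and decoding the interpolated latent. Again, this is only my own illustration; the function name `slerp` and the weight `w` are mine, not from your code:

```python
import torch

def slerp(z_a: torch.Tensor, z_b: torch.Tensor, w: float) -> torch.Tensor:
    """Spherical interpolation between two diffusion latents of the same shape."""
    a, b = z_a.flatten(), z_b.flatten()
    cos_theta = torch.dot(a, b) / (a.norm() * b.norm())
    theta = torch.acos(cos_theta.clamp(-1.0, 1.0))
    if theta.abs() < 1e-4:                       # nearly parallel latents: plain lerp is fine
        return (1 - w) * z_a + w * z_b
    return (torch.sin((1 - w) * theta) * z_a + torch.sin(w * theta) * z_b) / torch.sin(theta)

# e.g. run the sampler from slerp(z_a, z_b, 0.5) to see whether the "merged" latent
# still decodes to a natural-looking image
```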