Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/hyperv-guest: drop fb_hyperv in favor of drm_hyperv #372743

Merged
merged 3 commits into from
Jan 17, 2025

Conversation

K900
Copy link
Contributor

@K900 K900 commented Jan 10, 2025

Fixes #372703.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@K900 K900 requested review from alyssais and Ma27 January 10, 2025 20:37
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: kernel The Linux kernel 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` labels Jan 10, 2025
@K900
Copy link
Contributor Author

K900 commented Jan 10, 2025

Can confirm this allows my Hyper-V VMs to run Wayland just fine.

@me-and
Copy link
Contributor

me-and commented Jan 10, 2025

Testing now, although rebuilding the kernel means getting the new image is taking a while. Given it's my work machine that I'm using Hyper-V on, it might not be until Monday that I can confirm the fix works for me.

@K900
Copy link
Contributor Author

K900 commented Jan 10, 2025

You can just blacklist hyperv_fb instead, it should have the same effect.

@@ -525,6 +525,8 @@ let
DRM_I915_GVT_KVMGT = module;
# Enable Hyper-V Synthetic DRM Driver
DRM_HYPERV = whenAtLeast "5.14" module;
# And disable the legacy framebuffer driver
FB_HYPERV = no;
Copy link
Contributor

@Shawn8901 Shawn8901 Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i understand correctly FB_HYPERV gets replaced by DRM_HYPERV and is disabled before.
As we still have 5.4 and 5.10 kernels in nixpkgs, does it make sense to just disable it, if FB_HYPERV is available (so past 5.14)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. I'll update this after we confirm it works.

@@ -525,6 +525,8 @@ let
DRM_I915_GVT_KVMGT = module;
# Enable Hyper-V Synthetic DRM Driver
DRM_HYPERV = whenAtLeast "5.14" module;
# And disable the legacy framebuffer driver when we have the new one
FB_HYPERV = whenOlder "5.14" yes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does setting it to yes sometimes disable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I guess we have to explicitly force it to no on newer ones...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK just forcing it to no for newer ones and doing nothing for older ones does what we want.

@alyssais
Copy link
Member

Kernel config change looks good.

@me-and
Copy link
Contributor

me-and commented Jan 14, 2025

Looks like the force-push to https://github.com/NixOS/nixpkgs/compare/54543884d9965bbf2e66c87fc075cf01d953877c..94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 has picked up a bunch of changes that weren't meant to be there?

I've been testing with the fix per ada2811, which I think should cover my scenario. It's definitely working better now: the greeter loads and I can log in and control the VM screen resolution through the Plasma settings.

However, after logging in then rebooting, on subsequent reboots, clicking the ">" login button or pressing Enter at the login prompt shows the black KDE loading screen briefly, before swapping to tty1; the desktop never loads. Pressing Alt+F7 loads a systemd log that ends at [ OK ] Started Session 2 of User root.

Very happy to repro and collect logs, but I'm not sure what logs would be useful here!

@K900
Copy link
Contributor Author

K900 commented Jan 14, 2025

Looks like the force-push to https://github.com/NixOS/nixpkgs/compare/54543884d9965bbf2e66c87fc075cf01d953877c..94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 has picked up a bunch of changes that weren't meant to be there?

Those are changes from master, which I rebased on.

However, after logging in then rebooting, on subsequent reboots, clicking the ">" login button or pressing Enter at the login prompt shows the black KDE loading screen briefly, before swapping to tty1; the desktop never loads. Pressing Alt+F7 loads a systemd log that ends at [ OK ] Started Session 2 of User root.

Plasma is not designed to run as root, please test as a non-root user.

@me-and
Copy link
Contributor

me-and commented Jan 15, 2025

Looks like the force-push to https://github.com/NixOS/nixpkgs/compare/54543884d9965bbf2e66c87fc075cf01d953877c..94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4 has picked up a bunch of changes that weren't meant to be there?

Those are changes from master, which I rebased on.

Ah, of course! Apologies!

However, after logging in then rebooting, on subsequent reboots, clicking the ">" login button or pressing Enter at the login prompt shows the black KDE loading screen briefly, before swapping to tty1; the desktop never loads. Pressing Alt+F7 loads a systemd log that ends at [ OK ] Started Session 2 of User root.

Plasma is not designed to run as root, please test as a non-root user.

Initial test had the same behaviour: first login as a regular user seemed fine, subsequent attempts to log in instead dropped me to tty1.

For the sake of reproducibility, I've created https://github.com/me-and/nixpkgs-372703-testing and I'm retesting with the images generated there now.

@K900
Copy link
Contributor Author

K900 commented Jan 15, 2025

So I've done some more testing, and it looks like it's just pretty inconsistent with when it decides to start. I think I see some changes that could be relevant in newer kernels, so I'm currently trying again with 6.12.

@K900
Copy link
Contributor Author

K900 commented Jan 15, 2025

OK, I'm pretty sure it's some kind of weird race condition, and I can't reproduce it on 6.12 despite logging in and out like 20 times in a row.

@me-and
Copy link
Contributor

me-and commented Jan 16, 2025

I've just built fixed-kernel-6_12 at https://github.com/me-and/nixpkgs-372703-testing, which I believe should create a Hyper-V image using the 6.12 kernel – certainly uname -a from a tty reports that – but with that image, I still hit the no-GUI-after-initial-login problem two times out of two tests :(

@K900
Copy link
Contributor Author

K900 commented Jan 16, 2025

Just to clarify, how are you getting to the GUI? Are you logging out? Soft rebooting the OS? Hard rebooting the VM?

@me-and
Copy link
Contributor

me-and commented Jan 17, 2025

I had been doing a mix of hard reboots, soft reboots and logouts; I think when I was testing with the latest kernel, both tests I did were with hard reboots.

I've been testing with my work computer, because that's the one I have most easily set up to run Hyper-V, but the way its locked down is a bit tedious. Today I did the same tests but running the NixOS VM nested inside a less locked down Windows VM. Everything seems to be working just fine in that environment using the image with the updated Linux kernel, with both soft and hard reboots and just by logging out, so at least part of the problem is clearly in my environment.

The most obvious environmental difference is that yesterday, Hyper-V Manager was generating VMs using Hyper-V configuration version 11.0, but the working case is using configuration version 12.0.

I'm going to do some more differential testing today to try to narrow down where the difference comes from and get some more confidence it's an issue with my environment (or possibly me testing things incompetently) rather than with the fix.

@K900
Copy link
Contributor Author

K900 commented Jan 17, 2025

It's also definitely not the first time a Windows update changes VM behavior...

@me-and
Copy link
Contributor

me-and commented Jan 17, 2025

Problem reliably reproduces with the updated kernel image on my system running Windows 11 Enterprise 23H2 build 22631.4751 with experience pack 1000.22700.1055.0. This machine is fairly stringently locked down; outside of specific approved interfaces (such as for setting up and running Hyper-V VMs) I don't have admin access to it.

Problem does not reproduce with the updated kernel image on my system running Windows 11 Enterprise 24H2 build 26120.2705 with experience pack 1000.26100.42.0. This machine is itself a VM, with the machine in the previous paragraph as a host with nested virtualisation enabled. I am a system administrator on this machine.

Notably, generating a version 11.0 VM on the first machine, exporting it and copying it to the second machine – so it's still a version 11.0 VM rather than the version 12.0 that the second machine would create itself – has things working. That implies whatever the relevant difference is, it's not the VM configuration version but something else about the host system. As you say, it's not the first time VM behaviour changes between Windows versions, but I am surprised by the difference in behaviour!

I'm currently rebuilding test images for the broken and fixed-but-without-the-updated-kernel cases to run on the second machine, just to check if I'm able to reproduce the original problem there.

@K900
Copy link
Contributor Author

K900 commented Jan 17, 2025

For reference, the machine I'm testing on has all the latest updates installed. winver identifies as "Version 24H2 (OS Build 26100.2894)".

@me-and
Copy link
Contributor

me-and commented Jan 17, 2025

Okay, I've confirmed that on the 24H2 system the broken image doesn't load the GUI, so testing the fix on this host is meaningful. Even without the kernel update, I've not been able to reproduce the problem only the first login getting to the Plasma desktop, so the fix seems good.

Clearly whatever's going wrong here is at least in part down to the Windows version. Which is annoying for me, but it looks like this fix works just fine on up-to-date Windows Hyper-V hosts.

I'm not sure if that means #368084 ought to be backed out of 24.11 given it's not a backwards-compatible change for at least some Hyper-V users. I don't have any sense of (a) the number of people affected by the problems that was due to fix versus the number of people affected by the problem I hit, or (b) the NixOS community consensus on balancing regressions for some people versus fixes for others for stable releases.

@K900 K900 mentioned this pull request Jan 17, 2025
13 tasks
@K900 K900 merged commit ae2abfc into NixOS:master Jan 17, 2025
27 of 29 checks passed
@K900 K900 added the backport release-24.11 Backport PR automatically label Jan 17, 2025
@nixpkgs-ci
Copy link
Contributor

nixpkgs-ci bot commented Jan 17, 2025

Backport failed for release-24.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-24.11
git worktree add -d .worktree/backport-372743-to-release-24.11 origin/release-24.11
cd .worktree/backport-372743-to-release-24.11
git switch --create backport-372743-to-release-24.11
git cherry-pick -x c00e9ebd9d2a931518b2346bf25b7648c49c8344 b20e6abfaf51eec59a154672344d0184e48840da 94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4

1 similar comment
@nixpkgs-ci
Copy link
Contributor

nixpkgs-ci bot commented Jan 17, 2025

Backport failed for release-24.11, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-24.11
git worktree add -d .worktree/backport-372743-to-release-24.11 origin/release-24.11
cd .worktree/backport-372743-to-release-24.11
git switch --create backport-372743-to-release-24.11
git cherry-pick -x c00e9ebd9d2a931518b2346bf25b7648c49c8344 b20e6abfaf51eec59a154672344d0184e48840da 94e1e0a173482a65bb1a6ca86e95b4265c9e2bc4

K900 added a commit that referenced this pull request Jan 17, 2025
dustypomerleau pushed a commit to dustypomerleau/nixpkgs that referenced this pull request Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: kernel The Linux kernel 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-linux: 2501-5000 backport release-24.11 Backport PR automatically
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plasma6 on Hyper-V is broken by switch to Wayland
4 participants