Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-pause for Captured Surface Switching (2nd edition) #15

Open
tovepet opened this issue Oct 21, 2024 · 50 comments
Open

Auto-pause for Captured Surface Switching (2nd edition) #15

tovepet opened this issue Oct 21, 2024 · 50 comments

Comments

@tovepet
Copy link

tovepet commented Oct 21, 2024

Note: This is a new issue separated out from #4

Some browsers allow users to switch an ongoing screen-capture to instead share another surface. When this happens, applications may need to react to the change of surface to, e.g.,

  • adapt the UI for different surface types
  • set up crop targets for Region Capture

For some applications it is also important to separate frames delivered for one surface from frames delivered for another source, e.g., if recording different surfaces to different files.

To solve these issues, we propose the following:

  1. An event is added to the CaptureController that is emitted each time a user has switched to a new captured surface:
controller.addEventHandler('surfaceswitch', ...);
  1. An autoPause option is added to getDisplayMedia:
getDisplayMedia({..., autoPause: true});

If autoPause is true, a surface switch goes through the following steps:

  1. Stop emitting frames on all tracks associated with the capture.
  2. Set enabled = false for all tracks associated with the capture.
  3. Update all tracks associated with the capture to the new surface.
  4. Fire the surfaceswitch event.
@eladalon1983
Copy link
Member

eladalon1983 commented Oct 21, 2024

Clarification - the surfaceswitch EventHandler will fire even if autoPause: false, right? (The text implies this, as does common sense. Just making sure.)

@tovepet
Copy link
Author

tovepet commented Oct 22, 2024

Yes, the event will always fire. I just added it to the list of steps for auto-pause to show when it is supposed to happen in that case.

@tovepet
Copy link
Author

tovepet commented Oct 23, 2024

@jan-ivar, @youennf, do you think we can move ahead and create a PR based on this proposal?

@youennf
Copy link

youennf commented Oct 23, 2024

A few questions:

  • Why put the event in controller instead of a MediaStreamTrack? The plan seems to set enabled to false/true at track level. There is the case of two MediaStreamTrack clones one in a worker and the other in a window and it might be more handy for each track to be notified of surface switching.
  • Why do we have to set enabled to false in step 2? Why not simply asking the web page to set enabled to false synchronously for each MediaStreamTrack within the event handler (or call replaceTrack to null) or whatever most appropriate.
  • If we have autopause, why not using configurationchange which will allow having the right information via getSettings: the surface type, maybe deviceId...

I would tend to go with the following approach:

  • Add autopause to getDisplayMedia. (Or maybe pauseOnSurfaceSwitching?)
  • When autopause is true, and capture switches to a surface S, sinks of a track are guaranteed to not receive video frames from S until the configurationchange event is fired on that particular track.

Wdyt?

@youennf
Copy link

youennf commented Oct 23, 2024

FWIW, I'd like to go to a place where when sink and track are synchronised (MSTP and MST in a worker), we would guarantee that frames emition and events are synchronised.
This would mean that MSTP would be guaranteed to not receive new video frames before the corresponding event is fired. Ditto for applyConstraints promise resolution, mute/unmute events and so on...

@eladalon1983
Copy link
Member

eladalon1983 commented Oct 23, 2024

Youenn, something to consider - if a track is in X and its clone is in Y:

  • If X != Y, like two documents or a worker or whatever, then firing distinct events for them does make sense and makes the the app's life easier.
  • If X == Y, then firing two distinct events is awkward, as the app will need to know to ignore the second event (and potentially the third and fourth). Correlating events between different objects, where there is no clear specification of which events happen first, and there could be intervening other configuration-change events... Sounds like some work.

@eladalon1983
Copy link
Member

To clarify the second bullet in my previous comment with an example - assume the app wants to show the user a dialog reading "are you sure?" You don't want to do it for every clone. So you need to catch the first event, whichever track it fires on, and then know to ignore subsequent events on other tracks, while not accidentally missing truly new events on any of these tracks.

In that case, having an event handler on the controller is easier.

@youennf
Copy link

youennf commented Oct 23, 2024

  • firing two distinct events is awkward

Say X is used for local playback and Y is used for sending to the network, we will want X to continue displaying (ignore the event) and Y to stop sending (register the event and call replaceTrack to null or a synthetic freeze track). It seems good to have 2 different events here. Also, this would allow the worker to receive video frames without waiting for the window event to be fired.

Say now we only have X which is transfered to a worker.
The event will fire where it is the least useful.
There is no way for the app to react synchronously. Hence the need in the proposal to set track.enabled to false, which is supposed to be in full control of the web application, not the UA. And the need to somehow handle transferable clones.

Wrt dealing with several events, this is not too difficult to manage:

  1. Do surface switch checks/event listening in a single track (say the first one used for local preview)
  2. Coalesce all events of same context track clones in a single callback, for instance:
track.onconfigurationchange = e => {
  if (captureController.currentDeviceId == track.getSettings().deviceId)
    return;
  captureController.currentDeviceId = track.getSettings().deviceId;
  // Handle surface switching
  notifyUserOfSurfaceSwitching();
}

@jan-ivar
Copy link
Member

Let me comment on the OP first to narrow down what is being proposed.

  1. An event is added to the CaptureController that is emitted each time a user has switched to a new captured surface:
controller.addEventHandler('surfaceswitch', ...);

Do you mean addEventListener? (EventHandlers are attributes the guidelines recommend adding as well, controller.onsurfaceswitch in our case.)

  1. Set enabled = false for all tracks associated with the capture.

This seems like a foot gun, and violates Media Capture and Streams which designates muted for UA control, and leaves enabled "available to the application to control".

Having the browser set this attribute can race with the application. Worst case, mistakenly inverted application logic might cause others to see the user when they shouldn't.

Other ideas we discussed (and I expressed support for), like postSurfaceSwitchingHandler does not have this problem. In that model, no state is touched, the default behavior is to inject (which I prefer), with late JS action required to pause. Was it considered?

  1. Update all tracks associated with the capture to the new surface.

Step 2 and 3 seem to invite confusion over whether a source or a track feature is being proposed.

@jan-ivar
Copy link
Member

Also, what happened to avoiding black frames?

@tovepet
Copy link
Author

tovepet commented Oct 23, 2024

Replying to @youennf

  • Why put the event in controller instead of a MediaStreamTrack? The plan seems to set enabled to false/true at track level. There is the case of two MediaStreamTrack clones one in a worker and the other in a window and it might be more handy for each track to be notified of surface switching.

I put it on the controller as the default place for screen-capture-related functionality, but the track might be a better place for it. I think it should be manageable for applications centralize the control of when tracks should be re-enabled to one of the event-listeners as you propose when it is needed.

  • Why do we have to set enabled to false in step 2? Why not simply asking the web page to set enabled to false synchronously for each MediaStreamTrack within the event handler (or call replaceTrack to null) or whatever most appropriate.

Yes, that might be better. That would also allow applications to not disable tracks where it is not needed.

  • If we have autopause, why not using configurationchange which will allow having the right information via getSettings: the surface type, maybe deviceId...

Yes, that might be better. If going with configurationchange, what mechanism would you suggest to detect that a source-switch has occurred?

I would tend to go with the following approach:

  • Add autopause to getDisplayMedia. (Or maybe pauseOnSurfaceSwitching?)

I like the more explicit pauseOnSurfaceSwitching.

  • When autopause is true, and capture switches to a surface S, sinks of a track are guaranteed to not receive video frames from S until the configurationchange event is fired on that particular track.

Yes, I think this should work.

Replying to @jan-ivar

Let me comment on the OP first to narrow down what is being proposed.

  1. An event is added to the CaptureController that is emitted each time a user has switched to a new captured surface:
controller.addEventHandler('surfaceswitch', ...);

Do you mean addEventListener? (EventHandlers are attributes the guidelines recommend adding as well, controller.onsurfaceswitch in our case.)

Yes, sorry, I meant addEventListener

  1. Set enabled = false for all tracks associated with the capture.

This seems like a foot gun, and violates Media Capture and Streams which designates muted for UA control, and leaves enabled "available to the application to control".

Having the browser set this attribute can race with the application. Worst case, mistakenly inverted application logic might cause others to see the user when they shouldn't.

What do you think about Youenn’s suggestion of letting the application set enabled = false? I think that should solve this issue?

Other ideas we discussed (and I expressed support for), like postSurfaceSwitchingHandler does not have this problem. In that model, no state is touched, the default behavior is to inject (which I prefer), with late JS action required to pause. Was it considered?

Yes, it was considered but as I replied to you in that comment thread, I think an event-based model using the enabled-state might be a better option:

  • It uses events rather than callbacks, which I believe is preferable.
  • I think it’s simpler to let the application change the enabled-state rather than manipulating promises.

Step 2 and 3 seem to invite confusion over whether a source or a track feature is being proposed.

Yes, it might be better and more consistent to have the event on the MediaStreamTrack as Youenn suggests.

@jan-ivar
Copy link
Member

postSurfaceSwitchingHandler exposed new functionality to temporarily halt frame delivery. track.enabled = false in contrast creates black frames.

It also allowed asynchronous delay, e.g. to support awaiting replaceTrack(filler), which Youenn's suggestion does not.

Should we perhaps nail down desired behaviors first?

@youennf
Copy link

youennf commented Oct 23, 2024

If we go down with asynchronicity, I would go with a callback at capture controller level. I described some variants at: #4 (comment).

If we prefer synchronicity, I would go with configurationchange on the track, which seems more convenient since using getSettings() might be helpful for the web application to decide what to do: continue, stop, prompt user...

@eladalon1983
Copy link
Member

I like the more explicit pauseOnSurfaceSwitching.

I think this specificity is great; otherwise, one might expect it to trigger on navigation too.

This seems like a foot gun,

Why is it a footgun?

and violates Media Capture and Streams which designates muted for UA control, and leaves enabled "available to the application to control".

I don't think that's a contradiction here. If the UA were to set enabled to true, that would be horrible. But setting it to false under clear circumstances, seems like a great way to manipulate the track in a self-consistent manner that meshes well with other API surfaces and parts of the spec. The alternatives all call for additional complexity.

@tovepet
Copy link
Author

tovepet commented Oct 24, 2024

The postSurfaceSwitchingHandler proposal as I understand it:

One or more handlers can be added to the CaptureController:

async function handler() {
  // Re-register the handler:
  controller.addPostSurfaceSwitchingHandler(handler);
  // Return a promise that is resolved when setup has finished.
};
controller.addPostSurfaceSwitchingHandler(handler);

If there is at least one handler registered, a surface switch goes through the following steps:

  1. Stop emitting frames on all tracks associated with the capture.
  2. Update all tracks associated with the capture to the new surface.
  3. Remove all handlers from the list of handlers, and keep them in a temporary list.
  4. Call the removed handlers and collect the returned promises.
  5. Resume emitting frames on all tracks associated with the capture when Promise.all(allCallbackResults) has been resolved.

@jan-ivar,
Does this agree with your idea of how the postSurfaceSwitchingHandler should work?

@youennf,

  1. What do you think about this proposal?
  2. If we instead pursue the onconfigurationchange approach, how do you envision that the application would know that a surface-switch has occurred and not some other change?

@youennf
Copy link

youennf commented Oct 25, 2024

  1. What do you think about this proposal?

I think this proposal is ok. No need for a boolean flag in this case.
I am not sure rvfc model (multiple callbacks, readd them when called) is necessary here.
I would simplify the API to something like:

callback PostXXXCallback Promise<any> ();
undefined setPostSurfaceSwitchingHandler(PostXXXCallback? callback);
  1. If we instead pursue the onconfigurationchange approach, how do you envision that the application would know that a surface-switch has occurred and not some other change?

We would expose deviceId through getSettings, deviceId being tied to a surface.
Changing of deviceId means a surface switch. See the example in #15 (comment).

My assessment is that both approaches solve the issue.
I lean towards autopause/onconfigurationchange since it is smaller and seems good enough:

  • The callback API is bigger. And we might need to expose information to the callback that getSettings already exposes.
  • Whether we have a callback or not, configurationchange will fire in case of surface switch.
  • With the callback approach, we would need to decide whether configurationchange happens before this callback or not. It would somehow make sense to have configurationchange before so that getSettings contain useful information, but delaying the callback is not great, especially if we have clones in two contexts.
  • A track based approach is somehow more convenient for web developers: pausing happens per track's sinks, wherever the track is. For instance, one could let the preview track always unpause while having the sender's track being suspended until user clicks on some UI.
  • There are some advantages to a single decision place, it is somehow less error-prone if user forgets about one track that should be paused.
  • For exposing new tracks, I would guess a separate callback would be more convenient. Web app might not want auto pause but could be interested in new tracks. Also, audio or video tracks could be added outside of surface switching.
  • The promise based callback really shine if there is a need for asynchronicity, the need has not been demonstrated so far.

@youennf
Copy link

youennf commented Oct 29, 2024

@jan-ivar, @tovepet, thoughts?

@tovepet
Copy link
Author

tovepet commented Oct 29, 2024

I think I like the autopause/onconfigurationchange API shape better and I agree with many of the arguments you give.

However, I do have concerns about black frames being emitted when using track.enable = false for pausing. Since the application-controlled pausing is mostly independent of the autopause/onconfigurationchange mechanism, we could perhaps create a separate issue to discuss how an application can pause the frame delivery without emitting black frames? If we can’t quickly reach an agreement on the black-frames problem, I think we should go with the postSurfaceSwitchingHandler since it looks like it solves all the problems at hand, and it looks like it is at least acceptable to all parties here?

@youennf
Copy link

youennf commented Oct 29, 2024

create a separate issue to discuss how an application can pause the frame delivery without emitting black frames?

Let's discuss this in this thread.

I do not think we would want to pause frame generation globally until user decides, as web page might want to have a live preview for user to decide what to do. There might be a use for asynchronously pausing on a per track's scope but this would not work with the current postSurfaceSwitchingHandler.

For pausing RTCRtpSender without black frames in a synchronous way, I think the principle would be the following:

  • clone the sender's track
  • stop the sender's track, maybe use replaceTrack to provide synthetic video to remote users
  • wait for user's input
  • call replaceTrack with the track clone to restart.

@tovepet
Copy link
Author

tovepet commented Oct 29, 2024

  • clone the sender's track
  • stop the sender's track, maybe use replaceTrack to provide synthetic video to remote users
  • wait for user's input
  • call replaceTrack with the track clone to restart.

Ah, that seems to work!

To pause frame delivery on the track-level, there is also the option of using TransformStream to drop frames until the application is ready to resume frame delivery, or to use VideoTrackGenerator.muted.

@jan-ivar
Copy link
Member

Ah, that seems to work!

Yes, but is it a bug? See w3c/webrtc-pc#3014.

This is why I recommended we nail down requirements first. Use cases > desired behavior > API

@youennf
Copy link

youennf commented Oct 31, 2024

This is why I recommended we nail down requirements first

I think there is consensus on the requirement to control when new surface frames are sent via a peer connection. One discussion point is whether the synchronous event model is sufficient or not to support this requirement (this is what we are debating here).

A second discussion point is whether it is better to pause at track level or to pause globally.
I am not sure we clearly decided one way or the other.

Wrt pausing with the synchronous model, two solutions have been found so far (MSTP/VTG and track.ended).

A third solution is to use setParameters to set active to false, which, according my interpretation of the spec, should work (although spec is pretty light at the moment).

We could check how implementations are doing by racing setParameters and postMessaging to a worker to enqueue a VideoFrame in a VTG stream. It seems getting interop there might be a good idea.

I would also think that calling MediaRecorder.pause() synchronously would work.
Again, spec is a bit light in terms of VideoFrame management.

FWIW, choosing the configurationchange model does not mean we would never allow asynchronous pausing. We could introduce per track pausing via ConfigurationChangeEvent.waitUntil, which might be a useful addition for other tracks than gdm in the future.

is it a bug? See w3c/webrtc-pc#3014.

AFAIK, all browsers behave the same here.

@jan-ivar
Copy link
Member

I think there is consensus on the requirement to control when new surface frames are sent via a peer connection.

This issue is in screen capture so there are 4 sinks to consider: HTMLVideoElement, MediaRecorder, RTCRtpSender, and MediaStreamTrackProcessor. Is the requirement the same for these?

A second discussion point is whether it is better to pause at track level or to pause globally.

We should to decide which of these two behaviors we want.

Tracks can't pause today. It might conceptually generalize to e.g. readwrite track.paused, and maybe track.onpause.

What we seem to be talking about as the synchronous model is ways for web developers to make due without these. But the workarounds web developers would need to use on the regular seem borderline unreasonable (clone, stop, replaceTrack-hop, or MSTP+VTG worker roundtrip).

So far no other sources need pausing. So the only benefit would be to pause some tracks longer than others. This benefit seems marginal to me. Extending pausing for a subset of tracks also seems possible through the same workarounds floated.

Captured Surface Switching happens at the source and is unique to this source.

So pausing where the switch happens seems inherently simpler for web developers to understand.

@youennf
Copy link

youennf commented Nov 1, 2024

Is the requirement the same for these?

I would think so.
Pausing the video element, pausing the MediaRecorder, setting track.enabled = false for RTCRtpSender should all prevent new frames from being displayed/recorded. No need for asynchronicity here.

Asynchronicity is needed for MediaStreamTrackProcessor with a CaptureController callback. There is no such requirement if the notification happens at the track level.

But the workarounds web developers would need to use on the regular seem borderline unreasonable

track.enabled = false is probably the straightforward solution to prevent exposing new frames too soon.
It should work out of the box for all sinks without requiring asynchronicity. Do we agree on this?

If you want to freeze instead of mute, there are solutions as illustrated above, some of which are only a few lines of code.

pausing where the switch happens seems inherently simpler for web developers

MSTP is one example that shows that notification at the track level is actually simpler.

Also, pauseOnSurfaceSwitching could start as a boolean and be changed to a callback returning a promise if we get a need for asynchronicity. In any case, configurationchange should fire in case of surface switch and will provide the web application potentially useful information for handling surface switch (type & ID of surface for instance).

@jan-ivar
Copy link
Member

jan-ivar commented Nov 1, 2024

Pausing the video element, pausing the MediaRecorder, setting track.enabled = false for RTCRtpSender should all prevent new frames from being displayed/recorded. No need for asynchronicity here.

If we allowed transceiver.sender.track = null we could pause the last one too! (we could make it throw on non-null)

If you want to freeze instead of mute, there are solutions as illustrated above, some of which are only a few lines of code.

There's a dependency here on w3c/webrtc-pc#3014.

configurationchange should fire in case of surface switch and will provide the web application potentially useful information for handling surface switch (type & ID of surface for instance).

There's a dependency here on (a new issue that needs to be filed following) w3c/mediacapture-screen-share#306.

Also, pauseOnSurfaceSwitching could start as a boolean and be changed to a callback returning a promise if we get a need for asynchronicity.

That might work.

@youennf
Copy link

youennf commented Nov 1, 2024

If you want to freeze instead of mute, there are solutions as illustrated above, some of which are only a few lines of code.

There's a dependency here on w3c/webrtc-pc#3014.

For ending track, not for using setParameters to set active to false.

There's a dependency here on (a new issue that needs to be filed following) w3c/mediacapture-screen-share#306.

w3c/mediacapture-screen-share#308

@youennf
Copy link

youennf commented Nov 4, 2024

It seems we could try presenting the result of this discussion (the two options) at next F2F.
It might be good to discuss the following issues to see whether there is a potential consensus:

@tovepet
Copy link
Author

tovepet commented Nov 5, 2024

I have a slot booked at the next working group meeting for this topic.

@tovepet
Copy link
Author

tovepet commented Nov 7, 2024

After further deliberations, I believe that it would be easier for applications to be notified by the CaptureController and to allow them to pause the capture at the source level.

Ensuring that all tracks are paused individually puts an extra burden on application developers when tracks are cloned. Furthermore, the setup an application needs to do may have a wider scope than individual tracks, e.g., updates to the UI.

From the discussion in this thread, it also seems more inline with existing semantics to pause on the source level. We could model this on VideoTrackGenerator.muted, which I believe have the behavior we seek:

controller.setSurfaceSwitchingCallback(() => {
  controller.muted = true
  setup().then(controller.muted = false);
});

Besides being easier to understand and use than returning a promise, this approach would also give applications the flexibility to use other methods for pausing if they so wish, e.g., setting enabled = false on a track or use any of the other methods discussed earlier in this thread.

@youennf
Copy link

youennf commented Nov 7, 2024

allow them to pause the capture at the source level.

One usecase is to protect user from oversharing the display stream during a VC.
In that case, the point is to keep the display local preview but disable sending the display stream to remote users.
And then present UI for the user to restart sharing to remote users.

The granularity in that use case is track, not source.

Ensuring that all tracks are paused individually puts an extra burden on application developers when tracks are cloned.

All tracks paused individually offers more flexibility as well which is useful for the above usecase.
If the idea is to mute the source, navigator.mediaSession.setScreenshareActive is a solution.
There are other JS based solutions like keeping track of all clones, which is anyway a good practice when the page wants to stop capture.

Please also note that if track is in a worker, it is more convenient to be notified in the worker.

Thoughts?

@eladalon1983
Copy link
Member

We should prioritize simple, fool-proof solutions to real problems that have been highlighted by Web developers as pressing, while still leaving ourselves the flexibility to address more nuanced issues in the future, if the need arises.

Fool-proof:
While showing an “are you sure?” dialog, it would be a shame if the developer pauses video but keeps transmitting audio. All other things being equal, API shapes that discourage such bugs are preferable.

Real:
Developers we have spoken to do not currently transfer MediaStreamTracks to workers. These developers do clone tracks and propagate those clones across multiple modules of rather large applications, where coordination between modules is a pain point.

Simple:
It is easier to “fan out” a single event, then to coordinate the order of multiple events.

By way of illustration of the coordination issue, suppose a track original has two tracks a and b, where a is displayed as the local preview and b is transmitted remotely. Assume that they are far apart in the app, each with its own handler for configurationchange. Suppose two source-changes occur in rapid succession, -1 and -2. The following orderings of event handler invocations are all possible:

  • a1, a2, b1, b2
  • a1, b1, a2, b2
  • a1, b1, b2, a2
  • b1, b2, a1, a2
  • b1, a1, b2, a2
  • b1, a1, a2, b2

Applications that need to only take action once per change, as the hypothetical app that wants to show a dialog, would find it non-trivial and error-prone to catch the first ?1 event and first ?2 event, while ignoring the second ?1 and ?2 events. Such applications would prefer a single event per CaptureController.

For that reason, I think the MVP is:

  1. Fire ”sourcechange” against the CaptureController and specify auto-pause behavior there.
  2. Fire ”devicechange” against the tracks as discussed, but avoid auto-pause behavior there.
  3. Specify that no1 precedes no2.

@youennf
Copy link

youennf commented Nov 8, 2024

While showing an “are you sure?” dialog

Wouldn't it be convenient for users to see a preview of what they have picked to answer the "are you sure?" question in an informed manner?

Exposing an API to allow CaptureController to mute all tracks it is related to is orthogonal to the discussion we have on the API design.

Developers we have spoken to do not currently transfer MediaStreamTracks to workers.

This pattern is being used by real web sites using data channel and WebCodecs for video conferencing.
Since we want to encourage processing in workers, it is important to have API working nicely there too.

It is easier to “fan out” a single event, then to coordinate the order of multiple events.
Applications that need to only take action once per change, as the hypothetical app that wants to show a dialog

This has been discussed earlier in the thread, a simple solution was illustrated at #15 (comment).
The above coalescing example would work whatever the order of events.
Event ordering is deterministic, 2 can never happen before 1.
The question about clones event order is interesting, I'll file an issue to see what we should do.

  • Fire ”sourcechange” against the CaptureController and specify auto-pause behavior there.
  • Fire ”devicechange” against the tracks as discussed, but avoid auto-pause behavior there.

Within sourcechange callback, the surface type exposed via track.getSettings() would be the old surface type. This is a potential footgun for web apps.
And the new surface type, a useful piece of information, would not be exposed. This is annoying.

@eladalon1983
Copy link
Member

Wouldn't it be convenient for users to see a preview of what they have picked to answer the "are you sure?" question in an informed manner?

The idea is that you'd get all tracks, video and audio, auto-paused, and then selectively unpause just the track used for the preview.
(See my previous comment for what benefits this approach yields.)

Exposing an API to allow CaptureController to mute all tracks it is related to is orthogonal to the discussion we have on the API design.

Sorry, that's not my proposal. I am not (currently) proposing a knob that the app can manipulate. Rather, I am saying:

  • We can have an unlimited number of events associated with source-changes; that's not the crux of the issue.
  • At least one of those events should be associated with auto-pause.
  • The MVP is to have an event on CaptureController be auto-pausing, and the rest to be informative-only.

This pattern is being used by real web sites using data channel and WebCodecs for video conferencing.

Could you name a few such sites? I'd love to get their feedback.

Since we want to encourage processing in workers, it is important to have API working nicely there too.

Definitely. But future use-cases can be addressed by future extensions, while existing use-cases require more immediate solutions.

Event ordering is deterministic, 2 can never happen before 1.
The question about clones event order is interesting, I'll file an issue to see what we should do.

Note that I did NOT place x2 before x1. Rather, I placed x2 before y1.
Or are you sure that the ordering between x2 and y1 (two different tracks) is deterministic even if one is in a worker? (Maybe it is...? I am looking forward to learn from you here.)

Within sourcechange callback, the surface type exposed via track.getSettings() would be the old surface type. This is a potential footgun for web apps.

Yes. Good point.
Maybe it's a "sourcechangestarted" event?

@jan-ivar
Copy link
Member

jan-ivar commented Nov 8, 2024

... I believe that it would be easier for applications to be notified by the CaptureController and to allow them to pause the capture at the source level.

I can get behind this, if by "pause" you mean halting frame delivery rather than mute.

Frame delivery is a property of the source today, so keeping a "pause" concept entirely contained there, and distinct from today's track states seems clean and appealing to me.

We could model this on VideoTrackGenerator.muted, which I believe have the behavior we seek:

VideoTrackGenerator.muted is different. It controls the muted state (firing mute and unmute events) on downstream tracks, and produces black frames. It gives JS backdoor-control over events that otherwise go unused.

That seems wrong here. Muted is not unused in screen capture.

This could work:

controller.surfaceSwitchingCallback = () => {
  controller.paused = true;
  setup().then(controller.paused = false);
});

It would would NOT fire muted. @tovepet is that still close to what you proposed?

My feedback then is that subtle details might be unobvious with this API:

  • frame delivery is on a background thread, so UAs would still need to pause before calling this callback, and only unpause if the paused state is false synchronously afterwards.
  • The API allows randomly pausing at other times, overshooting the use case for no clear reason

Therefore, I think I'd still prefer the promise version which doesn't have these problems or subtleties:

controller.surfaceSwitchingCallback = async () => {
  await setup();
});

@jan-ivar
Copy link
Member

jan-ivar commented Nov 8, 2024

One usecase is to protect user from oversharing the display stream during a VC.
In that case, the point is to keep the display local preview but disable sending the display stream to remote users.
And then present UI for the user to restart sharing to remote users.
...
Wouldn't it be convenient for users to see a preview of what they have picked to answer the "are you sure?" question in an informed manner?

This seems solvable regardless of API.

@eladalon1983
Copy link
Member

eladalon1983 commented Nov 8, 2024

This seems solvable regardless of API.

The purpose of this API is to ensure frames are not sent to a sink such as a PeerConnection between the time that the user changes the source, and the time that the application learns of the change and brings up a prompt. If there is a way to achieve this without this feature, I am not aware of it. Could you please explain?

@youennf
Copy link

youennf commented Nov 9, 2024

Could you please explain?

I think @jan-ivar meant that whatever API flavour, this seems doable.

@youennf
Copy link

youennf commented Nov 9, 2024

It does not seem that we are converging, we have more proposals than ever in this thread.
I am personally still not sure whether we need to allow asynchronous pausing of frames, maybe we should concentrate on this issue first.

Here are a bunch of questions that might be helpful for the above question and how it could interact with configurationchange.

  1. Should setting synchronously track.enabled to false as part of either event handler/callback prevent new surface frames to be provided to all track's sinks (if using a promise to pause, let's say the web app returns Promise.resolve()?
  2. If we add a new callback/event, what would be the timing of configurationchange event listeners? Would it be before, after? If after, would there be a guarantee that frames would not be delivered to sinks at the time the callback is called.
  3. In workers, what is the timing of configurationchange event vs. exposure of video frames with MSTP?
  4. Is it useful to expose deviceIds (Should screen capture tracks expose deviceId? mediacapture-screen-share#308) for this use case?

@tovepet
Copy link
Author

tovepet commented Nov 11, 2024

It does not seem that we are converging

It felt like we were quite close to converging 2-3 weeks ago, where everyone seemed to find the model with a callback on the CaptureController at least acceptable if maybe not the preferred model.

After this, there has been a discussion about if onconfigurationchange should be emitted when switching captured surfaces. I think this is reasonable even if we have a callback on the CaptureController.

If we start with the CaptureController-callback model and also emit onconfigurationchange events when the surface is switching, would that work as the basis for something that we can converge on?

I’ll need to get back to you on the other questions raised the last couple of days.

@tovepet
Copy link
Author

tovepet commented Nov 12, 2024

Here are a bunch of questions that might be helpful for the above question and how it could interact with configurationchange.

  1. Should setting synchronously track.enabled to false as part of either event handler/callback prevent new surface frames to be provided to all track's sinks (if using a promise to pause, let's say the web app returns Promise.resolve()?

Yes.

  1. If we add a new callback/event, what would be the timing of configurationchange event listeners? Would it be before, after? If after, would there be a guarantee that frames would not be delivered to sinks at the time the callback is called.

From an application point of view, I think the following order would be preferable for tracks on the main thread:

  1. Tracks are updated.
  2. The callback/event on the CaptureController is called/emitted.
  3. configurationchange is emitted.

For tracks in worker threads, we might want to relax the ordering somewhat to avoid excessive synchronization.

  1. In workers, what is the timing of configurationchange event vs. exposure of video frames with MSTP?

If we want to support the distributed use case, we need to extend the auto-pause guarantee so that the configurationchange event for a track is received before any frames from the new surface are exposed on that track.

  1. Is it useful to expose deviceIds (Should screen capture tracks expose deviceId? mediacapture-screen-share#308) for this use case?

Maybe. Exposing deviceIds is one way of informing applications that a configurationchange is caused by surface-switching, but I’m sure there are other alternatives we could employ. I believe we are open to exposing deviceIds, with the caveats noted in mediacapture-screen-share#308.

@jan-ivar
Copy link
Member

  1. Should setting synchronously track.enabled to false as part of either event handler/callback prevent new surface frames to be provided to all track's sinks (if using a promise to pause, let's say the web app returns Promise.resolve()?

Yes.

Yes. This is why I think a callback is appropriate. It's a synchronization window with two-way guarantees:

  1. The UA MUST NOT produce frames between the switch and the callback being called
  2. The UA MUST take JS state on callback return / promise resolve into account when deciding to resume frames

Events normally don't carry the second guarantee. The configurationchange event is also subject to "fuzzing", making it a questionable event to overload for timing sensitive information.

  1. If we add a new callback/event, what would be the timing of configurationchange event listeners? Would it be before, after? If after, would there be a guarantee that frames would not be delivered to sinks at the time the callback is called.

No, I'd leave configurationchange alone. It has none of these guarantees. Fires after. Anything else can be polyfilled:

controller.surfaceSwitchingCallback = async () => {
  await new Promise(r => referenceTrack.onconfigurationchange = r);
  worker.postMessage({paused: referenceTrack.getSettings()});
  await new Promise(r => worker.onmessage = r);
});

An upstream frame delay callback untangled from tracks. KISS

@youennf
Copy link

youennf commented Nov 19, 2024

The configurationchange event is also subject to "fuzzing"

The fuzzing note is related to device label change, which applies to all capture sessions using that device.
In our case, the screenshare session (and so the surface change) is owned by a single page, there should be no fuzzing.
Maybe we should beef up a bit the note to better guide implementors.

Updating the settings before the callback makes sense to me, and the only way currently is for configurationchange to fire first.
If a callback is really desired, a compromise would be the following:

  • enqueue a task on each track to fire configurationchange on each track task queue with MediaStreamTrack task source
  • enqueue a task to execute the callback on capture controller task queue with MediaStreamTrack task source

I don't think the added delay of executing a few tasks will change much in practice (OS may take some time to switch capture, and web app wants user input which may take some time anyway).

Events normally don't carry the second guarantee.

There are plenty of events that carry a window of opportunity, click events to trigger getDisplayMedia, ondatachannel for transferring a data channel, ontrack for applying a transform to a receiver, this is a known pattern.

As of the second guarantee, it is only useful for asynchronous pausing.
AIUI, there is no real use case for that, and I am really hesitant to expose a persistent no-video-frame state that is not muted.
If we want asynchronous source pausing, I would piggy back on the mute concept, especially since we can unmute with MediaSession. Something like executing the callback and then firing the mute events on tracks.

Please also note that the use cases are all about pausing the sending of video frames to specific track sinks, not to pause the source. Conceptually, a callback on CaptureController (which represents the source) is misaligned with the presented use cases.

@tovepet
Copy link
Author

tovepet commented Dec 2, 2024

Updating the settings before the callback makes sense to me, and the only way currently is for configurationchange to fire first. If a callback is really desired, a compromise would be the following:

  • enqueue a task on each track to fire configurationchange on each track task queue with MediaStreamTrack task source
  • enqueue a task to execute the callback on capture controller task queue with MediaStreamTrack task source

My concern with this ordering is how applications know that the user is switching surfaces at the time of the configurationchange event. Do you have a mechanism in mind that you think we can agree on?

Otherwise I think we should reverse the order of these tasks.

@youennf
Copy link

youennf commented Dec 5, 2024

Do you have a mechanism in mind that you think we can agree on?

The idea is to reuse deviceId, see w3c/mediacapture-screen-share#308.
Plan is to discuss it at next week WG meeting.

@jan-ivar
Copy link
Member

jan-ivar commented Dec 9, 2024

The fuzzing note is related to device label change, ...

A track's label is neither dynamic nor part of the "track's capabilities and settings" .

AFAIK this event exists for downstream sinks to adapt their consumption of the track to its evolved properties (e.g. new dimensions or skip applying blur).

The event won't even fire when users switch capture between tabs in the same window, because the configuration (e.g. resolution) will stay the same (w3c/mediacapture-screen-share#308 might change this).

Nice separation of concerns or foot-gun? YMMV. I'm no fan of events that require websites to decode their cause, because some inevitably forget to anticipate all causes.

... the screenshare session (and so the surface change) is owned by a single page, there should be no fuzzing.

Agreed. Just unfortunate for the configurationchange event to have such different timing semantics per source category.

Updating the settings before the callback makes sense to me, ...

Can you elaborate on why that makes sense to you? A singular upstream switch might cause a fan-out reaction of events on track clones potentially across multiple workers and main thread here.

Seems simpler for websites to keep track of what's going on (no pun intended) if they're informed of the cause first. Doubly so if there are no events (no net configuration change).

@jan-ivar
Copy link
Member

jan-ivar commented Dec 9, 2024

... the screenshare session (and so the surface change) is owned by a single page, there should be no fuzzing.

Fuzzing might be needed if two websites capture the same window (or different tabs of the same browser window) at the same time. Otherwise the user might be correlated across origins when they resize the window.

If this fuzzing is significant, it might even interfere with firing the same event from a source switch.

@jan-ivar
Copy link
Member

If a callback is really desired, a compromise would be the following:

  • enqueue a task on each track to fire configurationchange on each track task queue with MediaStreamTrack task source

  • enqueue a task to execute the callback on capture controller task queue with MediaStreamTrack task source

I don't think the added delay of executing a few tasks will change much in practice

Are you merely proposing an order in which tasks are queued? If so, what "added delay of executing" are you referring to?

As of the second guarantee, it is only useful for asynchronous pausing.

My understanding is both guarantees are needed to prevent frames resuming in the background media thread here:

let {deviceId} = track.getSettings();
track.onconfigurationchange = e => {
  if (deviceId != track.getSettings().deviceId) {
    deviceId = track.getSettings().deviceId;
    track.enabled = false;
  }
}

IOW, the UA MUST take JS state on callback return into account when deciding to resume frames in the background thread. I.e. it must block frames on main thread.

... I am really hesitant to expose a persistent no-video-frame state that is not muted.

It's not a state as far as the web developer is concerned, merely a delay function. At least no more than a transform is a state.

Please also note that the use cases are all about pausing the sending of video frames to specific track sinks, not to pause the source. Conceptually, a callback on CaptureController (which represents the source) is misaligned with the presented use cases.

The cause is at the source. Conceptually, it seems simpler for web developers to gate frames from a switch at the source.

@youennf
Copy link

youennf commented Dec 10, 2024

it must block frames on main thread.

Sure, the question is about blocking for more than the current task, i.e. during a time that the web app is controlling.

It's not a state as far as the web developer is concerned, merely a delay function.

It is a state if we introduce asynchronous pausing controlled by the web page.
This seems redundant with track.enabled.

Fuzzing might be needed if two websites capture the same window (or different tabs of the same browser window) at the same time. Otherwise the user might be correlated across origins when they resize the window.

In this particular case, user decided to switch one of the source, not the other. So the configurationchange event will only be received by one of the page. No fuzzing needed at all.

In general though, the fuzzing is not protecting the user since the two windows can easily correlate the user with the media content itself. This is the big difference with fuzzing devicechange for instance.
I think we added the note to be extra careful, we should probably think of removing it.

Updating the settings before the callback makes sense to me, ...

Can you elaborate on why that makes sense to you?

Having the old settings is a potential footgun and would probably come as a surprise to web developers.
It would for instance be nice to know that the capture is switching from a window to a screen, the web app might want to pause the sending of the video and display some warning to the user in this case.

@jan-ivar
Copy link
Member

It is a state if we introduce asynchronous pausing controlled by the web page. This seems redundant with track.enabled.

How so when track.enabled = false emits black frames locally and remotely?

@dontcallmedom-bot
Copy link

This issue had an associated resolution in WebRTC December 2024 meeting – 10 December 2024 (Back to Auto-pause):

RESOLUTION: proceed with configurationchange event as the way to manage surface switching handling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants