Multi-capture (concurrent capture of multiple surfaces) #8
(A less complex option using a new method that always returns a sequence is also possible. Straw-man suggestion only. The main point is that there be an API for prompting the user for a set of surfaces.)
Love this idea. I developed a native Mac/Windows app called Screegle https://www.appblit.com/screegle that allows a user to select one or more windows during a screen sharing session, and overlay them over a background picture of their choosing. It would be fantastic if web applications could provide a similar functionality through As you point out, My current workaround is ugly and very inefficient: first the user is asked to share their entire screen by clicking "Share Screen", which calls Users then can pick individual windows by clicking "Share Window", which calls again The really slow part is determining where each window appears in the overall screen image. https://www.appblit.com/static/screegle/screegle-sdk-demo.html It would be great if the browser had 2 new things:
I'm attaching a video demonstration showing what this current prototype does: getdisplaymedia-issue-screegle.mp4
Thanks!
That's a great demo. I'd be very interested in (orthogonally) adding an API for exposing these coordinates. I think you'll also want the z-order, btw...?
@eladalon1983 Yes. The native versions of Screegle for Mac and Windows poll the OS for window information. On macOS, in Swift or Objective-C, the function is On Windows, Screegle uses ElectronJS and relies on https://github.com/sentialx/node-window-manager/blob/master/src/classes/window.ts#L22 to obtain window information, matching the windowID to the DesktopCapturer.getSources https://www.electronjs.org/docs/latest/api/desktop-capturer which conveniently uses Perhaps the easiest way would be to extend the existing Or allow a web application that could hold several streamIDs a call on the The web application would be able to call this API at any time.
I think this idea presumes too much about application logic, which is seeping into browser UX here:
Picking, or more broadly managing, multiple things is a problem best dealt with in the context of an application IMHO, and the problems above are ones that picking one thing at a time in the context of the application doesn't have. I think hyper-focusing on the initial picking rather than on management skews the value-add of a monolithic picker like this. It's not going to save the application from needing to design a place where the user can manage the multiple choices made, but it might lead some applications to think they can skip that by instead calling this picker again, expecting the user to check the boxes over and over rather than letting them edit an existing choice (which my fiddle above allows, btw). I think picking multiple things outside the context of an application isn't very webby; it's a bit of an anti-pattern on the web.
My fiddle has thumbnails. Also, as mentioned at the meeting, browsers could highlight already-captured choices in the UX today without a spec change, if they think this information is useful. This wouldn't rely on the user making all choices at once.
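The app-managed alternative argued for here (one surface per user click, with the choices kept in an editable list inside the application) might look roughly like the following. This is a minimal sketch: CaptureList, its maxSurfaces cap, and the injected mediaDevices parameter are hypothetical names for illustration; the only real APIs used are getDisplayMedia(), MediaStream.getTracks() and MediaStreamTrack.stop(). Each add() call would still need to run from a user gesture.

```javascript
// Hypothetical app-side manager: one getDisplayMedia() prompt per "Add" click,
// with the list of chosen surfaces kept (and editable) in the app's own UI.
class CaptureList {
  constructor(mediaDevices, maxSurfaces = 3) {
    this.mediaDevices = mediaDevices; // normally navigator.mediaDevices
    this.maxSurfaces = maxSurfaces;   // app-chosen cap, e.g. for performance
    this.streams = [];
  }

  // Call from a click handler so the prompt has transient activation.
  async add(options = { video: true }) {
    if (this.streams.length >= this.maxSurfaces) {
      throw new Error(`At most ${this.maxSurfaces} surfaces may be captured`);
    }
    const stream = await this.mediaDevices.getDisplayMedia(options);
    this.streams.push(stream);
    // Drop the entry if the user stops sharing from the browser's own UI.
    for (const track of stream.getTracks()) {
      track.addEventListener("ended", () => this.remove(stream));
    }
    return stream;
  }

  // Stop and forget one capture, leaving the user's other choices untouched.
  // track.stop() does not fire "ended", so this cannot recurse.
  remove(stream) {
    const i = this.streams.indexOf(stream);
    if (i === -1) return false;
    this.streams.splice(i, 1);
    for (const track of stream.getTracks()) track.stop();
    return true;
  }
}
```

Editing a choice then becomes remove() followed by another add(), rather than re-running a multi-select picker from scratch.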
Vendors that wish to should be able to experiment with prompt-bundling today by detecting multiple invocations of getDisplayMedia on the same JS task, e.g.:
const [choice1, choice2, choice3] = await Promise.all([
navigator.mediaDevices.getDisplayMedia(),
navigator.mediaDevices.getDisplayMedia(),
navigator.mediaDevices.getDisplayMedia(),
]);
They could satisfy such simultaneous requests using a unified picker with checkboxes. This would be backwards compatible with other browsers, where users would see 3 prompts one after the other (or fewer if the user cancels).
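With Promise.all, a single cancelled prompt rejects the whole batch and discards the surfaces the user did pick. A minimal sketch of a more forgiving variant, assuming the application would rather keep whatever the user chose: requestSurfaces is a hypothetical helper, and whether the same-task calls are bundled into one picker, shown as separate prompts, or rejected remains up to the browser.

```javascript
// Request up to `count` surfaces in the same task, keeping any the user grants.
async function requestSurfaces(mediaDevices, count, options = { video: true }) {
  const results = await Promise.allSettled(
    Array.from({ length: count }, () => mediaDevices.getDisplayMedia(options)),
  );
  // Cancelled prompts (typically a NotAllowedError rejection) are skipped;
  // the order of the granted streams follows the order of the calls.
  return results
    .filter((r) => r.status === "fulfilled")
    .map((r) => r.value);
}
```

Usage would be `const streams = await requestSurfaces(navigator.mediaDevices, 3);` inside a click handler.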
Great idea! At Tella we've had multiple users ask for the ability to record multiple windows at the same time, without sharing their full screen. We haven't implemented this yet, partly because, as you said in the original post, the UX is currently not ideal for users. We would also want a way to make sure they don't select too many streams (like maxSurfaces in your example), since recording many streams has a performance impact. Partly related, it would be great if we could specify that they can only capture windows, but I know there's already a discussion about that here.

To summarize: I like the idea of allowing selection of multiple windows/tabs/screens, and I think it will improve the UX of the screen picker and make it nicer to implement recording/streaming apps.

Edit: One more advantage I can see over prompting multiple times: we don't know at the start how many streams they want to share. "Add another stream/window" is something that could be added in our own UI, but it could also be more confusing to the user than handling it in context, in the screen picker.
First, if we think it's important to support order, it's trivial to specify that. Sequences are ordered. (And let UX worry about communicating it to the user.) Second - see my response to bullet number 3.
The "UX presented" was a mock illustrating what is generally possible. Don't worry, when it's time to ship, we'll have something much more refined. What matters for the W3C is that the API will specify that the user must be allowed to control whether audio is shared, and the question of whether that control should be per-surface or global. Let's focus on that.
For many applications, order doesn't matter and there are no roles.
I have technical answers to that (cloning, app-based UX, Capture Handle, etc.). But I think it would be a mistake to start that discussion at this time, as I believe we have run into a severe methodological issue. Please see my next comment.
My previous comment dealt with the technical details raised in your previous comment. I'm posting a separate comment here to address what I see as a severe methodological issue, which has played itself out with minor variations over multiple proposals during the past year. I have presented a set of use-cases for which there is genuine Web-developer interest and need (some examples already in the thread, and possibly more to come). I have presented a general approach to address these use-cases, which yields an improvement over existing mechanisms (getDisplayMedia). That is, I am offering incremental advancement of the Web platform, the explicit purpose of the W3C. Let's examine your response, both in this thread and during the interim meeting:
This is not conducive to progress. I hope that we can address this, so that we may be more productive over the coming year.
We're proceeding in the WICG for the time being. (https://github.com/WICG/multicapture) [Edit, 2022-11-10: When I said "W3C", I meant "WebRTC WG".]
Ironically, it's a deleted comment that's making me change my mind here (they likely realized the privacy issue that enumerating all the user's tabs would be, so kudos and my apologies for bringing it up again). But with my chair-hat on: it reminded me that expanding on the capabilities of in-browser pickers is actually in keeping with our desire and efforts to move away from enumeration in related mediacapture specs, and should therefore be encouraged.
I'd like that as well, as a proliferation of competing APIs seems counterproductive. My concerns with the API, as well as the lack of implementer interest (at this point), remain, but I'd be happy to keep discussing those here (with my chair-hat off).
I like proposal 2.
You raise an interesting topic: the appropriateness of the W3C as a spec-hosting venue for specs which only a single browser engine intends to implement. We should discuss this question. Namely, would it not make more sense for the discussion (about a particular spec) to proceed in the WICG until more vendors are convinced and wish to implement it or a variation of it, at which point the spec can migrate to the W3C? [Edit, 2022-11-10: When I said "W3C", I meant "WebRTC WG".]
How can multiple capture be achieved?
It has come to my attention that some applications wish to capture multiple display surfaces at the same time. Some examples include:
Capturing multiple display surfaces is presently achievable using existing APIs: it is possible to call getDisplayMedia() multiple times. However, this is not very ergonomic, and creates serious friction for the user:
Ideally, a single transient activation could be used for a single API invocation, providing the user with a media-picker with functionality akin to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would be allowed to choose all of the display surfaces that they want to capture, then click OK once. It is clear from context that these are all of the surfaces the user was aiming to capture, and that no additional API calls to gDM or the like are necessary.
As a straw-man proposal, imagine getDisplayMedia({video: true, ..., maxSurfaces: N}). The default value of maxSurfaces is 1, which would trigger the current behavior, returning a single MediaStream. A higher value would trigger the new behavior and return an array, [MediaStream].
Finer points off the bat:
If a maxSurfaces value greater than 1 is specified, an array will be returned even if the user chooses one surface, to simplify things for the application.
Interesting points to discuss:
CC @shangl, whose use-case prompted this.
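Under the maxSurfaces straw-man above, callers would deal with two return shapes depending on the value passed. A minimal sketch of how an application might request several surfaces and always work with an array, assuming the hypothetical maxSurfaces dictionary member from the proposal (it is not part of any shipped API; a browser that does not implement it should ignore the unknown member and return a single MediaStream). captureSurfaces is a hypothetical helper name.

```javascript
// Request up to `maxSurfaces` surfaces via the straw-man option and always
// hand the caller an array of MediaStreams.
async function captureSurfaces(mediaDevices, maxSurfaces) {
  const result = await mediaDevices.getDisplayMedia({
    video: true,
    maxSurfaces, // hypothetical member from the straw-man proposal
  });
  // Straw-man browsers return [MediaStream] when maxSurfaces > 1; today's
  // browsers return a single MediaStream, which is wrapped here.
  return Array.isArray(result) ? result : [result];
}
```

Because the straw-man already returns an array whenever maxSurfaces > 1, the wrapping only matters for browsers that don't support the option.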
--
[*] Imagine an instructor streaming multiple tabs, and individual viewers independently choosing which one to focus on. I mention this so as to discourage solutions that stitch multiple surfaces together into a single logical surface.