

Delete entire site data on data corruption/loss? #431

Open
abhishek-shanthkumar opened this issue Oct 15, 2024 · 5 comments

Comments

@abhishek-shanthkumar
Contributor

abhishek-shanthkumar commented Oct 15, 2024

IndexedDB data stored on disk may get corrupted or partially lost due to several reasons including user action. Attempts by the site to read such data fail persistently, and these failures are surfaced differently by each browser.
Issue #423 specified the DOMException type for one such scenario, but this has an impact only if developers update their sites to handle this specific error appropriately.
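As a hedged sketch of what such site-side handling could look like (the specific DOMException name is an assumption for illustration here, not necessarily what #423 settled on):

```javascript
// Sketch only: classify a failed read as likely corruption. The DOMException
// name used for corruption ("UnknownError" here) is an assumption.
function isLikelyCorruptionError(err) {
  return err instanceof DOMException && err.name === "UnknownError";
}

// Example wiring inside a read path (browser-only, illustrative):
// request.onerror = (event) => {
//   if (isLikelyCorruptionError(event.target.error)) {
//     // Site-specific recovery: rebuild caches, re-sync from server, etc.
//   }
// };
```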

Can we do more to ensure that the "right thing" happens in these scenarios?

We should note that:

  • Sites may store relational data across object stores and/or databases.
  • Sites may even store metadata about IndexedDB data in other storage such as LocalStorage.
  • The impact of and recoverability potential from partial data corruption/loss varies across sites based on the nature of data stored.

Considering the above, it is best left to the individual sites to handle these scenarios as suits their specific usage patterns. Discussions and efforts have been ongoing to surface sufficiently detailed errors to the sites so that they are equipped to handle them appropriately (#423, whatwg #75).

However, a majority of sites may not handle these errors at all, in which case reads of the affected data will persistently fail. Should we attempt to mitigate these errors by deleting the entire site data (perhaps limited to the containing storage bucket) if we get a strong signal that the site does not handle these errors?

cc @asutherland, @evanstade, @inexorabletash

@abhishek-shanthkumar
Contributor Author

One wrinkle I see in adopting this approach is the challenge of differentiating between sites that currently don't handle these errors and sites that don't intend to handle these errors. I presume that we don't want to tell sites to "handle these errors by so-and-so milestone after which we'll wipe all your data if this error occurs".

@asutherland
Collaborator

Also related is the management section of the storage spec which says:

> Whenever a storage bucket is cleared by the user agent, it must be cleared in its entirety. User agents should avoid clearing storage buckets while script that is able to access them is running, unless instructed otherwise by the user.

Quoting #431 (comment)

> One wrinkle I see in adopting this approach is the challenge of differentiating between sites that currently don't handle these errors and sites that don't intend to handle these errors. I presume that we don't want to tell sites to "handle these errors by so-and-so milestone after which we'll wipe all your data if this error occurs".

From my perspective, the dominant concern is site breakage. It's hard for the browser to tell whether a site is broken, so if we experience corruption and the site has not actively opted in to handling it itself, it's safest to assume the site is broken. The only way to provide some kind of "deal with this in the future if you want" would be to back up the origin into a magic bucket, but this creates new privacy complications.

@asutherland
Collaborator

Quoting #431 (comment)

> Considering the above, it is best left to the individual sites to handle these scenarios as suits their specific usage patterns. Discussions and efforts have been ongoing to surface sufficiently detailed errors to the sites so that they are equipped to handle them appropriately (#423, whatwg #75).

I don't really expect that sites can meaningfully do much more than could be done automatically if the site was using multiple storage buckets. But I do believe there are sites that will want to handle this and it seems reasonable to provide an opt-in affordance.

One thing I should note is that in #423 (comment) I proposed that calling preventDefault() on an error reporting corruption could prevent the default behavior of wiping the containing storage bucket, since this is a straightforward use of the IDB event model. This is, of course, potentially at odds with promise ergonomics: it's easy to go async in a way that depends on a new task being scheduled rather than everything resolving in the microtask checkpoint for the event dispatch. I believe there is some related discussion in the proposal of observables in WICG/observable#170 (in particular involving preventDefault()).

If we wanted to ensure promises could go fully async, we'd want something along the lines of ServiceWorker's FunctionalEvent.waitUntil, so that it remains valid to call preventDefault() (stalling the IDB transaction) until the passed-in promises have all resolved or one has rejected. Semantically, a rejection there is potentially confusing if one thinks about it too much, but from a spec perspective it would map equivalently to an exception being thrown by the event handler.
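A rough sketch of what that opt-in could look like from the page's side; note that a waitUntil() on IDB error events is entirely hypothetical here, nothing in this snippet is a shipped API:

```javascript
// Hypothetical opt-in handler: preventDefault() suppresses the proposed
// default of wiping the containing storage bucket, and a hypothetical
// waitUntil() (modeled on ServiceWorker's ExtendableEvent) keeps the
// transaction stalled until the site's async recovery settles.
function handleCorruption(event, recover) {
  event.preventDefault();     // opt out of the proposed default bucket wipe
  event.waitUntil(recover()); // keep the transaction alive while recovering
}
```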

@TonnyWildeman

I seem to be missing the simple option of a single call that completely nukes the current IndexedDB.
Not iterating over indexedDB.databases(), but simply a call to e.g. indexedDB.destroyDatabases().
This call would completely destroy all corrupted IndexedDB files on the physical filesystem.

In my case, I actually needed to manually destroy IndexedDB in DevTools. And others needed to clear the cache on their mobile device. I couldn't even iterate using indexedDB.databases().
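For reference, the userland equivalent with existing APIs looks roughly like the sketch below. It only helps when enumeration itself still works, which was exactly the failure mode described above, and indexedDB.databases() support varies by browser; the function name is made up here.

```javascript
// Sketch: delete every database the origin can enumerate. The injectable
// `idb` parameter is only for illustration/testing; in a page you would
// call deleteAllDatabases() and let it default to the global indexedDB.
async function deleteAllDatabases(idb = indexedDB) {
  const dbs = await idb.databases();
  await Promise.all(
    dbs.map(
      ({ name }) =>
        new Promise((resolve, reject) => {
          const req = idb.deleteDatabase(name);
          req.onsuccess = () => resolve(name);
          req.onerror = () => reject(req.error);
          // "blocked" fires while other tabs hold open connections; deletion
          // proceeds once they close, so it is not treated as failure here.
          req.onblocked = () => {};
        })
    )
  );
  return dbs.map((d) => d.name);
}
```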

@inexorabletash
Member

Using the Clear-Site-Data header is an option - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Clear-Site-Data
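For reference, the server-side opt-in is a single response header; the `"storage"` directive clears DOM-accessible storage for the origin, including IndexedDB:

```http
Clear-Site-Data: "storage"
```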

Note that adding a new API (like a way to destroy all databases) to work around browser implementation bugs (like iteration or deletion failing) is not a pattern we like to follow.
