Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file-storage: List or remove multiple files with a common prefix #32

Open
tats-u opened this issue Nov 25, 2024 · 12 comments
Open

file-storage: List or remove multiple files with a common prefix #32

tats-u opened this issue Nov 25, 2024 · 12 comments

Comments

@tats-u
Copy link

tats-u commented Nov 25, 2024

Once you accept multiple files at the same time as a draft, you can't clean up them without a database support in a draft preview page. You may want to clean up all files in stale drafts, too.
Common file systems can list or remove all files in a common group by ls -R or rm -rf. Boto3 can list_object with a prefix up to 1000 at once. However, the current file-storage can't do it. You have to find out every name of the file you want to delete first before actually doing it.

@mjackson
Copy link
Owner

I've considered adding a keys() method to the FileStorage interface. This would allow you to scan through all keys and find the ones you want to delete.

@sergiodxa
Copy link
Contributor

For my R2 implementation of FileStorage and ended up extending it to add this keys method https://github.com/edgefirst-dev/kit/blob/44ee2a128e029bf7a5f0aad52cca47f696792a44/src/lib/fs/fs.ts#L61-L66

I also thought that instead of returning that promise, it could return an AsyncIterator, I ended up using a Promise instead because usually you just get one page at the time, and not the whole list.

@tats-u
Copy link
Author

tats-u commented Dec 13, 2024

AsyncIterator<string[]>?
I think (almost) all types of storages can fetch the list of several tens, hundreds, or thousands of files at once.
You can use the flatMap or convert it to an AsyncIterator<string>.

@mjackson
Copy link
Owner

AsyncIterator<string[]> seems like a natural interface for paging through keys, yes. I'd be open to a PR that adds this functionality.

@sergiodxa
Copy link
Contributor

sergiodxa commented Dec 16, 2024

You need to return not only the keys, but a cursor to let you start pagination from where you left.

Unless you're showing all keys at once to users you don't want to list all of them at once and you will most likely need to resume pagination from where you left off.

CF R2 gives you this cursor to let you keep asking for more keys.

@mjackson
Copy link
Owner

mjackson commented Dec 17, 2024

Would this API be sufficient, @sergiodxa ?

interface FileStorage {
  keys(cursor?: string): Promise<{ cursor?: string; keys: string[] }>
}

In the return value, if you have a cursor that means you have more to keys to read and you should call keys again. If you don't, you're done.

A function that reads all keys could look like:

async function readKeys(storage: FileStorage): Promise<string[]> {
  let keys: string[] = []

  let result = await storage.keys();

  keys.push(...result.keys);

  while (result.cursor != null) {
    result = await storage.keys(result.cursor);
    keys.push(...result.keys);
  }

  return keys
}

@sergiodxa
Copy link
Contributor

Yes, that's similar to what I did myself, I also sent a done boolean so you know there are not more things to fetch, which could be implicit if cursor is not sent too.

@tats-u
Copy link
Author

tats-u commented Dec 17, 2024

I wonder what type the cursor should be in storages whose structures are completely different from S3. Could it be number, bigint, or structure types, or should its type be a type parameter? The exposure of the cursor is not necessary for users, especially those of MemoryStorage and LocalFileStorage.

You will be able to use map or something similar with AsyncIterator in the future:

https://github.com/tc39/proposal-async-iterator-helpers

Lazy users can use await Array.fromAsync for AsyncIterator even now.

The suggested API forces users to use the procedural manner (the while statement).

Do MemoryStorage and LocalFileStorage prefer (Async)Iterator or AsyncIterator<Iterator> the most?

@mjackson
Copy link
Owner

I wonder what type the cursor should be ... should its type be a type parameter?

Probably. I'd imagine string would be the default though.

interface FileStorage {
  keys<T = string>(cursor?: T): Promise<{ cursor?: T; keys: string[] }>
}

The exposure of the cursor is not necessary for users

I disagree. Imagine showing a web page with one page of keys, and a "Next page" button that will show the next page. In that case, users will need to have a cursor they can use to pass back in on the request for the next page of keys.

@tats-u
Copy link
Author

tats-u commented Dec 18, 2024

Imagine showing a web page with one page of keys, and a "Next page" button that will show the next page.

I see. I think we might as well provide both kinds.

@sergiodxa
Copy link
Contributor

If we want to keep the AsyncIterator interface there's an option to add a Symbol.asyncIterator property, that way the instance itself would be iterable.

let fs = new MyFileStorage()
for await (let key of fs) {
  // use key here
}

So if you want to iterate the whole list you can use that instead, and if you need a paginated list of keys you can use FileStorage#keys.

@tats-u
Copy link
Author

tats-u commented Dec 18, 2024

there's an option to add a Symbol.asyncIterator property, that way the instance itself would be iterable.

There are some following cons, so it is not a good idea to force implementers to implement it, or to serve only the property without an additional equivalent function at least.

  • cannot pass any additional options, e.g. prefix and # of entries in one fetch
  • induces concrete FileStorage implementers to make their FileStorage itself a state instead of returning a newly created async generator

Naming suggestions:

  • AsyncIterator → .iterateKeys()
  • Object with cursor → .partialKeys() or some better names that can reduce misunderstanding (this function could never be paused) by users

AsyncIterator seems to be able to be calculated by objects with cursor, so now I think implementers have only to implement the function returning one with cursor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants