Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Download API specification #1394

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add Download API specification #1394

wants to merge 7 commits into from

Conversation

jbygdell
Copy link
Collaborator

@jbygdell jbygdell commented Feb 15, 2025

Related issue(s) and PR(s)

This PR closes #1378 .

Description

The /file/{fileId}: is optional since the same functionality exists in the /file endpoint. Although it provides a more convenient way to get a file out.

Endpoints

/file - endpoint that takes query arguments to return a file by ID or path, the dataset name is a required parameter.
/file/{file_stable_id} - endpoint that returns the file withe the corresponding stable_id

/info/datasets - returns a list of datasets the user has access to
/info/datasets/{dataset_name} - returns information about the requested dataset
/info/datasets/{dataset_name}/files - returns information about the files in the dataset

How to test

Paste the contents of the yaml file into: https://editor-next.swagger.io/ for a nicer view

@jbygdell jbygdell requested a review from a team February 15, 2025 08:19
@jbygdell jbygdell self-assigned this Feb 15, 2025
@jbygdell jbygdell force-pushed the feature/download_svc branch from 3ad0261 to 7daa6eb Compare February 15, 2025 08:41
@jbygdell jbygdell force-pushed the feature/download_svc branch from 7daa6eb to e02b66f Compare February 15, 2025 08:44
@jbygdell jbygdell force-pushed the feature/download_svc branch from cf3c27d to b2398a3 Compare February 15, 2025 09:22
@jbygdell jbygdell changed the title Add API specification Add Download API specification Feb 15, 2025
type: integer
responses:
"200":
content:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
content:
headers:
'Content-Disposition: attachment; filename="filePath"'
content:

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, if you request a file by stable ID that should be the filename in the header.

description: Internal application error
security:
- bearerAuth: []
/file/{fileId}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/file/{fileId}:
/file/{datasetId}/{fileId}:

? Or should it explicitly be possible to fetch a file when knowing the fileId?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File stable ID is a unique entity and since the authorization step checks whether or not the user has permission to access the file or not this should be ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it should be okay from a security standpoint and normally I'm all for not being lazy if it helps down the line, but here I'm a bit uncertain if there is a benefit in supporting this worth the added complexity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the datasetID would make the authorization step easier/faster but then again is it really needed?
One problem that might arrive here is if the dataset ID is an URL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we agree that it's not strictly needed, but the flow gets easier (should not be overly complex but compared to other things in download it's a lot). We should maybe mull on it for a while.

(Values for datasetID can be managed through encoding.)

Copy link
Contributor

@MalinAhlberg MalinAhlberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! Added some comments.

description: Internal application error
security:
- bearerAuth: []
/ready:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not /health?

router.GET("/health", healthResponse)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

health or ready serve the same purpose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I was just wondering if it would make sense to keep the same endpoint name 🙃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no special reason, I agree with @MalinAhlberg that it is better to keep the same endpoint name.

@jbygdell jbygdell force-pushed the feature/download_svc branch from f2f809e to 83e67cd Compare February 18, 2025 14:45
pahatz
pahatz previously approved these changes Feb 20, 2025
Copy link
Contributor

@pahatz pahatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@MalinAhlberg
Copy link
Contributor

In general, I think this looks very good!
I'm a bit confused about the start/end parameters vs the Range header, though. Are they to be used in exactly the same way? Ie is this

"End of requested data in the form of the 0-index of the first octet not included in the response. That is, data delivered will have octet indices in the closed-open set [start,end). Defaults to end of file if not set"

true for the end of the range to? And exactly how should the range be formatted? "Range: bytes=20-200"?

@jbygdell
Copy link
Collaborator Author

In general, I think this looks very good! I'm a bit confused about the start/end parameters vs the Range header, though. Are they to be used in exactly the same way? Ie is this

"End of requested data in the form of the 0-index of the first octet not included in the response. That is, data delivered will have octet indices in the closed-open set [start,end). Defaults to end of file if not set"

true for the end of the range to? And exactly how should the range be formatted? "Range: bytes=20-200"?

Range headers has to adhere to the specification like what you wrote above. But also this is correct "Range: bytes=20-200, 30-400"

The query parameters on the other hand you can supply one or both, hence the logic about inferring the missing one.

Co-authored-by: Malin Klang <[email protected]>
Co-authored-by: Panos Chatzopoulos <[email protected]>
openapi: 3.0.3
info:
title: SDA file download API
version: "1.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is premature, would it be better to use version 0.x?

Copy link
Contributor

@MalinAhlberg MalinAhlberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

description: Returns the requested file to the user
parameters:
- in: header
description: "Public key used to re-encrypt the file (header) with. This should be supplied as the base64 encoding of the PEM (RFC 7468) encoding of the public key"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description Public key used to re-encrypt the file (header) with. is ambiguous, unclear whether the public key is used to encrypt just the file header or the entire file.

Suggested change
description: "Public key used to re-encrypt the file (header) with. This should be supplied as the base64 encoding of the PEM (RFC 7468) encoding of the public key"
description: "Public key used to re-encrypt the file header server-side before transmission. This should be supplied as the base64 encoding of the PEM (RFC 7468) encoding of the public key"

schema:
type: string
required: false
description: Stable ID of the file to download. It is an error to supply both filePath and fileId
Copy link
Contributor

@nanjiangshu nanjiangshu Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to specify which error code to return when both filePath and fileId are specified. e.g. 400
Or maybe decide it later?

format: binary
description: Successful partial file delivery
"400":
description: Bad request i.e start larger than end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is start lager than end the only condition for an 400 error? If not, better to use e.g. instead of i.e..

If yes, change i.e -> i.e.

"403":
description: The user does not have access to the dataset
"416":
description: Unsatisfiable range i.e. end larger than file size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar in this case, use e.g. if end larger than file size is not the only condition for 416.

type: string
format: date-time
example: "2025-02-14T14:51:26.639Z"
files:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files represents number of files, right? Then maybe to use a clearer name, such as fileCount.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simple Download API specification
5 participants