add query to remove user ids from filepaths #1259
base: main
INSERT INTO sda.dbschema_version VALUES(sourcever+1, now(), changes);

UPDATE sda.files
SET submission_file_path = regexp_replace(submission_file_path, '^[^/]*/', '');
This solution is a bit harsh, since it will remove the first part of the path even if it is not a username.
- SET submission_file_path = regexp_replace(submission_file_path, '^[^/]*/', '');
+ SET submission_file_path = REPLACE(submission_file_path, CONCAT(REPLACE(submission_user, '@', '_'),'/'), '');
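To make the difference between the two expressions concrete, here is a minimal Python sketch that mirrors them outside the database. The function names and sample values are hypothetical and only illustrate the behaviour; this is not the migration itself.

```python
import re


def strip_first_segment(path: str) -> str:
    """Mirror regexp_replace(path, '^[^/]*/', ''): drop everything up to the first '/'."""
    return re.sub(r"^[^/]*/", "", path, count=1)


def strip_user_prefix(path: str, user: str) -> str:
    """Mirror REPLACE(path, CONCAT(REPLACE(user, '@', '_'), '/'), '')."""
    prefix = user.replace("@", "_") + "/"
    # Like SQL REPLACE, str.replace removes every occurrence, not only a leading one.
    return path.replace(prefix, "")


# Hypothetical sample values, not taken from any real database.
user = "alice@example.org"
with_prefix = "alice_example.org/dataset/file.c4gh"
without_prefix = "dataset/file.c4gh"

print(strip_first_segment(with_prefix))      # user directory removed
print(strip_first_segment(without_prefix))   # "dataset/" is removed although it is no username
print(strip_user_prefix(with_prefix, user))  # user directory removed
print(strip_user_prefix(without_prefix, user))  # path left untouched
```

The regex variant shortens any path that contains a slash, while the username-based variant only touches paths that actually contain the user's directory.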
The reason I went this way is that we have some old entries in BP that are missing the @ domain suffix in users and the corresponding _domain suffix in file path.
But the following seems to work for those cases too:
UPDATE sda.files
SET submission_file_path = CASE
WHEN LENGTH(submission_file_path) = 40 THEN regexp_replace(submission_file_path, '[@_]/^[^/]', '')
ELSE REPLACE(submission_file_path, CONCAT(REPLACE(submission_user, '@', '_'),'/'), '')
END;
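As a possible alternative for discussion, the sketch below removes the user directory only when it is the leading path segment. It assumes, as described above, that legacy rows store the same shortened name in both submission_user and the file path; the function name and sample values are hypothetical, and this is not the query from this PR.

```python
def remove_leading_user_dir(path: str, user: str) -> str:
    """Drop the user's directory only when it is the leading path segment."""
    # For legacy users without '@' the replace is a no-op, so the prefix is just "user/".
    prefix = user.replace("@", "_") + "/"
    # Only strip at the start, so a matching segment deeper in the path survives.
    return path[len(prefix):] if path.startswith(prefix) else path


# Hypothetical sample values.
current = remove_leading_user_dir("alice_example.org/data/f.c4gh", "alice@example.org")
legacy = remove_leading_user_dir("bob/data/f.c4gh", "bob")
nested = remove_leading_user_dir("bob/backup/bob/f.c4gh", "bob")
clean = remove_leading_user_dir("data/f.c4gh", "bob")
print(current, legacy, nested, clean)
```

Anchoring at the start avoids removing a directory deeper in the path that happens to share the user's name, which a plain REPLACE would also strip.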
I think this maybe shouldn't even be a .sql file but rather a README, so the DB admin knows that it needs to be done, because this is not something that should be allowed to happen automatically.
I agree, let's discuss this on a daily.
Co-authored-by: Joakim Bygdell <[email protected]>
Just a note, #1161 was merged into |
Related issue(s) and PR(s)
This PR closes #1123.
Description
Previous work from #1097 removed the user ids from the filepaths. This SQL query should now update the existing databases so that the user ids are removed from the filepaths there as well.
How to test
It has been tested manually with both the test environment and production data.
1. Bring up the test env: make-sda-s3-up
2. Fill 4 test entries in the database.
3. Run the script.
4. Check that the data is right: the user ids should be gone from the submission_file_path.
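The manual check can be rehearsed without a running stack. The following sketch uses an in-memory SQLite table as a simplified stand-in for sda.files, fills four hypothetical entries, and applies the REPLACE-based expression from the review. SQLite and the sample rows are assumptions for illustration; the real migration targets PostgreSQL, where CONCAT is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for sda.files; the real table has more columns.
conn.execute("CREATE TABLE files (submission_user TEXT, submission_file_path TEXT)")

# Four hypothetical test entries.
rows = [
    ("alice@example.org", "alice_example.org/data/a.c4gh"),
    ("alice@example.org", "alice_example.org/b.c4gh"),
    ("bob", "bob/data/c.c4gh"),            # legacy entry without a domain suffix
    ("carol@example.org", "data/d.c4gh"),  # path already free of the user id
]
conn.executemany("INSERT INTO files VALUES (?, ?)", rows)

# SQLite spells concatenation with ||; otherwise this mirrors the suggested REPLACE.
conn.execute(
    "UPDATE files SET submission_file_path = "
    "REPLACE(submission_file_path, REPLACE(submission_user, '@', '_') || '/', '')"
)

paths = [
    p for (p,) in conn.execute(
        "SELECT submission_file_path FROM files ORDER BY submission_file_path"
    )
]
print(paths)
```

After the update, none of the stored paths should start with a user directory, and the entry that was already clean should be unchanged.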