Handle constraint mapping failures in data refresh #4916

Open
stacimc opened this issue Sep 11, 2024 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs

stacimc commented Sep 11, 2024

Problem

In the data refresh remap_table_constraints step, we drop constraints from the existing media table and then apply them to the temp table before promotion. However, if anything goes wrong while applying constraints or promoting the new tables/indices, the live tables could be left without constraints for an extended period. Moreover, if another data refresh is then run from the beginning, it will attempt to copy constraints from the constraint-less table.

The SQL to drop/remap constraints is also all applied in a single task per table, so the task is not idempotent: if it fails partway through, it cannot be rerun without manual cleanup.
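
To illustrate what an idempotent version of each statement might look like, here is a minimal sketch. The helper names are hypothetical and not taken from the existing data refresh code. It relies on Postgres supporting DROP CONSTRAINT IF EXISTS, and on a guard against the pg_constraint catalog for the add, since ADD CONSTRAINT has no IF NOT EXISTS form.

```python
def drop_constraint_sql(table: str, constraint: str) -> str:
    """Build a drop statement that is a no-op if the constraint is already gone."""
    return f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint};"


def add_constraint_sql(table: str, constraint: str, definition: str) -> str:
    """Build an add statement that is skipped if the constraint already exists.

    Postgres has no ADD CONSTRAINT IF NOT EXISTS, so the statement is wrapped
    in a DO block that first checks the pg_constraint catalog.
    """
    return (
        "DO $$ BEGIN\n"
        "  IF NOT EXISTS (\n"
        "    SELECT 1 FROM pg_constraint\n"
        f"    WHERE conname = '{constraint}' AND conrelid = '{table}'::regclass\n"
        "  ) THEN\n"
        f"    ALTER TABLE {table} ADD CONSTRAINT {constraint} {definition};\n"
        "  END IF;\n"
        "END $$;"
    )
```

With statements shaped like this, each one could run in its own task (or be rerun as a batch) without failing on work that already completed.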

Description

One very simple approach would be to hard-code the constraints that should be applied, rather than generating ALTER TABLE statements from the constraints currently present on the live table. Updates to the production DB (if we manually added a new constraint in prod) would not be automatically persisted in the next data refresh, but this is perhaps a good thing: it requires the addition of constraints to be reflected in code and go through an approval process. This mirrors what we do with the Elasticsearch configuration.
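
As a rough sketch of the hard-coded approach, the constraints could live in a plain mapping that is reviewed like any other code change. Everything here is an assumption made for illustration: the constraint suffixes, column names, and the temp_import_image table name are placeholders, not the actual Openverse schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    suffix: str      # appended to the table name to form the constraint name
    definition: str  # the SQL that follows ADD CONSTRAINT <name>


# Hypothetical constraint definitions, keyed by media type.
HARDCODED_CONSTRAINTS: dict[str, list[Constraint]] = {
    "image": [
        Constraint("pkey", "PRIMARY KEY (id)"),
        Constraint("identifier_key", "UNIQUE (identifier)"),
    ],
}


def build_constraint_statements(media_type: str, table: str) -> list[str]:
    """Build ALTER TABLE statements from the hard-coded definitions.

    Names are derived from the target table so they cannot collide with the
    constraints still present on the live table. Renaming them when the table
    is promoted is not shown here.
    """
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {table}_{c.suffix} {c.definition};"
        for c in HARDCODED_CONSTRAINTS[media_type]
    ]
```

Because the statements no longer depend on the state of the live table, a refresh restarted from the beginning would still apply the full set even if the live table had lost its constraints.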

We could do the same thing for the indices as well. This would also have the benefit of reducing some code complexity!
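
The same idea for indices might look like the following sketch. The index suffixes and columns are again hypothetical. CREATE INDEX IF NOT EXISTS is standard Postgres, which makes each statement safe to rerun.

```python
# Hypothetical index definitions, keyed by media type: suffix -> indexed columns.
HARDCODED_INDICES: dict[str, dict[str, str]] = {
    "image": {
        "provider_idx": "(provider)",
        "url_key": "(url)",
    },
}


def build_index_statements(media_type: str, table: str) -> list[str]:
    """Build rerunnable CREATE INDEX statements from hard-coded definitions.

    Index names must be unique within a schema, so each name is prefixed with
    the target table. Otherwise IF NOT EXISTS would silently skip an index
    that shares its name with one on the live table.
    """
    return [
        f"CREATE INDEX IF NOT EXISTS {table}_{suffix} ON {table} {columns};"
        for suffix, columns in HARDCODED_INDICES[media_type].items()
    ]
```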

Additional context

#4833 (comment)

@stacimc stacimc added 🟩 priority: low Low priority and doesn't need to be rushed ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Sep 11, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Sep 11, 2024