Execute a refresh only once for the same query_text #7295

Open
wants to merge 5 commits into
base: master

Conversation

gaecoli
Member

@gaecoli gaecoli commented Jan 24, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • New Query Runner (Data Source)
  • New Alert Destination
  • Other

Description

How is this tested?

  • Unit tests (pytest, jest)
  • E2E Tests (Cypress)
  • Manually
  • N/A

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

@gaecoli gaecoli requested review from arikfr and justinclift January 24, 2025 03:46
@gaecoli
Member Author

gaecoli commented Jan 24, 2025

This PR introduces a mechanism to prevent duplicate execution of the same SQL query in a distributed environment. By implementing a distributed locking mechanism using Redis, we ensure that only one process or thread can execute a specific SQL query at a given time, avoiding unnecessary load on the database and ensuring consistent query results.
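As an illustration of the locking idea described above, here is a minimal sketch. It is not the code from this PR: the function names `acquire_lock` and `release_lock`, the `ttl_seconds` default, and the lock name used in the usage note are assumptions. `InMemoryRedis` is a hypothetical stand-in that only mimics the `set(name, value, nx=..., ex=...)`, `get`, and `delete` calls of a Redis client so the example runs without a server.

```python
import time
import uuid


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client; supports only what this sketch needs."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def _live(self, key):
        # Return the stored entry, dropping it first if its TTL has passed.
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] <= time.monotonic():
            del self._data[key]
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        # Mimics SET with NX (only set if absent) and EX (TTL in seconds).
        if nx and self._live(name) is not None:
            return None
        expires_at = time.monotonic() + ex if ex is not None else float("inf")
        self._data[name] = (value, expires_at)
        return True

    def get(self, name):
        entry = self._live(name)
        return entry[0] if entry else None

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


def acquire_lock(client, lock_name, ttl_seconds=60):
    """Try to take the lock; return a unique identifier on success, else None."""
    identifier = str(uuid.uuid4())
    # NX makes the write succeed for one caller only; EX ensures a crashed
    # holder cannot block everyone forever.
    if client.set(lock_name, identifier, nx=True, ex=ttl_seconds):
        return identifier
    return None


def release_lock(client, lock_name, identifier):
    """Release the lock only if this caller still owns it."""
    # Assumption: against a real Redis server this check-and-delete should be
    # made atomic (for example with a server-side script), and get() may
    # return bytes unless the client decodes responses.
    if client.get(lock_name) == identifier:
        client.delete(lock_name)
        return True
    return False
```

A worker would call `acquire_lock(client, "lock:query_hash_job:<data_source>:<query_hash>")` before enqueuing a job (the name layout is a guess) and skip the enqueue when it returns `None`.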

@gaecoli gaecoli requested a review from eradman January 24, 2025 04:13
@@ -3,7 +3,7 @@ on:
push:
branches:
- master
pull_request_target:
pull_request:
Member

@gaecoli I did this change to match changes on master. Not sure why it shows up in diff, but it's fine.

@arikfr
Member

arikfr commented Feb 4, 2025

@gaecoli I wonder if we really need this? I mean, the chances of the exact same query being executed at the same time are usually low compared with the complexity and potential issues this might add.

Does this happen to you frequently?

@gaecoli
Member Author

gaecoli commented Feb 5, 2025

> @gaecoli I wonder if we really need this? I mean, the chances of the exact same query being executed at the same time are usually low compared with the complexity and potential issues this might add.
>
> Does this happen to you frequently?

OK, I will consider that. Thank you, @arikfr!

@gaecoli
Member Author

gaecoli commented Feb 5, 2025

> @gaecoli I wonder if we really need this? I mean, the chances of the exact same query being executed at the same time are usually low compared with the complexity and potential issues this might add.
>
> Does this happen to you frequently?

When I add a query with many visualizations (tables, charts, ...) to a dashboard and then refresh the dashboard, the same query text is executed many times in the query engine (Presto, MySQL, Doris, ...).

At my company, 1000+ people may refresh the same dashboard, so you can see why this matters.

@eradman
Collaborator

eradman commented Feb 6, 2025

Unfortunately, this does happen more often than I would have guessed. Any time multiple visualizations of the same query are included on a dashboard, there is a high probability of duplicate queries:

SELECT regexp_matches(query, 'Query Hash: [a-z0-9]+') FROM pg_stat_activity WHERE state='active';
\watch 1
...
                  regexp_matches
--------------------------------------------------
 {"Query Hash: 5aa874345926f2b18ecf197d3200a602"}
 {"Query Hash: 5aa874345926f2b18ecf197d3200a602"}
(2 rows)

For very long queries (not uncommon for my users!) this will cause unnecessary load.
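The same check can be scripted. The sketch below is only an illustration under stated assumptions: it takes the text of currently active queries (for example, the `query` column of `pg_stat_activity`) and reports which `Query Hash:` values appear more than once. The function name `duplicate_query_hashes` is hypothetical.

```python
import re
from collections import Counter

# Same pattern as the regexp_matches() call above, with a capture group.
QUERY_HASH_RE = re.compile(r"Query Hash: ([a-z0-9]+)")


def duplicate_query_hashes(active_queries):
    """Return {query_hash: count} for hashes seen in more than one active query."""
    hashes = []
    for query_text in active_queries:
        match = QUERY_HASH_RE.search(query_text)
        if match:
            hashes.append(match.group(1))
    return {h: n for h, n in Counter(hashes).items() if n > 1}
```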

@arikfr
Member

arikfr commented Feb 6, 2025

Maybe the right thing would be to make dashboard refreshes smarter and reuse the same query invocation for the different visualizations that depend on it? I think that would be more robust and would address the core issue directly instead of indirectly.
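A rough sketch of that alternative, written in Python purely for illustration (it is not Redash's actual dashboard code): group the dashboard's widgets by the query they depend on, run each distinct query once, and hand the same result to every widget. The names `refresh_dashboard`, `widgets`, and `execute_query` are hypothetical.

```python
from collections import defaultdict


def refresh_dashboard(widgets, execute_query):
    """Refresh widgets while invoking each distinct query only once.

    widgets: iterable of (widget_id, query_id) pairs.
    execute_query: callable taking a query_id and returning its result.
    Returns a dict mapping widget_id -> result.
    """
    # Group widgets by the query that backs them.
    widgets_by_query = defaultdict(list)
    for widget_id, query_id in widgets:
        widgets_by_query[query_id].append(widget_id)

    results = {}
    for query_id, widget_ids in widgets_by_query.items():
        result = execute_query(query_id)  # one invocation per distinct query
        for widget_id in widget_ids:
            results[widget_id] = result  # every dependent widget shares it
    return results
```

This handles duplicates within a single dashboard refresh only; it would not by itself stop many users refreshing the same dashboard at once from triggering the same query.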

@eradman
Collaborator

eradman commented Feb 6, 2025

> Maybe the right thing will be to make dashboards refreshes smarter and reuse the same query invocation for different visualizations that depend on it

That seems like a good approach, unless this also becomes complex. My guess is that a dashboard is the only API user that would normally hit this race condition.

@gaecoli
Member Author

gaecoli commented Feb 7, 2025

> Maybe the right thing will be to make dashboards refreshes smarter and reuse the same query invocation for different visualizations that depend on it? I think it will be more robust and address the core issue instead of trying to address it in an indirect way.

This is a good approach, but I think such a change would make the code more complex.

@eradman
Collaborator

eradman commented Feb 7, 2025

I tested this change manually, and it does seem to work. I also see the log messages:

server-1 | [2025-02-07 13:56:32,160][PID:19][INFO][redash.utils.locks] Lock released successfully, lock_name=[lock:query_hash_job:1:d6338e9508dd103771b69483fb17d4a5], identifier=[73683a39-6aea-4f29-b0d3-02e4c0385a0b]

@gaecoli
Member Author

gaecoli commented Feb 8, 2025

> Tested this change manually, it does seem to work. Also I see the log messages
>
> server-1 | [2025-02-07 13:56:32,160][PID:19][INFO][redash.utils.locks] Lock released successfully, lock_name=[lock:query_hash_job:1:d6338e9508dd103771b69483fb17d4a5], identifier=[73683a39-6aea-4f29-b0d3-02e4c0385a0b]

Yes, it works well on my company's Redash instance!

3 participants