-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduled batch supervisor #17353
Scheduled batch supervisor #17353
Conversation
indexing-service/src/main/java/org/apache/druid/indexing/batch/BatchSupervisor.java
Fixed
Show fixed
Hide fixed
indexing-service/src/main/java/org/apache/druid/indexing/batch/BatchSupervisorSpec.java
Fixed
Show fixed
Hide fixed
indexing-service/src/main/java/org/apache/druid/indexing/batch/ScheduledBatchScheduler.java
Fixed
Show fixed
Hide fixed
indexing-service/src/main/java/org/apache/druid/indexing/batch/ScheduledBatchSupervisor.java
Fixed
Show fixed
Hide fixed
...xing-service/src/main/java/org/apache/druid/indexing/batch/ScheduledBatchSupervisorSpec.java
Fixed
Show fixed
Hide fixed
Please enter the commit message for your changes. Lines starting
… reusability (#17542) This PR contains non-functional / refactoring changes of the following classes in the sql module: 1. Move ExplainPlan and ExplainAttributes fromsql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain 2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java 3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic. 4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker. 5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself which provides all the clients as well. This is done so that we can reuse the said classes in #17353 without brining in Calcite and other dependencies to the Overlord.
The jobsExecutor is reponsible for submitting jobs to the broker for all scheduled batch supervisors. The cronExecutor was simply decoupling the submitting of jobs from the actual scheduled running of jobs. The service client library already has async threads, so we remove the cronExecutor to keep things simple as things are handled in an async manner by the service client. If we ever observe and see evidence of bottlenecks around task submission, etc, we should be able to make jobsExecutor multiple threaded instead of single threaded.
...service/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchSupervisor.java
Dismissed
Show dismissed
Hide dismissed
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay on this, @abhishekrb19 !
I have tried to do a complete review. Once we have addressed these comments, we should be good to merge this PR.
Thanks a lot for your patience!! 🙂
indexing-service/src/main/java/org/apache/druid/indexing/overlord/DruidOverlord.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/QuartzCronSchedulerConfig.java
Show resolved
Hide resolved
...-service/src/main/java/org/apache/druid/indexing/scheduledbatch/UnixCronSchedulerConfig.java
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
@kfaraz, thank you for the reviews! I believe I have addressed and/or responded to all of your comments. |
...e/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchSupervisorStatus.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchSupervisorStatus.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/BatchSupervisorTaskStatus.java
Outdated
Show resolved
Hide resolved
...vice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchStatusTracker.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/apache/druid/indexing/scheduledbatch/ScheduledBatchTaskManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor non-blocking comments. 🚀
@JsonProperty("taskStatus") TaskStatus taskStatus, | ||
@JsonProperty("updatedTime") DateTime updatedTime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Since this constructor is not used for deserializing right now, we don't need the @JsonProperty
annotation.
} | ||
|
||
/** | ||
* Used for internal tracking. So this field is *not* Jackson serialized to avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I am not sure if markdown is honored in javadocs. Use <i>
, <b>
or caps instead for emphasis.
@abhishekrb19 , the changes look good to me. You may proceed with the merge. When you get the time though, please take another look at this comment |
Sounds good, just responded in the comment inline |
This change introduces a scheduled batch supervisor in Druid. The supervisor periodically wakes up to submit an MSQ ingest query, allowing users to automate batch ingestion directly within Druid. Think of it as simple batch task workflows natively integrated into Druid, though it doesn't replace more sophisticated workflow management systems like Apache Airflow. This is an experimental feature.
Summary of changes:
The
scheduled_batch
supervisor can be configured as follows:The supervisor will submit the
REPLACE
sql query repeatedly every 5 minutes. The supervisor supports two types of cron scheduler configurations:unix
.*/5 * * * *
to schedule the SQL task every 5 minutes.@daily
,@hourly
,@monthly
, etc.quartz
.0 0 0 ? 3,6,9,12 MON-FRI
to schedule tasks at midnight on weekdays during March, June, September, and December.Key points:
query
along with any context in thespec
. This structure is identical to what the MSQ task engine accepts.spec
as-is on its schedule.Some implementation details:
a. Validate and parse the user-specified query.
b. Submit MSQ queries to the
/druid/v2/sql/task/
endpoint.ScheduledBatchScheduler
is injected in the Overlord, which is responsible for scheduling and unscheduling all scheduled batch instances.BrokerClient
implementation has been added, leveraging theServiceClient
functionality.SqlTaskStatus
and its unit testSqlTaskStatusTest
have been moved from the msq module to the sql module so it can be reused by the BrokerClient implementation in the sql module.ExplainPlanInformation
class, which is used to deserialize the explain plan response from the Broker.The status API response for the supervisor contains the scheduler state along with active and completed tasks:
Future Improvements:
Release Note
This change introduces an experimental feature called scheduled batch supervisor in Druid. The supervisor periodically wakes up to submit an MSQ ingest query, allowing users to automate batch ingestion directly within Druid. Think of it as simple batch task workflows natively integrated into Druid, though it doesn't replace more sophisticated workflow management systems like Apache Airflow. Currently, the supervisor will repeatedly submit the same MSQ query as-is. The payload for the supervisor is below, please check the PR summary for additional details:
This PR has: