fix: refine prompt to generate the most simple task in init stage (#546)
* refine prompt to generate the most simple task in init stage

* feature test dtype check improve
peteryang1 authored Jan 26, 2025
1 parent 712d94a commit 9d6feed
Showing 2 changed files with 13 additions and 6 deletions.
@@ -57,9 +57,14 @@ if isinstance(X, pd.DataFrame) and isinstance(X_test, pd.DataFrame):
     assert get_column_list(X) == get_column_list(X_test), "Mismatch in column names of training and test data."

 if isinstance(X, pd.DataFrame):
-    assert sorted(X.dtypes.unique().tolist()) == sorted(
-        X_loaded.dtypes.unique().tolist()
-    ), f"feature engineering has produced new data types which is not allowed, data loader data types are {X_loaded.dtypes.unique().tolist()} and feature engineering data types are {X.dtypes.unique().tolist()}"
+    X_dtypes_unique_sorted = sorted(X.dtypes.unique().tolist())
+    X_loaded_dtypes_unique_sorted = sorted(X_loaded.dtypes.unique().tolist())
+    assert (
+        len(X_loaded_dtypes_unique_sorted) == 1
+        and (X_loaded_dtypes_unique_sorted[0] == np.float64 or X_loaded_dtypes_unique_sorted[0] == np.float32)
+    ) or (
+        X_dtypes_unique_sorted == X_loaded_dtypes_unique_sorted
+    ), f"feature engineering has produced new data types which is not allowed, data loader data types are {X_loaded_dtypes_unique_sorted} and feature engineering data types are {X_dtypes_unique_sorted}"

 print(
     "Feature Engineering test passed successfully. All checks including length, width, and data types have been validated."
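
The relaxed dtype condition in the hunk above can be read in isolation as the following minimal sketch. It is not the repository's code: the function name `dtype_check_passes` is hypothetical, and dtypes are represented as plain strings (an assumption made so the example runs without pandas or NumPy; with a real DataFrame the names could come from something like `[str(t) for t in df.dtypes.unique()]`).

```python
def dtype_check_passes(loaded_dtypes, engineered_dtypes):
    """Mirror the boolean structure of the new assertion (hypothetical helper).

    loaded_dtypes:     dtype names produced by the data loader
    engineered_dtypes: dtype names produced by feature engineering
    """
    loaded = sorted(set(loaded_dtypes))
    engineered = sorted(set(engineered_dtypes))

    # Branch 1: the loader output has exactly one dtype and it is a float type.
    # In that case the engineered dtypes are not compared at all.
    loaded_is_single_float = len(loaded) == 1 and loaded[0] in ("float64", "float32")

    # Branch 2: otherwise the two sets of unique dtypes must match exactly,
    # which is the only condition the old assertion allowed.
    return loaded_is_single_float or engineered == loaded


if __name__ == "__main__":
    print(dtype_check_passes(["float64"], ["float64", "int64"]))
    print(dtype_check_passes(["int64", "object"], ["int64", "object", "float64"]))
```

One consequence worth noting when reviewing the change: when the loaded data is purely `float64` or purely `float32`, the first branch is satisfied regardless of which dtypes feature engineering produces.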
8 changes: 5 additions & 3 deletions rdagent/scenarios/data_science/proposal/prompts.yaml
@@ -73,7 +73,7 @@ task_gen: # It is deprecated now, please refer to direct_exp_gen
 {% if hypothesis is not none %}
 The user is trying to generate new {{ targets }} based on the hypothesis generated in the previous step.
 {% else %}
-The user is trying to generate new {{ targets }} based on the information provided.
+The user is trying to generate a very simple new {{ targets }} based on the information provided.
 {% endif %}
 The {{ targets }} are used in certain scenario, the scenario is as follows:
 {{ scenario }}
@@ -84,7 +84,9 @@ task_gen: # It is deprecated now, please refer to direct_exp_gen
 Your task should adhere to the specification above.
 {% endif %}
-{% if hypothesis is not none %}
+{% if hypothesis is none %}
+Since we are at the very beginning stage, we plan to start from a very simple task. To each component, please only generate the task to implement the most simple and basic function of the component. For example, the feature engineering should only implement the function which output the raw data without any transformation. The model component only uses the most basic and easy to implement model without any tuning. The ensemble component only uses the simplest ensemble method. The main focus at this stage is to build the first runnable version of the solution.
+{% else %}
 The user will use the {{ targets }} generated to do some experiments. The user will provide this information to you:
 1. The target hypothesis you are targeting to generate {{ targets }} for.
 2. The hypothesis generated in the previous steps and their corresponding feedbacks.
@@ -260,7 +262,7 @@ component_gen:
 Please select the component you are going to improve the latest implementation or sota implementation.
-Please generate the output following the format below:
+Please generate the output in JSON format following the format below:
 {{ component_output_format }}
 user: |-
