Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Picking columns to export CSV #260

Merged
merged 10 commits into from
Apr 12, 2024
Merged

Conversation

kaanolgu
Copy link
Contributor

@kaanolgu kaanolgu commented Jan 23, 2024

Fixes #252

  • For babelstream we need multiple columns to be exported so with this change, the post-processing script would be able to recognise new column data alongside with x_axis and y_axis. If needed user could define a_axis, b_axis, etc. to export more columns and the default execution flow is not affected.
  • Since the plotting part is not modified, it would not intervene the plotting part
  • For example, with this change we would be able to export a new column for dataframe :
x_axis:
  value: "tags"
  units:
    custom: "tags"
y_axis:
  value: "Triad_value"
  units:
    column: "Triad_unit"
z_axis:
  value: "spack_spec"
  units:
    column: null

series: [["partition", "cascadelake"],["partition", "volta"]]
  • Then it would print ( commenting line self.plot_generic( config["title"], df[columns][mask], config["x_axis"], config["y_axis"], series_filters) [thanks @pineapple-cat ] ) :
Selected dataframe:
   tags  Triad_value  Triad_unit                                    spack_spec    partition
0   acc    12963.337  MBytes/sec                   babelstream%[email protected] +acc  cascadelake
1   acc    13386.104  MBytes/sec       babelstream%[email protected] +acc cuda_arch=70        volta
2  cuda   846640.634  MBytes/sec      babelstream%[email protected] +cuda cuda_arch=70        volta
3   omp   159131.926  MBytes/sec                   babelstream%[email protected] +omp  cascadelake
4   tbb    99740.216  MBytes/sec  babelstream%[email protected] +tbb partitioner=auto  cascadelake

Is there any alternative method to export multiple columns that I might be missing ? Open to any feedback or suggestions

Thank you!

@kaanolgu kaanolgu added enhancement New feature or request postprocessing labels Jan 23, 2024
@kaanolgu kaanolgu self-assigned this Jan 23, 2024
@kaanolgu kaanolgu force-pushed the ko/post_processing_multiple_axis branch from fd2e03a to 438be02 Compare February 1, 2024 13:34
@kaanolgu
Copy link
Contributor Author

kaanolgu commented Feb 1, 2024

438be02

Made a change according to idea from @ilectra and @pineapple-cat . Instead of adding a new "*_axis" to the yaml file, a new list of columns to extract to csv dataframe is used and df_csv_export is generated from the original dataframe.

One question would be is it required to apply user-specified types to all relevant columns for the csv export too ?

@kaanolgu kaanolgu changed the title Featuring Multiple Column Support for Post Processing to Export Data Picking columns to export CSV Feb 1, 2024
Copy link
Collaborator

@pineapple-cat pineapple-cat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need user-specified types for csv export columns; typing is only needed for the parts of the dataframe that will be used in data manipulation in some way. Since what you want to add is purely for display purposes, it shouldn't make much difference once it's in the csv anyway.

@ilectra
Copy link
Collaborator

ilectra commented Feb 5, 2024

I think that, instead of creating a whole new df to export to csv, it would make more sense if you

  • add only the extra columns that you need in the csv_export part of the yaml, not all the axis and series etc.
  • keep those columns in the filtered dataframe, as you go along the processing/filtering, treating them as extra axes.
  • export the filtered df to csv, just before calling the plotting script (you won't call it, in your case, but that's the place to do the printing)

@kaanolgu
Copy link
Contributor Author

kaanolgu commented Feb 5, 2024

I think that, instead of creating a whole new df to export to csv, it would make more sense if you

  • add only the extra columns that you need in the csv_export part of the yaml, not all the axis and series etc.
  • keep those columns in the filtered dataframe, as you go along the processing/filtering, treating them as extra axes.
  • export the filtered df to csv, just before calling the plotting script (you won't call it, in your case, but that's the place to do the printing)

That's a great idea! I will try to implement this in a new commit

@ilectra
Copy link
Collaborator

ilectra commented Feb 5, 2024

I think that, instead of creating a whole new df to export to csv, it would make more sense if you

  • add only the extra columns that you need in the csv_export part of the yaml, not all the axis and series etc.
  • keep those columns in the filtered dataframe, as you go along the processing/filtering, treating them as extra axes.
  • export the filtered df to csv, just before calling the plotting script (you won't call it, in your case, but that's the place to do the printing)

That's a great idea! I will try to implement this in a new commit

@kaanolgu you might want to wait for the refactoring PR to be merged. It will change things quite a bit, hopefully to the better!

Copy link
Collaborator

@ilectra ilectra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add unit tests for the new functionality!

post-processing/post_processing.py Show resolved Hide resolved
post-processing/post_processing.py Outdated Show resolved Hide resolved

# gather all relevant columns
all_columns = set(columns + filter_columns + scaling_columns)
return self.df[config.plot_columns][self.mask]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this mess here is leftover from the merging with the refactored version?

post-processing/post_processing_config.yaml Outdated Show resolved Hide resolved
@ilectra
Copy link
Collaborator

ilectra commented Apr 11, 2024

@pineapple-cat @kaanolgu I think I addressed all my review comments, please have a last look and merge if happy!

Copy link
Collaborator

@pineapple-cat pineapple-cat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to be in order to me 👍

@ilectra ilectra merged commit b505e07 into main Apr 12, 2024
6 checks passed
@ilectra ilectra deleted the ko/post_processing_multiple_axis branch April 12, 2024 09:55
github-actions bot pushed a commit that referenced this pull request Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request postprocessing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add more axes/dimensions/columns to post-processing
3 participants