Query on CLI option #94

Open
mcrosson opened this issue Jul 5, 2024 · 9 comments

Comments

@mcrosson

mcrosson commented Jul 5, 2024

Would it be possible to expose the query option as part of the CLI args for this plugin? I do a lot of batch processing and cross-convert my entire library from FLAC to compressed formats.

Having a CLI option for the query would let me process my batches more easily, without having to remember to update my config on each run or wait for the plugin to go through my entire beets library.

@geigerzaehler
Owner

Thanks for the input, @mcrosson! Something like this seems doable, but I can also see some pitfalls. Before I design a solution, I would love to hear some details on the issue you are facing.

  • Could you provide a concrete example of your workflow right now and point out where the problem is?
  • What do you want to achieve by specifying the query on the CLI?
  • Is the problem only that it takes long to go through the entire library?
  • What would be the behavior of setting the query on the CLI?
  • How would it relate with the query defined in the configuration?

@mcrosson
Author

mcrosson commented Jul 9, 2024

> Could you provide a concrete example of your workflow right now and point out where the problem is?

My workflow is this:

  • Import 500-1k FLAC files with beets import and a custom tag of my_import:[n], where [n] is a number that increments for each import.
  • Do some post-import processing (replay gain, chroma, etc.) using my_import:[n] as the query to filter the tracks/albums processed (i.e. only operate on the most recently imported files).
  • Run beets scrub using my_import:[n] as the query.
  • Run beets alt update subsonic to generate the compressed audio I use with my personal, self-hosted Subsonic service.
    • I process all tracks in my library for use with Subsonic.
    • I currently have query: "" set in my plugin config.

The 'problem' is that beets alt update hits the whole library of >90k tracks and takes a very long time to complete.

> What do you want to achieve by specifying the query on the CLI?

A way to achieve a non-destructive, partial update of the tracks managed by this plugin. Essentially what I'm doing in my above process for chromaprint, replaygain and similar plugins.

> Is the problem only that it takes long to go through the entire library?

This is the very reason I opened the ticket. Scanning through my library takes a very long time.

> What would be the behavior of setting the query on the CLI?
> How would it relate with the query defined in the configuration?

If it were named something like 'partial processing query', I'd expect it to either AND the CLI parameter with the config parameter or override the config parameter.

So long as I can execute a partial, non-destructive update, I can work with whatever default behavior and interactions you feel are best for the plugin.

@geigerzaehler
Owner

Thanks for the info, that’s very helpful.

> > What would be the behavior of setting the query on the CLI?
> > How would it relate with the query defined in the configuration?
>
> If it were named something like 'partial processing query', I'd expect it to either AND the CLI parameter with the config parameter or override the config parameter.

Having the CLI query select a subset of the alternative collection (i.e. AND-ing it with the configured query) seems the way to go. (Overriding, on the other hand, may lead to undesired behavior.)
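To illustrate the difference between the two behaviors, here is a minimal Python sketch. It is a toy model under stated assumptions and does not use the beets query API: `matches`, `select`, the `mode` parameter, and the sample items are all hypothetical, and each query is modeled as a plain dict of field/value pairs that must all match.

```python
# Toy model only: real beets queries are parsed and evaluated by beets,
# not by this code. All names here are hypothetical.


def matches(item, query):
    """Return True if every field/value pair in `query` equals the item's value.

    An empty query matches every item, like `query: ""` in the config.
    """
    return all(item.get(field) == value for field, value in query.items())


def select(items, config_query, cli_query=None, mode="and"):
    """Pick the items to process for one alternative collection."""
    cli_query = cli_query or {}
    if mode == "and":
        # The item must satisfy the config query AND the CLI query, so the
        # result is always a subset of the configured collection.
        return [i for i in items if matches(i, config_query) and matches(i, cli_query)]
    # "override": a non-empty CLI query replaces the config query entirely.
    effective = cli_query if cli_query else config_query
    return [i for i in items if matches(i, effective)]


library = [
    {"title": "a", "format": "FLAC", "my_import": 1},
    {"title": "b", "format": "FLAC", "my_import": 2},
    {"title": "c", "format": "MP3", "my_import": 2},
]
config_query = {"format": "FLAC"}  # what the collection is configured to contain
cli_query = {"my_import": 2}       # the most recent import batch

and_titles = [i["title"] for i in select(library, config_query, cli_query, "and")]
override_titles = [i["title"] for i in select(library, config_query, cli_query, "override")]
print(and_titles)       # ['b']
print(override_titles)  # ['b', 'c']
```

In this toy example, overriding pulls in item "c", which the configured collection would never contain. That is one way the undesired behavior could show up.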

Are you interested in contributing a PR for this? I’d be happy to guide you through it!


Additionally, I would also like to look at the underlying problem: going through the whole library is slow. If that were solved, we wouldn’t need the feature. To get a better idea about this, could you measure the time beet alt update subsonic takes without any conversion? (Run it after the conversions are done.) It would also be helpful to know where the files live, i.e. SSD/HDD, internal/external, which filesystem.

@mcrosson
Author

> Having the CLI query select a subset of the alternative collection (i.e. AND-ing it with the configured query) seems the way to go. (Overriding, on the other hand, may lead to undesired behavior.)

Sounds good to me :)

> Are you interested in contributing a PR for this? I’d be happy to guide you through it!

I can help with testing but I'm not sure I'd be much help with writing the code. I've never worked with developing beets plugins and would likely need quite a bit of guidance. If you're willing to help me sort the fundamentals and point me in a good direction to start with the code, I can make an attempt.

> Additionally, I would also like to look at the underlying problem: going through the whole library is slow. If that were solved, we wouldn’t need the feature.

My entire beets library is slow. Short of the main beets project making major improvements to the querying, it cannot be solved via this plugin. There are a few performance based items open in the main beets project that will likely help address some of this if/when they get implemented.

My library is also additive: even if the performance problems were sorted 'today', I'd still run into them again in the future. I think adding the query CLI argument, per the above, is the better solution both now and in the future.

> To get a better idea about this, could you measure the time beet alt update subsonic takes without any conversion? (Run it after the conversions are done.) It would also be helpful to know where the files live, i.e. SSD/HDD, internal/external, which filesystem.

hardware

  • CPU: Intel Xeon E5-2620 v3 @ 2.40 GHz (hexa-core)
  • RAM: 64 GB ECC
  • LSI SAS HBA for disks
  • 7200 rpm disks (non-flash / non-SSD)
  • ZFS in a RAID-Z configuration (equivalent to RAID 5)

software

storage use

  • source music: 2.34 TB
  • alternatives output: 715 GB
  • beets library db: 709 MB

beets library stats (note: I run this plugin against the whole library)

beets@container:/opt/music$ beets-library stats
Tracks: 93294
Total time: 37.8 weeks
Approximate total size: 2.3 TiB
Artists: 3064
Albums: 6248
Album artists: 1358

alt update runtime

beets@container:/opt/music$ time beets-library alt update airsonic
real    32m35.985s
user    12m57.217s
sys     19m31.579s
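As a rough reading of these numbers, the short Python sketch below converts the three `time` values to seconds and compares CPU time (user + sys) with wall-clock time. The helper `to_seconds` is a hypothetical name introduced for illustration, and the interpretation afterwards is an assumption, not a measurement.

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as '32m35.985s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the `time` output above.
real = to_seconds("32m35.985s")
user = to_seconds("12m57.217s")
sys_time = to_seconds("19m31.579s")

cpu = user + sys_time
print(f"CPU time: {cpu:.3f}s of {real:.3f}s wall clock ({cpu / real:.1%})")
print(f"sys share of wall clock: {sys_time / real:.1%}")
```

User plus sys comes to roughly 1949 s of about 1956 s wall clock, with sys alone at around 60%. If the run is mostly single-threaded (an assumption), that would suggest it spends most of its time on the CPU, much of it in system calls, rather than idling while waiting for the disks.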

@geigerzaehler
Owner

geigerzaehler commented Jul 12, 2024

> alt update runtime

Wow 🤯 That’s terrible 😅

> My entire beets library is slow. Short of the main beets project making major improvements to the querying, it cannot be solved via this plugin. There are a few performance based items open in the main beets project that will likely help address some of this if/when they get implemented.

That’s good to know. So any effort to improve performance would be better spent on beets core than here.

> I can help with testing but I'm not sure I'd be much help with writing the code. I've never worked with developing beets plugins and would likely need quite a bit of guidance. If you're willing to help me sort the fundamentals and point me in a good direction to start with the code, I can make an attempt.

That’s totally ok. I’m happy to get you started and give you some guidance along the way.

To get started with development, take a look at the dev guide. From the project directory, you can run poetry run beet alt ... and it will use the local version of the plugin. This allows you to test any changes.

As a first step I’d suggest adding a query option to the CLI parser (ArgParse.add_argument) somewhere here and updating the readme to document the new option.
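As a rough illustration of that first step, here is a minimal, self-contained `argparse` sketch. It is not the plugin's actual parser: the `build_parser` helper, the `--query` flag name, and the subcommand layout are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a stand-in parser shaped like `beet alt update NAME`."""
    parser = argparse.ArgumentParser(prog="beet alt")
    subparsers = parser.add_subparsers(dest="command", required=True)

    update = subparsers.add_parser("update", help="update an alternative collection")
    update.add_argument("name", help="name of the alternative collection")
    # Hypothetical option: the flag name and semantics are assumptions.
    update.add_argument(
        "--query",
        default=None,
        metavar="QUERY",
        help="only process items that also match QUERY",
    )
    return parser


args = build_parser().parse_args(["update", "subsonic", "--query", "my_import:42"])
print(args.name, args.query)  # subsonic my_import:42
```

When `--query` is omitted, `args.query` is `None`, so the existing behavior (config query only) can stay the default.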

The list of items (i.e. tracks) that should be processed is generated in External._item_actions(). You need to make sure that it uses the combined query from the CLI and the config. But maybe we can combine it somewhere else?

I created a branch that implements a first test case for this feature. You may need to make some adjustments to it to make it work. (https://github.com/geigerzaehler/beets-alternatives/blob/cli-query-tests/test/cli_test.py#L435)

If you get stuck, feel free to reach out to me.

@mcrosson
Author

Thank you for the information. I'll dive in at my next opportunity.

@geigerzaehler
Owner

@mcrosson Are you still interested in implementing this?

@mcrosson
Author

Unfortunately, my health issues flared up shortly after discussing the above with you, and I've been unable to look at this at all. It'll likely be a long time before I can make the necessary code changes.

If someone else would like to implement the changes, I'm willing to defer. I'm also OK with the ticket being closed.

@geigerzaehler
Owner

I’m sorry to hear about your health issues, @mcrosson. I hope you get better soon.

I just wanted to follow up to see if you needed any help or already had something in the works, because I may take a stab at this myself. I’ll definitely keep this open.
