Find the closest matches does not find the closest matches #120
Should this issue rather be posted on the current PR release under test?
No, I suspect it was not introduced by that PR, so it is better to capture it separately like this.
Yes, I'll leave it here, as I have noticed the issue all along while testing the other PRs as well.
Operationally, the ultimate measure of success of listFix() will be how effective it is at finding the closest matches for mismatched playlist tracks. Speed and accuracy are the two essential ingredients. At the moment speed is sufficient, but accuracy falls short by a wide margin. Since accuracy is crucial to successful outcomes, I propose that this issue be put up next for a fix.
Borewit (20 Apr 2023 21:04:59):
Could you do the groundwork for this one, @touwys? Good examples where the matching algorithm currently fails (does not pick the best result) are very useful. I have two ideas to improve the matching algorithm:
1. Prefer the track from the same folder, if a reasonable match is found in that folder.
2. As the parent folder name(s) could represent the title or artist, I think we could take those into account.
Could you do the groundwork for this one, @touwys? Good examples where the matching algorithm currently fails (does not pick the best result) are very useful.
Yes, but whether I'm up to the task is quite another matter. It's like traversing the proverbial labyrinth. A useful start for me would be if you could tap into the existing algorithm, and "translate" its current train of reasoning for me. Once I have that as a base, I can measure and build upon it. How simple or complex do we want this to be?
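Borewit's first idea (prefer a reasonable match from the same folder) could look roughly like the Java sketch below. It is a sketch under stated assumptions, not listFix code: "same folder" is taken to mean the folder of the missing playlist entry, the word-overlap `score` is a hypothetical stand-in for listFix's real scoring, and the class name and `minScore` threshold are hypothetical.

```java
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SameFolderPreference {

    // Lowercase words of a file name with its extension stripped,
    // e.g. "Like a Prayer (Live).mp3" -> {like, a, prayer, live}.
    static Set<String> words(String fileName) {
        String base = fileName.replaceFirst("\\.[A-Za-z0-9]{1,5}$", "");
        Set<String> out = new HashSet<>();
        for (String w : base.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Hypothetical stand-in score: number of distinct words two file names share.
    static int score(String a, String b) {
        Set<String> shared = words(a);
        shared.retainAll(words(b));
        return shared.size();
    }

    // Returns the best-scoring candidate in the missing entry's own folder if it
    // reaches minScore; otherwise the best-scoring candidate overall (null if none).
    static Path pick(Path missing, List<Path> candidates, int minScore) {
        String name = missing.getFileName().toString();
        Path bestLocal = null;
        Path bestAny = null;
        int bestLocalScore = -1;
        int bestAnyScore = -1;
        for (Path candidate : candidates) {
            int s = score(name, candidate.getFileName().toString());
            if (s > bestAnyScore) {
                bestAnyScore = s;
                bestAny = candidate;
            }
            boolean sameFolder = missing.getParent() != null
                    && missing.getParent().equals(candidate.getParent());
            if (sameFolder && s > bestLocalScore) {
                bestLocalScore = s;
                bestLocal = candidate;
            }
        }
        return (bestLocal != null && bestLocalScore >= minScore) ? bestLocal : bestAny;
    }
}
```

For a missing entry /music/Madonna/Like a Prayer (Live).mp3 (a made-up path), the same-folder /music/Madonna/Like a Prayer.flac (3 shared words) wins over /music/Covers/Like a Prayer Live.mp3 (4 shared words) when `minScore` is 3, but not when it is 4. Where to set such a threshold is exactly the kind of question good real-world examples would answer.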
@Borewit, on second thought, we have to give this issue long and careful attention, because there is more to it than meets the eye. The supremacy of listFix() as an app rests entirely on the quality of its search for matching tracks: how closely the search results match the originals of a fractured playlist. Now, while it probably constitutes a challenge bigger than it's worth considering, don't you think that we should replace the current search model with one that uses the music file tags? Searching the tags provides many more options with which to improve the accuracy of the algorithm.
That idea crossed my mind a few times.
Borewit (29 Apr 2023 10:19:14):
A useful start for me would be if you could tap into the existing algorithm, and "translate" its current train of reasoning for me.
Based on: #106 (comment)
The "closest match" is purely based on the file name portion of the audio track, so the parent folder is excluded.
https://github.com/Borewit/listFix/blob/441fa4e2b8a3c5b8510b95fbfa291897310cf21d/src/main/java/listfix/util/FileNameTokenizer.java#L26-L29
This score function chops the file name into words, e.g. "01 Madonna - Like a Prayer.mp3" becomes something like ["01", "Madonna", "Like", "Prayer"].
The scoreMatchingTokens function then compares these words with those of each track in your library, which is converted to a similar list of words. A score is then calculated by comparing those sets of words.
https://github.com/Borewit/listFix/blob/a452d74c0cab53ef7f3ea2a42b3185c3e84b59d4/src/main/java/listfix/util/FileNameTokenizer.java#L81
Based on that score the matches are sorted, and the highest-scoring matches are kept.
The "closest match" is purely based on the file name portion of the audio track, so the parent folder is excluded.
Thank you. I was in the middle of a lengthy reply to your previous post when this one arrived. I can cut it down to the following:
1. Would it not be natural to take the parent folder into account?
2. Since the music metadata is not available, what other (hopefully useful) data, if any, is saved along with the file name? Surely the file creation date, modification date and, especially, the file size are also saved? (Apart from the actual file content, how else is file synchronisation achieved?) The question is whether, and how, listFix() can also make use of these during its search for the closest matches. If, for instance, the file size is also available to us, I think it could be a most useful parameter when comparing files to find a perfect, or close, match.
3. Another point to consider for improved accuracy, especially as media libraries may contain mixed music file formats (e.g. both FLAC and MP3), is to restrict the search to the original format. This could probably be achieved with an optional setting. More on this later.
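To make the current approach concrete, here is a minimal Java sketch of the tokenize, score, sort and keep-the-best flow Borewit describes. It is not listFix's FileNameTokenizer: the splitting rule, dropping one-character words (assumed only so the output matches the Madonna example), and using the count of shared words as the score are all assumptions. The optional `sameFormatOnly` flag illustrates touwys's point 3; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ClosestMatchSketch {

    /** A candidate library file name together with its score. */
    record Match(String fileName, int score) {}

    // Splits a file name into words, dropping the extension and one-character
    // words: "01 Madonna - Like a Prayer.mp3" -> [01, Madonna, Like, Prayer].
    static List<String> tokenize(String fileName) {
        String base = fileName.replaceFirst("\\.[A-Za-z0-9]{1,5}$", "");
        List<String> tokens = new ArrayList<>();
        for (String w : base.split("[^\\p{L}\\p{N}]+")) {
            if (w.length() > 1) tokens.add(w);
        }
        return tokens;
    }

    static Set<String> lower(List<String> tokens) {
        Set<String> out = new HashSet<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Stand-in score: number of distinct words (case-insensitive) both names share.
    static int score(String a, String b) {
        Set<String> shared = lower(tokenize(a));
        shared.retainAll(lower(tokenize(b)));
        return shared.size();
    }

    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Scores every library file name against the missing entry, sorts by score
    // (highest first, ties keep library order) and keeps at most `limit` matches.
    // With sameFormatOnly, candidates with a different extension are skipped.
    static List<Match> closestMatches(String missing, List<String> library,
                                      int limit, boolean sameFormatOnly) {
        List<Match> matches = new ArrayList<>();
        for (String candidate : library) {
            if (sameFormatOnly && !extension(candidate).equals(extension(missing))) continue;
            int s = score(missing, candidate);
            if (s > 0) matches.add(new Match(candidate, s));
        }
        matches.sort(Comparator.comparingInt(Match::score).reversed());
        return matches.subList(0, Math.min(limit, matches.size()));
    }
}
```

Searching for "Madonna - Like a Prayer.mp3" in a library holding the MP3 and FLAC versions of "01 Madonna - Like a Prayer" plus "Like a Virgin.mp3" gives both copies of the song three shared words; with `sameFormatOnly` set, the FLAC copy is skipped. Because the score only counts overlapping words, a short, loosely related name can still land among the top results, which matches the behaviour reported in this issue.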
Borewit (29 Apr 2023 11:48:21):
The point is, there is no original file to compare with, @touwys. Let's assume you have the following M3U playlist:
#M3U
C:\Users\Borewit\Music\Rodriguez - Rich Folks Hoax.flac
Yet that file does not exist at that location, since I moved it to:
C:\Users\Borewit\Music\Rodriguez\1970 - Cold Fact\Rich Folks Hoax.flac
So we only have the path: no size, no tags, nothing else. File size is not a reliable indicator either, as it strongly depends on the encoding of an audio file. You may want to restore a playlist while replacing MP3 tracks with FLAC.
Thanks, I stand corrected.
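The Rodriguez example is exactly where Borewit's second idea (parent folder names as extra title/artist words) would help. A minimal Java sketch, assuming a hypothetical score defined as the fraction of the playlist entry's file-name words that appear in the candidate's file name or in its nearest parent folder names; class and method names are hypothetical, and this is not how listFix currently scores.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ParentFolderTokens {

    // Lowercase words of a single path segment.
    static List<String> words(String segment) {
        List<String> out = new ArrayList<>();
        for (String w : segment.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Words of the file name (extension stripped) plus the words of up to
    // parentLevels enclosing folder names. Accepts both '\' and '/' separators.
    static Set<String> tokens(String path, int parentLevels) {
        String[] parts = path.split("[\\\\/]");
        Set<String> out = new HashSet<>();
        String fileName = parts[parts.length - 1].replaceFirst("\\.[A-Za-z0-9]{1,5}$", "");
        out.addAll(words(fileName));
        for (int level = 1; level <= parentLevels && parts.length - 1 - level >= 0; level++) {
            out.addAll(words(parts[parts.length - 1 - level]));
        }
        return out;
    }

    // Fraction (0.0 to 1.0) of the playlist entry's file-name words found in the
    // candidate's file name and its nearest parentLevels folder names.
    static double score(String entry, String candidate, int parentLevels) {
        Set<String> wanted = tokens(entry, 0);
        if (wanted.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(wanted);
        shared.retainAll(tokens(candidate, parentLevels));
        return (double) shared.size() / wanted.size();
    }
}
```

Looking only at file names, the moved file and a made-up decoy C:\Users\Borewit\Music\Various\Rich Folks Hoax.mp3 both score 0.75. With two parent levels, the moved file scores 1.0 because "Rodriguez" is found in its folder name, while the decoy stays at 0.75. Only the candidate side gets folder words here, so folders such as Music on the playlist side do not dilute the score; how many levels to include is a tuning choice.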
Then I probably don't understand point 2 of #120 (comment).
Your reply was spot-on. It is I who am on very unfamiliar terrain when it comes to untangling the intricacies of Windows and other software operations, and how they interconnect. I took note that there is literally nothing else to work with than the bare file name. If the horse is already dead, how many ways are left to saddle it? The salient question is whether it is still possible to improve the quality of the listFix() search results. The restrictions imposed by the file name ("what" to search for) don't apply to the method of search ("how" to search), and this could be the more fruitful avenue of investigation.
This is an issue observed throughout testing:
The issue here is that the matched tracks found by listFix() do not even remotely match, or resemble, the original track. This happens frequently, though not in every instance.
The main thrust of the issue, however, is that these total mismatches occur while numerous almost identical copies are known to exist in the Media Directory.
Also refer to the discussion here.