Update and expansion of the BfG spectral data #275
@olessmn and @ksjewell, thanks for the updates to the BAFG records. I just ran the validation workflow, and there are some encoding issues with German umlauts and, apparently, the degree symbol. Hence, I would like to ask you to:
There is no need to close this pull request. As soon as you push the corrected data, the PR will update automatically. Best,
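Before pushing corrected records, a quick local check can catch this kind of problem. The sketch below is hypothetical and is not part of the MassBank validation workflow. It assumes the records are expected to be UTF-8, and it treats the sequences "Ã" and "Â" as a heuristic sign of mojibake, which is what "ü" or "°" typically turn into when UTF-8 text is decoded as Latin-1 and saved again. The names `find_encoding_problems` and `check_directory` are illustrative only.

```python
from pathlib import Path

# Heuristic markers: "ü" -> "Ã¼" and "°" -> "Â°" when UTF-8 bytes are
# misread as Latin-1/cp1252. Illustrative, not exhaustive.
MOJIBAKE_MARKERS = ("Ã", "Â")


def find_encoding_problems(path: Path) -> list[str]:
    """Return human-readable encoding problems found in one record file."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")  # strict decoding: fails on Latin-1 umlauts
    except UnicodeDecodeError as err:
        return [f"{path.name}: not valid UTF-8 at byte {err.start}"]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(marker in line for marker in MOJIBAKE_MARKERS):
            problems.append(f"{path.name}:{lineno}: possible mojibake: {line.strip()}")
    return problems


def check_directory(directory: str) -> list[str]:
    """Check every .txt record in a directory, in file-name order."""
    problems = []
    for path in sorted(Path(directory).glob("*.txt")):
        problems.extend(find_encoding_problems(path))
    return problems
```

A correctly encoded "°C" or "ä" passes, while a file saved as Latin-1 or one containing double-encoded text is reported. Because the marker check is only a heuristic, the output is a list of lines to inspect, not an automatic fix.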
Thank you @tsufz for the quick feedback! Best wishes,
@olessmn, thanks,
Thank you for your contribution. I'm pretty sure we will get this data in, but we need to clarify a few things and then modify the submission according to your information. I will assist you in this process. You wrote that you corrected a number of datasets and added 646 new spectra. The corrections concern the metadata, such as CAS numbers or InChI. I manually checked a few files, and here is my guess at what you have done.
You removed all files from your last submission and replaced them with new files from your new submission. For your ACCESSION, you use a code composed of a few letters, a date, and a running number. Because you created a new batch of files, they now have a new date, resulting in a new accession, e.g., MSBNK-BAFG-CSL2311091 -> MSBNK-BAFG-CSL2501101 and MSBNK-BAFG-CSL2311092 -> MSBNK-BAFG-CSL2501102. The content of each dataset stays the same; in particular, the measurement is the same. Of course, this only applies to the spectra from your last contribution.
If this is the case, we don't want to change the ACCESSION, but only modify the content. If you actually wanted to remove your old data, we would add a line at the top of each file that says DEPRECATED and gives a reason for the deprecation. Here is an example. The dataset would stay there as a tombstone to prevent reuse of that ID. The reason for this is easy to explain: after the release of the data, users can refer to it by ACCESSION, and if we shift our ACCESSIONs and/or data around, this will create a mess.
If you want, I can assist you in fixing this. I would suggest that we move the data from your last contribution back to its old names and release the 646 new spectra under the new names. If you agree, I could do that for you and, in this process, also make the small changes Tobias has mentioned. BTW, we don't want
So please tell me if you agree that I tear apart your contribution and rearrange it as described.
The contribution of course belongs to you guys, but some of your commits might vanish from the Git log and reappear under my username. I hope that's not an issue for you.
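The rule discussed here can be sketched as a small export step: keep the ACCESSION of every spectrum that was already released and mint new accessions only for new spectra. This is a hypothetical illustration, not the BfG export code. It assumes each spectrum has a stable internal ID to key on and that a mapping from those IDs to released accessions is available. It also uses a simplified letters + date (YYMMDD) + running-number scheme modeled on the examples above. The function name `assign_accessions` and its parameters are illustrative.

```python
from datetime import date


def assign_accessions(spectrum_ids, published, prefix, today):
    """Keep released accessions stable; mint new ones only for new spectra.

    spectrum_ids: internal IDs of all spectra in this export (hypothetical key)
    published:    dict mapping internal ID -> ACCESSION already released
    prefix:       accession prefix, e.g. "MSBNK-BAFG-CSL"
    today:        export date, used only for newly minted accessions
    """
    stamp = today.strftime("%y%m%d")
    used = set(published.values())  # never hand out an accession twice
    result = {}
    counter = 0
    for sid in spectrum_ids:
        if sid in published:
            result[sid] = published[sid]  # released record: never renamed
            continue
        # New spectrum: take the next running number not already in use.
        while True:
            counter += 1
            candidate = f"{prefix}{stamp}{counter}"
            if candidate not in used:
                break
        used.add(candidate)
        result[sid] = candidate
    return result


if __name__ == "__main__":
    released = {"s1": "MSBNK-BAFG-CSL2311091", "s2": "MSBNK-BAFG-CSL2311092"}
    print(assign_accessions(["s1", "s2", "s3"], released,
                            "MSBNK-BAFG-CSL", date(2025, 1, 10)))
```

Re-exporting on a later date leaves `s1` and `s2` untouched, and only `s3` receives a new accession (`MSBNK-BAFG-CSL2501101` here). Regenerating every file therefore stays harmless as long as the ID-to-accession mapping is kept.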
Hi @meier-rene, hi @tsufz, thank you for your response and support! Sorry, I should have clarified this better from the beginning: yes, in the initial submission (and the following ones), I removed all files and replaced them with the updated files (regenerating all files upon changes is easier for me). However, I missed that the ACCESSION shouldn't be changed once a file is in the MassBank repository. I changed my code accordingly, so the ACCESSION strings of existing MassBank files won't be updated when I export the files (see explanation below). I made the following changes and reuploaded all files:
For the ACCESSION we use the format Example 1:
The Example 2:
Please let me know if there are any other issues, and thank you for your patience. Best wishes,
Excellent contribution. In the last validation report, there were just 4 minor issues left, concerning the molecular formula of MSBNK-BAFG-CSL25011734709-11. It was given as a negatively charged formula, while it should be neutral. I will make a quick manual fix.
Your records state that they were created with "Export with pycsl 0.1.0.dev1 and CSL 25.0". Can you point me to that resource, or is it an in-house solution? I couldn't find anything.
Hi @meier-rene,
Regarding the export information: pycsl 0.1.0.dev1 is the Python program we are currently developing to perform various operations on the CSL (e.g., exporting all BfG spectra, as in this case). It is currently for in-house use only, but we plan to release it on GitHub later this year. We included these two pieces of information in the And thank you for fixing the last issues! Ole
We are very happy with your contribution. I was just curious about the software, because I had never heard of it and couldn't find anything. If you release this software for public use later, I would be happy if you could ping me again; we would then investigate whether it could be useful for others.
Dear MassBank Team,
We have made several updates and additions to our spectral data. This pull request includes the following changes:
All .txt files were regenerated and replaced the previous files.
Please let us know if you have any feedback or require further adjustments.
Thank you!
Best regards,
@olessmn & @ksjewell