-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW]: cifkit: A Python package for coordination geometry and atomic site analysis #7205
Comments
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks. For a list of things I can do to help you, just type:
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
|
|
Software report:
Commit count by author:
|
Paper file info: 📄 Wordcount for ✅ The paper includes a |
License info: ✅ License found: |
Review checklist for @espottesmithConflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
|
Review checklist for @ml-evsConflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
|
Paper comments:
Code comments:
Documentation comments:
Overall comments:
|
This package exposes some convenience and utility functions on top of the gemmi CIF parser, for handling coordination and bonding analysis, polyhedron visualisation and data cleaning for primarily inorganic crystal structures represented as CIFs. The headline is that I am not sure this package is sufficiently robust to the various intricacies of CIF (or at least, is not well-documented enough to cover them). I've raised some specific technical issues at bobleesj/cifkit#27. I'll try not to repeat @espottesmith's points (which I broadly agree with), so here's my first round of comments.
|
@espottesmith @ml-evs, thank you for testing the package and for the valuable feedback. I will address each point after internalizing the feedback and reflecting on it in the next iteration. |
@editorialbot add @lancekavalsky as reviewer Thanks for agreeing to review, Lance! Adding you over here so you can get started. |
@lancekavalsky added to the reviewers list! |
Review checklist for @lancekavalskyConflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
|
In this work, the Code comments:
Documentation comments:
Paper comments:
Overall, this work would make a great addition to JOSS pending the minor revisions described above. |
@lancekavalsky Thank you for your testing and all of the comments to improve the package and the manuscript! Please allow me to address each point when a new update is ready. |
Hi @bobleesj, just wanted to check in on how the updates are going! |
@rkurchin Thanks for checking in with me. I have made progress and addressed about halfway through. Would you please allow me to respond by the third week of October? |
Hi there @bobleesj, checking in on this again! |
@rkurchin Yes, the features addressed by the reviewers have been implemented. Please look out for our responses by today or, at the latest, tomorrow! |
Response to @espottesmithPaper comments:
Thank you for your suggestion. The summary now incorporates what *.cif files are: `cifkit` provides higher-level functions and properties for coordination
geometry and atomic site analysis from .cif files, which are standard file
formats for storing crystallographic data such as atomic fractional coordinates,
symmetry operations, and unit cell dimensions.
This is a valid point. I have modified the "Statement of Need" and emphasized the utility of higher-level functions by modifying the section as follow: `cifkit` distinguishes itself from existing libraries by offering higher-level
functions and variables that allow solid-state synthesists to obtain intuitive and
measurable properties impactful properties. It facilitates the visualization of
coordination geometry from each site using four coordination determination
methods and extracts physics-based features like volume and packing
efficiency, which are crucial for structural analysis in machine learning tasks...
Thank you! Code comments:
Modified. Thank you.
In
I have added numpy-style docstrings for the main classes Documentation comments:
Thank you for your suggestion. I have now maintained the instance name
I have added the statement of need in the beginning of the docs: https://bobleesj.github.io/cifkit/.
Thank you. I have created a PR template https://github.com/bobleesj/cifkit/blob/main/.github/pull_request_template.md and instructions on how to contribute here https://github.com/bobleesj/cifkit/blob/main/CONTRIBUTING.md. Overall comments:
Thank you. For one of the projects mentioned, SAF (Structure/Composition Analyzer), which uses approximately 100 features generated using cifkit, we have a pre-print version available: https://chemrxiv.org/engage/chemrxiv/article-details/670aa269cec5d6c142f3b11a. I will continue to maintain the software for on-going research efforts.
Yes, .cif files must be provided for us to conduct analysis.
I have added information to the documentation stating that processing approximately 10,000 .cif files on a standard laptop (iMac with M1 chip) takes about 30 to 60 minutes. At this rate, we can process nearly all .cif files within 1–2 days. However, I have decided not to include performance metrics in the manuscript. Our code may evolve further—for instance, by using matrix multiplications to compute distances, as in ASE—and performance hasn't been a major bottleneck for our research.
Thank you for your review! |
Response to @lancekavalskyCode comments:
Thank you for the details. Yes, the histogram images were saved to the Anaconda environment, as that was where the .cif files were provided by default in the installed package. In the docs, I have included information about options to set the output path for users for clarity: # Optional: Specify the output directory where the .png file will be saved.
ensemble.generate_site_mixing_type_histogram(output_dir="path/to/directory")
# Optional: Call plt.show() to display the histogram on screen.
ensemble.generate_site_mixing_type_histogram(display=False) For API doc users, I have included docstrings: def generate_supercell_size_histogram(
self, display=False, output_dir=None
):
"""Generate a histogram of the 'supercell_count' property from CIF files.
This method creates a histogram based on the 'supercell_count' statistics of
the CIF files. If 'output_dir' is specified, the histogram image (.png) will be
saved to that directory. If 'output_dir' is not specified, the image will be saved
to the directory specified by 'self.dir_path'.
Parameters
----------
display : bool, optional
If True, the plot is displayed using plt.show(). Default is False.
output_dir : str, optional
The directory path where the histogram should be saved. If None,
the histogram is saved in the directory defined by 'self.dir_path'.
"""
Per your suggestion, I have added NumPy-style docstrings to the core
Thank you. I have addressed the issue via this PR: bobleesj/cifkit#47. To ensure compatibility, I created a test function to ensure that raw .cif files sourced from ICSD, COD can be parsed and the supercell can be generated: @pytest.mark.parametrize(
"cif_folder_path, expected_file_count, expected_supercell_stats",
[
("tests/data/cif/sources/ICSD", 4, {216: 2, 307: 1, 320: 1}),
("tests/data/cif/sources/COD", 2, {519: 1, 1383: 1}),
("tests/data/cif/sources/MP", 2, {108: 1, 594: 1}),
("tests/data/cif/sources/PCD", 1, {364: 1}),
("tests/data/cif/sources/MS", 1, {2988: 1}),
("tests/data/cif/sources/CCDC", 1, {3844: 1}),
],
) Documentation comments:
Thank you. I have included docstrings for the class CifEnsemble:
def __init__(
self,
cif_dir_path: str,
add_nested_files=False,
preprocess=True,
logging_enabled=False,
) -> None:
"""Initialize a CifEnsemble object, containing a collection of Cif
objects.
Parameters
----------
cif_dir_path : str
Path to the folder path containing .cif file(s).
add_nested_files : bool, optional
Option to include .cif files contained in sub-directories within cif_dir_path
, by default False
preprocess : bool, optional
Option to edit .cif files before initializing each into a Cif object,
by default True. Preprocess modifies atomic site labels in
atom_site_label. Some site labels may contain a comma or a symbol like M
due to atomic mixing. It reformats each atom_site_label so it can be
parsed into an element type matching atom_site_type_symbol. For PCD
databases, addresses in publ_author_address often have an incorrect
format requiring manual modifications. It also relocates any ill-formatted
files, such as those with duplicate labels in atom_site_label, missing
fractional coordinates, or files requiring supercell generation.
logging_enabled : bool, optional
Option to log while pre-processing Cif objects, by default False
Attributes
----------
dir_path: str
Path to the folder containing .cif files
file_paths: list[str]
List of file paths to .cif files
cifs: list[Cif]
List of Cif objects
file_count: int
Number of .cif files in the folder
logging_enabled: bool
Option to log while pre-processing Cif objects
"""
I have revised the documentation to enhance clarity and personally tested each example to ensure accuracy. Additionally, I have included comments indicating the location of default Example files provided in the package for first-time users: # In `cifkit` we provide .cif files that can be accessed through `from cifkit import Example` as shown below. For advancuser, these example .cif files are located under `src/cifkit/data` in the package.
from cifkit import Example
from cifkit import Cif
# Initialize with the example file provided
cif = Cif(Example.Er10Co9In20_file_path)
...
I have included detailed documentations in the API. https://github.com/bobleesj/cifkit/blob/dbaf32400b70f323ba5965526193704b2613ea7b/src/cifkit/models/cif.py#L619 and https://github.com/bobleesj/cifkit/blob/dbaf32400b70f323ba5965526193704b2613ea7b/src/cifkit/models/cif.py#L682.
Thank you for your feedback. I have replaced the name "Notebooks" but into 4 sections:
I have included a PR request template (https://github.com/bobleesj/cifkit/blob/main/.github/pull_request_template.md) as well as the Paper comments:
Please kindly read (longer) comment below in response to @ml-evs point
Thank you for this feedback. I have added higher-level functions that are considered novel in our package and included more explanatory comments.
Thank you for your review and recommendation! |
Response to @ml-evs
This is a valid comment considering most users end up using the packages that are built on top of
Based on reviewer feedback and internal discussions, we concluded that our package's strength lies not in high-throughput processing (ASE and pymatgen excel at this), but in its specific features for "coordination geometry" and "atomic site analysis." These features are crucial for experimental research and are often tediously acquired from GUI-based software. Consequently, I've modified the manuscript title to "cifkit: A Python package for coordination geometry and atomic site analysis" and removed "high-throughput." To clarify, our current implementation doesn't include batch processing. We process each .cif file individually. With approximately 1,000 atoms in the supercell, processing takes about 0.3 seconds for one file and 3,000 atoms—less than 3 seconds total. I've added to the documentation that processing roughly 10,000 .cif files on a standard laptop (iMac with M1 chip) takes about 30 to 60 minutes. At this rate, we can process nearly all .cif files within 1–2 days. However, we've decided not to include performance metrics in the manuscript. Our code may evolve further—perhaps using matrix multiplications to compute distances, as in ASE—and performance hasn't been a major bottleneck for our research. Regarding error handling, we've processed tens of thousands of binary and ternary .cif files, primarily from PCD and thousands from ICSD. We've aimed to identify the most common errors to ensure we can use as many .cif files from the database as possible. Of course, there are some cases where .cif files lack space group operations or fractional coordinates. In such instances, we move those files into separate folders before initializing them into objects. We've also tested with COD, CCSD, and other datasets to encounter more diverse errors, which users can report later for me to address.
Thank you for your suggestion. All dependency pinnings have been removed, and CI now supports Python 3.10, 3.11, and 3.12. Additionally, the mention of Codecov has been removed.
Based on your feedback, cifkit now supports CIF files directly downloaded from ICSD, COD, PCD, CCDC, and those created with Materials Studio. I have retained the
This is greatly appreciated. The Gemmi has been cited, along with PyVista, another JOSS paper. Additionally, citations for NumPy, SciPy, and Matplotlib have been included.
Thank you. Added in the manuscript.
Since Gemmi provided sufficient parsing capabilities for our tasks with minor fixes done by cifkit such as such as adding “#” to the first line to ICSD .cif files, we have not incorporated additional CIF parsers.
Yes, cikfit and its applications are actively used by approximately 4-5 research groups. The SAF and CBA modules each have their own descriptions in respective journal publications, making it challenging to consolidate them under a broad `cifkit' umbrella. We would prefer to serve as an 'engine' that provides advanced functionality. Therefore, we do not want to combine packages into one. However, features will continue to be developed on demand, primarily for experimental groups led by Prof. Oliynyk, both within his group and through his active collaborations. |
@rkurchin How can I re-render the manuscript here? It's in the |
![]() Fixed! |
@editorialbot set 10.5281/zenodo.14086943 as archive |
Done! archive is now 10.5281/zenodo.14086943 |
@bobleesj can you clarify what the contributions of Anton Oliynyk were in that case? We usually do want the authors of the archive and the paper to match if possible, though I believe there can be exceptions made; @openjournals/bcm-eics can perhaps jump in in that regard. |
Anton Oliynyk's contributions include not limited to conceptually designing the specific methods embedded in the applications mentioned in the manuscript. He also validated these functions using GUI-based software and also provided feedback when new bugs are encountered and how to navigate the problem together. |
Not sure if |
@editorialbot recommend-accept |
|
|
👋 @openjournals/bcm-eics, this paper is ready to be accepted and published. Check final proof 👉📄 Download article If the paper PDF and the deposit XML files look good in openjournals/joss-papers#6131, then you can now move forward with accepting the submission by compiling again with the command |
@bobleesj as AEiC for JOSS I will now help to process this submission for acceptance in JOSS. Below are some final checks I'm processing, some of which may need your attention. Checks on repository
Checks on review issue
Checks on archive
Checks on paper
Remaining points:As you can see, most seems in order, however the below are some points that require your attention 👇 :
|
@Kevin-Mattheus-Moerman Thank you!
Now, it is updated:
Now
|
@editorialbot generate pdf |
@Kevin-Mattheus-Moerman I also created a PR openjournals/joss#1394 here since one of the example |
@editorialbot accept |
|
Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository. If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file. You can copy the contents for your CITATION.cff file here: CITATION.cff
If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation. |
🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘 |
🦋🦋🦋 👉 Bluesky post for this paper 👈 🦋🦋🦋 |
🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨 Here's what you must now do:
Any issues? Notify your editorial technical team... |
Thanks to everyone for your valuable time! @rkurchin @espottesmith @ml-evs @lancekavalsky @Kevin-Mattheus-Moerman |
@bobleesj congratulations on this JOSS publication! @rkurchin thanks for editing! And a special thanks to the reviewers: @espottesmith, @ml-evs, @lancekavalsky !! |
🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉 If you would like to include a link to your paper from your README use the following code snippets
This is how it will look in your documentation: We need your help! The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following:
|
Just wanted to share the donation link doesn't seem to be appearing and it has been addressed here: openjournals/joss#1396 tagging @xuanxu as well |
Submitting author: @bobleesj (Sangjoon Lee)
Repository: https://github.com/bobleesj/cifkit
Branch with paper.md (empty if default branch): main
Version: 1.0.6
Editor: @rkurchin
Reviewers: @espottesmith, @ml-evs, @lancekavalsky
Archive: 10.5281/zenodo.14086943
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@espottesmith, @ml-evs, and @lancekavalsky, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @rkurchin know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @espottesmith
📝 Checklist for @ml-evs
📝 Checklist for @lancekavalsky
The text was updated successfully, but these errors were encountered: