Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out how to support dynamic data_files #127

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,9 +108,28 @@ setup(cmdclass=cmdclass)
- This package does not work with the deprecated `python setup.py bdist_wheel` or `python setup.py sdist` commands, PyPA recommends using the [build](https://pypa-build.readthedocs.io/en/latest/index.html) package (`pip install build && python -m build .`).
- We recommend using `include_package_data=True` and `MANIFEST.in` to control the assets included in the [package](https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html).
- Tools like [`check-manifest`](https://github.com/mgedmin/check-manifest) or [`manifix`](https://github.com/vidartf/manifix) can be used to ensure the desired assets are included.
- Simple uses of `data_files` can be handled in `setup.cfg` or in `setup.py`. If recursive directories are needed use `get_data_files()` from this package.
- Simple uses of `data_files` can be handled in `setup.cfg` or in `setup.py`. For more advanced usage, see below paragraph.
- Unfortunately `data_files` are not supported in `develop` mode (a limitation of `setuptools`). You can work around it by doing a full install (`pip install .`) before the develop install (`pip install -e .`), or by adding a script to push the data files to `sys.base_prefix`.

### Advanced Data Files Usage

If the simple data_files support of `setup.cfg` is not sufficient for your needs, use `get_data_files()` from this package. The simplest use case is for including checked in files in a recursive (glob's `**` pattern) manner. In this case, set the `data_files` argument to the `setup()` function in `setup.py` to the output of the `get_data_files()` function.

If you need to include files that are generated during the build, then add the following logic to the end of your `pre_dist` command:
```python
builder = npm_builder()
def build_with_datafiles():
builder()
# needed as some dirs are generated as part of the build (or lazy):
<need reference to dist here!>.data_files = get_data_files([
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the issue. How do we dynamically update the data files here? Do we need to pass a reference of the Command to the builder function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called after the build step, which means the glob should work on files that are there after building.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. To be clear, I expect get_data_files to work here, but the question is what we do with that list of files? What attribute do we set to tell the build system "hey! add these files to the collection of data files" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets passed to setuptools.setup(). I much prefer the way flit handles data files now: you give it a populated directory and it maps it right to the data directory in the wheel, see #128

Copy link
Member

@minrk minrk Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran into this in IPython Parallel. The problem here is that the data files aren't known until after builder() is called, and can change if builder is called to regenerate files already present (see ipython/ipyparallel#675). So the choice is:

  1. run builder() unconditionally before get_data_files() before passing it to setup
  2. run builder() in the normal fashion in pre_dist and then modify dist.data_files. This is why the deprecated implementations had the handle_files. This step is still required to produce correct output, but it has been removed.

It can be helped if these hooks were passed the distribution as an arg, rather than no args.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the simplest (and most flit-like) way to do it is instead of running the builder as a pre_dist, etc. step, run it at the top-level of setup.py (perhaps conditional on some sys.argv to avoid triggering it too often).

That's what I ended up doing in ipython/ipyparallel#680 with extra steps to avoid the unnecessary rebuilds that caused ipython/ipyparallel#675.

  • if generated files are not present: always run the build and skip the custom command classes (e.g. install from git or git-archive)
  • If they are present
    • and running from git: hook up command classes, since they may want rebuilds
    • and not running from git (probably sdist): skip rebuild

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @minrk, I did something similar in jupyterlab/jupyterlab#12208. I agree we should make this cleaner and provide better guidance.

("etc/jupyter", "jupyter-config", "**/*.json"),
])

cmdclass = wrap_installers(
pre_dist=build_with_datafiles,
)
```

## Development Install

```bash
Expand Down