data/config path entry_points with minimal examples #209

Closed
wants to merge 54 commits into from

Conversation

bollwyvl
Contributor

@bollwyvl bollwyvl commented Nov 21, 2020

Background

Jupyter relies on a hierarchy of directories (user-level, environment-level, system-level, etc.) to store configuration and data. These directories are used by a number of Jupyter programs, for example:

  • Most applications based on the traitlets Configurable application class store configuration in JSON files in the configuration directories. They also aggregate conf.d-style configuration from these directories to determine settings of options.
  • Jupyter Notebook extensions copy their javascript assets into a data directory on installation for the server to serve
  • JupyterLab extensions copy their javascript assets into a data directory on installation for the server to serve.
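As a rough illustration of the conf.d-style aggregation described in the first bullet, here is a minimal sketch using only the standard library. The function names `merge` and `load_section` are hypothetical, and the assumption that directories arrive ordered from lowest to highest priority is a simplification for the example, not a statement about how jupyter_core orders its paths.

```python
import json
from pathlib import Path


def merge(base, update):
    """Recursively merge ``update`` into a copy of ``base``."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_section(directories, section):
    """Merge ``<section>.d/*.json`` and ``<section>.json`` from each directory.

    ASSUMPTION: ``directories`` is ordered from lowest to highest priority.
    """
    config = {}
    for directory in map(Path, directories):
        conf_d = directory / f"{section}.d"
        # conf.d fragments are applied in sorted order, then the main file
        files = sorted(conf_d.glob("*.json")) if conf_d.is_dir() else []
        main = directory / f"{section}.json"
        if main.is_file():
            files.append(main)
        for path in files:
            config = merge(config, json.loads(path.read_text(encoding="utf-8")))
    return config
```

Every extra directory in the hierarchy adds more files to this loop, which is why the number of directories matters for both startup time and debuggability.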

Problem

Currently the environment level of this directory hierarchy is a fixed location based on sys.prefix. This means that packages need to copy their files into this directory at install time, which has several issues:

  • Copying files into a data directory uses the data_files feature of Python packages, which is deprecated in setuptools and is not supported in non-setuptools-based packagers like flit, poetry (see here), etc.
  • Data files are duplicated in the package bundle (once for copying into the data directory, once for being included in the actual package to install into site-packages). For some extensions, this is huge (megabytes or tens of megabytes).
  • Development installs (pip install -e) do not update data files when the source files change, so when developing a package, any change to the data files means either copying them over again or running a command that makes the appropriate data directory a symbolic link (not available on some platforms) to the source files.

(Also, it seems that sometimes these data file directories are not deleted. For example, in JupyterLab we actually create files at runtime in the data directory, and I think they don't get deleted when JupyterLab is uninstalled)

Proposed solution

Python has another mechanism that is explicitly designed for plugin systems called entry points. An entry point is a piece of metadata in a package that points to an arbitrary import from the package. This PR changes jupyter_core to look for two specific entry points in any installed package, each pointing to a list of paths, to augment the environment-level Jupyter config directories (the jupyter_config_paths entry point) and data directories (the jupyter_data_paths entry point). The result is:

  • Any package can add new environment-level Jupyter config and data directories. In practice, this means that a package can contain data or configuration in a directory that is installed in its site-packages directory, and can use the entry point to point Jupyter to that internal directory. Since this directory is internal to the package:
    • the files are not duplicated in the package tarball
    • development (un)installs automatically work, since the directory points to an internal directory in the package
    • other python package managers can be used, like poetry using its include/exclude mechanism for files
  • non-Python programs can access this (and all other paths) by shelling out to jupyter --paths --json
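To make the discovery step concrete, here is a minimal sketch of collecting paths from one entry point group with the standard library's `importlib.metadata`. The group names `jupyter_config_paths` and `jupyter_data_paths` come from the description above; the helper name `entry_point_paths` is hypothetical, and this is not necessarily how the PR itself implements the lookup (for instance, it always imports, with no static parsing).

```python
from importlib import metadata


def entry_point_paths(group):
    """Return every path advertised by installed packages under ``group``."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict keyed by group name
        selected = eps.get(group, [])
    paths = []
    for ep in selected:
        # load() imports the target module and returns the named attribute,
        # assumed here to be an iterable of paths
        paths.extend(str(path) for path in ep.load())
    return paths
```

Calling `entry_point_paths("jupyter_config_paths")` in an environment with no such entry points simply returns an empty list.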

Problems with the proposed solution

  • Entry points are based on importing a module to get a value, which potentially could be very expensive. We explore parsing the file first for literal values, and then importing as a last resort, which seems to alleviate this problem in the common case (setuptools does something similar for its attr handler for setup.cfg values).
  • neither entry_point group is cached
    • an interactive installation with e.g. pip install or conda install would be able to update the search path, provided the application isn't doing its own caching...
      • this is important to maintain the observed behavior of data_files
      • because the import system is invoked, users of this system may wish to create a separate python_packages entry for these static assets, to avoid bringing in otherwise-unused runtime dependencies, e.g. pandas
    • adding some debug logging around this will help pinpoint slow startup times
      • turns out there is no logging this deep in the stack. we could either:
        • add a log=None argument to the various calls
        • add a logger controlled by a JUPYTER_CORE_LOGLEVEL
  • if an entry_point is added (or its target is changed) in a package with an editable install, the package must be reinstalled
    • however, if only the return value of an existing entry_point is changed, no re-install is required
  • existing tools that rely on indexing into the result of jupyter_*paths() may break
    • this occurs in the test suite for jupyter_core itself: if one of the example packages is installed, the tests break
    • these will have to be updated to inspect relative positions, e.g. was the user dir loaded before or after the env paths when JUPYTER_PREFER_ENV is set
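The parse-first, import-as-a-last-resort idea from the first bullet of this list could be sketched as follows. The helper names `literal_attribute` and `resolve` are hypothetical, and this is an illustration of the technique using `ast`, not the code in this PR. Note that `importlib.util.find_spec` does not import a top-level module, but it does import the parent package when given a dotted name.

```python
import ast
import importlib
import importlib.util


def literal_attribute(module_name, attr):
    """Read a module-level literal from source without importing the module."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not spec.origin.endswith(".py"):
        raise LookupError(f"no python source found for {module_name}")
    with open(spec.origin, encoding="utf-8") as source:
        tree = ast.parse(source.read())
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == attr:
                try:
                    # only accepts literals: strings, numbers, lists, dicts...
                    return ast.literal_eval(node.value)
                except (ValueError, TypeError) as err:
                    raise LookupError(f"{attr} is not a literal") from err
    raise LookupError(f"{attr} not found in {module_name}")


def resolve(module_name, attr):
    """Try the cheap static parse first, then fall back to a real import."""
    try:
        return literal_attribute(module_name, attr)
    except LookupError:
        return getattr(importlib.import_module(module_name), attr)
```

With this shape, a package whose path list is a plain literal never pays for importing its heavier dependencies.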

Alternative solutions

setuptools also provides a way for a package to have custom metadata files in the egg or dist_info directories. This avoids the problems of importing or parsing an arbitrary python file to get the few strings that we need. However, it appears that this arbitrary metadata is not well supported outside of setuptools. See below for some experiments around this approach.

Example

See the setuptools example, specifically

[options.entry_points]
jupyter_config_paths =
    entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_CONFIG_PATHS
jupyter_data_paths =
    entry-point-example-setuptools = entry_point_example_setuptools:JUPYTER_DATA_PATHS

  • this approach requires a boilerplate MANIFEST.in and a setup.py in order to be installed from source

and the flit example, specifically

[tool.flit.entrypoints.jupyter_config_paths]
entry-point-example-flit = "entry_point_example_flit:JUPYTER_CONFIG_PATHS"
[tool.flit.entrypoints.jupyter_data_paths]
entry-point-example-flit = "entry_point_example_flit:JUPYTER_DATA_PATHS"

for examples of how to use these entry points.

  • pyproject.toml is the only boilerplate file needed, and generates a setup.py
  • flit can also generate binary reproducible whl files (for python >=3.7) given the same version of flit_core

Original issue description

Hey folks! Thanks for keeping this foundational technology working.

data_files are making me sad enough that I'm willing to bring this up again.

This is a low-downstream-impact way we could allow python packages to not require the ill-supported data_files technique.

To test:

pip install -e .
cd examples/entry_point_example
pip install -e .
jupyter --paths
# should see that development environment in place
pip uninstall entry_point_example
jupyter --paths
# it's gone

I don't know if it really works yet, down to the n-th downstream, but it seems it should if they are relying on jupyter_*_dir, and handling multiple paths already.

@maartenbreddels

I see this as a good alternative to using data_files without overhauling the config system. I am a bit worried that it's hard to debug when things go wrong (if 15 directories will be scanned). Could we maybe provide a richer debug facility to see a particular config key, and how each directory is changing it. Grepping in 15 directories will not be fun. Or do I see a problem that does not exist, and are the debug options sufficient?

@bollwyvl
Contributor Author

Grepping in 15 directories will not be fun

Yep, there will be a lot of directories beyond the Big Four. No doubt some combination of jupyter --paths, jq, and xargs would make grep plausible, but that's no fun!

A JupyterApp base flag like --show-config which every app would inherit is a whacking good idea, even outside of this little draft. It could probably use difflib to generate a decently-readable representation of the config before each file was loaded, and show the final config, perhaps something like:

$> jupyter foo --show-config

environment variables:
- JUPYTER_PREFER_ENV_PATH: not set
- ...

paths:
- /etc/jupyter/jupyter_config.json: not found
...
- ~/my-project/src/my_project/etc/jupyter_foo_config.d/my-project.json:

    + SomeHasTraits:
    +   foo: bar
...
- ~/my-project/src/my_project/.venv/etc/jupyter_config.d/someone-elses-project.json:

      SomeHasTraits:
    -   foo: bar
    +   foo: baz

...
- ./jupyter_foo_config.json: not found

final:

    SomeHasTraits:
      foo: baz

sprinkle in some pygments (if available) and it would be pretty usable.
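A minimal sketch of the difflib idea, assuming the config from each file is already available as a dict: the function names `render` and `explain` are hypothetical, the merge is deliberately shallow to keep the example short, and no `--show-config` flag is implied to exist.

```python
import difflib
import json


def render(config):
    """Serialize a config dict to stable, diff-friendly lines."""
    return json.dumps(config, indent=2, sort_keys=True).splitlines()


def explain(layers):
    """Yield report lines for an ordered iterable of (path, config) pairs."""
    merged = {}
    for path, layer in layers:
        before = render(merged)
        merged = {**merged, **layer}  # shallow merge keeps the sketch short
        diff = list(difflib.unified_diff(before, render(merged), lineterm="", n=1))
        yield f"- {path}:" + ("" if diff else " no changes")
        # skip the "---" and "+++" file header lines that unified_diff emits
        yield from (f"    {line}" for line in diff[2:])
    yield "final:"
    yield from (f"    {line}" for line in render(merged))
```

Printing `"\n".join(explain(layers))` gives one block per file with only the keys that file changed, followed by the final merged config.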

@maartenbreddels

Indeed, exactly what I had in mind, that would help a lot

@bollwyvl
Contributor Author

Gah, looking at it: a lot of the complexity is duplicated between jupyter_server and notebook... while both would work with this PR, there's no simple way to add the above config inspection.

Perhaps the better short-term approach would be to invert it, with a separate package/command, e.g. offered as jupyter show-config notebook FooHasTraits.bar. I guess this would work by overloading/monkeypatching config_manager_class (gaaah) with an instrumented subclass, and calling initialize but not start.

Because of that complexity, this could probably not land here, unless the ConfigManager pattern was brought upstream, which sounds hard to coordinate.
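The "instrumented subclass" idea could look roughly like the sketch below. `StandInConfigManager` is a hypothetical stand-in, not the real config manager class from notebook or jupyter_server, and `instrument` is a hypothetical helper; the only assumption carried over is that the manager exposes a `get(section_name)` method.

```python
class StandInConfigManager:
    """Hypothetical stand-in for a real config manager class."""

    def __init__(self, store):
        self.store = store

    def get(self, section_name):
        return self.store.get(section_name, {})


def instrument(manager_class, events):
    """Return a subclass of ``manager_class`` that records every ``get`` call."""

    class Instrumented(manager_class):
        def get(self, section_name):
            events.append(("get", section_name, None))
            value = super().get(section_name)
            events.append(("got", section_name, value))
            return value

    return Instrumented
```

An application could then be pointed at `instrument(SomeManager, events)` before initialization, and `events` inspected afterwards without ever starting the server.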

@bollwyvl
Contributor Author

bollwyvl commented Nov 22, 2020

I have an unshareably bad version of this, but it kinda works with notebook, jupyter_server, jupyterlab and voila installed:

getting jupyter_server_config from /etc/jupyter
got {}
getting jupyter_server_config from /usr/local/etc/jupyter
got {}
getting jupyter_server_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/nbclassic.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_server_config.d/voila.json
got {'ServerApp': {'jpserver_extensions': {'jupyterlab': True, 'nbclassic': True, 'voila.server_extension': True}}}
getting jupyter_server_config from /home/weg/.jupyter
got {}
getting page_config from /etc/jupyter/labconfig
got {}
getting page_config from /usr/local/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/labconfig
got {}
getting page_config from /home/weg/.jupyter/labconfig
got {}
[I 2020-11-22 17:50:37.177 ServerApp] jupyterlab | extension was successfully linked.
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
getting jupyter_notebook_config from /etc/jupyter
got {}
getting jupyter_notebook_config from /usr/local/etc/jupyter
got {}
getting jupyter_notebook_config from /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/jupyterlab.json
Reading file /home/weg/projects/jupyter_showconfig_/envs/default/etc/jupyter/jupyter_notebook_config.d/voila.json
got {'NotebookApp': {'nbserver_extensions': {'jupyterlab': True, 'voila.server_extension': True}}}
getting jupyter_notebook_config from /home/weg/.jupyter
got {}
[I 2020-11-22 17:50:37.322 ServerApp] nbclassic | extension was successfully linked.
[I 2020-11-22 17:50:37.322 ServerApp] voila.server_extension | extension was successfully linked.
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab extension loaded from /home/weg/projects/jupyter_showconfig_/envs/default/lib/python3.7/site-packages/jupyterlab
[I 2020-11-22 17:50:37.339 LabApp] JupyterLab application directory is /home/weg/projects/jupyter_showconfig_/envs/default/share/jupyter/lab
[I 2020-11-22 17:50:37.342 ServerApp] jupyterlab | extension was successfully loaded.
[I 2020-11-22 17:50:37.345 ServerApp] nbclassic | extension was successfully loaded.
[I 2020-11-22 17:50:37.347 ServerApp] voila.server_extension | extension was successfully loaded.

Update: here's some better stuff, generated with rich:

 op     ┃ section_name                                      ┃ path                              ┃ old_value ┃ new_value                                                  
━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 stage  │                                                   │                                   │           │ before-init                                                
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ io.open                           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 patch  │                                                   │ BaseJSONConfigManager.get         │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ before-constructor                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-constructor                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ nbclassic.json                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_server_config.d   │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "ServerApp": {                                           
        │                                                   │                                   │           │     "jpserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "nbclassic": true,                                   
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_server_config                             │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /etc/jupyter/labconfig            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /etc/jupyter/labconfig            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ /usr/local/etc/jupyter/labconfig  │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $SYS_PREFIX/etc/jupyter/labconfig │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ page_config                                       │ $HOME/.jupyter/labconfig          │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /etc/jupyter                      │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ /usr/local/etc/jupyter            │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ jupyterlab.json                   │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 open   │ $SYS_PREFIX/etc/jupyter/jupyter_notebook_config.d │ voila.json                        │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $SYS_PREFIX/etc/jupyter           │           │ {                                                          
        │                                                   │                                   │           │   "NotebookApp": {                                         
        │                                                   │                                   │           │     "nbserver_extensions": {                               
        │                                                   │                                   │           │       "jupyterlab": true,                                  
        │                                                   │                                   │           │       "voila.server_extension": true                       
        │                                                   │                                   │           │     }                                                      
        │                                                   │                                   │           │   }                                                        
        │                                                   │                                   │           │ }                                                          
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 get    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 got    │ jupyter_notebook_config                           │ $HOME/.jupyter                    │           │ {}                                                         
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ kernel_spec_manager                               │ ServerApp                         │           │                                             
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 change │ ssl_options                                       │ ServerApp                         │ {}        │                                                            
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ after-init                                                 
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ started                                                    
────────┼───────────────────────────────────────────────────┼───────────────────────────────────┼───────────┼────────────────────────────────────────────────────────────
 stage  │                                                   │                                   │           │ done              

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-do-we-uninstall-extensions-that-have-been-installed-using-jupyter-labextension-develop-overwrite/7845/5

Since entry points come from packages installed in the environment, I think it makes sense that they are treated like the environment paths
@jasongrout
Member

@bollwyvl - I made a PR to your PR with a few changes I thought would be good: bollwyvl#1. What do you think?

@bollwyvl
Contributor Author

bollwyvl commented Mar 10, 2021

importable python attributes
...
if the module to be imported is an empty file

I'll see if i can get that together. It doesn't add any non-stdlib dependencies, and it gives us some wiggle room for the future. One issue with having dotted notation to the left of the : is that the top-level module will get imported:

If name is for a submodule (contains a dot), the parent module is automatically imported.

So I don't know yet how we might avoid the import behavior... i suppose tossing a warning might help folk back onto the happy path of, my_top_level_name:STATICALLY_PARSEABLE_STRING, but there's nothing for it with e.g. namespace modules.

Test to see how this scales

That'll be more fun 😝

@bollwyvl
Contributor Author

With 1000 packages (so 2000 entry_points):

220.28ms jupyter_config_path loaded
7.39ms	jupyter_config_path	fake-mod-999
...
5877.88ms jupyter_config_path	TOTAL

251.32ms jupyter_data_path loaded
...
0.22ms	jupyter_data_path	fake-mod-0
6437.48ms jupyter_data_path	TOTAL

10.38user 2.12system 0:12.53elapsed 99%CPU (0avgtext+0avgdata 16536maxresident)k

@bollwyvl
Contributor Author

Starting lab:

jupyter lab --no-browser --debug
# a delay
[D 2021-03-09 21:57:16.340 ServerApp] Searching 
[D 2021-03-09 21:58:16.178 ServerApp] 200 GET /api/contents?content=1&1615345084409 (127.0.0.1) 2.36ms

a minute to first pixels isn't too pretty 😢

@bollwyvl
Contributor Author

throwing in a little bit of cache helps immeasurably... well, measurably... but i haven't measured it.

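import functools
import math
import time

def _entry_point_paths(ep_group):
    # the epoch argument changes every 100 seconds, so cached results expire
    return _cached_entry_point_paths(ep_group, math.floor(time.time() / 100))

@functools.lru_cache(maxsize=10)
def _cached_entry_point_paths(ep_group, epoch):
    ...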
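For anyone wanting to try the time-bucketed cache in isolation, here is a self-contained sketch of the same trick. `slow_lookup`, `make_cached_lookup`, the `CALLS` list and the injectable `clock` argument are all hypothetical additions so that the expiry can be exercised without waiting; the `/hypothetical/...` path is a placeholder, not real output.

```python
import functools
import math
import time

CALLS = []  # records each time the expensive lookup actually runs


def slow_lookup(group):
    """Stand-in for the expensive entry point scan."""
    CALLS.append(group)
    return (f"/hypothetical/{group}",)


def make_cached_lookup(ttl_seconds=100, clock=time.time):
    """Build a lookup whose results are reused within one ttl-sized bucket."""

    @functools.lru_cache(maxsize=10)
    def _cached(group, epoch):
        # epoch is unused in the body; it only serves as part of the cache key
        return slow_lookup(group)

    def lookup(group):
        return _cached(group, math.floor(clock() / ttl_seconds))

    return lookup
```

One consequence of this design is that entries expire on bucket boundaries rather than a fixed time after they were stored, so a freshly installed package may take anywhere from zero to `ttl_seconds` to show up.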

@jasongrout
Member

a minute to first pixels isn't too pretty 😢

Ouch. I suppose it does have to open lots of files, which is going to be an even bigger pain on NFS and slower filesystems.

@jasongrout
Member

For completeness in documenting discussions in Jupyter around entry points, see also jupyter/notebook#2894.

@jasongrout
Member

Unfortunately, as far as I can tell, conda does not support general entry points, just console_script entry points: conda/conda#9951 (see also https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#python-entry-points). I think this means many of the packages in our ecosystem would have problems using entry points, as they provide official conda-forge builds as well.

@bollwyvl
Contributor Author

conda does not support general entry points

Put back your pitchforks, no worries here! If conda ignored every other entry point group, a number of ecosystems (like pytest) would fall apart entirely. The reason conda-build handles console_scripts explicitly is that the default wrappers generated for them are usually broken. So if your package isn't adding a jupyter <mycmd> or something, it wouldn't even have that custom entry under build.

@bollwyvl
Contributor Author

bollwyvl commented Mar 11, 2021

Also: tried the PEP 420 namespace package thing... it might be a non-starter since, totally unsurprisingly, the files wouldn't be in place with a pth file (such as on Windows) in an editable install. I think "won't ever work for flit" and "won't work well on Windows" are two pretty damning findings.

@jasongrout
Member

Put back your pitchforks, No worries here!

I was just testing things to see if what conda recipes call "entry points" are in reality just "console_script entry points", and if it just left all other entry points alone that were already in the dist_info directories.

...and yes, installing a conda environment with jupyterlab nbconvert vaex and using importlib_metadata reveals lots of entry points.
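
A check along these lines can be sketched with the standard library's `importlib.metadata` (the comment above used the `importlib_metadata` backport; using the stdlib module instead is an assumption of this sketch). The function name is made up for illustration.

```python
from collections import Counter
from importlib import metadata


def count_entry_points_by_group():
    """Count the entry points declared by every installed distribution."""
    counts = Counter()
    for dist in metadata.distributions():
        for ep in dist.entry_points:
            counts[ep.group] += 1
    return counts


if __name__ == "__main__":
    # Show the ten largest groups in the current environment.
    for group, total in count_entry_points_by_group().most_common(10):
        print(f"{total:5d}  {group}")
```

Any group other than console_scripts showing up in a conda environment would be consistent with conda leaving those entry points alone.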

Pitchfork being sheathed :).

@jasongrout
Member

jasongrout commented Mar 12, 2021

FYI @bollwyvl, it looks to me like if you have many entry points with the same name, the entrypoints package gives you back just one, whereas importlib_metadata gives you back all of them.

Edit: oh, never mind, you just have to use get_group_all instead of get_group_named.
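
To illustrate why a name-keyed lookup hides duplicates while a list does not, here is a small sketch using stdlib `EntryPoint` objects. `group_named` and `group_all` are hypothetical helpers that only mimic the two behaviours; they are not the `entrypoints` package's implementation, and the package and value names are invented.

```python
from importlib.metadata import EntryPoint

# Hypothetical entry points: two packages register the same name in one group.
entry_points = [
    EntryPoint(name="paths", value="pkg_a:CONFIG", group="jupyter_config_paths"),
    EntryPoint(name="paths", value="pkg_b:CONFIG", group="jupyter_config_paths"),
    EntryPoint(name="extra", value="pkg_c:CONFIG", group="jupyter_config_paths"),
]


def group_named(eps):
    """Name-keyed lookup: only one entry point survives per name.

    This sketch keeps the first one seen; which one a real library
    keeps is an assumption here.
    """
    result = {}
    for ep in eps:
        result.setdefault(ep.name, ep)
    return result


def group_all(eps):
    """List lookup: every entry point is kept, duplicates included."""
    return list(eps)
```

With many packages all registering under a conventional name, only the list form sees every path.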

@jasongrout
Member

jasongrout commented Mar 12, 2021

Here are my timings for jupyter --paths (macOS Catalina, 2015 15" macbook pro) after following these instructions from my branch for installing 1000 packages with entry points (so 2000 total entry points): https://github.com/jasongrout/jupyter_core/tree/0310f4a199ba7da60abc54bd9115f7da9a9cec25/examples/scale

Using the entrypoints package:

% jupyter --paths > /dev/null
261.42ms jupyter_config_paths loaded
745.81ms jupyter_config_paths	TOTAL
265.96ms jupyter_data_paths loaded
403.42ms jupyter_data_paths	TOTAL

Using importlib_metadata to get the entry points:

% JUPYTER_ENTRY_POINT_IMPORTLIB=1 jupyter --paths > /dev/null
427.97ms jupyter_config_paths loaded
944.01ms jupyter_config_paths	TOTAL
417.37ms jupyter_data_paths loaded
561.50ms jupyter_data_paths	TOTAL
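
For anyone wanting to reproduce this kind of measurement without the branch, a rough sketch of timing a single group lookup with the stdlib `importlib.metadata` follows. It is not the instrumentation that produced the numbers above, and the function name is hypothetical.

```python
import time
from importlib import metadata


def timed_group_lookup(group):
    """Return (entry points found in `group`, elapsed milliseconds)."""
    start = time.perf_counter()
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selection API.
        found = list(eps.select(group=group))
    else:
        # Python 3.8 / 3.9 return a plain dict of group -> entry points.
        found = list(eps.get(group, []))
    elapsed_ms = (time.perf_counter() - start) * 1000
    return found, elapsed_ms


if __name__ == "__main__":
    found, ms = timed_group_lookup("jupyter_config_paths")
    print(f"{ms:.2f}ms jupyter_config_paths ({len(found)} entry points)")
```

Note that this measures discovery only; loading each entry point would add import time on top.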

Also, it seems that JupyterLab is slowed down by about a second if the entrypoint paths are cached:

import functools
@functools.lru_cache(maxsize=10)
def _entry_point_paths(ep_group):
    ...

@jaraco

jaraco commented Mar 13, 2021

Because we're catering to other languages, with e.g. jupyter --paths --json, we need these directories to exist after the python process sys.exits.

I should perhaps clarify this in the importlib.resources docs. The access to resources on the file system is meant to be for the duration of the context manager, and any expectation of use outside of that should be implemented downstream. In other words, if having a copy after the interpreter exits is a goal, I'd recommend building a routine that manages that lifecycle and copies the content to the more permanent location. The Python import system has little control over the state of the system between interpreter runs (including pip uninstalls) and there's no proposed spec that I'm aware of that would enable management of resources across runs.
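
A minimal sketch of such a routine, assuming Python 3.9+ for `importlib.resources.files` and `as_file`. The function name and the choice of destination directory are hypothetical, and cleanup on uninstall is deliberately left out.

```python
import shutil
from importlib import resources
from pathlib import Path


def persist_resource(package, resource_name, dest_dir):
    """Copy one packaged resource to a directory that outlives the process."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / resource_name
    traversable = resources.files(package).joinpath(resource_name)
    # The path yielded by as_file() is only guaranteed inside this block,
    # so the copy has to happen before the context manager exits.
    with resources.as_file(traversable) as src:
        shutil.copyfile(src, target)
    return target
```

Something would still have to remove or refresh the copy when the package is upgraded or uninstalled, which is exactly the lifecycle problem described above.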

And it sounds like unless we:

* require an as-yet-unreleased new dependency

* implement our own fuse-like filesystem aware of zipballs, e.g. `jupyter_cored`

... the existing importlib_* stuff isn't going to move us towards the goal of simplifying packaging static data assets and cross-language config files.

It does seem like importlib.resources doesn't implement a solution for this use-case. I should point out that importlib.resources supports much more than just the usual FileLoader and ZipLoader, but provides a protocol for other custom loaders (imagine loading modules from a database or from an RPC) to provide resources. Jupyter may not want to support those cases, but if it did, it would need to honor the interface presented by importlib.resources.

I think an entrypoint is... whatever the definer of the entrypoint says it is?

There is a definition for entry points and that definition does state that the value should be an importable module and optional name inside that module :/.
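
As a concrete illustration of that definition, the sketch below builds an entry point whose value names a real stdlib module and attribute, then splits and loads it. The entry point name and group are made up for the example.

```python
from importlib.metadata import EntryPoint

# Illustrative entry point pointing at a real stdlib object so it can be loaded.
ep = EntryPoint(name="join", value="os.path:join", group="demo.group")

# The value has the shape "importable.module:optional_attribute".
module_name, _, attr = ep.value.partition(":")

# load() imports the module and then looks the attribute up on it.
loaded = ep.load()
```

A value that is really a directory path rather than an importable object would not survive `load()`, which is the mismatch being discussed.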

It does feel like mild abuse to violate this stated intention.

If there were a clear and obvious way for a package to expose another form of arbitrary metadata, that would be my recommendation, but I'm not sure if such an approach is readily feasible in the current metadata design, as I've not seen it before.

But I just tested it, and I think this could work. Instead of using entry_points against their design, provide your own metadata file for your hooks. In jaraco/develop@demo-metadata-writer, I've added an egg_info.writer to that project, allowing that project to act as a plugin for setuptools and write additional metadata for any project that includes it during the build.

Then, in the irc project (chosen arbitrarily), I include that as a build dependency and demo how the metadata is in fact reachable at run time:

irc main $ git diff
diff --git a/pyproject.toml b/pyproject.toml
index b6ebc0b..e6e81dd 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,5 +1,5 @@
 [build-system]
-requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4.1"]
+requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4.1", "jaraco.develop@git+https://github.com/jaraco/jaraco.develop@demo-metadata-writer"]
 build-backend = "setuptools.build_meta"
 
 [tool.black]

irc main $ pip-run -q .
Python 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import importlib.metadata as md
>>> md.distribution('irc').read_text('jupyter hooks.txt')
'hook is at foo/bar.py'

You'd still need a way to solicit the exact hooks for each project. I'd recommend soliciting the hooks from a pyproject.toml, something like

[tool.jupyter.hooks]
...

In this way, you're following the same principles setuptools uses to solicit and expose entry points, but you're defining a custom format for a distinct purpose. You would have to design and implement the syntax for the file and parse it yourselves, but you probably want that anyway. The advantage is that you have full control over the syntax and experience while still using the same metadata mechanism as entry points and other packaging patterns.

I'd be willing to help guide this implementation if it sounds attractive.

@jaraco is this a regression 🤔

Yes, and an intended one with importlib_metadata 3.5. Essentially, in order to deduplicate distributions correctly, the metadata for each distribution needs to be loaded. There are plans in python/importlib_metadata#283 to improve performance in light of that concern.

@jasongrout
Member

I should perhaps clarify this in the importlib.resources docs. The access to resources on the file system is meant to be for the duration of the context manager, and any expectation of use outside of that should be implemented downstream. In other words, if having a copy after the interpreter exits is a goal, I'd recommend building a routine that manages that lifecycle and copies the content to the more permanent location. The Python import system has little control over the state of the system between interpreter runs (including pip uninstalls) and there's no proposed spec that I'm aware of that would enable management of resources across runs.

Thanks for weighing in on this. Interestingly, one of the primary reasons for us to move to entry points over using data_files is that Python will manage the lifecycle of these files. Perhaps we're chasing a pipe dream if we need to build something generic enough to support any way a python module might be loaded, but also need the resources to be available outside of Python.

If there were a clear and obvious way for a package to expose another form of arbitrary metadata, that would be my recommendation, but I'm not sure if such an approach is readily feasible in the current metadata design, as I've not seen it before.

Nice, thanks! This looks like the approach I was attempting in the "Alternative Solutions" section in the issue description, in commit jasongrout@66351b0 (however, I was really fumbling to get the metadata out of the distributions, and I'm sure I made some inaccurate assumptions involving top_level.txt, for example). We decided to abandon this approach in favor of entry_points since various packagers like poetry, flit, etc., don't seem to support arbitrary metadata files, and having broad packager support was one of our design goals.

By the way, I've been thinking over the past few days about how to make finding a specific group of entry points potentially faster (I haven't benchmarked any experiments, so of course this should be treated with appropriate skepticism). It seems that getting a specific group of entry points requires reading in and parsing all entry point metadata files in the entire python installation, then filtering for the group I want. My hypothesis is that checking if a file exists is much faster than opening and parsing a file. If each group of entry points were stored in a separate file inside the dist_info/egg directory (for example, as files named by the group in a new entry_points directory), it may be much faster to scan for and parse just the data corresponding to a specific entry point group.
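
One way to put numbers on that hypothesis is a throwaway benchmark over fake metadata directories. Everything in this sketch is hypothetical: the per-group `entry_points/<group>` layout is the proposed one, not something any installer writes today, and the fake distributions are generated on the fly.

```python
import configparser
import tempfile
import time
from pathlib import Path


def build_fake_site(root, n_dists, group):
    """Create fake dist-info dirs; every 10th one declares `group`."""
    for i in range(n_dists):
        info = root / f"fake_mod_{i}-1.0.dist-info"
        (info / "entry_points").mkdir(parents=True)
        text = f"[console_scripts]\nfake-{i} = fake_mod_{i}:main\n"
        if i % 10 == 0:
            text += f"[{group}]\nfake-mod-{i} = fake_mod_{i}:PATHS\n"
            # Proposed layout: one file per group inside entry_points/.
            (info / "entry_points" / group).write_text(f"fake-mod-{i} = fake_mod_{i}:PATHS\n")
        (info / "entry_points.txt").write_text(text)


def scan_by_parsing(root, group):
    """Current approach: open and parse every entry_points.txt."""
    hits = 0
    for ep_file in root.glob("*.dist-info/entry_points.txt"):
        parser = configparser.ConfigParser(delimiters=("=",))
        parser.read(ep_file, encoding="utf-8")
        hits += parser.has_section(group)
    return hits


def scan_by_existence(root, group):
    """Proposed approach: only check whether the per-group file exists."""
    return sum((info / "entry_points" / group).exists() for info in root.glob("*.dist-info"))


def compare(n_dists=200, group="jupyter_config_paths"):
    """Return {scan name: (hits, seconds)} for both strategies."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_fake_site(root, n_dists, group)
        results = {}
        for scan in (scan_by_parsing, scan_by_existence):
            start = time.perf_counter()
            hits = scan(root, group)
            results[scan.__name__] = (hits, time.perf_counter() - start)
        return results
```

Both scans still list the directory, so the directory-listing cost is common to both and only the open-and-parse cost differs.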

@gaborbernat

Worth a try, but my guess is that the largest proportion of the slowness comes from the disk list operation 🤔 but would be great to see some benchmark numbers on discovery vs parsing overhead.

@bollwyvl
Contributor Author

Had some other thoughts about our scale issue. And, for reference, a quick look revealed that we are talking about a rough Venn diagram of:

  • 5,032 candidate packages on pypi that mention jupyter
  • 833 candidate npm packages that may already be in pypi, or will be

so these scale concerns are not entirely academic bikeshedding.

Regarding benchmarking: yeah, the above were all with entrypoints against my_module:foo, without even checking (much less importing) what foo was (assuming it to be my_module/foo). I didn't even try the exercise with importlib*. Things were slightly better once there was some caching in place, but beyond the startup issues, a number of other parts of the stack, e.g. jinja2.FileSystemLoader and tornado.web.StaticFileHandler, were never designed to be used with such large search paths.

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-could-data-files-be-improved/8972/2

"""
spec = importlib.util.find_spec(ep.module_name)
module = importlib.util.module_from_spec(spec)
origin = pathlib.Path(module.__file__).parent.resolve()

module.__file__ is None if there's no top-level __init__.py file in the module
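
One possible way around this, sketched under the assumption that the goal is only to find the directory (or directories) a package lives in: read the locations from the import spec instead of from `module.__file__`. The helper name is hypothetical.

```python
import importlib.util
from pathlib import Path


def package_dirs(module_name):
    """Return candidate directories for `module_name` without relying on __file__."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return []
    if spec.submodule_search_locations is not None:
        # Regular and namespace packages both expose their directories here;
        # a namespace package may have more than one.
        return [Path(p).resolve() for p in spec.submodule_search_locations]
    if spec.has_location and spec.origin:
        # Plain single-file module: fall back to the file's parent directory.
        return [Path(spec.origin).parent.resolve()]
    # Built-in or frozen modules have no directory on disk.
    return []
```

A namespace package can return several directories, so callers would need to decide whether to search all of them or reject that case.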

@telamonian

I brought this up in the jlab dev call today: https://hackmd.io/Y7fBMQPSQ1C08SDGI-fwtg?both#5-May-2021-Weekly-Meeting

@bollwyvl @jasongrout What's the status of the work on entry_points? After reading through this issue and a lot of related stuff, my sense is that no matter what we do, the fact that implementing jupyter entry_points requires checking 1000s of paths for plugins/config at runtime is going to cause problems we currently don't have.

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/package-managers-extension-paths/11723/2

@blink1073
Contributor

Closing in favor of using shared-data in hatch. Thanks @bollwyvl and all for pushing on this front!

@blink1073 blink1073 closed this Oct 28, 2022
8 participants