Numpy HPy migration notes: blockers and concerns

Stepan Sindelar edited this page Jul 13, 2022 · 7 revisions
  • NumPy uses the METH_FASTCALL | METH_KEYWORDS convention and has its own argument parser for that
  • metaclass support for heap types is missing in CPython (there is a GitHub issue for this)
  • tp_vectorcall is not supposed to be used for heap types
  • NumPy accesses tp_ slots directly (Edit: not an issue since NumPy is moving away from that):
    • to compare them (we could provide bool HPyType_CheckSlot(HPyContext*, HPy, HPyDef expected))
    • to read things like tp_name, tp_base, tp_dict
    • for fast paths that bypass some CPython logic (e.g., getting an attribute without raising an exception if it is missing)
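
The slot comparison use case above can be sketched in plain C. This is only an illustration of the idea behind the proposed HPyType_CheckSlot, not HPy code: SketchType, SketchHandle, sketch_slot_fn and sketch_type_check_slot are hypothetical stand-ins invented here, and the real proposal takes an HPyContext* and an HPyDef rather than a bare function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins only: these are NOT the real CPython or HPy types. */
typedef int (*sketch_slot_fn)(int);

typedef struct {
    const char *name;          /* plays the role of tp_name */
    sketch_slot_fn hash_slot;  /* plays the role of a single tp_ slot */
} SketchType;

typedef struct {
    const SketchType *type;    /* hidden behind the handle in a real port */
} SketchHandle;

static int array_hash(int x) { return x * 31; }
static int scalar_hash(int x) { return x + 7; }

static const SketchType array_type = { "ndarray", array_hash };
static const SketchType scalar_type = { "generic", scalar_hash };

/* Hypothetical analogue of the proposed HPyType_CheckSlot: the caller asks
 * "does this type use the expected slot implementation?" instead of reading
 * the slot out of the type struct and comparing it itself. */
static bool sketch_type_check_slot(SketchHandle h, sketch_slot_fn expected)
{
    assert(h.type != NULL);
    return h.type->hash_slot == expected;
}
```

The point of the design is that the caller never touches the struct layout, so the runtime stays free to store slots however it likes.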

Migration path concerns:

  • NumPy API: expose a second capsule and header(s) with HPy-based APIs?
    • legacy NumPy APIs would eventually delegate to the HPy versions
    • opportunity to get rid of legacy NumPy APIs and do some NumPy API cleanup
  • global caches
    • there are some global caches (C-level global variables); they need some indirection, e.g., storing them in a capsule accessed via HPyGlobal. However, the cost of loading the HPy from the HPyGlobal and then loading the capsule contents (2 HPy API calls) may cancel out the caching benefit altogether
    • this can be solved by module state combined with an "arg clinic" that passes the module state directly as an argument
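
A minimal sketch of the two access patterns for such a cache, written with stand-in types so that it compiles on its own. None of the names below (SketchHPy, SketchContext, SketchCache, sketch_load_global, sketch_capsule_get) are real HPy APIs; the api_calls counter is an assumption added only to make the "2 HPy API calls" cost visible.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins only: these are NOT the real HPy types or functions. */
typedef struct { long id; } SketchHPy;            /* plays the role of HPy */
typedef struct { int api_calls; } SketchContext;  /* plays the role of HPyContext */

/* Data that today would live in a C-level global variable. */
typedef struct { long cached_value; } SketchCache;

static SketchCache sketch_capsule_storage = { 42 };

/* Plays the role of loading a handle from an HPyGlobal (API call 1). */
static SketchHPy sketch_load_global(SketchContext *ctx)
{
    SketchHPy h = { 1 };
    ctx->api_calls++;
    return h;
}

/* Plays the role of reading the capsule contents (API call 2). */
static SketchCache *sketch_capsule_get(SketchContext *ctx, SketchHPy h)
{
    (void)h;
    ctx->api_calls++;
    return &sketch_capsule_storage;
}

/* Pattern 1: every lookup pays for both indirections. */
static long sketch_lookup_via_global(SketchContext *ctx)
{
    assert(ctx != NULL);
    SketchHPy h = sketch_load_global(ctx);
    return sketch_capsule_get(ctx, h)->cached_value;
}

/* Pattern 2: generated glue ("arg clinic") hands the module state to the
 * function directly, so the lookup itself makes no API calls. */
static long sketch_lookup_via_state(SketchContext *ctx, SketchCache *state)
{
    (void)ctx;
    assert(state != NULL);
    return state->cached_value;
}
```

In the second pattern the cost of finding the module state is paid once by the caller, not on every cache access.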

Architecture/code style concerns:

  • PyArrayObject* -> HPy removes type information and type checking
    • NumPy uses the struct types in many helper/infrastructure functions; changing all of those to HPy removes the type information, which makes the code less pleasant to work with and more error-prone
    • Sometimes it is desirable to pass the PyArrayObject* as an additional argument alongside the HPy handle if the struct was already retrieved, which is cumbersome
  • Some ideas:
    • generate an additional struct that holds both the handle and the struct, plus helper methods for it, e.g., typedef struct { HPy handle; PyArrayObject *data; } HPyArray;
    • generate an additional struct that wraps just the handle to "attach" type information to it, plus generated conversion helpers
    • depending on whether we see similar patterns in other packages, this could be just infrastructure in the NumPy port codebase or be provided by HPy
    • alternatively: always pass two arguments for everything, e.g., foo(HPyContext *ctx, HPy h_arr, PyArrayObject *arr), where arr can be NULL and the callee is responsible for lazily initializing it before use; we can have some convenience macros for that
    • the "arg clinic" can pass the struct as an argument (https://github.com/hpyproject/hpy/issues/129), which would also have performance benefits, but does not answer the question of how to pass it around between internal NumPy helper functions
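
Two of the ideas above, the paired handle-plus-struct type and the two-argument convention with lazy initialization, can be sketched as follows. All names are hypothetical stand-ins (SketchHPy for HPy, SketchContext for HPyContext, SketchArrayObject for PyArrayObject, sketch_as_struct for whatever call turns a handle into a struct pointer); the as_struct_calls counter is an assumption added only to show when the conversion actually happens.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins only: these are NOT the real HPy or NumPy types. */
typedef struct { long id; } SketchHPy;
typedef struct { int as_struct_calls; } SketchContext;
typedef struct { int nd; } SketchArrayObject;

static SketchArrayObject sketch_storage = { 3 };

/* Plays the role of the handle -> struct pointer conversion. */
static SketchArrayObject *sketch_as_struct(SketchContext *ctx, SketchHPy h)
{
    (void)h;
    ctx->as_struct_calls++;
    return &sketch_storage;
}

/* Idea: one value carries both the handle and the already-retrieved struct,
 * so helper signatures keep their type information. */
typedef struct {
    SketchHPy handle;
    SketchArrayObject *data;
} SketchHPyArray;

static SketchHPyArray sketch_array_from_handle(SketchContext *ctx, SketchHPy h)
{
    SketchHPyArray arr = { h, sketch_as_struct(ctx, h) };
    return arr;
}

static int sketch_ndim(SketchHPyArray arr)
{
    assert(arr.data != NULL);
    return arr.data->nd;
}

/* Alternative: always pass handle and struct; the struct may be NULL and the
 * callee fills it in lazily through a convenience macro. */
#define SKETCH_ENSURE_ARRAY(ctx, h, arr)              \
    do {                                              \
        if ((arr) == NULL) {                          \
            (arr) = sketch_as_struct((ctx), (h));     \
        }                                             \
    } while (0)

static int sketch_ndim_lazy(SketchContext *ctx, SketchHPy h_arr,
                            SketchArrayObject *arr)
{
    SKETCH_ENSURE_ARRAY(ctx, h_arr, arr);
    return arr->nd;
}
```

The paired struct keeps call sites short and typed, while the two-argument form avoids a new type at the price of a NULL check in every callee.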