C API Next Level Changes
This document presents further details on the changes envisioned in Taking the C API to the Next Level.
A function returns a borrowed reference to a Python object obj if instead of returning a new reference to obj, it loans the caller an existing reference.
This saves the caller from having to call Py_DECREF(obj) but at the cost of exposing the lifetime of the reference as part of the API and preventing the Python implementation from knowing when the caller has finished using the borrowed reference.
As a simple example, imagine that a Python module contained t = (1, 2, 3) and that a Python implementation wished to efficiently store that tuple as int t[] = {1, 2, 3}. PyTuple_GetItem returns a borrowed reference, so calling PyTuple_GetItem(obj_t, 0) would require creating a new reference that could never be freed even though the caller would likely only require it for a short time.
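The cost described above can be sketched with a small toy model. This is not CPython code: `BoxedInt`, `CompactTuple`, `tuple_get_borrowed` and `tuple_get_new` are hypothetical names invented for illustration. The sketch assumes an implementation that stores the tuple as raw C ints and must create a boxed object whenever a caller asks for an item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model (not CPython): a heap-allocated, reference-counted integer. */
typedef struct {
    int refcnt;
    int value;
} BoxedInt;

/* A tuple stored compactly as raw C ints. `pinned` holds boxes that had to
 * be created for the borrowed-style getter and kept alive afterwards. */
typedef struct {
    int items[3];
    BoxedInt *pinned[3];
} CompactTuple;

static int live_boxes = 0; /* number of boxes currently allocated */

static BoxedInt *box_new(int value) {
    BoxedInt *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refcnt = 1;
    b->value = value;
    live_boxes++;
    return b;
}

static void box_release(BoxedInt *b) {
    assert(b->refcnt > 0);
    if (--b->refcnt == 0) {
        free(b);
        live_boxes--;
    }
}

/* Borrowed-style getter: the caller will never release the result, so the
 * tuple has to keep the box alive for as long as the tuple itself exists. */
static BoxedInt *tuple_get_borrowed(CompactTuple *t, size_t i) {
    assert(i < 3);
    if (t->pinned[i] == NULL) {
        t->pinned[i] = box_new(t->items[i]);
    }
    return t->pinned[i];
}

/* New-reference getter: the caller owns the result and releases it. */
static BoxedInt *tuple_get_new(const CompactTuple *t, size_t i) {
    assert(i < 3);
    return box_new(t->items[i]);
}

/* Only when the whole tuple goes away can the pinned boxes be freed. */
static void tuple_destroy(CompactTuple *t) {
    for (size_t i = 0; i < 3; i++) {
        if (t->pinned[i] != NULL) {
            box_release(t->pinned[i]);
            t->pinned[i] = NULL;
        }
    }
}
```

With the new-reference getter, the box disappears as soon as the caller calls `box_release`. With the borrowed-style getter, the box stays allocated until `tuple_destroy` runs, however briefly the caller actually needed it.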
A function steals a reference to a Python object obj when it takes over the responsibility of freeing the reference from the caller.
This exposes the lifetime of the stolen reference as part of the API.
For example, PyList_SetItem steals the reference to the item passed to it. The caller might then continue to use the reference (even though they shouldn't) and rely on the reference continuing to be valid for as long as the list exists.
Stolen references also make it harder to write correct code. Instead of being able to easily check where references are freed by reading the C code, one must also remember the long list of API functions that steal a reference. For example, PyList_SetItem steals a reference, but PyList_Insert and PyList_Append do not.
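The asymmetry between stealing and non-stealing functions can be illustrated with a toy reference-counted list. This is a sketch only: `Obj`, `List`, `list_append` and `list_set_item_steal` are hypothetical names that mimic the ownership rules described above, not the real CPython functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model (not CPython): a reference-counted object. */
typedef struct {
    int refcnt;
    int value;
} Obj;

static Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcnt = 1;
    o->value = value;
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
    }
}

#define LIST_CAPACITY 4

typedef struct {
    Obj *items[LIST_CAPACITY];
    size_t len;
} List;

/* Non-stealing: the list takes its own reference, so the caller still
 * owns (and must later release) the reference it passed in. */
static int list_append(List *l, Obj *o) {
    if (l->len == LIST_CAPACITY) {
        return -1;
    }
    obj_incref(o);
    l->items[l->len++] = o;
    return 0;
}

/* Stealing: the list takes over the caller's reference without an incref.
 * In this toy model the reference is consumed even when the call fails. */
static int list_set_item_steal(List *l, size_t i, Obj *o) {
    if (i >= l->len) {
        obj_decref(o);
        return -1;
    }
    obj_decref(l->items[i]); /* drop the item being replaced */
    l->items[i] = o;
    return 0;
}

static void list_clear(List *l) {
    for (size_t i = 0; i < l->len; i++) {
        obj_decref(l->items[i]);
    }
    l->len = 0;
}

/* Reference count of a fresh object right after a non-stealing call. */
static int refcnt_after_append(void) {
    List l = {{NULL}, 0};
    Obj *a = obj_new(1);
    list_append(&l, a);
    int count = a->refcnt;
    obj_decref(a);   /* the caller releases its own reference */
    list_clear(&l);  /* the list releases its reference */
    return count;
}

/* Reference count of a fresh object right after a stealing call. */
static int refcnt_after_steal(void) {
    List l = {{NULL}, 0};
    Obj *a = obj_new(1);
    list_append(&l, a);
    obj_decref(a);
    Obj *b = obj_new(2);
    list_set_item_steal(&l, 0, b); /* frees a; the list now owns b */
    int count = b->refcnt;
    /* No obj_decref(b) here: the reference was stolen. */
    list_clear(&l);
    return count;
}
```

The two helper functions look almost identical, yet one must call `obj_decref` on the new object and the other must not. Nothing in the call site itself indicates which rule applies.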
The current API exposes reference counting via Py_INCREF and Py_DECREF. Implementing the semantics of this API requires maintaining a counter for each object -- i.e. emulating reference counting.
It also requires references to be long-lived -- a reference must be valid for as long as the reference count is non-zero (i.e. for the object's entire lifetime).
It would be better to use an interface that allowed the caller of the API to explicitly communicate its own requirements via obj = Py_I_Need_A_New_Reference(...) and Py_I_Am_Done_With_This_Reference(obj) API functions. These would allow for shorter lived references that can be freed as soon as an individual caller is done with them.
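One possible shape for such an interface is sketched below as a toy model. The names `Ref`, `ref_open` and `ref_close` are hypothetical stand-ins for `Py_I_Need_A_New_Reference` and `Py_I_Am_Done_With_This_Reference`, and the fixed-size slot table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not CPython): the object carries no reference count. */
typedef struct {
    int value;
} Obj;

#define MAX_REFS 8

typedef int Ref; /* index into `slots`, or -1 if no slot is free */

static Obj *slots[MAX_REFS]; /* NULL marks a free slot */

/* Plays the role of Py_I_Need_A_New_Reference: each call hands out a
 * distinct reference, even for the same object. */
static Ref ref_open(Obj *o) {
    assert(o != NULL);
    for (Ref r = 0; r < MAX_REFS; r++) {
        if (slots[r] == NULL) {
            slots[r] = o;
            return r;
        }
    }
    return -1;
}

/* Plays the role of Py_I_Am_Done_With_This_Reference: the slot can be
 * reused immediately, independently of any other open reference. */
static void ref_close(Ref r) {
    assert(r >= 0 && r < MAX_REFS && slots[r] != NULL);
    slots[r] = NULL;
}

static Obj *ref_deref(Ref r) {
    assert(r >= 0 && r < MAX_REFS && slots[r] != NULL);
    return slots[r];
}

/* The implementation can tell exactly how many callers still use `o`. */
static int open_refs_to(const Obj *o) {
    int count = 0;
    for (Ref r = 0; r < MAX_REFS; r++) {
        if (slots[r] == o) {
            count++;
        }
    }
    return count;
}
```

Because every caller opens and closes its own reference, the implementation learns when each individual caller is finished, rather than only when a shared counter reaches zero.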
In the existing API, the reference to an object is guaranteed to remain the same and to point to the same location in memory throughout the lifetime of an object. This allows one to conveniently check whether two references are references to the same object using if (ref1 == ref2) in C.
The downsides are that all the references for each object must be identical and must never change during the lifetime of the object.
Since the reference to the object is also a pointer, the object's location in memory and its storage must never change.
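As a rough sketch of the alternative, identity could be checked through an API function rather than by comparing references directly. The toy model below is an assumption for illustration: `Ref`, `ref_is` and `relocate` are hypothetical names, and references are modelled as indices into a table so that two references to one object can differ and the object can be moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not CPython). */
typedef struct {
    int value;
} Obj;

#define MAX_REFS 8

typedef int Ref; /* index into `slots`, or -1 if no slot is free */

static Obj *slots[MAX_REFS]; /* NULL marks a free slot */

static Ref ref_open(Obj *o) {
    assert(o != NULL);
    for (Ref r = 0; r < MAX_REFS; r++) {
        if (slots[r] == NULL) {
            slots[r] = o;
            return r;
        }
    }
    return -1;
}

static void ref_close(Ref r) {
    assert(r >= 0 && r < MAX_REFS && slots[r] != NULL);
    slots[r] = NULL;
}

/* Identity test that replaces `ref1 == ref2`: two different references
 * are "the same object" when they resolve to the same storage. */
static bool ref_is(Ref a, Ref b) {
    assert(slots[a] != NULL && slots[b] != NULL);
    return slots[a] == slots[b];
}

/* The implementation moves an object (as a compacting collector might)
 * and patches every open reference; callers never see the address. */
static void relocate(Obj *from, Obj *to) {
    *to = *from;
    for (Ref r = 0; r < MAX_REFS; r++) {
        if (slots[r] == from) {
            slots[r] = to;
        }
    }
}

static int ref_value(Ref r) {
    assert(r >= 0 && r < MAX_REFS && slots[r] != NULL);
    return slots[r]->value;
}
```

The trade-off is that `a == b` on two `Ref` values no longer answers the identity question; callers have to go through `ref_is`, which is the price of letting references differ and objects move.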
The existing API exposes the memory layout of Python objects. For example, one can directly access PyListObject.ob_item[i] and PyObject.ob_type.
This makes it difficult to provide alternative implementations of the semantics of Python objects since the existing C memory structures need to be populated and maintained.
Traditionally new types were created by statically defining a PyTypeObject in C. In addition to exposing the memory layout of these types, these static types represent shared global state (within C) and behave differently to types (i.e. classes) defined from within Python code.
Types may also be created dynamically and allocated on the heap. These new types more closely match the behaviour of types defined in Python code and are not shared global state.
Not exposing static types would create a simpler, more consistent API and avoid the fixed global state.
Ideally we'd like to expose the semantics of the Python language and avoid exposing implementation details where we can.
For example, the Python code a["x"] looks up item "x" on object a. The code and high-level semantics are the same regardless of whether a is a list, or a dictionary, or a user defined class.
We'd like the C API to reflect this and provide only one set of methods for accessing items in a regardless of the type of a. So, ideally the new API would implement only Py_GetItem and not PyDict_GetItem or PyList_GetItem.
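A single entry point of this kind might look roughly like the following toy model. It is a sketch under simplifying assumptions: keys and values are plain C `long`s, `get_item_i` is a hypothetical analogue of an integer-keyed `Py_GetItem`, and the toy list does not support negative indices.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;

/* Per-type item lookup: returns 0 and fills *out on success, -1 if the
 * key is missing. */
typedef int (*GetItemFn)(const Obj *self, long key, long *out);

typedef struct {
    const char *name;
    GetItemFn get_item;
} Type;

/* Toy object: a list uses only `values`; a dict pairs `keys[i]` with
 * `values[i]`. Callers of get_item_i never look at these fields. */
struct Obj {
    const Type *type;
    const long *keys;
    const long *values;
    size_t len;
};

static int list_get_item(const Obj *self, long key, long *out) {
    if (key < 0 || (size_t)key >= self->len) {
        return -1;
    }
    *out = self->values[key];
    return 0;
}

static int dict_get_item(const Obj *self, long key, long *out) {
    for (size_t i = 0; i < self->len; i++) {
        if (self->keys[i] == key) {
            *out = self->values[i];
            return 0;
        }
    }
    return -1;
}

static const Type list_type = {"list", list_get_item};
static const Type dict_type = {"dict", dict_get_item};

/* The only lookup function exposed to callers: it dispatches on the
 * object's type, so there is no list-specific or dict-specific variant. */
static int get_item_i(const Obj *obj, long key, long *out) {
    assert(obj != NULL && out != NULL);
    if (obj->type->get_item == NULL) {
        return -1;
    }
    return obj->type->get_item(obj, key, out);
}
```

The caller writes the same `get_item_i(obj, key, &out)` call whether `obj` is the toy list or the toy dict, which mirrors how `a[key]` reads the same in Python regardless of the type of `a`.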
Note:
This is not intended as an excuse to rewrite the C API, only as a guide when making design choices. Any new API should remain familiar to users of the existing API.
The C API should be an interface between the C language and the Python language. Its interfaces should consume and return values with common native C types such as int, char *, double.
We should avoid exposing C structs that are specific to a particular Python implementation.
For example, Py_GetItem_i(obj, i), which looks up the item at a C long index i on a reference to a Python object, is a good API function because it provides Python language semantics via a generic C interface.
In contrast, Py_TYPE returns a PyTypeObject *, which is less good because PyTypeObject is a complex structure defined by a particular version of a particular Python implementation. A better version would return an ordinary reference to the object's type (i.e. a PyObject * in the existing C API and a more opaque reference in the new API).
C extension code implicitly executes inside a particular interpreter. Explicitly providing this context to API functions and C extension methods will allow C code to access the correct interpreter without having to maintain static global state.
For example, if the context were passed as ctx then the constants such as None or ValueError might be retrieved with ctx->None or ctx->ValueError and the C extension could be sure it had the correct instances for the interpreter it is executing under.
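A minimal sketch of this idea follows, as a toy model rather than a proposed API. The `Ctx` struct, its two fields and the `ext_*` functions are hypothetical; the only point is that the extension code reads its constants from the context it is handed instead of from C globals.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not CPython): an object identified only by its address. */
typedef struct {
    const char *name;
} Obj;

/* Hypothetical per-interpreter context: each interpreter fills in its
 * own instances of the constants. */
typedef struct {
    Obj *None;
    Obj *ValueError;
} Ctx;

/* An extension function that "returns None": it uses the None belonging
 * to the interpreter that called it, not a global singleton. */
static Obj *ext_do_nothing(const Ctx *ctx) {
    return ctx->None;
}

/* Returns the exception type to raise for a negative input, or NULL when
 * the input is acceptable. */
static Obj *ext_check_non_negative(const Ctx *ctx, long n) {
    return n < 0 ? ctx->ValueError : NULL;
}
```

Two interpreters running in the same process would each pass their own `Ctx`, so the same compiled function yields the correct `None` and `ValueError` for each without any static state in the extension.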