-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113
FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113
Conversation
host numpy copies of the inputs data will be used for the fallback cases, since stock scikit-learn doesn't support DPCTL usm_ndarray and DPNP ndarray
/intelci: run |
1 similar comment
/intelci: run |
@samir-nasibli Unfortunately there is a subtlety to "support" vs "known to work" which is described in the linked sklearn documentation. What seems to be the issue is that dpnp/dpctl are not sufficiently array-api compliant, and array-api-compat does not cover them. If that is the case, could you provide an example of a failure in your description using dpnp and/or dpctl? We can then link this to those development teams. If its the case, we need to make a ticket, and then a comment in the code would be nice. Its not that the sycl_usm_array by design can't work with sklearn, more so that the two sycl_usm_array packages DPNP/DPCTL aren't sufficiently developed yet. This will be an important distinction which we must navigate when they become fully compliant. |
Good point. I added the comment in the code. I have already tried enable dpctl for scikit-learn array api conformance testing. See ced43bf DPCTL and DPNP maintainers already working on enabling these array for Array API. |
Additionally, I can share some thoughts on follow up. What on our side could be done: a. Explicitly test in sklearnex array_api tests the primitives that are used for array api enabled estimators on our side for dpctl inputs (for dpnp with some workarounds).
b. We should provide patch for sklearn testing for dpnp/dpctl inputs based on the comment ced43bf#r1792608548 to enable conformance testing on our side for dpnp/dpctl inputs, while it wasn't added to stock scikit-learn. Already covered by ticket. |
/intelci: run |
I was sort of hoping we could leave a reminder to ourselves @pytest.mark.allow_sklearn_fallback
@pytest.mark.skipif(not sklearn_check_version("1.4"), reason="requires sklearn array_api support")
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues("dpnp,dpctl", "cpu"))
def test_sua_fallback_support(dataframe, queue, monkeypatch):
'''This test verifies that sycl usm arrays must be handled separately'''
import onedal._device_offload as _do
def no_usm_data(func):
def wrapper(*args, **kwargs):
_, q, hostdata = func(*args, **kwargs)
return False, q, hostdata
return wrapper
monkeypatch.setattr(_do, "_transer_to_host", no_usm_data(_do._transfer_to_host))
from sklearnex import config_context
from sklearnex.decomposition import PCA
import numpy.random
with config_context(array_api_dispatch=True):
est = PCA()
# set data to unsupported dimensions shape[1]/shape[0] < 2
data = numpy.random.rand(99, 50)
usm_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)
# dpnp and dpctl should raise failures that should be caught here
# if not, then we need to revert the check in sklearnex._device_offload
# this still needs to have a check for dpnp (not sure what error is raised)
with pytest.raises(AttributeError):
est.fit(usm_data) Its not a complete test, I am not sure what gets done on dpnp side for error, and I think I monkeypatched right, not 100% sure |
Thank you @icfaust for suggestion but I think the target of such testing not clear since the behavior of the backend is undefined for the inputs. For now since this is not properly tested for dpnp/dpctl inputs on both scikit-learn or scikit-learn-intelex sides I am good to go. |
/intelci: run |
Then send me the list of tickets and lets get this merged. I think its ready to go. |
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets go!
Description
Stock scikit-learn only support such Array API inputs as
cupy.array_api, array-api-strict, cupy, and PyTorch
[scikit-learn's Array API support]. Host numpy copies of the input data will be used for the fallback cases, since stock scikit-learn doesn't support DPCTL usm_ndarray and DPNP ndarrayAligning with Documentation PR
Proposed changes
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance