FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113

samir-nasibli · 2024-10-15T12:15:11Z

Description

Stock scikit-learn only support such Array API inputs as cupy.array_api, array-api-strict, cupy, and PyTorch [scikit-learn's Array API support]. Host numpy copies of the input data will be used for the fallback cases, since stock scikit-learn doesn't support DPCTL usm_ndarray and DPNP ndarray

Aligning with Documentation PR

Proposed changes

update the logic for scikit-learn fallback condition
Logger enhanced for patch message when data transferred to host

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

host numpy copies of the inputs data will be used for the fallback cases, since stock scikit-learn doesn't support DPCTL usm_ndarray and DPNP ndarray

samir-nasibli · 2024-10-15T13:23:34Z

/intelci: run

samir-nasibli · 2024-10-15T17:44:01Z

/intelci: run

icfaust · 2024-10-16T09:30:39Z

@samir-nasibli Unfortunately there is a subtlety to "support" vs "known to work" which is described in the linked sklearn documentation. What seems to be the issue is that dpnp/dpctl are not sufficiently array-api compliant, and array-api-compat does not cover them. If that is the case, could you provide an example of a failure in your description using dpnp and/or dpctl? We can then link this to those development teams.

If its the case, we need to make a ticket, and then a comment in the code would be nice. Its not that the sycl_usm_array by design can't work with sklearn, more so that the two sycl_usm_array packages DPNP/DPCTL aren't sufficiently developed yet. This will be an important distinction which we must navigate when they become fully compliant.

samir-nasibli · 2024-10-16T22:41:38Z

@samir-nasibli Unfortunately there is a subtlety to "support" vs "known to work" which is described in the linked sklearn documentation. What seems to be the issue is that dpnp/dpctl are not sufficiently array-api compliant, and array-api-compat does not cover them. If that is the case, could you provide an example of a failure in your description using dpnp and/or dpctl? We can then link this to those development teams.

If its the case, we need to make a ticket, and then a comment in the code would be nice. Its not that the sycl_usm_array by design can't work with sklearn, more so that the two sycl_usm_array packages DPNP/DPCTL aren't sufficiently developed yet. This will be an important distinction which we must navigate when they become fully compliant.

Good point. I added the comment in the code.

I have already tried enable dpctl for scikit-learn array api conformance testing. See ced43bf
Generally, scikit-learn testing is not well designed for dpctl kind usm ndarray inputs ced43bf#r1792608548 and also, as you already mentioned dpctl doesn't fully follow Array API, it failed for linalg module.

DPCTL and DPNP maintainers already working on enabling these array for Array API.

samir-nasibli · 2024-10-16T22:42:44Z

@samir-nasibli Unfortunately there is a subtlety to "support" vs "known to work" which is described in the linked sklearn documentation. What seems to be the issue is that dpnp/dpctl are not sufficiently array-api compliant, and array-api-compat does not cover them. If that is the case, could you provide an example of a failure in your description using dpnp and/or dpctl? We can then link this to those development teams.
If its the case, we need to make a ticket, and then a comment in the code would be nice. Its not that the sycl_usm_array by design can't work with sklearn, more so that the two sycl_usm_array packages DPNP/DPCTL aren't sufficiently developed yet. This will be an important distinction which we must navigate when they become fully compliant.

Good point. I added the comment in the code.

I have already tried enable dpctl for scikit-learn array api conformance testing. See ced43bf Generally, scikit-learn testing is not well designed for dpctl kind usm ndarray inputs ced43bf#r1792608548 and also, as you already mentioned dpctl doesn't fully follow Array API, it failed for linalg module.

DPCTL and DPNP maintainers already working on enabling these array for Array API.

Additionally, I can share some thoughts on follow up. What on our side could be done:

a. Explicitly test in sklearnex array_api tests the primitives that are used for array api enabled estimators on our side for dpctl inputs (for dpnp with some workarounds).

Currently, on the PRs ENH: Array API dispatching #2096 ENH: DBSCAN via Array API based on #2096 #2100 ENH: PCA via Array API based on #2096 #2106 I am already doing this.

b. We should provide patch for sklearn testing for dpnp/dpctl inputs based on the comment ced43bf#r1792608548 to enable conformance testing on our side for dpnp/dpctl inputs, while it wasn't added to stock scikit-learn. Already covered by ticket.

samir-nasibli · 2024-10-16T22:42:54Z

/intelci: run

icfaust · 2024-10-18T07:34:38Z

I was sort of hoping we could leave a reminder to ourselves

@pytest.mark.allow_sklearn_fallback
@pytest.mark.skipif(not sklearn_check_version("1.4"), reason="requires sklearn array_api support")
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues("dpnp,dpctl", "cpu"))
def test_sua_fallback_support(dataframe, queue, monkeypatch):
    '''This test verifies that sycl usm arrays must be handled separately'''
    import onedal._device_offload as _do

    def no_usm_data(func):
       def wrapper(*args, **kwargs):
            _, q, hostdata = func(*args, **kwargs)
            return False, q, hostdata
       return wrapper

    monkeypatch.setattr(_do, "_transer_to_host", no_usm_data(_do._transfer_to_host))

    from sklearnex import config_context
    from sklearnex.decomposition import PCA
    import numpy.random

    with config_context(array_api_dispatch=True):
        est = PCA()
        # set data to unsupported dimensions shape[1]/shape[0] < 2
        data = numpy.random.rand(99, 50)
        usm_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)

        # dpnp and dpctl should raise failures that should be caught here
        # if not, then we need to revert the check in sklearnex._device_offload
        # this still needs to have a check for dpnp (not sure what error is raised)
        with pytest.raises(AttributeError):
            est.fit(usm_data)

Its not a complete test, I am not sure what gets done on dpnp side for error, and I think I monkeypatched right, not 100% sure

samir-nasibli · 2024-10-18T14:53:32Z

I was sort of hoping we could leave a reminder to ourselves

@pytest.mark.allow_sklearn_fallback
@pytest.mark.skipif(not sklearn_check_version("1.4"), reason="requires sklearn array_api support")
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues("dpnp,dpctl", "cpu"))
def test_sua_fallback_support(dataframe, queue, monkeypatch):
    '''This test verifies that sycl usm arrays must be handled separately'''
    import onedal._device_offload as _do

    def no_usm_data(func):
       def wrapper(*args, **kwargs):
            _, q, hostdata = func(*args, **kwargs)
            return False, q, hostdata
       return wrapper

    monkeypatch.setattr(_do, "_transer_to_host", no_usm_data(_do._transfer_to_host))

    from sklearnex import config_context
    from sklearnex.decomposition import PCA
    import numpy.random

    with config_context(array_api_dispatch=True):
        est = PCA()
        # set data to unsupported dimensions shape[1]/shape[0] < 2
        data = numpy.random.rand(99, 50)
        usm_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)

        # dpnp and dpctl should raise failures that should be caught here
        # if not, then we need to revert the check in sklearnex._device_offload
        # this still needs to have a check for dpnp (not sure what error is raised)
        with pytest.raises(AttributeError):
            est.fit(usm_data)

Its not a complete test, I am not sure what gets done on dpnp side for error, and I think I monkeypatched right, not 100% sure

Thank you @icfaust for suggestion but I think the target of such testing not clear since the behavior of the backend is undefined for the inputs.
I would suggest to cover this branch for usm inputs with tickets and enable conformance testing first and when it will be done we can remove it from the conditions.

For now since this is not properly tested for dpnp/dpctl inputs on both scikit-learn or scikit-learn-intelex sides I am good to go.

samir-nasibli · 2024-10-18T15:20:19Z

/intelci: run

icfaust · 2024-10-21T04:44:00Z

I was sort of hoping we could leave a reminder to ourselves

@pytest.mark.allow_sklearn_fallback
@pytest.mark.skipif(not sklearn_check_version("1.4"), reason="requires sklearn array_api support")
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues("dpnp,dpctl", "cpu"))
def test_sua_fallback_support(dataframe, queue, monkeypatch):
    '''This test verifies that sycl usm arrays must be handled separately'''
    import onedal._device_offload as _do

    def no_usm_data(func):
       def wrapper(*args, **kwargs):
            _, q, hostdata = func(*args, **kwargs)
            return False, q, hostdata
       return wrapper

    monkeypatch.setattr(_do, "_transer_to_host", no_usm_data(_do._transfer_to_host))

    from sklearnex import config_context
    from sklearnex.decomposition import PCA
    import numpy.random

    with config_context(array_api_dispatch=True):
        est = PCA()
        # set data to unsupported dimensions shape[1]/shape[0] < 2
        data = numpy.random.rand(99, 50)
        usm_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)

        # dpnp and dpctl should raise failures that should be caught here
        # if not, then we need to revert the check in sklearnex._device_offload
        # this still needs to have a check for dpnp (not sure what error is raised)
        with pytest.raises(AttributeError):
            est.fit(usm_data)

Its not a complete test, I am not sure what gets done on dpnp side for error, and I think I monkeypatched right, not 100% sure

Thank you @icfaust for suggestion but I think the target of such testing not clear since the behavior of the backend is undefined for the inputs. I would suggest to cover this branch for usm inputs with tickets and enable conformance testing first and when it will be done we can remove it from the conditions.

For now since this is not properly tested for dpnp/dpctl inputs on both scikit-learn or scikit-learn-intelex sides I am good to go.

Then send me the list of tickets and lets get this merged. I think its ready to go.

samir-nasibli · 2024-10-21T15:27:59Z

I was sort of hoping we could leave a reminder to ourselves

@pytest.mark.allow_sklearn_fallback
@pytest.mark.skipif(not sklearn_check_version("1.4"), reason="requires sklearn array_api support")
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues("dpnp,dpctl", "cpu"))
def test_sua_fallback_support(dataframe, queue, monkeypatch):
    '''This test verifies that sycl usm arrays must be handled separately'''
    import onedal._device_offload as _do

    def no_usm_data(func):
       def wrapper(*args, **kwargs):
            _, q, hostdata = func(*args, **kwargs)
            return False, q, hostdata
       return wrapper

    monkeypatch.setattr(_do, "_transer_to_host", no_usm_data(_do._transfer_to_host))

    from sklearnex import config_context
    from sklearnex.decomposition import PCA
    import numpy.random

    with config_context(array_api_dispatch=True):
        est = PCA()
        # set data to unsupported dimensions shape[1]/shape[0] < 2
        data = numpy.random.rand(99, 50)
        usm_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)

        # dpnp and dpctl should raise failures that should be caught here
        # if not, then we need to revert the check in sklearnex._device_offload
        # this still needs to have a check for dpnp (not sure what error is raised)
        with pytest.raises(AttributeError):
            est.fit(usm_data)

Its not a complete test, I am not sure what gets done on dpnp side for error, and I think I monkeypatched right, not 100% sure

Thank you @icfaust for suggestion but I think the target of such testing not clear since the behavior of the backend is undefined for the inputs. I would suggest to cover this branch for usm inputs with tickets and enable conformance testing first and when it will be done we can remove it from the conditions.
For now since this is not properly tested for dpnp/dpctl inputs on both scikit-learn or scikit-learn-intelex sides I am good to go.

Then send me the list of tickets and lets get this merged. I think its ready to go.

Done

icfaust

lets go!

FIX: update functional support fallback logic a little bit

36cafcd

host numpy copies of the inputs data will be used for the fallback cases, since stock scikit-learn doesn't support DPCTL usm_ndarray and DPNP ndarray

samir-nasibli marked this pull request as ready for review October 15, 2024 13:02

samir-nasibli requested review from Alexsandruss and icfaust as code owners October 15, 2024 13:02

samir-nasibli requested a review from Vika-F October 15, 2024 13:02

samir-nasibli requested a review from ethanglaser October 15, 2024 21:23

samir-nasibli added 2 commits October 16, 2024 14:45

Merge branch 'main' into fix/functional_array_api

196318b

Added a clarifying comment

b6012c1

samir-nasibli added 2 commits October 18, 2024 06:52

Enhanced patch message for data transfer

4c742f9

removed debug code

e48a2db

return back accidently removed primitive

cff9056

samir-nasibli mentioned this pull request Oct 19, 2024

[EXPERIMENT] Enabling dpctl.tensor for scikit-learn array API conformance testing #2120

Closed

13 tasks

icfaust approved these changes Oct 22, 2024

View reviewed changes

samir-nasibli merged commit c0eb5ad into uxlfoundation:main Oct 22, 2024
26 checks passed

samir-nasibli deleted the fix/functional_array_api branch October 22, 2024 08:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113

FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113

samir-nasibli commented Oct 15, 2024 •

edited

Loading

samir-nasibli commented Oct 15, 2024

samir-nasibli commented Oct 15, 2024

icfaust commented Oct 16, 2024 •

edited

Loading

samir-nasibli commented Oct 16, 2024

samir-nasibli commented Oct 16, 2024

samir-nasibli commented Oct 16, 2024

icfaust commented Oct 18, 2024 •

edited

Loading

samir-nasibli commented Oct 18, 2024

samir-nasibli commented Oct 18, 2024

icfaust commented Oct 21, 2024 •

edited

Loading

samir-nasibli commented Oct 21, 2024

icfaust left a comment

FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113

FIX: update functional support fallback logic for a DPNP/DPCTL ndarray inputs #2113

Conversation

samir-nasibli commented Oct 15, 2024 • edited Loading

Description

Proposed changes

samir-nasibli commented Oct 15, 2024

samir-nasibli commented Oct 15, 2024

icfaust commented Oct 16, 2024 • edited Loading

samir-nasibli commented Oct 16, 2024

samir-nasibli commented Oct 16, 2024

samir-nasibli commented Oct 16, 2024

icfaust commented Oct 18, 2024 • edited Loading

samir-nasibli commented Oct 18, 2024

samir-nasibli commented Oct 18, 2024

icfaust commented Oct 21, 2024 • edited Loading

samir-nasibli commented Oct 21, 2024

icfaust left a comment

Choose a reason for hiding this comment

samir-nasibli commented Oct 15, 2024 •

edited

Loading

icfaust commented Oct 16, 2024 •

edited

Loading

icfaust commented Oct 18, 2024 •

edited

Loading

icfaust commented Oct 21, 2024 •

edited

Loading