feature: new tests added for tsne to expand test coverage #2229
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Flags with carried forward coverage won't be shown.
/intelci: run
tsne_perplexity = TSNE(n_components=2, perplexity=9).fit(X_perplexity)
assert tsne_perplexity.embedding_.shape == (10, 2)

# Test large data
It feels like this one is perhaps not needed, considering that there's already a similar test earlier on with shape (100, 10).
Hi David, I removed this test. Best, Yue
Thanks, although if the "Test reproducibility" test is also checking the shape, it still feels like "Test large data" could be removed altogether, since it's not testing anything different. Or does the algorithm have some size-dependent behavior that would change between (50, 10) and (1000, 50)?
…nt results, merge previous deleted gpu test to complex test
/intelci: run
        False,
    ),
    (
        "Test reproducibility",
I think the reproducibility test here was lost.
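A minimal sketch of what such a reproducibility test could look like (the data shape and parameters below are assumptions for illustration, not this PR's actual values): two fits with the same `random_state` should produce identical embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE  # sklearnex patches this same interface

# Assumed shape and parameters; method="exact" keeps the run fully
# deterministic for a given random_state.
rng = np.random.default_rng(0)
X = rng.random((50, 10))
e1 = TSNE(n_components=2, perplexity=10, random_state=7, method="exact").fit_transform(X)
e2 = TSNE(n_components=2, perplexity=10, random_state=7, method="exact").fit_transform(X)
assert np.allclose(e1, e2), "same random_state should reproduce the embedding"
```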
assert (
    embedding.shape == expected_shape
), f"{description}: Incorrect embedding shape."
if device_filter == "gpu":
I think this doesn't need to be specific to GPU.
"description,X_generator,n_components,perplexity,expected_shape,should_raise",
[
    (
        "Test basic functionality",
pytest has a built-in placeholder for parametrization names - see, for example: https://docs.pytest.org/en/stable/example/parametrize.html#different-options-for-test-ids
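As a hedged illustration of that pytest feature (the parameter names below are hypothetical, not this PR's actual ones): `pytest.param(..., id=...)` or the `ids` keyword gives each case a readable test ID, so no `description` string needs to be threaded through the test body.

```python
import pytest

# Hypothetical parametrization: pytest builds the test ID from id=...,
# producing names like test_tsne_shape[basic] and test_tsne_shape[high-dim].
@pytest.mark.parametrize(
    "n_samples,n_features",
    [
        pytest.param(50, 10, id="basic"),
        pytest.param(50, 500, id="high-dim"),
    ],
)
def test_tsne_shape(n_samples, n_features):
    assert n_samples > 0 and n_features > 0
```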
assert np.any(
    embedding != 0
), f"{description}: Embedding contains only zeros."
except Exception as e:
This try-except could be removed after switching to pytest named parametrizations.
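A minimal sketch of how that could look with `pytest.raises` (the `should_raise` flag and the helper name `check_tsne` are assumptions based on the parametrization shown above, not the PR's actual code):

```python
import numpy as np
import pytest
from sklearn.manifold import TSNE  # sklearnex patches this same interface

def check_tsne(X, n_components, perplexity, should_raise):
    """Fit TSNE and either expect an error or validate the embedding shape."""
    if should_raise:
        # Failing cases surface as exceptions instead of being caught
        # and re-checked in a broad try-except.
        with pytest.raises(Exception):
            TSNE(n_components=n_components, perplexity=perplexity).fit(X)
    else:
        tsne = TSNE(n_components=n_components, perplexity=perplexity).fit(X)
        assert tsne.embedding_.shape == (X.shape[0], n_components)

rng = np.random.default_rng(0)
check_tsne(rng.random((20, 4)), 2, 5, should_raise=False)
check_tsne(np.full((20, 4), np.nan), 2, 5, should_raise=True)  # NaN input is rejected
```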
),
(
    "Edge Case: Sparse-Like High-Dimensional Data",
    lambda rng: rng.random((50, 500)) * (rng.random((50, 500)) > 0.99),
For less-random reproducibility, perhaps it could add a test where one column is constant or an exact duplicate of another column.
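A sketch of the deterministic edge cases suggested above (the shapes and the helper name are assumptions): one constant column plus one exact duplicate column give a reproducible degenerate input.

```python
import numpy as np

def make_degenerate_data(n_samples=50, n_features=10, seed=42):
    """Random data with one constant column and one duplicated column."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_samples, n_features))
    X[:, 0] = 1.0        # constant column: zero variance
    X[:, 1] = X[:, 2]    # exact duplicate of another column
    return X
```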
It looks like we don't have any test here, nor in daal4py, that checks that the results from TSNE make sense beyond having the right shape and non-missingness. Since there's a very particular dataset here for the last test, it'd be helpful to add other assertions there, along the lines of checking that the embedding ends up making some points closer than others, as would be expected given the input data.
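As a hedged sketch of such an assertion (the cluster sizes, scales, and parameters below are assumptions): embed two well-separated clusters and check that each cluster stays tighter than the gap between them.

```python
import numpy as np
from sklearn.manifold import TSNE  # sklearnex patches this same interface

# Two clusters far apart in input space (assumed synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 5)),
    rng.normal(10.0, 0.1, size=(20, 5)),
])
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)

ea, eb = emb[:20], emb[20:]
# Largest average within-cluster spread vs distance between cluster centers.
within = max(
    np.linalg.norm(ea - ea.mean(axis=0), axis=1).mean(),
    np.linalg.norm(eb - eb.mean(axis=0), axis=1).mean(),
)
between = np.linalg.norm(ea.mean(axis=0) - eb.mean(axis=0))
assert within < between, "similar points should end up closer in the embedding"
```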
Description
Added additional tests in sklearnex/manifold/tests/test_tsne.py to expand the test coverage for the t-SNE algorithm.
PR completeness and readability
Testing