-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative query integral image #426
Conversation
Found issue with parameters to randint, including 0 in the range can cause inefficient searches (or random garbage since c_search_for_space uses C arrays) Fixed by changing random number generation to: random.randint(1, hits) ------------------------------------------- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ------------------------------------------- - Created subfunctions with cdef- c_search_for_space c_iterate_for_coordinates Profiling with line_profiler shows that these are faster however it's been difficult to do good timeit testing with timeit on my computer due to specs ------------------------------------------- xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ------------------------------------------- Created test_subfunctions for unit testing purposes
Added unit tests for c_search_for_space
Thanks for the contribution. Since we recently transition to pytest for writing our tests. Would it be possible to:
To execute the test, run
Waiting, we make a decision on #424 , you have to locally regenerate the file Then, testing will work as expected because we run |
Thanks @jcfr ! I'll give it a go. |
Tests number of hits on blank np.ndarray for: word size = (x, y) word size smaller than array.shape word size = array.shape[0] - 1, array.shape[1] - 1 word size = array.shape[0] + 1, array.shape[1] + 1
Changed the string passed to subprocess.check_call to be inclusive for windows users as it failed during my testing
Thanks for the follow up. To address the coding style issue locally, you can run:
in the top level directory of the project. |
Hmm. Seems like there's still a fair amount of work to do with this - I'll download CircleCI and try and work on it! Thanks for the suggestions @jcfr ! Closing for now. |
Sounds good. It may also be worth sharing some timing result. That would help @amueller assess if moving forward with the approach is sensible. |
With this setup:
Running timeit (10000 times) yields:
yields: Not sure about the dramatic difference in lprun but the jupyter notebook I used is here: I originally was attempting to implement an array storage of hit coordinates but found that this was reasonably fast! |
@Aierie cool! Can you maybe run the timing using an actual word-cloud? |
I've tried going with an actual wordcloud, and apparently it actually makes it slower.
I don't really understand why - any idea? I suppose this means that it's not too viable an approach.
I found this while working through the code: Edit: returns (0, 0) assuming the area is occupied |
Did you get a chance to do some timing comparison within and without this optimization ? |
Sorry, I'm confusing myself. 😬 i'm mixing up my code results with the original. With the original the worst that could happen is an overlap. I need a break I think 😂 |
Found a small kink in the goal generator - including 0 is inefficient
(and creates garbage with my subfunctions)
Tried cdef subfunctions to improve speed!
Also added a unit test file (WIP, not sure of the best way to handle Cython in unit testing).
I'm not very experienced in package creation, so I'm passing this over for a quick look. If it looks okay I'll give it a go!