Explore async or multiprocessing for faster file processing #1

Closed
2 tasks done
brootware opened this issue Apr 19, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

brootware (Owner) commented Apr 19, 2022

Checklist

  • There are no similar reports on existing issues (including closed ones).
  • I was in the master branch of the latest code.

Is your feature request related to a problem? Please describe

When processing multiple large files, the total processing time grows linearly with the number of files: roughly (number of files × time per file).

Describe the solution you'd like

Explore either async or multiprocessing to handle multiple files in parallel.

Describe alternatives you've considered

Threading is another option.

Additional context

brootware added the enhancement label Apr 19, 2022
brootware (Owner, Author) commented

Update: After timing the program's run time, async does not really help, because the task is CPU-bound rather than I/O-bound. Redacting multiple files will need multiprocessing instead.

python3 pyredactkit.py ip_test.txt  39.31s user 0.16s system 100% cpu 39.412 total
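
As a rough illustration of the multiprocessing route mentioned above, here is a minimal sketch that fans several files out to worker processes with `concurrent.futures.ProcessPoolExecutor`. The names `redact_text`, `redact_file`, `redact_files`, the IPv4 pattern, and the `redacted_` output prefix are all assumptions made for this example, not pyredactkit's actual code.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import List, Optional

# Hypothetical stand-in for the real redaction logic: mask IPv4-looking tokens.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_text(text: str) -> str:
    """Replace every IPv4-looking token with a fixed placeholder."""
    return IPV4_PATTERN.sub("[REDACTED]", text)


def redact_file(path: str) -> str:
    """Redact one file, write the result next to it, and return the output path."""
    source = Path(path)
    # Assumed naming scheme for the output file.
    target = source.with_name(f"redacted_{source.name}")
    redacted = redact_text(source.read_text(encoding="utf-8"))
    target.write_text(redacted, encoding="utf-8")
    return str(target)


def redact_files(paths: List[str], workers: Optional[int] = None) -> List[str]:
    """Redact several files in parallel using a pool of worker processes."""
    # max_workers=None lets the executor choose a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(redact_file, paths))
```

Calling something like `redact_files(["a.txt", "b.txt"])` (hypothetical file names) should happen under an `if __name__ == "__main__":` guard, since `multiprocessing` re-imports the main module on platforms that spawn workers. Whether this beats a sequential loop depends on file sizes and core count, so it is worth timing both.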

brootware (Owner, Author) commented

Further update after running benchmarks: reading and writing files to disk is effectively already non-blocking for this workload, so "concurrency" only comes in handy when you're making requests over the network.

Results for a run over 5 text files:

With threading
poetry run prk logtest  39.58s user 1.15s system 97% cpu 41.678 total

Without threading
poetry run prk logtest  20.05s user 0.43s system 93% cpu 21.897 total
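
The timings above show the threaded run taking roughly twice as long (41.678 s vs 21.897 s total). One plausible explanation, which is an assumption rather than something confirmed in this thread, is that on standard CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so threads add overhead without adding parallelism for CPU-bound work. The sketch below is a way to check that on your own machine; `cpu_bound_task` is a hypothetical stand-in for the redaction work, not the project's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_task(n: int) -> int:
    """Pure-Python arithmetic loop; a stand-in for CPU-heavy redaction work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run every job one after another in the calling thread."""
    return [cpu_bound_task(n) for n in jobs]


def run_threaded(jobs, workers=5):
    """Run the same jobs on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound_task, jobs))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [1_000_000] * 5  # five equally sized jobs, mirroring the 5 text files
    _, sequential_seconds = timed(run_sequential, jobs)
    _, threaded_seconds = timed(run_threaded, jobs)
    print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

If the threaded time is no better than the sequential one, that points toward process-based parallelism rather than threads for this kind of workload.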

brootware pinned this issue Jul 30, 2022
brootware pushed a commit that referenced this issue Aug 22, 2022
Added M series & 1 test case