pandas.read_excel reads the file in a random-access way: it makes a lot of seek and read calls.
I expected that if the first HTTP request reads the entire file contents, subsequent read calls would be served from some internal buffer,
but I still see the library making further HTTP requests under the hood for small byte ranges that were already read by the first HTTP request.
Can we improve this? Can we skip the additional HTTP requests if we already have all the needed data from the first HTTP request?
smart_open's main use case is streaming. If your application does a lot of seeking, then it may be better for you to handle buffering separately (e.g. using tempfile).
Ideally, yes, smart_open would be smart enough to buffer the contents of the stream itself, but how do you determine the ideal size of the buffer? Automatically? Using some sort of parameter? It's a fair bit of work.
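A minimal sketch of the tempfile approach mentioned above, assuming the remote file is small enough to copy in full: the stream is read once, front to back, into a local temporary file, and all later seek and read calls hit that file instead of the network. The helper name spool_to_tempfile and the URL in the usage comment are hypothetical, not part of smart_open.

```python
import shutil
import tempfile


def spool_to_tempfile(stream):
    """Copy a forward-only binary stream into a seekable temporary file.

    Hypothetical helper: the source stream is consumed exactly once, so
    every later seek/read is served from local disk.
    """
    local = tempfile.TemporaryFile()  # opened in "w+b" mode, removed on close
    shutil.copyfileobj(stream, local)  # single sequential pass over the source
    local.seek(0)  # rewind so the consumer starts at the beginning
    return local


# Possible usage (assumes smart_open and pandas are installed; URL is made up):
#
#   import pandas
#   from smart_open import open as remote_open
#
#   with remote_open("https://example.com/report.xlsx", "rb") as remote:
#       with spool_to_tempfile(remote) as local:
#           frame = pandas.read_excel(local)
```

This trades disk space and one full download for predictable behavior, which sidesteps the buffer-size question entirely.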
For me, any buffer size with some LRU mechanism would do.
The main idea is to avoid, as far as possible, re-reading data from upstream if it was already read recently.
Yes, I agree: it can be a pretty complex task that complicates the library too much and can introduce new errors.
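To make the LRU idea concrete, here is a rough sketch of a read-only, seekable wrapper that fetches fixed-size blocks on demand and keeps the most recently used ones in memory. It is not how smart_open works today. BlockCachedReader, fetch_range, block_size and max_blocks are all hypothetical names, and fetch_range(start, end) is assumed to return the bytes in [start, end), for example via an HTTP Range request.

```python
import io
from collections import OrderedDict


class BlockCachedReader(io.RawIOBase):
    """Seekable read-only view over remote bytes with an LRU block cache."""

    def __init__(self, fetch_range, size, block_size=64 * 1024, max_blocks=32):
        super().__init__()
        self._fetch_range = fetch_range  # hypothetical: (start, end) -> bytes
        self._size = size                # total length of the remote object
        self._block_size = block_size
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()     # block index -> bytes, oldest first
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence == io.SEEK_END:
            offset += self._size
        # Simplification: clamp the position to the valid range.
        self._pos = max(0, min(offset, self._size))
        return self._pos

    def _block(self, index):
        if index in self._blocks:
            self._blocks.move_to_end(index)  # mark as most recently used
            return self._blocks[index]
        start = index * self._block_size
        end = min(start + self._block_size, self._size)
        data = self._fetch_range(start, end)  # the only upstream access
        self._blocks[index] = data
        if len(self._blocks) > self._max_blocks:
            self._blocks.popitem(last=False)  # evict least recently used
        return data

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._size - self._pos
        end = min(self._pos + size, self._size)
        chunks = []
        while self._pos < end:
            index, offset = divmod(self._pos, self._block_size)
            block = self._block(index)
            take = min(len(block) - offset, end - self._pos)
            if take <= 0:  # upstream returned a short block; stop early
                break
            chunks.append(block[offset:offset + take])
            self._pos += take
        return b"".join(chunks)
```

With a wrapper like this, repeated small reads inside an already fetched block cost no extra requests. The open questions from the discussion remain: block size and cache capacity still have to be chosen by someone.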
Hello,
As I see it, the smart_open http module could improve its buffering. Please look at the code sample.