Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add progress log to HTTP downloads #6065

Draft
wants to merge 2 commits into
base: dev-2.x
Choose a base branch
from

Conversation

leonardehrenfried
Copy link
Member

Summary

This adds a progress log for HTTP downloads which give you a bit of visibillity when OTP does large downloads.

Issue

Unit tests

Copy link

codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.

Project coverage is 69.79%. Comparing base (738969f) to head (2928fbd).

Files with missing lines Patch % Lines
...ipplanner/datastore/https/HttpsFileDataSource.java 0.00% 4 Missing ⚠️
...anner/datastore/https/HttpsDataSourceMetadata.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #6065      +/-   ##
=============================================
- Coverage      69.79%   69.79%   -0.01%     
  Complexity     17368    17368              
=============================================
  Files           1964     1964              
  Lines          74427    74430       +3     
  Branches        7632     7632              
=============================================
  Hits           51947    51947              
- Misses         19831    19834       +3     
  Partials        2649     2649              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -20,7 +20,7 @@
*/
public class HttpsDataSourceRepository implements DataSourceRepository {

private static final Logger LOG = LoggerFactory.getLogger(HttpsFileDataSource.class);
private static final Logger LOG = LoggerFactory.getLogger(HttpsDataSourceRepository.class);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines +93 to +99
return ProgressTracker.track(
"Downloading %s".formatted(uri.toString()),
1000,
size(),
in,
m -> LOG.info(m)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently progress tracking is done one abstraction level up. E.g. this will result in double tracking of OSM loading data. The benefit of adding this one level up, is that we have better/more detailed control of the context like use-case, expected speed and source. If added to the data-source it should be added as a generic feature that is opt-in and with configurable log message. Adding it to one DataSource and not the others is inconsistent. If I load it from HTTP I get tracking, but not if I load it from Google Cloud. Since we have few data-sources I think keeping this responsibility out of the DataSource component is a good thing.

I suggests searching for the use of the asInputStream() and asOutputStream() and add progress tracking those locations instead - where the content is expected to take more than 5 seconds for a big source. In places where it is small, I would not add it - since it has a small overhead(synchronization - ignorable if we are fetching data outside mem).

Another reason for not adding it the the data-source is that the progress tracker always log, so for things that are so small that we do not want to track them, the tracker will create noice.

Another way to identify places to add tracking is to look through the logs and look for gaps in the logging that is more than 5 sec. If found, it is likely that there is a big task in the gap that shuld be tracked.

@t2gran
Copy link
Member

t2gran commented Sep 17, 2024

When I looked at this I identified on place witch is missing logging - reading the graph:

@SuppressWarnings("Convert2MethodRef")
  public static SerializedGraphObject load(DataSource source) {
    return load(source.asInputStream(), source.path());
    InputStream in = ProgressTracker.track("Reading graph '" + source.name() + "'",
      64*1024,
      source.size(),
      source.asInputStream(),
      // Keep lambda, do not convert to method-ref. Make the logger
      // report the correct class & line number.
      m -> LOG.info(m)
      );
    return load(in, source.path());
  }

(SerializedGraphObject:124)

@t2gran t2gran added this to the 2.7 (next release) milestone Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants