Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add progress log to HTTP downloads #6065

Draft
wants to merge 2 commits into
base: dev-2.x
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
import org.apache.hc.core5.http.Header;
import org.apache.hc.core5.http.HttpHeaders;
import org.opentripplanner.datastore.api.DataSource;
import org.opentripplanner.framework.text.FileSizeToTextConverter;
import org.opentripplanner.framework.tostring.ToStringBuilder;

/**
Expand Down Expand Up @@ -87,8 +88,8 @@ public String toString() {
return ToStringBuilder
.of(this.getClass())
.addObj("contentType", contentType)
.addObj("contentLength", contentLength)
.addObj("lastModified", lastModified)
.addObj("contentLength", FileSizeToTextConverter.fileSizeToString(contentLength))
.addObj("lastModified", Instant.ofEpochMilli(lastModified).toString())
.toString();
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
*/
public class HttpsDataSourceRepository implements DataSourceRepository {

private static final Logger LOG = LoggerFactory.getLogger(HttpsFileDataSource.class);
private static final Logger LOG = LoggerFactory.getLogger(HttpsDataSourceRepository.class);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch


private static final Duration HTTP_HEAD_REQUEST_TIMEOUT = Duration.ofSeconds(20);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
import org.opentripplanner.datastore.file.ZipFileDataSource;
import org.opentripplanner.framework.io.OtpHttpClient;
import org.opentripplanner.framework.io.OtpHttpClientFactory;
import org.opentripplanner.framework.logging.ProgressTracker;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

Expand Down Expand Up @@ -89,7 +90,13 @@ public InputStream asInputStream() {
throw new IllegalStateException(e.getLocalizedMessage(), e);
}
} else {
return in;
return ProgressTracker.track(
"Downloading %s".formatted(uri.toString()),
1000,
size(),
in,
m -> LOG.info(m)
);
Comment on lines +93 to +99
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently progress tracking is done one abstraction level up. E.g. this will result in double tracking of OSM loading data. The benefit of adding this one level up, is that we have better/more detailed control of the context like use-case, expected speed and source. If added to the data-source it should be added as a generic feature that is opt-in and with configurable log message. Adding it to one DataSource and not the others is inconsistent. If I load it from HTTP I get tracking, but not if I load it from Google Cloud. Since we have few data-sources I think keeping this responsibility out of the DataSource component is a good thing.

I suggests searching for the use of the asInputStream() and asOutputStream() and add progress tracking those locations instead - where the content is expected to take more than 5 seconds for a big source. In places where it is small, I would not add it - since it has a small overhead(synchronization - ignorable if we are fetching data outside mem).

Another reason for not adding it the the data-source is that the progress tracker always log, so for things that are so small that we do not want to track them, the tracker will create noice.

Another way to identify places to add tracking is to look through the logs and look for gaps in the logging that is more than 5 sec. If found, it is likely that there is a big task in the gap that shuld be tracked.

}
}

Expand Down
Loading