Missing Close Events on Some Issues #189
Comments
Thanks for the detailed detective work! Given that some issues are being recorded as closed, I'm inclined to think that we may be missing or dropping some activities through our polling mechanism. That said, @annafil, are you aware of any gotchas with close events, or can you think of any other reasons?
@sehtia Thanks for the sleuthing you've done on this so far! Looking at the API output for this particular issue, https://api.github.com/repos/GoogleContainerTools/jib/issues/268/events, there is a closed event that should have registered. That lends support to @igrigorik's theory that this might be due to events being dropped because of polling issues. Depending on what you're using the data for, and how much you need, there may be a workaround. Even if GHArchive is dropping some events from the /events API endpoint, the API provides all the events for a given issue in a given repo if you fetch them directly (as in the example link above). If you know the project, or set of projects, you are looking for, you can go through the issues for a project via the API and grab all associated issue events (including some that are not sent by default via /events, like 'assigned' and 'labeled'). There is no limit on the history of those events, unlike the /events endpoint, which only goes back 300 events per repo. If you need to do this for a large number of projects you'll probably run into rate limit issues, but you can use the built-in support mechanisms to help you throttle your requests and stay within the rate limit.
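The direct-fetch workaround described above can be sketched roughly as follows. This is a minimal sketch, not GH Archive code: it assumes Python with only the standard library, and the helper names (`issue_events_url`, `fetch_issue_events`, `final_state`) are hypothetical. It relies on the `GET /repos/{owner}/{repo}/issues/{number}/events` endpoint, whose items carry an `event` field such as `closed` or `reopened`.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def issue_events_url(owner, repo, number, page=1, per_page=100):
    """Build the URL for one page of an issue's event history."""
    return (f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/events"
            f"?per_page={per_page}&page={page}")


def fetch_issue_events(owner, repo, number, token=None):
    """Fetch every event for one issue, following pages until one is empty.

    Performs network requests; unauthenticated calls have a low rate limit.
    """
    events, page = [], 1
    while True:
        request = urllib.request.Request(
            issue_events_url(owner, repo, number, page))
        request.add_header("Accept", "application/vnd.github+json")
        if token:  # optional access token (assumption: supplied by the caller)
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return events
        events.extend(batch)
        page += 1


def final_state(events):
    """Replay closed/reopened events in time order to get the issue's state."""
    state = "open"
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for event in sorted(events, key=lambda e: e.get("created_at", "")):
        if event.get("event") == "closed":
            state = "closed"
        elif event.get("event") == "reopened":
            state = "open"
    return state
```

Calling `final_state(fetch_issue_events("GoogleContainerTools", "jib", 268))` would replay the event history of the issue linked above; comparing that result with the state reconstructed from GH Archive data is one way to spot a dropped close event.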
@igrigorik @annafil Ah yes, that makes a lot more sense. @annafil My goal was to get all the currently open issues for a specific set of repositories I'm interested in, for tracking/analysis purposes. I'm now attempting this by mostly following your advice, specifically by querying the API for each repo directly (api.github.com/repos/user/repoName/issues?state=open) and working from there. However, I'm open to suggestions if you recommend a different approach for my need, and/or guides on throttling requests to stay within the rate limit, as it doesn't seem this will scale to a large number of repos. Also, @igrigorik, is there any documentation where I can read about the polling mechanism and dropped events in GHArchive, to further understand the issue (out of curiosity)?
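A rough sketch of the per-repo approach described here, assuming Python with only the standard library. The function names are hypothetical. Two details worth noting: the query parameter is `state` (introduced with `?`), and this endpoint also returns pull requests, which can be recognized by their `pull_request` key.

```python
import json
import urllib.request


def open_issues_url(owner, repo, page=1, per_page=100):
    """Build the URL for one page of a repository's open issues."""
    return (f"https://api.github.com/repos/{owner}/{repo}/issues"
            f"?state=open&per_page={per_page}&page={page}")


def only_issues(items):
    """Drop pull requests, which this endpoint returns alongside issues."""
    return [item for item in items if "pull_request" not in item]


def fetch_open_issues(owner, repo):
    """Collect all open issues for one repository (performs network requests)."""
    issues, page = [], 1
    while True:
        with urllib.request.urlopen(open_issues_url(owner, repo, page)) as response:
            batch = json.load(response)
        if not batch:
            return issues
        issues.extend(only_issues(batch))
        page += 1
```

This makes one request per 100 items, so for many repositories the rate limit becomes the bottleneck fairly quickly.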
@sehtia You can check out https://developer.github.com/v3/#rate-limiting for advice on working with REST API rate limits. If you can say a little more about what you're tracking/analyzing, and perhaps how many repos you estimate to poll, I can point you in a more specific direction :)
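One way to throttle against those limits is to read the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the latter is a UTC epoch timestamp in seconds) and sleep until the window resets once the quota is exhausted. The following is a minimal sketch under that assumption; `seconds_until_allowed` and `throttle` are hypothetical helper names, not part of any GitHub client library.

```python
import time


def seconds_until_allowed(headers, now=None):
    """Return how long to wait before the next request, based on rate-limit headers.

    `headers` is any mapping of header name to value; names are matched
    case-insensitively. Missing headers are treated as "no need to wait".
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    remaining = int(normalized.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = float(normalized.get("x-ratelimit-reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)


def throttle(headers):
    """Sleep until the rate-limit window resets, if the quota is used up."""
    delay = seconds_until_allowed(headers)
    if delay > 0:
        time.sleep(delay)
```

With `urllib.request`, the headers of a response can be passed as `throttle(response.headers)` after each call. Authenticated requests get a considerably higher quota than anonymous ones, so supplying a token is usually the first step before adding any sleeping logic.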
@annafil @igrigorik I'm also experiencing issues with PRs that are missing events for "action=closed". In my investigation, I found this to be the case for 2509 PRs. Here are some examples:
But, unlike with issues, I can't get the past events for PRs. This bug seems to recur to this day. Are there any plans to resolve it? :)
FYI #275 may at least explain why missed events haven't been identified/logged by the crawler, even if it doesn't actually solve the problem
Hello!
I'm attempting to understand why some issues show up with an 'open' state even though they are closed on GitHub. For example, in the image below.
As far as I understand, when certain issues are closed by a pull request merge using the closing keywords in the body, the issue is closed on the GitHub side, but no 'close' IssueEvent is created, so to GH Archive that issue remains in its open state. However, there does seem to be a close record for some issues that were closed by a pull request, as shown below,
so that wouldn't entirely make sense. Would you be able to shed any light on this, as I'm quite confused? Thank you!