Only three IssuesEvent action types #183

Open
bluecoco opened this issue Apr 9, 2018 · 8 comments
Comments

@bluecoco

bluecoco commented Apr 9, 2018

On the API documentation page, https://developer.github.com/v3/activity/events/types/#issuesevent, it says the payload.action value for an issues event can be one of "assigned", "unassigned", "labeled", "unlabeled", "opened", "edited", "milestoned", "demilestoned", "closed", or "reopened". But when we look at the past few years of data, this value is only ever one of three: 'opened', 'closed', and 'reopened'. Are the other action types not captured by GHArchive? Thanks a lot!
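One way to reproduce this observation is to tally `payload.action` across a GH Archive hourly dump. The sketch below assumes the dump has been downloaded locally as a gzip file with one JSON event per line, which is how GH Archive publishes its hourly archives. The function names are hypothetical. Passing `"PullRequestEvent"` as `event_type` runs the same check for pull requests.

```python
import gzip
import json
from collections import Counter
from typing import Iterable


def count_actions(lines: Iterable[str], event_type: str = "IssuesEvent") -> Counter:
    """Tally payload.action values for one event type in GH Archive JSON lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("type") == event_type:
            # Events without an action (e.g. PushEvent) are counted under None.
            counts[event.get("payload", {}).get("action")] += 1
    return counts


def count_actions_in_file(path: str, event_type: str = "IssuesEvent") -> Counter:
    # Hourly dumps are gzip-compressed, newline-delimited JSON.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return count_actions(fh, event_type)
```

On archive data, the resulting counter would be expected to contain only the three keys reported above.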

@igrigorik
Owner

Someone else flagged this to me recently too. It looks like the Events API may be surfacing only a subset of issue transitions. @annafil, could you sanity-check on your end?

@annafil
Collaborator

annafil commented Apr 10, 2018

Yes, that's correct @igrigorik.

This is a limitation on the API side, not GHArchive. While the API documentation says these events are surfaced, they are not available in the /events endpoint that GHArchive reads from, only through webhooks. I'll ask for this to be clarified in the docs :)
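For comparison, a webhook receiver does see the extra actions. This is a minimal sketch, assuming a repository webhook configured with a shared secret. GitHub sends the event name in the `X-GitHub-Event` header and an HMAC-SHA256 of the raw body, prefixed with `sha256=`, in `X-Hub-Signature-256`. The function name and the choice to raise `ValueError` on a bad signature are illustrative, and the HTTP server around it is omitted.

```python
import hashlib
import hmac
import json
from typing import Mapping, Optional


def issues_webhook_action(headers: Mapping[str, str], body: bytes, secret: bytes) -> Optional[str]:
    """Return payload["action"] for a verified `issues` delivery, else None."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison against the signature GitHub sent.
    if not hmac.compare_digest(expected, h.get("x-hub-signature-256", "")):
        raise ValueError("webhook signature mismatch")
    if h.get("x-github-event") != "issues":
        return None  # a different event type, e.g. push or pull_request
    # Here the action can be "assigned", "labeled", etc., not just open/close.
    return json.loads(body).get("action")
```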

@igrigorik
Owner

@annafil, would it be possible to ask the other way around and have those events added to the API? I've heard a few requests for this now. :)

@annafil
Collaborator

annafil commented Apr 10, 2018

If it helps, even though the /events stream doesn't include these types of events by default, they are currently available in a slightly different form from the API :)

Each issue has its own events URI in the API, and that endpoint includes the additional issue activity types above. As far as I can see, historical information is still available there: e.g. the events for this issue from the very active rails/rails repo, circa 2011, include an 'assigned' event: https://api.github.com/repos/rails/rails/issues/411/events. It should therefore be possible to reconstruct activity for issues of particular interest, as long as the repo and issue have not been deleted.
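A minimal sketch of fetching one issue's events with only the Python standard library, assuming the `/repos/{owner}/{repo}/issues/{number}/events` endpoint from the rails/rails example. The helper names are hypothetical. Only the first page (up to `per_page=100` items) is read, so long-lived issues would need pagination via the `Link` response header.

```python
import json
import urllib.request
from typing import List

API_ROOT = "https://api.github.com"


def issue_events_url(owner: str, repo: str, number: int, per_page: int = 100) -> str:
    # Per-issue event list, e.g. .../repos/rails/rails/issues/411/events
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/events?per_page={per_page}"


def event_names(events: List[dict]) -> List[str]:
    # Each item carries an "event" field such as "assigned", "labeled", "closed".
    return [e.get("event") for e in events]


def fetch_issue_events(owner: str, repo: str, number: int, token: str = "") -> List[dict]:
    """Fetch the first page of events for one issue (pagination not handled)."""
    req = urllib.request.Request(issue_events_url(owner, repo, number))
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        # Authenticated requests get a much larger rate-limit allowance.
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `event_names(fetch_issue_events("rails", "rails", 411))` would list the event types recorded for that issue, provided it has not been deleted.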

@igrigorik We could consider updating the crawler to fetch these related events whenever it encounters an issue, to attempt to preserve the historical data around issues better, but I defer to you on whether this is in scope for Archive :)

@bluecoco
Author

Thank you both for your help!

@igrigorik
Owner

> @igrigorik We could consider updating the crawler to fetch these related events whenever it encounters an issue, to attempt to preserve the historical data around issues better, but I defer to you on whether this is in scope for Archive :)

How would you see that working? Trigger the extra fetch when an issue is "closed" to backfill? A couple of gotchas come to mind:

  • Presumably issues can still be updated after they're closed, right? We would still miss that data.
  • Today the activity is logged into the archive when it is detected, so the fetched data would be "misaligned" with the rest, and backfilling into old gzip archives and BQ tables would add a ton of complexity.
  • We're already up against the API limit. More fetches might make us lose more activity data.
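To make the trade-off concrete, here is a hypothetical sketch of the "trigger on closed" idea. It assumes one extra request per closed issue and a spare-quota figure taken from GitHub's `X-RateLimit-Remaining` response header, minus a `reserve` kept for the main /events poller. By design it does not capture updates made after the close, and closures seen once the budget runs out are simply skipped.

```python
from typing import Iterable, List, Tuple


def backfill_targets(events: Iterable[dict], remaining: int, reserve: int) -> List[Tuple[str, int]]:
    """Pick (repo, issue number) pairs to backfill when an issue is closed.

    Only requests above `reserve` are spent, so the main poller keeps
    its share of the rate limit (hypothetical policy).
    """
    budget = max(0, remaining - reserve)
    targets: List[Tuple[str, int]] = []
    seen = set()
    for ev in events:
        if budget == 0:
            break  # out of spare quota: the remaining closures are missed
        if ev.get("type") != "IssuesEvent":
            continue
        payload = ev.get("payload", {})
        if payload.get("action") != "closed":
            continue
        key = (ev["repo"]["name"], payload["issue"]["number"])
        if key in seen:
            continue  # don't spend two requests on the same issue
        seen.add(key)
        targets.append(key)
        budget -= 1
    return targets
```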

@bluecoco
Author

Similarly, it seems that for pull request events only 'opened', 'closed', and 'reopened' are captured, not others such as 'assigned' or 'unassigned'. Is this also expected?

@annafil
Collaborator

annafil commented Apr 20, 2018

@igrigorik Very good point about backfilling to the gzip archives and the added complexity. I agree with you that the API limit is a concern, and a big blocker to grabbing more of this data in some systematic way. I suspect one of the reasons these additional events are not available through the /events endpoint is because they're relatively higher in volume than open/closed/reopened events and would make it harder to keep up with the feed.

@bluecoco good question! I would expect a consistent set of events to be put out for both PRs and Issues, so this seems right to me.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants