Input formats #1

Open
igorbrigadir opened this issue Apr 23, 2021 · 0 comments

Comments

@igorbrigadir
Collaborator

The input should accept both unflattened and flattened data, as well as a plain list of IDs for basic counts (IDs encode a timestamp, so we can return counts over time from a list of IDs alone).
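To illustrate the ID-only case, here is a rough sketch of recovering counts over time from snowflake IDs. The function names `snowflake_to_datetime` and `counts_per_day` are made up for this example and are not part of any existing tool. It assumes every ID is a snowflake ID (IDs issued before the switch to snowflake in late 2010 do not encode a timestamp and would give wrong dates).

```python
from collections import Counter
from datetime import datetime, timezone

# Twitter's snowflake epoch in milliseconds (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(tweet_id):
    """Return the UTC creation time encoded in a snowflake tweet ID.

    The top bits of the ID (everything above the low 22 bits) hold the
    number of milliseconds since the snowflake epoch.
    """
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def counts_per_day(tweet_ids):
    """Count IDs per UTC calendar day, keyed by ISO date string.

    Hypothetical helper: accepts ints or numeric strings.
    """
    counts = Counter()
    for tweet_id in tweet_ids:
        day = snowflake_to_datetime(tweet_id).date().isoformat()
        counts[day] += 1
    return dict(sorted(counts.items()))
```

Bucketing by hour or minute instead of by day would only need a different key on the decoded datetime.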

Ideally we should also support compressed files and directories of files. E.g. your dataset could be one big xz-compressed file with one tweet per line, or a directory of gz files with one file per hour or per day, etc.
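A minimal sketch of how that could look, using only the Python standard library. `iter_lines` and `iter_records` are hypothetical names, and picking the decompressor from the file extension is an assumption (it does not sniff file contents). Files in a directory are read in sorted name order, which matches chronological order only if the names sort that way.

```python
import bz2
import gzip
import json
import lzma
from pathlib import Path

# Map file suffix to an opener; anything else is read as plain text.
OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def iter_lines(path):
    """Yield non-empty lines from one file or from every file in a directory."""
    path = Path(path)
    if path.is_dir():
        files = sorted(p for p in path.iterdir() if p.is_file())
    else:
        files = [path]
    for file in files:
        opener = OPENERS.get(file.suffix.lower(), open)
        # "rt" makes the compressed openers decode to text like open() does.
        with opener(file, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield line


def iter_records(path):
    """Parse each line as JSON (one tweet or one response per line)."""
    for line in iter_lines(path):
        yield json.loads(line)
```

Because both functions are generators, a large compressed file is streamed line by line rather than loaded into memory.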

It would also be nice to support v1.1 format tweets, but this would be a lower priority than the v2 formats.
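One possible way to tell the input kinds apart, line by line, is sketched below. This is a heuristic under stated assumptions, not a spec: `classify_line` is a hypothetical name, an unflattened v2 line is assumed to be a response object with a top-level `data` key, a v1.1 tweet is recognised by its `id_str` field, and any other object with an `id` is treated as a flattened v2 tweet.

```python
import json


def classify_line(line):
    """Guess the kind of one input line.

    Returns 'id', 'v2_response', 'v1.1_tweet', 'v2_tweet', or 'unknown'.
    """
    text = line.strip()
    # A bare run of ASCII digits is treated as a tweet ID.
    if text.isascii() and text.isdigit():
        return "id"
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(obj, dict):
        return "unknown"
    # Assumption: unflattened v2 output wraps tweets in a "data" key.
    if isinstance(obj.get("data"), (list, dict)):
        return "v2_response"
    # v1.1 tweet objects carry the ID as a string in "id_str".
    if "id_str" in obj:
        return "v1.1_tweet"
    if "id" in obj:
        return "v2_tweet"
    return "unknown"
```

Checking only the first non-empty line of each file would probably be enough in practice, assuming a file does not mix formats.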
