Homepage, GitHub repo, point of contact
Briefly summarise the dataset, including what it's about, its recommended/intended use, how and why it was created, and its size/scope.
Any languages covered in the dataset.
Are there any ML tasks this dataset would be especially suited for (e.g. classification or clustering)? Are there any limitations on its use for AI/ML?
JSON-formatted
List and describe the fields present in the dataset.
What need motivated the creation of this dataset?
Describe the source data (e.g. image titles, descriptions, tags, comments).
Describe the data collection process and any data selection or filtering. If data was modified or normalised after collection, describe the process and tools used.
Human or machine-generated. For human producers, use any self-reported demographic or identity information, but do not infer it.
This would include any rights, licenses or other obligations (e.g. if Local Contexts labels were applied by source communities).
Describe any data schemes used during establishment of the dataset.
Who or what is depicted in the dataset? If the dataset depicts people, are any subgroups of people represented? Are specific individuals personally identifiable? Does this dataset pertain to a difficult history? Describe any precautions being taken. If any identity categories are used, describe where the information comes from.
Ideally written in collaboration with those responsible for the dataset. What are the positive impacts? What are the risks?
Describe specific biases that are likely to be reflected in the data, and state whether any steps were taken to reduce their impact.
List people involved in collecting the dataset. If funding information is known, include it here.
Licensing for the dataset, with a link to the license webpage.
If the dataset has a DOI, provide it here.
Any other contributors, including those who curated and published it.
Regularly updated / Actively Maintained / Limited maintenance / Deprecated