Adding dataset Kat-57 groundtruth dataset #173

Open
mirkh opened this issue Jan 20, 2025 · 0 comments

Hello! This is a dataset of transcribed catalogue cards from Lund University Library.

Here is our dataset YAML file:

schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: Kat -57 ground truth dataset
url: https://zenodo.org/records/14679534
authors: []
institutions: []
description: >-
 Background


 Catalogue -1957 is an alphabetic library catalogue listing Lund University
 Library’s holdings up to 1957. A project has been ongoing to scan and
 transcribe the catalogue cards.


 About 10,000 cards were manually transcribed to create a ground truth dataset.


 Out of 2178 card drawers, one drawer per letter of the alphabet was selected
 for transcription, except for the letter S, for which two drawers were
 selected.


 The writing on the catalogue cards is a mix of typewriter and handwriting.
 There are more than 10 different hands.


 The cards were transcribed by a small team at the University Library.


 Dataset


 The set consists of PNG images with corresponding PAGE XML files. The
 transcriptions were made in eScriptorium.
language:
 - swe
 - deu
 - eng
 - fra
 - dan
 - nor
production-software: eScriptorium + Kraken
automatically-aligned: false
script:
 - iso: Latn
script-type: evenly-mixed
time:
 notBefore: '1880'
 notAfter: '1957'
hands:
 count: more-than-10
 precision: estimated
license:
 name: CC-BY 4.0
 url: https://creativecommons.org/licenses/by/4.0/
format: Page-XML
volume:
 - metric: pages
   count: 10000
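
Since the description says the set consists of PNG images with corresponding PAGE XML files, a quick consistency check of a downloaded copy could look like the sketch below. It assumes that each image and its transcription share a file stem (for example `card_0001.png` and `card_0001.xml`) and that the archive is unpacked into a local folder called `kat57/`; both the naming scheme and the folder name are hypothetical and not confirmed by the record. Text lines are counted by element name only, so the PAGE namespace version does not matter.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def pair_pages(root):
    """Return (png, xml) pairs that share a file stem under root.

    Assumption: images and PAGE XML files sit side by side with the same stem.
    """
    root = Path(root)
    pairs = []
    for png in sorted(root.rglob("*.png")):
        xml = png.with_suffix(".xml")
        if xml.exists():
            pairs.append((png, xml))
    return pairs


def count_text_lines(xml_path):
    """Count TextLine elements, ignoring the PAGE namespace version."""
    tree = ET.parse(xml_path)
    # Tags look like "{namespace}TextLine"; compare the local name only.
    return sum(1 for el in tree.iter() if el.tag.rsplit("}", 1)[-1] == "TextLine")


if __name__ == "__main__":
    # "kat57" is a hypothetical local folder holding the unpacked dataset.
    pairs = pair_pages("kat57")
    total_lines = sum(count_text_lines(xml) for _, xml in pairs)
    print(f"{len(pairs)} image/XML pairs, {total_lines} text lines")
```

If the number of pairs differs noticeably from the `volume` count in the YAML, that would point to unmatched files or a different naming scheme than the one assumed here.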