Corpus for Basque-Spanish speech translation

Vicomtech/mintzai-ST

mintzai-ST: A Parallel Corpus for Basque-Spanish Speech Translation

This repository contains the mintzai-ST corpus.

Table of Contents

  1. Description
  2. Citation
  3. License
  4. Contact

Description

The mintzai-ST corpus is a Basque-Spanish parallel corpus for speech translation, built from the proceedings of the parliamentary sessions of the Basque Government held between 2011 and 2018.

The corpus consists of audio files, transcriptions and translations, which may be used to train end-to-end or cascaded speech translation systems for Basque-Spanish in both directions.

The corpus can be downloaded via the following link: https://datasets.vicomtech.org/v2-mintzai-st/mintzai-st-corpus_v1.0.tar.gz

Please note that the file is 25 GB, so the download may take some time.
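
Because the archive is large, it helps to stream it to disk rather than load it into memory. The following is a minimal sketch of one way to do that with the Python standard library. Only the URL comes from this README; the function names `download_corpus` and `extract_corpus`, the chunk size, and the assumption that the file is a plain gzip-compressed tar archive are illustrative and not part of the corpus distribution.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# URL as given in this README; all other names below are hypothetical.
CORPUS_URL = (
    "https://datasets.vicomtech.org/v2-mintzai-st/mintzai-st-corpus_v1.0.tar.gz"
)


def download_corpus(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream `url` to `dest` in fixed-size chunks so the archive is never held in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest


def extract_corpus(archive: Path, out_dir: Path) -> list[str]:
    """Unpack a .tar.gz archive into `out_dir` and return the member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where the running Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(out_dir, **kwargs)
    return names
```

Under these assumptions, calling `download_corpus(CORPUS_URL, Path("mintzai-st-corpus_v1.0.tar.gz"))` and then `extract_corpus` on the returned path would fetch and unpack the corpus. Make sure enough free disk space is available for both the archive and its extracted contents.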

Citation

If you use any part of the corpus in your own work, please cite the following paper:

@inproceedings{etchegoyhen-et-al2021mintzai-st,
  title={mintzai-ST: Corpus and Baselines for Basque-Spanish Speech Translation},
  author={Etchegoyhen, Thierry and Arzelus, Haritz and Gete Ugarte, Harritxu and
  Alvarez, Aitor and González-Docasal, Ander and Benites Fernandez, Edson},
  booktitle={Proceedings of IberSPEECH2020},
  location={Valladolid, Spain},
  year={2021},
  pages={TBD}
}

License

The mintzai-ST corpus is protected by copyright owned by Vicomtech:

Copyright (c) 2020 FUNDACION CENTRO DE TECNOLOGIAS DE INTERACCION VISUAL Y COMUNICACIONES VICOMTECH

The mintzai-ST corpus is distributed under the Creative Commons BY-NC-ND 4.0 license.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Contact

If you have any questions or suggestions, do not hesitate to contact us at the following addresses:

  • Thierry Etchegoyhen: tetchegoyhen [AT] vicomtech [DOT] org
  • Aitor Alvarez: aalvarez [AT] vicomtech [DOT] org
