Support for Other Languages #32

ykhatami · 2021-03-05T07:53:34Z

The LDC has the Web 1T 5-gram 10 European Languages published at https://catalog.ldc.upenn.edu/LDC2009T25

Is there any plan to support these languages? If not, can I jump in and contribute? Would it be enough to parse the above data and get the unigram/bigram counts?
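For reference, here is a minimal sketch of what that parsing step might look like. It assumes each input line has the form `ngram<TAB>count` with words inside the n-gram separated by single spaces, and that counts should be merged case-insensitively; both are assumptions to check against the corpus documentation. The names `count_ngrams` and `write_counts` are hypothetical helpers, not part of any existing library.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_ngrams(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Aggregate unigram and bigram counts from 'ngram<TAB>count' lines.

    Assumption: words in an n-gram are space-separated and the count
    follows the last tab on the line.
    """
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ngram, _, count = line.rpartition("\t")
        if not ngram or not count.isdigit():
            continue  # skip lines that do not match the assumed layout
        # Assumption: merge case variants by lowercasing.
        words = ngram.lower().split(" ")
        if len(words) == 1:
            unigrams[words[0]] += int(count)
        elif len(words) == 2:
            bigrams[" ".join(words)] += int(count)
        # Longer n-grams are ignored in this sketch.
    return unigrams, bigrams


def write_counts(counts: Counter, path: str) -> None:
    """Write 'ngram<TAB>count' lines, most frequent first."""
    with open(path, "w", encoding="utf-8") as handle:
        for ngram, count in counts.most_common():
            handle.write(f"{ngram}\t{count}\n")
```

Something like `count_ngrams(open(path, encoding="utf-8"))` could be run per corpus file and the resulting counters summed before writing them out. Whether the output layout matches what this library loads would still need to be confirmed.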

grantjenks · 2021-03-05T15:24:36Z

No, I don’t have plans to ship those corpora at this time. The linked datasets do not appear to be redistributable for free. Under “View Fees”, the cost is $150 for non-members.

willwade · 2024-01-18T00:06:45Z

Not sure if this is of any use, but this may be handy for this task: https://github.com/Poio-NLP/poio-corpus (they used it to build a prediction engine, pressagio).
