Is there any plan to support these languages? If not, can I jump in and contribute? Would it be enough to parse the above data and get the unigram/bigram counts?
No, I don’t have plans to ship those corpuses at this time. The linked datasets do not appear to be redistributable for free. Under “View Fees”, the cost is $150 for non-members.
Not sure if this is of any use, but this may be handy for this task: https://github.com/Poio-NLP/poio-corpus (they used it to build a prediction engine, pressagio).
The LDC has the Web 1T 5-gram 10 European Languages published at https://catalog.ldc.upenn.edu/LDC2009T25
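On the original question of whether parsing the data into unigram/bigram counts would be enough: here is a rough sketch of what that parsing step might look like. It assumes each input line has the form `ngram<TAB>count` with tokens separated by single spaces. That layout is an assumption, not something confirmed in this thread for either dataset, and `count_ngrams` is a hypothetical helper name, not part of this project.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_ngrams(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Aggregate unigram and bigram counts from 'ngram<TAB>count' lines.

    Assumed format (not verified against the actual corpus files):
    tokens separated by single spaces, then a tab, then an integer count.
    """
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the n-gram text stays intact.
        ngram, _, count = line.rpartition("\t")
        if not ngram or not count.isdigit():
            continue  # skip lines that do not match the assumed format
        tokens = ngram.split(" ")
        if len(tokens) == 1:
            unigrams[tokens[0]] += int(count)
        elif len(tokens) == 2:
            bigrams[(tokens[0], tokens[1])] += int(count)
        # Longer n-grams are ignored in this sketch.
    return unigrams, bigrams


# Made-up sample lines, for illustration only.
sample = ["der\t120", "die\t95", "der Hund\t7", "die Katze\t5", "der\t3"]
unigrams, bigrams = count_ngrams(sample)
```

Repeated entries for the same n-gram are summed, so the same function can be fed several files in sequence. Whether the resulting counts could be shipped is a separate matter, given the licensing concern raised above.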