NLLU Datasets

Paracrawl English 15M

15 million English sentences randomly sampled from Paracrawl, machine-translated with NLLB 3.3B, and then filtered, which leaves roughly 14 million sentences per target language.
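This page does not say how the random sample was drawn. As a rough illustration only, here is a minimal sketch of one common way to take a uniform random sample from a corpus too large to hold in memory: single-pass reservoir sampling. The function name `reservoir_sample`, its parameters, and the toy corpus are all hypothetical and are not part of the dataset tooling.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k items uniformly at random from an iterable of unknown length.

    Hypothetical helper for illustration; not the method used to build this dataset.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = line
    return reservoir


if __name__ == "__main__":
    # Toy stand-in for a large line-oriented corpus file.
    corpus = (f"sentence {n}" for n in range(1000))
    sample = reservoir_sample(corpus, k=5, seed=0)
    print(len(sample))
```

In practice the iterable could be an open text file with one sentence per line, so the full corpus never needs to be loaded at once.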

Target Language    Sentences    Link
Italian            ~14M         Download
Dutch              ~14M         Download

Want more languages added to this list? Get in touch.

License

The source text comes from Paracrawl (https://paracrawl.eu/).
We do not own any of the source text from which this data has been translated.
We license the translated text and packaging of this parallel data under the Creative Commons Attribution 4.0 International license (CC BY 4.0). Please cite "LibreTranslate" if you use the translated data.

Last Updated

September 2023