---
title: README
emoji: 📊
colorFrom: green
colorTo: red
sdk: static
pinned: false
license: afl-3.0
---

A list of useful Hugging Face resources and fantastic contributors for Dutch NLP.

## Individuals

* [Pieter Delobelle](https://huggingface.co/pdelobelle), [homepage](https://pieter.ai/) and [git](https://github.com/ipieter)
* [Bram Vanroy](https://huggingface.co/BramVanroy) and [homepage](https://bramvanroy.github.io/)
* [Robin Smits](https://huggingface.co/robinsmits) and [git](https://github.com/robinsmits)
* [Janneke van der Zwaan](https://huggingface.co/jvdzwaan/ocrpostcorrection-task-1) and [git](https://github.com/jvdzwaan)
* [Yeb Havinga](https://huggingface.co/yhavinga) and [git](https://github.com/yhavinga)
* [Wietse de Vries](https://huggingface.co/wietsedv) and [git](https://github.com/wietsedv)
* [François Remy](https://huggingface.co/FremyCompany), [homepage](http://fremycompany.com) and [git](https://github.com/FremyCompany)
* [Maarten Grootendorst](https://huggingface.co/MaartenGr), [homepage](https://www.maartengrootendorst.com/) and [git](https://github.com/MaartenGr)
* [Piek Vossen](https://vossen.info/) and [git](https://github.com/piekvossen)
* [Eva Rombouts](https://huggingface.co/ekrombouts) and [git](https://github.com/ekrombouts)
* [Joeran Bosma](https://huggingface.co/joeranbosma/) and [git](https://github.com/joeranbosma)

## Organisations

* [University Medical Center Utrecht](https://github.com/umcu)
* [NLPtown](https://huggingface.co/nlptown) and [homepage](http://nlp.town/)
* [doc2query](https://huggingface.co/doc2query)
* [LT3, Language and Translation Technology Team, Ghent University](https://huggingface.co/LT3) and [homepage](https://lt3.ugent.be/)
* [Textgain](https://huggingface.co/textgain) and [homepage](https://www.textgain.com/)
* [ML6](https://huggingface.co/ml6team), [homepage](https://ml6.eu/) and [git](https://github.com/ml6team)
* [CLiPS](https://huggingface.co/clips), [homepage](https://www.uantwerpen.be/en/research-groups/clips/) and [git](https://github.com/clips)
* [DTAI Research Group, KU Leuven](https://huggingface.co/DTAI-KULeuven), [homepage](https://dtai.cs.kuleuven.be/) and [git](https://github.com/ML-KULeuven)
* [GroNLP](https://huggingface.co/GroNLP) and [homepage](https://www.rug.nl/research/clcg/research/cl/)
* [CLTL](https://huggingface.co/CLTL), [homepage](http://cltl.nl) and [git](https://github.com/CLTL)
* [Netherlands Forensic Institute](https://huggingface.co/NetherlandsForensicInstitute), [homepage](https://forensicinstitute.nl/) and [git](https://github.com/NetherlandsForensicInstitute)
* [Integraal Kankercentrum Nederland (IKNL)](https://github.com/iknl)
* [Erasmus Medical Informatics](https://github.com/mi-erasmusmc)

## NLP libraries relevant for (Dutch) clinical NLP

* [clinlp](https://github.com/umcu/clinlp)

## Encoder models

* [*RobBERT 2023*](https://huggingface.co/DTAI-KULeuven/robbert-2023-dutch-base)
* [*BERTje*](https://huggingface.co/GroNLP/bert-base-dutch-cased)
* [*belabBERT*](https://huggingface.co/jwouts/belabBERT_115k)
* [**MedRoBERTa.nl**](https://huggingface.co/CLTL/MedRoBERTa.nl)
* [**CardioBERTa.nl**](https://huggingface.co/UMCU/CardioBERTa.nl_clinical)
* [**CardioDeBERTa.nl**](https://huggingface.co/UMCU/CardioDeBERTa.nl)
* [**DRAGON-longformer-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-large-domain-specific)
* [**DRAGON-longformer-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-base-domain-specific)
* [**DRAGON-roberta-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-large-domain-specific)
* [**DRAGON-roberta-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-base-domain-specific)
* [**DRAGON-bert-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-bert-base-domain-specific)

## Contrastive encoder models

* [BioLORD-2023-M Dutch](https://huggingface.co/FremyCompany/BioLORD-2023-M-Dutch-InContext-v1)

## Decoder models

* [*GPT-2 on mC4*](https://huggingface.co/yhavinga/gpt2-large-dutch), [GPT-2 fine-tuned on Dutch](https://huggingface.co/GroNLP/gpt2-medium-dutch-embeddings)
* [*GPT-Neo on mC4*](https://huggingface.co/yhavinga/gpt-neo-1.3B-dutch)
* [*GEITje (based on Mistral)*](https://github.com/Rijgersberg/GEITje)
* [*Fietje (based on Phi-2)*](https://huggingface.co/BramVanroy/fietje-2), [**Zuster_fietje**](https://huggingface.co/ekrombouts/zuster_fietje)
* [**J1**](https://huggingface.co/Juvoly/J1-Llama-8B-exp)

## NMT models

* [NLLB-200](https://huggingface.co/facebook/nllb-200-3.3B)
* [UL2, en-nl](https://huggingface.co/yhavinga/ul2-large-en-nl), [UL2, nl-en](https://huggingface.co/yhavinga/ul2-large-dutch-english)
* [OPUS MT, en-nl](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl), [OPUS MT, nl-en](https://huggingface.co/Helsinki-NLP/opus-mt-nl-en), [OPUS MT Healthcare, nl-en](https://huggingface.co/FremyCompany/opus-mt-nl-en-healthcare)
* [Llama 2 MT, nl-en](https://huggingface.co/kaitchup/Llama-2-7b-mt-Dutch-to-English)

## Datasets

* [SoNaR](https://taalmaterialen.ivdnt.org/download/tstc-sonar-corpus/)
* [COW](https://rolandschaefer.net/archives/142)
* [mC4 cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* [TWnC](https://research.utwente.nl/en/publications/twnc-a-multifaceted-dutch-news-corpus)
* [Gigacorpus](http://gigacorpus.nl/)
* [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX)
* [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
* [FineWeb 2](https://github.com/huggingface/fineweb-2)