Existing language models are either English-centric or multilingual, with reasonable performance in the larger languages. However, many languages are either not supported at all or perform far worse than English prompts. To address this, we are creating 23 new language models covering all EU languages.
Tokens are what make language models understand language: each sentence is split into tokens, which are then converted to embeddings. So we want good tokens that cover as much of a language's vocabulary as possible, within our limited vocabulary size. Luckily there are many libraries for this, like SentencePiece. However, the configuration is not trivial if you want decent results on noisy data.
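To make the idea concrete, here is a toy sketch (not SentencePiece, and the subword vocabulary is invented for illustration): a small fixed vocabulary plus greedy longest-match tokenization, so a limited set of subwords can still cover many words.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, hypothetical
# vocabulary. Real libraries like SentencePiece learn the vocabulary from
# data; this sketch only illustrates why subwords give good coverage.
VOCAB = ["token", "iza", "tion", "lang", "uage", "s", "<unk>"]

def tokenize(word: str) -> list[str]:
    """Greedily match the longest known subword at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:  # nothing matched: emit <unk> for one character
            tokens.append("<unk>")
            i += 1
    return tokens

print(tokenize("tokenization"))  # ['token', 'iza', 'tion']
print(tokenize("languages"))     # ['lang', 'uage', 's']

# Each token maps to an id, which indexes into the embedding matrix:
ids = [VOCAB.index(t) for t in tokenize("tokenization")]
print(ids)  # [0, 1, 2]
```

Note how "tokenization" never has to appear in the vocabulary itself; three shared subwords cover it, which is exactly the trade-off a learned tokenizer optimizes.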
A lot of NLP technology is easy for beginners to use, but creating and deploying a chatbot is still a bit tricky. Let's build a Python CLI toolkit to quickly create a chatbot with a web-based user interface.
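A minimal sketch of what such a toolkit could look like, using only the standard library (all names here are hypothetical, and the bot logic is a placeholder): argparse provides the CLI, and http.server serves a tiny JSON chat endpoint for the web UI.

```python
# Hypothetical minimal chatbot CLI: argparse for the command line,
# http.server for the web-facing endpoint. A real toolkit would plug an
# NLP model into reply() and serve an actual HTML front end.
import argparse
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def reply(message: str) -> str:
    """Placeholder chatbot logic."""
    return f"You said: {message}"

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read a JSON body like {"message": "..."} and answer in kind.
        length = int(self.headers.get("Content-Length", 0))
        message = json.loads(self.rfile.read(length))["message"]
        body = json.dumps({"reply": reply(message)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def main():
    parser = argparse.ArgumentParser(prog="chatbot")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()
    HTTPServer(("localhost", args.port), ChatHandler).serve_forever()

# A console entry point would call main(); e.g. `chatbot --port 8000`
# then serves POST /chat-style requests until interrupted.
```

The design choice worth noting: keeping `reply()` as a plain function makes it trivial to swap in a transformer-based responder later without touching the server code.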
With the current state of transformer models for text and speech, I believe there is an opportunity to build fully immersive language learning apps that tailor their content to what the user wants to learn. In this series, I work towards a demo using different NLP technologies.
The AdamW implementation in HuggingFace Transformers is deprecated and can even lead to errors. This short blog post suggests a drop-in replacement.
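The usual replacement is PyTorch's own `torch.optim.AdamW`. A sketch of the swap (the `Linear` model here is just a stand-in for a real transformer); note that the two implementations have different default hyperparameters, so it is safest to pass `lr` and `weight_decay` explicitly rather than rely on defaults:

```python
# Instead of `from transformers import AdamW` (deprecated), use the
# PyTorch optimizer. The tiny Linear model stands in for a real network.
import torch
from torch.optim import AdamW  # drop-in replacement

model = torch.nn.Linear(4, 2)
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# A training step works exactly as before:
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```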
In this second blog post on updating RobBERT, I discuss the training and analyse how well the model performs on old benchmarks and new tasks.
A few things have happened since our Dutch language model RobBERT was trained in 2019. In this blog post, I explore how to efficiently update RobBERT to include new words and shifting word usage.