Get an overview of product news, blog posts, and events here.

Language Identification and Language Chunking

Identifying the language of a given text is a crucial preprocessing step for almost all text analysis methods. It has been considered a solved problem for more than 20 years. Available solutions build on the simple observation that every language has typical letter sequences (letter n-grams) that occur significantly more frequently in that language than in others.
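
The n-gram observation above can be illustrated with a minimal sketch. It is not the method of any particular product or library: the function names (`letter_ngrams`, `train_profile`, `identify_language`), the toy sample sentences, the trigram size, and the smoothing floor are all assumptions chosen for illustration. Real systems train on far larger corpora.

```python
import math
from collections import Counter


def letter_ngrams(text, n=3):
    """Return all overlapping letter n-grams of the text (lowercased, space-padded)."""
    padded = f" {' '.join(text.lower().split())} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profile(sample, n=3):
    """Map each n-gram in the sample to its relative frequency."""
    counts = Counter(letter_ngrams(sample, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def identify_language(text, profiles, n=3, floor=1e-6):
    """Pick the language whose profile gives the text the highest log-likelihood.

    N-grams missing from a profile get the small probability `floor`
    (an assumed smoothing constant) instead of zero.
    """
    grams = letter_ngrams(text, n)
    scores = {
        lang: sum(math.log(profile.get(gram, floor)) for gram in grams)
        for lang, profile in profiles.items()
    }
    return max(scores, key=scores.get)


# Toy training data (hypothetical; far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then "
          "the children went to school with their friends",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann "
          "gingen die kinder mit ihren freunden zur schule",
}
PROFILES = {lang: train_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(identify_language("the children and their friends", PROFILES))
    print(identify_language("die kinder und ihre freunde", PROFILES))
```

Scoring by summed log-probabilities is only one of several possible choices; rank-based profile comparison is another common variant of the same idea.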

The difference between stemming and lemmatization

"Stemming" as well as "Lemmatization" are commonly used buzzwords in the field of Information Retrieval (IR), particularly in the development of powerful search engines. [...]

So what exactly is the difference between these two methods? What are the advantages and disadvantages and which one should be preferred? [...]

Approximative data structures for natural language processing

Some say software developers draw their motivation from minimizing or maximizing numbers in any given problem. That's a smug innuendo. In my experience, developers are always on the lookout for beautiful solutions, of which numbers are but a symptom. The use of approximative data structures for language processing is one such example of a beautiful idea with nice numbers.

Press Area

Go here for our press releases and information for journalists.