A corpus is a collection of texts drawn from written or spoken language. Generally, these texts are assembled according to predefined criteria to serve a particular purpose. Building a corpus is a hard, tedious and time-consuming task, and the data must be processed and served carefully.
The TS Corpus project started with the idea of “building an online available, part-of-speech tagged Turkish Corpus”, which didn’t exist at the time. To do this, we focused on existing NLP tools and enhanced some of them. Where necessary, we developed our own scripts and tools.
In 2011, we published our very first corpus, which was the first Turkish corpus available online with part-of-speech and morphological tagging.
Since then, we have released 10 different corpora under the project, each with different aims and functionalities.
Today, TS Corpus is a well-known project used by scientists and researchers in studies all around the world.