A corpus is a collection of texts from written or spoken language. Generally, these texts are put together according to predefined criteria to fit intended aims. Building a corpus is a hard, tedious and time-consuming task. The data must be processed and served carefully.

The TS Corpus project started with the idea of “building an online available, part-of-speech tagged Turkish Corpus”, which didn’t exist at the time. To do this, we relied on existing NLP tools and enhanced some of them. Where necessary, we developed our own scripts or tools.

In 2011, we published our very first corpus, which was the first Turkish corpus available online with part-of-speech and morphological tagging.
Since then, we have released 10 different corpora under our project, each with different aims and functionalities.
Today, TS Corpus is a well-known project used by scientists and researchers in scientific studies all around the world.

If you’re not familiar with corpora and CQP queries, please visit our documentation pages for query tips.
You can also find quick answers to frequently asked questions on the FAQ pages.