A corpus is a collection of texts from written or spoken language. Generally, these texts are put together according to predefined criteria to fit intended aims. Building a corpus is a hard, tedious, and time-consuming task, and the data must be processed and served carefully.
The TS Corpus project started with the idea of “building an online available, part-of-speech tagged Turkish Corpus”, which didn’t exist at the time. To do this, we focused on existing NLP tools, enhanced some of them, and, where necessary, developed our own scripts or tools.
In 2011, we published our very first corpus, which was the first Turkish corpus available online with part-of-speech and morphological tagging.
Since then, we have released 10 different corpora under our project, each with different aims and functionalities.
Today, TS Corpus is a well-known project used by scientists and researchers in scientific studies all around the world.
Corpora Released by TS Corpus Project
TS Corpus v2
TS TimeLine Corpus
TS Wikipedia Corpus
TweetS Corpus
Columns Corpus
Abstract Corpus
Syllable Corpus
TS Gezi Corpus
Constitution Corpus
Idioms&Proverbs Corpus
If you have registered with TS Corpus, you can log in.
If you haven’t registered, you can sign up now.
If you’re not familiar with corpora and CQP queries, please visit our documentation pages for query tips.
You may also find quick answers to frequently asked questions on our FAQ pages.