What is the TS Corpus Project?

TS Corpus is a free and independent project that aims to build Turkish corpora, develop Natural Language Processing (NLP) tools and compile linguistic datasets. The project started in 2011, and in March 2012 the first corpus, TS Corpus Version 1, was published. In August 2012 the updated TS Corpus Version 2 was released. This was the first part-of-speech tagged Turkish corpus to be made available online.

Since then, many other corpora, NLP tools and linguistic datasets have been published. Please check the relevant pages for further information.

The project is free for academic study and research. All corpora and NLP tools published by the project are provided without any usage limitations. Users are free to run queries, save queries and download the hit sets to their computers. Together, the 14 published corpora offer a dataset of over 1.3 billion tokens derived from various sources: online newspapers, forums, social media, academic papers, etc.

TS Corpus is a growing project. We strongly believe that information and data should be shared freely. Therefore, the TS Corpus Project is built upon free software.