TS Corpus
A free and independent project that aims to build Turkish corpora, lexical tools, and linguistic datasets.
A central hub for Turkish corpora and linguistic research

The TS Corpus Project is a free and independent initiative dedicated to building Turkish language corpora,
developing natural language processing (NLP) tools, and compiling linguistic datasets.
The project began in 2011, and the first corpus was released in March 2012.
This was a significant milestone: it was the first publicly available online Turkish corpus with part-of-speech tagging.
Since then, the project has continued to grow, releasing new corpora, tools, and datasets.
Today, TS Corpus includes over 25 corpora comprising more than
1.8 billion tokens, sourced from a wide variety of domains such as
online newspapers, news, forums, social media, textbooks, and academic texts.
All resources are provided openly, without restrictions, for academic study and research.
Users are free to run queries, save their results, and download datasets for their own analyses.
At its core, TS Corpus is guided by the belief that linguistic resources and knowledge should be shared freely.
For this reason, the project is built on free software and continues to expand its contributions
to Turkish computational linguistics and language technology.
Corpora
Access diverse Turkish corpora across multiple genres, designed for linguistic research, computational analysis, and academic study.
LexiTR
LexiTR is a specialized platform offering advanced lexical tools built on large-scale Turkish corpora, designed to support linguistic research and analysis.
TS Tools
Explore tokenizers, frequency analyzers, and other practical NLP tools created specifically for processing and studying Turkish.