TS Abstract Corpus

This corpus samples academic writing from a range of disciplines. The data is organized into two major domains (social and physical sciences) and six genres: humanities & arts, medicine, natural sciences, politics & law & education, social sciences, and technology & engineering. The text-type classification also covers the 32 scientific disciplines from which the data is drawn.

TS Abstract Corpus is especially useful as a source for text genre classification studies. A frequency list for each discipline can be downloaded via this link.

The source data of this corpus was obtained from the dataset compiled for the Turkish Labeled Text Corpus by Öztürk et al.

Million Tokens
