TS Wikipedia Corpus

TS Wikipedia Corpus is compiled from the July 2013 dump of Turkish Wikipedia pages. The corpus includes 215,068 Wikipedia entries. Wikipedia is a useful source for building a general-purpose corpus because its texts cover a wide range of subjects. The corpus contains 1,779,228 word types.

The source data was first preprocessed to eliminate auto-generated empty entries. Then external URLs, images, tables and other non-text content were deleted.
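The two preprocessing steps described above can be illustrated with a minimal sketch. The exact rules used for the corpus are not documented here, so everything below is an assumption: the regular expressions, the function names `clean_entry` and `preprocess`, and the choice of `Dosya`/`Resim` as the Turkish file namespaces are illustrative only. The sketch also removes content first and drops empty entries afterwards, which is a simplification of the order given above.

```python
import re

# Hypothetical patterns for common wikitext constructs (not the corpus authors' rules).
TABLE_RE = re.compile(r"\{\|.*?\|\}", re.DOTALL)  # {| ... |} wiki tables
FILE_RE = re.compile(
    r"\[\[(?:File|Image|Dosya|Resim):[^\[\]]*\]\]", re.IGNORECASE
)  # simple image/file links without nested links in the caption
EXT_LINK_RE = re.compile(r"\[https?://[^\s\]]+(?:\s+([^\]]*))?\]")  # [url label]
BARE_URL_RE = re.compile(r"https?://\S+")  # URLs outside brackets


def clean_entry(wikitext: str) -> str:
    """Strip tables, images and external URLs from one entry."""
    text = TABLE_RE.sub(" ", wikitext)
    text = FILE_RE.sub(" ", text)
    # Keep the visible label of a bracketed external link, drop the URL itself.
    text = EXT_LINK_RE.sub(lambda m: m.group(1) or "", text)
    text = BARE_URL_RE.sub(" ", text)
    # Collapse whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def preprocess(entries):
    """Clean every entry and discard those that end up empty."""
    cleaned = (clean_entry(e) for e in entries)
    return [c for c in cleaned if c]
```

A real Wikipedia dump contains many more constructs (templates, references, nested captions) that this sketch does not handle; a dedicated wikitext parser would be a more robust choice for anything beyond illustration.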

Like other corpora, TS Wikipedia Corpus has part-of-speech tagging and morphological annotation.

