TweetS Corpus
Over the last decade, social media has become part of our daily lives. It captures authentic language produced by a wide range of people, and this diversity makes social media a very valuable linguistic resource. Moreover, unlike many other sources such as newspapers, magazines or books, social media content does not pass through an editorial process.
TweetS Corpus uses a unique part-of-speech tag set for Turkish that includes YY (misspelling), intAbbr (Internet abbreviations), Emoticons (smileys), intEmphasis (Internet emphasis) and intSlang (Internet slang). A list of Internet slang terms harvested from TweetS Corpus can be found at this link.
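As a rough illustration of how the Twitter-specific tags could be used, the sketch below counts how often each of them occurs in tagged text. It assumes a hypothetical `token/TAG` line format (the actual TweetS file format is not described here), and the function names `parse_tagged` and `count_internet_tags`, as well as the non-Internet tag `Adj` in the usage example, are illustrative only.

```python
from collections import Counter
from typing import Iterable

# The Internet-specific tags named in the TweetS tag set description.
INTERNET_TAGS = {
    "YY": "misspelling",
    "intAbbr": "Internet abbreviation",
    "Emoticons": "smiley",
    "intEmphasis": "Internet emphasis",
    "intSlang": "Internet slang",
}


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a line in the ASSUMED 'token/TAG token/TAG ...' format into pairs.

    rpartition splits on the last '/', so tokens that themselves contain
    slashes (e.g. the emoticon ':/') keep their own characters intact.
    """
    pairs = []
    for item in line.split():
        token, sep, tag = item.rpartition("/")
        if not sep:
            raise ValueError(f"missing tag in {item!r}")
        pairs.append((token, tag))
    return pairs


def count_internet_tags(lines: Iterable[str]) -> Counter:
    """Count occurrences of the Internet-specific tags, ignoring all others."""
    counts: Counter = Counter()
    for line in lines:
        for _token, tag in parse_tagged(line):
            if tag in INTERNET_TAGS:
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical tagged tweet: 'slm' and 'nbr' are common Turkish
    # abbreviations of 'selam' and 'ne haber'; 'Adj' stands in for a
    # regular part-of-speech tag.
    sample = ["slm/intAbbr nbr/intAbbr :)/Emoticons çoook/intEmphasis güzel/Adj"]
    print(count_internet_tags(sample))
```

Keeping the Internet tags in a dictionary with short descriptions makes it easy to report results in readable form while leaving every other tag in the data untouched.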
Million Tokens
1 Million TweetS