NLP studies are closely tied to the properties and requirements of the target language, so each piece of software or script should be designed and coded to fit that language. Below, you’ll find a set of online NLP tools, all implemented to fit the features of Turkish.
In natural language processing, texts must first be split into tokens before further processing such as part-of-speech tagging or morphological parsing. TS Tokenizer is an enhanced tokenizer for Turkish.
Sentence segmentation is an important task for NLP studies. Our “sentencer” script is based on the Python NLTK library. It processes the given text and generates XML output with incremental IDs for the sentences and the tokens they contain.
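The kind of output described above can be sketched roughly as follows. This is only an illustration, not the actual “sentencer” script: to keep it self-contained, a naive regular-expression splitter stands in for NLTK, and the element names (`text`, `s`, `t`), the function names, and the choice to number tokens continuously across the whole text are all assumptions.

```python
import re
import xml.etree.ElementTree as ET


def split_sentences(text):
    # Naive stand-in for an NLTK sentence tokenizer:
    # break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Runs of word characters (Unicode-aware, so Turkish letters are kept)
    # or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def to_xml(text):
    # Hypothetical layout: <text><s id="1"><t id="1">...</t></s></text>
    root = ET.Element("text")
    token_id = 0
    for sent_id, sentence in enumerate(split_sentences(text), start=1):
        s = ET.SubElement(root, "s", id=str(sent_id))
        for token in tokenize(sentence):
            token_id += 1  # assumed: token IDs keep increasing across sentences
            t = ET.SubElement(s, "t", id=str(token_id))
            t.text = token
    return ET.tostring(root, encoding="unicode")


xml_output = to_xml("Bugün hava güzel. Yarın yağmur yağacak mı?")
print(xml_output)
```

A real sentence splitter also has to cope with abbreviations, numbers, and quotations, which this regular expression does not attempt.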
TS Syllable Tagger takes advantage of TRMorph for hyphenation and then attaches a tag to the given word. The tags used are simply V, CV, VC, VCC, CVC and CVCC, where V refers to a vowel and C refers to a consonant.
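The tagging step can be illustrated with a minimal sketch. It assumes the word has already been hyphenated (the TRMorph step is not reproduced here) and that syllables are separated by a hyphen; the function names and the separator are hypothetical, not part of TS Syllable Tagger.

```python
# Turkish vowels in both cases, plus the circumflexed forms.
# Upper-case letters are listed explicitly because str.lower() does not
# follow Turkish casing rules for dotted and dotless i.
VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def tag_syllable(syllable):
    # Map each letter to V (vowel) or C (consonant); skip non-letters.
    return "".join("V" if ch in VOWELS else "C" for ch in syllable if ch.isalpha())


def tag_hyphenated(word, sep="-"):
    # Input is an already hyphenated word such as "Türk-çe" (assumed format).
    return [(syl, tag_syllable(syl)) for syl in word.split(sep) if syl]


for word in ["Türk-çe", "o-kul", "alt", "ev"]:
    print(word, tag_hyphenated(word))
```

For native Turkish syllables the resulting patterns fall within the six tags listed above; loanwords with initial consonant clusters would produce patterns outside that set, which this sketch simply returns unchanged.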