Sentence Boundary Disambiguation for Indonesian Language
Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2017)
Sentence boundary detection is essential for natural language processing (NLP). In Indonesian, it faces several sources of ambiguity, including punctuation, abbreviations, and characters inside brackets. These ambiguous cases must be resolved correctly so that a sentence boundary detection system can divide text into sentences accurately. This study presents the development of a training dataset for an existing model in order to optimize supervised sentence boundary detection for Indonesian. An Indonesian Translation of the Quran (ITQ) dataset was used with a supervised method. The process, in brief, is to create the training data, apply sentence detection to separate the sentences in ITQ, and calculate precision, recall, and F-measure. The results are promising: a precision of 91.7%, a recall of 81.6%, and an F-measure of 86.4%.
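The evaluation step can be illustrated with a minimal sketch. It assumes that boundaries are compared as sets of character offsets and that the F-measure is the balanced harmonic mean (F1); the abstract states neither, and the function names `f_measure` and `boundary_scores` are hypothetical, not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boundary_scores(predicted: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Score predicted sentence-boundary offsets against gold offsets.

    Assumption: a boundary is identified by its character offset, and a
    prediction counts as correct only if it matches a gold offset exactly.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Sanity check on the reported figures: precision 91.7% and recall 81.6%.
print(f"{f_measure(0.917, 0.816):.1%}")  # prints 86.4%
```

Under the F1 assumption, the reported precision and recall reproduce the reported F-measure of 86.4%.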