Syopiansyah Jaya Putra, Muhamad Nur Gunawan, Ismail Khalil, Teddy Mantoro,
"Sentence Boundary Disambiguation for Indonesian Language"
in: Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2017), 2017
Original title:
Sentence Boundary Disambiguation for Indonesian Language
Language of the title:
English
Original book title:
Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2017)
Original abstract:
Sentence boundary detection is essential for natural language processing (NLP). In the Indonesian language it faces several problems, including ambiguous punctuation, abbreviations, and characters inside brackets. These ambiguous cases must be resolved correctly as sentence boundaries or non-boundaries so that the system can divide the text into sentences accurately. This study presents the development of a training dataset for an existing model in order to optimize supervised sentence boundary detection for the Indonesian language. The Indonesian Translation of the Quran (ITQ) dataset was used with the supervised method. In brief, the process is: create the training data, apply sentence detection to separate the sentences in ITQ, and calculate precision, recall, and F-measure. The results are promising: precision of 91.7%, recall of 81.6%, and an F-measure of 86.4%.
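The evaluation step named in the abstract (precision, recall, and F-measure over detected sentence boundaries) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `f_measure` and `boundary_scores`, the choice to represent a boundary as a character offset, and the example offsets are all assumptions made for illustration. The F-measure here is the balanced harmonic mean of precision and recall (F1), which is consistent with the figures reported in the abstract.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boundary_scores(predicted, gold):
    """Score predicted sentence boundaries against a gold standard.

    Both arguments are iterables of boundary positions (assumed here to be
    character offsets). A predicted boundary counts as correct only if the
    same position appears in the gold standard.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical example: the detector finds 3 of 4 gold boundaries
# and proposes no spurious ones.
p, r, f = boundary_scores(predicted=[10, 25, 40], gold=[10, 25, 40, 55])

# Consistency check on the reported figures (precision 91.7%, recall 81.6%).
reported_f = round(f_measure(0.917, 0.816) * 100, 1)
```

Under these assumptions, `reported_f` evaluates to 86.4, matching the F-measure stated in the abstract.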