Abstract

Aim: Presentation of a specialist text segmentation technique. The text was derived from reports (a form “Information about the event”, field “Information about the event - descriptive data”) prepared by rescue units of the State Fire Service after firefighting and rescue operations.

Methodology: In order to perform the task the author has proposed a method of designing the knowledge base and rules for a text segmentation tool. The proposed method is based on formal concept analysis (FCA). The knowledge base and rules designed by the proposed method allow performing the segmentation process of the available documentation. The correctness and effectiveness of the proposed method was verified by comparing its results with the other two solutions used for text segmentation.

Results: During the research and analysis rules and abbreviations that were present in the studied specialist texts were grouped and described. Thanks to the formal concepts analysis a hierarchy of detected rules and abbreviations was created. The extracted hierarchy constituted both a knowledge and rules base of tools for segmentation of the text. Numerical and comparative experiments on the author's solution with two other methods showed significantly better performance of the former. For example, the F-measure results obtained from the proposed method are 95.5% and are 7-8% better than the other two solutions.

Conclusions: The proposed method of design knowledge and rules base text segmentation tool enables the design and implementation of software with a small error divide the text into segments. The basic rule to detect the end of a sentence by the interpretation of the dots and additional characters as the end of the segment, in fact, especially in case of specialist texts, must be packaged with additional rules. These actions will significantly improve the quality of segmentation and reduce the error. For the construction and representation of such rules is suitable presented in the article, the formal concepts analysis. Knowledge engineering and additional experiments can enrich the created hierarchy by the new rules. The newly inserted knowledge can be easily applied to the currently established hierarchy thereby contributing to improving the segmentation of the text. Moreover, within the numerical experiment is made unique: a set of rules and abbreviations used in reports and set properly separated and labeled segments.

Keywords: formal concept analysis, FCA, project of knowledge database, segment extraction, text processing

Type of article: original scientific article