Semi-automatic processing of unstructured short text in maintenance records

2018 MPE project – Yiyang Gao

The short text that maintainers enter into a Computerized Maintenance Management System (CMMS) contains valuable information about the items maintained, the maintenance actions performed, and the observed condition of those items. This information is critical to failure pattern identification, maintenance strategy improvement, and equipment reliability analysis. However, because it is human-generated, the text is unstructured, error-prone and domain-specific, which makes information extraction particularly challenging.
This paper proposes a semi-automatic, rule-based pipeline that identifies three key elements in the short text of maintenance records: the action, the item, and the item condition. The pipeline combines the machine learning technique Word2Vec, the statistical Natural Language Processing (NLP) technique of bigram detection, and human domain knowledge. It is developed on a dataset of work order records from Heavy Mobile Equipment (HME).
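As an illustration of the bigram detection step, the sketch below scores adjacent word pairs by how often they co-occur relative to the frequency of each word on its own. It is a minimal sketch under stated assumptions, not the pipeline's actual implementation: the function name detect_bigrams, the scoring formula, the min_count and threshold values, and the sample records are all hypothetical.

```python
from collections import Counter


def detect_bigrams(records, min_count=2, threshold=0.1):
    """Return {(word_a, word_b): score} for adjacent pairs that look like phrases.

    Assumed scoring: count(a, b) / (count(a) * count(b)). Pairs seen fewer
    than min_count times, or scoring below threshold, are dropped.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in records:
        tokens = text.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs

    scored = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue
        score = n_ab / (unigrams[a] * unigrams[b])
        if score >= threshold:
            scored[(a, b)] = score
    return scored


# Hypothetical short-text records, for illustration only.
sample_records = [
    "replace hydraulic hose",
    "hydraulic hose leaking",
    "replace air filter",
    "air filter blocked",
    "inspect hydraulic hose",
]
phrases = detect_bigrams(sample_records)
```

With these sample records, only the pairs "hydraulic hose" and "air filter" recur often enough to be kept, so each can be treated as a single candidate item term before tagging.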
Over 600,000 HME maintenance records were used to develop the approach. A test set of 360 records was randomly selected and manually tagged, and the Jaccard index was used to measure the similarity between the pipeline output and the manual tags. The proposed pipeline achieves an average Jaccard index of 60% over the 360 test records, compared with 12.8% for a conventional NLP approach.
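For reference, the Jaccard index of two sets A and B is the size of their intersection divided by the size of their union, |A ∩ B| / |A ∪ B|. The sketch below shows one way it could be computed for a single record. The representation of tags as (element, text) pairs, the handling of two empty tag sets, and the sample record are assumptions made for illustration and are not taken from the study.

```python
def jaccard_index(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| between two collections of tags."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        return 1.0  # assumption: two empty tag sets count as a perfect match
    return len(a & b) / len(a | b)


# Hypothetical record: each tag is an (element, text) pair.
pipeline_tags = {
    ("action", "replace"),
    ("item", "pump"),
    ("condition", "leaking"),
}
manual_tags = {
    ("action", "replace"),
    ("item", "hydraulic pump"),
    ("condition", "leaking"),
}
score = jaccard_index(pipeline_tags, manual_tags)
```

Here two of the four distinct tags match, giving a score of 0.5 for this record. Averaging such per-record scores over a test set yields a single summary figure of the kind reported above.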