Combining ontologies and statistics for sensor data quality improvement

Solomakhina, Nina

doi:10.34726/hss.2014.23187

Record link:

https://doi.org/10.34726/hss.2014.23187
http://hdl.handle.net/20.500.12708/5866

Title:

Combining ontologies and statistics for sensor data quality improvement

Citation:

Solomakhina, N. (2014). Combining ontologies and statistics for sensor data quality improvement [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2014.23187

reposiTUm DOI:

10.34726/hss.2014.23187

CatalogPlus:

AC11623885

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Solomakhina, Nina

Advisor:

Eiter, Thomas

Organisational Unit:

E184 - Institut für Informationssysteme

Date (published):

2014

Number of Pages:

Keywords:

Abstract Argumentation; Computational Logic

Abstract:

In large industries, the use of advanced technological methods and modern equipment brings the problem of storing, interpreting, and analyzing huge amounts of information. Typical sources of this data include a myriad of sensors mounted on industrial machinery, measuring quantities such as temperature, movement and vibration, pressure, and many more. However, these sensors are complex technical devices, which means that they can fail and their readings can become unreliable, or "dirty". Low-quality data makes it hard to solve the original task of assessing system and process status and controlling system behavior. Data quality is therefore one of the major challenges, given the rapid growth of information, the fragmentation of information systems, incorrect data formatting, and other issues. The aim of this thesis is to propose a novel approach to addressing data quality issues in industrial datasets, in particular measurements from sensors mounted at power generation facilities. The most common approach to detecting anomalies in data is analysis by means of statistical and machine learning techniques. However, analyzing the data alone cannot always give satisfactory results. For instance, suspicious sensor readings may indicate not poor data quality but an abnormality in the functioning of the appliance that the sensor has detected. Therefore, we propose to use additional available information about the domain. The approach presented in this work brings together several well-known techniques from the worlds of computational logic and statistics, improving the results of the data quality assessment and improvement procedure. The application domain and the dependencies between its objects are represented as a knowledge-based model, while statistics identifies data anomalies, such as outlying or missing values, in sensor measurement data.
In this work we represent domain knowledge in an OWL ontology, which covers the topology of the industrial equipment and information about the measuring devices installed. Supplying the statistical computations with additional information from the model makes it possible to validate and improve their results. Specifically, comparing and analyzing readings from sensors of the same type mounted on the same component of an appliance helps to identify possibly damaged sensors, as well as to distinguish data quality inconsistencies found in the readings of a single sensor from anomalies in machinery functioning that are also detected by other measuring devices. Based on the proposed approach, a software demonstrator has been implemented and tested, demonstrating that the additional information provided by the semantic model improves the results of the statistical analysis.

Additional information:

Variant title as translated by the author
Abstract in German. - Bibliography: pp. 83-91

License:

In Copyright

Appears in Collections:

Thesis