Data mining on empty result queries

Lee, Mei Sin

Record link:

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-28140
http://hdl.handle.net/20.500.12708/11084

Title:

Data mining on empty result queries

Citation:

Lee, M. S. (2008). Data mining on empty result queries [Master Thesis, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-28140

CatalogPlus:

AC05037622

Publication Type:

Thesis - Masterarbeit

Language:

English

Authors:

Lee, Mei Sin

Advisor:

Musliu, Nysret

Co-advisor:

Wei, Fang

Organisational Unit:

E184 - Institut für Informationssysteme

Date (published):

2008

Number of Pages:

Keywords:

Data Mining; Abfrage; Datenbank; Abfragen mit leerem Resultat; Schwachstellen; leere Kombinationen; leere Regionen.

Data Mining; query; database; empty result queries; holes; empty combinations; empty regions.

Abstract:

A database query may return an empty result. According to statistics, empty results are frequently encountered in query processing. One therefore wishes to detect such a query up front in the DBMS, before any real query evaluation is executed. This not only provides a quick answer but also reduces the load on a busy DBMS. Many data mining approaches deal with mining high-density regions or frequent data values. A complementary approach is presented here, in which we mine for combinations of values or ranges of values that do not appear together and therefore result in empty result queries. We focus our attention not just on simple two-dimensional subspaces but also on multi-dimensional space. We are able to mine heterogeneous data values. Our goal is to find the maximal empty hyper-rectangle. Our method mines query selection criteria that return empty results, without using any prior domain knowledge. The mined results have a few potential applications in query processing. In the first application, queries whose selection criteria match the mined rules are certain to return an empty result; these queries are not processed, which saves execution. In the second application, the mined rules can be used in query optimization. They can also be used to detect anomalies in query updates. We study the experimental results obtained by applying our algorithm to both synthetic and real-life datasets.
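
The first application in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the mining step has already produced a set of empty hyper-rectangles, and it only shows how a conjunctive range query might be checked against them before evaluation. The names `EmptyRegion` and `query_is_surely_empty`, the attributes `age` and `salary`, and the restriction to numeric closed intervals are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Closed numeric interval [low, high]; an assumption of this sketch.
Interval = Tuple[float, float]


@dataclass
class EmptyRegion:
    """Hypothetical mined rule: no row has every listed attribute
    inside its interval at the same time."""
    bounds: Dict[str, Interval]


def query_is_surely_empty(query: Dict[str, Interval],
                          regions: Iterable[EmptyRegion]) -> bool:
    """Return True if the query's selection box lies inside some empty region.

    The query is a conjunction of range predicates, one per attribute.
    An attribute the query does not mention is unconstrained, so the
    query cannot be contained in a region that restricts that attribute.
    """
    for region in regions:
        contained = all(
            attr in query
            and low <= query[attr][0]
            and query[attr][1] <= high
            for attr, (low, high) in region.bounds.items()
        )
        if contained:
            return True
    return False


# Illustrative data only: a made-up rule saying that no row combines
# age in [0, 17] with salary in [50000, 200000].
regions = [EmptyRegion({"age": (0, 17), "salary": (50000, 200000)})]

inside = {"age": (10, 15), "salary": (60000, 70000)}
overlapping = {"age": (10, 30), "salary": (60000, 70000)}
unconstrained = {"age": (10, 15)}
```

Under these assumptions, `inside` can be answered without touching the data, while `overlapping` and `unconstrained` must still be evaluated, because part of their selection box lies outside the mined region. The check is conservative: a `False` result says nothing about whether the query is actually empty.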

License:

In Copyright

Appears in Collections:

Thesis