Data mining on empty result queries / Lee Mei Sin
Author: Lee, Mei Sin
Reviewers: Musliu, Nysret; Wei, Fang
Extent: IX, 67 leaves : illustrations
Thesis: Vienna, Techn. Univ., Master's thesis, 2008
Keywords (DE): Data Mining, Abfrage, Datenbank, Abfragen mit leerem Resultat, Schwachstellen, leere Kombinationen, leere Regionen.
Keywords (EN): Data Mining, query, database, empty result queries, holes, empty combinations, empty regions.
URN: urn:nbn:at:at-ubtuw:1-28140
The work is freely available
Data mining on empty result queries [1.32 mb]
Abstract

A database query may return an empty result. Statistics indicate that empty results occur frequently in query processing. It is therefore desirable for the DBMS to detect such a query up front, before any real query evaluation is executed.

Early detection not only provides a quick answer but also reduces the load on a busy DBMS. Many data mining approaches deal with mining high-density regions or frequent data values. A complementary approach is presented here: we mine for combinations of values or ranges of values that do not appear together, and that therefore yield empty result queries. We focus not just on simple two-dimensional subspaces but also on multi-dimensional space, and we are able to mine heterogeneous data values. Our goal is to find the maximal empty hyper-rectangles. Our method mines query selection criteria that return empty results without using any prior domain knowledge. The mined results have a few potential applications in query processing. In the first application, a query whose selection criteria match the mined rules is guaranteed to return an empty result, so it need not be processed at all, which saves execution cost. In the second application, the mined rules can be used in query optimization. They can also be used to detect anomalies in query updates. We study the experimental results obtained by applying our algorithm to both synthetic and real-life datasets.
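
The first application can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the empty hyper-rectangles have already been mined and are stored as closed numeric intervals per attribute, and it only shows the containment check that would let a DBMS skip a query. The names `Box`, `query_inside_box`, `is_provably_empty`, and the `age`/`salary` rule are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # closed interval [low, high]
Box = Dict[str, Interval]        # attribute name -> interval (assumed representation)


def query_inside_box(query: Box, empty_box: Box) -> bool:
    """Return True if the query region lies entirely inside one empty hyper-rectangle."""
    for attr, (box_lo, box_hi) in empty_box.items():
        if attr not in query:
            # The query leaves this attribute unconstrained, so it may
            # reach outside the empty region along this dimension.
            return False
        q_lo, q_hi = query[attr]
        if q_lo < box_lo or q_hi > box_hi:
            return False
    # Attributes constrained only by the query shrink it further,
    # so they cannot move it outside the empty region.
    return True


def is_provably_empty(query: Box, empty_boxes: List[Box]) -> bool:
    """Return True if any mined empty hyper-rectangle contains the query region."""
    return any(query_inside_box(query, box) for box in empty_boxes)


# Hypothetical mined rule: no row has age in [0, 15] and salary in [50000, 200000].
mined = [{"age": (0, 15), "salary": (50000, 200000)}]

inside = {"age": (5, 10), "salary": (60000, 80000)}
overlapping = {"age": (5, 30), "salary": (60000, 80000)}
unconstrained = {"salary": (60000, 80000)}

print(is_provably_empty(inside, mined))         # region fully inside the empty box
print(is_provably_empty(overlapping, mined))    # age range extends past the box
print(is_provably_empty(unconstrained, mined))  # age is not restricted at all
```

The check is deliberately conservative: a query that merely overlaps an empty region must still be evaluated, because rows may exist in the part of its range that lies outside the rectangle.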