The subject of this diploma thesis is the robust estimation of the plausibility of database contents and the development of novel algorithms for estimating missing values.
Estimating the plausibility of a set of observations depends essentially on the main structure underlying the data.
Observations that fit this estimated structure appear more plausible than observations lying far from it. Here, the structure of a data set is represented by its principal components. Since single observations that do not follow the main structure of a data set (outliers) should not influence this estimation, robust methods are primarily considered. The estimation of missing values is also based on principal component analysis: the missing values are chosen such that the corresponding observations lie as close as possible to the principal components.
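The plausibility idea above can be illustrated with a minimal sketch. It assumes that plausibility is measured by the orthogonal distance of an observation to the subspace spanned by the first k principal components, and it uses spherical PCA (spatial-median centering, then PCA of the observations projected onto the unit sphere) as one simple robust PCA method. This choice is an illustration only, not necessarily the estimator used in the thesis; the function names `spatial_median`, `spherical_pca` and `orthogonal_distance` are hypothetical.

```python
import numpy as np


def spatial_median(X, n_iter=200, tol=1e-9):
    """Weiszfeld iterations for the spatial (L1) median of the rows of X."""
    m = np.median(X, axis=0)  # coordinatewise median as a robust start value
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def spherical_pca(X, k):
    """Robust PCA sketch: center at the spatial median, scale rows to unit length, then PCA."""
    center = spatial_median(X)
    Z = X - center
    S = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    # Every observation now has equal weight, so a single outlier cannot dominate.
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return center, Vt[:k].T  # p x k matrix of loadings


def orthogonal_distance(X, center, P):
    """Distance of each observation to the k-dimensional principal component subspace."""
    Z = X - center
    residual = Z - (Z @ P) @ P.T
    return np.linalg.norm(residual, axis=1)


# Usage: 100 points near the line y = 2x plus one observation far from that structure.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t]) + 0.05 * rng.normal(size=(100, 2))
X = np.vstack([X, [[5.0, -5.0]]])
center, P = spherical_pca(X, k=1)
od = orthogonal_distance(X, center, P)
print(int(od.argmax()))  # the appended outlier (index 100) is the least plausible observation
```

Large orthogonal distances flag implausible observations; because the outlier barely affects the robust fit, its distance stays large instead of being masked.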
Principal components are estimated iteratively, and the observations are projected onto them until the process converges. In this context, existing algorithms have been improved with respect to imputation quality and runtime. In particular, the improvements concern the projection methods used to project observations containing missing values onto the principal components.
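The iterative scheme described above can be sketched as follows: missing cells start at their column means; then, in each iteration, a rank-k PCA is fitted to the completed data, and only the missing cells are replaced by their projection onto the principal components. For simplicity this sketch uses classical (non-robust) PCA and the plain orthogonal projection of the completed observation. The improved projection methods and robust estimators of the thesis are not reproduced here, and the name `iterative_pca_impute` is hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, k, max_iter=500, tol=1e-8):
    """Fill NaNs by alternating a rank-k PCA fit and a projection onto it."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])  # start from column means
    for _ in range(max_iter):
        center = filled.mean(axis=0)
        Z = filled - center
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        P = Vt[:k].T  # p x k loadings of classical PCA
        reconstruction = center + (Z @ P) @ P.T  # projection onto the k principal components
        change = np.max(np.abs(reconstruction[missing] - filled[missing]), initial=0.0)
        filled[missing] = reconstruction[missing]  # observed cells are never modified
        if change < tol:
            break
    return filled


# Usage: exactly rank-1 data (after centering) with two missing cells.
rng = np.random.default_rng(1)
t = rng.normal(size=50)
X = np.column_stack([t, 2 * t, -t])
X_obs = X.copy()
X_obs[3, 1] = np.nan
X_obs[10, 2] = np.nan
X_hat = iterative_pca_impute(X_obs, k=1)
print(X[3, 1], X_hat[3, 1])  # the imputed value should closely match the true one
```

Only the missing cells move between iterations, so the observed data always anchor the principal components; how the incomplete observations are projected is exactly the step the thesis targets for improvement.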
A package implementing this functionality for the statistical software R is currently under development and will be released soon.